Re: [Standards] Exact hint for Result Set Management
On 12 July 2018 at 12:24, Florian Schmaus wrote:
> On 12.07.2018 12:39, Kevin Smith wrote:
>> On 12 Jul 2018, at 11:23, Matthew Wild wrote:
>>> On 11 July 2018 at 18:25, Florian Schmaus wrote:
>>>> On 11.07.2018 18:01, Matthew Wild wrote:
>>>>> On 11 July 2018 at 16:33, Florian Schmaus wrote:
>>>>>> I recently submitted PR #672 to the xeps repo
>>>>>>
>>>>>> https://github.com/xsf/xeps/pull/672
>>>>>>
>>>>>> to make users of RSM, like MAM, aware whether the result is exact or not. It received some scepticism from the council members in today's council meeting. I am to blame here as I thought the abstract motivation in the commit message was enough. It appears it wasn't.
>>>>>>
>>>>>> While I think multiple applications could exploit that information, my particular motivation was MAM. Consider the scenario where you have a master archive and a local archive. The local archive may have multiple holes at unknown locations. Now you want to sync your local archive from the master using MAM/RSM.
>>>>>
>>>>> I'm not keen on this solution for the premise you've given.
>>>>>
>>>>> I don't believe that when using MAM correctly you would ever end up with "holes at unknown locations" in your local archive. I don't think that encouraging people to use a "bisection algorithm" is the right thing to do.
>>>>
>>>> So you don't want MAM users to be able to efficiently sync archives with multiple holes by a simple change because you do not want MAM to be used in scenarios where this could happen?
>>>
>>> Just adding this flag will not make servers implement it, so it's going to add code and still need a fallback.
>>
>> And, as specified (optional but with no default or meaning for a missing flag) it seems unhelpful
>
> As Georg mentioned yesterday, the default is exact="maybe". There is also a sentence explaining the semantics of the missing hint:
>
> https://github.com/xsf/xeps/pull/672/files#diff-fd691aeb84210578723b940e9881ab7eR200
>
>> and as it adds a SHOULD, in a Draft XEP, with no namespace bump or discovery, it’s adding ambiguity and confusion.
>
> I don't think that this is true, but we certainly can talk about making it just a recommendation if it is a blocker.

There's no difference. From RFC 2119:

   3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

> - Florian

___
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
___
[Standards] Exact hint for Result Set Management
On 12.07.2018 15:20, Matthew Wild wrote:
> On 12 July 2018 at 13:13, Florian Schmaus wrote:
>> I'm well aware why RSM allows inexact results. That is not what this PR is about.
>>
>> I simply think it is an oversight that RSM does not signal if the results are exact or not.
>
> Then I think you didn't quite consider my points?

I hope I did, sorry if not. It appears you treat two concepts as the same which I do not consider equal:

- Returning the exact result at that point in time, versus returning just an approximation.
- The possibility that the data changes in the future.

> The count can change at any time. So it may have been exact at the time of the query, but even by the time the client receives the count across the wire that count could be incorrect. What use is it for the client to know that the count was "exact" at some unknown point in recent history?

Possibly not much for those cases. But there are cases where it is useful for the client to know that the numbers are exact, for example when syncing data:

C: How many entries are between X and Y?
S: Exactly 500.
C: Great, that is my view of the data too, and I know that the entries between X and Y cannot change, hence I know that my data is in sync.

vs.

C: How many entries are between X and Y?
S: Approximately 500.

Now C sees that the information is useless even if it also has 500 entries between X and Y, because the count given by S is (possibly) approximate. So C goes home dealing with his grief (probably with the help of a bottle of scotch).

- Florian
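The two dialogues above can be reduced to a small decision rule. The following is only a sketch: `CountReply` and `range_in_sync` are hypothetical names, and the three-state `exact` field is an assumption modelled on the hint discussed in this thread, not the wire format of the PR. It also relies on the precondition stated in the dialogue, namely that entries between X and Y cannot change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CountReply:
    """Hypothetical view of a count-only answer from the server."""
    count: int
    # True = server says the count is exact, False = approximate,
    # None = hint missing (treated like "maybe").
    exact: Optional[bool] = None


def range_in_sync(local_count: int, reply: CountReply) -> bool:
    """Report a range as in sync only if the server vouches for its count
    and that count equals the number of entries held locally."""
    return reply.exact is True and reply.count == local_count
```

With this rule, a matching but approximate count is treated exactly like a mismatch, which is the situation the second dialogue describes.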
Re: [Standards] Exact hint for Result Set Management
On 12 July 2018 at 13:13, Florian Schmaus wrote:
> I'm well aware why RSM allows inexact results. That is not what this PR is about.
>
> I simply think it is an oversight that RSM does not signal if the results are exact or not.

Then I think you didn't quite consider my points?

The count can change at any time. So it may have been exact at the time of the query, but even by the time the client receives the count across the wire that count could be incorrect. What use is it for the client to know that the count was "exact" at some unknown point in recent history?

Regards,
Matthew
Re: [Standards] Exact hint for Result Set Management
On 12.07.2018 12:23, Matthew Wild wrote:
> On 11 July 2018 at 18:25, Florian Schmaus wrote:
>> On 11.07.2018 18:01, Matthew Wild wrote:
>>> On 11 July 2018 at 16:33, Florian Schmaus wrote:
>> So you don't want MAM users to be able to efficiently sync archives with multiple holes by a simple change because you do not want MAM to be used in scenarios where this could happen?
>
> Just adding this flag will not make servers implement it, so it's going to add code and still need a fallback.

True, but at least we would then have a specified way to signal that the returned numbers are exact.

> I believe that ambiguity in how to implement protocols properly is a big problem we have. XEPs tend to describe only the wire protocol, and if the author of the XEP even had a data model in mind, it's rarely (if ever) clear. For example it took me a long time to realise that pubsub's model is an (optionally capped) ordered key->value store, and that's made clear precisely nowhere in XEP-0060.

>> From a generic, non MAM-specific point-of-view, RSM is eventually used to sync data, and for that you often want to know if the RSM metadata is exact or not. My MAM example is just one illustration of that. It always appeared like an afterthought that RSM does not allow the RSM data originator to signal if the numbers are exact or not. The proposed change tries to fix that.
>
> I think the intention of RSM's vague numbers is that they were used for things like UI (progress bars, etc.) hints only.

I'm well aware why RSM allows inexact results. That is not what this PR is about.

I simply think it is an oversight that RSM does not signal if the results are exact or not. That is all I want to address. My suggestion in the current PR does that in a backwards compatible way. If an RSM namespace bump were imminent, then we could do it right and make the hint mandatory. But I don't think that there is one pending, and I believe the exact hint does not justify a namespace bump.

- Florian
Re: [Standards] Exact hint for Result Set Management
On Thu, 12 Jul 2018 at 15:26, Matthew Wild wrote:
> What you are describing - a local archive with multiple holes that the implementation is unaware of - is not a state that I see any such optimal correctly implemented MAM usage getting into.

Actually, it's a typical state if a server has server-side search support. You might search for 'address', receive multiple results, go to three of the presented entries, fetch nearby history from the archive for each, and end up with three patches of data and holes in between them.

Fighting such holes is not that easy (though somewhat possible) with the current MAM. It definitely could use a better solution.

--
Andrey Nenakhov
Director, OOO "Redsolution" (Chelyabinsk)
(351) 750-50-04
http://www.redsolution.ru
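To make the "three patches of data and holes in between them" state concrete, here is a minimal sketch of how a client might record the patches it has fetched so that the holes between them are known. The `Segment` type and `known_holes` function are hypothetical, and the sketch assumes the segments are kept in archive order and identified by the stable archive IDs of their first and last messages.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A contiguous run of locally stored messages (hypothetical model)."""
    first_id: str  # archive ID of the oldest message in the run
    last_id: str   # archive ID of the newest message in the run


def known_holes(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Return (after_id, before_id) pairs for the gaps between
    consecutive segments, assuming segments are in archive order."""
    return [(a.last_id, b.first_id) for a, b in zip(segments, segments[1:])]
```

Each returned pair marks a gap that could be filled by paging forward from `after_id` until `before_id` is reached. Holes at unknown locations, which is what the proposed hint targets, are exactly the ones this bookkeeping cannot describe.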
Re: [Standards] Exact hint for Result Set Management
On 12.07.2018 12:39, Kevin Smith wrote:
> On 12 Jul 2018, at 11:23, Matthew Wild wrote:
>> On 11 July 2018 at 18:25, Florian Schmaus wrote:
>>> On 11.07.2018 18:01, Matthew Wild wrote:
>>>> On 11 July 2018 at 16:33, Florian Schmaus wrote:
>>>>> I recently submitted PR #672 to the xeps repo
>>>>>
>>>>> https://github.com/xsf/xeps/pull/672
>>>>>
>>>>> to make users of RSM, like MAM, aware whether the result is exact or not. It received some scepticism from the council members in today's council meeting. I am to blame here as I thought the abstract motivation in the commit message was enough. It appears it wasn't.
>>>>>
>>>>> While I think multiple applications could exploit that information, my particular motivation was MAM. Consider the scenario where you have a master archive and a local archive. The local archive may have multiple holes at unknown locations. Now you want to sync your local archive from the master using MAM/RSM.
>>>>
>>>> I'm not keen on this solution for the premise you've given.
>>>>
>>>> I don't believe that when using MAM correctly you would ever end up with "holes at unknown locations" in your local archive. I don't think that encouraging people to use a "bisection algorithm" is the right thing to do.
>>>
>>> So you don't want MAM users to be able to efficiently sync archives with multiple holes by a simple change because you do not want MAM to be used in scenarios where this could happen?
>>
>> Just adding this flag will not make servers implement it, so it's going to add code and still need a fallback.
>
> And, as specified (optional but with no default or meaning for a missing flag) it seems unhelpful

As Georg mentioned yesterday, the default is exact="maybe". There is also a sentence explaining the semantics of the missing hint:

https://github.com/xsf/xeps/pull/672/files#diff-fd691aeb84210578723b940e9881ab7eR200

> and as it adds a SHOULD, in a Draft XEP, with no namespace bump or discovery, it’s adding ambiguity and confusion.

I don't think that this is true, but we certainly can talk about making it just a recommendation if it is a blocker.

- Florian
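As an illustration of the "missing hint means maybe" default, the sketch below reads the count from an RSM result set and falls back to "maybe" when no hint is present. Only the RSM namespace and the `<count/>` element are taken from XEP-0059. The placement of the hint as an `exact` attribute on `<count/>`, and the values "true" and "false", are assumptions made for this example and may differ from what PR #672 specifies. `parse_count` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

RSM_NS = "http://jabber.org/protocol/rsm"


def parse_count(set_xml: str) -> Optional[Tuple[int, str]]:
    """Return (count, hint) from an RSM <set/>, or None if there is no count.

    The 'exact' attribute on <count/> is an assumed placement of the hint;
    a missing attribute is reported as "maybe".
    """
    root = ET.fromstring(set_xml)
    count_el = root.find(f"{{{RSM_NS}}}count")
    if count_el is None or count_el.text is None:
        return None
    return int(count_el.text), count_el.get("exact", "maybe")
```

A client written this way behaves the same against servers that never send the hint as it does today, which is the backwards compatibility argument made in this thread.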
Re: [Standards] Exact hint for Result Set Management
On 12 Jul 2018, at 11:23, Matthew Wild wrote:
> On 11 July 2018 at 18:25, Florian Schmaus wrote:
>> On 11.07.2018 18:01, Matthew Wild wrote:
>>> On 11 July 2018 at 16:33, Florian Schmaus wrote:
>>>> I recently submitted PR #672 to the xeps repo
>>>>
>>>> https://github.com/xsf/xeps/pull/672
>>>>
>>>> to make users of RSM, like MAM, aware whether the result is exact or not. It received some scepticism from the council members in today's council meeting. I am to blame here as I thought the abstract motivation in the commit message was enough. It appears it wasn't.
>>>>
>>>> While I think multiple applications could exploit that information, my particular motivation was MAM. Consider the scenario where you have a master archive and a local archive. The local archive may have multiple holes at unknown locations. Now you want to sync your local archive from the master using MAM/RSM.
>>>
>>> I'm not keen on this solution for the premise you've given.
>>>
>>> I don't believe that when using MAM correctly you would ever end up with "holes at unknown locations" in your local archive. I don't think that encouraging people to use a "bisection algorithm" is the right thing to do.
>>
>> So you don't want MAM users to be able to efficiently sync archives with multiple holes by a simple change because you do not want MAM to be used in scenarios where this could happen?
>
> Just adding this flag will not make servers implement it, so it's going to add code and still need a fallback.

And, as specified (optional but with no default or meaning for a missing flag) it seems unhelpful, and as it adds a SHOULD, in a Draft XEP, with no namespace bump or discovery, it’s adding ambiguity and confusion.

> What you are describing - a local archive with multiple holes that the implementation is unaware of - is not a state that I see any such optimal correctly implemented MAM usage getting into.

OTOH, a local archive with multiple *known* holes is easy to get into and we need to ensure this case is covered - but this doesn’t need this change to RSM.

> Therefore it's not a problem I want to solve, because it will only add to confusion about the best and easiest way to implement MAM.

+1.

>> From a generic, non MAM-specific point-of-view, RSM is eventually used to sync data, and for that you often want to know if the RSM metadata is exact or not. My MAM example is just one illustration of that. It always appeared like an afterthought that RSM does not allow the RSM data originator to signal if the numbers are exact or not. The proposed change tries to fix that.
>
> I think the intention of RSM's vague numbers is that they were used for things like UI (progress bars, etc.) hints only.
>
> One of the reasons for this is that RSM is designed to work with dynamic result sets. For example you might request disco#items of a MUC server, but rooms will be added/removed while paging through the results. RSM is carefully designed so that you will never receive duplicates, but an item that was present when you started paging will not be included in the results if it was removed before you reached its page. That's why <count/> is not accurate and not meant to be used for sync purposes.
>
> MAM is a special case because normally[*] XEP-0313 explicitly forbids adding or removing items in the middle of the result set. This is not true of most other things that RSM would be used for (disco items, pubsub, etc.), and therefore I think this flag would basically only work for MAM. And then all my reasoning above therefore applies.

Different, but even in MAM you probably don’t want to count results accurately, even though you could, and will probably be returning *very* approximate values here to avoid flooring the archive server (whatever form it takes).

> [*] the server setting stable='false' is an exception here - an aspect of the XEP I'm not keen on, but it was deemed necessary for some environments. Sync is simply impossible with such a server.

You can sync with such a server, but only those results that have become stable - the idea here is that if you have a clustered server doing something eventually convergentish you don’t want to refuse to answer MAM queries until the archive results are perfectly synched, so you can answer with some unstable results on the basis that that’s good enough for many use cases. Although this is getting somewhat off-topic.

/K
Re: [Standards] Exact hint for Result Set Management
On 11 July 2018 at 18:25, Florian Schmaus wrote:
> On 11.07.2018 18:01, Matthew Wild wrote:
>> On 11 July 2018 at 16:33, Florian Schmaus wrote:
>>> I recently submitted PR #672 to the xeps repo
>>>
>>> https://github.com/xsf/xeps/pull/672
>>>
>>> to make users of RSM, like MAM, aware whether the result is exact or not. It received some scepticism from the council members in today's council meeting. I am to blame here as I thought the abstract motivation in the commit message was enough. It appears it wasn't.
>>>
>>> While I think multiple applications could exploit that information, my particular motivation was MAM. Consider the scenario where you have a master archive and a local archive. The local archive may have multiple holes at unknown locations. Now you want to sync your local archive from the master using MAM/RSM.
>>
>> I'm not keen on this solution for the premise you've given.
>>
>> I don't believe that when using MAM correctly you would ever end up with "holes at unknown locations" in your local archive. I don't think that encouraging people to use a "bisection algorithm" is the right thing to do.
>
> So you don't want MAM users to be able to efficiently sync archives with multiple holes by a simple change because you do not want MAM to be used in scenarios where this could happen?

Just adding this flag will not make servers implement it, so it's going to add code and still need a fallback.

I believe that ambiguity in how to implement protocols properly is a big problem we have. XEPs tend to describe only the wire protocol, and if the author of the XEP even had a data model in mind, it's rarely (if ever) clear. For example it took me a long time to realise that pubsub's model is an (optionally capped) ordered key->value store, and that's made clear precisely nowhere in XEP-0060.

So for MAM I wanted to focus on the specific use-cases that the protocol was designed for, and to present a clear way to correctly implement them (I am well aware that the XEP does not reach this goal currently).

What you are describing - a local archive with multiple holes that the implementation is unaware of - is not a state that I see any such optimal correctly implemented MAM usage getting into.

Therefore it's not a problem I want to solve, because it will only add to confusion about the best and easiest way to implement MAM.

> Even if we would live in a world where such MAM archives are never going to happen, adding the exact hint to RSM is worthwhile.

I've no objection to RSM taking its own course, and I won't object to such an enhancement if it's the right thing for RSM. I would object only if it were done on the sole basis that it is desirable for MAM.

> From a generic, non MAM-specific point-of-view, RSM is eventually used to sync data, and for that you often want to know if the RSM metadata is exact or not. My MAM example is just one illustration of that. It always appeared like an afterthought that RSM does not allow the RSM data originator to signal if the numbers are exact or not. The proposed change tries to fix that.

I think the intention of RSM's vague numbers is that they were used for things like UI (progress bars, etc.) hints only.

One of the reasons for this is that RSM is designed to work with dynamic result sets. For example you might request disco#items of a MUC server, but rooms will be added/removed while paging through the results. RSM is carefully designed so that you will never receive duplicates, but an item that was present when you started paging will not be included in the results if it was removed before you reached its page. That's why <count/> is not accurate and not meant to be used for sync purposes.

MAM is a special case because normally[*] XEP-0313 explicitly forbids adding or removing items in the middle of the result set. This is not true of most other things that RSM would be used for (disco items, pubsub, etc.), and therefore I think this flag would basically only work for MAM. And then all my reasoning above therefore applies.

Regards,
Matthew

[*] the server setting stable='false' is an exception here - an aspect of the XEP I'm not keen on, but it was deemed necessary for some environments. Sync is simply impossible with such a server.
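The behaviour of paging over a dynamic result set can be shown with a small simulation. This is a sketch rather than an RSM implementation: `page` and `page_through` are hypothetical helpers, the server is modelled as an in-memory set of room names, and each room name doubles as the opaque item identifier that a real client would take from the last item of the previous page.

```python
from bisect import bisect_right
from typing import Callable, List, Optional, Set, Tuple


def page(items: Set[str], after: Optional[str], max_items: int) -> Tuple[List[str], int]:
    """Return up to max_items entries that sort after `after`,
    together with the total number of items at this moment."""
    ordered = sorted(items)
    start = bisect_right(ordered, after) if after is not None else 0
    return ordered[start:start + max_items], len(ordered)


def page_through(items: Set[str], max_items: int,
                 between_pages: Optional[Callable[[Set[str]], None]] = None
                 ) -> Tuple[List[str], Optional[int]]:
    """Fetch every page; `between_pages` may mutate the set to simulate
    rooms being added or removed while the client is paging."""
    received: List[str] = []
    first_count: Optional[int] = None
    after: Optional[str] = None
    while True:
        batch, count = page(items, after, max_items)
        if first_count is None:
            first_count = count  # the count reported with the first page
        if not batch:
            return received, first_count
        received.extend(batch)
        after = batch[-1]
        if between_pages is not None:
            between_pages(items)
```

If a room is removed after the first page has been delivered, the client receives no duplicates, never sees the removed room, and ends up with fewer items than the count it was given at the start. That count was correct when it was sent and is still of no use for sync.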
Re: [Standards] Exact hint for Result Set Management
On 11.07.2018 18:01, Matthew Wild wrote:
> On 11 July 2018 at 16:33, Florian Schmaus wrote:
>> I recently submitted PR #672 to the xeps repo
>>
>> https://github.com/xsf/xeps/pull/672
>>
>> to make users of RSM, like MAM, aware whether the result is exact or not. It received some scepticism from the council members in today's council meeting. I am to blame here as I thought the abstract motivation in the commit message was enough. It appears it wasn't.
>>
>> While I think multiple applications could exploit that information, my particular motivation was MAM. Consider the scenario where you have a master archive and a local archive. The local archive may have multiple holes at unknown locations. Now you want to sync your local archive from the master using MAM/RSM.
>
> I'm not keen on this solution for the premise you've given.
>
> I don't believe that when using MAM correctly you would ever end up with "holes at unknown locations" in your local archive. I don't think that encouraging people to use a "bisection algorithm" is the right thing to do.

So you don't want MAM users to be able to efficiently sync archives with multiple holes by a simple change because you do not want MAM to be used in scenarios where this could happen?

Even if we lived in a world where such MAM archives never happen, adding the exact hint to RSM is worthwhile.

From a generic, non MAM-specific point-of-view, RSM is eventually used to sync data, and for that you often want to know if the RSM metadata is exact or not. My MAM example is just one illustration of that. It always appeared like an afterthought that RSM does not allow the RSM data originator to signal if the numbers are exact or not. The proposed change tries to fix that.

- Florian
Re: [Standards] Exact hint for Result Set Management
On 11 July 2018 at 16:33, Florian Schmaus wrote:
> I recently submitted PR #672 to the xeps repo
>
> https://github.com/xsf/xeps/pull/672
>
> to make users of RSM, like MAM, aware whether the result is exact or not. It received some scepticism from the council members in today's council meeting. I am to blame here as I thought the abstract motivation in the commit message was enough. It appears it wasn't.
>
> While I think multiple applications could exploit that information, my particular motivation was MAM. Consider the scenario where you have a master archive and a local archive. The local archive may have multiple holes at unknown locations. Now you want to sync your local archive from the master using MAM/RSM.

I'm not keen on this solution for the premise you've given.

I don't believe that when using MAM correctly you would ever end up with "holes at unknown locations" in your local archive. I don't think that encouraging people to use a "bisection algorithm" is the right thing to do.

If this is a problem you are facing, let's go back to the basics and figure out how you end up with holes at unknown locations in your archive. And we can fix that.

Regards,
Matthew
[Standards] Exact hint for Result Set Management
I recently submitted PR #672 to the xeps repo

https://github.com/xsf/xeps/pull/672

to make users of RSM, like MAM, aware whether the result is exact or not. It received some scepticism from the council members in today's council meeting. I am to blame here as I thought the abstract motivation in the commit message was enough. It appears it wasn't.

While I think multiple applications could exploit that information, my particular motivation was MAM. Consider the scenario where you have a master archive and a local archive. The local archive may have multiple holes at unknown locations. Now you want to sync your local archive from the master using MAM/RSM.

If you don't know whether or not the MAM RSM results are exact, you need to resort to basically re-syncing the complete archive from the master. But if you know that the results are exact, you could use count-only MAM RSM queries (XEP-0059 § 2.7) and the unique-and-stable IDs of the archived messages to effectively determine the holes and fill them up, using a simple bisection algorithm which compares the message count reported by the master with the number of messages held locally between two unique-and-stable archive message IDs. This saves a lot of roundtrips and especially transferred data when syncing the archive.

I'd also like to point out that the changes in #672 are backwards compatible. No namespace bump required.

I'm off to barbecue, but I'm still looking forward to your feedback.

- Florian
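One possible reading of the bisection idea is sketched below. It is not taken from the PR. `find_holes` is a hypothetical function, and `count_between(after_id, before_id)` is a hypothetical callable standing in for a count-only query against the master, with `None` meaning the open start or end of the archive. The sketch assumes that the returned counts are exact, that every local message also exists on the master, and that nothing is inserted or removed in the middle of the archive.

```python
from typing import Callable, List, Optional, Tuple

Hole = Tuple[Optional[str], Optional[str]]  # (after_id, before_id), exclusive


def find_holes(local_ids: List[str],
               count_between: Callable[[Optional[str], Optional[str]], int]
               ) -> List[Hole]:
    """Locate gaps in the local archive by comparing exact remote counts
    with local counts over ever smaller ID ranges."""
    # Sentinels: None marks the open start and end of the archive.
    padded: List[Optional[str]] = [None, *local_ids, None]
    holes: List[Hole] = []

    def visit(lo: int, hi: int) -> None:
        after_id, before_id = padded[lo], padded[hi]
        local_count = hi - lo - 1  # local entries strictly between the bounds
        if count_between(after_id, before_id) == local_count:
            return  # this range is already in sync, no need to look inside
        if local_count == 0:
            holes.append((after_id, before_id))  # adjacent IDs, remote has more
            return
        mid = (lo + hi) // 2  # split at a message both sides are known to have
        visit(lo, mid)
        visit(mid, hi)

    visit(0, len(padded) - 1)
    return holes
```

Each call to `count_between` costs one round trip but transfers no messages, and any range whose counts agree is skipped as a whole. That is where the saving over a full re-sync would come from, and it only holds if the count can be trusted to be exact.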