Re: [Standards] Exact hint for Result Set Management

2018-07-12 Thread Dave Cridland
On 12 July 2018 at 12:24, Florian Schmaus  wrote:

> On 12.07.2018 12:39, Kevin Smith wrote:
> > On 12 Jul 2018, at 11:23, Matthew Wild  wrote:
> >>
> >> On 11 July 2018 at 18:25, Florian Schmaus  wrote:
> >>> On 11.07.2018 18:01, Matthew Wild wrote:
> >>>> On 11 July 2018 at 16:33, Florian Schmaus  wrote:
> >>>>> I recently submitted PR #672 to the xeps repo
> >>>>>
> >>>>> https://github.com/xsf/xeps/pull/672
> >>>>>
> >>>>> to make users of RSM, like MAM, aware whether the result is exact or
> >>>>> not. It received some scepticism from the council members in today's
> >>>>> council meeting. I am to blame here as I thought the abstract motivation
> >>>>> in the commit message was enough. It appears it wasn't.
> >>>>>
> >>>>> While I think multiple applications could exploit that information, my
> >>>>> particular motivation was MAM. Consider the scenario where you have a
> >>>>> master archive and a local archive. The local archive may have multiple
> >>>>> holes at unknown locations. Now you want to sync your local archive from
> >>>>> the master using MAM/RSM.
> >>>>
> >>>> I'm not keen on this solution for the premise you've given.
> >>>>
> >>>> I don't believe that when using MAM correctly you would ever end up
> >>>> with "holes at unknown locations" in your local archive. I don't think
> >>>> that encouraging people to use a "bisection algorithm" is the right
> >>>> thing to do.
> >>>
> >>> So you don't want MAM users to be able to efficiently sync archives with
> >>> multiple holes by a simple change because you do not want MAM to be used
> >>> in scenarios where this could happen?
> >>
> >> Just adding this flag will not make servers implement it, so it's
> >> going to add code and still need a fallback.
> >
> > And, as specified (optional but with no default or meaning for a missing
> > flag) it seems unhelpful
>
> As Georg mentioned yesterday, the default is exact="maybe". There is
> also a sentence explaining the semantic of the missing hint:
>
> https://github.com/xsf/xeps/pull/672/files#diff-fd691aeb84210578723b940e9881ab7eR200
>
> > and as it adds a SHOULD, in a Draft XEP, with no namespace bump or
> > discovery, it’s adding ambiguity and confusion..
>
> I don't think that this is true, but we certainly can talk about making
> it just a recommendation if it is a blocker.
>
>
There's no difference.

From RFC 2119:

3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.



> - Florian
>
>
> ___
> Standards mailing list
> Info: https://mail.jabber.org/mailman/listinfo/standards
> Unsubscribe: standards-unsubscr...@xmpp.org
> ___
>
>


[Standards] Exact hint for Result Set Management

2018-07-12 Thread Florian Schmaus
On 12.07.2018 15:20, Matthew Wild wrote:
> On 12 July 2018 at 13:13, Florian Schmaus  wrote:
>> I'm well aware why RSM allows inexact results. That is not what this PR
>> is about.
>>
>> I simply think it is an oversight that RSM does not signal if the
>> results are exact or not.
> 
> Then I think you didn't quite consider my points?

I hope I did; sorry if not. It appears you treat two concepts as the same
which I consider distinct:
- Returning the exact result at a point in time, versus returning just
  an approximation.
- The possibility that the data changes in the future.


> The count can change at any time. So it may have been exact at the
> time of the query, but even by the time the client receives the count
> across the wire that count could be incorrect. What use is it for the
> client to know that the count was "exact" at some unknown point in
> recent history?

Possibly not much for those cases.

But there are cases where it is useful for the client to know that the
numbers are exact, for example when syncing data:

C: How many entries are between X and Y?
S: Exactly 500.
C: Great, that is my view of the data too, and I know that the entries
between X and Y cannot change, hence I know that my data is in sync.

vs.

C: How many entries are between X and Y?
S: Approximately 500.
Now C sees that the information is useless even if it also has 500
entries between X and Y, because the count given by S is (possibly)
approximate. So C goes home to deal with his grief (probably with the
help of a bottle of scotch).
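The two dialogues boil down to a small decision rule. Here is a minimal Python sketch of it. It assumes the tri-state hint values true/false/maybe discussed in this thread. It also assumes that the entries between X and Y can no longer change, and that the client's entries in that range are a subset of the server's, so equal counts imply equal contents. `Exactness` and `range_in_sync` are hypothetical names and are not part of PR #672.

```python
from enum import Enum
from typing import Optional


class Exactness(Enum):
    """Hypothetical tri-state hint; MAYBE stands for 'no hint given'."""
    TRUE = "true"
    FALSE = "false"
    MAYBE = "maybe"


def range_in_sync(local_count: int, remote_count: int,
                  exact: Exactness) -> Optional[bool]:
    """Decide whether the closed range X..Y is in sync.

    Returns True/False only when the server vouches for an exact count.
    Returns None when the count is (possibly) approximate: then a matching
    number proves nothing and a mismatch may just be imprecision, so the
    client has to fall back to fetching the range.
    """
    if exact is not Exactness.TRUE:
        return None
    # Assumes local entries are a subset of the server's entries.
    return local_count == remote_count
```

With `Exactness.TRUE` the client gets a definite answer. With anything else it learns nothing, which is the grief-and-scotch case.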

- Florian





Re: [Standards] Exact hint for Result Set Management

2018-07-12 Thread Matthew Wild
On 12 July 2018 at 13:13, Florian Schmaus  wrote:
> I'm well aware why RSM allows inexact results. That is not what this PR
> is about.
>
> I simply think it is an oversight that RSM does not signal if the
> results are exact or not.

Then I think you didn't quite consider my points?

The count can change at any time. So it may have been exact at the
time of the query, but even by the time the client receives the count
across the wire that count could be incorrect. What use is it for the
client to know that the count was "exact" at some unknown point in
recent history?

Regards,
Matthew


Re: [Standards] Exact hint for Result Set Management

2018-07-12 Thread Florian Schmaus
On 12.07.2018 12:23, Matthew Wild wrote:
> On 11 July 2018 at 18:25, Florian Schmaus  wrote:
>> On 11.07.2018 18:01, Matthew Wild wrote:
>>> On 11 July 2018 at 16:33, Florian Schmaus  wrote:
>> So you don't want MAM users to be able to efficiently sync archives with
>> multiple holes by a simple change because you do not want MAM to be used
>> in scenarios where this could happen?
> 
> Just adding this flag will not make servers implement it, so it's
> going to add code and still need a fallback.

True, but then at least we would have a specified way to signal that the
returned numbers are exact.

> I believe that ambiguity in how to implement protocols properly is a
> big problem we have. XEPs tend to describe only the wire protocol, and
> if the author of the XEP even had a data model in mind, it's rarely
> (if ever) clear. For example it took me a long time to realise that
> pubsub's model is an (optionally capped) ordered key->value store, and
> that's made clear precisely nowhere in XEP-0060.


>> From a generic, non MAM-specific point-of-view, RSM is eventually used
>> to sync data, and for that you often want to know if the RSM metadata is
>> exact or not. My MAM example is just one illustration of that. It always
>> appeared like an afterthought that RSM does not allow the RSM data
>> originator to signal if the numbers are exact or not. The proposed
>> change tries to fix that.
> 
> I think the intention of RSM's vague numbers is that they were used
> for things like UI (progress bars, etc.) hints only.

I'm well aware why RSM allows inexact results. That is not what this PR
is about.

I simply think it is an oversight that RSM does not signal if the
results are exact or not. That is all I want to address. My suggestion
in the current PR does that in a backwards-compatible way. If an RSM
namespace bump were imminent, then we could do it right and make the
hint mandatory. But I don't think that one is pending, and I
believe the exact hint does not justify a namespace bump.

- Florian





Re: [Standards] Exact hint for Result Set Management

2018-07-12 Thread Ненахов Андрей
Thu, 12 Jul 2018 at 15:26, Matthew Wild :
> What you are describing - a local archive with multiple holes that the
> implementation is unaware of - is not a state that I see any such
> optimal correctly implemented MAM usage getting into.

Actually, it's a typical state if a server has server-side search
support. You might search for 'address', receive multiple results, jump
to three of the presented entries, fetch nearby history from the archive
for each, and end up with three patches of data with holes in between them.
Fighting such holes is not that easy (though somewhat possible) with the
current MAM. It could definitely use a better solution.
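A client that ends up with such patches can at least keep its holes known rather than unknown. The following is a minimal sketch under stated assumptions. Each patch is recorded by the archive IDs and timestamps of its first and last message, and patches do not overlap. `Patch` and `known_holes` are hypothetical names. Each returned pair bounds a range the client could then request from the archive.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Patch:
    """A contiguous run of archived messages the client has fetched.

    first_id/last_id are the archive IDs of its oldest and newest message;
    start/end are their timestamps, used here only to order the patches.
    """
    first_id: str
    last_id: str
    start: float
    end: float


def known_holes(patches: List[Patch]) -> List[Tuple[str, str]]:
    """Return (after_id, before_id) pairs bounding each gap between patches.

    Assumes patches do not overlap. A gap may turn out to be empty; only
    the server can confirm that. Ranges before the first patch and after
    the last one are not reported.
    """
    ordered = sorted(patches, key=lambda p: p.start)
    return [(older.last_id, newer.first_id)
            for older, newer in zip(ordered, ordered[1:])]
```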

-- 
Ненахов Андрей
Director, Redsolution LLC (Chelyabinsk)
(351) 750-50-04
http://www.redsolution.ru


Re: [Standards] Exact hint for Result Set Management

2018-07-12 Thread Florian Schmaus
On 12.07.2018 12:39, Kevin Smith wrote:
> On 12 Jul 2018, at 11:23, Matthew Wild  wrote:
>>
>> On 11 July 2018 at 18:25, Florian Schmaus  wrote:
>>> On 11.07.2018 18:01, Matthew Wild wrote:
>>>> On 11 July 2018 at 16:33, Florian Schmaus  wrote:
>>>>> I recently submitted PR #672 to the xeps repo
>>>>>
>>>>> https://github.com/xsf/xeps/pull/672
>>>>>
>>>>> to make users of RSM, like MAM, aware whether the result is exact or
>>>>> not. It received some scepticism from the council members in today's
>>>>> council meeting. I am to blame here as I thought the abstract motivation
>>>>> in the commit message was enough. It appears it wasn't.
>>>>>
>>>>> While I think multiple applications could exploit that information, my
>>>>> particular motivation was MAM. Consider the scenario where you have a
>>>>> master archive and a local archive. The local archive may have multiple
>>>>> holes at unknown locations. Now you want to sync your local archive from
>>>>> the master using MAM/RSM.
>>>>
>>>> I'm not keen on this solution for the premise you've given.
>>>>
>>>> I don't believe that when using MAM correctly you would ever end up
>>>> with "holes at unknown locations" in your local archive. I don't think
>>>> that encouraging people to use a "bisection algorithm" is the right
>>>> thing to do.
>>>
>>> So you don't want MAM users to be able to efficiently sync archives with
>>> multiple holes by a simple change because you do not want MAM to be used
>>> in scenarios where this could happen?
>>
>> Just adding this flag will not make servers implement it, so it's
>> going to add code and still need a fallback.
> 
> And, as specified (optional but with no default or meaning for a missing 
> flag) it seems unhelpful 

As Georg mentioned yesterday, the default is exact="maybe". There is
also a sentence explaining the semantic of the missing hint:

https://github.com/xsf/xeps/pull/672/files#diff-fd691aeb84210578723b940e9881ab7eR200
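Reading the hint with a "missing means maybe" default could look like the following sketch. Its details are hypothetical. It assumes the hint is an `exact` attribute on the XEP-0059 count element, taking the values true/false/maybe, and that may not match where PR #672 actually places it. `read_count` is an illustrative name.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

RSM_NS = "http://jabber.org/protocol/rsm"
VALID_HINTS = {"true", "false", "maybe"}


def read_count(set_xml: str) -> Tuple[Optional[int], str]:
    """Return (count, hint) from an RSM <set/> element.

    HYPOTHETICAL placement: the hint is read from an 'exact' attribute on
    <count/>. A missing or unrecognised hint is treated as "maybe".
    """
    root = ET.fromstring(set_xml)
    count_el = root.find(f"{{{RSM_NS}}}count")
    if count_el is None or not (count_el.text or "").strip().isdigit():
        return None, "maybe"  # no usable count, so nothing to be exact about
    hint = count_el.get("exact", "maybe")  # missing hint defaults to "maybe"
    if hint not in VALID_HINTS:
        hint = "maybe"  # treat unknown values like a missing hint
    return int(count_el.text.strip()), hint
```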

> and as it adds a SHOULD, in a Draft XEP, with no namespace bump or
> discovery, it’s adding ambiguity and confusion..

I don't think that this is true, but we certainly can talk about making
it just a recommendation if it is a blocker.

- Florian





Re: [Standards] Exact hint for Result Set Management

2018-07-12 Thread Kevin Smith
On 12 Jul 2018, at 11:23, Matthew Wild  wrote:
> 
> On 11 July 2018 at 18:25, Florian Schmaus  wrote:
>> On 11.07.2018 18:01, Matthew Wild wrote:
>>> On 11 July 2018 at 16:33, Florian Schmaus  wrote:
>>>> I recently submitted PR #672 to the xeps repo
>>>>
>>>> https://github.com/xsf/xeps/pull/672
>>>>
>>>> to make users of RSM, like MAM, aware whether the result is exact or
>>>> not. It received some scepticism from the council members in today's
>>>> council meeting. I am to blame here as I thought the abstract motivation
>>>> in the commit message was enough. It appears it wasn't.
>>>>
>>>> While I think multiple applications could exploit that information, my
>>>> particular motivation was MAM. Consider the scenario where you have a
>>>> master archive and a local archive. The local archive may have multiple
>>>> holes at unknown locations. Now you want to sync your local archive from
>>>> the master using MAM/RSM.
>>> 
>>> I'm not keen on this solution for the premise you've given.
>>> 
>>> I don't believe that when using MAM correctly you would ever end up
>>> with "holes at unknown locations" in your local archive. I don't think
>>> that encouraging people to use a "bisection algorithm" is the right
>>> thing to do.
>> 
>> So you don't want MAM users to be able to efficiently sync archives with
>> multiple holes by a simple change because you do not want MAM to be used
>> in scenarios where this could happen?
> 
> Just adding this flag will not make servers implement it, so it's
> going to add code and still need a fallback.

And, as specified (optional, but with no default or meaning for a missing flag),
it seems unhelpful, and as it adds a SHOULD in a Draft XEP with no namespace
bump or discovery, it’s adding ambiguity and confusion.

> What you are describing - a local archive with multiple holes that the
> implementation is unaware of - is not a state that I see any such
> optimal correctly implemented MAM usage getting into.

OTOH, a local archive with multiple *known* holes is easy to get into and we 
need to ensure this case is covered - but this doesn’t need this change to RSM.

> Therefore it's not a problem I want to solve, because it will only add
> to confusion about the best and easiest way to implement MAM.

+1.

>> From a generic, non MAM-specific point-of-view, RSM is eventually used
>> to sync data, and for that you often want to know if the RSM metadata is
>> exact or not. My MAM example is just one illustration of that. It always
>> appeared like an afterthought that RSM does not allow the RSM data
>> originator to signal if the numbers are exact or not. The proposed
>> change tries to fix that.
> 
> I think the intention of RSM's vague numbers is that they were used
> for things like UI (progress bars, etc.) hints only.
> 
> One of the reasons for this is that RSM is designed to work with
> dynamic result sets. For example you might request disco#items of a
> MUC server, but rooms will be added/removed while paging through the
> results. RSM is carefully designed so that you will never receive
> duplicates, but an item that was present when you started paging will
> not be included in the results if it was removed before you reached
> its page. That's why  is not accurate and not meant to be used
> for sync purposes.
> 
> MAM is a special case because normally[*] XEP-0313 explicitly forbids
> adding or removing items in the middle of the result set. This is not
> true of most other things that RSM would be used for (disco items,
> pubsub, etc.), and therefore I think this flag would basically only
> work for MAM. And then all my reasoning above therefore applies.

Different, but even in MAM you probably don’t want to count results accurately,
even though you could; you will probably be returning *very* approximate values
here to avoid flooring the archive server (whatever form it takes).

> [*] the server setting stable='false' is an exception here - an aspect
> of the XEP I'm not keen on, but it was deemed necessary for some
> environments. Sync is simply impossible with such a server.

You can sync with such a server, but only those results that have become stable
- the idea here is that if you have a clustered server doing something
eventually convergentish, you don’t want to refuse to answer MAM queries until
the archive results are perfectly synced, so you can answer with some unstable
results on the basis that that’s good enough for many use cases. Although this
is getting somewhat off-topic.

/K


Re: [Standards] Exact hint for Result Set Management

2018-07-12 Thread Matthew Wild
On 11 July 2018 at 18:25, Florian Schmaus  wrote:
> On 11.07.2018 18:01, Matthew Wild wrote:
>> On 11 July 2018 at 16:33, Florian Schmaus  wrote:
>>> I recently submitted PR #672 to the xeps repo
>>>
>>> https://github.com/xsf/xeps/pull/672
>>>
>>> to make users of RSM, like MAM, aware whether the result is exact or
>>> not. It received some scepticism from the council members in today's
>>> council meeting. I am to blame here as I thought the abstract motivation
>>> in the commit message was enough. It appears it wasn't.
>>>
>>> While I think multiple applications could exploit that information, my
>>> particular motivation was MAM. Consider the scenario where you have a
>>> master archive and a local archive. The local archive may have multiple
>>> holes at unknown locations. Now you want to sync your local archive from
>>> the master using MAM/RSM.
>>
>> I'm not keen on this solution for the premise you've given.
>>
>> I don't believe that when using MAM correctly you would ever end up
>> with "holes at unknown locations" in your local archive. I don't think
>> that encouraging people to use a "bisection algorithm" is the right
>> thing to do.
>
> So you don't want MAM users to be able to efficiently sync archives with
> multiple holes by a simple change because you do not want MAM to be used
> in scenarios where this could happen?

Just adding this flag will not make servers implement it, so it's
going to add code and still need a fallback.

I believe that ambiguity in how to implement protocols properly is a
big problem we have. XEPs tend to describe only the wire protocol, and
if the author of the XEP even had a data model in mind, it's rarely
(if ever) clear. For example it took me a long time to realise that
pubsub's model is an (optionally capped) ordered key->value store, and
that's made clear precisely nowhere in XEP-0060.
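To make the "(optionally capped) ordered key->value store" reading concrete, here is a toy model. It is a sketch, not XEP-0060 semantics. It assumes that republishing an existing item ID replaces its payload and makes it the newest item. It also assumes that, when a cap is set and exceeded, the oldest item is evicted. `CappedNode` and its methods are illustrative names.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class CappedNode:
    """Toy pubsub node: an optionally capped, ordered key->value store.

    Assumptions (not quoted from XEP-0060): publishing an existing item ID
    replaces the payload and moves the item to the newest position; with
    max_items set, the oldest items are evicted once the cap is exceeded.
    """

    def __init__(self, max_items: Optional[int] = None):
        self.max_items = max_items
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def publish(self, item_id: str, payload: str) -> None:
        self._items.pop(item_id, None)       # drop any old version first
        self._items[item_id] = payload       # (re)insert as the newest item
        if self.max_items is not None:
            while len(self._items) > self.max_items:
                self._items.popitem(last=False)  # evict the oldest item

    def retract(self, item_id: str) -> None:
        self._items.pop(item_id, None)

    def items(self) -> List[Tuple[str, str]]:
        return list(self._items.items())     # oldest first
```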

So for MAM I wanted to focus on the specific use-cases that the
protocol was designed for, and to present a clear way to correctly
implement them (I am well aware that the XEP does not reach this goal
currently).

What you are describing - a local archive with multiple holes that the
implementation is unaware of - is not a state that I see any such
optimal correctly implemented MAM usage getting into.

Therefore it's not a problem I want to solve, because it will only add
to confusion about the best and easiest way to implement MAM.

> Even if we lived in a world where such MAM archives are never going
> to happen, adding the exact hint to RSM is worthwhile.

I've no objection to RSM taking its own course, and I won't object to such
an enhancement if it's the right thing for RSM. I only object to doing it
solely because it is desirable for MAM.

> From a generic, non MAM-specific point-of-view, RSM is eventually used
> to sync data, and for that you often want to know if the RSM metadata is
> exact or not. My MAM example is just one illustration of that. It always
> appeared like an afterthought that RSM does not allow the RSM data
> originator to signal if the numbers are exact or not. The proposed
> change tries to fix that.

I think the intention of RSM's vague numbers is that they were used
for things like UI (progress bars, etc.) hints only.

One of the reasons for this is that RSM is designed to work with
dynamic result sets. For example you might request disco#items of a
MUC server, but rooms will be added/removed while paging through the
results. RSM is carefully designed so that you will never receive
duplicates, but an item that was present when you started paging will
not be included in the results if it was removed before you reached
its page. That's why  is not accurate and not meant to be used
for sync purposes.
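The dynamic-result-set behaviour described here can be simulated in a few lines. This is a simplified model, not RSM itself. It assumes the paging cursor is the last key the client saw and that keys sort in result-set order. `page` and the room names are hypothetical.

```python
from typing import List, Optional


def page(items: List[str], after: Optional[str], max_items: int) -> List[str]:
    """Return up to max_items keys that sort strictly after the cursor.

    Simplified cursor-based paging: the cursor is the last key the client
    saw (not an offset), so changes behind it cannot cause duplicates.
    """
    remaining = [k for k in sorted(items) if after is None or k > after]
    return remaining[:max_items]


rooms = ["alpha", "bravo", "charlie", "delta", "echo"]
initial_count = len(rooms)            # what a count taken at the start says

seen: List[str] = []
first = page(rooms, None, 2)          # ['alpha', 'bravo']
seen += first
rooms.remove("delta")                 # removed before its page was reached
rooms.append("aardvark")              # added behind the cursor, never returned
second = page(rooms, seen[-1], 10)    # ['charlie', 'echo']
seen += second

print(len(seen), initial_count)       # 4 5
```

The client never receives a duplicate, but it sees 4 rooms after being told there were 5. That is why a count taken at the start is a UI hint rather than a sync primitive.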

MAM is a special case because normally[*] XEP-0313 explicitly forbids
adding or removing items in the middle of the result set. This is not
true of most other things that RSM would be used for (disco items,
pubsub, etc.), and therefore I think this flag would basically only
work for MAM. And then all my reasoning above therefore applies.

Regards,
Matthew

[*] the server setting stable='false' is an exception here - an aspect
of the XEP I'm not keen on, but it was deemed necessary for some
environments. Sync is simply impossible with such a server.


Re: [Standards] Exact hint for Result Set Management

2018-07-11 Thread Florian Schmaus
On 11.07.2018 18:01, Matthew Wild wrote:
> On 11 July 2018 at 16:33, Florian Schmaus  wrote:
>> I recently submitted PR #672 to the xeps repo
>>
>> https://github.com/xsf/xeps/pull/672
>>
>> to make users of RSM, like MAM, aware whether the result is exact or
>> not. It received some scepticism from the council members in today's
>> council meeting. I am to blame here as I thought the abstract motivation
>> in the commit message was enough. It appears it wasn't.
>>
>> While I think multiple applications could exploit that information, my
>> particular motivation was MAM. Consider the scenario where you have a
>> master archive and a local archive. The local archive may have multiple
>> holes at unknown locations. Now you want to sync your local archive from
>> the master using MAM/RSM.
> 
> I'm not keen on this solution for the premise you've given.
> 
> I don't believe that when using MAM correctly you would ever end up
> with "holes at unknown locations" in your local archive. I don't think
> that encouraging people to use a "bisection algorithm" is the right
> thing to do.

So you don't want MAM users to be able to efficiently sync archives with
multiple holes by a simple change because you do not want MAM to be used
in scenarios where this could happen?

Even if we lived in a world where such MAM archives are never going
to happen, adding the exact hint to RSM is worthwhile.

From a generic, non MAM-specific point-of-view, RSM is eventually used
to sync data, and for that you often want to know if the RSM metadata is
exact or not. My MAM example is just one illustration of that. It always
appeared like an afterthought that RSM does not allow the RSM data
originator to signal if the numbers are exact or not. The proposed
change tries to fix that.

- Florian





Re: [Standards] Exact hint for Result Set Management

2018-07-11 Thread Matthew Wild
On 11 July 2018 at 16:33, Florian Schmaus  wrote:
> I recently submitted PR #672 to the xeps repo
>
> https://github.com/xsf/xeps/pull/672
>
> to make users of RSM, like MAM, aware whether the result is exact or
> not. It received some scepticism from the council members in today's
> council meeting. I am to blame here as I thought the abstract motivation
> in the commit message was enough. It appears it wasn't.
>
> While I think multiple applications could exploit that information, my
> particular motivation was MAM. Consider the scenario where you have a
> master archive and a local archive. The local archive may have multiple
> holes at unknown locations. Now you want to sync your local archive from
> the master using MAM/RSM.

I'm not keen on this solution for the premise you've given.

I don't believe that when using MAM correctly you would ever end up
with "holes at unknown locations" in your local archive. I don't think
that encouraging people to use a "bisection algorithm" is the right
thing to do.

If this is a problem you are facing, let's go back to the basics and
figure out how you end up with holes at unknown locations in your
archive. And we can fix that.

Regards,
Matthew


[Standards] Exact hint for Result Set Management

2018-07-11 Thread Florian Schmaus
I recently submitted PR #672 to the xeps repo

https://github.com/xsf/xeps/pull/672

to make users of RSM, like MAM, aware whether the result is exact or
not. It received some scepticism from the council members in today's
council meeting. I am to blame here as I thought the abstract motivation
in the commit message was enough. It appears it wasn't.

While I think multiple applications could exploit that information, my
particular motivation was MAM. Consider the scenario where you have a
master archive and a local archive. The local archive may have multiple
holes at unknown locations. Now you want to sync your local archive from
the master using MAM/RSM.

If you don't know whether or not the MAM RSM results are exact, you need
to resort to basically re-syncing the complete archive from the master. But
if you know that the results are exact, you could use count-only MAM RSM
queries (XEP-0059 § 2.7) and the unique-and-stable IDs of the archived
messages to effectively determine the holes and fill them up. You would do
this with a simple bisection algorithm that compares the message count the
master reports for a range with the number of messages the local archive
holds between the same unique-and-stable archive message IDs. This saves a
lot of round trips and, especially, transferred data when syncing the archive.
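A minimal sketch of such a bisection follows. It is not the algorithm from PR #672, and it rests on several assumptions. The master's counts are exact, archive IDs are unique and stable, and the local archive is a subset of the master. `remote_count` is a hypothetical stand-in for a count-only query bounded by two archive IDs, and `find_holes` and `master_count` are illustrative names. With approximate counts the equality test below would prove nothing.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a count-only query: returns the EXACT number of
# messages the master archive holds strictly between two archive IDs.
CountFn = Callable[[str, str], int]


def find_holes(local: List[str], remote_count: CountFn) -> List[Tuple[str, str]]:
    """Return (after_id, before_id) pairs bounding ranges the local archive lacks.

    `local` lists the locally stored archive IDs in archive order. Holes
    before local[0] or after local[-1] are not detected by this sketch.
    """
    holes: List[Tuple[str, str]] = []

    def check(lo: int, hi: int) -> None:
        local_between = hi - lo - 1                    # local msgs strictly between
        remote_between = remote_count(local[lo], local[hi])
        if remote_between == local_between:
            return                                     # range complete, nothing to fetch
        if local_between == 0:
            holes.append((local[lo], local[hi]))       # a hole: fetch this range
            return
        mid = (lo + hi) // 2                           # split at a local ID and recurse
        check(lo, mid)
        check(mid, hi)

    if len(local) >= 2:
        check(0, len(local) - 1)
    return holes


# Usage: a simulated master archive of 20 messages, 3 of them missing locally.
master = [f"m{i:02d}" for i in range(20)]
position = {msg_id: i for i, msg_id in enumerate(master)}


def master_count(after_id: str, before_id: str) -> int:
    return position[before_id] - position[after_id] - 1


local = [m for m in master if m not in {"m03", "m04", "m12"}]
print(find_holes(local, master_count))   # [('m02', 'm05'), ('m11', 'm13')]
```

Each `check` costs one count-only round trip that carries no message payloads. Whether this beats a full resync depends on the archive size and on how many holes there are.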

I would also like to point out that the changes in #672 are backwards
compatible. No namespace bump is required.

I'm off to barbecue, but I'm still looking forward to your feedback.

- Florian


