Re: [Standards] XEP-136 and XEP-59 implementation comments

Peter Saint-Andre Mon, 24 Mar 2008 19:08:07 -0700

Alexander Tsvyashchenko wrote:
> 
> Hello Peter,
> 
> Sorry for the delay in answering :-(


And my apologies for the further delay.

> I'll discuss here both your comments and changes to XEP-136 (v 0.15).
> 
> To keep the message short (well, sort of ;-) I don't comment on the
> changes you've already made to the XEP that I'm completely OK with, so I
> say "thanks for listening to my feedback!" for all of them here ;-)

Great. Let's come to agreement on the remaining issues so that we can
move this spec forward. :)

>>> Duplicate items
>>> ---------------
> 
> ... skipped ...
> 
>>> Proposal: change the "10. Replication" item by removing references to
>>> the <after> and <last> elements and stating that the replication start
>>> date should be specified using the "start" attribute of the "modified"
>>> command, with an additional note that collections whose change time is
>>> exactly equal to the "start" time are NOT included in the result (thus,
>>> "start" will effectively work as "after").
> 
> ... skipped ...
> 
>> +1 to that modification.
> 
> Looking at XEP-136 v0.15, I see you've added the "start" element, but it
> seems there's still one thing left to fix for the "duplicate items"
> issue: "9.4 Replication" says "The server MUST set the content of the
> <last/> element to the UTC time". This is exactly why the duplicate
> items issue happens, so I believe this phrase should be removed, so that
> the client treats the <last/> element as opaque.
> 
> It may be a good idea, though, to say instead that this ID should be
> persistent, so that the client can re-use it in the next query even if
> that query doesn't happen immediately after the previous one and even if
> the output has changed since then. This is quite easy from an
> implementation point of view (for example, I believe the current
> implementation in mod_archive_odbc already satisfies this even though it
> wasn't written with this requirement in mind), and implementations that
> are ready to neglect the possibility of "duplicate items" may just use
> UTC there, as it also satisfies this requirement.

Agreed.

My working copy now has:

"The XML character data of the <last/> element is a unique, persistent
identifier created by the server, which MUST be treated as opaque by the
client."
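To illustrate why a bare UTC value in <last/> produces duplicate (or missing) items, and what an opaque, persistent identifier buys us, here is a minimal Python sketch. It assumes a hypothetical server that keys each change by (modification time, internal collection ID) and packs that pair into the <last/> string; the Change type and the function names are made up for illustration, and the dateTime strings are assumed to share one UTC format so that they sort lexicographically.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    modified: str       # UTC dateTime, e.g. "2008-03-24T19:08:07Z" (one format everywhere)
    collection_id: int  # hypothetical server-internal collection key


def encode_cursor(change: Change) -> str:
    """Pack the full sort key into an opaque string for <last/>."""
    raw = json.dumps([change.modified, change.collection_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> tuple:
    """Server side only: the client never looks inside the cursor."""
    modified, collection_id = json.loads(base64.urlsafe_b64decode(cursor))
    return (modified, collection_id)


def changes_after(changes, cursor, limit):
    """Return one page of changes strictly after `cursor`, plus the new cursor."""
    ordered = sorted(changes, key=lambda c: (c.modified, c.collection_id))
    if cursor is not None:
        key = decode_cursor(cursor)
        # Compare on the whole key, so ties on the timestamp are not lost.
        ordered = [c for c in ordered if (c.modified, c.collection_id) > key]
    page = ordered[:limit]
    # An empty page keeps the old cursor, so it stays valid for later queries.
    new_cursor = encode_cursor(page[-1]) if page else cursor
    return page, new_cursor


# Three collections changed in the same second, fetched two at a time.
same_second = "2008-03-24T19:00:00Z"
changes = [Change(same_second, i) for i in (1, 2, 3)]
page1, c1 = changes_after(changes, None, 2)  # collections 1 and 2
page2, c2 = changes_after(changes, c1, 2)    # collection 3
```

With a timestamp-only cursor and exclusive "start", collection 3 would be skipped; with inclusive "start", collections 1 and 2 would come back twice. Because the cursor carries the whole sort key, it can also be reused later, after new changes have arrived.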

>>> XEP-59: detecting the change
>>> ============================
> 
> ... skipped ...
> 
>>> Proposal: add to the RSM result a "changed" tag which, when present,
>>> indicates the datetime of the most recent change of the items affected
>>> by the query. It typically shouldn't be that problematic to compute this
>>> value (it certainly wasn't for the XEP-136 implementation), and it can be
>>> made optional, as is done with "index", in case it's hard to calculate.
>>
>> Can this be handled via the 'version' attribute in archiving?
> 
> Well, for my initial idea of a client-side implementation, which
> included indexing collections on the client side, I think this isn't
> enough. The 'version' attribute does cover changes to a single
> collection, but not the more general case: there's still no way for the
> client to learn that something changed other than by performing
> replication. That can be quite costly, because replication then has to
> be performed after every request to make sure nothing changed during the
> request; otherwise the local cache may be filled with inconsistent info.
> 
> However, I now tend to think that supporting local collection indexing
> was not such a good idea anyway, as its implementation is quite complex
> and fragile, so I'm not sure this is a real issue for me personally
> anymore: if collections aren't indexed locally, there's no need to keep
> them consistent.

Yes, I think you're right about the complexity and fragility of local
collection indexing.

> But the <changed/> element might still be useful if somebody else
> decides to go that route, and also for other cases, such as the one you
> described with searching.
> 
> However, if you agree to proceed with some kind of "change notification"
> item for RSM, I think my original <changed/> element proposal should be
> upgraded to a versioning-like scheme: instead of a UTC time, it should
> contain just an opaque integer that increases whenever items in the
> requested range change, though not necessarily by +1 for each change.
> That way it's still possible to use UTC by converting it to an integer
> first, or to use any more reliable versioning method where applicable.

You mean like XEP-0237? :-)
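As a rough illustration of that versioning-like scheme, here is a minimal Python sketch. RangeVersion and cache_is_stale are hypothetical names, not anything defined in XEP-0059 or XEP-0136; the sketch derives the value from UTC time when it can and falls back to +1, so the value never repeats or goes backwards.

```python
import time


class RangeVersion:
    """Hypothetical opaque version for the items covered by one RSM query."""

    def __init__(self) -> None:
        self._value = 0

    def bump(self, now_us=None) -> int:
        """Record a change; the value grows, but not necessarily by +1."""
        if now_us is None:
            now_us = time.time_ns() // 1000  # wall clock (UTC epoch) in microseconds
        # Prefer the clock value, but never go backwards or repeat a value,
        # e.g. if the clock is adjusted or two changes land in the same tick.
        self._value = max(self._value + 1, now_us)
        return self._value

    @property
    def value(self) -> int:
        return self._value


def cache_is_stale(cached: int, current: int) -> bool:
    """Client side: the value is opaque, so only 'is it different?' matters."""
    return cached != current
```

The client only stores the last value it saw and compares it for equality; it never interprets the number, which leaves the server free to use a clock, a counter, or both.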

>>> Resource modification when auto archiving
> 
> ... skipped ...
> 
>>> Of course, one possibility here would be to just drop all resources
>>> from JIDs and store only bare JIDs, but that seems to be too limiting
>>> and inconvenient.
>>
>> I'm not so sure.
> 
> Well, I don't have that much experience with using multiple resources,
> but it seems to me that dropping the resource altogether is a bad idea
> for at least several reasons:
> 
> 1. It increases the possibility of collection collisions. Because in
> XEP-136 each collection has to be uniquely identified by "with" and
> "start", stripping the resource raises the probability of two
> collections colliding on these attributes. While it's highly unlikely
> that I'll have two different conversations with the same person, on the
> same client, started at the same time, it's more likely if resources are
> not used, since the conversations could then come from two or more
> different clients.

That's true for IM use cases between human users, but no one ever said
that XMPP was limited to human users.

> 2. I believe the resource may be an important part of the information
> about a conversation in some cases, i.e. a kind of "where exactly did
> this conversation happen?"

Agreed.

>> There is also the case of sending a message to the bare JID and the
>> receiving server sends that message to all resources. Then the recipient
>> could reply from multiple resources, thus starting multiple
>> conversations! I'm not sure how to handle that. Probably it's best to
>> save each conversation separately but each conversation / collection has
>> the same start message (however they might have different threads).
> 
> Hm, in fact that seems to me to be quite a complex case 

Yes it is, and rather messy, too.

> and, most likely,
> I don't have enough knowledge to judge what the best option is here. So
> everything written below is just some (more or less) random thoughts ...
> 
> From the client side, if <thread/> elements are used, one possibility
> seems to be to use the "parent" attribute according to XEP-201 in all
> child conversations, pointing to the "root" <thread> element of the
> original message, and to use a different <thread> element for each
> conversation.
> 
> For XEP-136 this could be mapped to storing all these conversations
> separately in different collections, storing the original message in its
> own collection, and putting links from all child collections to this
> parent collection (then probably there has to be a "parent" element for
> collection linking besides "prev/next"?)

Heh, that's creative. It might work best.

> Another possibility seems to be to treat these conversations as a kind
> of special "group chat" ;-) Then everything just has to be stored in a
> single collection, but to differentiate between the parties some
> attribute should probably be added to messages, similar to "name" for
> groupchats.

I like that less.

> To be honest, I think this issue is out of the scope of XEP-136: I
> would expect <thread> behavior either to be specified by XEP-201 (or
> somewhere else) or to be left implementation-defined; on the other hand,
> XEP-136 should probably just take <thread> values into account and use
> them for its own business as described above.

Agreed. So perhaps we need to say how threads are used in archiving. I
see that we've left that out so far.

> So, it looks to me like the following could be specified in XEP-136:
> 
> 1) If no <thread> element exists, the server may use its own
> implementation-defined strategies for mapping messages and conversations
> to collections, and may also treat resources in an implementation-defined
> way.
> 
> Maybe some heuristic can be suggested, such as the one I described in my
> first letter for "conversation tracking", but I doubt anything 100%
> reliable can be proposed.
> 
> 2) If a <thread> element is present, the mapping is exactly 1 <-> 1 (one
> thread element to one collection). If the "parent" attribute is present
> for the thread, a link of type "parent" should be created to the
> appropriate collection.

That seems reasonable.
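Rule 2 plus the "parent" link could look roughly like the following Python sketch. It assumes the server sees each message's <thread/> value and its optional XEP-201 "parent" attribute; the classes and the "parent" link on collections are hypothetical, since XEP-136 doesn't define such a link yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Collection:
    thread: str                   # one <thread/> value <-> one collection
    with_jid: str
    start: str
    parent: Optional[str] = None  # hypothetical "parent" link (thread of the parent)
    messages: List[str] = field(default_factory=list)


class ThreadArchive:
    """Sketch of rule 2: exactly one collection per <thread/> value."""

    def __init__(self) -> None:
        self.by_thread: Dict[str, Collection] = {}

    def store(self, thread: str, with_jid: str, start: str, body: str,
              parent: Optional[str] = None) -> Collection:
        coll = self.by_thread.get(thread)
        if coll is None:
            # First message in this thread: open a new collection and
            # remember the parent thread, if the message carried one.
            coll = Collection(thread, with_jid, start, parent)
            self.by_thread[thread] = coll
        coll.messages.append(body)
        return coll

    def children(self, thread: str) -> List[Collection]:
        """Collections linked to `thread` through a "parent" link."""
        return [c for c in self.by_thread.values() if c.parent == thread]
```

In the bare-JID broadcast case, the original message lands in the "root" collection and each reply thread gets its own collection pointing back at it, so the replies never have to share one collection.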

> Resources can be treated as follows: when receiving the first message
> with a full JID, it's allowed to overwrite the collection's previous
> bare JID with the new full JID; if the previous JID was already full and
> the new one is also full but differs from it, assume that we have the
> "multi-resource" case, change the collection's JID to the bare one, and
> forbid all further overwrites.

That, too, seems reasonable.
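The resource rule amounts to a tiny state machine. Here is a minimal Python sketch, assuming JIDs arrive as plain strings (no stringprep normalization) and that the caller has already matched the message to the collection; CollectionJid and its methods are made-up names.

```python
def bare(jid: str) -> str:
    """Strip the resource: everything after the first '/' is the resource."""
    return jid.split("/", 1)[0]


def is_full(jid: str) -> bool:
    return "/" in jid


class CollectionJid:
    """Hypothetical tracker for a collection's 'with' JID under the rule above."""

    def __init__(self, jid: str) -> None:
        self.jid = jid
        self.locked = False  # set once the multi-resource case is detected

    def observe(self, jid: str) -> str:
        if bare(jid) != bare(self.jid):
            raise ValueError("message does not belong to this collection")
        if self.locked or not is_full(jid):
            return self.jid            # locked, or nothing to learn from a bare JID
        if not is_full(self.jid):
            self.jid = jid             # bare -> full: overwrite allowed
        elif jid != self.jid:
            self.jid = bare(self.jid)  # a second, different resource showed up
            self.locked = True         # forbid all further overwrites
        return self.jid
```

For example, a collection opened for juliet@example.com becomes juliet@example.com/balcony on the first full-JID message, and falls back to the bare JID for good once juliet@example.com/chamber replies too.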

>>> Duplicate messages times
>>> ------------------------
>>>
>>> In "5.3 Uploading Messages to a Collection" it's specified that "If the
>>> collection already exists then the server MUST append the messages to
>>> the existing collection." However, it's not said what should be done if
>>> the time of some of the messages is equal to the time of messages
>>> already existing in the collection.
>>>
>>> I assume from the "append the messages" clause that duplicate entities
>>> should be created, but it would be good to mention this to avoid
>>> ambiguity.
>>
>> By "duplicate entities" do you mean <from/> or <to/> elements with the
>> same dateTime?
> 
> Yes. As I said, I think the required behavior here is more or less
> deducible, but it could be useful to clarify it in the spec, as at least
> I had some doubts when thinking about it. Maybe it's just me, though ;-)

OK, I'll add a note about that.
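The "append" reading Alexander describes, where entries are never merged or replaced even when their times coincide, could be sketched in Python like this. The Entry type is a hypothetical simplification of the <from/> and <to/> elements (one absolute UTC time per entry), not the XEP-136 wire format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    direction: str  # "from" or "to", mirroring <from/> and <to/>
    utc: str        # simplified: an absolute UTC dateTime string
    body: str


def append_messages(collection: List[Entry], uploaded: List[Entry]) -> List[Entry]:
    """Append uploaded entries, never replacing existing ones.

    Entries with the same dateTime as an existing entry are kept as
    separate items; the stable sort keeps them in upload order.
    """
    collection.extend(uploaded)
    collection.sort(key=lambda e: e.utc)
    return collection
```

Under this reading, uploading the same message twice yields two entries; a server that wanted to deduplicate would need an explicit rule in the spec, which is exactly the ambiguity the note should resolve.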

>>> List collections for Bare JID / Domain
>>> --------------------------------------
>>>
>>> There seems to be no way to list collections solely for a service JID,
>>> as according to XEP-136 it's treated as a domain JID request.
>>>
>>> For example, when trying to list all collections for icq.example.com,
>>> you will instead get all collections of all users at icq.example.com,
>>> even if you wanted to receive collections ONLY for icq.example.com.
>>>
>>> I do not think this is a major problem, as it can be filtered out on the
>>> client side; the only drawback is a high amount of extra traffic. So it
>>> can probably be left as it is, but adding a note about this in the
>>> specification would be nice.
>>
>> Hmm. That's the matching process we use in Multi-User Chat (XEP-0045)
>> and Privacy Lists (XEP-0016) and so on. I don't see this as a big
>> problem (you don't really chat with services directly), but if we find
>> out that it causes problems in reality we can fix it later.
> 
> Well, in fact I think I've already found one case where this is a
> problem, not only for listing collections but also for removing them and
> for storing preferences; see my message:
> 
> http://mail.jabber.org/pipermail/standards/2007-November/017205.html
> 
> Basically, the current approach means we have no real control over
> messages with bare/domain JIDs: I can neither delete messages from/to the
> icq.example.com transport nor forbid auto-archiving them without
> affecting all messages to all ICQ users.

Yes, but do you exchange messages with icq.example.com? We have the same
problem in MUC rooms -- you can't block all users at example.com from
joining the room without at the same time blocking example.com itself
from joining the room. Is that a big problem? I don't think so. We use
the same matching method in MUC, privacy lists, and now also message
archiving. If we want to fix that, I suggest that we fix it everywhere.
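For reference, here is a simplified Python sketch of the matching behavior under discussion, plus the client-side filtering Alexander originally suggested. It is deliberately not the full XEP-0016 matching algorithm (for instance, it does no JID normalization), and the function names are hypothetical.

```python
from typing import List, Tuple


def jid_parts(jid: str) -> Tuple[str, str, str]:
    """Split node@domain/resource; the resource may itself contain '@' or '/'."""
    bare, _, resource = jid.partition("/")
    if "@" in bare:
        node, domain = bare.split("@", 1)
    else:
        node, domain = "", bare
    return node, domain, resource


def matches(pattern: str, jid: str) -> bool:
    """Simplified matching: a domain-only pattern covers everyone at that domain."""
    p_node, p_domain, p_res = jid_parts(pattern)
    node, domain, res = jid_parts(jid)
    if p_res:
        return (p_node, p_domain, p_res) == (node, domain, res)
    if p_node:
        return (p_node, p_domain) == (node, domain)
    return p_domain == domain


def service_only(with_jids: List[str], service: str) -> List[str]:
    """Client-side filter: keep only collections with the service itself."""
    return [j for j in with_jids if jid_parts(j)[:2] == ("", service)]
```

A request for icq.example.com therefore matches every ICQ contact as well as the transport itself; the client can narrow the result afterwards, at the cost of the extra traffic Alexander mentioned.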

Peter

-- 
Peter Saint-Andre
https://stpeter.im/
