Alexander Tsvyashchenko wrote:
>
> Hello Peter,
>
> Sorry for delay with the answer :-(
And my apologies for the further delay.

> I'll discuss here both your comments and changes to XEP-136 (v 0.15).
>
> To keep the message short (well, a kind of ;-) I do not comment those
> changes you've made already to XEP which I'm completely OK with, so I
> say "thanks for listening to my feedback!" for all of them here ;-)

Great. Let's come to agreement on the remaining issues so that we can
move this spec forward. :)

>>> Duplicate items
>>> ---------------
>
> ... skipped ...
>
>>> Proposal: change "10. Replication" item by removing references to
>>> <after> and <last> element and stating that start replication date
>>> should be specified using "start" attribute of "modified" command with
>>> additional note that the collections with changed time exactly equal
>>> to "start" time are NOT included in the result (thus, "start" will
>>> effectively work as "after").
>
> ... skipped ...
>
>> +1 to that modification.
>
> Looking at XEP-136 v 0.15 I see you've added "start" element, but it
> seems that there's still one thing that remains for fixing "duplicate
> items" issue:
> i.e. "9.4 Replication" -> "The server MUST set the content of the
> <last/> element to the UTC time". This is exactly the reason why
> duplicate items issue happens, so I believe this phrase should be
> removed, so that client treats <last/> element as opaque.
>
> It may be a good idea though to say instead that this ID should be
> persistent, so that client can re-use it upon the next query even if it
> happens not immediately after the previous one and even if there are
> some changes to the output since then - this is quite easy from
> implementation point of view (for example current implementation in
> mod_archive_odbc should satisfy that even without being written with
> this requirement in mind, I believe), and those implementations that are
> ready to neglect the possibility of "duplicate items" may just use UTC
> there, as it also satisfies this requirement.

Agreed.
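The duplicate-items problem comes from using a bare UTC time as the replication cursor: when several collections share a modification time and a page boundary falls between them, a "strictly after" cursor skips the rest of them and an "at or after" cursor repeats them. The Python sketch below illustrates one way an opaque, persistent <last/> value could avoid that, assuming each collection has a stable unique id that can break ties on time. It is not taken from XEP-0136 or from mod_archive_odbc, and all names (Collection, encode_last, decode_last, replicate_page) are hypothetical.

```python
import base64
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    cid: int       # hypothetical stable, unique collection identifier
    modified: int  # last-modification time, e.g. UTC seconds since the epoch


def encode_last(c: Collection) -> str:
    """Build the <last/> value for a page ending at collection c.

    Base64 only hides the structure; opacity is a contract with the
    client, not a security measure."""
    raw = f"{c.modified}:{c.cid}".encode("ascii")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_last(token: str) -> tuple[int, int]:
    """Recover the (modified, cid) position encoded by encode_last."""
    text = base64.urlsafe_b64decode(token.encode("ascii")).decode("ascii")
    modified, cid = text.split(":")
    return int(modified), int(cid)


def replicate_page(collections, after=None, max_items=2):
    """Return (page, last): up to max_items collections ordered after `after`.

    Ordering by (modified, cid) is strict, so collections sharing a
    modification time are neither skipped nor repeated across pages."""
    ordered = sorted(collections, key=lambda c: (c.modified, c.cid))
    if after is not None:
        position = decode_last(after)
        ordered = [c for c in ordered if (c.modified, c.cid) > position]
    page = ordered[:max_items]
    # An empty page hands the same cursor back so the client can retry later.
    last = encode_last(page[-1]) if page else after
    return page, last
```

With four collections where three share modified=100 and max_items=2, the first call returns ids 1 and 2, and passing its `last` back returns ids 3 and 4; a bare timestamp cursor with "strictly after" semantics would have skipped id 3. Because the token encodes a position rather than a page index, it stays usable for a later query even if collections were modified in between: a modified collection simply moves further along the order and is returned again.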
My working copy now has: "The XML character data of the <last/> element
is a unique, persistent identifier created by the server, which MUST be
treated as opaque by the client."

>>> XEP-59: detecting the change
>>> ============================
>
> ... skipped ...
>
>>> Proposal: add to RSM result the tag "changed", which, when present,
>>> indicates the datetime of the most recent change of the items affected
>>> by the query. It typically shouldn't be that problematic to compute
>>> this value (certainly it wasn't for XEP-136 implementation), and it can
>>> be made optional, as it is done with "index" if in some cases it's hard
>>> to calculate it.
>>
>> Can this be handled via the 'version' attribute in archiving?
>
> Well, for my initial idea of client-side implementation, which included
> collections indexing at client side, this seems to be not enough, I
> think. 'version' attribute indeed covers issues with single collection
> change, but that's not enough for more general case, as there's still no
> way for the client to know that some changes happened other than by
> performing replication, and this can be quite costly, as then
> replication has to be performed after every request to make sure things
> didn't change during the request, otherwise it's possible that local
> cache is filled with inconsistent info.
>
> However, now I tend to think that supporting local collections indexing
> was not such a good idea anyway as its implementation is quite complex
> and fragile, so I'm not sure that for me personally this is a real issue
> anymore, because if collections shouldn't be indexed locally there's no
> need to keep them consistent.

Yes I think you're right about the complexity and fragility of local
collections indexing.

> But <changed/> element might still be useful if somebody else decides to
> go that route, and also for other cases such as you've described with
> searching.
>
> However, if you agree to proceed with some kind of "change notification"
> item for RSM, I think that my original <changed/> element proposal
> should be upgraded to versioning-like scheme: so, instead of including
> UTC, it should include just opaque integer which is increased when items
> in requesting range are changed, but not necessarily by +1 for each
> change. In this way it's still possible to include UTC by converting it
> to integer first, or use any more reliable way of versioning if that's
> applicable.

You mean like XEP-0237? :-)

>>> Resource modification when auto archiving
>
> ... skipped ...
>
>>> Of course, the possibility here would be to just drop all resources
>>> from JIDs and store only bare JIDs, but that seems to be too limiting
>>> and inconvenient.
>>
>> I'm not so sure.
>
> Well, I do not have that much experience with multiple resources usage,
> but for me it seems that dropping resource altogether looks like a bad
> idea due to at least several reasons:
>
> 1. By that we increase possibility of collections collisions. As due to
> XEP-136 each collection has to be uniquely identified by "with" and
> "start" if we strip resource there's higher probability of two
> collections colliding by these attributes; while it's highly unlikely
> I'm going to have two different conversations with the same person under
> the same client started at the same time, it's more likely to happen if
> resources are not used, so in fact these could be two or more different
> clients.

That's true for IM use cases between human users, but no one ever said
that XMPP was limited to human users.

> 2. I believe that resource may be an important part of information about
> conversation in some cases, i.e. a kind of "where exactly did this
> conversation happen?"

Agreed.

>> There is also the case of sending a message to the bare JID and the
>> receiving server sends that message to all resources. Then the recipient
>> could reply from multiple resources, thus starting multiple
>> conversations! I'm not sure how to handle that. Probably it's best to
>> save each conversation separately but each conversation / collection has
>> the same start message (however they might have different threads).
>
> Hm, in fact that seems for me to be quite complex case

Yes it is, and rather messy, too.

> and, most likely,
> I have not enough knowledge to judge what is the best option here. So
> everything written below are just some random (more or less) thoughts ...
>
> From the client side, if <thread/> elements are used, one possibility,
> it seems, is to use "parent" attribute according to XEP-201 in all
> children conversations pointing to the "root" <thread> element of the
> original message and use different <thread> elements for each
> conversation.
>
> For XEP-136 this can be mapped to storing all these conversations
> separately in different collections, store original message in its own
> collection and put links from all children collections to this parent
> collection (probably there has to be "parent" element in collections
> linking besides "prev/next" then?)

Heh, that's creative. It might work best.

> Other possibility seems to be to treat these conversations as a kind of
> special "group chat" ;-) Then everything just has to be stored into
> single collection, but for differentiating between different parties
> probably some attribute to messages should be added, similar to "name"
> for groupchats.

I like that less.

> To be true, I think that this issue is out of the scope of XEP-136 - I
> would expect that <thread> behavior either should be specified by
> XEP-201 (or somewhere else) or left as implementation-defined; on the
> other hand, XEP-136, probably, should just take into account <thread>
> values and use them for its business like described above.

Agreed. So perhaps we need to say how threads are used in archiving. I
see that we've left that out so far.
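Alexander's first option above (a separate collection per <thread/> value, with each child collection linked to the collection holding the original message through the XEP-0201 "parent" attribute) could look roughly like this in Python. It is a sketch under assumptions: XEP-0136 at this point defines no "parent" collection link (the message raises that as an open question), the ThreadCollection and Archive names are hypothetical, and a dict stands in for real server storage.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThreadCollection:
    thread: str                     # <thread/> value; one collection per thread
    parent: Optional[str] = None    # thread id of the parent collection, if any
    messages: list = field(default_factory=list)


class Archive:
    """In-memory stand-in for the server's collection store (hypothetical)."""

    def __init__(self) -> None:
        self.by_thread: dict[str, ThreadCollection] = {}

    def store(self, thread: str, body: str,
              parent: Optional[str] = None) -> ThreadCollection:
        coll = self.by_thread.get(thread)
        if coll is None:
            # First message seen for this thread: open its own collection and
            # record the parent link (XEP-0201 'parent' attribute) if given.
            coll = ThreadCollection(thread=thread, parent=parent)
            self.by_thread[thread] = coll
        coll.messages.append(body)
        return coll

    def children(self, thread: str) -> list[str]:
        """Threads whose collections link back to `thread` as their parent."""
        return [c.thread for c in self.by_thread.values() if c.parent == thread]
```

For a message sent to a bare JID with thread "root" and answered from two resources under threads "r1" and "r2" (each carrying parent="root"), every reply keeps its own collection, and children("root") returns both reply threads, which is the link a client would follow to reassemble the fan-out.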
> So, for me it looks like the following could be specified in XEP-136:
>
> 1) If no <thread> element exists, server may use its own
> implementation-defined strategies for mapping messages and conversations
> to collections and also may treat resources in implementation-defined
> way.
>
> Maybe some heuristic can be suggested such as the one I described in my
> first letter for "conversations tracking", but I doubt anything 100%
> reliable can be proposed.
>
> 2) If <thread> element is present, the mapping is exactly 1 <-> 1 (one
> thread element to one collection). If "parent" attribute is present for
> thread - the link should be created of type "parent" to the appropriate
> collection.

That seems reasonable.

> Resources can be treated as follows: when receiving first message with
> full JID it's allowed to overwrite previous bare JID of collection by
> new, full JID; if previous JID was already full and the new one is also
> full, and differs from the previous one - assume that we have
> "multi-resource" case, modify collection's JID to bare one and forbid
> all its further overwrites.

That, too, seems reasonable.

>>> Duplicate messages times
>>> ------------------------
>>>
>>> In "5.3 Uploading Messages to a Collection" it's specified that "If the
>>> collection already exists then the server MUST append the messages to
>>> the existing collection." However, it's not said what should be done if
>>> time for some of the messages is equal to time of those messages
>>> existing already in collection.
>>>
>>> I assume that from "append the messages" clause it follows that
>>> duplicate entities should be created, but it could be good to mention
>>> to avoid ambiguities.
>>
>> By "duplicate entities" do you mean <from/> or <to/> elements with the
>> same dateTime?
>
> Yes. As I said I think it's more or less deducible what the required
> behavior here is, but probably it can be useful to clarify it in specs,
> as at least I had some doubts thinking about it. Maybe it's just me,
> though ;-)

OK, I'll add a note about that.

>>> List collections for Bare JID / Domain
>>> --------------------------------------
>>>
>>> There seems to be no way to list collections solely for service JID,
>>> as according to XEP-136 it's treated as domain JID request.
>>>
>>> For example, when trying to list all collections for icq.example.com
>>> you will get instead all collections of all users at icq.example.com
>>> - even if you wanted to receive collections ONLY for icq.example.com
>>>
>>> I do not think this is major problem, as it can be filtered out on
>>> client side - the only drawback is high amount of extra traffic, so,
>>> probably, it can be left as it is, but adding some notice in
>>> specification on that subject could be nice.
>>
>> Hmm. That's the matching process we use in Multi-User Chat (XEP-0045)
>> and Privacy Lists (XEP-0016) and so on. I don't see this as a big
>> problem (you don't really chat with services directly), but if we find
>> out that it causes problems in reality we can fix it later.
>
> Well, in fact I think I've found already one case when this is a
> problem, not only for collections listing, but also for their removal
> and for preferences storing, see my message:
>
> http://mail.jabber.org/pipermail/standards/2007-November/017205.html
>
> Basically, current approach means we have no real control over the
> messages with bare/domain JIDs: so I can neither delete messages from/to
> icq.example.com transport, nor forbid auto-archiving them without
> affecting all messages to all ICQ users.

Yes, but do you exchange messages with icq.example.com? We have the same
problem in MUC rooms -- you can't block all users at example.com from
joining the room without at the same time blocking example.com itself
from joining the room. Is that a big problem? I don't think so. We use
the same matching method in MUC, privacy lists, and now also message
archiving.
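To see why a request for icq.example.com also pulls in every user at that transport, here is a simplified Python sketch of the JID matching order defined by Privacy Lists (XEP-0016), which the message says XEP-0136 reuses: a <domain> item matches the domain itself and any JID at that domain. The function names are hypothetical; JID preparation (case folding, stringprep) is deliberately omitted; and treating a <domain/resource> item as matching only JIDs without a localpart is an interpretation of the XEP-0016 wording, not a quotation of it.

```python
from typing import NamedTuple, Optional


class JID(NamedTuple):
    local: Optional[str]
    domain: str
    resource: Optional[str]


def parse_jid(s: str) -> JID:
    """Split 'local@domain/resource' (no validation or normalization)."""
    bare, _, resource = s.partition("/")        # resource starts at first '/'
    local, at, domain = bare.rpartition("@")    # no '@' -> domain-only JID
    return JID(local if at else None, domain, resource or None)


def matches(item: str, target: str) -> bool:
    """Does a rule (or query) for `item` apply to the JID `target`?"""
    i, t = parse_jid(item), parse_jid(target)
    if i.domain != t.domain:
        return False
    if i.local is None and i.resource is None:
        # <domain>: the domain itself matches, as does any JID at that domain.
        return True
    if i.local != t.local:
        # <user@domain...> needs the same localpart; <domain/resource>
        # (local is None) only matches targets that also have no localpart.
        return False
    # <user@domain> matches any resource; otherwise the resource must agree.
    return i.resource is None or i.resource == t.resource
```

Under this rule matches("icq.example.com", "12345@icq.example.com") is true, so deleting or excluding "icq.example.com" necessarily reaches every ICQ contact as well; distinguishing the transport itself would need a different, exact-match mode.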
If we want to fix that, I suggest that we fix it everywhere.

Peter

--
Peter Saint-Andre
https://stpeter.im/
