Hello Peter,
Sorry for the delay in answering :-(
I'll discuss here both your comments and changes to XEP-136 (v 0.15).
To keep the message short (well, kind of ;-) I won't comment on the
changes you've already made to the XEP that I'm completely OK with, so
let me say "thanks for listening to my feedback!" for all of them here ;-)
Duplicate items
---------------
... skipped ...
Proposal: change the "10. Replication" item by removing references to the
<after> and <last> elements and stating that the replication start date
should be specified using the "start" attribute of the "modified" command,
with an additional note that collections whose change time is exactly
equal to the "start" time are NOT included in the result (thus, "start"
will effectively work as "after").
... skipped ...
+1 to that modification.
Looking at XEP-136 v 0.15 I see you've added the "start" element, but it
seems there's still one thing left to fix for the "duplicate items"
issue: "9.4 Replication" -> "The server MUST set the content of the
<last/> element to the UTC time". This is exactly why the duplicate
items issue happens, so I believe this phrase should be removed, so
that the client treats the <last/> element as opaque.
It may be a good idea, though, to say instead that this ID should be
persistent, so that the client can re-use it in the next query even if
that query does not immediately follow the previous one and even if the
output has changed since then. This is quite easy from an
implementation point of view (for example, I believe the current
implementation in mod_archive_odbc satisfies it even though it was not
written with this requirement in mind), and implementations that
are ready to neglect the possibility of "duplicate items" may just use
UTC there, as it also satisfies this requirement.
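A rough Python sketch of the difference between a timestamp and an opaque, persistent ID in <last/>, under my own assumptions: the `ChangeLog` class, its `seq` counter and both `page_after*` methods are hypothetical names invented for this example and are not taken from XEP-136 or mod_archive_odbc.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ChangeLog:
    """Hypothetical server-side list of modified collections."""
    # Each entry is (seq, changed_utc, collection_id); seq is a persistent counter.
    entries: List[Tuple[int, str, str]] = field(default_factory=list)
    next_seq: int = 1

    def record(self, changed_utc: str, collection_id: str) -> None:
        self.entries.append((self.next_seq, changed_utc, collection_id))
        self.next_seq += 1

    def page_after(self, token: Optional[str], limit: int):
        """Page by opaque token: return ids strictly after `token` and the new token."""
        after = int(token) if token is not None else 0
        page = [e for e in self.entries if e[0] > after][:limit]
        last = str(page[-1][0]) if page else token
        return [e[2] for e in page], last

    def page_after_time(self, start_utc: Optional[str], limit: int):
        """Page by timestamp with an exclusive 'start', for contrast."""
        page = [e for e in self.entries
                if start_utc is None or e[1] > start_utc][:limit]
        last = page[-1][1] if page else start_utc
        return [e[2] for e in page], last


log = ChangeLog()
# Three collections changed within the same second.
for cid in ("a", "b", "c"):
    log.record("2007-11-20T10:00:00Z", cid)

first, tok = log.page_after(None, 2)            # ids "a" and "b"
second, tok = log.page_after(tok, 2)            # id "c"

t_first, t_tok = log.page_after_time(None, 2)   # ids "a" and "b"
t_second, _ = log.page_after_time(t_tok, 2)     # empty: "c" is never returned
```

In this sketch the timestamp variant loses "c" because it shares its change time with the last item of the first page; with an inclusive comparison "a" and "b" would be returned a second time instead. The token variant has neither problem, and the client never needs to interpret the token.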
XEP-59: detecting the change
============================
... skipped ...
Proposal: add to the RSM result a "changed" tag which, when present,
indicates the datetime of the most recent change to the items affected by
the query. It typically shouldn't be that problematic to compute this value
(certainly it wasn't for the XEP-136 implementation), and it can be made
optional, as is done with "index", if in some cases it's hard to calculate.
Can this be handled via the 'version' attribute in archiving?
Well, for my initial idea of a client-side implementation, which
included indexing collections on the client side, this doesn't seem to
be enough, I think. The 'version' attribute indeed covers changes to a
single collection, but that's not enough for the more general case:
there's still no way for the client to know that some changes have
happened other than by performing replication, and this can be quite
costly, as replication then has to be performed after every request to
make sure things didn't change during the request; otherwise the local
cache may be filled with inconsistent info.
However, now I tend to think that supporting local indexing of
collections was not such a good idea anyway, as its implementation is
quite complex and fragile. So I'm not sure this is a real issue for me
personally anymore: if collections aren't indexed locally, there's no
need to keep them consistent.
But the <changed/> element might still be useful if somebody else
decides to go that route, and also for other cases such as the one
you've described with searching.
However, if you agree to proceed with some kind of "change
notification" item for RSM, I think my original <changed/>
element proposal should be upgraded to a versioning-like scheme:
instead of UTC, it should contain just an opaque integer which
is increased whenever items in the requested range change, but not
necessarily by +1 for each change. This way it's still possible to
use UTC by converting it to an integer first, or to use any more
reliable way of versioning if that's applicable.
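A minimal Python sketch of such a versioning-like value, just to show that both a plain counter and a UTC-derived integer fit the "only has to grow" rule. The `ChangeVersions` class and its `touch` and `changed` methods are hypothetical names for this example, not part of XEP-59 or any implementation.

```python
import time
from typing import Dict, Iterable


class ChangeVersions:
    """Hypothetical per-item version store backing a <changed/>-like value."""

    def __init__(self) -> None:
        self._version: Dict[str, int] = {}
        self._counter = 0

    def touch(self, item_id: str, use_clock: bool = False) -> None:
        """Mark an item as changed; the stored value only has to grow."""
        if use_clock:
            # UTC converted to an integer (milliseconds), bumped by one
            # if the clock did not advance since the previous change.
            candidate = int(time.time() * 1000)
            self._counter = max(candidate, self._counter + 1)
        else:
            self._counter += 1
        self._version[item_id] = self._counter

    def changed(self, item_ids: Iterable[str]) -> int:
        """Opaque integer for a range: the largest version among its items."""
        return max((self._version.get(i, 0) for i in item_ids), default=0)


store = ChangeVersions()
store.touch("c1")
store.touch("c2")
before = store.changed(["c1", "c2"])
store.touch("c2")                      # something in the range changes
after = store.changed(["c1", "c2"])
stale = after != before                # client knows its cached page is outdated
```

The client only compares the two integers for equality or order, so it does not matter whether the server derives them from a counter or from the clock. Removal of items is not covered by this sketch and would need separate handling.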
Resource modification when auto archiving
-----------------------------------------
... skipped ...
Of course, the possibility here would be to just drop all resources from
JIDs and store only bare JIDs, but that seems to be too limiting and
inconvenient.
I'm not so sure.
Well, I don't have that much experience with using multiple resources,
but to me dropping the resource altogether looks like a bad idea for
at least two reasons:
1. It increases the possibility of collection collisions. According
to XEP-136, each collection has to be uniquely identified by "with" and
"start", so if we strip the resource there's a higher probability of two
collections colliding on these attributes. While it's highly unlikely
that I'll have two different conversations with the same person
in the same client started at the same time, it's more likely to
happen if resources are not used, because these could in fact be two or
more different clients.
2. I believe that the resource may be an important piece of information
about the conversation in some cases, i.e. a kind of "where exactly did
this conversation happen?"
There is also the case of sending a message to the bare JID and the
receiving server sends that message to all resources. Then the recipient
could reply from multiple resources, thus starting multiple
conversations! I'm not sure how to handle that. Probably it's best to
save each conversation separately but each conversation / collection has
the same start message (however they might have different threads).
Hm, in fact that seems to me to be quite a complex case and, most
likely, I don't have enough knowledge to judge what the best option is
here. So everything written below is just some (more or less) random
thoughts ...
From the client side, if <thread/> elements are used, one
possibility, it seems, is to use a different <thread> element for each
child conversation and to set the "parent" attribute according to
XEP-201 in all of them, pointing to the "root" <thread> element of the
original message.
For XEP-136 this can be mapped to storing all these conversations
separately in different collections, storing the original message in its
own collection and putting links from all child collections to this
parent collection (probably there would then have to be a "parent"
element in collection linking besides "prev/next"?)
Another possibility seems to be to treat these conversations as a kind
of special "group chat" ;-) Then everything just has to be stored in a
single collection, but to differentiate between the parties some
attribute should probably be added to messages, similar to "name"
for groupchats.
To be honest, I think this issue is out of scope for XEP-136 - I
would expect <thread> behavior either to be specified by
XEP-201 (or somewhere else) or to be left implementation-defined;
XEP-136, on the other hand, should probably just take <thread>
values into account and use them for its own purposes as described above.
So, to me it looks like the following could be specified in XEP-136:
1) If no <thread> element exists, the server may use its own
implementation-defined strategies for mapping messages and
conversations to collections, and may also treat resources in an
implementation-defined way.
Maybe some heuristic can be suggested, such as the one I described in
my first letter for "conversations tracking", but I doubt anything
100% reliable can be proposed.
2) If a <thread> element is present, the mapping is exactly 1 <-> 1 (one
thread element to one collection). If the "parent" attribute is present
on the thread, a link of type "parent" should be created to the
appropriate collection.
Resources can be treated as follows: when the first message with a
full JID is received, the collection's previous bare JID may be
overwritten by the new, full JID; if the previous JID was already full
and the new one is also full but differs from it, assume that we have
the "multi-resource" case, change the collection's JID to the bare one
and forbid all further overwrites.
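The resource rule above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Collection` class, its `locked` flag and the `observe` method are hypothetical names, and the sketch assumes the message has already been assigned to this collection (for example via its <thread> value).

```python
from dataclasses import dataclass


def bare(jid: str) -> str:
    """Strip the resource part (everything after the first '/')."""
    return jid.split("/", 1)[0]


def is_full(jid: str) -> bool:
    """A JID is treated as full here if it carries a resource part."""
    return "/" in jid


@dataclass
class Collection:
    with_jid: str
    locked: bool = False  # True once the multi-resource case was detected

    def observe(self, jid: str) -> None:
        """Update the collection's JID from the JID seen on a new message."""
        if self.locked or not is_full(jid):
            return
        if not is_full(self.with_jid):
            # First full JID replaces the bare one.
            self.with_jid = jid
        elif jid != self.with_jid:
            # A second, different full JID: fall back to bare and freeze.
            self.with_jid = bare(self.with_jid)
            self.locked = True


c = Collection("juliet@example.com")
c.observe("juliet@example.com/balcony")   # bare JID overwritten by the full one
step1 = c.with_jid
c.observe("juliet@example.com/chamber")   # different resource: back to bare, locked
c.observe("juliet@example.com/balcony")   # ignored, overwrites are forbidden now
```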
Duplicate messages times
------------------------
In "5.3 Uploading Messages to a Collection" it's specified that "If the
collection already exists then the server MUST append the messages to the
existing collection." However, it's not said what should be done if the
time of some of the messages is equal to the time of messages already
existing in the collection.
I assume that it follows from the "append the messages" clause that
duplicate entities should be created, but it would be good to mention
this to avoid ambiguity.
By "duplicate entities" do you mean <from/> or <to/> elements with the
same dateTime?
Yes. As I said, I think the required behavior here is more or less
deducible, but it would probably be useful to clarify it in the
spec, as I at least had some doubts when thinking about it. Maybe it's
just me, though ;-)
List collections for Bare JID / Domain
--------------------------------------
There seems to be no way to list collections solely for a service JID,
as according to XEP-136 it's treated as a domain JID request.
For example, when trying to list all collections for icq.example.com
you will instead get all collections of all users at icq.example.com - even
if you wanted to receive collections ONLY for icq.example.com.
I do not think this is a major problem, as it can be filtered out on the
client side - the only drawback is a large amount of extra traffic - so it
can probably be left as it is, but adding a note on the subject to the
specification would be nice.
Hmm. That's the matching process we use in Multi-User Chat (XEP-0045)
and Privacy Lists (XEP-0016) and so on. I don't see this as a big
problem (you don't really chat with services directly), but if we find
out that it causes problems in reality we can fix it later.
Well, in fact I think I've already found one case where this is a
problem, not only for listing collections, but also for removing them
and for storing preferences; see my message:
http://mail.jabber.org/pipermail/standards/2007-November/017205.html
Basically, the current approach means we have no real control over
messages with bare/domain JIDs: I can neither delete messages from/to
the icq.example.com transport nor forbid auto-archiving them without
affecting all messages to all ICQ users.
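To make the problem concrete, here is a small Python sketch of the matching as I understand it from the discussion above (a domain JID matches the service itself and every user at it). The helpers `split_jid` and `matches` are hypothetical names written for this example, not code from any server.

```python
from typing import Optional, Tuple


def split_jid(jid: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split 'node@domain/resource' into (node, domain, resource)."""
    rest, _, resource = jid.partition("/")
    node, at, domain = rest.rpartition("@")
    return (node if at else None, domain, resource or None)


def matches(query: str, jid: str) -> bool:
    """Return True if a stored JID matches the JID given in a request."""
    q_node, q_domain, q_res = split_jid(query)
    node, domain, res = split_jid(jid)
    if q_domain != domain:
        return False
    if q_node is None and q_res is None:
        # Domain request: matches the service AND all of its users.
        return True
    if q_node != node:
        return False
    # Bare JID request matches any resource; full JID must match exactly.
    return q_res is None or q_res == res


stored = [
    "icq.example.com",
    "123456@icq.example.com",
    "654321@icq.example.com/home",
]
hits = [j for j in stored if matches("icq.example.com", j)]
# Client-side filtering is the only way to get just the transport itself.
only_service = [j for j in hits if split_jid(j)[0] is None]
```

Listing can be repaired on the client as shown in the last line, but removal and preferences are applied on the server, where no such filtering step is available.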
Thanks for your feedback, and sorry for taking so long to reply!
NP, thanks for taking care of that!
Good luck! Alexander