Hello Olivier,
Quoting Olivier Goffart <[EMAIL PROTECTED]>:
Who changed it?
---------------
Typically the client will perform replication when it has some local cache
for collections / messages, to synchronize its cache with the server's.
Therefore, it makes sense that the client also uses this cache for the
collections it uploads itself.
However, implementing it strictly according to XEP-136 means that the client
has no way to determine whether the changes received in replication were made
by this client or not. So it will have to re-fetch the entire collection even
if the <changed> item in the replication results was caused by its own upload,
thus basically downloading the same collection it just uploaded to the
server, which is already stored in the local cache.
Can't the client first synchronize its cache before uploading?
No, this is not guaranteed to work. Even if one client checks that there
were no changes to this collection before uploading, some other
client may change it after the upload, and the first client still does not
have any way to determine whether the "modified" item in a later
replication response was caused by its own upload or by somebody else's change.
But the resource is not a reliable identifier.
You can connect from another client or elsewhere with the same resource name
(not at the same time, of course).
Yes, but as far as I understand you cannot have the same resource
bound to several different sessions, no? If this assumption is
correct, then using the resource still guarantees uniqueness during one
session. It is therefore enough to perform replication once, at
session start, and rely on the "by" attribute for the rest of the session.
But in fact, after thinking about it some more, it seems that even
the "by" attribute solution I proposed is not enough. When a client
uploads a collection it effectively overwrites the "by" value, so even if
it performed replication right before uploading, it is possible that
another client modifies the collection between the first client's replication
and its upload. The first client then loses these changes on its
next replication, as it will see only its own "by" on the
modified item.
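To make the race concrete, here is a minimal sketch of it. It assumes a
toy server that keeps only the resource of the last uploader in "by";
the class and function names are made up for illustration and are not
part of XEP-136.

```python
class ByAttributeServer:
    """Toy server that remembers only the resource of the last uploader."""

    def __init__(self):
        self.messages = []
        self.by = None

    def upload(self, resource, new_messages):
        self.messages.extend(new_messages)
        self.by = resource  # overwrites whatever "by" was there before


def simulate_lost_update():
    server = ByAttributeServer()
    cache_a = []  # local cache of client A

    # Client A uploads and caches its own message.
    server.upload("A", ["a1"])
    cache_a.append("a1")

    # Client A replicates: "by" is its own resource, nothing to fetch.
    assert server.by == "A"

    # Client B modifies the collection before A uploads again.
    server.upload("B", ["b1"])

    # Client A uploads, which overwrites by="B" with by="A".
    server.upload("A", ["a2"])
    cache_a.append("a2")

    # Next replication: A sees by == "A" and skips the download.
    skipped = server.by == "A"
    return skipped, cache_a, server.messages
```

After the run, client A has skipped the download although its cache
lacks the message uploaded by B.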
Besides that, even if we choose to neglect this possibility of change
between replication and upload (which is risky at least due to
automatic archiving), performing replication each time before
uploading seems to be a kind of overhead I personally would like to
avoid, if possible.
Therefore, it seems that we need another approach to solve this problem.
What about adding very simple & primitive versioning to collections?
Suppose that we require that each collection holds its version as an
integer,
where the initial upload has version number 0 and all subsequent uploads
(or changes by the server due to auto-archiving) increase this number by 1.
Then, if we include the version number in the "modified" response, such as
<changed with='[EMAIL PROTECTED]/chamber'
start='1469-07-21T02:56:15Z'
version='3'/>
and in the collection retrieval response, in the chat tag, such as
<chat xmlns='http://www.xmpp.org/extensions/xep-0136.html#ns'
with='[EMAIL PROTECTED]/chamber'
start='1469-07-21T02:56:15Z'
subject='She speaks!'
version='3'>
it provides an easy & efficient way to track all changes in a collection
and determine whether the client has the latest version or not. Even if the
client does not know whether the collection it uploads exists or not, this
will still work: the client may just assume the version is 0 when
uploading the collection and record this version in its cache, and later,
when it sees a "modified" item for this collection, it just checks whether
its version is equal to the cached version. If not, it needs to download
the collection once more.
The same holds for cases when the client has this collection in its cache
already: it just uploads the new version and increments its local
version by 1. If later it sees a "modified" result whose version is not
equal to the local version, it means other changes happened and the
collection needs to be downloaded.
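The client-side bookkeeping described above could look roughly like
this. It is only a sketch: ClientCache, record_upload and needs_download
are hypothetical names, and the cache key is assumed to be the
(with, start) pair of the collection.

```python
from dataclasses import dataclass, field


@dataclass
class CachedCollection:
    messages: list
    version: int


@dataclass
class ClientCache:
    """Hypothetical client-side cache keyed by (with, start)."""
    entries: dict = field(default_factory=dict)

    def record_upload(self, key, messages):
        """Record a local upload and return the version the client now expects.

        An unknown collection is assumed to get version 0; a cached one
        gets the cached version plus 1.
        """
        cached = self.entries.get(key)
        if cached is None:
            self.entries[key] = CachedCollection(list(messages), 0)
        else:
            cached.messages.extend(messages)
            cached.version += 1
        return self.entries[key].version

    def needs_download(self, key, changed_version):
        """True if the version in a "modified" item differs from the cached one."""
        cached = self.entries.get(key)
        return cached is None or cached.version != changed_version
```

With this, a "modified" item carrying the version the client already
expects is ignored, while any other version triggers a download, which
also covers the case where another client changed the collection
between this client's replication and its upload.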
One note here is that the version number should also be held internally for
"removed" items (though it's not necessary to show this number
in the "modified" response for removed collections) and reused when
the collection is re-created: if someone removes a collection and later
re-creates it, versioning should stay continuous so that other
clients can detect the change. However, this doesn't seem to be a real
problem, as info about "removed" items has to be kept anyway for
"modified" responses.
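On the server side, keeping the counter across removal could be sketched
as follows. ArchiveStore and its methods are hypothetical, and the
sketch assumes that removal itself does not bump the version, only the
later re-creation does.

```python
class ArchiveStore:
    """Hypothetical server-side store; not an API defined by XEP-136."""

    def __init__(self):
        self._versions = {}  # key -> last version, kept even after removal
        self._live = {}      # key -> messages of collections that exist

    def save(self, key, messages):
        """Create or modify a collection and return its new version."""
        if key in self._versions:
            self._versions[key] += 1  # continues counting after a removal
        else:
            self._versions[key] = 0   # very first upload
        self._live.setdefault(key, []).extend(messages)
        return self._versions[key]

    def remove(self, key):
        """Remove the collection but keep its version counter."""
        self._live.pop(key, None)

    def exists(self, key):
        return key in self._live

    def messages(self, key):
        return list(self._live.get(key, []))
```

A client that cached version 1 before the removal will then see version
2 after the re-creation and knows it has to download the collection
again.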
For me this seems to be superior to the "by" solution, as it is simple,
does not involve overhead and should cover all cases.
What do you think about this solution?
I think the file format is an implementation detail, and should NOT be part of that
XEP at all.
That section should be removed.
Maybe it can be part of a separate XEP later (there are already other IM log
specifications elsewhere anyway) or an extension to XEP-0227, but it's not
really related.
Well, I'm not really familiar with the approaches & practices for XEP standards,
so I do not have an opinion of my own on that. The only thing is that having
at least some standard at least somewhere is a nice thing, as it may
improve interoperability.
Good luck! Alexander