https://bugs.freedesktop.org/show_bug.cgi?id=77272
Patrick Ohly <[email protected]> changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|medium |highest --- Comment #2 from Patrick Ohly <[email protected]> --- Here's a potential (and not 100% correct!) algorithm for transferring a complete address book: uint16 used = GetSize() # not the same as maximum offset! uint16 start = choose_start() uint16 chunksize = choose_chunk_size() uint16 i for (i = start; i < used; i += chunksize) { PullAll( Offset = i, MaxCount = chunksize) } for (i = 0; i < start; i += chunksize) { PullAll( Offset = i, MaxCount = min(chunksize, start - 1) } Note that GetSize() is specified as returning the number of entries in the selected phonebook object that are actually used (i.e. indexes that correspond to non-NULL entries). This is relevant if contacts get deleted after starting the session. In that case, the algorithm above will not necessarily read all contacts. Here's an example: offsets #0 till #99, with contacts #10 till #19 deleted chunksize = 10 GetSize() = 90 => this will request offsets #0 till #89, missing contacts #90 till #99 I think this can be fixed with an additional PullAll, leading to: for (i = start; i < used; i += chunksize) { PullAll( Offset = i, MaxCount = chunksize) } PullAll(Offset = i) # not MaxCount! for (i = 0; i < start; i += chunksize) { PullAll( Offset = i, MaxCount = min(chunksize, start - 1) } The additional PullAll() is meant to read all contacts at the end which would not be covered otherwise. Now the other problem: MaxCount means "read chunksize contacts starting at #i". Therefore the algorithm above will end up reading contacts multiple times occasionally. 
Example:

    offsets #0 till #99, with contact #0 deleted
    chunksize = 10
    GetSize() = 99
    PullAll(Offset = 0, MaxCount = 10)
    => returns 10 contacts, #1 till #10 (inclusive)
    PullAll(Offset = 10, MaxCount = 10)
    => returns 10 contacts, #10 till #19
    => contact #10 appears twice in the result

The duplicate cannot be filtered out easily because the UID is not
reliable. This could be addressed by keeping a hash of each contact and
discarding those that are exact matches for already seen contacts. It's
easier to accept the duplicate and remove it during the next sync.

There are two more aspects that I chose to ignore above: how to choose the
start offset and the chunk size.

The start offset could be random (no persistent state needed) or could
continue where the last sync left off. The latter requires a write after
each PullAll() (to survive unexpected shutdowns), even if nothing ever
changes. Is that acceptable? Probably not. I prefer choosing randomly.

The chunk size depends on the size of the average contact. Make it too
small, and we end up generating lots of individual transfers. Make it too
large (say 1000), and we still have chunks that never transfer completely.
We could tune the chunk size so that on average, each transfer has a
certain size in bytes. TODO: how large? Once we have such a target size in
bytes, perhaps we can let the algorithm adjust the chunk size dynamically:
start small (100?), then increase or decrease it depending on the observed
size of the returned contacts.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
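The dynamic chunk-size adjustment suggested above could look like the
following sketch: after each PullAll, rescale the chunk size from the
average contact size observed so far toward a target transfer size in
bytes. The function name, the 64 KB target and the clamping bounds are
assumptions for illustration, not values from the comment.

```python
def next_chunk_size(current, bytes_received, contacts_received,
                    target_bytes=64 * 1024, min_chunk=10, max_chunk=1000):
    """Propose the next chunk size so that a transfer is roughly
    target_bytes large, clamped to [min_chunk, max_chunk]."""
    if contacts_received == 0:
        return current  # an empty transfer tells us nothing about sizes
    avg = bytes_received / contacts_received
    proposed = int(target_bytes / avg)
    return max(min_chunk, min(max_chunk, proposed))

# Example: a 100-contact chunk came back as 400 KB, i.e. ~4 KB per
# contact, so aim for 64 KB / 4 KB = 16 contacts per transfer.
assert next_chunk_size(100, 400 * 1024, 100) == 16
```

The clamping keeps the feedback loop from oscillating into degenerate
values (single-contact transfers, or huge chunks that never complete).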
_______________________________________________
Syncevolution-issues mailing list
[email protected]
https://lists.syncevolution.org/mailman/listinfo/syncevolution-issues
