Hi, I've been reviewing the reconciliation algorithm and code (org.syncany.operations.DatabaseReconciliator). The code is quite complicated, so I've focused on the algorithm. Please note that what follows is my understanding of the algorithm, not Philipp's description. I hope I did not make major mistakes.
If you've been following Syncany's history, you know that handling conflicts or, more generally, concurrency, has been a major difficulty. Philipp redesigned this part during the "reboot". I think the current design is sound but needs improvement, and I explain why here.

One of the main design assumptions of Syncany is that the storage is dumb, removing the need for a central server. This means that Syncany is a distributed system and faces all the corresponding difficulties. For the file contents themselves, this is not a big deal, because everything is uploaded packed into multichunks whose names are almost guaranteed to be unique (while Syncany does not use UUIDs, it is the same idea: https://en.wikipedia.org/wiki/UUID). So when the clients of a single repository upload things to it, no collision can happen. In addition, it is easy to guarantee the atomicity of one upload/download, either because the backend already guarantees it (S3) or because one can use the classical move trick (upload to file.tmp and then ask the backend to rename file.tmp to file).

So the only remaining issue is the database: each client has a local database which describes the full state of the repository (file metadata, mapping between files and chunks, mapping between chunks and multichunks, history of files, etc.). Somehow, the clients have to agree on this database. The purpose of the DatabaseReconciliator is to make sure they do (do not confuse this with concurrent modifications of the files, by the way).

The first trick is that each client is given a unique name (again, UUID-like) and uploads its database modifications as deltas named after this unique name, together with a version number. For instance, if we have a client A, its first upload will be of the form db-A-1. As with the multichunks, there is no risk of overwriting. The second trick is the use of a vector clock (https://en.wikipedia.org/wiki/Vector_clock).
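As a side note, the move trick mentioned above can be sketched in a few lines. This is a minimal illustration using a local filesystem to stand in for an FTP-like backend; the class, method, and path names are mine, not Syncany's:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the classical "move trick" for atomic uploads. A local
// directory stands in for the remote storage; names are illustrative.
public class AtomicUploadSketch {
    public static void upload(Path dir, String name, byte[] content) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        Path target = dir.resolve(name);
        Files.write(tmp, content);                                // upload to file.tmp first
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);  // then rename atomically
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("repo");
        upload(dir, "db-A-1", "delta".getBytes());
        // Readers never observe a partially written db-A-1.
        System.out.println(Files.exists(dir.resolve("db-A-1")));
    }
}
```

Other clients either see no file at all or the complete one, never a half-written delta.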
Each database delta contains a vector clock which maps each client known to the producer of the delta to the last known version of its database. For instance, if A and B share a repository created and first populated by A, the first database delta is db-A-1 and its vector clock is (A=>1). Assuming B makes the second upload, it first has to download db-A-1. It then uploads db-B-1 with vector clock (A=>1,B=>1). This vector demonstrates a causal relation between db-A-1 and db-B-1, in the sense that the latter happens after the former, because db-B-1 was produced knowing db-A-1.

As long as two database deltas are not uploaded concurrently, everything is quite simple: each new delta happens after the previous ones, and the history is in fact linear. Now, because of the dumb storage, there is no simple way to prevent concurrent uploads of database deltas without impairing performance and/or risking starvation (in particular given the merely eventual consistency of S3). So we can end up with the following situation:

1. A creates the repository and uploads db-A-1 (A=>1)
2. B makes a change and uploads db-B-1 (A=>1,B=>1)
3. A downloads the changes and uploads db-A-2 (A=>2,B=>1)
4. B concurrently uploads some changes as db-B-2 (A=>1,B=>2)

The two vector clocks are not comparable, and there is no causal relation between db-A-2 and db-B-2. In other words, when C tries to clone the repository, it cannot infer an agreed-upon state for the database. It will download db-A-1 and apply the changes from db-B-1, but then it is stuck because of the missing causal relationship. In the current implementation of the reconciliation, C (or any other client) uses a deterministic tie-breaking algorithm to choose a winner. The algorithm uses the timestamps of db-A-2 and db-B-2 as recorded in those deltas (this is the local time of A and B when they created the deltas). If by any chance the timestamps are identical, then the repository ids are used to break the tie.
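To make the causality check and the tie-break concrete, here is a minimal, self-contained sketch. This is not the actual org.syncany code: the class and method names are mine, and the direction of the timestamp tie-break (I picked "earlier timestamp wins") is an assumption for illustration only:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the reconciliation primitives described above.
// Names are mine, not the actual org.syncany API, and the tie-break
// direction (earlier timestamp wins) is an assumption.
public class ReconciliationSketch {

    public enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    // Compare two vector clocks; CONCURRENT means no causal relation.
    public static Order compare(Map<String, Integer> a, Map<String, Integer> b) {
        boolean aSmaller = false, bSmaller = false;
        Map<String, Integer> union = new HashMap<>(a);
        union.putAll(b);
        for (String client : union.keySet()) {
            int va = a.getOrDefault(client, 0);
            int vb = b.getOrDefault(client, 0);
            if (va < vb) aSmaller = true;
            if (vb < va) bSmaller = true;
        }
        if (aSmaller && bSmaller) return Order.CONCURRENT;
        if (aSmaller) return Order.BEFORE;
        if (bSmaller) return Order.AFTER;
        return Order.EQUAL;
    }

    // Deterministic tie-break for concurrent deltas: timestamp first,
    // then the identifier, so every client picks the same winner.
    public static String breakTie(String idX, long tsX, String idY, long tsY) {
        if (tsX != tsY) return tsX < tsY ? idX : idY;
        return idX.compareTo(idY) < 0 ? idX : idY;
    }

    public static void main(String[] args) {
        Map<String, Integer> dbA2 = Map.of("A", 2, "B", 1); // db-A-2
        Map<String, Integer> dbB2 = Map.of("A", 1, "B", 2); // db-B-2
        System.out.println(compare(dbA2, dbB2));            // CONCURRENT
        System.out.println(breakTie("A", 1000L, "B", 2000L)); // A
    }
}
```

Because compare() and breakTie() are pure functions of the deltas' contents, every client evaluating them reaches the same verdict, which is exactly why the current scheme needs no coordination.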
So, for instance, here the official linearized/trimmed history becomes db-A-1 -> db-B-1 -> db-A-2, and db-B-2 is discarded. Because the tie-breaking algorithm is deterministic, all clients will agree on this linearized/trimmed history. Assume for instance that C has downloaded the repository and uploaded its own changes. We now have a new delta, db-C-1 (A=>2,B=>1,C=>1). Notice the B=>1, as db-B-2 was discarded. When A comes back, it will consider the history to be db-A-1 -> db-B-1 -> db-A-2 -> db-C-1 and will know how to apply db-C-1 to take the modifications into account. When B does the same, it will also agree on this history and will discover that its upload was discarded. It has to take action to be able to upload it again.

In my opinion, this works but is suboptimal, as it assumes conflicts even when none are present. Also, most of the literature on distributed systems seems to agree on the need for a reconciliation to happen in order to fix the problem. In the above case, this means producing a delta that has both db-A-2 and db-B-2 in its causal history. I think it is important to do that for several reasons:

1) When a client loses, its content modifications are in the storage (it has uploaded the new multichunks before uploading the database delta), yet they are not available to the other clients. This is a waste of resources.

2) The only way for the modifications made by a loser to become available is for the loser to reconnect later to the storage, while the needed content is already there. This prevents fast conflict resolution and is a typical starvation situation.

3) Because the conflict is solved later, the history walking has to be done on each client, leading to some useless complication. Also, the history remains non-causal and strongly tied to the deterministic tie-breaking rule, as opposed to being based solely on vector clocks (which have guaranteed properties).
4) As long as there is a dangling conflict, the history cannot be compressed. The resolution of the conflict on each client implies downloading multiple delta files.

5) While I'm confident the algorithm is sound, it is also original (as far as I know), and I would feel more secure with a more classical solution.

So I would:

- Completely remove the --force option that allows uploading something without downloading the previous databases first. This is a recipe for failure, and I'm not even sure it cannot break the system beyond repair.

- Implement a user-guided conflict resolution when the vector clocks cannot be used to guarantee a linear history. For instance, in the case described above, we have C with db-A-2 and db-B-2 that are not causally related. Knowing that, C will first apply to db-B-1 (the last version in the linear history) all updates from db-A-2 and db-B-2 that do not conflict (that is, which concern different files). I'm confident this will be the most common case. Then, for all conflicting updates, the user will be notified (notice that for text files, standard merging algorithms could be applied automatically, as git does). In the end, C will push a db-C-1 (A=>2,B=>2,C=>1) which will be a sort of merge commit. Once this is pushed, C can remove db-A-2 and db-B-2 from the storage, as db-C-1 supersedes them. And then it will be allowed to upload its local changes, for instance. Obviously, this can also be done by either A or B. In any case, the only operation allowed at that time would be the merge operation, so if another client (including A and B) tries to upload something, it will first need to perform the reconciliation between A and B (no --force!). While we cannot guarantee progress in this case, the need for human intervention in case of a real conflict will probably limit any risk of an endless loop. However, as pointed out in the Harmony work of Pierce et al.
(http://www.cis.upenn.edu/~bcpierce/papers/nway-tr.pdf), two identical conflict resolutions done by different clients will be difficult to detect. So in this case we will need to implement some form of waiting/locking to ensure that someone actually wins the race to resolve the conflict. Maybe the client writes a .resolution file with its name in it. Someone is guaranteed to win the race to this file, as uploads are atomic. When a client has written it, it starts to upload its solution. At the end of the upload, it verifies that it actually won the race (in order to accommodate eventual consistency). If this is the case, it commits its solution (i.e., a move in the case of an FTP-like remote). If not, it removes it and waits for the winner to complete the reconciliation process.

Comments?

Best regards,

Fabrice

--
Mailing list: https://launchpad.net/~syncany-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~syncany-team
More help   : https://help.launchpad.net/ListHelp
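P.S. A tiny sketch of the "merge commit" clock computation I propose above, assuming the merged clock is the element-wise maximum of the two concurrent clocks plus an increment for the merging client's own entry (class and method names are mine, not Syncany's API):

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the proposed merge-commit clock: element-wise maximum of the
// two concurrent clocks, then increment the merging client's own entry.
// Names are illustrative, not the actual org.syncany API.
public class MergeClockSketch {
    public static Map<String, Integer> mergeClock(String self,
            Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> merged = new TreeMap<>(a);
        for (Map.Entry<String, Integer> e : b.entrySet()) {
            merged.merge(e.getKey(), e.getValue(), Math::max);
        }
        merged.merge(self, 1, Integer::sum); // the merge delta itself is a new version
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> dbA2 = Map.of("A", 2, "B", 1); // db-A-2
        Map<String, Integer> dbB2 = Map.of("A", 1, "B", 2); // db-B-2
        // db-C-1 has both concurrent deltas in its causal history:
        System.out.println(mergeClock("C", dbA2, dbB2)); // {A=2, B=2, C=1}
    }
}
```

Any later delta built on db-C-1 is then causally after both db-A-2 and db-B-2, so the history becomes linear again without any tie-breaking rule.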

