Hi,

On 6 March 2010 00:16, MP <[email protected]> wrote:
> While API 0.6 has implemented object versioning, preventing users from
> accidentally overwriting someone else's changes, with the introduction of
> atomic uploads I now see many problems with duplicate data.
>
> These often come with imports of data, or generally whenever someone
> uploads new data without modifying any existing data (for example, when
> someone traces hundreds of buildings from orthophoto imagery).
>
> Since in JOSM (and possibly in other tools) the atomic upload is the
> default method, the user presses an "upload" button and within a few
> seconds all the changes are uploaded to the server, which then starts
> processing them (this can take some time for larger changes); once it is
> finished, it sends the new node IDs back to the editor.
>
> Unfortunately, while waiting for the server to process the uploaded data,
> the connection will sometimes time out, so the user sees an error message.
> Thinking the upload failed, he presses "upload" again, pushing a new copy
> of all the objects to the server. Later, the server wants to return the
> IDs from the first upload, but nobody is listening on the other end
> anymore.
>
> The ultimate result is sometimes 2 to 4 identical copies of some data,
> sometimes thousands of duplicate nodes and ways.
>
> Suggestion for one possible countermeasure: after the server receives a
> complete, successful atomic upload from a user, compute a SHA-1, MD5, or
> other checksum of the uploaded XML. Store it, and if the user tries to
> upload exactly the same thing again (because he thinks the upload failed,
> which is not true), send him an error message instead, such as: "You have
> already uploaded this data".
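For illustration, the suggested checksum countermeasure could be sketched roughly like this in Python. This is purely hypothetical server-side logic, not actual OSM API code; the names (seen_uploads, handle_diff_upload) and the in-memory store are my own assumptions:

```python
import hashlib

# Hypothetical sketch: remember a digest of each successful diff upload
# per user, and reject a byte-identical re-upload with an error message.
seen_uploads = {}  # user id -> set of SHA-1 digests of committed uploads

def handle_diff_upload(user_id, xml_body):
    """Return an (HTTP status, body) pair for an atomic diff upload."""
    digest = hashlib.sha1(xml_body).hexdigest()
    if digest in seen_uploads.setdefault(user_id, set()):
        # Same bytes were already committed: the first upload did succeed,
        # the client just never saw the response.
        return (409, "You have already uploaded this data")
    # ... apply the diff to the database and build the real diffResult ...
    seen_uploads[user_id].add(digest)
    return (200, "<diffResult/>")
```

A real implementation would of course persist the digests (and probably expire them after some time), but the principle is the same.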
This sounds like a good idea to me. Perhaps it should only be employed for diff uploads containing only <create>s; in all other cases a re-upload will fail with a conflict anyway.

An identical measure can be implemented in a client such as JOSM. Only uploads consisting solely of new objects need this extra caution, but even for other uploads JOSM could admittedly handle network errors better, for example by looking at the last open changeset and retrieving the new IDs and versions of the objects that should have been in the server response. I have a very experimental script that generates the server response from the uploaded content and the corresponding changeset as downloaded from the API, which I use for bulk uploads:

http://svn.openstreetmap.org/applications/utils/import/bulkupload/change2diff2.py

It only works if the changeset contains just the single diff, and it makes other significant assumptions.

Generally, if you're not uploading through a proxy and the diff does not conflict with existing data (for example because it only creates new objects), I notice the upload will always hit the database once 100% of the XML has been sent: after the last byte goes out, the API never cancels the commit. If, on the contrary, not all bytes were sent, the API will not be able to parse the body as XML. So it's deterministic.

Cheers

_______________________________________________
talk mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/talk
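P.S. To illustrate the recovery idea behind change2diff2.py, here is a very rough sketch (my own simplification, not the actual script): after a timed-out upload, download the changeset contents from the API and match the locally created nodes against them to recover the new server IDs. Matching by coordinates only works for simple node-only uploads; the real script makes further assumptions (a single diff per changeset, among others):

```python
import xml.etree.ElementTree as ET

def recover_node_ids(upload_xml, changeset_xml):
    """Map the placeholder (negative) node IDs from a local osmChange
    upload to the real IDs found in the downloaded changeset content,
    matching nodes by their (lat, lon) coordinates."""
    # placeholder id -> (lat, lon) from the local osmChange
    local = {n.get('id'): (n.get('lat'), n.get('lon'))
             for n in ET.fromstring(upload_xml).findall('./create/node')}
    # (lat, lon) -> real id from the changeset downloaded from the API
    server = {(n.get('lat'), n.get('lon')): n.get('id')
              for n in ET.fromstring(changeset_xml).findall('.//node')}
    return {old_id: server.get(pos) for old_id, pos in local.items()}
```

Coordinate matching obviously breaks down for coincident nodes or for ways and relations; reconstructing those requires the ordering assumptions the script relies on.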

