muchsync files renames

2015-08-22 Thread Amadeusz Żołnowski
Hi,

I am testing muchsync-2 and it looks to me like file names differ across
machines.  Moreover, when syncing again after initialization, muchsync
seems to be working on something.  I canceled it and reran muchsync, and
notmuch reported lots of file renames on the server.  What is happening,
and why?


Kind regards,

-- 
Amadeusz Żołnowski


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: muchsync files renames

2015-08-22 Thread David Mazieres
Amadeusz Żołnowski <aide...@aidecoe.name> writes:

> Hi,
>
> I am testing muchsync-2 and it looks to me like file names differ across
> machines.  Moreover, when syncing again after initialization, muchsync
> seems to be working on something.  I canceled it and reran muchsync, and
> notmuch reported lots of file renames on the server.  What is happening,
> and why?

What muchsync specifically synchronizes for messages is the mapping:

(directory, SHA-1-hash, link-count)

So if a directory contains two copies of a file on one machine, it will
end up with two copies on the other machine.  However, the file names
themselves are not preserved; they are created anew in accordance with
the maildir spec.  (Note that SHA-1 wouldn't be my first choice of hash
function, but notmuch already uses it for messages with long message
IDs, so I figured I'd just be consistent with existing practice.)
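The following is a minimal Python sketch of the mapping described above. It makes two simplifying assumptions: every file in the tree is one message, and the "link count" is the number of file names in a directory whose contents share a SHA-1 digest. The names `content_mapping` and `walk_maildir` are hypothetical and not part of muchsync, which is written in C++. The sketch discards each file's own name, so two replicas with different maildir names but the same contents produce identical mappings.

```python
import hashlib
import os
from collections import Counter
from typing import Iterable, Iterator, Tuple


def content_mapping(files: Iterable[Tuple[str, bytes]]) -> Counter:
    """Map (directory, SHA-1 hex digest) -> number of copies.

    `files` yields (path, contents) pairs.  Only the directory part of
    each path is kept; the file's own name is deliberately ignored.
    """
    counts: Counter = Counter()
    for path, contents in files:
        directory = os.path.dirname(path)
        digest = hashlib.sha1(contents).hexdigest()
        counts[(directory, digest)] += 1
    return counts


def walk_maildir(root: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (path relative to root, contents) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                yield os.path.relpath(full, root), f.read()
```

For example, `content_mapping(walk_maildir(os.path.expanduser("~/Mail")))` would summarize a local tree. The `~/Mail` location is only a placeholder.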

In terms of what muchsync is working on, you can run it with multiple -v
options on both sides to get an idea, as in muchsync -v -v server -v -v.
Better yet, you can just run it on one side with muchsync -v -v.  You'll
get a lot of output, so maybe run it inside the script command to save
the output.  If you have enabled maildir.synchronize_flags, it could be
that notmuch is initially renaming all of your files, in which case
muchsync needs to re-hash them to make sure they haven't changed.

How did you cancel muchsync?  If you send it a single SIGINT or SIGTERM,
it attempts to clean up after itself.  However, if it receives multiple
signals, or any other signal, it exits immediately.  Muchsync is
conservative about updating the database, to avoid missing tags or files
that have been changed.  It always updates the notmuch database first,
then its own sqlite database with a version number.  That means that if
you kill muchsync, some files may be picked up as changed again even
though they were really just copied from a peer.
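Here is a toy Python model of why the update order matters. A plain dictionary stands in for the notmuch database, and an in-memory sqlite table stands in for muchsync's own bookkeeping. The `Replica` class and its schema are hypothetical.

Killing the process between the two steps leaves a change that has been applied but not recorded. The next scan reports it as a local change, which means extra work rather than a lost update. Writing the bookkeeping first would risk the opposite: recording a change that never reached notmuch.

```python
import sqlite3
from typing import Dict, List


class Replica:
    """Toy replica: `mail_db` plays the notmuch database, `sync_db` plays
    muchsync's sqlite bookkeeping (hypothetical schema)."""

    def __init__(self) -> None:
        self.mail_db: Dict[str, str] = {}  # message id -> tags
        self.sync_db = sqlite3.connect(":memory:")
        self.sync_db.execute(
            "CREATE TABLE seen (msg TEXT PRIMARY KEY, tags TEXT, version INTEGER)")
        self.version = 0

    def apply_from_peer(self, msg: str, tags: str,
                        crash_before_bookkeeping: bool = False) -> None:
        # Step 1: update the mail database first.
        self.mail_db[msg] = tags
        if crash_before_bookkeeping:
            return  # simulated kill between the two steps
        # Step 2: only then record the change, tagged with a version number.
        self.version += 1
        with self.sync_db:  # commits on successful exit
            self.sync_db.execute("INSERT OR REPLACE INTO seen VALUES (?, ?, ?)",
                                 (msg, tags, self.version))

    def local_changes(self) -> List[str]:
        """Messages whose current tags differ from what was recorded."""
        recorded = dict(self.sync_db.execute("SELECT msg, tags FROM seen"))
        return sorted(m for m, t in self.mail_db.items() if recorded.get(m) != t)
```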

To mitigate this problem, the muchsync client commits its database
every 10 seconds, so that in theory killing the client should cost you
at most about 10 seconds of extra work.  However, the server does not
commit periodically, on the assumption that it is more likely to read an
EOF than get killed.  Currently it doesn't appear to commit pending
transactions to the sqlite database upon EOF either, which may be an
oversight.
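The periodic commit could be sketched as a small helper that the client calls after each unit of work. It commits the open sqlite transaction once at least `interval` seconds have passed since the last commit. `PeriodicCommitter` is a hypothetical name, and the 10-second default simply mirrors the figure above. The injectable clock is there only so the behavior can be tested deterministically.

```python
import sqlite3
import time
from typing import Callable


class PeriodicCommitter:
    """Commit `conn`'s open transaction at most once per `interval` seconds."""

    def __init__(self, conn: sqlite3.Connection, interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.conn = conn
        self.interval = interval
        self.clock = clock
        self.last_commit = clock()

    def maybe_commit(self) -> bool:
        """Commit if the interval has elapsed; return True if it committed."""
        now = self.clock()
        if now - self.last_commit < self.interval:
            return False  # too soon: keep batching work in this transaction
        self.conn.commit()
        self.last_commit = now
        return True
```

Batching commits this way bounds how much work a kill can force the client to redo, at the cost of slightly more bookkeeping per unit of work.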

So to summarize:

  * File names are not the same across machines, only file contents and
    directory structure.

  * Give muchsync lots of -v options to see what it is doing.

  * Try to avoid killing muchsync.  Doing so is safe, but likely to
    generate extra work in the form of phantom renames or tag changes
    that get synchronized even though they don't need to be.

  * Possibly the server should handle EOF more gracefully and commit any
    pending transactions, or the client should periodically send a
    commit command to the server.
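The suggested server behavior, committing pending work at EOF and honoring an explicit commit request from the client, might look roughly like the following sketch. The line protocol (`set <message-id> <tags>` and `commit`) and the `tags` table are invented for illustration; they are not muchsync's actual protocol.

```python
import sqlite3
from typing import Iterable


def serve(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Apply hypothetical 'set <msg> <tags>' / 'commit' lines to `conn`.

    Returns the number of 'set' commands applied.  Whatever is still
    uncommitted when the input ends (EOF) is committed before returning.
    """
    applied = 0
    for line in lines:  # iterating a file object stops at EOF
        parts = line.strip().split(maxsplit=2)
        if not parts:
            continue  # ignore blank lines
        if parts[0] == "commit":
            conn.commit()  # explicit commit requested by the client
        elif parts[0] == "set" and len(parts) == 3:
            conn.execute("INSERT OR REPLACE INTO tags VALUES (?, ?)",
                         (parts[1], parts[2]))
            applied += 1
        else:
            raise ValueError(f"unrecognized command: {line!r}")
    # Reaching this point means the peer closed its end: persist pending
    # work instead of leaving it in an open transaction.
    conn.commit()
    return applied
```

In a real server, `lines` would be the connection's input stream, for example `sys.stdin`.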

If you think something is wrong, I can help you figure it out, but I
need to know what maildir.synchronize_flags is set to on each replica,
what you mean by canceled, and roughly what was happening when you
canceled (uploading or downloading).

David