Synchronization success stories?

2014-04-16 Thread Tilmann Singer
David Bremner writes:
>> With a reused ssh connection this is sufficiently fast for me (<2s).  If
>> there is interest I can clean up the script of hardcoded paths etc. and
>> put it on github.
>
> Sure, sounds at least as good as what I am using. Also, syncmaildir
> recently did something pretty annoying for upward compatibility, so in
> the long term I'm interested in alternatives.

I've put the ruby script with a README on github:
https://github.com/til/notmuch-rsync


Til



___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Synchronization success stories?

2014-04-13 Thread Tilmann Singer
David Mazieres writes:
> What happens if you get a message that's been stuck in a queue for a few
> days and has an old Date: header?

It would be missed.  I have set the timespan to look backwards for new
mail to one month to be a bit safer against the stuck-in-queue cases,
but mails with older Date: headers would definitely get missed.
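
As a rough sketch of such a lookback window: the range can be built from epoch timestamps, which as far as I know is what the bare <since>..<until> search term accepts. The helper name lookback_range and the 30-day default are illustrative assumptions, not taken from the actual script.

```shell
#!/bin/sh
# Print a "<since>..<until>" timestamp range covering the last N days.
# $1 = days to look back (default 30)
# $2 = end of the range in epoch seconds (default: now)
# The function name and defaults are hypothetical.
lookback_range() {
    days=${1:-30}
    now=${2:-$(date +%s)}
    since=$((now - days * 86400))
    printf '%s..%s\n' "$since" "$now"
}

# Example use (requires notmuch):
#   notmuch search --output=messages "$(lookback_range 30)"
```

A message whose Date: header is older than the window still falls outside the range, so a longer window only narrows the gap.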

The current output of notmuch count "*" is the same on both the client
and the server, so it seems I didn't run into this problem yet (maybe I
was just lucky).

> Or if you get new messages that have
> the same Message-ID as old ones?

Is that even possible?  I thought that notmuch guarantees the uniqueness
of indexed message ids.  The only reference I could find without trying
to read the code was this thread id:87mwyz3s9d.fsf@star.eba from 2012,
which supports the assumption.

>> Synchronization of the notmuch tags database is only necessary when I
>> switch between different client computers, which happens less
>> frequently.
>
> Do you use a laptop everywhere?  I've found that for switching between
> my desktop machine at home, my laptop on the train, and my desktop at
> work (which amounts to five switches a day), the notmuch dump time is
> painfully slow--like well over 10 seconds for 100,000 messages.  Hook
> that into notmuch-poll and you have a recipe for hanging emacs every
> time you type "G".

I have one laptop and one desktop and switch between them almost daily,
and run a hibernate script that does notmuch dump + git push, and a
resume script that does git pull + notmuch restore.  For hibernate /
resume the speed of those operations is acceptable, but I wouldn't want
to incur that wait every time I check for new mail.

Here is how long they take (on a machine with an SSD, which certainly
helps):

$ time notmuch dump --format=batch-tag | sort > /tmp/notmuch.dump
real    0m3.643s
user    0m3.593s
sys     0m0.140s
$ time notmuch restore < /tmp/notmuch.dump
real    0m3.719s
user    0m3.357s
sys     0m0.357s
$ notmuch count 
117118
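
The hibernate / resume pair described above might look roughly like the following. This is a minimal sketch, not the actual scripts: the function names, the TAGS_REPO location and the commit message are assumptions, and it presumes a git checkout with a configured remote already exists at that path.

```shell
#!/bin/sh
# Hypothetical location of a git checkout that holds the tag dump.
TAGS_REPO="${TAGS_REPO:-$HOME/notmuch-tags}"

# Before hibernating: dump the tags (sorted, for readable diffs),
# commit the dump if it changed, and push it.
push_tags() {
    notmuch dump --format=batch-tag | sort > "$TAGS_REPO/notmuch.dump" || return 1
    git -C "$TAGS_REPO" add notmuch.dump
    # `git diff --cached --quiet` exits non-zero when something is staged.
    git -C "$TAGS_REPO" diff --cached --quiet ||
        git -C "$TAGS_REPO" commit -q -m "tags $(date +%F)"
    git -C "$TAGS_REPO" push -q
}

# After resuming: fetch the latest dump and load it into the local database.
pull_tags() {
    git -C "$TAGS_REPO" pull -q || return 1
    notmuch restore < "$TAGS_REPO/notmuch.dump"
}
```

With the timings above, each of the two functions would add a few seconds, which is tolerable around hibernate / resume but not on every mail check.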



Til



Synchronization success stories?

2014-04-13 Thread Tilmann Singer
I have experimented with offlineimap, syncmaildir and rsync.  The
append-only approach of notmuch makes synchronization of the mail corpus
simpler, so there are lots of options.  With ssh access to the server, I
found rsync to be conceptually the simplest, but it turned out to be too
slow for me (with ~110k mails) when frequently checking for new mails.

What I have settled on is a hacked-together ruby script that uses the
notmuch command line both on the server and on the client to determine
unsynced mails, and then runs rsync explicitly for the necessary files.

The notmuch index on the server is only used to find new files for this
synchronization process, and is different from the notmuch indexes I
have on my client machines.

A prerequisite for this is of course ssh access and the ability to set
up notmuch on the server.

The steps performed on a sync run are roughly like this:

- local: notmuch new
- local: notmuch search --output=messages <some time ago>..<now>
- remote: notmuch new
- remote: notmuch search --output=messages <some time ago>..<now>
- compare search results
- run rsync for mails that only exist locally
  (using notmuch search --output=files to get the filenames)
- run rsync for mails that only exist remotely
  (using notmuch search --output=files to get the filenames)
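
The "compare search results" step boils down to a set difference over the two lists of message IDs. A minimal sketch in shell, assuming each list has been saved to a file with one ID per line (the helper name only_in_first is made up for illustration; the real script is ruby and may do this differently):

```shell
#!/bin/sh
# Print the IDs that appear in the first list but not in the second.
# $1, $2 = files with one message ID per line, for example the saved
# output of `notmuch search --output=messages ...` on each machine.
only_in_first() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    # comm needs both inputs sorted the same way.
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    # -2 drops lines unique to the second file, -3 drops common lines.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

Calling it once as `only_in_first local.ids remote.ids` and once with the arguments swapped gives the two sets of mails to transfer in each direction.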

With a reused ssh connection this is sufficiently fast for me (<2s).  If
there is interest I can clean up the script of hardcoded paths etc. and
put it on github.
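
One common way to reuse an ssh connection is OpenSSH connection multiplexing. Whether the script does exactly this is an assumption; the host name, the socket path and the wrapper name remote_notmuch below are placeholders.

```shell
#!/bin/sh
# Hypothetical host name; replace with the actual mail server.
MAIL_HOST="${MAIL_HOST:-mail.example.org}"

# Run a notmuch command on the server over a shared ssh connection.
# ControlMaster=auto opens a master connection on first use and reuses
# it afterwards; ControlPersist keeps it open for 10 minutes after the
# last session ends. Note that ssh joins the remote arguments into one
# string for the remote shell, so arguments with spaces need extra quoting.
remote_notmuch() {
    ssh -o ControlMaster=auto \
        -o ControlPath="$HOME/.ssh/cm-%r@%h:%p" \
        -o ControlPersist=10m \
        "$MAIL_HOST" notmuch "$@"
}

# Example use:
#   remote_notmuch new
```

Only the first call pays for the ssh handshake; later calls within the persist window go through the existing socket.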

Synchronization of the notmuch tags database is only necessary when I
switch between different client computers, which happens less
frequently. Like David I have a dump file committed to git for that. I
found it useful to sort the output before adding it to git, to avoid
huge unreadable diffs:

notmuch dump --format=batch-tag | sort > /path/to/notmuch.dump


Til


