Synchronization success stories?

2014-04-11 Thread Brian Sniffen
I'm thrilled by using notmuch to manage my mail.  Low-latency search is
very important to me.  But I use computers in a couple of
places---several of which are laptops.  Has anyone stories to share of
successful multi-computer notmuch sync, for a corpus of a
quarter-million messages or so?  

I've tried offlineimap---it (and my Exchange server) get grouchy with
mailboxes of that size.  I tried keeping ~/Maildir/ in Google Drive; it
took weeks to do the initial sync and I gave up.

I'm trying bittorrent-sync now, with no obvious failures.

-Brian

-- 
Brian Sniffen
Information Security
Akamai Technologies
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


notmuch-hello buffer slow due to slow query

2014-04-11 Thread Nils Dagsson Moskopp
Hello,

If notmuch-hello includes a saved search with a slow query, switching to
a notmuch-hello buffer is very slow due to notmuch-mode updating counts
for search results. mjw1009 suggested "(setq notmuch-hello-auto-refresh
nil)", which stops the counting and works around the problem.

Fundamentally, the problem is a slow query. On my laptop (Thinkpad T60),
many things are pretty much instant, even though I have a HDD, no SSD:

> ; time notmuch count 'tag:inbox and tag:list'
> 25452
> 0.02user 0.00system 0:00.03elapsed 72%CPU (0avgtext+0avgdata 3852maxresident)k
> 0inputs+0outputs (0major+1135minor)pagefaults 0swaps

However, from-queries take their time:

> ; time notmuch count 'not tag:replied and to:n...@dieweltistgarnichtso.net'
> 5328
> 0.10user 0.15system 0:14.14elapsed 1%CPU (0avgtext+0avgdata 3472maxresident)k
> 157544inputs+0outputs (0major+1039minor)pagefaults 0swaps

mjw1009 can reproduce if the from-query contains an "@" and thinks the
problem may be "something deeper down in notmuch (actually probably in
xapian)".


Greetings,
-- 
Nils Dagsson Moskopp // erlehmann





Re: [PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-11 Thread David Bremner
dm-list-email-notm...@scs.stanford.edu writes:

> Gaute Hope  writes:

> Exactly.  It could be a tick, or just the current time of day if your
> clock does not go backwards.  (I'd be willing to do a full scan if the
> clock ever goes backwards.)  The advantage of time is that you don't
> have to synchronously update some counter.

I think I'd lean towards global time so that one could use it to resolve
conflicts between changes to multiple copies of the database.

> Making sure the write-operations update the time should be easy.  Most
> or all of the changes are probably funneled through
> _notmuch_message_sync.  Worst case, there are only 9 places in the
> source code that make use of a Xapian::WritableDatabase, so I'm pretty
> confident total changes wouldn't be much more than 50 lines of code.

Maybe. Don't forget upgrading the database, updating the test suite, and
presumably some changes to the CLI so the new mtime can actually be
used. Not to be discouraging ;).

> I would do it myself if there were any kind of indication that such a
> change could be upstreamed.  I brought this up in January, 2011, and
> didn't get a huge amount of interest in the ctime idea.  But I was also
> a lot less focused on what I needed.  Now that I have a working
> distributed setup and am actually using notmuch for my mail, I have a
> much better understanding of what is needed.

In the ensuing time, nothing better has developed for tag
synchronization (my pet use case) so maybe it's time to pursue this
again.  It would be good to have some preliminary idea about the time
and space costs of adding document mtimes.  I guess database bloat
should not be too bad, since it's only 64bits (?) per mail message.


Re: Synchronization success stories?

2014-04-11 Thread David Bremner
Brian Sniffen  writes:

> I'm thrilled by using notmuch to manage my mail.  Low-latency search is
> very important to me.  But I use computers in a couple of
> places---several of which are laptops.  Has anyone stories to share of
> successful multi-computer notmuch sync, for a corpus of a
> quarter-million messages or so?  

I use syncmaildir to sync the actual messages, and a copy of the output
of "notmuch dump" in git to sync the metadata.

It works OK. A bit slow; depends how often you need to fetch new mail.
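
A minimal sketch of the metadata half of this setup, in Python.  It is
not the poster's actual script.  It assumes `notmuch` and `git` are on
PATH, that `repo` is an existing git repository with a shared remote,
and that the dump is kept in a file called `tags.dump`; the function
names, file name, and commit message are all hypothetical.  Syncing the
messages themselves (the syncmaildir part) is not shown.

```python
import subprocess
from pathlib import Path

DUMP_NAME = "tags.dump"  # hypothetical file name inside the git repository


def dump_commands(repo: Path) -> list[list[str]]:
    """Commands that write the current tags into the repository and stage them."""
    return [
        ["notmuch", "dump", f"--output={repo / DUMP_NAME}"],
        ["git", "-C", str(repo), "add", DUMP_NAME],
    ]


def restore_commands(repo: Path) -> list[list[str]]:
    """Commands that fetch the latest dump and apply it to the local database."""
    return [
        ["git", "-C", str(repo), "pull", "--quiet"],
        ["notmuch", "restore", f"--input={repo / DUMP_NAME}"],
    ]


def publish_tags(repo: Path) -> None:
    """Dump tags, then commit and push only when the dump actually changed."""
    for argv in dump_commands(repo):
        subprocess.run(argv, check=True)
    # `git diff --cached --quiet` exits 1 when staged changes exist, 0 otherwise.
    staged = subprocess.run(["git", "-C", str(repo), "diff", "--cached", "--quiet"])
    if staged.returncode == 1:
        subprocess.run(
            ["git", "-C", str(repo), "commit", "--quiet", "-m", "notmuch tag dump"],
            check=True,
        )
        subprocess.run(["git", "-C", str(repo), "push", "--quiet"], check=True)


def fetch_tags(repo: Path) -> None:
    """Pull the newest dump and restore it into the local notmuch database."""
    for argv in restore_commands(repo):
        subprocess.run(argv, check=True)
```

Note that a plain `notmuch restore` replaces the local tags with those
in the dump, so local tag changes that were never published would be
lost; merging them is a separate problem.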

d


Re: [PATCH] News for emacs saved-searches change.

2014-04-11 Thread Tomi Ollila
On Wed, Apr 09 2014, Mark Walters  wrote:

> ---
> The important point is that the changed search variable is not forward
> compatible (it *is* backwards compatible): that is, previous versions of
> notmuch-emacs will be unusable with a new-style notmuch-saved-searches
> variable.

The above part could go before '---' so that it is added to the commit
message, too.

>
> Best wishes
>
> Mark
>
>
>
>  NEWS |   17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/NEWS b/NEWS
> index d4f4ea4..8aa4182 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -15,6 +15,23 @@ Command-Line Interface
>  Emacs Interface
>  ---
>  
> +Changed format for saved searches.
> +
> +  The format for `notmuch-saved-searches` has changed, but old style
> +  saved searches are still supported. The new style means that a saved
> +  search can store the desired sort order for the search, and it can
> +  store a separate query to use for generating the count notmuch
> +  shows.
> +
> +  The variable is fully customizable and any configuration done
> +  through customize should `just work', with the additional options

I'm afraid the `just work' quoting works badly when contained in a
markdown page (perhaps *just work*?).

Tomi


> +  mentioned above. For manual customization see the documentation for
> +  `notmuch-saved-searches`.
> +
> +  IMPORTANT: a new style notmuch-saved-searches variable will break
> +  previous versions of notmuch-emacs (even search will not work); to
> +  fix remove the customization for notmuch-saved-searches.
> +
>  Bug fix for saved searches with newlines in them.
>  
>    Split lines confuse `notmuch count --batch`, so we remove embedded
> -- 
> 1.7.10.4


Re: [PATCH v2 0/5] emacs: hello: convert saved-searches to plists

2014-04-11 Thread David Bremner
Mark Walters  writes:

> This is v2 of the series; v1 is at
> id:1396733065-32602-1-git-send-email-markwalters1...@gmail.com

pushed,

d


Re: [PATCH] News for emacs saved-searches change.

2014-04-11 Thread David Bremner
Mark Walters  writes:

> ---
> The important point is that the changed search variable is not forward
> compatible (it *is* backwards compatible): that is, previous versions of
> notmuch-emacs will be unusable with a new-style notmuch-saved-searches
> variable.

pushed, with that paragraph as commit message


Re: [PATCH] News for emacs saved-searches change.

2014-04-11 Thread David Bremner
Tomi Ollila  writes:
>
> I'm afraid the `just work' quoting works badly when contained in a
> markdown page (perhaps *just work*?).
>

Sorry, missed that. Care to fix up my mess? ;)

d


Re: [PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-11 Thread dm-list-email-notmuch
David Bremner  writes:

>> Exactly.  It could be a tick, or just the current time of day if your
>> clock does not go backwards.  (I'd be willing to do a full scan if the
>> clock ever goes backwards.)  The advantage of time is that you don't
>> have to synchronously update some counter.
>
> I think I'd lean towards global time so that one could use it to resolve
> conflicts between changes to multiple copies of the database.

I, too, would prefer to use time.  However, I'm doubtful it would help
resolve conflicts.  On the plus side, I'm not sure it is even needed to
resolve conflicts.  My mail synchronizer has an algorithm for resolving
conflicts that always works without human intervention and in my limited
experience does exactly what I want:

   * If there's a conflict between two replicas, ensure that each
     maildir ends up with the maximum of the number of copies of the
     message in each of the two databases being reconciled.  [Example:
     If replica A deletes a message and replica B moves it from folder
     INBOX to folder SPAM, you end up with a copy in SPAM.  If replica A
     moves a message to folder IMPORTANT and replica B moves it to SPAM,
     then you get two hard links to the same file, one in IMPORTANT and
     one in SPAM.]

   * If there's a conflict and two replicas have different tags on the
     same message, then the tags in notmuch's new.tags directive get
     logically ANDed, while all other tags get logically ORed.
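
The two rules above can be sketched roughly as follows, in Python.
This is one reading of the description, not the synchronizer's actual
code: the function names and the data shapes (tag sets, and per-folder
link counts as dictionaries) are assumptions made for illustration.

```python
def merge_tags(tags_a, tags_b, new_tags):
    """Merge conflicting tag sets from two replicas.

    Tags listed in notmuch's new.tags setting survive only if both
    replicas still have them (logical AND); every other tag survives
    if either replica has it (logical OR).
    """
    a, b, new = set(tags_a), set(tags_b), set(new_tags)
    return ((a & b) & new) | ((a | b) - new)


def merge_link_counts(links_a, links_b):
    """Merge conflicting per-folder copy counts for one message.

    Each argument maps a folder name to the number of copies (hard
    links) of the message in that folder on one replica.  The result
    keeps, per folder, the larger of the two counts and drops folders
    where both replicas have none.
    """
    folders = set(links_a) | set(links_b)
    merged = {f: max(links_a.get(f, 0), links_b.get(f, 0)) for f in folders}
    return {f: n for f, n in merged.items() if n > 0}


if __name__ == "__main__":
    # Replica A read the message (dropped "unread"); replica B tagged it "spam".
    print(merge_tags({"inbox", "work"}, {"inbox", "unread", "spam"},
                     new_tags={"inbox", "unread"}))
    # Replica A deleted the message; replica B moved it from INBOX to SPAM.
    print(merge_link_counts({}, {"SPAM": 1}))
    # Replica A moved it to IMPORTANT; replica B moved it to SPAM.
    print(merge_link_counts({"IMPORTANT": 1}, {"SPAM": 1}))
```

With new.tags set to "inbox" and "unread", a message that was read on
one replica ends up without "unread" everywhere, while a tag added on
only one side (such as "spam") is kept.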

Granted, I've only been using this system for a week.  On the other
hand, all I was doing was starting to test something I had written, yet
it ended up being so much better than my old system that I couldn't go
back and ended up using my system in production far earlier than
anticipated...

>> Making sure the write-operations update the time should be easy.  Most
>> or all of the changes are probably funneled through
>> _notmuch_message_sync.  Worst case, there are only 9 places in the
>> source code that make use of a Xapian::WritableDatabase, so I'm pretty
>> confident total changes wouldn't be much more than 50 lines of code.
>
> Maybe. Don't forget upgrading the database, updating the test suite, and
> presumably some changes to the CLI so the new mtime can actually be
> used. Not to be discouraging ;).

The CLI is trivial.  We'll just add another search keyword ctime
analogous to date.

As far as updating the test suite, etc., it's almost certain that the
core notmuch developers would be unsatisfied with whatever I've done,
since the code base is very clean and has a very uniform style.  So when
I say I'd want some "indication that such a change could be upstreamed,"
I mean more specifically that someone would be willing to shepherd the
process of getting the code into shape.

> In the ensuing time, nothing better has developed for tag
> synchronization (my pet use case) so maybe it's time to pursue this
> again.

I do have something pretty good for tag synchronization.  It requires a
full database scan each time to detect changes, but I've heavily
optimized it to be very fast by skipping over the notmuch library and
directly scanning the underlying Xapian Btrees.  Currently my bottleneck
is indexing messages (e.g., running notmuch new or calling
notmuch_database_add_message), which are painfully slow on 32-bit
machines.  (Unfortunately my mail server is a 32-bit machine.)

To give you an idea, on a 32-bit machine, if I get a handful of new
messages (e.g., 6), running "notmuch new" takes 19 seconds, while
scanning the database to check for renames and changed tags adds another
1.4 seconds.  On a 64-bit machine, "notmuch new" might take 1 second,
while scanning the database adds 350 msec.
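
To put those timings side by side, here is a small Python calculation
of how much of each sync the full scan accounts for.  It uses only the
figures quoted above; the function name is made up for illustration.

```python
def scan_share(index_seconds, scan_seconds):
    """Fraction of the total sync time spent on the full database scan."""
    return scan_seconds / (index_seconds + scan_seconds)


if __name__ == "__main__":
    # (label, seconds for "notmuch new", seconds for the full scan)
    timings = [("32-bit", 19.0, 1.4), ("64-bit", 1.0, 0.35)]
    for label, index_s, scan_s in timings:
        total = index_s + scan_s
        print(f"{label}: scan is {scan_share(index_s, scan_s):.1%} of {total:.2f} s")
```

The scan is roughly 7% of the total on the 32-bit machine and roughly
26% on the 64-bit one, where the whole run is still well under two
seconds.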

So full database scans might not be the end of the world.  The biggest
performance bottleneck at this point is notmuch's painful indexing
performance.  It kills me that it takes 10 minutes to index 100,000 mail
messages on a 16-core machine with 48 GiB of RAM.  But the library is
non-reentrant and allocates thread IDs in such a way that it's hard to
create parallel databases and later merge them.  Basically I can't
figure out how to make productive use of more than one CPU core even
when synchronizing across 1Gb Ethernet!

It's pretty beta, but my intention is to open-source my code, so I'd be
glad to have beta testers if you are interested in testing tag
synchronization.

> It would be good to have some preliminary idea about the time
> and space costs of adding document mtimes.  I guess database bloat
> should not be too bad, since it's only 64bits (?) per mail message.

Plus a Btree to index it, so figure at least 24 bytes per message.
Another issue is that values are always brought into memory with a
document, so it will consume more RAM.  But yeah, I don't think it
should be that bad.
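
As a back-of-the-envelope check in Python, using the "at least 24 bytes
per message" figure and the corpus sizes mentioned on the list (100,000
and a quarter-million messages).  This is a lower bound under that
assumption, not a measurement; the names are hypothetical.

```python
BYTES_PER_MESSAGE = 24  # lower-bound estimate quoted above, not measured


def mtime_overhead_bytes(message_count, bytes_per_message=BYTES_PER_MESSAGE):
    """Lower bound on the extra database space for per-message mtimes."""
    return message_count * bytes_per_message


if __name__ == "__main__":
    for count in (100_000, 250_000):
        mib = mtime_overhead_bytes(count) / 2**20
        print(f"{count:>7} messages: at least {mib:.1f} MiB")
```

That works out to at least about 2.3 MiB and 5.7 MiB respectively.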

David


Re: Synchronization success stories?

2014-04-11 Thread David Mazieres
David Bremner  writes:

> Brian Sniffen  writes:
>
>> I'm thrilled by using notmuch to manage my mail.  Low-latency search is
>> very important to me.  But I use computers in a couple of
>> places---several of which are laptops.  Has anyone stories to share of
>> successful multi-computer notmuch sync, for a corpus of a
>> quarter-million messages or so?  
>
> I use syncmaildir to sync the actual messages, and a copy of the output
> of "notmuch dump" in git to sync the metadata.
>
> It works OK. A bit slow; depends how often you need to fetch new mail.

If you want to see my solution, it is here:

http://www.scs.stanford.edu/~dm/muchsync-0.tar.gz

I'm a little embarrassed by this code, as I just started to test it a
week ago then instantly became completely dependent on it.  I will
probably change the name (from muchsync to syncmuch) and the database
format before releasing.  But if you feel like beta-testing and giving
me feedback, have a look.

Beware that if you have been using notmuch dump, you may become
instantly hooked on my solution...

David

