[PATCH] News for emacs saved-searches change.

2014-04-11 Thread Tomi Ollila
On Wed, Apr 09 2014, Mark Walters  wrote:

> ---
> The important point is that the changed search variable is not forward
> compatible (it *is* backwards compatible): that is, previous versions of
> notmuch-emacs will be unusable with a new-style notmuch-saved-searches
> variable.

the above part could be before '---' so that it is added to the commit
message, too. 

>
> Best wishes
>
> Mark
>
>
>
>  NEWS |   17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/NEWS b/NEWS
> index d4f4ea4..8aa4182 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -15,6 +15,23 @@ Command-Line Interface
>  Emacs Interface
>  ---
>  
> +Changed format for saved searches.
> +
> +  The format for `notmuch-saved-searches` has changed, but old style
> +  saved searches are still supported. The new style means that a saved
> +  search can store the desired sort order for the search, and it can
> +  store a separate query to use for generating the count notmuch
> +  shows.
> +
> +  The variable is fully customizable and any configuration done
> +  through customize should `just work', with the additional options

I'm afraid the `just work' works badly when contained in a markdown page
(perhaps *just work*?).

Tomi


> +  mentioned above. For manual customization see the documentation for
> +  `notmuch-saved-searches`.
> +
> +  IMPORTANT: a new style notmuch-saved-searches variable will break
> +  previous versions of notmuch-emacs (even search will not work); to
> +  fix remove the customization for notmuch-saved-searches.
> +
>  Bug fix for saved searches with newlines in them.
>  
>    Split lines confuse `notmuch count --batch`, so we remove embedded
> -- 
> 1.7.10.4


[PATCH] News for emacs saved-searches change.

2014-04-11 Thread David Bremner
Tomi Ollila  writes:
>
> I'm afraid the `just work' works badly when contained in a markdown page
> (perhaps *just work*?).
>

sorry missed that. Care to fixup my mess? ;)

d


notmuch-hello buffer slow due to slow query

2014-04-11 Thread Nils Dagsson Moskopp
Hello,

If notmuch-hello includes a saved search with a slow query, switching to
a notmuch-hello buffer is very slow due to notmuch-mode updating counts
for search results. mjw1009 suggested "(setq notmuch-hello-auto-refresh
nil)", which stops the counting and works around the problem.

Fundamentally, the problem is a slow query. On my laptop (Thinkpad T60),
many things are pretty much instant, even though I have an HDD, not an SSD:

> ; time notmuch count 'tag:inbox and tag:list'
> 25452
> 0.02user 0.00system 0:00.03elapsed 72%CPU (0avgtext+0avgdata 3852maxresident)k
> 0inputs+0outputs (0major+1135minor)pagefaults 0swaps

However, from-queries take their time:

> ; time notmuch count 'not tag:replied and to:nils@dieweltistgarnichtso.net'
> 5328
> 0.10user 0.15system 0:14.14elapsed 1%CPU (0avgtext+0avgdata 3472maxresident)k
> 157544inputs+0outputs (0major+1039minor)pagefaults 0swaps

mjw1009 can reproduce if the from-query contains an "@" and thinks the
problem may be "something deeper down in notmuch (actually probably in
xapian)".


Greetings,
-- 
Nils Dagsson Moskopp // erlehmann
<http://dieweltistgarnichtso.net>


Synchronization success stories?

2014-04-11 Thread David Mazieres
David Bremner  writes:

> Brian Sniffen  writes:
>
>> I'm thrilled by using notmuch to manage my mail.  Low-latency search is
>> very important to me.  But I use computers in a couple of
>> places---several of which are laptops.  Has anyone stories to share of
>> successful multi-computer notmuch sync, for a corpus of a
>> quarter-million messages or so?  
>
> I use syncmaildir to sync the actual messages, and a copy of the output
> of "notmuch dump" in git to sync the metadata.
>
> It works OK. A bit slow; depends how often you need to fetch new mail.

If you want to see my solution, it is here:

http://www.scs.stanford.edu/~dm/muchsync-0.tar.gz

I'm a little embarrassed by this code, as I just started to test it a
week ago then instantly became completely dependent on it.  I will
probably change the name (from muchsync to syncmuch) and the database
format before releasing.  But if you feel like beta-testing and giving
me feedback, have a look.

Beware that if you have been using notmuch dump, you may become
instantly hooked on my solution...

David


[PATCH v2 0/5] emacs: hello: convert saved-searches to plists

2014-04-11 Thread David Bremner
Mark Walters  writes:

> This is v2 of the series; v1 is at
> id:1396733065-32602-1-git-send-email-markwalters1009@gmail.com

pushed,

d


[PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-11 Thread dm-list-email-notm...@scs.stanford.edu
David Bremner  writes:

>> Exactly.  It could be a tick, or just the current time of day if your
>> clock does not go backwards.  (I'd be willing to do a full scan if the
>> clock ever goes backwards.)  The advantage of time is that you don't
>> have to synchronously update some counter.
>
> I think I'd lean towards global time so that one could use it to resolve
> conflicts between changes to multiple copies of the database.

I, too, would prefer to use time.  However, I'm doubtful it would help
resolve conflicts.  On the plus side, I'm not sure it is even needed to
resolve conflicts.  My mail synchronizer has an algorithm for resolving
conflicts that always works without human intervention and in my limited
experience does exactly what I want:

   * If there's a conflict between two replicas, ensure that each
     maildir ends up with the maximum of the number of copies of the
     message in each of the two databases being reconciled.  [Example:
     If replica A deletes a message and replica B moves it from folder
     INBOX to folder SPAM, you end up with a copy in SPAM.  If replica A
     moves a message to folder IMPORTANT and replica B moves it to SPAM,
     then you get two hard links to the same file, one in IMPORTANT and
     one in SPAM.]

   * If there's a conflict and two replicas have different tags on the
     same message, then the tags in notmuch's new.tags directive get
     logically ANDed, while all other tags get logically ORed.
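
The two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the synchronizer's actual code: the function names are hypothetical, and reading the copy-count rule as a per-folder maximum is an interpretation chosen because it reproduces both examples given in the first rule.

```python
def merge_tags(tags_a, tags_b, new_tags):
    """Reconcile the tag sets two replicas hold for the same message.

    Tags listed in new_tags (notmuch's new.tags setting) survive only if
    both replicas still have them (logical AND); every other tag survives
    if either replica has it (logical OR).
    """
    a, b, new = set(tags_a), set(tags_b), set(new_tags)
    anded = a & b & new      # new.tags members kept only when both agree
    ored = (a | b) - new     # all other tags kept when either side has them
    return anded | ored


def merge_copy_counts(copies_a, copies_b):
    """Reconcile per-folder copy counts by taking the maximum per folder.

    copies_a and copies_b map a folder name to the number of copies of one
    message that the replica holds in that folder (assumed representation).
    """
    merged = {}
    for folder in set(copies_a) | set(copies_b):
        count = max(copies_a.get(folder, 0), copies_b.get(folder, 0))
        if count > 0:
            merged[folder] = count
    return merged
```

With new.tags set to "unread;inbox", a message that replica A has marked read while replica B has tagged "list" ends up read and tagged "list", since "unread" must be present on both sides to survive.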

Granted, I've only been using this system for a week.  On the other
hand, all I was doing was starting to test something I had written, yet
it ended up being so much better than my old system that I couldn't go
back and ended up using my system in production far earlier than
anticipated...

>> Making sure the write-operations update the time should be easy.  Most
>> or all of the changes are probably funneled through
>> _notmuch_message_sync.  Worst case, there are only 9 places in the
>> source code that make use of a Xapian::WritableDatabase, so I'm pretty
>> confident total changes wouldn't be much more than 50 lines of code.
>
> Maybe. Don't forget upgrading the database, updating the test suite, and
> presumably some changes to the CLI so the new mtime can actually be
> used. Not to be discouraging ;).

The CLI is trivial.  We'll just add another search keyword ctime
analogous to date.

As far as updating the test suite, etc., it's almost certain that the
core notmuch developers would be unsatisfied with whatever I've done,
since the code base is very clean and has a very uniform style.  So when
I say I'd want some "indication that such a change could be upstreamed,"
I mean more specifically that someone would be willing to shepherd the
process of getting the code into shape.

> In the ensuing time, nothing better has developed for tag
> synchronization (my pet use case) so maybe it's time to pursue this
> again.

I do have something pretty good for tag synchronization.  It requires a
full database scan each time to detect changes, but I've heavily
optimized it to be very fast by skipping over the notmuch library and
directly scanning the underlying Xapian Btrees.  Currently my bottleneck
is indexing messages (e.g., running notmuch new or calling
notmuch_database_add_message), which are painfully slow on 32-bit
machines.  (Unfortunately my mail server is a 32-bit machine.)

To give you an idea, on a 32-bit machine, if I get a handful of new mail
(e.g., 6 messages), running "notmuch new" takes 19 seconds, while
scanning the database to check for renames and changed tags adds another
1.4 seconds.  On a 64-bit machine, "notmuch new" might take 1 second,
while scanning the database adds 350 msec.

So full database scans might not be the end of the world.  The biggest
performance bottleneck at this point is notmuch's painful indexing
performance.  It kills me that it takes 10 minutes to index 100,000 mail
messages on a 16-core machine with 48 GiB of RAM.  But the library is
non-reentrant and allocates thread IDs in such a way that it's hard to
create parallel databases and later merge them.  Basically I can't
figure out how to make productive use of more than one CPU core even
when synchronizing across gigabit Ethernet!

It's pretty beta, but my intention is to open-source my code, so I'd be
glad to have beta testers if you are interested in testing tag
synchronization.

> It would be good to have some preliminary idea about the time
> and space costs of adding document mtimes.  I guess database bloat
> should not be too bad, since it's only 64bits (?) per mail message.

Plus a Btree to index it, so figure at least 24 bytes per message.
Another issue is that values are always brought into memory with a
document, so it will consume more RAM.  But yeah, I don't think it
should be that bad.
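
As a rough worked example of that estimate, the snippet below multiplies the 24-byte figure by the mailbox sizes mentioned in this thread (100,000 and a quarter-million messages). The function name is hypothetical, and the result is only a lower bound: it ignores Btree page overhead and anything else Xapian stores.

```python
BYTES_PER_MESSAGE = 24  # "at least 24 bytes per message", from the estimate above


def mtime_overhead_bytes(message_count, bytes_per_message=BYTES_PER_MESSAGE):
    """Lower-bound estimate of the extra space for a per-message mtime."""
    return message_count * bytes_per_message


for count in (100_000, 250_000):
    mib = mtime_overhead_bytes(count) / 2**20
    print(f"{count:>7} messages: at least {mib:.1f} MiB")
```

Even for a quarter-million messages this stays in the single-digit megabyte range.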

David


Synchronization success stories?

2014-04-11 Thread David Bremner
Brian Sniffen  writes:

> I'm thrilled by using notmuch to manage my mail.  Low-latency search is
> very important to me.  But I use computers in a couple of
> places---several of which are laptops.  Has anyone stories to share of
> successful multi-computer notmuch sync, for a corpus of a
> quarter-million messages or so?  

I use syncmaildir to sync the actual messages, and a copy of the output
of "notmuch dump" in git to sync the metadata.

It works OK. A bit slow; depends how often you need to fetch new mail.
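
The metadata half of this setup could be scripted roughly as follows. This is a sketch under assumptions, not the script actually in use: the file name, commit message and the injectable `run` parameter are hypothetical, the directory is assumed to be an existing git repository, and syncing the messages themselves (syncmaildir) is left out.

```python
import subprocess
from pathlib import Path


def snapshot_tags(repo_dir, dump_name="notmuch-tags.dump", run=subprocess.run):
    """Write `notmuch dump` output into a git working tree and commit it.

    repo_dir must already be a git repository. `run` is injectable so the
    function can be exercised without notmuch or git installed.
    Returns True if a commit was made, False if the dump was unchanged.
    """
    repo = Path(repo_dir)
    # `notmuch dump` prints the tag metadata to standard output.
    dump = run(["notmuch", "dump"], check=True, capture_output=True, text=True)
    (repo / dump_name).write_text(dump.stdout)
    run(["git", "add", dump_name], cwd=repo, check=True)
    # `git diff --cached --quiet` exits non-zero only when something is staged.
    staged = run(["git", "diff", "--cached", "--quiet"], cwd=repo)
    if staged.returncode != 0:
        run(["git", "commit", "-m", "Update notmuch tag dump"], cwd=repo, check=True)
        return True
    return False
```

On the receiving machine the counterpart would be pulling the repository and feeding the file to `notmuch restore`.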

d


[PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-11 Thread David Bremner
dm-list-email-notmuch at scs.stanford.edu writes:

> Gaute Hope  writes:

> Exactly.  It could be a tick, or just the current time of day if your
> clock does not go backwards.  (I'd be willing to do a full scan if the
> clock ever goes backwards.)  The advantage of time is that you don't
> have to synchronously update some counter.

I think I'd lean towards global time so that one could use it to resolve
conflicts between changes to multiple copies of the database.

> Making sure the write-operations update the time should be easy.  Most
> or all of the changes are probably funneled through
> _notmuch_message_sync.  Worst case, there are only 9 places in the
> source code that make use of a Xapian::WritableDatabase, so I'm pretty
> confident total changes wouldn't be much more than 50 lines of code.

Maybe. Don't forget upgrading the database, updating the test suite, and
presumably some changes to the CLI so the new mtime can actually be
used. Not to be discouraging ;).

> I would do it myself if there were any kind of indication that such a
> change could be upstreamed.  I brought this up in January, 2011, and
> didn't get a huge amount of interest in the ctime idea.  But I was also
> a lot less focused on what I needed.  Now that I have a working
> distributed setup and am actually using notmuch for my mail, I have a
> much better understanding of what is needed.

In the ensuing time, nothing better has developed for tag
synchronization (my pet use case) so maybe it's time to pursue this
again.  It would be good to have some preliminary idea about the time
and space costs of adding document mtimes.  I guess database bloat
should not be too bad, since it's only 64bits (?) per mail message.


[PATCH v4 2/3] emacs: add notmuch-version.el.tmpl and create notmuch-version.el from it

2014-04-11 Thread David Bremner
Tomi Ollila  writes:

> The notmuch cli program and emacs lisp versions may differ (especially
> in remote usage). It helps to resolve problems if we can determine
> the versions of notmuch cli and notmuch emacs mua separately.
>
> The build process now creates notmuch-version.el from template file
> by filling the version info to notmuch-emacs-version variable.
> ---
>
> Alternative to id:1395261431-24668-2-git-send-email-tomi.ollila@iki.fi
> only change being in notmuch-emacs-version docstring.

pushed this series with the two alternate patches

d


[PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-11 Thread Gaute Hope
Excerpts from dm-list-email-notmuch's message of 2014-04-10 17:31:04 +0200:
> Gaute Hope  writes:
>
> >> A better approach would be to add a new "modtime" xapian value that is
> >> updated whenever the tags or any other terms (such as XFDIRENTRY) are
> >> added to or deleted from a docid.  If it's a Xapian value, rather than a
> >> term, then modtime will be queriable just like date, allowing multiple
> >> applications to query all docids modified since the last time they ran.
> >>
> >> [... snip]
> >
> > This could also solve it, and probably has more uses. I don't quite see
> > how the opposite problem (for my use case) can be solved by this without
> > using a 'localchange' tag. This is for tag-to-maildir sync: when a
> > new tag has been added (e.g. by a user interaction in a client) it needs
> > to be copied to the maildir; if that is not done in the same go, a
> > different application won't know whether the change was local or remote.
> > How did you solve this?
>
> Why don't you just set maildir.synchronize_flags=true?  When I
> synchronize mail across machines, I start by concurrently running
> "notmuch new" on both the local and remote machines, which picks up all
> the changed maildir flags.  Then I synchronize the mail and the tags
> between the two maildirs.  If maildir.synchronize_flags=true, then atomically
> with setting the new tags I call notmuch_message_tags_to_maildir_flags()
> to sync the new tags to the maildir.

I am talking about syncing tags to a maildir _folder_, not flags. It
could be implemented the way maildir.synchronize_flags is now, but it
would be a larger feature which could work in a lot of different ways.

> The maildir flags question seems kind of independent of what we are
> talking about, which is just having an incremental way of examining the
> database.  Right now, I have to scan everything to find tags that have
> changed since the last synchronization event.  If I had modtime (or
> really it should be called "ctime", like inode change time), then I
> could look at only the few messages that changed, and it would probably
> shave 250msec off polling new mail for a 100,000-message maildir.
>
> Note you can't use the file system ctime/mtime because the file system
> may have changed since the last time you ran notmuch new.

If you have an unreliable clock or a badly configured system, you
risk missing changes. If the application's time stamp is set in the
future and a modtime is set now, the app won't know there has been a
change. The same could happen if the clock is in the past when the
modtime is set and the clock is then corrected: the app won't know
there has been a change.

The only way to know is to do a full scan of the entire db. This could
be very expensive, and comparable to initial indexing, for some actions.

You would not necessarily, or reliably, be able to detect this.

With an internal tick this wouldn't be an issue.
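
A minimal sketch of such an internal tick, in Python: every write bumps a counter that never goes backwards, and a client remembers the last tick it saw rather than a wall-clock time. The class and method names are hypothetical; nothing here is notmuch or Xapian API.

```python
class TickIndex:
    """Toy index that stamps every write with a monotonically increasing tick."""

    def __init__(self):
        self.tick = 0          # global revision counter, never goes backwards
        self.tags = {}         # message id -> set of tags
        self.changed_at = {}   # message id -> tick of the last modification

    def _touch(self, msg_id):
        # Every write path must call this, or changes become invisible.
        self.tick += 1
        self.changed_at[msg_id] = self.tick

    def add_tag(self, msg_id, tag):
        self.tags.setdefault(msg_id, set()).add(tag)
        self._touch(msg_id)

    def remove_tag(self, msg_id, tag):
        self.tags.setdefault(msg_id, set()).discard(tag)
        self._touch(msg_id)

    def changed_since(self, last_seen_tick):
        """Return ids modified after last_seen_tick, oldest change first."""
        hits = [m for m, t in self.changed_at.items() if t > last_seen_tick]
        return sorted(hits, key=self.changed_at.get)
```

A client that stores `index.tick` after each scan and later calls `changed_since()` with that value sees exactly the messages written in between, regardless of what the system clock did in the meantime.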

> > I would suggest using a Xapian- or Index-time which gets a tick
> > every time a modification is made to the index.
>
> Exactly.  It could be a tick, or just the current time of day if your
> clock does not go backwards.  (I'd be willing to do a full scan if the
> clock ever goes backwards.)  The advantage of time is that you don't
> have to synchronously update some counter.
>
> > Atomic operations could operate on the same time in case this
> > distinction turns out to be useful. Perhaps something like this
> > already exists in Xapian?
>
> I don't think it's important for atomic operations to have the same
> timestamp.  All that's important is that you be able to diff the
> database against the last time you scanned it.

Yeah, it is not necessary for anything I am planning on doing, but it
would be a way for other apps to know that a set of changes were done at
the same time.

> > This way clock skew, clock resolution (lots of operations happening in
> > the same second, msec or nanosec) problems won't be an issue. The crux
> > will be to make sure all write-operations trigger a tick on the
> > indextime.
>
> Clock skew is not really an issue.  It takes years to amass hundreds of
> thousands of email messages.  So adding 5 minutes of slop is not a big
> deal--you'll just scan a few messages needlessly.

Yes, but you risk missing changes without knowing. That is an issue for
my use case.


> Making sure the write-operations update the time should be easy.  Most
> or all of the changes are probably funneled through
> _notmuch_message_sync.  Worst case, there are only 9 places in the
> source code that make use of a Xapian::WritableDatabase, so I'm pretty
> confident total changes wouldn't be much more than 50 lines of code.

Yes :)

> I would do it myself if there were any kind of indication that such a
> change could be upstreamed.  I brought this up in January, 2011, and
> didn't get a huge amount of interest in the ctime idea.  But I was also
> a lot less focused on what I needed.  Now that I have a working
> distributed setup and am 

Synchronization success stories?

2014-04-11 Thread Brian Sniffen
I'm thrilled by using notmuch to manage my mail.  Low-latency search is
very important to me.  But I use computers in a couple of
places---several of which are laptops.  Has anyone stories to share of
successful multi-computer notmuch sync, for a corpus of a
quarter-million messages or so?  

I've tried offlineimap---it (and my Exchange server) get grouchy with
mailboxes of that size.  I tried keeping ~/Maildir/ in Google Drive; it
took weeks to do the initial sync and I gave up.

I'm trying bittorrent-sync now, with no obvious failures.

-Brian

-- 
Brian Sniffen
Information Security
Akamai Technologies
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH] News for emacs saved-searches change.

2014-04-11 Thread David Bremner
Mark Walters markwalters1...@gmail.com writes:

> ---
> The important point is that the changed search variable is not forward
> compatible (it *is* backwards compatible): that is, previous versions of
> notmuch-emacs will be unusable with a new-style notmuch-saved-searches
> variable.

pushed, with that paragraph as commit message

