Tag timestamps and synchronization

2011-01-24 Thread dm-list-email-notm...@scs.stanford.edu
One of the features I would like to see from notmuch is an easier
ability to synchronize tags across machines.  At the very least, I
would need either incremental dump and restore, or some way to
communicate arbitrary tags to a local imap server that shares
notmuch's maildir (much as notmuch currently syncs the standard tags),
so that I synchronize two maildirs with a tool like offlineimap.

As Carl pointed out to me in private email, there has been some
previous discussion in the following thread:

notmuch show id:87hbfnmiux.fsf@yoom.home.cworth.org

Based on that thread, there seems to be some desire for notmuch to
keep track of a per-message timestamp when the flags were last
updated.  This would allow much easier expiration for people who want
the deleted tag.  It would also allow incremental dump and restore of
tags, which is exactly what I need to sync tags across servers with
reasonable amounts of bandwidth.

Metadata timestamps are one of those things that probably have a lot
of different applications, so since Carl is considering a new database
format for the next release anyway, perhaps it also makes sense to add
a metadata change time for each message.

The timestamp would be included in "dump" output, and you could
request a dump of changes since a particular time.  On restore, you
might have several options:

  - overwrite: always set the new tags and timestamp in the database
to the value in the restore data.

  - update: always set the tags, but update the timestamp to the current time.

  - conditional T: update only if the message metadata has not been
updated since time T.

To sync flags, you then just need to keep track of the last time you
synced with a particular server--call this time T.  Do a dump since
time T, upload to server, do a conditional restore for time T on
server.  Finally do a partial dump from time T on the server and an
overwrite import on the client.  (This policy makes changes on the
server always override conflicting ones on the client--perhaps people
want other policies, like union of the tags, etc.)
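The dump/restore protocol above can be sketched in a few lines.  This is
only a toy model under stated assumptions: TagStore, dump_since, restore
and sync are hypothetical names, not notmuch interfaces, and I assume a
conditional restore stamps the message with the current time, which the
proposal does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class TagStore:
    """Toy stand-in for a notmuch database with per-message change times."""
    tags: dict = field(default_factory=dict)    # message-id -> set of tags
    ctime: dict = field(default_factory=dict)   # message-id -> last change time

    def set_tags(self, msgid, tags, when):
        self.tags[msgid] = set(tags)
        self.ctime[msgid] = when

    def dump_since(self, t):
        """Return {msgid: (tags, ctime)} for messages changed at or after t."""
        return {m: (set(self.tags[m]), c)
                for m, c in self.ctime.items() if c >= t}

    def restore(self, dump, mode, now=None, t=None):
        for msgid, (tags, ctime) in dump.items():
            if mode == "overwrite":
                # Take both tags and timestamp from the restore data.
                self.set_tags(msgid, tags, ctime)
            elif mode == "update":
                # Take the tags, but stamp them with the current time.
                self.set_tags(msgid, tags, now)
            elif mode == "conditional":
                # Apply only if this message has not changed here since t.
                if self.ctime.get(msgid, float("-inf")) < t:
                    self.set_tags(msgid, tags, now)
            else:
                raise ValueError(f"unknown restore mode: {mode}")


def sync(client, server, t, now):
    """Server-wins sync: conditional restore on server, overwrite on client."""
    server.restore(client.dump_since(t), "conditional", now=now, t=t)
    client.restore(server.dump_since(t), "overwrite")
```

With this policy a message changed on both sides since T keeps the
server's tags, while a message changed only on the client propagates to
the server.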

Second, there seems to be some desire in that thread to sync with IMAP
flags.  This would be particularly great, but the easiest way to do it
is probably *not* to try to implement IMAP, but rather to use an
existing IMAP server and just modify the maildir so that the IMAP
server will pick up the flags.

In the case of dovecot, the arbitrary tag format is very simple.  Each
maildir has a file called dovecot-keywords mapping numbers 0, 1,
... to keywords.  Then mail file names contain lower-case letters for
the flags they are marked with--0 => a, 1 => b, etc.--allowing up to
26 arbitrary tags for each maildir.  One could probably sync to
dovecot's maildir format relatively easily in a script given
incremental dump and restore of tags.  Or possibly notmuch could
natively support dovecot as one of multiple back-end tag storage

Having a static tag mapping in the .notmuch-config file would be much
better than hard-coding flag2tag.  However, I'm not sure it's
sufficient.  The reason is that if you ever completely delete a tag
(e.g., you have "todo" and "meeting" tags and periodically have no
messages in either category in a given mail folder), then an IMAP
server like dovecot might end up re-allocating the letters
corresponding to those tags in a different order.  Also, at least for
dovecot, the flag mappings are per-folder, which you kind of want
since you are limited to 26 non-standard tags, so global values might
not work.

I'm curious to hear people's thoughts/reactions?


Tag timestamps and synchronization

2011-01-24 Thread dm-list-email-notm...@scs.stanford.edu
At Tue, 25 Jan 2011 10:08:12 +1030,
Tim Stoakes wrote:
> I do something like this by using some shell scripts with formail, to
> 'store' notmuch tags into the X-Label headers of the individual mails.
> Offlineimap then syncs these headers. If I need the tags to become
> notmuch-ified on the target, I just scan all the mail's X-Label headers.

How well does offlineimap work when you modify the contents of
messages?  This doesn't change the message UIDs, does it?  Are you
syncing between two imapd instances, or one imapd and a maildir?  (I
currently run a local imap server as well, because it seems to be a
lot faster.)

How does the imap server even detect that the message contents have
been modified?  Does it have to stat 300,000 files every time you
check your email?

In my setup, I regularly check email from three or four different
machines, so the syncing is not just for backup purposes.  When I
switch between computers all my label changes need to be visible.

> I'm happy to share this hack glue if it would help.

Yes, I'd be glad to take a look at your scripts.


What do people use for calendar invites?

2015-04-16 Thread dm-list-email-notm...@scs.stanford.edu
I've been running into this issue lately where I agree to meet people
and we say it's confirmed, but if I don't send them a calendar invite of
mime type text/calendar, then it's as if we never agreed and they don't
show up.  I get, "Oh, you never sent me a calendar invite so it wasn't
in my calendar."

I'm wondering if others have this problem and have figured the easiest
way to integrate notmuch with some kind of calendaring software that
generates ics or text/calendar attachments.


Enabling and disabling maildir.synchronize_flags

2015-07-01 Thread dm-list-email-notm...@scs.stanford.edu
Sorry if this question is answered somewhere, but I'm wondering:  What
is the best way to enable and disable maildir.synchronize_flags?

It seems that disabling it should simply be safe.  But re-enabling, one
risks losing tags, as the next notmuch new will cause old maildir flags
to override the xapian database.  So that suggests something like:

   notmuch dump > backup
   notmuch config set maildir.synchronize_flags false
   # Do I need to run notmuch new here?
   notmuch restore < backup

Is that safe?  The man page suggests one additionally needs to run
notmuch new before running notmuch restore.  All of this is pretty slow.
Is there a more efficient way?

A one or two sentence clarification in the notmuch-config man page might
be helpful to people contemplating playing with this switch.  The
default is on, so I suspect it costs a lot of performance.  I've been
afraid to turn it off for fear that I won't be able to undo this


[PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-10 Thread dm-list-email-notm...@scs.stanford.edu
Gaute Hope  writes:

>> A better approach would be to add a new "modtime" xapian value that is
>> updated whenever the tags or any other terms (such as XFDIRENTRY) are
>> added to or deleted from a docid.  If it's a Xapian value, rather than a
>> term, then modtime will be queriable just like date, allowing multiple
>> applications to query all docids modified since the last time they ran.
>> [... snip]
> This could also solve it, and probably have more uses. I don't quite see
> how the opposite problem (for my use case) can be solved by this without
> using a 'localchange' tag. This is to sync tag to maildir sync, when a
> new tag has been added (by e.g. a user interaction in a client) it needs
> to be copied to the maildir, if it is not done in the same go a
> different application won't know whether the change was local or remote.
> How did you solve this?

Why don't you just set maildir.synchronize_flags=true?  When I
synchronize mail across machines, I start by concurrently running
"notmuch new" on both the local and remote machines, which picks up all
the changed maildir flags.  Then I synchronize the mail and the tags
between the two maildirs.  If maildir.synchronize_flags=true, then
atomically with setting the new tags I call
notmuch_message_tags_to_maildir_flags() to sync the new tags to the
maildir.

The maildir flags question seems kind of independent of what we are
talking about, which is just having an incremental way of examining the
database.  Right now, I have to scan everything to find tags that have
changed since the last synchronization event.  If I had modtime (or
really it should be called "ctime", like inode change time), then I
could look at only the few messages that changed, and it would probably
shave 250msec off polling new mail for a 100,000-message maildir.

Note you can't use the file system ctime/mtime because the file system
may have changed since the last time you ran notmuch new.

> I would suggest using a Xapian- or Index-time which gets a tick
> everytime a modification is made to the index.

Exactly.  It could be a tick, or just the current time of day if your
clock does not go backwards.  (I'd be willing to do a full scan if the
clock ever goes backwards.)  The advantage of time is that you don't
have to synchronously update some counter.

> Atomic operations could operate on the same time in case this
> distinction turns out to be useful. Perhaps something like this
> already exists in Xapian?

I don't think it's important for atomic operations to have the same
timestamp.  All that's important is that you be able to diff the
database against its state the last time you scanned it.

> This way clock skew, clock resolution (lots of operations happening in
> the same second, msec or nanosec) problems won't be an issue. The crux
> will be to make sure all write-operations trigger a tick on the
> indextime.

Clock skew is not really an issue.  It takes years to amass hundreds of
thousands of email messages.  So adding 5 minutes of slop is not a big
deal--you'll just scan a few messages needlessly.
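As a rough illustration of the slop idea, assuming each message carried
a ctime (which notmuch does not currently store), an incremental scan
could pick its candidates like this.  The function name and the
five-minute constant are placeholders of my own.

```python
SLOP_SECONDS = 5 * 60  # tolerate up to five minutes of clock skew


def messages_to_examine(ctimes, last_sync, now):
    """Return the message-ids worth examining since the last sync.

    ctimes maps message-id -> change time.  If the clock appears to have
    gone backwards, fall back to a full scan of every message.
    """
    if now < last_sync:
        return set(ctimes)
    cutoff = last_sync - SLOP_SECONDS
    return {m for m, c in ctimes.items() if c >= cutoff}
```

The only cost of the slop is re-examining a few messages that changed
shortly before the previous sync.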

Making sure the write-operations update the time should be easy.  Most
or all of the changes are probably funneled through
_notmuch_message_sync.  Worst case, there are only 9 places in the
source code that make use of a Xapian::WritableDatabase, so I'm pretty
confident total changes wouldn't be much more than 50 lines of code.

I would do it myself if there were any kind of indication that such a
change could be upstreamed.  I brought this up in January, 2011, and
didn't get a huge amount of interest in the ctime idea.  But I was also
a lot less focused on what I needed.  Now that I have a working
distributed setup and am actually using notmuch for my mail, I have a
much better understanding of what is needed.


[PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-11 Thread dm-list-email-notm...@scs.stanford.edu
David Bremner  writes:

>> Exactly.  It could be a tick, or just the current time of day if your
>> clock does not go backwards.  (I'd be willing to do a full scan if the
>> clock ever goes backwards.)  The advantage of time is that you don't
>> have to synchronously update some counter.
> I think I'd lean towards global time so that one could use it to resolve
> conflicts between changes to multiple copies of the database.

I, too, would prefer to use time.  However, I'm doubtful it would help
resolve conflicts.  On the plus side, I'm not sure it is even needed to
resolve conflicts.  My mail synchronizer has an algorithm for resolving
conflicts that always works without human intervention and in my limited
experience does exactly what I want:

   * If there's a conflict between two replicas, ensure that each
     maildir ends up with the maximum of the number of copies of the
     message in each of the two databases being reconciled.  [Example:
     If replica A deletes a message and replica B moves it from folder
     INBOX to folder SPAM, you end up with a copy in SPAM.  If replica A
     moves a message to folder IMPORTANT and replica B moves it to SPAM,
     then you get two hard links to the same file, one in IMPORTANT and
     one in SPAM.]

   * If there's a conflict and two replicas have different tags on the
     same message, then the tags in notmuch's new.tags directive get
     logically ANDed, while all other tags get logically ORed.

Granted, I've only been using this system for a week.  On the other
hand, all I was doing was starting to test something I had written, yet
it ended up being so much better than my old system that I couldn't go
back and ended up using my system in production far earlier than

>> Making sure the write-operations update the time should be easy.  Most
>> or all of the changes are probably funneled through
>> _notmuch_message_sync.  Worst case, there are only 9 places in the
>> source code that make use of a Xapian::WritableDatabase, so I'm pretty
>> confident total changes wouldn't be much more than 50 lines of code.
> Maybe. Don't forget upgrading the database, updating the test suite, and
> presumably some changes to the CLI so the new mtime can actually be
> used. Not to be discouraging ;).

The CLI is trivial.  We'll just add another search keyword ctime
analogous to date.

As far as updating the test suite, etc., it's almost certain that the
core notmuch developers would be unsatisfied with whatever I've done,
since the code base is very clean and has a very uniform style.  So when
I say I'd want some "indication that such a change could be upstreamed,"
I mean more specifically that someone would be willing to shepherd the
process of getting the code into shape.

> In the ensuing time, nothing better has developed for tag
> synchronization (my pet use case) so maybe it's time to pursue this
> again.

I do have something pretty good for tag synchronization.  It requires a
full database scan each time to detect changes, but I've heavily
optimized it to be very fast by skipping over the notmuch library and
directly scanning the underlying Xapian Btrees.  Currently my bottleneck
is indexing messages (e.g., running notmuch new or calling
notmuch_database_add_message), which are painfully slow on 32-bit
machines.  (Unfortunately my mail server is a 32-bit machine.)

To give you an idea, on a 32-bit machine, if I get a handful of new mail
(e.g., 6 messages), running "notmuch new" takes 19 seconds, while
scanning the database to check for renames and changed tags adds another
1.4 seconds.  On a 64-bit machine, "notmuch new" might take 1 second,
while scanning the database adds 350 msec.

So full database scans might not be the end of the world.  The biggest
performance bottleneck at this point is notmuch's painful indexing
performance.  It kills me that it takes 10 minutes to index 100,000 mail
messages on a 16-core machine with 48 GiB of RAM.  But the library is
non-reentrant and allocates thread IDs in such a way that it's hard to
create parallel databases and later merge them.  Basically I can't
figure out how to make productive use of more than one CPU core even
when synchronizing across 1GB Ethernet!

It's pretty beta, but my intention is to open-source my code, so I'd be
glad to have beta testers if you are interested in testing tag
synchronization.

> It would be good to have some preliminary idea about the time
> and space costs of adding document mtimes.  I guess database bloat
> should not be too bad, since it's only 64bits (?) per mail message.

Plus a Btree to index it, so figure at least 24 bytes per message.
Another issue is that values are always brought into memory with a
document, so it will consume more RAM.  But yeah, I don't think it
should be that bad.


Synchronization success stories?

2014-04-13 Thread dm-list-email-notm...@scs.stanford.edu
Tilmann Singer  writes:

> David Mazieres  writes:
>> What happens if you get a message that's been stuck in a queue for a few
>> days and has an old Date: header?
> It would be missed.  I have set the timespan to look backwards for new
> mail to one month to be a bit safer against the stuck-in-queue cases,
> but mails with older Date: headers would definitely get missed.
> The current output of notmuch count "*" is the same on both the client
> and the server, so it seems I didn't run into this problem yet (maybe I
> was just lucky).

I've been playing around with reorganizing my maildir, and found a
couple of messages (on mailing lists) with clearly invalid dates years
in the past.  But checking with notmuch count is a good idea.  Then you
can always fall back to the slow path in the unlikely event that your
counts don't match up.  Well, except that A) count is just unique
message-IDs, not messages, and B) when synchronizing in both directions
you could still miss something.  You have to assume that the invalid
dates are only ever going to occur at one end of a synchronization

>> Or if you get new messages that have
>> the same Message-ID as old ones?
> Is that even possible?  I thought that notmuch guarantees the uniqueness
> of indexed message ids.  The only reference I could find without trying
> to read the code was this thread id:87mwyz3s9d.fsf at star.eba from 2012,
> which supports the assumption.

Sadly, yes it is quite possible, and even opens up a slight security
issue.  Suppose I know you are on a mailing list, and some message
appears on that mailing list that I don't want you to see.  I can send
you an innocuous-looking message that just happens to have the same
message-id, and you may never see the original mailing list message.
Even better, depending on how your spam filtering is set up, if I include
the GTUBE string in my message you may never see mine or the original.

That's why with muchsync, I replicate actual mail messages, rather than
message-IDs.  Then you can always periodically check for message-IDs
that appear in more than one file.  (In fact, though I haven't
published an interface for this, the SQL database kept by muchsync makes
it trivial to check for this and detect certain attacks.)
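To show the kind of check meant here, the sketch below uses a made-up
table, files(message_id, content_hash, path).  This is not muchsync's
real schema, which is not described in this message; it only
illustrates that the duplicate check is a single GROUP BY query.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database using a hypothetical 'files' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (message_id TEXT, content_hash TEXT, path TEXT)")
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", [
        ("a@x", "h1", "INBOX/cur/1"),
        ("a@x", "h1", "SPAM/cur/1"),     # hard link: same content, no alarm
        ("b@x", "h2", "INBOX/cur/2"),
        ("b@x", "h3", "INBOX/cur/3"),    # same Message-ID, different content
    ])
    return conn


def duplicate_message_ids(conn):
    """Return Message-IDs that appear with more than one distinct content."""
    rows = conn.execute(
        "SELECT message_id FROM files "
        "GROUP BY message_id "
        "HAVING COUNT(DISTINCT content_hash) > 1 "
        "ORDER BY message_id")
    return [row[0] for row in rows]
```

Counting distinct content hashes rather than paths avoids flagging a
message that is merely linked into two folders.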

I understand why notmuch went with message IDs.  For instance you have
sent this reply both directly to me and to a mailing list I am
subscribed to.  So I will get two slightly different copies of the
message (one will have the standard notmuch mailing list signature, the
other won't).  And this way once I've marked it read, the message will
be read even once the second copy comes in.  But personally I'd rather
see the occasional duplicate message than risk not seeing messages.  In
particular, if the goal is to see fewer unread messages, some sort of
feature that pro-actively skips all future messages in a thread or
subthread would be more useful...

> Here is how long they take (on a machine with an SSD, which certainly
> helps):
> $ time notmuch dump --format=batch-tag | sort > /tmp/notmuch.dump
> real    0m3.643s
> user    0m3.593s
> sys     0m0.140s
> $ time notmuch restore < /tmp/notmuch.dump
> real    0m3.719s
> user    0m3.357s
> sys     0m0.357s
> $ notmuch count 
> 117118

That's crazy.  I'm jealous.  Then again, this is how fast muchsync runs
(including a full database scan to detect changed messages and tags)
when there is no new mail:

$ time ./muchsync -v
[notmuch] No new mail.
synchronizing muchsync database with Xapian... 0.038506 (+0.038506)
starting scan of Xapian database... 0.039069 (+0.000563)
opened Xapian... 0.040851 (+0.001782)
scanned message IDs... 0.137647 (+0.096796)
scanned tags... 0.170404 (+0.032757)
scanned directories in xapian... 0.172100 (+0.001696)
scanned filenames in xapian... 0.172376 (+0.000276)
adjusted link counts... 0.199461 (+0.027085)
finished synchronizing muchsync database with Xapian... 0.212965 (+0.013505)

sys 0m0.023s


[PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-23 Thread dm-list-email-notm...@scs.stanford.edu
Austin Clements  writes:

>> A middle ground might be to use the maximum of two values: 1) the
>> time-of-day at which notmuch started executing, and 2) the highest ctime
>> in the database plus 100 microseconds (leaving plenty of slop to store
>> timestamps as IEEE doubles with 52 significant bits).  Since the values
>> will be Btree-indexed, computing the max plus one will be cheap.
> This makes me curious if you've considered how to fit this in to
> Xapian.  The Xapian query syntax supports range queries over document
> "values", but within the Xapian B-tree, values are stored in docid
> order, not value order, so Xapian's range query operator is actually a
> full scan in implementation.  I assume it does this so it doesn't have
> to store both forward and inverse indexes of values.  (I spent some
> time figuring out the layout of the Xapian database and have fairly
> detailed notes if anyone's curious.)

Aside from finding the previous max time, everything else should work
identically to the date query operator and NOTMUCH_VALUE_TIMESTAMP.
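The "middle ground" rule quoted above amounts to a one-line max.  A
minimal sketch, with next_ctime and STEP as illustrative names and the
highest stored ctime passed in by the caller (how it is obtained from
Xapian is left open here):

```python
import time

STEP = 100e-6  # 100 microseconds, as suggested in the quoted text


def next_ctime(highest_ctime, start_time=None):
    """Pick a timestamp that never goes below the highest one stored.

    highest_ctime: largest ctime currently in the database (seconds).
    start_time: time of day at which this run started; defaults to now.
    """
    if start_time is None:
        start_time = time.time()
    return max(start_time, highest_ctime + STEP)
```

If the clock has jumped backwards, the result is still larger than
every existing ctime, so timestamps stay monotonic.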

Though I believe you, I'm a little surprised the values aren't indexed.
An alternative design might use terms like XCTIME followed by hex
digits.  But this seems a bit clunky and not using Xapian the way it is
intended to be used.

When I do a query with a giant result set ordered by date (notmuch
search --sort=oldest-first "*"), the first few results come back pretty
quickly, so I guess the full database scan is not an issue, at least for
~10^5 messages.

> This is still reasonably fast in practice because it's a sequential
> scan and only requires a few bytes per message, but it's probably not
> what you'd expect.  That said, Xapian does track per-value statistics
> that would suffice for the particular problem of monotonic time stamps
> (e.g., Database::get_value_upper_bound).

Oh, well in that case there is no issue.  That max is the only statistic
we need.  Everything that requires a full database scan, like get me all
messages whose properties have changed since time X, is something that
you can't do at all right now.  And in fact I'm already scanning all
100,000 message IDs AND diffing the results against a separate sqlite
database to detect changes in only 0.09 seconds (Linux) or 1.2 seconds
(32-bit OpenBSD).  This will only make that faster, and additionally
allow other people to do what I'm doing without writing a bunch of C++

> In principle it would be possible to use user metadata or even
> document terms to support true B-tree range scans by ctime order, but
> I don't think it's possible to express queries over this using
> Xapian's query parser.  I've written about 90% of a (new) custom query
> parser for Notmuch that would enable this, but little things like my
> looming thesis deadline have interfered with me finishing it.

Yeah, I've been avoiding the query parser and just scanning terms and
postlists directly.  Since the lack of ctime forced me to scan the whole
database anyway, I found it much faster to scan each tag's posting list
and dump the results into sqlite than to extract tag terms on a
per-document basis the way notmuch dump does.

>> Incidentally, if you are really this paranoid about time stamps, it
>> should bother you that notmuch's directory timestamps only have one
>> second granularity.
> This is historical (and, I agree, unfortunate).  But nobody's
> complained, so it hasn't been worth changing the libnotmuch interface
> to support sub-second directory mtimes.  However, notmuch new does
> correctly handle deliveries in the same second it runs.  If the
> wall-clock time when it starts is the same as the on-disk directory
> mtime, it skips updating the in-database directory mtime at the end.
> Hence, on the next run, it will still consider the directory
> out-of-date.  It's a bit of a hack, but it's a hack that would be
> necessary for supporting older file systems even if we did support
> sub-second timestamps.

Yeah, it is kind of a problem for me.  I currently scan the XFDIRENTRY terms
belonging to a directory only if the directory's notmuch mtime has
changed since the last time I examined Xapian's state.  I used to scan
the actual directories, which was fine, but not so useful because I
don't actually want to deal with messages that notmuch has not yet
indexed.  Conversely, if a directory has not changed since the last time
muchsync ran, but notmuch's idea of the directory has changed (because
someone ran notmuch new), then I do care about scanning for new/deleted

But couldn't notmuch fix the sub-second problem in a fully backwards
compatible way?  After all, the database is already storing these mtimes
as doubles.  Even for OSes that don't support st_mtim, notmuch could
just add 0.1 seconds to the previous timestamp of a modified


folder and path completely broken in HEAD?

2014-05-02 Thread dm-list-email-notm...@scs.stanford.edu
Hey, I'm playing around with the head of the git repository
(bc64cdce289d84be2550c4fccb1f008d15eaeb0e) to try to figure out how the
new folder: prefixes work, as folders are a critical part of how I
organize my mail.  (Since tags are not hierarchical, folders are the
best way for me to group mail to a bunch of related addresses, while
retaining the ability to separate out any mailboxes that become high

I'm using a pretty standard maildir++ layout.  For example, underneath
my database.path I have a bunch of mail in directories such as:

.INBOX.Main/{new,cur}
.mail.class/{new,cur}
.mail.voicemail/{new,cur}
...

It used to be the case that if I wanted to read all of my "mail" mail, I
could search for folder:mail, while to look at just voicemail, I could
say folder:mail.voicemail, etc.  Now, I can't get anything to match a
folder predicate, period.  For example, using notmuch as notmuch-0.17 and
./notmuch as notmuch-0.18-rc2+2~gbc64cdc, here's what I get:

linux2$ notmuch count folder:mail
linux3$ notmuch count folder:mail.class
linux4$ notmuch count folder:mail.voicemail
linux5$ notmuch count folder:mail.voicemail/cur
linux6$ notmuch count folder:.mail.voicemail/cur
linux7$ ./notmuch count folder:mail
linux8$ ./notmuch count folder:.mail
linux9$ ./notmuch count folder:.mail.voicemail
linux10$ ./notmuch count folder:.mail.voicemail/cur
linux 11$ ./notmuch count path:.mail.voicemail
linux 12$ ./notmuch count path:.mail.voicemail/'**'
linux 13$ ./notmuch count path:.mail.voicemail/cur 
linux 14$ ./notmuch count folder:mail.voicemail

What gives?  Are the path and folder predicates completely broken, or is
something very important missing from the new notmuch-search-terms
manual page?  How can I make this work?


folder and path completely broken in HEAD?

2014-05-02 Thread dm-list-email-notm...@scs.stanford.edu
Jani Nikula  writes:

> On Fri, 02 May 2014, dm-list-email-notmuch at scs.stanford.edu wrote:
>> I'm using a pretty standard maildir++ layout.  For example, underneath
>> my database.path I have a bunch of mail in directories such as:
>> .INBOX.Main/{new,cur}
>> .mail.class/{new,cur}
>> .mail.voicemail/{new,cur}
>> ...
> Here's additional commentary on the specific queries.
>> linux7$ ./notmuch count folder:mail
>> 0
>> linux8$ ./notmuch count folder:.mail
>> 0

Oh, man.  That's a serious bummer.

Is there any mechanism left that would let me hierarchically group
messages?  I've got a ton of mail.* folders, and create new ones
dynamically.  I really want a mechanism to group them hierarchically, so
I can have a search that matches all current and future mail
directories.  I organized my whole mail setup around folders because a)
tags do not provide this kind of hierarchical control, and b) there
doesn't seem to be a convenient way to apply tags 100% reliably on
message delivery, whereas I *can* control the folder 100% reliably.

Worse, because of my poor performance, I was hoping to segregate
messages by year.  So it would be:


All the way back.  Now you are saying there will be no convenient way to
match just the "mail.class" part without the year?  How very
distressing.  Ugh.


folder and path completely broken in HEAD?

2014-05-03 Thread dm-list-email-notm...@scs.stanford.edu
Jani Nikula  writes:

> It's not going to help you, but I'll mention a few of the issues the old
> folder: search had, which we also had complaints about, and which would
> have been quite hard to fix while preserving the behaviour you want. In
> short, we considered the old folder: search broken.
> Given layout:
>   Foo/{cur,new}
>   foo/{cur,new}
>   fooing/{cur,new}
>   bar/foo/{cur,new}
>   cur
>   new
> It was impossible to refer to the top level folder.
> It was impossible to refer to foo without also referring to Foo, fooing,
> and bar/foo.
> In your layout, if you also had 2013/.bar.foo, folder:foo would match
> that as well. To not match that, you would have to include each
> folder:.foo.xxx in the search.

First, thanks for the response.  The responsiveness and friendliness of
the notmuch mailing list goes a long way towards compensating for any
missing features / customizability one might want.

I was already aware of the issues you raise, and had worked around them
by just renaming all my mail folders.  I agree that searching for a
particular folder is crucial functionality, and found it weird that I
had to abandon my main top-level mailbox (which I just renamed

However, currently it seems strange that there are *two* different
search terms (folder and path), and that neither one lets you search for
a portion of your folder name.  Admittedly the old folder code was one
of the parts of the notmuch source that didn't make sense to me (and now
I'm starting to understand why--e.g., the fact that it used stemming,
for instance, was just weird and maybe accidental).

I may be able to hack around this problem in the emacs lisp part or with
a wrapper script.  I'm already having to defadvice around
notmuch-call-notmuch-sexp to implement features I'm missing from
wanderlust and gnus (e.g., the inability to specify regexps matching all
my email addresses).

But to help me understand the current design, can you answer a couple of

First, are there people out there who do not use a collection of maildir
directories, with all mail in cur and new?  If not, why does notmuch try
to find mail in non-mail-directories, and why do you need search terms
differentiating new and cur?  Conversely, I find it particularly weird
that there's no convenient way to say "stop trying to index stuff that
isn't in a maildir (cur or new)."  You can do the inverse and blacklist
files, but then I end up with stuff like this in my .notmuch-config:


(And still I think there are other junk files that get left around by my
imap server or other software, so notmuch new still repeatedly tries to
index junk whenever I run it.)

Second, does anyone out there have a collection with more than a few
thousand maildirs?  I understand some people index millions of messages,
but even with tiny maildirs of a few hundred messages each, that's a
mere 10,000 maildirs.  So for that, who needs any kind of Xapian
indexing fanciness or dedicated XFOLDER prefix?  Scanning the entire set
of XDIRECTORY terms to apply an arbitrary predicate should take
negligible time.  So all you really need is a boolean search term
containing the directory docid (or with the existing schema the ability
to do a prefix search on XFDIRENTRYn:*).
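As an illustration of applying an arbitrary predicate to the set of
directories, a wrapper could expand a glob over the known maildir names
into an explicit query.  This is a sketch only: folder_query is a
hypothetical helper, the list of folders is assumed to come from
somewhere else (for example a directory listing), and the quoted
folder:"..." form is an assumption about what the new folder: prefix
accepts.

```python
from fnmatch import fnmatchcase


def folder_query(folders, pattern):
    """Expand a glob over maildir names into an OR of folder: terms."""
    matches = sorted(f for f in folders if fnmatchcase(f, pattern))
    return " or ".join(f'folder:"{f}"' for f in matches)


if __name__ == "__main__":
    known = [".INBOX.Main", ".mail.class", ".mail.voicemail"]
    print(folder_query(known, ".mail.*"))
```

Even with thousands of maildirs the expansion is a linear pass over a
short list of names, which is the point about negligible scan time.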