Tag timestamps and synchronization

2011-01-24 Thread dm-list-email-notmuch
One of the features I would like to see from notmuch is an easier
ability to synchronize tags across machines.  At the very least, I
would need either incremental dump and restore, or some way to
communicate arbitrary tags to a local imap server that shares
notmuch's maildir (much as notmuch currently syncs the standard tags),
so that I can synchronize two maildirs with a tool like offlineimap.

As Carl pointed out to me in private email, there has been some
previous discussion in the following thread:

notmuch show id:87hbfnmiux@yoom.home.cworth.org

Based on that thread, there seems to be some desire for notmuch to
keep track of a per-message timestamp recording when its tags were last
updated.  This would make expiration much easier for people who use a
deleted tag.  It would also allow incremental dump and restore of
tags, which is exactly what I need to sync tags across servers with
reasonable amounts of bandwidth.

Metadata timestamps are one of those things that probably have a lot
of different applications, so since Carl is considering a new database
format for the next release anyway, perhaps it also makes sense to add
a metadata change time for each message.

The timestamp would be included in dump output, and you could
request a dump of changes since a particular time.  On restore, you
might have several options:

  - overwrite: always set the new tags and timestamp in the database
to the value in the restore data.

  - update: always set the tags, but set the timestamp to the current time.

  - conditional T: update only if the message metadata has not been
updated since time T.

To sync flags, you just need to keep track of the last time you
synced with a particular server--call this time T.  Do a dump since
time T, upload to server, do a conditional restore for time T on
server.  Finally do a partial dump from time T on the server and an
overwrite import on the client.  (This policy makes changes on the
server always override conflicting ones on the client--perhaps people
want other policies, like union of the tags, etc.)
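The three restore modes and the sync round described above can be sketched in Python. This is a toy model, not notmuch code: the "database" is a hypothetical in-memory dict mapping message IDs to an Entry holding tags and a metadata change time, and we assume that a conditional restore stamps the changes it accepts with the current time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    tags: frozenset  # the message's tags
    ctime: float     # when the message's tags last changed


def dump_since(db, t):
    """Incremental dump: only messages whose metadata changed after time t."""
    return {mid: e for mid, e in db.items() if e.ctime > t}


def restore(db, dump, mode, now, t=None):
    """Apply an incremental dump to db using one of the three policies."""
    for mid, incoming in dump.items():
        if mode == "overwrite":
            # Take both the tags and the timestamp from the dump.
            db[mid] = incoming
        elif mode == "update":
            # Take the tags, but stamp them with the current time.
            db[mid] = Entry(incoming.tags, now)
        elif mode == "conditional":
            current = db.get(mid)
            if current is not None and current.ctime > t:
                continue  # changed locally since t: keep the local tags
            db[mid] = Entry(incoming.tags, now)
        else:
            raise ValueError(f"unknown restore mode: {mode}")


def sync(client, server, t, now):
    """One sync round in which the server wins conflicts.

    Returns the time to remember as T for the next round.
    """
    restore(server, dump_since(client, t), "conditional", now, t)
    restore(client, dump_since(server, t), "overwrite", now)
    return now
```

A union-of-tags policy would only change the conditional branch, merging the two tag sets instead of skipping the message.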


Second, there seems to be some desire in that thread to sync with IMAP
flags.  This would be particularly great, but the easiest way to do it
is probably *not* to try to implement IMAP, but rather to use an
existing IMAP server and just modify the maildir so that the IMAP
server will pick up the flags.

In the case of dovecot, the arbitrary tag format is very simple.  Each
maildir has a file called dovecot-keywords mapping numbers 0, 1,
... to keywords.  Then mail file names contain lower-case letters for
the flags they are marked with--0 = a, 1 = b, etc.--allowing up to
26 arbitrary tags for each maildir.  One could probably sync to
dovecot's maildir format relatively easily in a script given
incremental dump and restore of tags.  Or possibly notmuch could
natively support dovecot as one of multiple back-end tag storage
schemes.
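A minimal sketch of reading Dovecot's keyword letters, following the description above. It assumes the dovecot-keywords file holds one "<index> <keyword>" pair per line and that flags follow the usual maildir ":2," separator in the file name; both are assumptions about Dovecot's format, and the function names are hypothetical.

```python
def parse_dovecot_keywords(text):
    """Parse dovecot-keywords contents into {index: keyword}."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        index, keyword = line.split(None, 1)
        mapping[int(index)] = keyword.strip()
    return mapping


def keywords_in_filename(filename, mapping):
    """Return the keywords encoded as lower-case flag letters (a=0, b=1, ...)."""
    _, sep, info = filename.partition(":2,")
    if not sep:
        return set()  # no flags at all
    keywords = set()
    for letter in info:
        if "a" <= letter <= "z":
            index = ord(letter) - ord("a")
            if index in mapping:
                keywords.add(mapping[index])
    return keywords
```

Upper-case letters such as S (seen) are the standard maildir flags and are skipped here.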

Having a static tag mapping in the .notmuch-config file would be much
better than hard-coding flag2tag.  However, I'm not sure it's
sufficient.  The reason is that if you ever completely delete a tag
(e.g., you have todo and meeting tags, and periodically have no
messages in either category in a given mail folder), then an IMAP
server like dovecot might end up re-allocating the letters
corresponding to those tags in a different order.  Also, at least for
dovecot, the flag mappings are per-folder, which you kind of want
since you are limited to 26 non-standard tags, so global values might
not work.
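To see why one global mapping falls short, here is a hypothetical sketch of assigning letters per folder, assuming (as a simplification, not a claim about Dovecot's allocator) that a new keyword takes the lowest free index. The same tag can get different letters in different folders, and once a keyword's slot is freed, an unrelated keyword can take its letter.

```python
def letters_for_tags(tags, mapping):
    """Flag letters for `tags` in one folder, allocating indices as needed.

    mapping is that folder's {index: keyword} table and is updated in place.
    """
    index_of = {keyword: i for i, keyword in mapping.items()}
    letters = []
    for tag in sorted(tags):
        if tag not in index_of:
            # Assumed policy: take the lowest unused index (at most 26).
            free = next((i for i in range(26) if i not in mapping), None)
            if free is None:
                raise ValueError(f"no free keyword slot for {tag!r}")
            mapping[free] = tag
            index_of[tag] = free
        letters.append(chr(ord("a") + index_of[tag]))
    return "".join(sorted(letters))
```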

I'm curious to hear people's thoughts/reactions?

David
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-10 Thread dm-list-email-notmuch
Gaute Hope e...@gaute.vetsj.com writes:

 A better approach would be to add a new modtime xapian value that is
 updated whenever the tags or any other terms (such as XFDIRENTRY) are
 added to or deleted from a docid.  If it's a Xapian value, rather than a
 term, then modtime will be queriable just like date, allowing multiple
 applications to query all docids modified since the last time they ran.

 [... snip]

 This could also solve it, and probably have more uses. I don't quite see
 how the opposite problem (for my use case) can be solved by this without
 using a 'localchange' tag. This is to sync tag to maildir sync, when a
 new tag has been added (by e.g. a user interaction in a client) it needs
 to be copied to the maildir, if it is not done in the same go a
 different application won't know whether the change was local or remote.
 How did you solve this?

Why don't you just set maildir.synchronize_flags=true?  When I
synchronize mail across machines, I start by concurrently running
notmuch new on both the local and remote machines, which picks up all
the changed maildir flags.  Then I synchronize the mail and the tags
between the two maildirs.  If maildir.synchronize_flags=true, then atomically
with setting the new tags I call notmuch_message_tags_to_maildir_flags()
to sync the new tags to the maildir.
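For reference, a rough Python model of what syncing tags to maildir flags involves, using notmuch's standard mapping (D=draft, F=flagged, P=passed, R=replied, and S set when the message is *not* unread). This is a simplified sketch with a hypothetical function name, not notmuch_message_tags_to_maildir_flags() itself; in particular it ignores moving files from new/ to cur/.

```python
# notmuch's standard flag <-> tag mapping; True means the flag is set
# when the tag is absent (S = seen = not "unread").
FLAG_TAGS = {
    "D": ("draft", False),
    "F": ("flagged", False),
    "P": ("passed", False),
    "R": ("replied", False),
    "S": ("unread", True),
}


def filename_for_tags(filename, tags):
    """Return filename with its ':2,' flags recomputed from tags."""
    base, _, info = filename.partition(":2,")
    # Keep flags this mapping doesn't manage (e.g. T, or Dovecot's a-z).
    flags = {c for c in info if c not in FLAG_TAGS}
    for flag, (tag, inverted) in FLAG_TAGS.items():
        if (tag in tags) != inverted:
            flags.add(flag)
    # Maildir wants the flags in ASCII order.
    return base + ":2," + "".join(sorted(flags))
```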

The maildir flags question seems kind of independent of what we are
talking about, which is just having an incremental way of examining the
database.  Right now, I have to scan everything to find tags that have
changed since the last synchronization event.  If I had modtime (or
really it should be called ctime, like inode change time), then I
could look at only the few messages that changed, and it would probably
shave 250msec off polling new mail for a 100,000-message maildir.

Note you can't use the file system ctime/mtime because the file system
may have changed since the last time you ran notmuch new.

 I would suggest using a Xapian- or Index-time which gets a tick
 everytime a modification is made to the index.

Exactly.  It could be a tick, or just the current time of day if your
clock does not go backwards.  (I'd be willing to do a full scan if the
clock ever goes backwards.)  The advantage of time is that you don't
have to synchronously update some counter.

 Atomic operations could operate on the same time in case this
 distinction turns out to be useful. Perhaps something like this
 already exists in Xapian?

I don't think it's important for atomic operations to have the same
timestamp.  All that's important is that you be able to diff the
database against its state the last time you scanned it.

 This way clock skew, clock resolution (lots of operations happening in
 the same second, msec or nanosec) problems won't be an issue. The crux
 will be to make sure all write-operations trigger a tick on the
 indextime.

Clock skew is not really an issue.  It takes years to amass hundreds of
thousands of email messages.  So adding 5 minutes of slop is not a big
deal--you'll just scan a few messages needlessly.
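Under the proposed scheme, a synchronizer would only look at messages whose ctime is newer than its last scan, minus some slop for clock skew. A sketch with a hypothetical {message_id: ctime} mapping standing in for a range query on the proposed value; the 300-second default mirrors the 5 minutes suggested above, and the fallback to a full scan when the clock goes backwards follows the earlier remark.

```python
def messages_to_examine(ctimes, last_scan, now, slop=300.0):
    """IDs of messages whose metadata may have changed since last_scan.

    ctimes: {message_id: ctime in seconds since the epoch}.
    """
    if now < last_scan:
        # The clock went backwards; trust nothing and rescan everything.
        return set(ctimes)
    cutoff = last_scan - slop
    return {mid for mid, ctime in ctimes.items() if ctime > cutoff}
```

The slop only costs re-examining messages changed shortly before the last scan.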

Making sure the write-operations update the time should be easy.  Most
or all of the changes are probably funneled through
_notmuch_message_sync.  Worst case, there are only 9 places in the
source code that make use of a Xapian::WritableDatabase, so I'm pretty
confident total changes wouldn't be much more than 50 lines of code.

I would do it myself if there were any kind of indication that such a
change could be upstreamed.  I brought this up in January, 2011, and
didn't get a huge amount of interest in the ctime idea.  But I was also
a lot less focused on what I needed.  Now that I have a working
distributed setup and am actually using notmuch for my mail, I have a
much better understanding of what is needed.

David


Re: [PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-11 Thread dm-list-email-notmuch
David Bremner da...@tethera.net writes:

 Exactly.  It could be a tick, or just the current time of day if your
 clock does not go backwards.  (I'd be willing to do a full scan if the
 clock ever goes backwards.)  The advantage of time is that you don't
 have to synchronously update some counter.

 I think I'd lean towards global time so that one could use it to resolve
 conflicts between changes to multiple copies of the database.

I, too, would prefer to use time.  However, I'm doubtful it would help
resolve conflicts.  On the plus side, I'm not sure it is even needed to
resolve conflicts.  My mail synchronizer has an algorithm for resolving
conflicts that always works without human intervention and in my limited
experience does exactly what I want:

   * If there's a conflict between two replicas, ensure that each
     maildir ends up with the maximum of the number of copies of the
     message in the two databases being reconciled.  [Example:
     If replica A deletes a message and replica B moves it from folder
     INBOX to folder SPAM, you end up with a copy in SPAM.  If replica A
     moves a message to folder IMPORTANT and replica B moves it to SPAM,
     then you get two hard links to the same file, one in IMPORTANT and
     one in SPAM.]

   * If there's a conflict and two replicas have different tags on the
     same message, then the tags in notmuch's new.tags directive get
     logically ANDed, while all other tags get logically ORed.
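The two rules can be sketched as follows. This illustrates the rules as stated, not muchsync's code: each replica's copies are a hypothetical {folder: number of copies} mapping, and new_tags stands for the tags listed in notmuch's new.tags setting.

```python
def merge_links(a, b):
    """Per folder, keep the larger number of copies either replica has."""
    merged = {f: max(a.get(f, 0), b.get(f, 0)) for f in set(a) | set(b)}
    return {f: n for f, n in merged.items() if n > 0}


def merge_tags(a, b, new_tags):
    """AND the tags from new.tags (e.g. unread), OR all the others."""
    return ((a & b) & new_tags) | ((a | b) - new_tags)
```

With these rules, reading a message on either replica removes unread on both, while a tag such as todo added on either side survives.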

Granted, I've only been using this system for a week.  On the other
hand, all I was doing was starting to test something I had written, yet
it ended up being so much better than my old system that I couldn't go
back and ended up using my system in production far earlier than
anticipated...

 Making sure the write-operations update the time should be easy.  Most
 or all of the changes are probably funneled through
 _notmuch_message_sync.  Worst case, there are only 9 places in the
 source code that make use of a Xapian::WritableDatabase, so I'm pretty
 confident total changes wouldn't be much more than 50 lines of code.

 Maybe. Don't forget upgrading the database, updating the test suite, and
 presumably some changes to the CLI so the new mtime can actually be
 used. Not to be discouraging ;).

The CLI is trivial.  We'll just add another search keyword ctime
analogous to date.

As far as updating the test suite, etc., it's almost certain that the
core notmuch developers would be unsatisfied with whatever I've done,
since the code base is very clean and has a very uniform style.  So when
I say I'd want some indication that such a change could be upstreamed,
I mean more specifically that someone would be willing to shepherd the
process of getting the code into shape.

 In the ensuing time, nothing better has developed for tag
 synchronization (my pet use case) so maybe it's time to pursue this
 again.

I do have something pretty good for tag synchronization.  It requires a
full database scan each time to detect changes, but I've heavily
optimized it to be very fast by skipping over the notmuch library and
directly scanning the underlying Xapian Btrees.  Currently my bottleneck
is indexing messages (e.g., running notmuch new or calling
notmuch_database_add_message), which are painfully slow on 32-bit
machines.  (Unfortunately my mail server is a 32-bit machine.)

To give you an idea, on a 32-bit machine, if I get a handful of new mail
(e.g., 6 messages), running notmuch new takes 19 seconds, while
scanning the database to check for renames and changed tags adds another
1.4 seconds.  On a 64-bit machine, notmuch new might take 1 second,
while scanning the database adds 350 msec.

So full database scans might not be the end of the world.  The biggest
performance bottleneck at this point is notmuch's painful indexing
performance.  It kills me that it takes 10 minutes to index 100,000 mail
messages on a 16-core machine with 48 GiB of RAM.  But the library is
non-reentrant and allocates thread IDs in such a way that it's hard to
create parallel databases and later merge them.  Basically I can't
figure out how to make productive use of more than one CPU core even
when synchronizing across gigabit Ethernet!

It's pretty beta, but my intention is to open-source my code, so I'd be
glad to have beta testers if you are interested in testing tag synchronization.

 It would be good to have some preliminary idea about the time
 and space costs of adding document mtimes.  I guess database bloat
 should not be too bad, since it's only 64bits (?) per mail message.

Plus a Btree to index it, so figure at least 24 bytes per message.
Another issue is that values are always brought into memory with a
document, so it will consume more RAM.  But yeah, I don't think it
should be that bad.

David


Re: Synchronization success stories?

2014-04-13 Thread dm-list-email-notmuch
Tilmann Singer t...@tils.net writes:

 David Mazieres dm-list-email-notm...@scs.stanford.edu writes:
 What happens if you get a message that's been stuck in a queue for a few
 days and has an old Date: header?

 It would be missed.  I have set the timespan to look backwards for new
 mail to one month to be a bit safer against the stuck-in-queue cases,
 but mails with older Date: headers would definitely get missed.

 The current output of notmuch count * is the same on both the client
 and the server, so it seems I didn't run into this problem yet (maybe I
 was just lucky).

I've been playing around with reorganizing my maildir, and found a
couple of messages (on mailing lists) with clearly invalid dates years
in the past.  But checking with notmuch count is a good idea.  Then you
can always fall back to the slow path in the unlikely event that your
counts don't match up.  Well, except that A) count is just unique
message-IDs, not messages, and B) when synchronizing in both directions
you could still miss something.  You have to assume that the invalid
dates are only ever going to occur at one end of a synchronization
event.

 Or if you get new messages that have
 the same Message-ID as old ones?

 Is that even possible?  I thought that notmuch guarantees the uniqueness
 of indexed message ids.  The only reference I could find without trying
 to read the code was this thread id:87mwyz3s9d@star.eba from 2012,
 which supports the assumption.

Sadly, yes it is quite possible, and even opens up a slight security
issue.  Suppose I know you are on a mailing list, and some message
appears on that mailing list that I don't want you to see.  I can send
you an innocuous-looking message that just happens to have the same
message-id, and you may never see the original mailing list message.
Even better, depending on how your spam filtering is set up, if I include
the GTUBE string in my message you may never see mine or the original.

That's why with muchsync, I replicate actual mail messages, rather than
message-IDs.  Then you can always periodically check for message-IDs
that appear in more than one file.  (In fact, though I haven't
published an interface for this, the SQL database kept by muchsync makes
it trivial to check for this and detect certain attacks.)
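A check of this kind might look like the following, using Python's sqlite3 module. The files table and its columns are hypothetical, not muchsync's actual schema. Counting distinct content hashes ignores hard links and identical copies, although legitimate near-duplicates (such as a list copy with an added footer) will still be reported.

```python
import sqlite3


def suspicious_message_ids(conn):
    """Message-IDs that map to more than one distinct file content."""
    rows = conn.execute(
        """SELECT message_id FROM files
           GROUP BY message_id
           HAVING COUNT(DISTINCT content_hash) > 1
           ORDER BY message_id"""
    )
    return [message_id for (message_id,) in rows]
```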

I understand why notmuch went with message IDs.  For instance you have
sent this reply both directly to me and to a mailing list I am
subscribed to.  So I will get two slightly different copies of the
message (one will have the standard notmuch mailing list signature, the
other won't).  And this way once I've marked it read, the message will
stay read even after the second copy comes in.  But personally I'd rather
see the occasional duplicate message than risk not seeing messages.  In
particular, if the goal is to see fewer unread messages, some sort of
feature that pro-actively skips all future messages in a thread or
subthread would be more useful...

 Here is how long they take (on a machine with an SSD, which certainly
 helps):

 $ time notmuch dump --format=batch-tag | sort > /tmp/notmuch.dump
 real    0m3.643s
 user    0m3.593s
 sys     0m0.140s
 $ time notmuch restore < /tmp/notmuch.dump
 real    0m3.719s
 user    0m3.357s
 sys     0m0.357s
 $ notmuch count 
 117118

That's crazy.  I'm jealous.  Then again, this is how fast muchsync runs
(including a full database scan to detect changed messages and tags)
when there is no new mail:

$ time ./muchsync -v
[notmuch] No new mail.
synchronizing muchsync database with Xapian... 0.038506 (+0.038506)
starting scan of Xapian database... 0.039069 (+0.000563)
opened Xapian... 0.040851 (+0.001782)
scanned message IDs... 0.137647 (+0.096796)
scanned tags... 0.170404 (+0.032757)
scanned directories in xapian... 0.172100 (+0.001696)
scanned filenames in xapian... 0.172376 (+0.000276)
adjusted link counts... 0.199461 (+0.027085)
finished synchronizing muchsync database with Xapian... 0.212965 (+0.013505)

real    0m0.220s
user    0m0.173s
sys     0m0.023s

David


Re: [PATCH] Add configurable changed tag to messages that have been changed on disk

2014-04-23 Thread dm-list-email-notmuch
Austin Clements amdra...@mit.edu writes:

 A middle ground might be to use the maximum of two values: 1) the
 time-of-day at which notmuch started executing, and 2) the highest ctime
 in the database plus 100 microseconds (leaving plenty of slop to store
 timestamps as IEEE doubles with 52 significant bits).  Since the values
 will be Btree-indexed, computing the max plus one will be cheap.

 This makes me curious if you've considered how to fit this in to
 Xapian.  The Xapian query syntax supports range queries over document
 values, but within the Xapian B-tree, values are stored in docid
 order, not value order, so Xapian's range query operator is actually a
 full scan in implementation.  I assume it does this so it doesn't have
 to store both forward and inverse indexes of values.  (I spent some
 time figuring out the layout of the Xapian database and have fairly
 detailed notes if anyone's curious.)

Aside from finding the previous max time, everything else should work
identically to the date query operator and NOTMUCH_VALUE_TIMESTAMP.

Though I believe you, I'm a little surprised the values aren't indexed.
An alternative design might use terms like XCTIME where
the x's are hex digits.  But this seems a bit clunky and not using
Xapian the way it is intended to be used.
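The point of fixed-width hex digits in such terms is that lexicographic term order then matches numeric order, so a range over the terms is a time range. A sketch, assuming integer microsecond timestamps and a 16-digit (64-bit) width; both choices, and the function name, are illustrative.

```python
def ctime_term(ctime_usec, prefix="XCTIME"):
    """Encode a non-negative integer timestamp as a fixed-width hex term."""
    if not 0 <= ctime_usec < 1 << 64:
        raise ValueError("timestamp out of range")
    return f"{prefix}{ctime_usec:016x}"
```

Without zero padding, XCTIMEff (255) would sort after XCTIME100 (256), which is why the width must be fixed.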

When I do a query with a giant result set ordered by date (notmuch
search --sort=oldest-first *), the first few results come back pretty
quickly, so I guess the full database scan is not an issue, at least for
~10^5 messages.

 This is still reasonably fast in practice because it's a sequential
 scan and only requires a few bytes per message, but it's probably not
 what you'd expect.  That said, Xapian does track per-value statistics
 that would suffice for the particular problem of monotonic time stamps
 (e.g., Database::get_value_upper_bound).

Oh, well in that case there is no issue.  That max is the only statistic
we need.  Everything that requires a full database scan, like get me all
messages whose properties have changed since time X, is something that
you can't do at all right now.  And in fact I'm already scanning all
100,000 message IDs AND diffing the results against a separate sqlite
database to detect changes in only 0.09 seconds (Linux) or 1.2 seconds
(32-bit OpenBSD).  This will only make that faster, and additionally
allow other people to do what I'm doing without writing a bunch of C++
code.
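The middle-ground rule quoted above needs only that maximum. A sketch with a hypothetical function name, assuming the upper bound (e.g. from Database::get_value_upper_bound) has already been decoded into seconds as a float. With timestamps around 1.4e9 seconds, which lie between 2^30 and 2^31, a double's spacing is 2^(30-52), about 2.4e-7 s, so 100-microsecond steps are comfortably representable.

```python
def next_ctime(process_start, value_upper_bound, step=1e-4):
    """Timestamp for changes made by this notmuch run.

    process_start: wall-clock time at which the process started.
    value_upper_bound: largest ctime already stored, or None if empty.
    step: the 100-microsecond slop from the proposal.
    """
    if value_upper_bound is None:
        return process_start
    return max(process_start, value_upper_bound + step)
```

Because stored values keep increasing even if the clock stalls or steps back slightly, "changed since X" remains a simple comparison.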

 In principle it would be possible to use user metadata or even
 document terms to support true B-tree range scans by ctime order, but
 I don't think it's possible to express queries over this using
 Xapian's query parser.  I've written about 90% of a (new) custom query
 parser for Notmuch that would enable this, but little things like my
 looming thesis deadline have interfered with me finishing it.

Yeah, I've been avoiding the query parser and just scanning terms and
postlists directly.  Since the lack of ctime forced me to scan the whole
database anyway, I found it much faster to scan each tag's posting list
and dump the results into sqlite than to extract tag terms on a
per-document basis the way notmuch dump does.
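The approach described above, dumping each tag's posting list into SQLite and diffing against the previous snapshot, might look like this in outline. The postings argument is a hypothetical {tag: [docid, ...]} stand-in for what one would read from Xapian's posting lists, and the table names are illustrative (they are interpolated into SQL, so they must be trusted).

```python
import sqlite3


def snapshot_tags(conn, table, postings):
    """Store one (docid, tag) row per posting in a new table."""
    conn.execute(f"CREATE TABLE {table} (docid INTEGER, tag TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?)",
        ((docid, tag) for tag, docids in postings.items() for docid in docids),
    )


def tag_changes(conn, old, new):
    """Return (added, removed) lists of (docid, tag) pairs."""
    diff = ("SELECT docid, tag FROM {a} EXCEPT "
            "SELECT docid, tag FROM {b} ORDER BY docid, tag")
    added = conn.execute(diff.format(a=new, b=old)).fetchall()
    removed = conn.execute(diff.format(a=old, b=new)).fetchall()
    return added, removed
```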

 Incidentally, if you are really this paranoid about time stamps, it
 should bother you that notmuch's directory timestamps only have one
 second granularity.

 This is historical (and, I agree, unfortunate).  But nobody's
 complained, so it hasn't been worth changing the libnotmuch interface
 to support sub-second directory mtimes.  However, notmuch new does
 correctly handle deliveries in the same second it runs.  If the
 wall-clock time when it starts is the same as the on-disk directory
 mtime, it skips updating the in-database directory mtime at the end.
 Hence, on the next run, it will still consider the directory
 out-of-date.  It's a bit of a hack, but it's a hack that would be
 necessary for supporting older file systems even if we did support
 sub-second timestamps.

Yeah, this is kind of a problem for me.  I currently scan the XFDIRENTRY terms
belonging to a directory only if the directory's notmuch mtime has
changed since the last time I examined Xapian's state.  I used to scan
the actual directories, which was fine, but not so useful because I
don't actually want to deal with messages that notmuch has not yet
indexed.  Conversely, if a directory has not changed since the last time
muchsync ran, but notmuch's idea of the directory has changed (because
someone ran notmuch new), then I do care about scanning for new/deleted
XFDIRENTRY terms.

But couldn't notmuch fix the sub-second problem in a fully backwards
compatible way?  After all, the database is already storing these mtimes
as doubles.  Even for OSes that don't support st_mtim, notmuch could
just add 0.1 seconds to the previous timestamp of a modified
directory.
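The same-second workaround Austin describes can be modeled in a few lines. This is a sketch of the described behavior with hypothetical function names and whole-second mtimes; it is not the code in notmuch new.

```python
def mtime_to_record(scan_start, disk_mtime, recorded_mtime):
    """Directory mtime to store in the database after scanning a directory.

    With one-second granularity, a message delivered later in the same
    second as the scan would not change disk_mtime, so if the scan began
    in that same second, keep the old value to force a rescan next time.
    """
    if int(scan_start) == int(disk_mtime):
        return recorded_mtime
    return disk_mtime


def needs_rescan(disk_mtime, recorded_mtime):
    return disk_mtime != recorded_mtime
```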

David


folder and path completely broken in HEAD?

2014-05-02 Thread dm-list-email-notmuch
Hey, I'm playing around with the head of the git repository
(bc64cdce289d84be2550c4fccb1f008d15eaeb0e) to try to figure out how the
new folder: prefixes work, as folders are a critical part of how I
organize my mail.  (Since tags are not hierarchical, folders are the
best way for me to group mail to a bunch of related addresses, while
retaining the ability to separate out any mailboxes that become high
traffic.)

I'm using a pretty standard maildir++ layout.  For example, underneath
my database.path I have a bunch of mail in directories such as:

.INBOX.Main/{new,cur}
.mail.class/{new,cur}
.mail.voicemail/{new,cur}

It used to be the case that if I wanted to read all of my "mail" mail, I
could search for folder:mail, while to look at just voicemail, I could
say folder:mail.voicemail, etc.  Now, I can't get anything to match a
folder predicate, period.  For example, using notmuch as notmuch-0.17 and
./notmuch as notmuch-0.18-rc2+2~gbc64cdc, here's what I get:

linux2$ notmuch count folder:mail
16257
linux3$ notmuch count folder:mail.class
1896
linux4$ notmuch count folder:mail.voicemail
34
linux5$ notmuch count folder:mail.voicemail/cur
34
linux6$ notmuch count folder:.mail.voicemail/cur
34
linux7$ ./notmuch count folder:mail
0
linux8$ ./notmuch count folder:.mail
0
linux9$ ./notmuch count folder:.mail.voicemail
0
linux10$ ./notmuch count folder:.mail.voicemail/cur
0
linux11$ ./notmuch count path:.mail.voicemail
0
linux12$ ./notmuch count path:.mail.voicemail/'**'
0
linux13$ ./notmuch count path:.mail.voicemail/cur
0
linux14$ ./notmuch count folder:mail.voicemail
0

What gives?  Are the path and folder predicates completely broken, or is
something very important missing from the new notmuch-search-terms
manual page?  How can I make this work?

Thanks,
David


Re: folder and path completely broken in HEAD?

2014-05-02 Thread dm-list-email-notmuch
Jani Nikula j...@nikula.org writes:

 On Fri, 02 May 2014, dm-list-email-notm...@scs.stanford.edu wrote:

 I'm using a pretty standard maildir++ layout.  For example, underneath
 my database.path I have a bunch of mail in directories such as:

 .INBOX.Main/{new,cur}
 .mail.class/{new,cur}
 .mail.voicemail/{new,cur}
 ...
 Here's additional commentary on the specific queries.

 linux7$ ./notmuch count folder:mail
 0
 linux8$ ./notmuch count folder:.mail
 0

Oh, man.  That's a serious bummer.

Is there any mechanism left that would let me hierarchically group
messages?  I've got a ton of mail.* folders, and create new ones
dynamically.  I really want a mechanism to group them hierarchically, so
I can have a search that matches all current and future mail
directories.  I organized my whole mail setup around folders because a)
tags do not provide this kind of hierarchical control, and b) there
doesn't seem to be a convenient way to apply tags 100% reliably on
message delivery, whereas I *can* control the folder 100% reliably.

Worse, because of the poor performance I'm seeing, I was hoping to segregate
messages by year.  So it would be:

  2013/.mail.class
  2013/.mail.voicemail
  2014/.mail.class
  2014/.mail.voicemail

All the way back.  Now you are saying there will be no convenient way to
match just the mail.class part without the year?  How very
distressing.  Ugh.

David


Re: folder and path completely broken in HEAD?

2014-05-03 Thread dm-list-email-notmuch
Mark Walters markwalters1...@gmail.com writes:

 All the way back.  Now you are saying there will be no convenient way to
 match just the mail.class part without the year?  How very
 distressing.  Ugh.

 Hi

 I am not quite sure what you are meaning by hierarchically group
 messages. Searching for path:dir/foo/bar/** should give all messages in
 all directories beneath dir/foo/bar. 

The problem is that the maildir++ spec disallows such pathnames.  If I
need compatibility with maildir++ (for instance for an imap server), at
least on a per-year basis, then my maildirs have to have names like:

   2013/.foo.foo
   2013/.foo.bar
   2013/.foo.baz
   2014/.foo.foo
   2014/.foo.bar
   2014/.foo.baz

So if I want a way to aggregate all my foo mail in a single search,
right now I just ask for folder:foo.  Will there be any equivalent in
the new notmuch?

Thanks,
David


Re: folder and path completely broken in HEAD?

2014-05-03 Thread dm-list-email-notmuch
Jani Nikula j...@nikula.org writes:

 It's not going to help you, but I'll mention a few of the issues the old
 folder: search had, which we also had complaints about, and which would
 have been quite hard to fix while preserving the behaviour you want. In
 short, we considered the old folder: search broken.

 Given layout:

   Foo/{cur,new}
   foo/{cur,new}
   fooing/{cur,new}
   bar/foo/{cur,new}
   cur
   new

 It was impossible to refer to the top level folder.

 It was impossible to refer to foo without also referring to Foo, fooing,
 and bar/foo.

 In your layout, if you also had 2013/.bar.foo, folder:foo would match
 that as well. To not match that, you would have to include each
 folder:.foo.xxx in the search.

First, thanks for the response.  The responsiveness and friendliness of
the notmuch mailing list goes a long way towards compensating for any
missing features / customizability one might want.

I was already aware of the issues you raise, and had worked around them
by just renaming all my mail folders.  I agree that searching for a
particular folder is crucial functionality, and found it weird that I
had to abandon my main top-level mailbox (which I just renamed
.INBOX.Main).

However, currently it seems strange that there are *two* different
search terms (folder and path), and that neither one lets you search for
a portion of your folder name.  Admittedly the old folder code was one
of the parts of the notmuch source that didn't make sense to me (and now
I'm starting to understand why--e.g., the fact that it used stemming
was just weird and maybe accidental).

I may be able to hack around this problem in the emacs lisp part or with
a wrapper script.  I'm already having to defadvice around
notmuch-call-notmuch-sexp to implement features from wanderlust and gnus
that notmuch lacks (e.g., the ability to specify regexps matching all
my email addresses).
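One shape such a wrapper could take: list the maildir directories, keep those whose maildir++ name starts with the wanted dot-separated components, and OR together one folder: term per match. The function names are hypothetical, and the sketch assumes the new folder: prefix matches an exact folder path relative to database.path. Matching whole components also avoids the 2013/.bar.foo problem Jani mentions.

```python
def folder_components(relpath):
    """Components of a maildir++ folder, e.g. '2013/.foo.bar' -> ['foo', 'bar']."""
    leaf = relpath.rsplit("/", 1)[-1]
    if not leaf.startswith("."):
        return []  # not a maildir++ subfolder
    return leaf[1:].split(".")


def folder_query(dirs, prefix):
    """OR of folder: terms for every maildir++ folder under `prefix`."""
    want = prefix.split(".")
    matches = [d for d in dirs if folder_components(d)[: len(want)] == want]
    return " OR ".join(f"folder:{d}" for d in matches)
```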

But to help me understand the current design, can you answer a couple of
questions?


First, are there people out there who do not use a collection of maildir
directories, with all mail in cur and new?  If not, why does notmuch try
to find mail in non-mail-directories, and why do you need search terms
differentiating new and cur?  Conversely, I find it particularly weird
that there's no convenient way to say stop trying to index stuff that
isn't in a maildir (cur or new).  You can do the inverse and blacklist
files, but then I end up with stuff like this in my .notmuch-config:


ignore=dovecot-keywords;dovecot-uidlist;dovecot-uidvalidity;dovecot.index;dovecot.index.cache;dovecot.index.log;dovecot.index.log.2;dovecot.index.search;dovecot.index.search.uids;maildirfolder;

(And still I think there are other junk files that get left around by my
imap server or other software, so notmuch new still repeatedly tries to
index junk whenever I run it.)
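The whitelist rule being asked for is easy to state in code. This is a hypothetical filter, not a notmuch option, that accepts only files directly inside a maildir's cur/ or new/ directory, so Dovecot's index and uidlist files would never be candidates for indexing.

```python
from pathlib import PurePosixPath


def looks_like_maildir_message(relpath):
    """True only for files directly inside a cur/ or new/ directory."""
    path = PurePosixPath(relpath)
    return len(path.parts) >= 2 and path.parent.name in ("cur", "new")
```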


Second, does anyone out there have a collection with more than a few
thousand maildirs?  I understand some people index millions of messages,
but even with tiny maildirs of a few hundred messages each, that's a
mere 10,000 maildirs.  So for that, who needs any kind of Xapian
indexing fanciness or dedicated XFOLDER prefix?  Scanning the entire set
of XDIRECTORY terms to apply an arbitrary predicate should take
negligible time.  So all you really need is a boolean search term
containing the directory docid (or with the existing schema the ability
to do a prefix search on XFDIRENTRYn:*).

Thanks,
David


What do people use for calendar invites?

2015-04-16 Thread dm-list-email-notmuch
I've been running into this issue lately where I agree to meet people
and we say it's confirmed, but if I don't send them a calendar invite of
MIME type text/calendar, then it's as if we never agreed and they don't
show up.  I get, "Oh, you never sent me a calendar invite, so it wasn't
in my calendar."

I'm wondering if others have this problem and have figured out the easiest
way to integrate notmuch with some kind of calendaring software that
generates ics or text/calendar attachments.

Thanks,
David


Enabling and disabling maildir.synchronize_flags

2015-07-01 Thread dm-list-email-notmuch
Sorry if this question is answered somewhere, but I'm wondering:  What
is the best way to enable and disable maildir.synchronize_flags?

It seems that disabling it should simply be safe.  But re-enabling, one
risks losing tags, as the next notmuch new will cause old maildir flags
to override the xapian database.  So that suggests something like:

   notmuch dump > backup
   notmuch config set maildir.synchronize_flags true
   # Do I need to run notmuch new here?
   notmuch restore < backup

Is that safe?  The man page suggests one additionally needs to run
notmuch new before running notmuch restore.  All of this is pretty slow.
Is there a more efficient way?

A one or two sentence clarification in the notmuch-config man page might
be helpful to people contemplating playing with this switch.  The
default is on, so I suspect it costs a lot of performance.  I've been
afraid to turn it off for fear that I won't be able to undo this
cleanly.

Thanks,
David


Re: Enabling and disabling maildir.synchronize_flags

2015-08-16 Thread dm-list-email-notmuch
David Bremner <da...@tethera.net> writes:

> David Mazieres <dm-list-email-notm...@scs.stanford.edu> writes:
>> So my question remains, what's the easiest safe way to re-enable
>> synchronize_flags after disabling it?  (Safe meaning it won't change any
>> tags.)  It could be that there's a very simple answer, in which case
>> sticking it in the man page might be nice.
>
> I can't think of a simple, safe, and fast answer.

Okay, thanks.  At least I wasn't missing something obvious.

> 2) when the lastmod changes go in, it seems like you could run the first
> notmuch new after enabling tag synchronizing, and dump only the tag
> changes since a checkpoint lastmod value. This would allow rolling
> back the unwanted tag changes.

Indeed, one of many reasons I'm eagerly awaiting lastmod changes.

> [1]: see this potential test, if for some reason we wanted to
> guarantee this behaviour.

If we did want this, I'm assuming it would take the form of a new option
to notmuch new (--override-flags) which says to do the synchronization
in the other direction (Xapian -> Maildir)?  There would be benefit to
having such a flag, but I don't know how hard it would be to implement,
so I can't do the cost/benefit analysis.

As a kind of aside, one reason people might want to synchronize flags is
for mobile device support.  I don't regularly access my email from my
mobile phone, but on those rare occasions when I might need to, I set up
an IMAP server and use an IMAP client on the phone.  I wonder if anyone
has thought about implementing an IMAP-ish server directly on top of
libnotmuch.  (I say IMAP-ish because the obvious SEARCH command
implementation wouldn't be RFC3501-compliant, but who cares when notmuch
has something better.)

Does anyone else use notmuch but also access email from a mobile
device?  If so what do you do?

David


Re: muchsync files renames

2015-08-31 Thread dm-list-email-notmuch
Amadeusz Żołnowski  writes:

>> So... based on all the evidence so far, the culprit seems to be that
>> something is moving mail files into your Spam folder on the client.
>> If that rings any bells and solves the problem, great.  If not, here
>> is what we need to do to track it down further.
>
> I have followed your hints to track down the issue.  All of these
> messages are spam.  Here is what I suspect.
>
> All of these files were placed in the new/ subdir by maildrop, and
> during the post-hook (afew) they were stripped of every tag except
> 'spam'; in particular, the 'unread' tag was removed, but the files
> still remain in the new/ subdir.  So... what must have happened is that
> during muchsync these messages were discovered to be already read, so
> they don't belong in new/ and must be moved to cur/.  And this is what
> happened on the client side.  During the next muchsync these changes
> had to be pushed to the server, i.e. the move from new/ to cur/.

Right.  Muchsync checks to see if maildir.synchronize_flags is true on
the client.  If it is, then muchsync calls
notmuch_message_tags_to_maildir_flags after setting the tags (which is
the same as what would happen if you set the tags manually with the
"notmuch tag" command).

A maildir file in the new/ directory can't carry any flags (except the
implicit unread flag, which is indicated by the absence of "S" at the
end of the filename).  So the notmuch_message_tags_to_maildir_flags()
function is renaming the file to the cur subdirectory, and then
propagating this rename back to the server.

The one thing I'm still unclear on is whether afew is running on the
client or the server.  If you are running it on the client, then this
makes sense.  If you are running it on the server, then somehow afew
must not be respecting the maildir.synchronize_flags setting.
Otherwise, the file should already have been moved to the cur directory
after having the unread tag stripped off on the server.  I guess the other
option is that your maildir.synchronize_flags is false on the server and
true on the client.

> So if my assumptions are correct, actually there is no issue!  I would
> just have to adjust afew filtering to prevent this behaviour.

Right.  You could have afew preserve the unread flag on spam.
Alternatively, you could just disable maildir.synchronize_flags on both
the client and server.  Finally, you could just accept the performance
penalty, as one would hope that this is a one-time thing and that
usually you don't have 5000 new spam messages every time you synchronize
your mail.

David


Re: muchsync files renames

2015-09-01 Thread dm-list-email-notmuch
David Bremner  writes:

> Amadeusz Żołnowski  writes:
>
>> What's more surprising is that there is a test case in the notmuch test
>> suite which tests whether, after a tag of a mail is modified, the mail is
>> moved from new/ to cur/. Yes, it should be moved on any tag modification
>> if I understand correctly. But it seems it does not for my maildirs...
>>
>
> If I understand the code correctly, this movement will only happen when
> one of the maildir-flag-equivalent tags is changed. I haven't dug back
> through the archives, but I think mutt uses presence in new/ as some
> kind of extra unseen state, so people requested not to move files until
> needed.

Can you explain how/where this is implemented?  I would like muchsync to
do exactly what notmuch does, and ideally without replicating its logic,
if I can just have libnotmuch handle this.  Currently, my code looks
something like this:

  notmuch_message_freeze (message);
  notmuch_message_remove_all_tags (message);
  for (size_t i = 0; i < ntags; i++)   /* tags[]: the new tag set */
    notmuch_message_add_tag (message, tags[i]);
  if (synchronize_tags)
    notmuch_message_tags_to_maildir_flags (message);
  notmuch_message_thaw (message);

And what we're finding is the above code causes the message to move from
new/ to cur/, while the "notmuch tag" command does not, even when
changing between the same before and after tag sets.

Any ideas?

Thanks,
David


Re: Problems with unicode characters under emacs and Xorg

2020-11-03 Thread dm-list-email-notmuch
Tomi Ollila  writes:

> Emacs versions involved ?

I'm using the latest version on Arch Linux, namely emacs 27.1-3.
Also, for what it's worth, "fc-list | wc -l" shows 4769 fonts installed
on my system.  Could that be too many if emacs does some sort of linear
search for characters?

Thanks,
David
___
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org


Re: Problems with unicode characters under emacs and Xorg

2020-11-02 Thread dm-list-email-notmuch
David Edmondson  writes:

> I haven't seen this. Threads with a lot of complex HTML content (lots of
> nested tables, for example) can take a long time to render for me, but
> that is generally interruptible.
>
> Could you share one of these messages, or a sufficiently similar test
> case?

Thanks for the reply.  I can send one of these emails to you privately
if necessary, as it might contain semi-sensitive information.  However,
I think all you need is the subject line.  For example:

Subject: =?UTF-8?B?RGF2aWQ6IEhvdy1UbyBIaXJlIHRoZSBCZXN0IFJlbW9kZWxpbmcgQ29udHJhY3RvciDwn5Od?=

That subject line alone is enough: any search returning that thread
triggers the problem.  When decoded, the subject line ends with Unicode
code point 0x1F4DD (MEMO).  Indeed, if I open up a fresh emacs and,
independent of notmuch, type "C-x 8 RET memo RET",
it causes the emacs to hang for a minute or so.

Arguably this is a limitation of emacs or fontconfig, or I've installed
too many fonts on my system, or I've installed too few fonts (because
after all that computation it just renders a box with hex digits 01F4DD
in it instead of showing the MEMO icon).  However, the problem only
happens with notmuch, because notmuch is the only emacs functionality I
need that renders anything other than a very limited set of unicode
characters.  So if there's any way either to workaround the problem, or
to copy whatever other notmuch users are doing (is there some particular
unicode font I should just install on my system?), I would be very
happy.

Thanks,
David


Re: Problems with unicode characters under emacs and Xorg

2020-11-02 Thread dm-list-email-notmuch
David Edmondson  writes:

> This works fine for me, and I get an appropriate character (not just the
> hex box).
>
> According to `describe-char' it's rendered using the Symbola font. Do
> you have that installed? (It's the "font-symbola" package on Debian I
> believe.)

I just installed the ttf-symbola package from AUR and ran fc-cache (not
sure if necessary).  Now the problem is completely gone.  Not only that,
but I even get the little memo symbol instead of a box with the hex code
point number.

Thank you so much!  This was driving me nuts for months.

David