[notmuch] Quick thoughts on a notmuch daemon

2009-12-08 Michiel Buddingh'
On Thu, 03 Dec 2009 14:27:05 -0800, Carl Worth  wrote:
> A simple solution would be a notmuch daemon that can accept commands on
> stdin, (in basically the exact same form as the current notmuch
> command-line interface). If the daemon does the job of periodically
> incorporating new mail, then the only command necessary to solve (1)
> above would be the tag command.

If you add a second pipe for notmuch to broadcast information about
events (such as new mail being indexed), you could farm out most of the
logic that would otherwise increasingly clutter up notmuch-new.c to an
external daemon.

Just the mailid and path would be enough for people to implement their
own tagging based on directory, Maildir flags or (for all I care)
Bayesian content filtering with relative ease.

Just a thought
-- 
Michiel Buddingh'
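A consumer of the event pipe proposed above could be quite small. This is a sketch under assumptions: notmuch has no such event stream, so the line format "<message-id> <path>" and the helper names folder_from_path and event_to_tag_command are hypothetical. The sketch turns each event into a `notmuch tag` command that tags the message with the name of the Maildir folder it lives in.

```c
#include <stdio.h>
#include <string.h>

/* Copy into 'out' the folder a message file lives in. For a Maildir path
 * such as "/mail/lists/notmuch/cur/1:2,S" that is "notmuch" (the directory
 * above cur/ or new/); otherwise it is the file's parent directory.
 * Returns 0 on success, -1 if no folder name can be found. */
static int
folder_from_path (const char *path, char *out, size_t outlen)
{
    const char *end = strrchr (path, '/');  /* slash before the file name */
    const char *start;
    size_t len;

    if (end == NULL || end == path)
        return -1;
    for (start = end; start > path && start[-1] != '/'; start--)
        ;
    len = (size_t) (end - start);

    /* Step over a cur/ or new/ component to reach the Maildir folder. */
    if (len == 3 && (strncmp (start, "cur", 3) == 0 ||
                     strncmp (start, "new", 3) == 0)) {
        if (start == path)
            return -1;
        end = start - 1;
        for (start = end; start > path && start[-1] != '/'; start--)
            ;
        len = (size_t) (end - start);
    }
    if (len == 0 || len >= outlen)
        return -1;
    memcpy (out, start, len);
    out[len] = '\0';
    return 0;
}

/* Turn one line of the hypothetical event stream, "<message-id> <path>",
 * into a notmuch tag command that tags the message with its folder name.
 * Returns 0 on success, -1 if the line is malformed or 'out' is too small. */
static int
event_to_tag_command (const char *line, char *out, size_t outlen)
{
    const char *sp = strchr (line, ' ');
    char path[4096], folder[256];
    size_t pathlen;
    int n;

    if (sp == NULL || sp == line)
        return -1;
    pathlen = strcspn (sp + 1, "\n");       /* path runs to end of line */
    if (pathlen == 0 || pathlen >= sizeof path)
        return -1;
    memcpy (path, sp + 1, pathlen);
    path[pathlen] = '\0';
    if (folder_from_path (path, folder, sizeof folder) != 0)
        return -1;
    n = snprintf (out, outlen, "notmuch tag +%s id:%.*s",
                  folder, (int) (sp - line), line);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

A daemon would call event_to_tag_command on each line it reads from the pipe (for instance with fgets) and run the result. A real tool would also have to quote message IDs and folder names containing shell or search-syntax metacharacters, which this sketch does not do.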


[notmuch] [PATCH] notmuch: Add Maildir directory name as tag name for messages

2009-12-06 Michiel Buddingh'

First of all, apologies for taking so long to get back to this.

On Fri, 27 Nov 2009, Carl Worth  wrote:
> The auto-detection is just three additional stats (at most) for each
> directory, right? That seems cheap enough to me.

If that's cheap enough, then I won't disagree with auto-detection.  
Jan Janak's patch seems to take most of the disk access cost out of it,
in any case.

> That seems orthogonal to me. Would the dovecot index files be easy to
> skip with a pattern-based blacklist?

Yes, and that's a much more elegant solution.
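A pattern-based blacklist of the kind Carl suggests could be as simple as matching each directory entry name against a list of shell-style patterns with POSIX fnmatch(). The helper name is_ignored and the pattern list are illustrative assumptions, not existing notmuch configuration; the dovecot names stand in for the index and bookkeeping files dovecot keeps in a folder root.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical ignore list: notmuch's own database directory plus
 * example dovecot index/bookkeeping files found in Maildir roots. */
static const char *const ignore_patterns[] = {
    ".notmuch",
    "dovecot.index*",
    "dovecot-uidlist",
    "dovecot-keywords",
    NULL
};

/* Return 1 if the directory entry 'name' matches any ignore pattern. */
static int
is_ignored (const char *name)
{
    for (const char *const *p = ignore_patterns; *p != NULL; p++) {
        if (fnmatch (*p, name, 0) == 0)
            return 1;
    }
    return 0;
}
```

The tree walk would call is_ignored on each entry->d_name before deciding whether to recurse into it or index it.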

> > I'll be happy to implement them, although I'd like for others to
> > chime in on the configure-as-Maildir vs. autodetect-Maildir issue.
> > And thanks for your patience in working through my patch.

I didn't mean to call a vote--rather to solicit the opinions of others
with possibly even more exotic mail storage configurations.

A new patch is attached.  Apologies for the rather verbose Maildir
handling logic, but I couldn't find a way to minimize the calls to
is_maildir that was both neat and readable.

-- 
Michiel

---
 notmuch-client.h |    1 +
 notmuch-new.c    |   93 +++--
 2 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/notmuch-client.h b/notmuch-client.h
index 50a30fe..7bc84a1 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -77,6 +77,7 @@ typedef struct {
 int saw_read_only_directory;
 int output_is_a_tty;
 int verbose;
+int tag_maildir;

 int total_files;
 int processed_files;
diff --git a/notmuch-new.c b/notmuch-new.c
index 9d20616..8742ab4 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -109,6 +109,60 @@ is_maildir (struct dirent **entries, int count)
 return 0;
 }

+/* Tag new mail according to its Maildir attribute flags.
+ *
+ * Test if the mail file's filename contains any of the
+ * standard Maildir attributes, and translate these to
+ * the corresponding standard notmuch tags.
+ *
+ * If the message is not marked as 'seen', or if no
+ * flags are present, tag as 'inbox, unread'.
+ */
+static void
+derive_tags_from_maildir_flags (notmuch_message_t *message,
+                                const char * path)
+{
+    int seen = FALSE;
+    int end_of_flags = FALSE;
+    size_t l = strlen(path);
+
+    /* Non-experimental message flags start with this */
+    char * i = strstr(path, ":2,");
+    i = (i) ? i : strstr(path, "!2,"); /* This format is used on VFAT */
+    if (i != NULL) {
+        i += 3;
+        for (; i < (path + l) && !end_of_flags; i++) {
+            switch (*i) {
+            case 'F':
+                notmuch_message_add_tag (message, "flagged");
+                break;
+            case 'R': /* replied */
+                notmuch_message_add_tag (message, "answered");
+                break;
+            case 'D':
+                notmuch_message_add_tag (message, "draft");
+                break;
+            case 'S': /* seen */
+                seen = TRUE;
+                break;
+            case 'T': /* trashed */
+                notmuch_message_add_tag (message, "deleted");
+                break;
+            case 'P': /* passed */
+                notmuch_message_add_tag (message, "forwarded");
+                break;
+            default:
+                end_of_flags = TRUE;
+                break;
+            }
+        }
+    }
+
+    if (i == NULL || !seen) {
+        tag_inbox_and_unread (message);
+    }
+}
+
 /* Examine 'path' recursively as follows:
  *
  *   o Ask the filesystem for the mtime of 'path' (path_mtime)
@@ -142,6 +196,7 @@ add_files_recursive (notmuch_database_t *notmuch,
 notmuch_status_t status, ret = NOTMUCH_STATUS_SUCCESS;
 notmuch_message_t *message = NULL;
 struct dirent **namelist = NULL;
+int maildir_detected = -1; /* -1 = unset */
 int num_entries;

 /* If we're told to, we bail out on encountering a read-only
@@ -189,13 +244,37 @@ add_files_recursive (notmuch_database_t *notmuch,
if (strcmp (entry->d_name, ".") == 0 ||
strcmp (entry->d_name, "..") == 0 ||
(entry->d_type == DT_DIR &&
-(strcmp (entry->d_name, "tmp") == 0) &&
-is_maildir (namelist, num_entries)) ||
-   strcmp (entry->d_name, ".notmuch") ==0)
+strcmp (entry->d_name, ".notmuch") == 0))
{
continue;
}

+
+        /* If this directory is a Maildir folder, we need to
+         * ignore any subdirectories marked tmp/, and scan for
+         * Maildir attributes on messages contained in the sub-
+         * directories 'new' and 'cur'. */
+        if (maildir_detected != 0 &&
+            entry->d_type == DT_DIR &&
+            ((strcmp (entry->d_name, "tmp") == 0) ||
+             (strcmp (entry->d_name, "new") == 0) ||
+             (strcmp (entry->d_name, "cur") == 0))) {
+
+            /* is_maildir scans the entire directory.  No need to
+             * do this more than once, if at all */
+            if (maildir_detected == -1) {
+

[notmuch] [PATCH] notmuch: Add Maildir directory name as tag name for messages

2009-11-26 Michiel Buddingh'
Carl Worth  wrote:
> > +" The other value is 'storage_type', which can currently be set to\n"
> > +" 'maildir' or 'none'.\n";
>
> This part of the patch I don't like. I've got a mail collection spanning
> over a decade, and it's seen a lot of strange things. Most of my mail is
> in maildir format, but not quite all of it. And I actually like the
> ability to just shove random new messages into the mail store manually
> without having to create a maildir name for it.
>
> So I don't think a global configuration makes sense here. Meanwhile,
> it's really easy to detect the presence of a maildir. Whenever we see
> child directories of "cur", "new", and "tmp" then we should turn on the
> processing of maildir flags for when processing mail in "cur" and "new".

I considered that approach; ideally, we could test for the presence of
all three of cur, tmp and new, but this is rather messy to do in the
current tree-walk structure.  Taking any one of them as proof positive of
a Maildir might lead to unpleasant surprises: it's not all that
inconceivable for someone to name a mail folder 'tmp'.
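Testing for all three subdirectories is easier from the parent directory than during the tree walk, and it stays within Carl's estimate of at most three extra stats per directory. A minimal sketch, using a hypothetical helper looks_like_maildir() that stats each candidate subdirectory:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 only if 'dir' contains cur/, new/ and tmp/ subdirectories.
 * Requiring all three means an ordinary mail folder that happens to be
 * named "tmp" is not mistaken for part of a Maildir. This costs at most
 * three stat() calls per directory. */
static int
looks_like_maildir (const char *dir)
{
    static const char *const subdirs[] = { "cur", "new", "tmp" };
    char path[4096];
    struct stat st;

    for (size_t i = 0; i < sizeof subdirs / sizeof subdirs[0]; i++) {
        int n = snprintf (path, sizeof path, "%s/%s", dir, subdirs[i]);
        if (n < 0 || (size_t) n >= sizeof path)
            return 0;                   /* path too long: treat as no */
        if (stat (path, &st) != 0 || !S_ISDIR (st.st_mode))
            return 0;
    }
    return 1;
}
```

Calling it at most once per directory and caching the answer, as the December patch does with its maildir_detected variable, keeps the cost bounded.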

There's another matter: some mail stores place (large) index files
in folder roots, i.e. one level above cur/, tmp/ and new/.  Looking
at the ones dovecot (an IMAP server) uses, I can make out a From header,
a Subject header, and a Message-ID as plain text in the first 100k or
so.  It's not all that inconceivable that notmuch might register such a
file as a 'real' email, with unpleasant consequences for the index.

I've seen some patches fly by that add support for multiple mail
stores.  Turning on Maildir support on a per-directory basis might
resolve that problem while still supporting heterogeneous mail archives
to some degree.  I am not convinced we can do the right thing
automatically without causing some grief to a subset of users.

> > @@ -257,7 +262,7 @@ notmuch_config_open (void *ctx,
> > talloc_free (email);
> > }
> >  }
> > -
> > +
> >  /* When we create a new configuration file here, we  add some
> >   * comments to help the user understand what can be done. */
> >  if (is_new) {
>
> [nit] Trailing whitespace inserted there as well.

> Hmm... I was going to say that git ships with a pre-commit hook you can
> turn on that checks for trailing whitespace and aborts the commit if
> it's present. But it looks like the currently shipping pre-commit.sample
> hook doesn't do this anymore.

Haven't tested it, but it seems you can put

[core]
whitespace = trailing-space,space-before-tab

into your ~/.gitconfig now.  I've also set emacs to mark trailing
whitespace with big red markers.

> OK, now we're into the meat of things. Clearly, you're directly
> supporting the documented flags of maildir. But we need to do a few
> things differently here. Most importantly, notmuch is already using an
> "unread" tag, so maildir's S flag should map to *that* rather than
> adding new "unseen" and "seen" flags. So messages with the S flag would
> not get the "unread" tag and messages without S would get the "unread"
> tag.

When writing the patch, I assumed there might be a minor (but important)
distinction between marking a mail 'seen' (i.e. the MUA storing the fact
that the file has been visited) and 'read' (i.e. the user marking the
contents of a mail as being read and understood).  As I found out later,
notmuch's interpretation of 'read' and 'unread' is the former, so there
is no distinction.

> The "flagged" and "replied" tags seem reasonable enough. But for
> "trashed" and "passed" I think I'd rather see the tag names as "deleted"
> and "forwarded". (Since I can imagine adding commands to notmuch for
> "delete" and "forward" but not for "trash" nor "pass").

Fair enough.

> Oh, and setting the "inbox" tag correctly here based on the maildir tags
> is the final and most important thing. It looks like that's missing from
> the above. So, a missing "S" flag should map to adding both the "inbox"
> and "unread" tags.

Makes sense, will do.
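Putting the review's mapping together (S controls "unread" and "inbox"; F, R, T and P become "flagged", "replied", "deleted" and "forwarded"), plus D for drafts as in the December patch and Dirk-Jan's VFAT separator, a table-driven sketch might look like the following. The function name and its interface (filling an array of tag names instead of calling notmuch_message_add_tag) are assumptions chosen so the sketch stands alone. Unlike the December patch, it skips unknown flag letters rather than stopping at them, and it uses "replied" where the patch uses "answered".

```c
#include <stddef.h>
#include <string.h>

/* Maildir flag letters and the notmuch tag each one maps to.
 * 'S' (seen) is handled separately: it suppresses "inbox" and "unread". */
static const struct {
    char flag;
    const char *tag;
} flag_tags[] = {
    { 'D', "draft" },
    { 'F', "flagged" },
    { 'P', "forwarded" },
    { 'R', "replied" },
    { 'T', "deleted" },
};

/* Store in 'tags' (room for 'max' entries) the tags implied by the info
 * suffix of a Maildir file name, ":2,FLAGS" (or "!2,FLAGS" on VFAT).
 * Messages without the S flag, or without any info suffix, get "inbox"
 * and "unread". Unknown flag letters are ignored. Returns the tag count. */
static size_t
maildir_flags_to_tags (const char *filename, const char **tags, size_t max)
{
    const char *info = strstr (filename, ":2,");
    size_t n = 0;
    int seen = 0;

    if (info == NULL)
        info = strstr (filename, "!2,");
    if (info != NULL) {
        for (const char *c = info + 3; *c != '\0'; c++) {
            if (*c == 'S') {
                seen = 1;
                continue;
            }
            for (size_t k = 0; k < sizeof flag_tags / sizeof flag_tags[0]; k++) {
                if (flag_tags[k].flag == *c && n < max)
                    tags[n++] = flag_tags[k].tag;
            }
        }
    }
    if (!seen) {
        if (n < max)
            tags[n++] = "inbox";
        if (n < max)
            tags[n++] = "unread";
    }
    return n;
}
```

Keeping the mapping in a table means renaming a tag, or adding a flag such as the "N" raised later in the thread, is a one-line change.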

> > +   if (state->storage_type == MAILDIR) {
> > +   char * leaf = basename(next);
>
> You could save the basename call by examining the leaf name when it is
> available as a standalone string up in the caller.

Which would require testing with S_ISDIR twice, which is uglier, but
essentially free, so I'll grant it's the better thing to do.

> So this patch is close, but needs a few fixes.

I'll be happy to implement them, although I'd like for others to chime
in on the configure-as-Maildir vs. autodetect-Maildir issue.  And thanks
for your patience in working through my patch.

-- 
Michiel Buddingh'


[notmuch] [PATCH 1/2] lib/message: Add function to get maildir flags.

2009-11-22 Michiel Buddingh'
Stefan Schmidt  wrote:
> > This function should interpret the flags that it finds and return a
> > suitable set of notmuch tags. I'd suggest that 'unread' messages get
> > both 'unread' and 'inbox' tags, as Maildir doesn't distinguish between
> > 'don't show this to me by default again please' and 'I've read this
> > message'. It seems best to hide the maildir-specific details inside the
> > library instead of exposing them.
>
> Thanks for the review. On second thought, the interface was really a bit
> ugly. :)
>
> I'm just back at my box, and going through the outstanding mails shows me
> that Michiel Buddingh has a more complete patch on the
> convert-maildir-flags-into-tags issue which Carl has tagged for review. I'll
> wait to see what comes out of it and whether anything is left for me to do. :)

Apologies.  In my haste to cover up my appalling and incorrect first patch, I
neglected to review the archives to see if someone had already done this. Sorry
for stealing your thunder.

Michiel


[notmuch] [PATCH] notmuch: Add Maildir directory name as tag name for messages

2009-11-22 Michiel Buddingh'
Dirk-Jan C. Binnema  wrote:
> Michiel> +
> Michiel> +static void
> Michiel> +derive_tags_from_maildir_flags (notmuch_message_t *message, const char *path)
>
> I see you don't handle the "N" -- is that deliberate? Also, a
> minor addition might be to also allow for '!' instead of ':' as a
> separator, as that's the semi-official way to use Maildirs on
> (V)FAT filesystems (which don't allow colons in filenames).

Not deliberate.  I was simply unaware of the "N" flag, and of the
practices for storing Maildirs on (V)FAT.

I've used only this document as a reference:
http://cr.yp.to/proto/maildir.html

Regards,
Michiel Buddingh'


[notmuch] [PATCH] notmuch: Add Maildir directory name as tag name for messages

2009-11-22 Michiel Buddingh'
Carl Worth  wrote:

> > A Maildir-aware notmuch could incorporate this to be far more
> > resistant to bulk mail moves done by other clients, by using
> > filename lookups to avoid accessing and parsing the mail
> > files themselves.

> I don't think opening a file to read out a message ID will ever be
> a bottleneck. But yes, we could take advantage of the unique name
> if we insisted that the storage have it.

I'm not so sure.  On traditional unix-like filesystems, every file 
access is another potential disk seek.  

People use a lot of different strategies for keeping their mail 
accessible and performant; you yourself, I gather, use non-semantic
directories mainly intended to keep the number of files per directory
low; others store their mail per month, or per year.

I might choose to move all my mail from 2008 from my INBOX to the 
folder Archive.2008; such moves may change the paths of thousands 
of messages at a time.

> Personally, I still regularly end up indexing stuff that's not 
> in proper maildir format, (like just manually copying a mail
> file "foo" down into a directory that I know isn't being delivered 
> to by any MDA but that I want notmuch to index.) Maybe that's just
> me, because I'm always bringing up little things for debugging, 
> etc. But it is convenient at least.

Oh, true.  And it occurs to me that notmuch is quite a sensible
companion tool for MH users (if they're still around in any numbers).

> Actually, I don't think that's true at all. Notmuch is definitely
> intended to become a lot more than it is right now. And if it's not
> making it easy for you to deal with mail the way you'd like to, then
> we definitely do want to look into expanding notmuch to be able to
> address that.

Thanks for your consideration; on the other hand, I do think that it
is a good idea not to make matters more complex than they need to be,
so I can certainly sympathise with the principles you've set for this
project.

regards,
Michiel Buddingh'


[notmuch] [PATCH] notmuch: Add Maildir directory name as tag name for messages

2009-11-22 Michiel Buddingh'

On Sun, 22 Nov 2009 05:04:56 +0100, Carl Worth  wrote:
> Hi Michel, welcome to Notmuch!

Thanks, and apologies for the accidental base64 encoding.

First things first:

>> In the mean time, I've made a smaller, hopefully more harmless 
>> patch to let 'notmuch new' mark messages stored in a Maildir 'cur'
>> folder as 'read' rather than 'unread'.
> 
> Can others who have more experience weigh in here? Will this do the
> right thing for you? Do mail clients wait to move things into "cur"
> after the user has actually read them, or is that just a place where
> they are moved when the MUA _receives_ them?

You're absolutely right, and I'm a fool (because I _knew_ this, but
forgot).  Maildir stores flags (seen, replied, flagged, trashed,
passed) in file names.

On the positive side, this allows us to map these flags onto tags,
at least for indexing (the patch at the bottom implements this), and,
if I can pry you away from your principles, later for modification
as well.

>> Any attempt to match tags up to directories will eventually have 
>> to deal with the fact that tags can't be neatly mapped onto
>> them.  If I remove a directory-tag from a message, does this 
>> mean the message is removed from that directory?  What if a 
>> message has two directory-tags, does it mean it's present in both
>> directories?
> 
> Right. We definitely don't want a strong mapping here.

I propose that the maildir 'storage_type' could make an exception for
standard Maildir flags.  It'll take relatively little effort to
special-case the above-mentioned flags, and it'd be a huge boon to
interoperability.

>> At the same time, this kind of interoperability would be highly
>> desirable to those of us who access their mail using other  
>> clients (webmail, mobile phones, etc.) that expect hierarchical
>> ordering.
> 
> That kind of thing is going to be "harder".
> 
> So far we're trying to stick with the principle that notmuch itself
> doesn't mess with the data store.

I respect your desire to stick to that principle.  But I also know 
that purity and simplicity, generally speaking, are unattainable
luxuries for most applications that handle mail.

> But then, we also want notmuch to be
> very scriptable, so someone might write a tool that uses notmuch search
> to export a set of hierarchical maildirs based on the tag names. (These
> could even just be populated with symlinks like mairix does.) So
> something like that could be really useful for integrating.

That is a very interesting idea.  On the other hand, interoperability
with Maildir mail stores is unlikely to be a corner case.  The MTA is
probably going to deliver new mail to a Maildir, procmail understands it,
etc.  I'd feel more comfortable relegating this integration to a 
scripted glue layer if I knew for certain such a glue layer would be
up to the task.
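Carl's idea of exporting tag-named folders populated with symlinks could start from a helper like the one below. It is a sketch under assumptions: link_into_tag_maildir is a hypothetical name, and the surrounding tool is expected to obtain the (tag, message file) pairs itself by querying notmuch; that plumbing is left out.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create <root>/<tag>/ with cur/, new/ and tmp/ if they do not exist yet,
 * then add a symlink <root>/<tag>/cur/<file name> pointing at 'msg_path'.
 * 'root' must already exist, and 'msg_path' should be absolute so the link
 * resolves from anywhere. Returns 0 on success, -1 on error. */
static int
link_into_tag_maildir (const char *root, const char *tag, const char *msg_path)
{
    static const char *const parts[] = { "", "/cur", "/new", "/tmp" };
    const char *leaf = strrchr (msg_path, '/');
    char buf[4096];
    int n;

    leaf = leaf ? leaf + 1 : msg_path;
    for (size_t i = 0; i < sizeof parts / sizeof parts[0]; i++) {
        n = snprintf (buf, sizeof buf, "%s/%s%s", root, tag, parts[i]);
        if (n < 0 || (size_t) n >= sizeof buf)
            return -1;
        if (mkdir (buf, 0700) != 0 && errno != EEXIST)
            return -1;
    }
    n = snprintf (buf, sizeof buf, "%s/%s/cur/%s", root, tag, leaf);
    if (n < 0 || (size_t) n >= sizeof buf)
        return -1;
    /* An existing link of the same name is treated as already exported. */
    if (symlink (msg_path, buf) != 0 && errno != EEXIST)
        return -1;
    return 0;
}
```

Because each link keeps the original file name, including any ":2,FLAGS" suffix, a Maildir client reading the exported folder sees the same flags. If that client changes a flag, it renames the link, not the original message file.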

> I'm very much of the opinion that the user shouldn't care at all about
> the storage of the actual mail files that notmuch indexes.

The user certainly shouldn't, but I'm not sure that notmuch can remain
as agnostic about the actual storage of messages as planned.

Another thing: notmuch currently indexes by message-id (or a SHA-1 hash
if that is not present).  Maildir, on the other hand, uses file names
as stable (when stripped of the parts trailing the colon) unique 
(knock on wood) identifiers.  A Maildir-aware notmuch could incorporate
this to be far more resistant to bulk mail moves done by other clients,
by using filename lookups to avoid accessing and parsing the mail 
files themselves.
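The stable identifier described here is the part of the file name before the info suffix. A sketch of extracting it, with a hypothetical function name; notmuch itself keys messages by message-id, so this only illustrates what a Maildir-aware lookup could use:

```c
#include <string.h>

/* Copy the Maildir unique name of the message file at 'path' into 'out':
 * the file name with its info suffix (":2,FLAGS", or "!2,FLAGS" on VFAT)
 * removed. Returns 0 on success, -1 if the name is empty or does not fit
 * in 'out'. */
static int
maildir_unique_name (const char *path, char *out, size_t outlen)
{
    const char *leaf = strrchr (path, '/');
    const char *info;
    size_t len;

    leaf = leaf ? leaf + 1 : path;
    info = strchr (leaf, ':');
    if (info == NULL)
        info = strstr (leaf, "!2,");
    len = info ? (size_t) (info - leaf) : strlen (leaf);
    if (len == 0 || len >= outlen)
        return -1;
    memcpy (out, leaf, len);
    out[len] = '\0';
    return 0;
}
```

For example, a message moved from INBOX/new/ to Archive.2008/cur/ and marked as replied along the way still yields the same unique name, so a table keyed on it could recognize the move without opening the file.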

I should re-iterate that I'm new to notmuch, and it's obvious that I'm
trying to beat it into becoming something it was never intended to be;
on the other hand, I'd like people to chime in on this.

again via webmail,
Michiel Buddingh'

---
 notmuch-client.h |   10 ++
 notmuch-config.c |   33 +--
 notmuch-new.c    |   95 +++--
 3 files changed, 124 insertions(+), 14 deletions(-)

diff --git a/notmuch-client.h b/notmuch-client.h
index ea77686..c39be06 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -69,12 +69,16 @@
 #define STRNCMP_LITERAL(var, literal) \
 strncmp ((var), (literal), sizeof (literal) - 1)

+enum storage_type { UNSET, NONE, MAILDIR };
+
 typedef void (*add_files_callback_t) (notmuch_message_t *message);

 typedef struct {
 int ignore_read_only_directories;
 int saw_read_only_directory;

+enum storage_type storage_type;
+
 int total_files;
 int processed_files;
 int added_messages;
@@ -179,7 +183,13 @@ notmuch_config_set_user_other_email (notmuch_config_t *config,
 const char *other_email[],
 size_t length);

+enum storage_type
+notmuch_config_get_storage_type (notmuch_