[notmuch] [PATCH] Add post-add and post-tag hooks

2009-12-22 Thread Olly Betts
Tomas Carnecky writes:
> #if defined(__sun__)
>   ... sprintf, stat etc
> #else
>   (void) path;
>   return dirent->d_type == DT_DIR;
> #endif

Rather than a platform-specific check, it would be better to check if DT_DIR
is defined.

Beware that even on Linux (where the d_type field is present), it may always
contain DT_UNKNOWN for some filesystems, so you really should check for that
case and fall back to using stat() instead.

Cheers,
Olly



[notmuch] Missing messages breaking threads

2009-12-22 Thread Olly Betts
Carl Worth writes:
> We don't have any concept of versioning yet, but it would obviously be
> easy to have a new version document with an increasing integer.

Adding a magic document for this isn't ideal as you have to make sure
it can't appear in search results, etc.

This is just the sort of thing which Xapian's "user metadata" is there
for.  It's essentially a key/value store which is versioned along with
the rest of the Xapian database.  So to set it:

  database.set_metadata("version", "1");

And to read (and default if not set):

  string version = database.get_metadata("version");
  if (version.empty()) version = "0";

Cheers,
   Olly



[notmuch] [PATCH] Add an "--output=(json|text|)" command-line option to both notmuch-search and notmuch-show.

2009-12-22 Thread Carl Worth
On Fri, 18 Dec 2009 20:36:34 -0400, David Bremner  wrote:
> It's a detail, but could you choose two names that are not substrings of
> each other?  Eventually we do want tab completion on the command line to
> work :).

Yes, that's a good point.

> Also, "search --for tags foo" suggests to me that
> searching for tags matching foo.  What about using --output for that?

OK. "--output" sounds good to me here.

> One thing that is not completely clear to me at this point is what the
> difference is between 
> 
> notmuch search --for messages  search-terms
> 
> and 
> 
> notmuch show search-terms

So, "notmuch show " is clear enough---it works as it does
today.

The new command, ("notmuch search --output=messages"), would be quite
different. It would have single-line output for each message, (as
"notmuch search" already has single-line output, but for threads by
default). You can see behavior like this in the "notmuch
search-messages" command for which I sent a patch a while ago, (but have
never merged).

The idea is that "notmuch search" would always give single-line output
suitable for various kinds of processing.

For example: how much mail have I sent?

notmuch search --output=messages tag:sent | wc -l

That's something you can't do with a thread-based search, (and it's not
convenient to get a robust result from "notmuch show").

Once this is combined with a new --format to select what gets printed, I
can imagine a lot of useful things, like collecting email addresses:

notmuch search --output=messages --format="${FROM}" to:cworth at cworth.org

or whatever. I can imagine a lot of different queries I'd like to be
able to make of my mail store with things like this.

-Carl


[notmuch] [PATCH] JSON output for notmuch-search and notmuch-show.

2009-12-22 Thread Carl Worth
On Fri, 18 Dec 2009 10:47:33 -0800, Scott Robinson wrote:
> Resubmit a full patch, or submit another one on top of it?

A new full patch would be great, thanks!

-Carl


[notmuch] Missing messages breaking threads

2009-12-22 Thread Carl Worth
On Tue, 22 Dec 2009 22:48:25 + (UTC), Olly Betts  wrote:
> This is just the sort of thing which Xapian's "user metadata" is there
> for.  It's essentially a key/value store which is versioned along with
> the rest of the Xapian database.  So to set it:
> 
>   database.set_metadata("version", "1");
> 
> And to read (and default if not set):
> 
>   string version = database.get_metadata("version");
>   if (version.empty()) version = "0";

Thanks, Olly!

That is exactly what we'll want here, and is much better than a magic
document.

-Carl (grateful to have a Xapian expert keeping watch on the list)


[notmuch] [PATCH] Add post-add and post-tag hooks

2009-12-22 Thread Tomas Carnecky
On 12/22/09 3:56 AM, Tomas Carnecky wrote:
> The post-add hook is run by 'notmuch new' after each new message is added,
> post-tag is run after a tag has been added or removed. The hooks are stored
> in the user's home directory (~/.notmuch/hooks/).
>
> Since post-tag is run unconditionally every time a tag is added or removed,
> that means it is also invoked when 'notmuch new' adds the two implicit
> tags (inbox, unread). So make sure your scripts don't choke on that and can
> both be executed in parallel.

What are these good for? I (try to) use these two hooks to tag messages
automatically. Not in the usual way with static scripts, though: I use a
spam filter. I hope to be able to teach it to classify the messages, not
only as spam/ham but also with tags such as patch (does that message
contain a patch?), or with tags based on which mailing list a message
belongs to, etc.

I use dspam as the spam filter. Each tag is actually a virtual user that
exists in dspam. When adding new messages, dspam classifies the mails and
I assign the tags based on the result. If dspam deems the message Spam,
I set the tag. To train dspam I use the post-tag hook: whenever I
change a tag (for example, add 'spam' to a spam message that went
unrecognized), the post-tag hook retrains dspam.

Since the post-add hook runs synchronously with 'notmuch new', this adds
quite a bit of overhead. How much time it adds to the import of new
messages depends on how fast the spam filter is and on how many tags you
want to assign: dspam has to run once for each tag to see whether the tag
should be assigned or not.

tom

--- >8 --- post-add
#!/bin/bash

# This is so that the post-tag doesn't trigger retraining!
export NOTMUCH_POST_ADD=1

MESSAGEID=$1
FILENAME=$2

# Array of tags.
tags=( spam )
for tag in "${tags[@]}"; do
 RESULT="$(/opt/dspam/bin/dspam --user "$tag" --deliver=summary < "$FILENAME")"

 if echo "$RESULT" | grep -q 'result="Spam";'; then
  echo "$tag"
 fi
done

# I remove the inbox flag from all new messages and keep only 'unread'
echo "-inbox"
--- >8 ---

--- >8 --- post-tag
#!/bin/sh

if [ "$NOTMUCH_POST_ADD" ]; then
 echo "Exiting due to running in post-add"
 exit
fi

MESSAGEID=$1
FILENAME=$2
TAG=$3
ADDREMOVE=$4

if [ "x$ADDREMOVE" = "xadded" ]; then
 CLASS="spam"
else
 CLASS="innocent"
fi

/opt/dspam/bin/dspam --user "$TAG" --source=error --class="$CLASS" < "$FILENAME"
--- >8 ---
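
The post-add script above prints one tag per line, optionally prefixed with
'+' or '-'. A small sketch of how such output could be split into add/remove
operations follows. The struct tag_op type and the parse_hook_output()
function are hypothetical names used only for this illustration; they are not
part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one line of hook output. */
struct tag_op {
    const char *tag;   /* points into the caller's buffer */
    int add;           /* 1 = add the tag, 0 = remove it */
};

/* Split `buffer` (modified in place) into at most `max_ops` operations.
 * A leading '+' or no prefix means "add", a leading '-' means "remove".
 * Empty lines are skipped. Returns the number of operations stored. */
static size_t
parse_hook_output (char *buffer, struct tag_op *ops, size_t max_ops)
{
    size_t count = 0;
    char *line = buffer;

    assert (buffer != NULL && ops != NULL);

    while (line != NULL && *line != '\0' && count < max_ops) {
        char *end = strchr (line, '\n');
        int add = 1;

        /* Terminate the current line so it can be used as a string. */
        if (end != NULL)
            *end = '\0';

        if (*line == '+' || *line == '-') {
            add = (*line == '+');
            line++;
        }

        if (*line != '\0') {
            ops[count].tag = line;
            ops[count].add = add;
            count++;
        }

        line = (end != NULL) ? end + 1 : NULL;
    }

    return count;
}
```

Fed the text "spam\n-inbox\n", this yields two operations: add "spam" and
remove "inbox". Collecting the operations first and applying them afterwards
keeps the parsing independent of the notmuch tagging calls.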



[notmuch] [PATCH] Add post-add and post-tag hooks

2009-12-22 Thread Tomas Carnecky
The post-add hook is run by 'notmuch new' after each new message is added,
post-tag is run after a tag has been added or removed. The hooks are stored
in the user's home directory (~/.notmuch/hooks/).

Since post-tag is run unconditionally every time a tag is added or removed,
that means it is also invoked when 'notmuch new' adds the two implicit
tags (inbox, unread). So make sure your scripts don't choke on that and can
both be executed in parallel.

Signed-off-by: Tomas Carnecky 
---
 lib/message.cc |   45 ++
 notmuch-new.c  |   66 
 2 files changed, 111 insertions(+), 0 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 49519f1..bcd8abb 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -664,6 +664,47 @@ _notmuch_message_remove_term (notmuch_message_t *message,
 return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }

+/* Run the post-tag hook */
+static void
+post_tag_hook (notmuch_message_t *message, const char *tag, int added)
+{
+/* Skip tags that notmuch itself assigns to new messages */
+const char *skip[] = {
+"inbox", "unread"
+};
+
+for (int i = 0; i < sizeof (skip) / sizeof (skip[0]); ++i) {
+if (strcmp(skip[i], tag) == 0)
+return;
+}
+
+char proc[PATH_MAX];
+snprintf (proc, PATH_MAX, "%s/.notmuch/hooks/post-tag", getenv("HOME"));
+if (access (proc, X_OK))
+return;
+
+int pid = fork ();
+if (pid == -1)
+return;
+
+/* Wait for the hook to finish. This behaviour might be changed in the
+ * future, but for now I think it's better to take the safe route. */
+if (pid > 0) {
+waitpid (pid, NULL, 0);
+return;
+}
+
+const char *filename = notmuch_message_get_filename (message);
+const char *message_id = notmuch_message_get_message_id (message);
+
+const char *args[] = {
+proc, message_id, filename, tag, added ? "added" : "removed", NULL
+};
+
+execv (proc, (char *const *) args);
+exit (0);
+}
+
 notmuch_status_t
 notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
 {
@@ -684,6 +725,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
 if (! message->frozen)
_notmuch_message_sync (message);

+post_tag_hook (message, tag, 1);
+
 return NOTMUCH_STATUS_SUCCESS;
 }

@@ -707,6 +750,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag)
 if (! message->frozen)
_notmuch_message_sync (message);

+post_tag_hook (message, tag, 0);
+
 return NOTMUCH_STATUS_SUCCESS;
 }

diff --git a/notmuch-new.c b/notmuch-new.c
index 837ae4f..d984aae 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -42,6 +42,71 @@ handle_sigint (unused (int sig))
 interrupted = 1;
 }

+/* Run the post-add hook. The hook is given the chance to specify additional tags
+ * that should be added to the message. The hook writes the tags to its stdout,
+ * separated by a newline. The script's stdout is redirected to a pipe so that
+ * notmuch can process its output. The tags can be prefixed with '+' or '-' to
+ * indicate if the tag should be added or removed. Absence of one of these prefixes
+ * means that the tag will be added. */
+static void
+post_add_hook (notmuch_message_t *message)
+{
+char proc[PATH_MAX];
+snprintf (proc, PATH_MAX, "%s/.notmuch/hooks/post-add", getenv ("HOME"));
+if (access (proc, X_OK))
+return;
+
+/* The pipe between the hook and the notmuch process. The script writes
+ * into fds[1] (the write end), notmuch reads from fds[0] (the read end). */
+int fds[2];
+if (pipe (fds))
+   return;
+
+int pid = fork ();
+if (pid == -1) {
+   close (fds[0]);
+   close (fds[1]);
+   return;
+} else if (pid > 0) {
+   close (fds[1]);
+   waitpid (pid, NULL, 0);
+
+   char buffer[256] = { 0, };
+   read (fds[0], buffer, sizeof (buffer) - 1);
+
+   char *tag;
+   for (tag = buffer; tag && *tag; ) {
+   char *end = strchr (tag, '\n');
+   if (end)
+   *end = 0;
+
+   if (tag[0] == '+')
+   notmuch_message_add_tag (message, tag + 1);
+   else if (tag[0] == '-')
+   notmuch_message_remove_tag (message, tag + 1);
+   else
+   notmuch_message_add_tag (message, tag);
+
+   tag = end ? end + 1 : end;
+   }
+
+   return;
+}
+
+/* This is the child process (where the hook runs) */
+close (fds[0]);
+dup2 (fds[1], 1);
+
+const char *filename = notmuch_message_get_filename (message);
+const char *message_id = notmuch_message_get_message_id (message);
+const char *args[] = {
+   proc, message_id, filename, NULL
+};
+
+execv (proc, (char *const *) args);
+exit (0);
+}
+
 static void
 tag_inbox_and_unread (notmuch_message_t *message)
 {
@@ -253,6 +318,7 @@ add_files_recursive (notmuch_database_t *notmuch,
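
For reference, a self-contained sketch of the fork/pipe/exec pattern the
post-add hook relies on. The function run_and_capture() is a hypothetical
helper written for this illustration, not code from the patch. It assumes
the usual POSIX convention that fds[0] is the read end of a pipe and fds[1]
the write end, and it drains the pipe before reaping the child:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `prog` with `argv`, capture up to len-1 bytes
 * of its stdout into `buf` (always NUL-terminated), and return the number
 * of bytes captured, or -1 if the pipe or the fork failed. */
static ssize_t
run_and_capture (const char *prog, char *const argv[], char *buf, size_t len)
{
    int fds[2];    /* fds[0]: read end, fds[1]: write end */
    pid_t pid;
    size_t used = 0;
    ssize_t n;

    assert (prog != NULL && buf != NULL && len > 0);

    if (pipe (fds) != 0)
        return -1;

    pid = fork ();
    if (pid == -1) {
        close (fds[0]);
        close (fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: send stdout into the write end of the pipe. */
        close (fds[0]);
        dup2 (fds[1], STDOUT_FILENO);
        close (fds[1]);
        execv (prog, argv);
        _exit (127);    /* only reached if execv failed */
    }

    /* Parent: read until EOF or until the buffer is full, and only then
     * reap the child, so a hook that prints a lot cannot block forever
     * on a full pipe. */
    close (fds[1]);
    while (used < len - 1 &&
           (n = read (fds[0], buf + used, len - 1 - used)) > 0)
        used += (size_t) n;
    buf[used] = '\0';
    close (fds[0]);
    waitpid (pid, NULL, 0);

    return (ssize_t) used;
}
```

Called with "/bin/sh" and the arguments { "/bin/sh", "-c", "echo spam", NULL },
the buffer would hold the single line "spam". Output beyond the buffer size is
simply dropped in this sketch; a real hook runner might want to grow the buffer
instead.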
   

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch

