[PATCH 0/3] Speed up notmuch new for unchanged directories

2012-06-25 Thread Austin Clements
Quoth Sascha Silbe on Jun 26 at 12:13 am:
> Austin Clements  writes:
> > On Sun, 24 Jun 2012, Sascha Silbe  wrote:
> 
> ["notmuch new" listing every directory, even if it's unchanged]
> > I haven't looked over your patches yet, but this result surprises me.
> > Could you explain your setup a little more?  How much mail do you have
> > and across how many directories?  What file system are you using?
> 
> As mentioned in passing already, I have a total of about 900k unique
> mails (sometimes several copies of them, received over different paths,
> e.g. mailing list and a direct CC). Most of that is "old" mails, in
> directories that are not getting updated. If notmuch would support mbox,
> I'd use that instead for those old mails. The total number of
> directories in the mail store is about 29k and the total number of files
> (including the git repository and mbox files that sup used) is about
> 1.25M.
> 
> Since a housekeeping job last weekend, the number of mails in
> directories that are still getting updated is about 4k, i.e. about 5‰ of
> the total number of mails or 3‰ of the total number of files. The number
> of directories getting updated is 104, i.e. about 4‰ of the total number
> of directories.
> 
> Ideally, we'd get the run-time of "notmuch new" down by a similar
> factor. With just plain POSIX and no additional information that won't
> be possible, but providing a way to channel information about updates
> into notmuch (rather than having it scan everything over and over again)
> should help. That information is already available as output from the
> mail fetching process (rsync in my case). Of course, it would be purely
> optional: "notmuch new" without additional information would simply
> continue to scan everything.

This would be great.  I've been thinking along similar lines for a
while (in my case, I want to feed notmuch new from inotify), though I
haven't written any code for it.
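
For illustration only, here is a minimal sketch of how such update
information might be reduced to a rescan list. It assumes the fetcher
(rsync, an inotify watcher, or anything else) can hand over one changed
file path at a time; rescan_set_t, parent_directory and
rescan_set_add_file are hypothetical names, not part of notmuch, and
the fixed-size limits are arbitrary.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRS 64        /* arbitrary limit for this sketch */
#define MAX_PATH_LEN 256   /* arbitrary limit for this sketch */

/* Hypothetical container: the unique directories that need a rescan. */
typedef struct {
    char dirs[MAX_DIRS][MAX_PATH_LEN];
    size_t count;
} rescan_set_t;

/* Copy the directory part of 'path' into 'out'.
 * Returns 0 on success, -1 if 'path' has no '/' or does not fit. */
static int
parent_directory (const char *path, char *out, size_t out_size)
{
    const char *slash = strrchr (path, '/');
    size_t len;

    if (slash == NULL)
        return -1;
    len = (size_t) (slash - path);
    if (len == 0)
        len = 1;           /* file directly under "/" */
    if (len >= out_size)
        return -1;
    memcpy (out, path, len);
    out[len] = '\0';
    return 0;
}

/* Record the directory containing 'changed_file', once.
 * Returns 0 on success (including duplicates), -1 on error. */
static int
rescan_set_add_file (rescan_set_t *set, const char *changed_file)
{
    char dir[MAX_PATH_LEN];
    size_t i;

    assert (set != NULL && changed_file != NULL);
    if (parent_directory (changed_file, dir, sizeof (dir)) != 0)
        return -1;
    for (i = 0; i < set->count; i++)
        if (strcmp (set->dirs[i], dir) == 0)
            return 0;      /* already queued for rescanning */
    if (set->count == MAX_DIRS)
        return -1;
    strcpy (set->dirs[set->count++], dir);
    return 0;
}
```

A caller would feed each changed path through rescan_set_add_file and
then scan only the directories in the set, falling back to the full
scan when no list is supplied.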

> > I'm also surprised that your new approach helps.  This directory listing
> > has to be read off disk one way or the other, but listing directories is
> > the bread-and-butter of file systems, whereas I would think that Xapian
> > would require more IO to accomplish the same effect.
> 
> "notmuch new" needs to iterate over a list of all directories to find
> those with new mails (and potentially new subdirectories). However, it
> does not need to list the *contents* of those folders. I'm surprised as
> well, but rather in the opposite direction: Based on a naive
> calculation, we'd expect to see a speedup on the order of
> (1.25M+29k)/29k = 44. The actual results suggest that stat()ing (done
> 29k times both before and after the patch) is taking about 19 times as
> long as listing a directory entry (before the patch we listed 1M
> entries, now we list none if nothing has changed). (*)

For a cold cache, these aren't the numbers that matter.  With an HDD
and how few files your directories contain on average, only seeks will
matter.  I would expect your workload without your patch to have at
least 1 but closer to 2 seeks per directory: one to stat the directory
and one to get the directory contents block.  Some of the stat seeks
will be eliminated by the buffer cache, even starting cold, because of
inode locality (absolute best case is 16x reduction, but if you
created the directories over time, then this locality is probably
quite poor).  There are a few other potential seeks to get the
directory document from Xapian and to get its mtime value, but those
should exhibit strong locality, so they probably don't contribute
much.  NewEgg says your drive has an average seek time of 8.9ms, so
with 29k directories and assuming your directories are sequential on
disk, that's at least 258s and closer to 516s, which agrees with your
benchmark results.
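
A rough check of the arithmetic in this exchange, as a sketch: the
helper names are invented for illustration, and the rounded figures
from the thread (29k directories, 1.25M files, 8.9ms per seek) are
taken as exact inputs, which is an assumption.

```c
#include <assert.h>

/* Time in seconds for a cold-cache scan that is dominated by seeks. */
static double
seek_bound_seconds (long directories, double seeks_per_directory,
                    double seek_time_ms)
{
    assert (directories >= 0 && seeks_per_directory >= 0
            && seek_time_ms >= 0);
    return directories * seeks_per_directory * seek_time_ms / 1000.0;
}

/* Naive speedup if only the per-directory work remains after the
 * per-file listing work is eliminated. */
static double
naive_speedup (double files, double directories)
{
    assert (directories > 0);
    return (files + directories) / directories;
}
```

Calling seek_bound_seconds (29000, 1, 8.9) and
seek_bound_seconds (29000, 2, 8.9) brackets the estimate between one
and two seeks per directory, and naive_speedup (1.25e6, 29e3)
reproduces the quoted ratio.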

I'm surprised by your results because I would expect your workload
with your patches to exhibit about the same number of seeks: one to
stat the directory (same as before) and one for
notmuch_directory_get_child_files, which has to seek in the term index
to get the child directories.  My guess is that this exhibits better
locality because the child directory terms are stored contiguously in
the database's key space (though not necessarily sequentially on disk
unless this is a fresh database).

Unfortunately, I'm not sure of a good way to test this hypothesis.
Any thoughts?


[RFC PATCH 14/14] new: Add scan support for mbox:// URIs

2012-06-25 Thread Ethan Glasser-Camp
A lot of code is duplicated from maildir, I don't think I handled all
those errors correctly, and I didn't report any progress.

Signed-off-by: Ethan Glasser-Camp 
---
 notmuch-new.c |  299 +++--
 1 file changed, 289 insertions(+), 10 deletions(-)

diff --git a/notmuch-new.c b/notmuch-new.c
index 1bf4e25..36fee34 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -19,6 +19,7 @@
  */

 #include "notmuch-client.h"
+#include 

 #include 

@@ -239,16 +240,6 @@ _entry_in_ignore_list (const char *entry, add_files_state_t *state)
 return FALSE;
 }

-/* Call out to the appropriate add_files function, based on the URI. */
-static notmuch_status_t
-add_files_uri (unused(notmuch_database_t *notmuch),
-  unused(const char *uri),
-  unused(add_files_state_t *state))
-{
-/* Stub for now */
-return NOTMUCH_STATUS_SUCCESS;
-}
-
 /* Progress-reporting function.
  *
  * Can be used by any mailstore-crawling function that wants to alert
@@ -674,6 +665,294 @@ add_files (notmuch_database_t *notmuch,
 return ret;
 }

+/* Scan an mbox file for messages.
+ *
+ * We assume that mboxes grow monotonically only.
+ *
+ * The mtime of the mbox file is stored in a "directory" document in
+ * Xapian.
+ */
+static notmuch_status_t
+add_messages_mbox_file (notmuch_database_t *notmuch,
+   const char *path,
+   add_files_state_t *state)
+{
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS, status;
+struct stat st;
+time_t fs_mtime, db_mtime, stat_time;
+FILE *mbox;
+char *line, *path_uri = NULL, *message_uri = NULL;
+int line_len;
+size_t offset, end_offset, line_size = 0;
+notmuch_directory_t *directory;
+int content_length = -1, is_headers;
+
+if (stat (path, &st)) {
+   fprintf (stderr, "Error reading mbox file %s: %s\n",
+path, strerror (errno));
+   return NOTMUCH_STATUS_FILE_ERROR;
+}
+
+stat_time = time (NULL);
+if (! S_ISREG (st.st_mode)) {
+   fprintf (stderr, "Error: %s is not a file.\n", path);
+   return NOTMUCH_STATUS_FILE_ERROR;
+}
+
+fs_mtime = st.st_mtime;
+
+path_uri = talloc_asprintf (notmuch, "mbox://%s", path);
+status = notmuch_database_get_directory (notmuch, path_uri, &directory);
+if (status) {
+   ret = status;
+   goto DONE;
+}
+db_mtime = directory ? notmuch_directory_get_mtime (directory) : 0;
+
+if (directory && db_mtime == fs_mtime) {
+   goto DONE;
+}
+
+mbox = fopen (path, "r");
+if (mbox == NULL) {
+   fprintf (stderr, "Error: couldn't open %s for reading.\n",
+path);
+   ret = NOTMUCH_STATUS_FILE_ERROR;
+   goto DONE;
+}
+
+line_len = getline (&line, &line_size, mbox);
+
+if (line_len == -1) {
+   fprintf (stderr, "Error: reading from %s failed: %s\n",
+path, strerror (errno));
+   ret = NOTMUCH_STATUS_FILE_ERROR;
+   goto DONE;
+}
+
+if (strncmp (line, "From ", 5) != 0) {
+   fprintf (stderr, "Note: Ignoring non-mbox file: %s\n",
+path);
+   ret = NOTMUCH_STATUS_FILE_ERROR;
+   goto DONE;
+}
+free(line);
+line = NULL;
+
+/* Loop invariant: At the beginning of the loop, we have just read
+ * a From_ line, but haven't yet read any of the headers.
+ */
+while (! feof (mbox)) {
+   is_headers = 1;
+   offset = ftell (mbox);
+   content_length = -1;
+
+   /* Read lines until we either get to the next From_ header, or
+* we find a Content-Length header (mboxcl) and we run out of headers.
+*/
+   do {
+   /* Get the offset before we read, in case we got another From_ header. */
+   end_offset = ftell (mbox);
+
+   line_len = getline (&line, &line_size, mbox);
+
+   /* Check to see if this line is a content-length header,
+* or the end of the headers. */
+   if (is_headers && strncasecmp (line, "Content-Length: ",
+  strlen ("Content-Length: ")) == 0)
+   content_length = strtol (line + strlen ("Content-Length: "),
+NULL, 10);
+
+   if (is_headers && strlen (line) == 1 && *line == '\n') {
+   is_headers = 0;
+   /* If we got a content_length, skip the message body. */
+   if (content_length != -1) {
+   fseek (mbox, content_length, SEEK_CUR);
+   line_len = getline (&line, &line_size, mbox);
+
+   /* We should be at the end of the message.  Sanity
+* check: there should be a blank line, and then
+* another From_ header. */
+   if (strlen (line) != 1 || *line != '\n') {
+   fprintf (stderr, "Warning: message with Content-Length not "
+"immediately

[RFC PATCH 13/14] Tests for mbox support

2012-06-25 Thread Ethan Glasser-Camp
These need to be improved, rather than hard-coding byte offsets.

Signed-off-by: Ethan Glasser-Camp 
---
 test/mbox |   59 +
 test/notmuch-test |1 +
 2 files changed, 60 insertions(+)
 create mode 100755 test/mbox

diff --git a/test/mbox b/test/mbox
new file mode 100755
index 000..f03f887
--- /dev/null
+++ b/test/mbox
@@ -0,0 +1,59 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+test_description='basic mbox support'
+. ./test-lib.sh
+
+mkdir -p $MAIL_DIR/some-mboxes/subdir $MAIL_DIR/database $MAIL_DIR/corpus
+
+# The Content-Length headers here include the final newline (added later).
+generate_message '[body]="Mbox message 1."' '[header]="Content-Length: 16"' "[dir]=corpus"
+generate_message '[body]="Mbox message 2. Longer."' '[header]="Content-Length: 24"' "[dir]=corpus"
+generate_message '[body]="Mbox message 3."' "[dir]=corpus"
+generate_message '[body]="Mbox message 4."' "[dir]=corpus"
+generate_message '[body]="Mbox message 5. Last message."' '[header]="Content-Length: 30"' "[dir]=corpus"
+
+MBOX1=$MAIL_DIR/some-mboxes/first.mbox
+for x in $MAIL_DIR/corpus/*; do
+echo "From MAILER-DAEMON Sat Jan  3 01:05:34 1996" >> $MBOX1
+cat $x >> $MBOX1
+# Final newline
+echo >> $MBOX1
+done
+
+notmuch config set database.path $MAIL_DIR/database
+notmuch config set new.scan mbox://$MAIL_DIR/some-mboxes
+
+test_begin_subtest "read a small mbox (5 messages)"
+output=$(NOTMUCH_NEW)
+test_expect_equal "$output" "Added 5 new messages to the database."
+
+test_begin_subtest "search"
+output=$(notmuch search '*' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Test message #1 (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Test message #2 (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Test message #3 (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Test message #4 (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Test message #5 (inbox unread)"
+
+test_begin_subtest "show (mboxcl)"
+output=$(notmuch show "Test message #1" | grep -o "filename:[^ ]*")
+test_expect_equal "$output" "filename:mbox://$MAIL_DIR/some-mboxes/first.mbox#44+246"
+
+test_begin_subtest "show doesn't append an extra space at the end (mboxcl)"
+output=$(notmuch show --format=raw "Test message #1" )
+original=$(cat $MAIL_DIR/corpus/msg-001)
+test_expect_equal "$output" "$original"
+
+test_begin_subtest "show (non-cl)"
+output=$(notmuch show "Test message #3" | grep -o "filename:[^ ]*")
+test_expect_equal "$output" "filename:mbox://$MAIL_DIR/some-mboxes/first.mbox#634+227"
+
+test_begin_subtest "show doesn't append an extra space at the end (non-cl)"
+output=$(notmuch show --format=raw "Test message #3" )
+original=$(cat $MAIL_DIR/corpus/msg-003)
+test_expect_equal "$output" "$original"
+
+test_done
diff --git a/test/notmuch-test b/test/notmuch-test
index bfad5d3..8cbb2cd 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -47,6 +47,7 @@ TESTS="
   emacs-large-search-buffer
   emacs-subject-to-filename
   maildir-sync
+  mbox
   crypto
   symbol-hiding
   search-folder-coherence
-- 
1.7.9.5



[RFC PATCH 12/14] mailstore: support for mbox:// URIs

2012-06-25 Thread Ethan Glasser-Camp

Signed-off-by: Ethan Glasser-Camp 
---
 lib/mailstore.c |   85 +++
 1 file changed, 85 insertions(+)

diff --git a/lib/mailstore.c b/lib/mailstore.c
index ae02c12..e8d9bc1 100644
--- a/lib/mailstore.c
+++ b/lib/mailstore.c
@@ -19,6 +19,7 @@
  */
 #include 
 #include 
+#include 

 #include "notmuch-private.h"

@@ -28,6 +29,74 @@ notmuch_mailstore_basic_open (const char *filename)
 return fopen (filename, "r");
 }

+/* Since we have to return a FILE*, we use fmemopen to turn buffers
+ * into FILE* streams. But when we close these streams, we have to
+ * free() the buffers. Use a hash to associate the two.
+ */
+static GHashTable *_mbox_files_to_strings = NULL;
+
+static void
+_ensure_mbox_files_to_strings () {
+if (_mbox_files_to_strings == NULL)
+_mbox_files_to_strings = g_hash_table_new (NULL, NULL);
+}
+
+static FILE *
+notmuch_mailstore_mbox_open (UriUriA *uri)
+{
+FILE *ret = NULL, *mbox = NULL;
+char *filename, *message, *length_s;
+const char *error;
+long int offset, length, this_read;
+_ensure_mbox_files_to_strings ();
+
+offset = strtol (uri->fragment.first, &length_s, 10);
+length = strtol (length_s+1, NULL, 10);
+
+filename = talloc_strndup (NULL, uri->pathHead->text.first-1,
+   uri->pathTail->text.afterLast-uri->pathHead->text.first+1);
+
+if (filename == NULL)
+goto DONE;
+
+mbox = fopen (filename, "r");
+if (mbox == NULL) {
+fprintf (stderr, "Couldn't open message %s: %s.\n", uri->scheme.first,
+ strerror (errno));
+goto DONE;
+}
+
+message = talloc_array (NULL, char, length);
+fseek (mbox, offset, SEEK_SET);
+
+this_read = fread (message, sizeof(char), length, mbox);
+if (this_read != length) {
+if (ferror (mbox))
+error = strerror (errno);
+else
+error = "end of file reached";
+
+fprintf (stderr, "Couldn't read message %s: %s.\n", uri->scheme.first, error);
+goto DONE;
+}
+
+ret = fmemopen (message, length, "r");
+if (ret == NULL) {
+/* No fclose will ever be called, so let's free message now */
+talloc_free (message);
+goto DONE;
+}
+
+g_hash_table_insert (_mbox_files_to_strings, ret, message);
+DONE:
+if (filename)
+talloc_free (filename);
+if (mbox)
+fclose (mbox);
+
+return ret;
+}
+
 FILE *
 notmuch_mailstore_open (const char *filename)
 {
@@ -57,6 +126,14 @@ notmuch_mailstore_open (const char *filename)
 goto DONE;
 }

+if (0 == strncmp (parsed.scheme.first, "mbox",
+  parsed.scheme.afterLast-parsed.scheme.first)) {
+/* mbox URI of the form mbox:///path/to/file#offset+length.
+ * Just pass the parsed URI. */
+ret = notmuch_mailstore_mbox_open (&parsed);
+goto DONE;
+}
+
 DONE:
 uriFreeUriMembersA (&parsed);
 return ret;
@@ -65,5 +142,13 @@ DONE:
 int
 notmuch_mailstore_close (FILE *file)
 {
+char *file_buffer;
+if (_mbox_files_to_strings != NULL) {
+file_buffer = g_hash_table_lookup (_mbox_files_to_strings, file);
+if (file_buffer != NULL) {
+talloc_free (file_buffer);
+}
+g_hash_table_remove (_mbox_files_to_strings, file);
+}
 return fclose (file);
 }
-- 
1.7.9.5
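
The "#offset+length" fragment handling in the patch above can be
sketched in isolation. This is a hedged illustration, not the code
from the patch: parse_mbox_fragment is a hypothetical helper that takes
the fragment as a NUL-terminated string and rejects malformed input
instead of trusting it.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse an mbox URI fragment of the form "offset+length".
 * Returns 0 and fills in both values on success, -1 if the
 * fragment is missing or malformed. */
static int
parse_mbox_fragment (const char *fragment, long *offset, long *length)
{
    char *end;

    assert (offset != NULL && length != NULL);
    if (fragment == NULL || ! isdigit ((unsigned char) *fragment))
        return -1;

    errno = 0;
    *offset = strtol (fragment, &end, 10);
    if (errno != 0 || *end != '+')
        return -1;

    /* The length must start right after the '+' and run to the end. */
    fragment = end + 1;
    if (! isdigit ((unsigned char) *fragment))
        return -1;
    *length = strtol (fragment, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;

    return 0;
}
```

With the URI used in the tests, the fragment "44+246" would yield
offset 44 and length 246; a fragment without the '+' separator is
reported as an error rather than read past.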



[RFC PATCH 11/14] notmuch-new: pull out useful bits of add_files_recursive

2012-06-25 Thread Ethan Glasser-Camp
This is part of notmuch-new refactor phase 1: make the add_files code
safe for other backends. add_files_recursive is essentially a
maildir-crawling function that periodically adds files to the database
or adds filenames to the remove_files or remove_directory lists. I don't
see an easy way to adapt add_files_recursive for other backends that
might not have the concept of directories nested inside other
directories, so instead just provide an add_files method for each backend.

This patch pulls some bits out of add_files_recursive which will be
useful for other backends: two reporting functions
_report_before_adding_file and _report_added_file, as well as
_add_message, which actually does the message adding.

Signed-off-by: Ethan Glasser-Camp 
---
 notmuch-new.c |  192 +++--
 1 file changed, 119 insertions(+), 73 deletions(-)

diff --git a/notmuch-new.c b/notmuch-new.c
index 57b27bf..1bf4e25 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -249,6 +249,122 @@ add_files_uri (unused(notmuch_database_t *notmuch),
 return NOTMUCH_STATUS_SUCCESS;
 }

+/* Progress-reporting function.
+ *
+ * Can be used by any mailstore-crawling function that wants to alert
+ * users what message it's about to add. Subsequent errors will be due
+ * to this message ;)
+ */
+static void
+_report_before_adding_file (add_files_state_t *state, const char *filename)
+{
+state->processed_files++;
+
+if (state->verbose) {
+   if (state->output_is_a_tty)
+   printf("\r\033[K");
+
+   printf ("%i/%i: %s",
+   state->processed_files,
+   state->total_files,
+   filename);
+
+   putchar((state->output_is_a_tty) ? '\r' : '\n');
+   fflush (stdout);
+}
+}
+
+/* Progress-reporting function.
+ *
+ * Call this to respond to the signal handler for SIGALRM.
+ */
+static void
+_report_added_file (add_files_state_t *state)
+{
+if (do_print_progress) {
+   do_print_progress = 0;
+   generic_print_progress ("Processed", "files", state->tv_start,
+   state->processed_files, state->total_files);
+}
+}
+
+
+/* Atomically handles adding a message to the database.
+ *
+ * Should be used by any mailstore-crawling function that finds a new
+ * message to add.
+ */
+static notmuch_status_t
+_add_message (add_files_state_t *state, notmuch_database_t *notmuch,
+ const char *filename)
+{
+notmuch_status_t status, ret = NOTMUCH_STATUS_SUCCESS;
+notmuch_message_t *message;
+const char **tag;
+
+status = notmuch_database_begin_atomic (notmuch);
+if (status) {
+   ret = status;
+   goto DONE;
+}
+
+status = notmuch_database_add_message (notmuch, filename, &message);
+
+switch (status) {
+/* success */
+case NOTMUCH_STATUS_SUCCESS:
+   state->added_messages++;
+   notmuch_message_freeze (message);
+   for (tag=state->new_tags; *tag != NULL; tag++)
+   notmuch_message_add_tag (message, *tag);
+   if (state->synchronize_flags == TRUE)
+   notmuch_message_maildir_flags_to_tags (message);
+   notmuch_message_thaw (message);
+   break;
+/* Non-fatal issues (go on to next file) */
+case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:
+   if (state->synchronize_flags == TRUE)
+   notmuch_message_maildir_flags_to_tags (message);
+   break;
+case NOTMUCH_STATUS_FILE_NOT_EMAIL:
+   fprintf (stderr, "Note: Ignoring non-mail file: %s\n",
+filename);
+   break;
+/* Fatal issues. Don't process anymore. */
+case NOTMUCH_STATUS_READ_ONLY_DATABASE:
+case NOTMUCH_STATUS_XAPIAN_EXCEPTION:
+case NOTMUCH_STATUS_OUT_OF_MEMORY:
+   fprintf (stderr, "Error: %s. Halting processing.\n",
+notmuch_status_to_string (status));
+   ret = status;
+   goto DONE;
+default:
+case NOTMUCH_STATUS_FILE_ERROR:
+case NOTMUCH_STATUS_NULL_POINTER:
+case NOTMUCH_STATUS_TAG_TOO_LONG:
+case NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW:
+case NOTMUCH_STATUS_UNBALANCED_ATOMIC:
+case NOTMUCH_STATUS_LAST_STATUS:
+   INTERNAL_ERROR ("add_message returned unexpected value: %d",  status);
+   ret = status;
+   goto DONE;
+}
+
+status = notmuch_database_end_atomic (notmuch);
+if (status) {
+   ret = status;
+   goto DONE;
+}
+
+  DONE:
+if (message) {
+   notmuch_message_destroy (message);
+   message = NULL;
+}
+
+return ret;
+}
+
 /* Examine 'path' recursively as follows:
  *
  *   o Ask the filesystem for the mtime of 'path' (fs_mtime)
@@ -300,7 +416,6 @@ add_files (notmuch_database_t *notmuch,
 char *next = NULL, *path_uri = NULL;
 time_t fs_mtime, db_mtime;
 notmuch_status_t status, ret = NOTMUCH_STATUS_SUCCESS;
-notmuch_message_t *message = NULL;
 struct dirent **fs_entries = NULL;
 int i, num_fs_entries = 0, entry_type;
 notmuch_directory_t *directory;
@@ -309,7 +

[RFC PATCH 10/14] new: add "scan" option

2012-06-25 Thread Ethan Glasser-Camp
This is just a quick hack to get started on adding an mbox backend.

The fact that the default maildir is scanned "automagically" is a
little weird, but it doesn't do any harm unless you decide to put mail
there that you really don't want indexed.

Signed-off-by: Ethan Glasser-Camp 
---
 notmuch-client.h |9 +
 notmuch-config.c |   30 +-
 notmuch-new.c|   18 ++
 test/config  |1 +
 4 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/notmuch-client.h b/notmuch-client.h
index 9b63eae..9d922fe 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -256,6 +256,15 @@ notmuch_config_set_new_ignore (notmuch_config_t *config,
   const char *new_ignore[],
   size_t length);

+const char **
+notmuch_config_get_new_scan (notmuch_config_t *config,
+  size_t *length);
+
+void
+notmuch_config_set_new_scan (notmuch_config_t *config,
+  const char *new_scan[],
+  size_t length);
+
 notmuch_bool_t
 notmuch_config_get_maildir_synchronize_flags (notmuch_config_t *config);

diff --git a/notmuch-config.c b/notmuch-config.c
index 3e37a2d..e9d99ea 100644
--- a/notmuch-config.c
+++ b/notmuch-config.c
@@ -50,7 +50,10 @@ static const char new_config_comment[] =
 "\tthat will not be searched for messages by \"notmuch new\".\n"
 "\n"
 "\tNOTE: *Every* file/directory that goes by one of those names will\n"
-"\tbe ignored, independent of its depth/location in the mail store.\n";
+"\tbe ignored, independent of its depth/location in the mail store.\n"
+"\n"
+"\tscanA list (separated by ';') of mail URLs to scan.\n"
+"\tThe maildir located at database.path, above, will automatically be added.\n";

 static const char user_config_comment[] =
 " User configuration\n"
@@ -113,6 +116,8 @@ struct _notmuch_config {
 size_t new_tags_length;
 const char **new_ignore;
 size_t new_ignore_length;
+const char **new_scan;
+size_t new_scan_length;
 notmuch_bool_t maildir_synchronize_flags;
 const char **search_exclude_tags;
 size_t search_exclude_tags_length;
@@ -274,6 +279,8 @@ notmuch_config_open (void *ctx,
 config->new_tags_length = 0;
 config->new_ignore = NULL;
 config->new_ignore_length = 0;
+config->new_scan = NULL;
+config->new_scan_length = 0;
 config->maildir_synchronize_flags = TRUE;
 config->search_exclude_tags = NULL;
 config->search_exclude_tags_length = 0;
@@ -375,6 +382,10 @@ notmuch_config_open (void *ctx,
notmuch_config_set_new_ignore (config, NULL, 0);
 }

+if (notmuch_config_get_new_scan (config, &tmp) == NULL) {
+   notmuch_config_set_new_scan (config, NULL, 0);
+}
+
 if (notmuch_config_get_search_exclude_tags (config, &tmp) == NULL) {
if (is_new) {
const char *tags[] = { "deleted", "spam" };
@@ -631,6 +642,14 @@ notmuch_config_get_new_ignore (notmuch_config_t *config, size_t *length)
 &(config->new_ignore_length), length);
 }

+const char **
+notmuch_config_get_new_scan (notmuch_config_t *config, size_t *length)
+{
+return _config_get_list (config, "new", "scan",
+&(config->new_scan),
+&(config->new_scan_length), length);
+}
+
 void
 notmuch_config_set_user_other_email (notmuch_config_t *config,
 const char *list[],
@@ -658,6 +677,15 @@ notmuch_config_set_new_ignore (notmuch_config_t *config,
 &(config->new_ignore));
 }

+void
+notmuch_config_set_new_scan (notmuch_config_t *config,
+const char *list[],
+size_t length)
+{
+_config_set_list (config, "new", "scan", list, length,
+&(config->new_scan));
+}
+
 const char **
 notmuch_config_get_search_exclude_tags (notmuch_config_t *config, size_t *length)
 {
diff --git a/notmuch-new.c b/notmuch-new.c
index 1f11b2c..57b27bf 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -239,6 +239,16 @@ _entry_in_ignore_list (const char *entry, add_files_state_t *state)
 return FALSE;
 }

+/* Call out to the appropriate add_files function, based on the URI. */
+static notmuch_status_t
+add_files_uri (unused(notmuch_database_t *notmuch),
+  unused(const char *uri),
+  unused(add_files_state_t *state))
+{
+/* Stub for now */
+return NOTMUCH_STATUS_SUCCESS;
+}
+
 /* Examine 'path' recursively as follows:
  *
  *   o Ask the filesystem for the mtime of 'path' (fs_mtime)
@@ -843,6 +853,8 @@ notmuch_new_command (void *ctx, int argc, char *argv[])
 int ret = 0;
 struct stat st;
 const char *db_path;
+const char **new_scan;
+size_t new_scan_length, new_scan_i;
 char *dot_notmuch_path;
 struct sigaction action;
 _

[RFC PATCH 09/14] Fix atomicity test to work without relocatable mailstores

2012-06-25 Thread Ethan Glasser-Camp
Instead of assuming that the mailstore doesn't store its absolute
filenames, we use a symlink that can change back and forth. As long as
filenames contain this symlink, they can work in either the real
database, or the current snapshot.

Signed-off-by: Ethan Glasser-Camp 
---
 test/atomicity |   10 +-
 test/atomicity.gdb |   11 ---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/test/atomicity b/test/atomicity
index 6df0a00..7b62ec7 100755
--- a/test/atomicity
+++ b/test/atomicity
@@ -49,13 +49,13 @@ if test_require_external_prereq gdb; then
 rm $MAIL_DIR/.remove-dir/remove-directory-duplicate:2,
 rmdir $MAIL_DIR/.remove-dir

-# Prepare a snapshot of the updated maildir.  The gdb script will
-# update the database in this snapshot as it goes.
+# Copy the mail database. We will run on this database concurrently.
 cp -ra $MAIL_DIR $MAIL_DIR.snap
-cp ${NOTMUCH_CONFIG} ${NOTMUCH_CONFIG}.snap
-NOTMUCH_CONFIG=${NOTMUCH_CONFIG}.snap notmuch config set database.path $MAIL_DIR.snap
-

+# Use a symlink instead of the real path. This way, we can change the symlink,
+# without filenames having to change.
+mv $MAIL_DIR $MAIL_DIR.real
+ln -s $MAIL_DIR.real $MAIL_DIR

 # Execute notmuch new and, at every call to rename, snapshot the
 # database, run notmuch new again on the snapshot, and capture the
diff --git a/test/atomicity.gdb b/test/atomicity.gdb
index fd67525..3d4e210 100644
--- a/test/atomicity.gdb
+++ b/test/atomicity.gdb
@@ -38,12 +38,17 @@ shell mv backtrace backtrace.`cat outcount`
 # Snapshot the database
 shell rm -r $MAIL_DIR.snap/.notmuch
 shell cp -r $MAIL_DIR/.notmuch $MAIL_DIR.snap/.notmuch
+shell rm $MAIL_DIR
+shell ln -s $MAIL_DIR.snap $MAIL_DIR
 # Restore the mtime of $MAIL_DIR.snap, which we just changed
-shell touch -r $MAIL_DIR $MAIL_DIR.snap
+shell touch -r $MAIL_DIR.real $MAIL_DIR.snap
 # Run notmuch new to completion on the snapshot
-shell NOTMUCH_CONFIG=${NOTMUCH_CONFIG}.snap XAPIAN_FLUSH_THRESHOLD=1000 notmuch new > /dev/null
-shell NOTMUCH_CONFIG=${NOTMUCH_CONFIG}.snap notmuch search '*' > search.`cat outcount` 2>&1
+shell NOTMUCH_CONFIG=${NOTMUCH_CONFIG} XAPIAN_FLUSH_THRESHOLD=1000 notmuch new > /dev/null
+shell NOTMUCH_CONFIG=${NOTMUCH_CONFIG} notmuch search '*' > search.`cat outcount` 2>&1
 shell echo $(expr $(cat outcount) + 1) > outcount
+# restore symlink to correct database before resuming
+shell rm $MAIL_DIR
+shell ln -s $MAIL_DIR.real $MAIL_DIR
 cont
 end

-- 
1.7.9.5



[RFC PATCH 08/14] Don't cache corpus.mail

2012-06-25 Thread Ethan Glasser-Camp
corpus.mail has already been processed by notmuch-new, so it seems
like a good target to cache, but since filenames are no longer being
stored relative to the database, it isn't. Recopy on each test, or
else filenames from other tests will show up.

Signed-off-by: Ethan Glasser-Camp 
---
 test/test-lib.sh |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/test-lib.sh b/test/test-lib.sh
index 195158c..def3760 100644
--- a/test/test-lib.sh
+++ b/test/test-lib.sh
@@ -441,7 +441,7 @@ add_email_corpus ()
 else
cp -a $TEST_DIRECTORY/corpus ${MAIL_DIR}
notmuch new >/dev/null
-   cp -a ${MAIL_DIR} $TEST_DIRECTORY/corpus.mail
+   #cp -a ${MAIL_DIR} $TEST_DIRECTORY/corpus.mail
 fi
 }

-- 
1.7.9.5



[RFC PATCH 07/14] Update tests that need to see filenames to use URIs

2012-06-25 Thread Ethan Glasser-Camp
This fixes all tests except atomicity, which should be next.

Signed-off-by: Ethan Glasser-Camp 
---
 test/emacs   |2 +-
 test/json|4 ++--
 test/maildir-sync|7 ---
 test/multipart   |4 ++--
 test/new |6 +++---
 test/search-folder-coherence |2 +-
 test/search-output   |4 ++--
 test/test-lib.sh |3 +++
 8 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/test/emacs b/test/emacs
index e9f954c..c08791e 100755
--- a/test/emacs
+++ b/test/emacs
@@ -621,7 +621,7 @@ Stash my stashables
 id:"bought"
 bought
 inbox,stashtest
-${gen_msg_filename}
+${gen_msg_uri}
 http://mid.gmane.org/bought
 http://marc.info/?i=bought
 http://mail-archive.com/search?l=mid&q=bought
diff --git a/test/json b/test/json
index 6439788..be29fac 100755
--- a/test/json
+++ b/test/json
@@ -5,7 +5,7 @@ test_description="--format=json output"
 test_begin_subtest "Show message: json"
 add_message "[subject]=\"json-show-subject\"" "[date]=\"Sat, 01 Jan 2000 12:00:00 -0000\"" "[body]=\"json-show-message\""
 output=$(notmuch show --format=json "json-show-message")
-test_expect_equal "$output" "[[[{\"id\": \"${gen_msg_id}\", \"match\": true, 
\"excluded\": false, \"filename\": \"${gen_msg_filename}\", \"timestamp\": 
946728000, \"date_relative\": \"2000-01-01\", \"tags\": [\"inbox\",\"unread\"], 
\"headers\": {\"Subject\": \"json-show-subject\", \"From\": \"Notmuch Test 
Suite \", \"To\": \"Notmuch Test Suite 
\", \"Date\": \"Sat, 01 Jan 2000 12:00:00 
+\"}, \"body\": [{\"id\": 1, \"content-type\": \"text/plain\", \"content\": 
\"json-show-message\n\"}]}, ["
+test_expect_equal "$output" "[[[{\"id\": \"${gen_msg_id}\", \"match\": true, 
\"excluded\": false, \"filename\": \"${gen_msg_uri}\", \"timestamp\": 
946728000, \"date_relative\": \"2000-01-01\", \"tags\": [\"inbox\",\"unread\"], 
\"headers\": {\"Subject\": \"json-show-subject\", \"From\": \"Notmuch Test 
Suite \", \"To\": \"Notmuch Test Suite 
\", \"Date\": \"Sat, 01 Jan 2000 12:00:00 
+\"}, \"body\": [{\"id\": 1, \"content-type\": \"text/plain\", \"content\": 
\"json-show-message\n\"}]}, ["

 test_begin_subtest "Search message: json"
 add_message "[subject]=\"json-search-subject\"" "[date]=\"Sat, 01 Jan 2000 12:00:00 -0000\"" "[body]=\"json-search-message\""
@@ -22,7 +22,7 @@ test_expect_equal "$output" "[{\"thread\": \"XXX\",
 test_begin_subtest "Show message: json, utf-8"
 add_message "[subject]=\"json-show-utf8-body-s?bj?ct\"" "[date]=\"Sat, 01 Jan 
2000 12:00:00 -\"" "[body]=\"js?n-show-m?ssage\""
 output=$(notmuch show --format=json "js?n-show-m?ssage")
-test_expect_equal "$output" "[[[{\"id\": \"${gen_msg_id}\", \"match\": true, 
\"excluded\": false, \"filename\": \"${gen_msg_filename}\", \"timestamp\": 
946728000, \"date_relative\": \"2000-01-01\", \"tags\": [\"inbox\",\"unread\"], 
\"headers\": {\"Subject\": \"json-show-utf8-body-s?bj?ct\", \"From\": \"Notmuch 
Test Suite \", \"To\": \"Notmuch Test Suite 
\", \"Date\": \"Sat, 01 Jan 2000 12:00:00 
+\"}, \"body\": [{\"id\": 1, \"content-type\": \"text/plain\", \"content\": 
\"js?n-show-m?ssage\n\"}]}, ["
+test_expect_equal "$output" "[[[{\"id\": \"${gen_msg_id}\", \"match\": true, 
\"excluded\": false, \"filename\": \"${gen_msg_uri}\", \"timestamp\": 
946728000, \"date_relative\": \"2000-01-01\", \"tags\": [\"inbox\",\"unread\"], 
\"headers\": {\"Subject\": \"json-show-utf8-body-s?bj?ct\", \"From\": \"Notmuch 
Test Suite \", \"To\": \"Notmuch Test Suite 
\", \"Date\": \"Sat, 01 Jan 2000 12:00:00 
+\"}, \"body\": [{\"id\": 1, \"content-type\": \"text/plain\", \"content\": 
\"js?n-show-m?ssage\n\"}]}, ["

 test_begin_subtest "Show message: json, inline attachment filename"
 subject='json-show-inline-attachment-filename'
diff --git a/test/maildir-sync b/test/maildir-sync
index 01348d3..a2e110e 100755
--- a/test/maildir-sync
+++ b/test/maildir-sync
@@ -8,7 +8,7 @@ test_description="maildir synchronization"
 # --format=json" output includes some newlines. Also, need to avoid
 # including the local value of MAIL_DIR in the result.
 filter_show_json() {
-sed -e 's/, /,\n/g'  | sed -e "s|${MAIL_DIR}/|MAIL_DIR/|"
+sed -e 's/, /,\n/g'  | sed -e "s|${MAIL_URI}/|MAIL_DIR/|"
 echo
 }

@@ -102,8 +102,9 @@ No new mail. Detected 1 file rename.
 thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; Removing S flag (inbox unread)"

 test_begin_subtest "Removing info from filename leaves tags unchanged"
-add_message [subject]='"Message to lose maildir info"' [filename]='message-to-lose-maildir-info' [dir]=cur
-notmuch tag -unread subject:"Message to lose maildir info"
+generate_message [subject]='"Message to lose maildir info"' [filename]='message-to-lose-maildir-info' [dir]=cur
+notmuch new > hrngh.new
+notmuch tag -unread subject:"Message to lose maildir info" > hrngh.tag
 mv "$MAIL_DIR/cur/message-to-lose-maildir-info:2,S" 
"$MAIL_DIR/cur/message-without-maild

[RFC PATCH 06/14] maildir URIs can be used in tags_to_maildir_flags

2012-06-25 Thread Ethan Glasser-Camp
A better fix would probably be based on scheme.

Signed-off-by: Ethan Glasser-Camp 
---
 lib/message.cc |   51 ++-
 1 file changed, 46 insertions(+), 5 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index c9857f5..8ecec71 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -23,6 +23,7 @@

 #include 

+#include <uriparser/Uri.h>
 #include 

 struct visible _notmuch_message {
@@ -1093,7 +1094,6 @@ notmuch_message_maildir_flags_to_tags (notmuch_message_t *message)
 {
filename = notmuch_filenames_get (filenames);
dir = _filename_is_in_maildir (filename);
-
if (! dir)
continue;

@@ -1304,12 +1304,46 @@ _new_maildir_filename (void *ctx,
 return filename_new;
 }

+/* Parses a maildir URI and returns the filename corresponding to its
+ * path.
+ *
+ * Returns NULL if either the URI couldn't be parsed or if the
+ * scheme isn't maildir:.
+ */
+static char *
+_get_maildir_filename (const char *filename)
+{
+UriParserStateA parser_state;
+UriUriA parsed;
+char *path;
+parser_state.uri = &parsed;
+
+if (uriParseUriA (&parser_state, filename) != URI_SUCCESS) {
+   uriFreeUriMembersA (&parsed);
+   return NULL;
+}
+
+if (parsed.scheme.first != NULL &&
+   0 != strncmp(parsed.scheme.first, "maildir",
+parsed.scheme.afterLast-parsed.scheme.first)) {
+   /* Full URI with non-maildir scheme. */
+   uriFreeUriMembersA (&parsed);
+   return NULL;
+}
+
+path = (char *)parsed.pathHead->text.first - 1;
+uriFreeUriMembersA (&parsed);
+return path;
+
+}
+
+
 notmuch_status_t
 notmuch_message_tags_to_maildir_flags (notmuch_message_t *message)
 {
 notmuch_filenames_t *filenames;
 const char *filename;
-char *filename_new;
+char *filename_new, *filename_old, *filename_new_uri;
 char *to_set, *to_clear;
 notmuch_status_t status = NOTMUCH_STATUS_SUCCESS;

@@ -1324,16 +1358,22 @@ notmuch_message_tags_to_maildir_flags (notmuch_message_t *message)
if (! _filename_is_in_maildir (filename))
continue;

-   filename_new = _new_maildir_filename (message, filename,
+   filename_old = _get_maildir_filename (filename);
+   if (filename_old == NULL)
+   continue;
+
+   filename_new = _new_maildir_filename (message, filename_old,
  to_set, to_clear);
if (filename_new == NULL)
continue;

+   filename_new_uri = talloc_asprintf (message, "maildir://%s", filename_new);
+
if (strcmp (filename, filename_new)) {
int err;
notmuch_status_t new_status;

-   err = rename (filename, filename_new);
+   err = rename (filename_old, filename_new);
if (err)
continue;

@@ -1347,7 +1387,7 @@ notmuch_message_tags_to_maildir_flags (notmuch_message_t *message)
}

new_status = _notmuch_message_add_filename (message,
-   filename_new);
+   filename_new_uri);
/* Hold on to only the first error. */
if (! status && new_status) {
status = new_status;
@@ -1358,6 +1398,7 @@ notmuch_message_tags_to_maildir_flags (notmuch_message_t *message)
}

talloc_free (filename_new);
+   talloc_free (filename_new_uri);
 }

 talloc_free (to_set);
-- 
1.7.9.5



[RFC PATCH 05/14] new: use new URL-based filenames for messages

2012-06-25 Thread Ethan Glasser-Camp
This commit breaks a bunch of tests; fixes follow.

Signed-off-by: Ethan Glasser-Camp 
---
 notmuch-new.c |   27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/notmuch-new.c b/notmuch-new.c
index 938ae29..1f11b2c 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -287,7 +287,7 @@ add_files (notmuch_database_t *notmuch,
 {
 DIR *dir = NULL;
 struct dirent *entry = NULL;
-char *next = NULL;
+char *next = NULL, *path_uri = NULL;
 time_t fs_mtime, db_mtime;
 notmuch_status_t status, ret = NOTMUCH_STATUS_SUCCESS;
 notmuch_message_t *message = NULL;
@@ -315,7 +315,16 @@ add_files (notmuch_database_t *notmuch,

 fs_mtime = st.st_mtime;

-status = notmuch_database_get_directory (notmuch, path, &directory);
+/* maildir URIs should never have a hostname component, but
+ * uriparser doesn't parse paths correctly if they start with //,
+ * as in scheme://host//path.
+ */
+if (path[0] == '/')
+   path_uri = talloc_asprintf (notmuch, "maildir://%s", path);
+else
+   path_uri = talloc_asprintf (notmuch, "maildir:///%s", path);
+
+status = notmuch_database_get_directory (notmuch, path_uri, &directory);
 if (status) {
ret = status;
goto DONE;
@@ -423,7 +432,7 @@ add_files (notmuch_database_t *notmuch,
   strcmp (notmuch_filenames_get (db_files), entry->d_name) < 0)
{
char *absolute = talloc_asprintf (state->removed_files,
- "%s/%s", path,
+ "%s/%s", path_uri,
  notmuch_filenames_get (db_files));

_filename_list_add (state->removed_files, absolute);
@@ -439,7 +448,7 @@ add_files (notmuch_database_t *notmuch,
if (strcmp (filename, entry->d_name) < 0)
{
char *absolute = talloc_asprintf (state->removed_directories,
- "%s/%s", path, filename);
+ "%s/%s", path_uri, filename);

_filename_list_add (state->removed_directories, absolute);
}
@@ -467,7 +476,7 @@ add_files (notmuch_database_t *notmuch,

/* We're now looking at a regular file that doesn't yet exist
 * in the database, so add it. */
-   next = talloc_asprintf (notmuch, "%s/%s", path, entry->d_name);
+   next = talloc_asprintf (notmuch, "%s/%s", path_uri, entry->d_name);

state->processed_files++;

@@ -559,7 +568,7 @@ add_files (notmuch_database_t *notmuch,
 while (notmuch_filenames_valid (db_files))
 {
char *absolute = talloc_asprintf (state->removed_files,
- "%s/%s", path,
+ "%s/%s", path_uri,
  notmuch_filenames_get (db_files));

_filename_list_add (state->removed_files, absolute);
@@ -570,7 +579,7 @@ add_files (notmuch_database_t *notmuch,
 while (notmuch_filenames_valid (db_subdirs))
 {
char *absolute = talloc_asprintf (state->removed_directories,
- "%s/%s", path,
+ "%s/%s", path_uri,
  notmuch_filenames_get (db_subdirs));

_filename_list_add (state->removed_directories, absolute);
@@ -584,9 +593,11 @@ add_files (notmuch_database_t *notmuch,
  * same second.  This may lead to unnecessary re-scans, but it
  * avoids overlooking messages. */
 if (fs_mtime != stat_time)
-   _filename_list_add (state->directory_mtimes, path)->mtime = fs_mtime;
+   _filename_list_add (state->directory_mtimes, path_uri)->mtime = fs_mtime;

   DONE:
+if (path_uri)
+   talloc_free (path_uri);
 if (next)
talloc_free (next);
 if (dir)
-- 
1.7.9.5



[RFC PATCH 04/14] Not all filenames need to be converted to absolute paths

2012-06-25 Thread Ethan Glasser-Camp
_notmuch_message_ensure_filename_list converts "relative" paths, such
as those stored in Xapian until now, to "absolute" paths. However,
URLs are already absolute, and prepending the database path will just
confuse matters.

Signed-off-by: Ethan Glasser-Camp 
---
 lib/message.cc |   14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 978de06..c9857f5 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -700,9 +700,17 @@ _notmuch_message_ensure_filename_list (notmuch_message_t *message)
  message->notmuch,
  directory_id);

-   if (strlen (directory))
-   filename = talloc_asprintf (message, "%s/%s/%s",
-   db_path, directory, basename);
+   if (strlen (directory)) {
+   /* If directory is a URI, we don't need to append the db_path;
+* it is already an absolute path. */
+   /* This is just a quick hack instead of actually parsing the URL. */
+   if (strstr (directory, "://") == NULL)
+   filename = talloc_asprintf (message, "%s/%s/%s",
+   db_path, directory, basename);
+   else
+   filename = talloc_asprintf (message, "%s/%s",
+   directory, basename);
+   }
else
filename = talloc_asprintf (message, "%s/%s",
db_path, basename);
-- 
1.7.9.5



[RFC PATCH 03/14] mailstore can read from maildir: URLs

2012-06-25 Thread Ethan Glasser-Camp
No code uses this yet.

Signed-off-by: Ethan Glasser-Camp 
---
 lib/mailstore.c |   37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/lib/mailstore.c b/lib/mailstore.c
index 48acd47..ae02c12 100644
--- a/lib/mailstore.c
+++ b/lib/mailstore.c
@@ -17,14 +17,49 @@
  *
  * Author: Carl Worth 
  */
+#include <uriparser/Uri.h>
 #include <stdio.h>

 #include "notmuch-private.h"

+static FILE *
+notmuch_mailstore_basic_open (const char *filename)
+{
+return fopen (filename, "r");
+}
+
 FILE *
 notmuch_mailstore_open (const char *filename)
 {
-return fopen (filename, "r");
+FILE *ret = NULL;
+UriUriA parsed;
+UriParserStateA state;
+state.uri = &parsed;
+if (uriParseUriA (&state, filename) != URI_SUCCESS) {
+/* Failure. Fall back to fopen and hope for the best. */
+ret = notmuch_mailstore_basic_open (filename);
+goto DONE;
+}
+
+if (parsed.scheme.first == NULL) {
+/* No scheme. Probably not really a URL but just an ordinary filename.
+ * Fall back to fopen for backwards compatibility. */
+ret = notmuch_mailstore_basic_open (filename);
+goto DONE;
+}
+
+if (0 == strncmp (parsed.scheme.first, "maildir",
+  parsed.scheme.afterLast-parsed.scheme.first)) {
+/* Maildir URI of the form maildir:///path/to/file.
+ * We want to fopen("/path/to/file").
+ * pathHead starts at "path/to/file". */
+ret = notmuch_mailstore_basic_open (parsed.pathHead->text.first - 1);
+goto DONE;
+}
+
+DONE:
+uriFreeUriMembersA (&parsed);
+return ret;
 }

 int
-- 
1.7.9.5



[RFC PATCH 02/14] Introduce uriparser

2012-06-25 Thread Ethan Glasser-Camp
Seeing as there is no glib-standard way to parse URIs, an external
library is needed. This commit introduces another program in compat/
and a stanza in ./configure to test if uriparser is there.

Signed-off-by: Ethan Glasser-Camp 
---
 Makefile.local  |2 +-
 compat/have_uriparser.c |   17 +
 configure   |   23 ---
 3 files changed, 38 insertions(+), 4 deletions(-)
 create mode 100644 compat/have_uriparser.c

diff --git a/Makefile.local b/Makefile.local
index a890df2..084f44e 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -41,7 +41,7 @@ PV_FILE=bindings/python/notmuch/version.py
 # Smash together user's values with our extra values
 FINAL_CFLAGS = -DNOTMUCH_VERSION=$(VERSION) $(CFLAGS) $(WARN_CFLAGS) $(CONFIGURE_CFLAGS) $(extra_cflags)
 FINAL_CXXFLAGS = $(CXXFLAGS) $(WARN_CXXFLAGS) $(CONFIGURE_CXXFLAGS) $(extra_cflags) $(extra_cxxflags)
-FINAL_NOTMUCH_LDFLAGS = $(LDFLAGS) -Lutil -lutil -Llib -lnotmuch $(AS_NEEDED_LDFLAGS) $(GMIME_LDFLAGS) $(TALLOC_LDFLAGS)
+FINAL_NOTMUCH_LDFLAGS = $(LDFLAGS) -Lutil -lutil -Llib -lnotmuch $(AS_NEEDED_LDFLAGS) $(GMIME_LDFLAGS) $(TALLOC_LDFLAGS) $(URIPARSER_LDFLAGS)
 FINAL_NOTMUCH_LINKER = CC
 ifneq ($(LINKER_RESOLVES_LIBRARY_DEPENDENCIES),1)
 FINAL_NOTMUCH_LDFLAGS += $(CONFIGURE_LDFLAGS)
diff --git a/compat/have_uriparser.c b/compat/have_uriparser.c
new file mode 100644
index 0000000..d79e51d
--- /dev/null
+++ b/compat/have_uriparser.c
@@ -0,0 +1,17 @@
+#include <uriparser/Uri.h>
+
+int
+main (int argc, char *argv[])
+{
+UriParserStateA state;
+UriUriA uri;
+char *uriS = NULL;
+
+state.uri = &uri;
+if (uriParseUriA (&state, uriS) != URI_SUCCESS) {
+/* Failure */
+uriFreeUriMembersA (&uri);
+}
+
+return 0;
+}
diff --git a/configure b/configure
index 3fad424..80aa13c 100755
--- a/configure
+++ b/configure
@@ -313,6 +313,19 @@ else
 errors=$((errors + 1))
 fi

+printf "Checking for uriparser... "
+if ${CC} -o compat/have_uriparser "$srcdir"/compat/have_uriparser.c -luriparser > /dev/null 2>&1
+then
+printf "Yes.\n"
+uriparser_ldflags="-luriparser"
+have_uriparser=1
+else
+printf "No.\n"
+have_uriparser=0
+errors=$((errors + 1))
+fi
+rm -f compat/have_uriparser
+
 printf "Checking for valgrind development files... "
 if pkg-config --exists valgrind; then
 printf "Yes.\n"
@@ -431,11 +444,11 @@ case a simple command will install everything you need. For example:

 On Debian and similar systems:

-   sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev
+   sudo apt-get install libxapian-dev libgmime-2.6-dev libtalloc-dev liburiparser-dev

 Or on Fedora and similar systems:

-   sudo yum install xapian-core-devel gmime-devel libtalloc-devel
+   sudo yum install xapian-core-devel gmime-devel libtalloc-devel uriparser-devel

 On other systems, similar commands can be used, but the details of the
 package names may be different.
@@ -669,6 +682,9 @@ GMIME_LDFLAGS = ${gmime_ldflags}
 TALLOC_CFLAGS = ${talloc_cflags}
 TALLOC_LDFLAGS = ${talloc_ldflags}

+# Flags needed to link against uriparser
+URIPARSER_LDFLAGS = ${uriparser_ldflags}
+
 # Flags needed to have linker set rpath attribute
 RPATH_LDFLAGS = ${rpath_ldflags}

@@ -698,5 +714,6 @@ CONFIGURE_CXXFLAGS = -DHAVE_GETLINE=\$(HAVE_GETLINE) \$(GMIME_CFLAGS)\\
 \$(TALLOC_CFLAGS) -DHAVE_VALGRIND=\$(HAVE_VALGRIND) \\
 \$(VALGRIND_CFLAGS) \$(XAPIAN_CXXFLAGS) \\
  -DHAVE_STRCASESTR=\$(HAVE_STRCASESTR)
-CONFIGURE_LDFLAGS =  \$(GMIME_LDFLAGS) \$(TALLOC_LDFLAGS) \$(XAPIAN_LDFLAGS)
+CONFIGURE_LDFLAGS =  \$(GMIME_LDFLAGS) \$(TALLOC_LDFLAGS) \$(XAPIAN_LDFLAGS) \\
+ \$(URIPARSER_LDFLAGS)
 EOF
-- 
1.7.9.5



[RFC PATCH 01/14] All access to mail files goes through the mailstore module

2012-06-25 Thread Ethan Glasser-Camp
This commit introduces the mailstore module which provides two
functions, notmuch_mailstore_open and notmuch_mailstore_close. These
functions are currently just stub calls to fopen and fclose, but later
can be made more complex in order to support mail storage systems
where one message might not be one file.

Signed-off-by: Ethan Glasser-Camp 
---
 lib/Makefile.local|1 +
 lib/database.cc   |2 +-
 lib/index.cc  |2 +-
 lib/mailstore.c   |   34 
 lib/message-file.c|6 ++---
 lib/notmuch-private.h |3 +++
 lib/notmuch.h |   16 +++
 lib/sha1.c|   70 +
 mime-node.c   |4 +--
 notmuch-show.c|   12 -
 10 files changed, 120 insertions(+), 30 deletions(-)
 create mode 100644 lib/mailstore.c

diff --git a/lib/Makefile.local b/lib/Makefile.local
index 8a9aa28..cfc77bb 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -51,6 +51,7 @@ libnotmuch_c_srcs =   \
$(dir)/filenames.c  \
$(dir)/string-list.c\
$(dir)/libsha1.c\
+   $(dir)/mailstore.c  \
$(dir)/message-file.c   \
$(dir)/messages.c   \
$(dir)/sha1.c   \
diff --git a/lib/database.cc b/lib/database.cc
index 761dc1a..c035edc 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1773,7 +1773,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (message_id == NULL ) {
/* No message-id at all, let's generate one by taking a
 * hash over the file's contents. */
-   char *sha1 = notmuch_sha1_of_file (filename);
+   char *sha1 = notmuch_sha1_of_message (filename);

/* If that failed too, something is really wrong. Give up. */
if (sha1 == NULL) {
diff --git a/lib/index.cc b/lib/index.cc
index e377732..b607e82 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -441,7 +441,7 @@ _notmuch_message_index_file (notmuch_message_t *message,
initialized = 1;
 }

-file = fopen (filename, "r");
+file = notmuch_mailstore_open (filename);
 if (! file) {
fprintf (stderr, "Error opening %s: %s\n", filename, strerror (errno));
ret = NOTMUCH_STATUS_FILE_ERROR;
diff --git a/lib/mailstore.c b/lib/mailstore.c
new file mode 100644
index 0000000..48acd47
--- /dev/null
+++ b/lib/mailstore.c
@@ -0,0 +1,34 @@
+/* mailstore.c - code to access individual messages
+ *
+ * Copyright © 2009 Carl Worth
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ .
+ *
+ * Author: Carl Worth 
+ */
+#include <stdio.h>
+
+#include "notmuch-private.h"
+
+FILE *
+notmuch_mailstore_open (const char *filename)
+{
+return fopen (filename, "r");
+}
+
+int
+notmuch_mailstore_close (FILE *file)
+{
+return fclose (file);
+}
diff --git a/lib/message-file.c b/lib/message-file.c
index 915aba8..271389c 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -86,7 +86,7 @@ _notmuch_message_file_destructor (notmuch_message_file_t *message)
g_hash_table_destroy (message->headers);

 if (message->file)
-   fclose (message->file);
+   notmuch_mailstore_close (message->file);

 return 0;
 }
@@ -104,7 +104,7 @@ _notmuch_message_file_open_ctx (void *ctx, const char *filename)

 talloc_set_destructor (message, _notmuch_message_file_destructor);

-message->file = fopen (filename, "r");
+message->file = notmuch_mailstore_open (filename);
 if (message->file == NULL)
goto FAIL;

@@ -361,7 +361,7 @@ notmuch_message_file_get_header (notmuch_message_file_t *message,
 }

 if (message->parsing_finished) {
-fclose (message->file);
+notmuch_mailstore_close (message->file);
 message->file = NULL;
 }

diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index bfb4111..5dbe821 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -468,6 +468,9 @@ notmuch_sha1_of_string (const char *str);
 char *
 notmuch_sha1_of_file (const char *filename);

+char *
+notmuch_sha1_of_message (const char *filename);
+
 /* string-list.c */

 typedef struct _notmuch_string_node {
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 3633bed..0ca367b 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -1233,6 +1233,22 @@ notmuch_message_thaw (notmuch_message_t *message);
 void
 notmuch_message

[RFC PATCH 00/14] modular mail stores based on URIs

2012-06-25 Thread Ethan Glasser-Camp
Hi guys,

Sorry for dropping off the mailing list after I sent my last patch series 
(http://notmuchmail.org/pipermail/notmuch/2012/009470.html). I haven't had the 
time or a stable enough email address to really follow notmuch development :)

I signed onto #notmuch a week or two ago and asked what I would need to do to 
get a feature like this one into mainline. j4ni told me that he agreed with the 
feedback to my original patch series, and suggested that I follow mjw1009's 
advice of having filenames encode all information about mail storage 
transparently, and that this would solve the problem with the original patch 
series of sprinkling mail storage parameters all over the place. bremner 
suggested that he had been thinking about how to support mbox or other 
multiple-message archives, and also commented that he wasn't crazy about so 
much of the API being in strings.

Based on this advice, I decided to revise my approach to this patchset, one 
that is based around the stated desire to work with mbox formats. This 
approach, in contrast to the mailstore approach that Michal Sojka proposed and 
I revised, encodes all mail access information as URIs. These URIs are stored 
in Xapian the way that relative paths are right now. Examples might be:

maildir:///home/ethan/Mail/folder/cur/filename:2,S
mbox:///home/ethan/Mail/folder/file.mbox#byte-offset+length
couchdb://ethan:password@localhost:8080/some-doc-id
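
To make the three forms above concrete, here is a minimal sketch of how a stored name could be classified by its scheme. It is not code from this series (which parses with uriparser): store_kind, scheme_is and the enum are hypothetical names, and the sketch only looks for a "scheme://" prefix using plain string functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a stored name; not part of notmuch. */
typedef enum {
    STORE_PLAIN_PATH,   /* no "scheme://" prefix: an ordinary filename */
    STORE_MAILDIR,
    STORE_MBOX,
    STORE_OTHER         /* a scheme this sketch does not handle */
} store_kind_t;

/* Return 1 if the first scheme_len bytes of uri are exactly name. */
static int
scheme_is (const char *uri, size_t scheme_len, const char *name)
{
    return strlen (name) == scheme_len &&
           strncmp (uri, name, scheme_len) == 0;
}

store_kind_t
store_kind (const char *uri)
{
    const char *sep = strstr (uri, "://");
    size_t scheme_len;

    if (sep == NULL)
        return STORE_PLAIN_PATH;

    scheme_len = (size_t) (sep - uri);
    if (scheme_is (uri, scheme_len, "maildir"))
        return STORE_MAILDIR;
    if (scheme_is (uri, scheme_len, "mbox"))
        return STORE_MBOX;
    return STORE_OTHER;
}
```

Comparing the scheme length as well as its bytes keeps a short scheme such as "mail" from being accepted as "maildir". A real implementation would still want a proper URI parser, since a plain path that happens to contain "://" would be misclassified here.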

Personally, this isn't my favorite approach, for the following reasons:

1. Notmuch, at some point in its history, chose to store file paths relative to 
a "mail database", with the intent that if this mail database was moved, 
filenames would not change and everything would Just Work (tm). The above 
scheme completely reverses this design decision, and in general completely 
breaks this relocatability. I don't see any easy way to handle this problem. 
This isn't just a wishlist feature; at least two things in the test suite 
(caching of corpus.mail, and the atomicity tests) rely on this behavior.

2. Mail access information, i.e. open connections, etc. can only be stored in 
variables global to the mailstore code, and cannot be stored as private members 
of a mailstore object. This is more an aesthetic concern than a functional one.
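
The relocatability concern in point 1 can be illustrated with a small sketch. It assumes only what is described above: relative names are joined to the database path, while maildir URIs already carry an absolute path. resolve_stored_name is a hypothetical helper, not a function from the series.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the path to open for a stored name into buf.
 * Returns 0 on success, -1 if the result does not fit or the scheme is
 * not one this sketch understands. */
int
resolve_stored_name (const char *db_path, const char *stored,
                     char *buf, size_t size)
{
    static const char prefix[] = "maildir://";
    int n;

    if (strstr (stored, "://") == NULL) {
        /* Relative name: follows the database wherever it is moved. */
        n = snprintf (buf, size, "%s/%s", db_path, stored);
    } else if (strncmp (stored, prefix, sizeof (prefix) - 1) == 0) {
        /* Absolute URI: db_path plays no part, so moving the mail
         * directory leaves this pointing at the old location. */
        n = snprintf (buf, size, "%s", stored + sizeof (prefix) - 1);
    } else {
        return -1;
    }

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With a database moved to /new/Mail, the relative name folder/cur/msg resolves under /new/Mail, whereas maildir:///old/Mail/folder/cur/msg still resolves to /old/Mail/folder/cur/msg.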

Anyhow, the following (enormous) patch series implements this design. I used 
uriparser as an external library to parse URIs. The API for this library is a 
little idiosyncratic. uriparser supports parsing Unicode URIs (strings of 
wchar_t), but I just used ASCII filenames because I think that's what comes out 
of Xapian.

Patch 11 is borrowed directly from the last patch series.

The last four or five patches add mbox support, including a few tests. That 
part of the series is still very first-draft: I added a new config option to 
specify URIs to scan, and ">From " lines still need to be unescaped. However, 
we support scanning mbox files whether messages have content-length or not.
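
For reference, the missing ">From " unescaping could look roughly like the sketch below. It assumes the mboxrd convention (strip one '>' from any line made of one or more '>' followed by "From "); the series does not say which convention it will follow, and mbox_unescape_line is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper, mboxrd convention: if line is one or more '>'
 * followed by "From ", return a pointer just past the first '>';
 * otherwise return line unchanged.  No copying is done. */
const char *
mbox_unescape_line (const char *line)
{
    const char *p = line;

    while (*p == '>')
        p++;

    if (p != line && strncmp (p, "From ", 5) == 0)
        return line + 1;

    return line;
}
```

Under this convention ">From x" becomes "From x" and ">>From x" becomes ">From x", while lines such as ">Fromage" are left alone.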

I will try to receive feedback on this series more gratefully than the last 
one. :)

Thanks again for your time,

Ethan



Re: [PATCH 0/3] Speed up notmuch new for unchanged directories

2012-06-25 Thread Sascha Silbe
Austin Clements  writes:

> On Sun, 24 Jun 2012, Sascha Silbe  wrote:

["notmuch new" listing every directory, even if it's unchanged]
> I haven't looked over your patches yet, but this result surprises me.
> Could you explain your setup a little more?  How much mail do you have
> and across how many directories?  What file system are you using?

As mentioned in passing already, I have a total of about 900k unique
mails (sometimes several copies of them, received over different paths,
e.g. mailing list and a direct CC). Most of that is "old" mails, in
directories that are not getting updated. If notmuch would support mbox,
I'd use that instead for those old mails. The total number of
directories in the mail store is about 29k and the total number of files
(including the git repository and mbox files that sup used) is about
1.25M.

Since a housekeeping job last weekend, the number of mails in
directories that are still getting updated is about 4k, i.e. about 5‰ of
the total number of mails or 3‰ of the total number of files. The number
of directories getting updated is 104, i.e. about 4‰ of the total number
of directories.

Ideally, we'd get the run-time of "notmuch new" down by a similar
factor. With just plain POSIX and no additional information that won't
be possible, but providing a way to channel information about updates
into notmuch (rather than having it scan everything over and over again)
should help. That information is already available as output from the
mail fetching process (rsync in my case). Of course, it would be purely
optional: "notmuch new" without additional information would simply
continue to scan everything.


> I'm also surprised that your new approach helps.  This directory listing
> has to be read off disk one way or the other, but listing directories is
> the bread-and-butter of file systems, whereas I would think that Xapian
> would require more IO to accomplish the same effect.

"notmuch new" needs to iterate over a list of all directories to find
those with new mails (and potentially new subdirectories). However, it
does not need to list the *contents* of those folders. I'm surprised as
well, but rather in the opposite direction: Based on a naive
calculation, we'd expect to see a speedup on the order of
(1.25M+29k)/29k = 44. The actual results suggest that stat()ing (done
29k times both before and after the patch) is taking about 19 times as
long as listing a directory entry (before the patch we listed 1M
entries, now we list none if nothing has changed). (*)
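
Spelled out, the two estimates in this paragraph work out as follows. This is a sketch under two assumptions: that the 3.3 in the footnote is the measured speedup, and that the listing term covers all 1.25M files. Here x is the cost of one stat() and y the cost of listing one directory entry; the naive figure takes the two costs to be equal.

```latex
\[
\frac{1.25\,\mathrm{M} + 29\,\mathrm{k}}{29\,\mathrm{k}}
  = \frac{1\,279\,000}{29\,000} \approx 44
\]
\[
29\,000\,x + 1\,250\,000\,y = 3.3 \cdot 29\,000\,x
\;\Longrightarrow\;
\frac{x}{y} = \frac{1\,250\,000}{2.3 \cdot 29\,000}
            = \frac{1\,250\,000}{66\,700} \approx 18.7
\]
```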

In practice, the speedup achieved by my patch is larger than what the
benchmark suggests because there are other processes running that use
RAM. If we need to read a lot from disk (like "notmuch new" did before
my patch), there's a good chance it's already been evicted from the
cache since the last run. The fewer we need to read, the more likely it
is to still be in the cache. Similarly, reading lots of data from disk
will displace other data in the cache. These effects are not covered by
the pure "hot cache" and "cold cache" timings.


> Does your patch win because you can specifically list subdirectories
> out of Xapian, making the IO proportional to the number of
> subdirectories instead of the number of subdirectories and files (even
> though the constant factors probably favor reading from the file
> system)?

It wins because the factor is the number of files in each directory, not
just some low constant based on file system overhead vs. Xapian
overhead.


> I like the idea of these patches, I just want to make sure I have a firm
> grip on what's being optimized and why it wins.

Certainly a good idea. Thanks for taking the time!

Sascha

(*) float(linsolve([29000*x + 1250000*y = 3.3 * 29000*x], [x])); in
maxima, if you'd like to check the math.
-- 
http://sascha.silbe.org/
http://www.infra-silbe.de/




[PATCH 0/3] Speed up notmuch new for unchanged directories

2012-06-25 Thread Austin Clements
On Sun, 24 Jun 2012, Sascha Silbe  wrote:
> All the time I thought what makes "notmuch new" so abysmally slow is the
> stat() for each maildir. But as it continued to be slow even after I
> moved most mails out of 'new' (into 'new-20120624'), I strace'd notmuch
> and noticed it listed even unchanged directories, thereby listing and
> iterating over each and every single of the 900k mails in my mail store.
>
> There's still quite some room for further improvements as it continues
> to take several minutes to scan < 100 new mails in changed directories
> containing < 1000 mails in total. Even the rsync run that fetches the
> new mails is faster.

I haven't looked over your patches yet, but this result surprises me.
Could you explain your setup a little more?  How much mail do you have
and across how many directories?  What file system are you using?

I'm also surprised that your new approach helps.  This directory listing
has to be read off disk one way or the other, but listing directories is
the bread-and-butter of file systems, whereas I would think that Xapian
would require more IO to accomplish the same effect.  Does your patch
win because you can specifically list subdirectories out of Xapian,
making the IO proportional to the number of subdirectories instead of
the number of subdirectories and files (even though the constant factors
probably favor reading from the file system)?

I like the idea of these patches, I just want to make sure I have a firm
grip on what's being optimized and why it wins.


bug related to ical

2012-06-25 Thread Robert Horn
I've noticed a problem related to handling of ical attachments.  I'm
using Notmuch 0.13 on Emacs 23.3.1.  I've done some basic
troubleshooting.

The problem arises with emails from Concur that include an ical
attachment being viewed with the notmuch message viewer.  The problems
are:
 1. When opening the email there is sometimes the following message and
 error in Emacs message buffer:
  Converting icalendar...done
  notmuch-show-insert-bodypart-internal: Wrong type argument: stringp, nil

 2. Some (not all) of the view commands fail, e.g. "v", "V", "w".
 Others work, like "m", and "q".

 3. Examination of the /tmp directory shows notmuch-ical temp files being
 created but they are zero length.

This is related to the ical attachment.  When I edited one of the emails to
remove the attachment, the problem went away.  I suspect it is related
to the attachments being base64 encoded.  The header of the mime
attachment shows:

Content-Type: application/octet-stream;
name="ConcurCalendarEntry.ics"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="ConcurCalendarEntry.ics"

The encoding is correct.  The attachment decodes and looks right.  With
some details obscured the attachment contains:

BEGIN:VCALENDAR
VERSION:2.0
METHOD:PUBLISH
BEGIN:VEVENT
DTSTART:properly-formatted
DTEND:properly-formatted
DTSTAMP:properly-formatted
LOCATION:
SUMMARY:Concur Travel Itinerary
DESCRIPTION:Lots of stuff
 with about 80 lines of description. All indented properly.
UID:properly-formatted
PRIORITY:3
TRANSP:TRANSPARENT
END:VEVENT
END:VCALENDAR
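
For anyone wanting to check the decoding step outside Emacs, here is a minimal sketch using Python's standard email module. It builds a part with the same headers as shown above around a stand-in calendar body (SAMPLE_ICS is an assumption, not the real attachment) and shows the difference between the base64 text and the decoded bytes. It says nothing about which of the two the Emacs code hands to icalendar.

```python
import base64
from email import message_from_string

# Stand-in calendar body; the real attachment content is not reproduced here.
SAMPLE_ICS = "BEGIN:VCALENDAR\r\nVERSION:2.0\r\nEND:VCALENDAR\r\n"

def sample_part():
    """Build a MIME part with the same headers as the one in the report."""
    body = base64.b64encode(SAMPLE_ICS.encode("ascii")).decode("ascii")
    raw = (
        "Content-Type: application/octet-stream;\n"
        ' name="ConcurCalendarEntry.ics"\n'
        "Content-Transfer-Encoding: base64\n"
        "Content-Disposition: attachment;\n"
        ' filename="ConcurCalendarEntry.ics"\n'
        "\n" + body + "\n"
    )
    return message_from_string(raw)

part = sample_part()
encoded = part.get_payload()             # str: still the base64 text
decoded = part.get_payload(decode=True)  # bytes: transfer encoding undone
print(part.get_content_type(), part.get_filename())
print(decoded.decode("ascii").splitlines()[0])
```

Note that the part is labelled application/octet-stream rather than text/calendar, so a consumer has to go by the .ics filename to recognise it as calendar data.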

I can live without the ics files, so fixing this is not a priority for
me.  If there is someone interested in figuring this out, I've saved an
email and can answer questions.  I got lost trying to follow the lisp
code paths for attachments, so I'm not sure whether it's the text or the
base64 that is being handed off to icalendar.

R Horn
rjhorn at alum.mit.edu


extract attachments from multiple mails

2012-06-25 Thread David Belohrad

Dear All,

Can someone give me some advice? I have many emails containing
attachments. These are typically the output of a copy machine, which
fragments a scan into multiple attachments.

I'd like to extract those attached files in one batch into a specific
directory. Is there any way to fetch those files programmatically?

thanks
..d..


Re: extract attachments from multiple mails

2012-06-25 Thread Jameson Graef Rollins
On Mon, Jun 25 2012, Jameson Graef Rollins  wrote:
> I hacked up something simple below that will extract parts from messages
> matching a search term into the current directory (tested).

Improved/bug fixed version attached.

jamie.



jnotmuch-extract-parts
Description: Binary data


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: extract attachments from multiple mails

2012-06-25 Thread Jameson Graef Rollins
On Mon, Jun 25 2012, David Belohrad  wrote:
> Can someone give me some advice? I have many emails containing
> attachments. These are typically the output of a copy machine, which
> fragments a scan into multiple attachments.
>
> I'd like to extract those attached files in one batch into a specific
> directory. Is there any way to fetch those files programmatically?

notmuch show has a --part option for outputting a single part from a
MIME message.  Unfortunately there's currently no clean way to determine
the number of parts in a message.  But sort of hackily, you could do
something like:

for id in $(notmuch search --output=messages tag:files-to-extract); do
    for part in $(seq 1 10); do
        notmuch show --part=$part --format=raw $id > $id.$part
    done
done

That will also save any multipart parts, which aren't really that
useful, so you'll have to sort through them.

You can make something much cleaner with python, using the notmuch and
email python bindings:

http://packages.python.org/notmuch/
http://docs.python.org/library/email-examples.html

I hacked up something simple below that will extract parts from messages
matching a search term into the current directory (tested).

hth.

jamie.


#!/usr/bin/env python

import subprocess
import sys
import os
import notmuch
import email
import errno
import mimetypes

dbpath = subprocess.check_output(['notmuch', 'config', 'get',
                                  'database.path']).strip()
db = notmuch.Database(dbpath)
query = notmuch.Query(db, sys.argv[1])
for msg in query.search_messages():
    with open(msg.get_filename(), 'r') as f:
        msg = email.message_from_file(f)
    counter = 1
    for part in msg.walk():
        if part.get_content_maintype() == 'multipart': continue
        filename = part.get_filename()
        if not filename:
            ext = mimetypes.guess_extension(part.get_content_type())
            if not ext:
                ext = '.bin'
            filename = 'part-%03d%s' % (counter, ext)
        counter += 1
        print filename
        with open(filename, 'wb') as f:
            f.write(part.get_payload(decode=True))
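
One thing to watch with the script above: it writes to whatever filename the mail supplies, so two scans that both call their attachment the same thing overwrite each other, and a name containing a path could land outside the current directory. A possible guard is sketched below for Python 3. The helper name `safe_output_path` and the `taken` set are made up for this example, and this is not necessarily what the later "improved" version does.

```python
import os

def safe_output_path(filename, outdir, taken):
    """Return a path inside outdir for an attachment called filename.

    taken is a set of names already handed out during this run; it is
    updated in place so repeated names get a numeric suffix.
    """
    # Keep only the last path component so "../x" or "/etc/x" stays in outdir.
    base = os.path.basename(filename.replace("\\", "/")) or "part"
    if base in (".", ".."):
        base = "part"
    stem, ext = os.path.splitext(base)
    candidate, n = base, 1
    # Avoid both names used earlier in this run and files already on disk.
    while candidate in taken or os.path.exists(os.path.join(outdir, candidate)):
        candidate = "%s-%d%s" % (stem, n, ext)
        n += 1
    taken.add(candidate)
    return os.path.join(outdir, candidate)
```

In the script, the open() call would then take `safe_output_path(filename, '.', taken)` with `taken = set()` created once before the outer loop.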




[PATCH v2] ruby: extern linkage portability improvement

2012-06-25 Thread David Bremner
Tomi Ollila  writes:

> Some C compilers are stricter when it comes to (tentative) definition
> of a variable -- in those compilers introducing variable without 'extern'
> keyword always allocates new 'storage' to the variable and linking all
> these modules fails due to duplicate symbols.

LGTM



[PATCH v2] ruby: extern linkage portability improvement

2012-06-25 Thread Ali Polatel
2012/6/24 Tomi Ollila :
> Some C compilers are stricter when it comes to (tentative) definition
> of a variable -- in those compilers introducing variable without 'extern'
> keyword always allocates new 'storage' to the variable and linking all
> these modules fails due to duplicate symbols.
>
> This is reimplementation of Charlie Allom's patch:
> id:"1336481467-66356-1-git-send-email-charlie at mediasp.com",
> written originally by Ali Polatel. This version has
> more accurate commit message.
> ---
>  bindings/ruby/defs.h |   46 +++---
>  bindings/ruby/init.c |   26 ++
>  2 files changed, 49 insertions(+), 23 deletions(-)
>
> diff --git a/bindings/ruby/defs.h b/bindings/ruby/defs.h
> index 3f9512b..fe81b3f 100644
> --- a/bindings/ruby/defs.h
> +++ b/bindings/ruby/defs.h
> @@ -24,31 +24,31 @@
>  #include 
>  #include 
>
> -VALUE notmuch_rb_cDatabase;
> -VALUE notmuch_rb_cDirectory;
> -VALUE notmuch_rb_cFileNames;
> -VALUE notmuch_rb_cQuery;
> -VALUE notmuch_rb_cThreads;
> -VALUE notmuch_rb_cThread;
> -VALUE notmuch_rb_cMessages;
> -VALUE notmuch_rb_cMessage;
> -VALUE notmuch_rb_cTags;
> -
> -VALUE notmuch_rb_eBaseError;
> -VALUE notmuch_rb_eDatabaseError;
> -VALUE notmuch_rb_eMemoryError;
> -VALUE notmuch_rb_eReadOnlyError;
> -VALUE notmuch_rb_eXapianError;
> -VALUE notmuch_rb_eFileError;
> -VALUE notmuch_rb_eFileNotEmailError;
> -VALUE notmuch_rb_eNullPointerError;
> -VALUE notmuch_rb_eTagTooLongError;
> -VALUE notmuch_rb_eUnbalancedFreezeThawError;
> -VALUE notmuch_rb_eUnbalancedAtomicError;
> -
> -ID ID_call;
> -ID ID_db_create;
> -ID ID_db_mode;
> +extern VALUE notmuch_rb_cDatabase;
> +extern VALUE notmuch_rb_cDirectory;
> +extern VALUE notmuch_rb_cFileNames;
> +extern VALUE notmuch_rb_cQuery;
> +extern VALUE notmuch_rb_cThreads;
> +extern VALUE notmuch_rb_cThread;
> +extern VALUE notmuch_rb_cMessages;
> +extern VALUE notmuch_rb_cMessage;
> +extern VALUE notmuch_rb_cTags;
> +
> +extern VALUE notmuch_rb_eBaseError;
> +extern VALUE notmuch_rb_eDatabaseError;
> +extern VALUE notmuch_rb_eMemoryError;
> +extern VALUE notmuch_rb_eReadOnlyError;
> +extern VALUE notmuch_rb_eXapianError;
> +extern VALUE notmuch_rb_eFileError;
> +extern VALUE notmuch_rb_eFileNotEmailError;
> +extern VALUE notmuch_rb_eNullPointerError;
> +extern VALUE notmuch_rb_eTagTooLongError;
> +extern VALUE notmuch_rb_eUnbalancedFreezeThawError;
> +extern VALUE notmuch_rb_eUnbalancedAtomicError;
> +
> +extern ID ID_call;
> +extern ID ID_db_create;
> +extern ID ID_db_mode;
>
>  /* RSTRING_PTR() is new in ruby-1.9 */
>  #if !defined(RSTRING_PTR)
> diff --git a/bindings/ruby/init.c b/bindings/ruby/init.c
> index 3fe60fb..f4931d3 100644
> --- a/bindings/ruby/init.c
> +++ b/bindings/ruby/init.c
> @@ -20,6 +20,32 @@
>
>  #include "defs.h"
>
> +VALUE notmuch_rb_cDatabase;
> +VALUE notmuch_rb_cDirectory;
> +VALUE notmuch_rb_cFileNames;
> +VALUE notmuch_rb_cQuery;
> +VALUE notmuch_rb_cThreads;
> +VALUE notmuch_rb_cThread;
> +VALUE notmuch_rb_cMessages;
> +VALUE notmuch_rb_cMessage;
> +VALUE notmuch_rb_cTags;
> +
> +VALUE notmuch_rb_eBaseError;
> +VALUE notmuch_rb_eDatabaseError;
> +VALUE notmuch_rb_eMemoryError;
> +VALUE notmuch_rb_eReadOnlyError;
> +VALUE notmuch_rb_eXapianError;
> +VALUE notmuch_rb_eFileError;
> +VALUE notmuch_rb_eFileNotEmailError;
> +VALUE notmuch_rb_eNullPointerError;
> +VALUE notmuch_rb_eTagTooLongError;
> +VALUE notmuch_rb_eUnbalancedFreezeThawError;
> +VALUE notmuch_rb_eUnbalancedAtomicError;
> +
> +ID ID_call;
> +ID ID_db_create;
> +ID ID_db_mode;
> +
>  /*
>   * Document-module: Notmuch
>   *
> --
> 1.7.1
>
> ___
> notmuch mailing list
> notmuch at notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

Looks highly familiar yet strangely good to me.


[PATCH] manpages: consistent "format" for NAME section

2012-06-25 Thread Tomi Ollila
The NAME section in manpages generally doesn't start with a capital
letter (unless the word is a proper noun) and doesn't end with a
period. The notmuch manual pages now match that format.
---

See http://notmuchmail.org/manpages/ for reference.


 man/man1/notmuch-config.1       |    2 +-
 man/man1/notmuch-count.1        |    2 +-
 man/man1/notmuch-dump.1         |    2 +-
 man/man1/notmuch-new.1          |    2 +-
 man/man1/notmuch-reply.1        |    2 +-
 man/man1/notmuch-restore.1      |    2 +-
 man/man1/notmuch-search.1       |    2 +-
 man/man1/notmuch-show.1         |    2 +-
 man/man1/notmuch-tag.1          |    2 +-
 man/man7/notmuch-search-terms.7 |    2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/man/man1/notmuch-config.1 b/man/man1/notmuch-config.1
index 4f7985c..2ee555d 100644
--- a/man/man1/notmuch-config.1
+++ b/man/man1/notmuch-config.1
@@ -1,6 +1,6 @@
 .TH NOTMUCH-CONFIG 1 2012-06-01 "Notmuch 0.13.2"
 .SH NAME
-notmuch-config \- Access notmuch configuration file.
+notmuch-config \- access notmuch configuration file
 .SH SYNOPSIS

 .B notmuch config get
diff --git a/man/man1/notmuch-count.1 b/man/man1/notmuch-count.1
index 8029174..8551ab2 100644
--- a/man/man1/notmuch-count.1
+++ b/man/man1/notmuch-count.1
@@ -1,6 +1,6 @@
 .TH NOTMUCH-COUNT 1 2012-06-01 "Notmuch 0.13.2"
 .SH NAME
-notmuch-count \- Count messages matching the given search terms.
+notmuch-count \- count messages matching the given search terms
 .SH SYNOPSIS

 .B notmuch count
diff --git a/man/man1/notmuch-dump.1 b/man/man1/notmuch-dump.1
index 9c7dd84..64abf01 100644
--- a/man/man1/notmuch-dump.1
+++ b/man/man1/notmuch-dump.1
@@ -1,6 +1,6 @@
 .TH NOTMUCH-DUMP 1 2012-06-01 "Notmuch 0.13.2"
 .SH NAME
-notmuch-dump \- Creates a plain-text dump of the tags of each message.
+notmuch-dump \- creates a plain-text dump of the tags of each message

 .SH SYNOPSIS

diff --git a/man/man1/notmuch-new.1 b/man/man1/notmuch-new.1
index cd83a88..e01f2eb 100644
--- a/man/man1/notmuch-new.1
+++ b/man/man1/notmuch-new.1
@@ -1,6 +1,6 @@
 .TH NOTMUCH-NEW 1 2012-06-01 "Notmuch 0.13.2"
 .SH NAME
-notmuch-new \- Incorporate new mail into the notmuch database.
+notmuch-new \- incorporate new mail into the notmuch database
 .SH SYNOPSIS

 .B notmuch new
diff --git a/man/man1/notmuch-reply.1 b/man/man1/notmuch-reply.1
index fb5114c..5aa86c0 100644
--- a/man/man1/notmuch-reply.1
+++ b/man/man1/notmuch-reply.1
@@ -1,6 +1,6 @@
 .TH NOTMUCH-REPLY 1 2012-06-01 "Notmuch 0.13.2"
 .SH NAME
-notmuch-reply \- Constructs a reply template for a set of messages.
+notmuch-reply \- constructs a reply template for a set of messages

 .SH SYNOPSIS

diff --git a/man/man1/notmuch-restore.1 b/man/man1/notmuch-restore.1
index 3156af7..18281c7 100644
--- a/man/man1/notmuch-restore.1
+++ b/man/man1/notmuch-restore.1
@@ -1,6 +1,6 @@
 .TH NOTMUCH-RESTORE 1 2012-06-01 "Notmuch 0.13.2"
 .SH NAME
-notmuch-restore \- Restores the tags from the given file (see notmuch dump).
+notmuch-restore \- restores the tags from the given file (see notmuch dump)

 .SH SYNOPSIS

diff --git a/man/man1/notmuch-search.1 b/man/man1/notmuch-search.1
index 5c72c4b..b42eb2c 100644
--- a/man/man1/notmuch-search.1
+++ b/man/man1/notmuch-search.1
@@ -1,6 +1,6 @@
 .TH NOTMUCH-SEARCH 1 2012-06-01 "Notmuch 0.13.2"
 .SH NAME
-notmuch-search \- Search for messages matching the given search terms.
+notmuch-search \- search for messages matching the given search terms
 .SH SYNOPSIS

 .B notmuch search
diff --git a/man/man1/notmuch-show.1 b/man/man1/notmuch-show.1
index 4aab17c..b51a54c 100644
--- a/man/man1/notmuch-show.1
+++ b/man/man1/notmuch-show.1
@@ -1,6 +1,6 @@
 .TH NOTMUCH-SHOW 1 2012-06-01 "Notmuch 0.13.2"
 .SH NAME
-notmuch-show \- Show messages matching the given search terms.
+notmuch-show \- show messages matching the given search terms
 .SH SYNOPSIS

 .B notmuch show
diff --git a/man/man1/notmuch-tag.1 b/man/man1/notmuch-tag.1
index 27e682e..d810e1b 100644
--- a/man/man1/notmuch-tag.1
+++ b/man/man1/notmuch-tag.1
@@ -1,6 +1,6 @@
 .TH NOTMUCH-TAG 1 2012-06-01 "Notmuch 0.13.2"
 .SH NAME
-notmuch-tag \- Add/remove tags for all messages matching the search terms.
+notmuch-tag \- add/remove tags for all messages matching the search terms

 .SH SYNOPSIS
 .B notmuch tag
diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-terms.7
index c559ed6..b8ab52d 100644
--- a/man/man7/notmuch-search-terms.7
+++ b/man/man7/notmuch-search-terms.7
@@ -1,7 +1,7 @@
 .TH NOTMUCH-SEARCH-TERMS 7 2012-06-01 "Notmuch 0.13.2"

 .SH NAME
-notmuch-search-terms \- Syntax for notmuch queries
+notmuch-search-terms \- syntax for notmuch queries

 .SH SYNOPSIS

-- 
1.7.1
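
The convention the patch above applies could be checked mechanically. The sketch below is a hypothetical helper, not part of notmuch: it flags a NAME line whose description starts with a capital letter or ends with a period. It cannot tell a proper noun from an ordinary word, so a capitalised hit still needs a human look.

```python
import re

# Matches the roff form "name \- description" used in the NAME section.
NAME_LINE = re.compile(r"^(?P<name>\S+) \\- (?P<desc>.+)$")

def check_name_line(line):
    """Return a list of style problems for one NAME-section line."""
    m = NAME_LINE.match(line.rstrip("\n"))
    if m is None:
        return ["not in 'name \\- description' form"]
    desc = m.group("desc")
    problems = []
    if desc[0].isupper():
        problems.append("description starts with a capital letter")
    if desc.endswith("."):
        problems.append("description ends with a period")
    return problems
```

Running it over the line following each `.SH NAME` in man/man1/*.1 and man/man7/*.7 would list any page that still differs from the others.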