[PATCH 2/2] Improve heuristic for guessing best from address in replies
We now look at the Envelope-to: and X-Original-To: headers, then concatenate all of the Received: headers and walk through them to find either a "for email at add.res" clause or a host in a known domain. This should deal with most of the fetchmail and mail hoster induced pain (and failure) of the old heuristic.

Signed-off-by: Dirk Hohndel
---
 notmuch-reply.c |  125 +--
 1 files changed, 94 insertions(+), 31 deletions(-)

diff --git a/notmuch-reply.c b/notmuch-reply.c
index 230cacc..78d3914 100644
--- a/notmuch-reply.c
+++ b/notmuch-reply.c
@@ -305,33 +305,95 @@ add_recipients_from_message (GMimeMessage *reply,
 static const char *
 guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message)
 {
-    const char *received,*primary;
-    char **other;
-    char *by,*mta,*ptr,*token;
+    const char *received,*primary,*by;
+    char **other,*tohdr;
+    char *mta,*ptr,*token;
     char *domain=NULL;
     char *tld=NULL;
     const char *delim=". \t";
     size_t i,other_len;
 
-    received = notmuch_message_get_header (message, "received");
-    by = strstr (received, " by ");
-    if (by && *(by+4)) {
-        /* sadly, the format of Received: headers is a bit inconsistent,
-         * depending on the MTA used. So we try to extract just the MTA
-         * here by removing leading whitespace and assuming that the MTA
-         * name ends at the next whitespace
-         * we test for *(by+4) to be non-'\0' to make sure there's something
-         * there at all - and then assume that the first whitespace delimited
-         * token that follows is the last receiving server
+    const char *to_headers[] = {"Envelope-to", "X-Original-To"};
+
+    primary = notmuch_config_get_user_primary_email (config);
+    other = notmuch_config_get_user_other_email (config, &other_len);
+
+    /* sadly, there is no standard way to find out to which email
+     * address a mail was delivered - what is in the headers depends
+     * on the MTAs used along the way. So we are trying a number of
+     * heuristics which hopefully will answer this question.
+
+     * We only got here if none of the users email addresses are in
+     * the To: or Cc: header. From here we try the following in order:
+     * 1) check for an Envelope-to: header
+     * 2) check for an X-Original-To: header
+     * 3) check for a (for ) clause in Received: headers
+     * 4) check for the domain part of known email addresses in the
+     *    'by' part of Received headers
+     * If none of these work, we give up and return NULL
+     */
+    for (i = 0; i < sizeof(to_headers)/sizeof(*to_headers); i++) {
+        tohdr = xstrdup(notmuch_message_get_header (message, to_headers[i]));
+        if (tohdr && *tohdr) {
+            /* tohdr is potentialy a list of email addresses, so here we
+             * check if one of the email addresses is a substring of tohdr
+             */
+            if (strcasestr(tohdr, primary)) {
+                free(tohdr);
+                return primary;
+            }
+            for (i = 0; i < other_len; i++)
+                if (strcasestr (tohdr, other[i])) {
+                    free(tohdr);
+                    return other[i];
+                }
+            free(tohdr);
+        }
+    }
+
+    /* We get the concatenated Received: headers and search from the
+     * front (last Received: header added) and try to extract from
+     * them indications to which email address this message was
+     * delivered.
+     */
+    received = notmuch_message_get_concat_header (message, "received");
+    /* First we look for a " for " in the received
+     * header
+     */
+    ptr = strstr (received, " for ");
+    if (ptr) {
+        /* the text following is potentialy a list of email addresses,
+         * so again we check if one of the email addresses is a
+         * substring of ptr
          */
-        mta = strdup (by+4);
-        if (mta == NULL)
-            return NULL;
+        if (strcasestr(ptr, primary)) {
+            return primary;
+        }
+        for (i = 0; i < other_len; i++)
+            if (strcasestr (ptr, other[i])) {
+                return other[i];
+            }
+    }
+    /* Finally, we parse all the " by MTA ..." headers to guess the
+     * email address that this was originally delivered to.
+     * We extract just the MTA here by removing leading whitespace and
+     * assuming that the MTA name ends at the next whitespace.
+     * We test for *(by+4) to be non-'\0' to make sure there's
+     * something there at all - and then assume that the first
+     * whitespace delimited token that follows is the receiving
+     * system in this step of the receive chain
+     */
+    by = received;
+    while((by = strstr (by, " by ")) != NULL) {
+        by += 4;
+        if (*by == '\0')
+            break;
+        mta = xstrdup (by);
         token = strtok(mta," \t");
         if (token == NULL)
-            return NULL;
+            break;
         /*
[PATCH 1/2] Add interface to obtain the concatenation of all instances of a specified header
notmuch_message_get_header only returns the first instance of the specified header in a message. notmuch_message_get_concat_header concatenates the values from ALL instances of that header in a message. This is useful for example to get the full delivery path as captured in all of the Received: headers.

Signed-off-by: Dirk Hohndel
---
 lib/database.cc       |   14 +++---
 lib/message-file.c    |   49 +++--
 lib/message.cc        |   12 +++-
 lib/notmuch-private.h |    2 +-
 lib/notmuch.h         |   16
 5 files changed, 70 insertions(+), 23 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 6842faf..d706263 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1289,11 +1289,11 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
     parents = g_hash_table_new_full (g_str_hash, g_str_equal,
                                      _my_talloc_free_for_g_hash, NULL);
 
-    refs = notmuch_message_file_get_header (message_file, "references");
+    refs = notmuch_message_file_get_header (message_file, "references", 0);
     parse_references (message, notmuch_message_get_message_id (message),
                       parents, refs);
 
-    in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");
+    in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to", 0);
     parse_references (message, notmuch_message_get_message_id (message),
                       parents, in_reply_to);
 
@@ -1506,9 +1506,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
          * let's make sure that what we're looking at looks like an
          * actual email message.
          */
-        from = notmuch_message_file_get_header (message_file, "from");
-        subject = notmuch_message_file_get_header (message_file, "subject");
-        to = notmuch_message_file_get_header (message_file, "to");
+        from = notmuch_message_file_get_header (message_file, "from", 0);
+        subject = notmuch_message_file_get_header (message_file, "subject", 0);
+        to = notmuch_message_file_get_header (message_file, "to", 0);
 
         if ((from == NULL || *from == '\0') &&
             (subject == NULL || *subject == '\0') &&
@@ -1521,7 +1521,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
         /* Now that we're sure it's mail, the first order of business
          * is to find a message ID (or else create one ourselves). */
 
-        header = notmuch_message_file_get_header (message_file, "message-id");
+        header = notmuch_message_file_get_header (message_file, "message-id", 0);
         if (header && *header != '\0') {
             message_id = _parse_message_id (message_file, header, NULL);
 
@@ -1580,7 +1580,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
         if (ret)
             goto DONE;
 
-        date = notmuch_message_file_get_header (message_file, "date");
+        date = notmuch_message_file_get_header (message_file, "date", 0);
         _notmuch_message_set_date (message, date);
 
         _notmuch_message_index_file (message, filename);
diff --git a/lib/message-file.c b/lib/message-file.c
index 0c152a3..a01adbb 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -209,15 +209,21 @@ copy_header_unfolding (header_value_closure_t *value,
 
 /* As a special-case, a value of NULL for header_desired will force
  * the entire header to be parsed if it is not parsed already. This is
- * used by the _notmuch_message_file_get_headers_end function. */
+ * used by the _notmuch_message_file_get_headers_end function.
+ * If concat is 'true' then it parses the whole message and
+ * concatenates all instances of the header in question. This is
+ * currently used to get a complete Received: header when analyzing
+ * the path the mail has taken from sender to recipient.
+ */
 const char *
 notmuch_message_file_get_header (notmuch_message_file_t *message,
-                                 const char *header_desired)
+                                 const char *header_desired,
+                                 int concat)
 {
     int contains;
-    char *header, *decoded_value;
+    char *header, *decoded_value, *header_sofar, *combined_header;
     const char *s, *colon;
-    int match;
+    int match, newhdr, hdrsofar;
     static int initialized = 0;
 
     if (! initialized) {
@@ -227,7 +233,7 @@ notmuch_message_file_get_header (notmuch_message_file_t *message,
 
     message->parsing_started = 1;
 
-    if (header_desired == NULL)
+    if (concat || header_desired == NULL)
         contains = 0;
     else
         contains = g_hash_table_lookup_extended (message->headers,
@@ -237,6 +243,9 @@ notmuch_message_file_get_header (notmuch_message_file_t *message,
     if (contains && decoded_value)
         return decoded_value;
 
+    if (concat)
+        message->parsing_finished = 0;
+
     if (message->parsing_finished)
         return
improve from-header guessing
The following two patches should address most of the concerns raised to my previous series. The first patch simply adds an interface to obtain a concatenation of all instances of a specific header from an email. The second patch uses that in order to get the full Received: headers. It now looks at Envelope-to: and X-Original-To: headers, then at the concatenated Received headers for either a "for email at add.res" clause that matches a configured address or for a " by " clause that matches the domain of a configured address. What is still missing is the check if the host from which the mail was received in this last case had a routable IP address.
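The lookup order described above (candidate header strings tried one after another, each checked against the primary address and then the other configured addresses) can be sketched in plain C. This is only an illustration under stated assumptions and not the code from the patch: contains_nocase, match_address and guess_from_headers are hypothetical names, the headers are passed in as ready-made strings instead of being fetched through notmuch, and a hand-written case-insensitive search stands in for strcasestr.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive substring test, used here as a
 * portable stand-in for the GNU strcasestr call. Returns 1 if `needle`
 * occurs in `haystack`, 0 otherwise (including for NULL or empty input). */
int
contains_nocase (const char *haystack, const char *needle)
{
    size_t hlen, nlen, i, j;

    if (haystack == NULL || needle == NULL)
        return 0;
    hlen = strlen (haystack);
    nlen = strlen (needle);
    if (nlen == 0 || nlen > hlen)
        return 0;
    for (i = 0; i + nlen <= hlen; i++) {
        for (j = 0; j < nlen; j++)
            if (tolower ((unsigned char) haystack[i + j]) !=
                tolower ((unsigned char) needle[j]))
                break;
        if (j == nlen)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: return the first configured address that appears
 * in `header` (primary first, then the others), or NULL if none does. */
const char *
match_address (const char *header, const char *primary,
               const char **other, size_t other_len)
{
    size_t a;

    if (contains_nocase (header, primary))
        return primary;
    for (a = 0; a < other_len; a++)
        if (contains_nocase (header, other[a]))
            return other[a];
    return NULL;
}

/* Try each candidate header string in order; missing or empty headers
 * are skipped. Returns the matched address or NULL if nothing matched. */
const char *
guess_from_headers (const char **headers, size_t n_headers,
                    const char *primary,
                    const char **other, size_t other_len)
{
    size_t h;
    const char *found;

    for (h = 0; h < n_headers; h++) {
        if (headers[h] == NULL || *headers[h] == '\0')
            continue;
        found = match_address (headers[h], primary, other, other_len);
        if (found)
            return found;
    }
    return NULL;
}
```

The sketch keeps separate counters for the header loop and the address loop, and it inherits the limits of plain substring matching: a configured address such as me@example.com would also match inside some@example.com.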
Opening the merge window for 0.3
I'm officially opening the merge window for the upcoming 0.3 release (about a week from now).

I know that I want to merge David Edmondson's rewrite of the emacs interface to be built on top of --format=json and add a ton of features (better attachment handling, notmuch-hello, etc.). I think that's more than enough to justify a new release right there.

But if you have other things you'd like to see in this release, please send a message to the list (either as a new message or a reply to a previous post where the feature/bug-fix was originally proposed). Please don't send feature requests as replies to this message.

The best merge requests will be things that have existing, tested patches already.

I know that some of the most desired features right now are folder: searching and maildir-flag synchronization. Unfortunately, I think both of these will be postponed until the 0.4 release, but I could be wrong.

-Carl

PS. I still never sent a list of the features which were proposed for the 0.2 release but postponed. I'll assemble that list soon with my comments on where each of the features stand.
[Announce] notmuch release 0.2 now available
more: --build --infodir --libexecdir --localstatedir --disable-maintainer-mode --disable-dependency-tracking

Install emacs client in "make install" rather than requiring a separate "make install-emacs".

Automatically compute version numbers between releases. This support uses the git-describe notation, so a version such as 0.1-144-g43cbbfc indicates a version that is 144 commits since the 0.1 release and is available as git commit "43cbbfc".

Add a new "make test" target to run the test suite and actually verify its results.

What is notmuch
===============
Notmuch is a system for indexing, searching, reading, and tagging large collections of email messages in maildir or mh format. It uses the Xapian library to provide fast, full-text search with a convenient search syntax.

For more about notmuch, see http://notmuchmail.org
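The git-describe notation mentioned in the release notes (base release, commit count, then "g" plus an abbreviated commit id) is easy to take apart mechanically. The following is a minimal sketch, assuming a base version that contains no "-" itself; struct describe_version and parse_describe are made-up names for illustration and are not part of notmuch.

```c
#include <stdio.h>

/* Hypothetical container for the three parts of "0.1-144-g43cbbfc". */
struct describe_version {
    char base[32];      /* release the count is relative to, e.g. "0.1" */
    unsigned commits;   /* number of commits since that release */
    char commit[41];    /* abbreviated commit id without the leading "g" */
};

/* Parse a string of the form BASE-COUNT-gHASH.
 * Returns 1 on success, 0 if the string does not have that shape.
 * Assumption: BASE contains no '-' (a tag like "1.0-rc1" would not parse). */
int
parse_describe (const char *s, struct describe_version *out)
{
    int consumed = 0;

    if (sscanf (s, "%31[^-]-%u-g%40[0-9a-f]%n",
                out->base, &out->commits, out->commit, &consumed) != 3)
        return 0;
    /* Reject trailing characters after the commit id. */
    return s[consumed] == '\0';
}
```

Applied to the string from the announcement, this yields the base "0.1", a count of 144 and the commit id "43cbbfc"; a plain release string such as "0.2" is rejected because it carries no count or commit id.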
[PATCH] First tests for JSON output and UTF-8 in mail body and subject
On Wed, 14 Apr 2010 17:35:44 -0700, Carl Worth wrote:
> [*] I say "should" because I don't believe we have any actual
> specification of the data coming out of the JSON output yet. One other
> thing that seems odd is the name of "date_unix" in the show output and
> "timestamp" in the search output for what is effectively the same
> field.

The show output is updated to use `timestamp' in id:1271418469-19031-1-git-send-email-dme at dme.org (just sent).

dme.
-- 
David Edmondson, http://dme.org
[PATCH] json: Replace `date_unix' with `timestamp' in show output
Search output was already using `timestamp' for a very similar field, so follow that.
---
 notmuch-show.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/notmuch-show.c b/notmuch-show.c
index 76873a1..26449fa 100644
--- a/notmuch-show.c
+++ b/notmuch-show.c
@@ -145,7 +145,7 @@ format_message_json (const void *ctx, notmuch_message_t *message, unused (int in
     date = notmuch_message_get_date (message);
     relative_date = notmuch_time_relative_date (ctx, date);
 
-    printf ("\"id\": %s, \"match\": %s, \"filename\": %s, \"date_unix\": %ld, \"date_relative\": \"%s\", \"tags\": [",
+    printf ("\"id\": %s, \"match\": %s, \"filename\": %s, \"timestamp\": %ld, \"date_relative\": \"%s\", \"tags\": [",
             json_quote_str (ctx_quote, notmuch_message_get_message_id (message)),
             notmuch_message_get_flag (message, NOTMUCH_MESSAGE_FLAG_MATCH) ? "true" : "false",
             json_quote_str (ctx_quote, notmuch_message_get_filename (message)),
-- 
1.7.0
notmuchsync --move (was: add a number of new feature ideas to TODO file)
On 2010-04-16, Dirk Hohndel wrote:
> +Thirdparty apps
> +---
> +(not sure this is the best spot to collect requests like this)
> +
> +notmuchsync
> +
> +Add feature to move files in the maildir hierarchy
> +
> +  notmuchsync --move "searchstring" "targetfolder"
> +  Where searchstring is any valid notmuch search
> +

You can remove that bit from the patch, it is implemented now :-)

notmuchsync --move "querystring" "targetfolder"
(use with --dry-run and -d to preview changes)

once folder: search is implemented you can e.g. simply do:
notmuchsync --move "not tag:inbox and folder:inbox" /home/spaetz/mail/archive/cur
and make your IMAP web clients (or iphones) happy.

This works right now already:
notmuchsync --move "not tag:inbox" /home/spaetz/mail/archive/cur
but is of course slower (still ok) as it has to traverse through most of your mails.

Sebastian
"bouncing" messages
On Fri, 16 Apr 2010 10:34:53 +0200, Peter Wiersig wrote:
> On Thu, 15 Apr 2010 17:27:17 -0400, Jameson Rollins wrote:
> > Does anyone know how to "bounce" a message,
>
> pipe the message to "sendmail user at axample.com"
>
> Well, ok, mutt adds "Resent-*" headers to the bounced message, so there
> it's not unaltered.

Great, thanks so much for the suggestion, Peter. That's easy enough.

jamie.
"bouncing" messages
On Thu, 15 Apr 2010 17:27:17 -0400, Jameson Rollins wrote:
> Does anyone know how to "bounce" a message,

pipe the message to "sendmail user at axample.com"

Well, ok, mutt adds "Resent-*" headers to the bounced message, so there it's not unaltered.

Peter
[PATCH] First tests for JSON output and UTF-8 in mail body and subject
> But you might actually like that change since it's one you requested in > your first version of the modular test suite. I'm dropping the annoying > execute_expecting macro that both runs notmuch and tests the > output. There's now a much cleaner separation such as: > > output=$($NOTMUCH search for-something) > pass_if_equal "$output" "something was found" It's definitely better than before. The current implementation of pass_if_equal has IMHO one drawback - if it compares multiline text and there is a difference, it is quite hard to see where. In my tests for maildir synchronization I use this approach: notmuch search tag:inbox | filter_output > actual && diff -u - actual <
[notmuch] Bulk message tagging
On 15 April 2010 21:46, Carl Worth wrote:
[...]
> We'll probably need to arrange for notmuch to accept search
> specifications on stdin or so.

Or a daemon mode with a pipe or DBus interface.

Servilio
[PATCH] notmuch.c: Shorten version string
We previously output "notmuch version 0.1" as response to notmuch --version. Shorten this to "notmuch 0.1" as we know that we will receive a version number when we explicitly ask for it.

Signed-off-by: Sebastian Spaeth
---
 notmuch.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/notmuch.c b/notmuch.c
index dcfda32..0eea5e1 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -474,7 +474,7 @@ main (int argc, char *argv[])
         return notmuch_help_command (NULL, 0, NULL);
 
     if (STRNCMP_LITERAL (argv[1], "--version") == 0) {
-        printf ("notmuch version " STRINGIFY(NOTMUCH_VERSION) "\n");
+        printf ("notmuch " STRINGIFY(NOTMUCH_VERSION) "\n");
         return 0;
     }
 
-- 
1.7.0.4
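The printf call in this patch relies on adjacent string literals being joined at compile time, with STRINGIFY turning the version macro into a literal. A small self-contained sketch of that mechanism follows. The two-level STRINGIFY definition and the NOTMUCH_VERSION value of 0.1 are assumptions made for the example (in the real build the version comes from the build system), and format_version is a hypothetical helper.

```c
#include <stdio.h>

/* Assumed definitions: the usual two-level stringify idiom. The extra
 * level makes the preprocessor expand NOTMUCH_VERSION before '#' turns
 * it into a string literal. */
#define STRINGIFY_(s) #s
#define STRINGIFY(s) STRINGIFY_(s)

/* Assumed value, for illustration only. */
#ifndef NOTMUCH_VERSION
#define NOTMUCH_VERSION 0.1
#endif

/* Hypothetical helper: write the shortened version line into buf.
 * Returns the number of characters snprintf would have written. */
int
format_version (char *buf, size_t size)
{
    /* Three adjacent literals become one: "notmuch " "0.1" "\n". */
    return snprintf (buf, size, "notmuch " STRINGIFY(NOTMUCH_VERSION) "\n");
}
```

With a single-level macro (#define STRINGIFY(s) #s) the output would contain the text NOTMUCH_VERSION instead of the number, which is why the indirection is there.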
[PATCH 3/3] notmuch-tag: don't sort messages before applying tag changes
It's not necessary to sort the results before we apply tags. Xapian contributor Olly Betts says that savings might be bigger with a cold file cache and (as unsorted implies really sorted by document id) a better cache locality when applying tags to messages.

Signed-off-by: Sebastian Spaeth
---
 notmuch-tag.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/notmuch-tag.c b/notmuch-tag.c
index 8b6f7dc..fd54bc7 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -107,6 +107,9 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
         return 1;
     }
 
+    /* tagging is not interested in any special sort order */
+    notmuch_query_set_sort (query, NOTMUCH_SORT_UNSORTED);
+
     for (messages = notmuch_query_search_messages (query);
          notmuch_messages_valid (messages) && !interrupted;
          notmuch_messages_move_to_next (messages))
-- 
1.7.0.4
[PATCH 2/3] notmuch-search: Introduce --sort=unsorted
In some cases, we might not be interested in any special sort order, so this introduces a --sort=unsorted command line option together with its documentation.

Signed-off-by: Sebastian Spaeth
---
 notmuch-search.c |    2 ++
 notmuch.1        |   10 ++
 notmuch.c        |    7 ---
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/notmuch-search.c b/notmuch-search.c
index 4e3514b..854a9ae 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -217,6 +217,8 @@ notmuch_search_command (void *ctx, int argc, char *argv[])
                 sort = NOTMUCH_SORT_OLDEST_FIRST;
             } else if (strcmp (opt, "newest-first") == 0) {
                 sort = NOTMUCH_SORT_NEWEST_FIRST;
+            } else if (strcmp (opt, "unsorted") == 0) {
+                sort = NOTMUCH_SORT_UNSORTED;
             } else {
                 fprintf (stderr, "Invalid value for --sort: %s\n", opt);
                 return 1;
diff --git a/notmuch.1 b/notmuch.1
index 86830f4..6d4beaf 100644
--- a/notmuch.1
+++ b/notmuch.1
@@ -152,12 +152,14 @@ Presents the results in either JSON or plain-text (default).
 .RE
 .RS 4
 .TP 4
-.BR \-\-sort= ( newest\-first | oldest\-first )
+.BR \-\-sort= ( newest\-first | oldest\-first | unsorted)
 This option can be used to present results in either chronological order
-.RB ( oldest\-first )
-or reverse chronological order
-.RB ( newest\-first ).
+.RB ( oldest\-first ),
+reverse chronological order
+.RB ( newest\-first )
+or without any defined sort order
+.RB ( unsorted ).
 Note: The thread order will be distinct between these two options
 (beyond being simply reversed).
 When sorting by
diff --git a/notmuch.c b/notmuch.c
index dcfda32..e31dd88 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -165,11 +165,12 @@ command_t commands[] = {
       "\t\tPresents the results in either JSON or\n"
       "\t\tplain-text (default)\n"
       "\n"
-      "\t--sort=(newest-first|oldest-first)\n"
+      "\t--sort=(newest-first|oldest-first|unsorted)\n"
       "\n"
       "\t\tPresent results in either chronological order\n"
-      "\t\t(oldest-first) or reverse chronological order\n"
-      "\t\t(newest-first), which is the default.\n"
+      "\t\t(oldest-first),reverse chronological order\n"
+      "\t\t(newest-first), which is the default or\n"
+      "\t\t(unsorted) without any special sort order.\n"
       "\n"
       "\tSee \"notmuch help search-terms\" for details of the search\n"
       "\tterms syntax." },
-- 
1.7.0.4
[PATCH 1/3] query.cc: allow to return query results unsorted
Previously, we always sorted the returned results by some string value, (newest-to-oldest by default), however in some cases (as when applying tags to a search result) we are not interested in any special order. This introduces a NOTMUCH_SORT_UNSORTED value that does just that. It is not used at the moment anywhere in the code.

Signed-off-by: Sebastian Spaeth
---
 lib/notmuch.h |    3 ++-
 lib/query.cc  |    2 ++
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index a7e66dd..bae48a6 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -346,7 +346,8 @@ notmuch_query_create (notmuch_database_t *database,
 typedef enum {
     NOTMUCH_SORT_OLDEST_FIRST,
     NOTMUCH_SORT_NEWEST_FIRST,
-    NOTMUCH_SORT_MESSAGE_ID
+    NOTMUCH_SORT_MESSAGE_ID,
+    NOTMUCH_SORT_UNSORTED
 } notmuch_sort_t;
 
 /* Specify the sorting desired for this query. */
diff --git a/lib/query.cc b/lib/query.cc
index 10f8dc8..4148f9b 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -148,6 +148,8 @@ notmuch_query_search_messages (notmuch_query_t *query)
         case NOTMUCH_SORT_MESSAGE_ID:
             enquire.set_sort_by_value (NOTMUCH_VALUE_MESSAGE_ID, FALSE);
             break;
+        case NOTMUCH_SORT_UNSORTED:
+            break;
         }
 
 #if DEBUG_QUERY
-- 
1.7.0.4
[PATCH] allow to not sort the search results
On 2010-04-15, Olly Betts wrote:
> > I would be happy to have it called --sort=relevance too, the unsorted
> > points out potential performance improvements a bit better, IMHO
> > (although they seem to be really small with a warm cache).
>
> When using the results of a search to add/remove tags, there's likely to be
> an additional win from --sort=unsorted as documents will now be processed
> in docid order which will tend to have a more cache friendly locality of
> access.

Olly was right in that even for "notmuch tag" we were sorting the results by date before applying tag changes. I have slightly reworked my patch to have notmuch tag avoid doing that. I also split up the patch in 3 patches that do one thing each. The patches do:

1: Introduce NOTMUCH_SORT_UNSORTED
2: Introduce notmuch search --sort=unsorted
3: Make notmuch tag not sort results by date

#2 is the one I am least sure about, I don't know if there is a use case for notmuch search returning unsorted results. But 1 & 3 are useful at least.

> Also, sorting by relevance requires more calculations and may require fetching
> additional data (document length for example).
>
> So I think it would make sense for --sort=relevance and --sort=unsorted to be
> separate options.

Now I am a bit confused. The API docs state that sort_by_relevance is the default. So by skipping any sort_by_value() will that incur the additional calculations (with our BoolWeight set?). All I want is the fastest way to return a searched set of docs :-).

Patches 1-3 follow as reply to this one

Sebastian
[PATCH] allow to not sort the search results
On Fri, Apr 16, 2010 at 08:37:04AM +0200, Sebastian Spaeth wrote:
> On 2010-04-15, Olly Betts wrote:
> > Also, sorting by relevance requires more calculations and may require
> > fetching additional data (document length for example).
> >
> > So I think it would make sense for --sort=relevance and --sort=unsorted to
> > be separate options.
>
> Now I am a bit confused. The API docs state that sort_by_relevance is
> the default. So by skipping any sort_by_value() will that incur the additional
> calculations (with our BoolWeight set?). All I want is the fasted way
> to return a searched set of docs :-).

Yes, sort_by_relevance() is the default. But if you set BoolWeight as the weighting scheme then the relevance is simply zero, and Xapian doesn't have to fetch any statistics and calculate a score from them.

When documents have exactly equal relevance weight, then the docid order is used. So although sort_by_relevance() is technically still on with BoolWeight, by "sorting by relevance" I wasn't talking about this case.

So --sort=unsorted and --sort=relevance would only differ in code by the former setting BoolWeight and the latter not.

Cheers,
Olly
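The ordering rule described in this reply (higher relevance weight first, equal weights falling back to docid order, so all-zero weights give plain docid order) can be modelled with a short comparator. This is a toy model in C and not Xapian code: Xapian is a C++ library and does not expose or necessarily implement its ranking this way, and struct hit, compare_hits and rank_hits are invented names.

```c
#include <stdlib.h>

/* Hypothetical search hit: a document id plus its relevance weight. */
struct hit {
    unsigned docid;
    double weight;
};

/* Order by weight, highest first; break ties by ascending docid. */
int
compare_hits (const void *a, const void *b)
{
    const struct hit *x = a;
    const struct hit *y = b;

    if (x->weight > y->weight)
        return -1;
    if (x->weight < y->weight)
        return 1;
    return (x->docid > y->docid) - (x->docid < y->docid);
}

/* Sort an array of hits in place using the rule above. */
void
rank_hits (struct hit *hits, size_t n)
{
    qsort (hits, n, sizeof *hits, compare_hits);
}
```

When every weight is zero, as with a boolean weighting scheme in this model, the weight comparison never decides anything and the result is simply ascending docid order, which is the "unsorted" behaviour discussed in the thread.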
[notmuch] Bulk message tagging
On Thu, 15 Apr 2010 18:46:56 -0700, Carl Worth wrote:
> On Thu, 15 Apr 2010 16:04:38 -0400, Jesse Rosenthal wrote:
> > the region command only executes one "notmuch tag" command over
> > "id:X or id:Y or id:Z or ...".
>
> ...this operation is all set up to run into "argument list too long"
> errors.

I've never run into this error. Is there a specific length that triggers it? If so, we could chunk the tagging command. Or does the max length depend on the machine and system?
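On the question of a specific length: POSIX systems expose the limit as ARG_MAX, which differs between systems, can be queried at run time with sysconf(_SC_ARG_MAX), and covers the combined size of the arguments and the environment. The chunking idea could then look roughly like the sketch below. It is only an illustration: arg_budget and ids_per_chunk are hypothetical helpers, not notmuch functions, and a real caller would leave generous headroom below the reported limit.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: the system's limit for exec arguments plus
 * environment, in bytes. Falls back to 4096 (the POSIX minimum,
 * _POSIX_ARG_MAX) if the limit is indeterminate. */
long
arg_budget (void)
{
    long max = sysconf (_SC_ARG_MAX);

    return max > 0 ? max : 4096;
}

/* Hypothetical helper: how many leading entries of `ids` fit into one
 * query string of the form "id:A or id:B or ..." without exceeding
 * `budget` bytes (terminating NUL included). The caller would tag that
 * many messages, then call again with the remaining ids. */
size_t
ids_per_chunk (const char **ids, size_t n, size_t budget)
{
    size_t used = 1;    /* terminating NUL */
    size_t i;

    for (i = 0; i < n; i++) {
        size_t need = strlen ("id:") + strlen (ids[i]);

        if (i > 0)
            need += strlen (" or ");
        if (used + need > budget)
            break;
        used += need;
    }
    return i;
}
```

A return value of 0 with ids remaining means a single id does not fit into the budget, which the caller has to treat as an error instead of looping.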
[PATCH] notmuch.c: Shorten version string
On Fri, 16 Apr 2010 09:06:02 +0200, Sebastian Spaeth wrote:
> We previously output "notmuch version 0.1" as response to notmuch --version.
> Shorten this to "notmuch 0.1" as we know that we will receive a version
> number when we explicitly ask for it.

Thanks for the reminder. Pushed.

-Carl
Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject
But you might actually like that change since it's one you requested in your first version of the modular test suite. I'm dropping the annoying execute_expecting macro that both runs notmuch and tests the output. There's now a much cleaner separation such as:

output=$($NOTMUCH search for-something)
pass_if_equal "$output" "something was found"

It's definitely better than before. The current implementation of pass_if_equal has IMHO one drawback: if it compares multiline text and there is a difference, it is quite hard to see where. In my tests for maildir synchronization I use this approach:

notmuch search tag:inbox | filter_output > actual
diff -u - actual <<EOF
thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; test message 3 (inbox)
EOF

Thanks to the use of diff, I immediately see only the differences.

-Michal
notmuchsync --move (was: add a number of new feature ideas to TODO file)
On 2010-04-16, Dirk Hohndel wrote:

+Thirdparty apps
+---
+(not sure this is the best spot to collect requests like this)
+
+notmuchsync
+
+Add feature to move files in the maildir hierarchy
+
+ notmuchsync --move searchstring targetfolder
+ Where searchstring is any valid notmuch search
+

You can remove that bit from the patch, it is implemented now :-)

notmuchsync --move querystring targetfolder
(use with --dry-run and -d to preview changes)

Once folder: search is implemented you can, e.g., simply do:

notmuchsync --move "not tag:inbox and folder:inbox" /home/spaetz/mail/archive/cur

and make your IMAP web clients (or iPhones) happy. This works right now already:

notmuchsync --move "not tag:inbox" /home/spaetz/mail/archive/cur

but is of course slower (still OK) as it has to traverse most of your mails.

Sebastian
Re: [notmuch] Bulk message tagging
On Thu, 15 Apr 2010 18:46:56 -0700, Carl Worth cwo...@cworth.org wrote:

On Thu, 15 Apr 2010 16:04:38 -0400, Jesse Rosenthal jrosent...@jhu.edu wrote:

the region command only executes one notmuch tag command over id:X or id:Y or id:Z or ...this operation is all set up to run into argument list too long errors.

I've never run into this error. Is there a specific length that triggers it? If so, we could chunk the tagging command. Or does the max length depend on the machine and system?
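On the question of whether the limit depends on the system: on POSIX systems the combined size of the arguments and environment passed to exec is bounded by ARG_MAX, which differs between systems and can be queried at run time with sysconf(_SC_ARG_MAX). The chunking idea could be sketched in C roughly as follows. The helper names query_budget and build_id_query, and the choice of a quarter of ARG_MAX as the budget, are assumptions made for illustration; nothing here is taken from notmuch or its Emacs interface.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical policy: spend at most a quarter of ARG_MAX on one query,
 * leaving room for the environment and the other arguments. */
static size_t
query_budget (void)
{
    long arg_max = sysconf (_SC_ARG_MAX);

    if (arg_max <= 0)
        return 4096;    /* _POSIX_ARG_MAX, the smallest limit POSIX permits */
    return (size_t) arg_max / 4;
}

/* Write "id:A or id:B or ..." into buf (capacity cap, including the
 * terminating NUL) for as many leading ids as fit.  Returns the number
 * of ids consumed; the caller runs one tag command per chunk and calls
 * again with the remaining ids. */
static size_t
build_id_query (const char *const *ids, size_t n, char *buf, size_t cap)
{
    size_t used = 0, i;

    if (cap == 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        const char *sep = (i == 0) ? "" : " or ";
        size_t need = strlen (sep) + 3 + strlen (ids[i]);  /* 3 == strlen ("id:") */

        if (used + need + 1 > cap)
            break;      /* the next id would not fit; stop this chunk here */
        used += (size_t) snprintf (buf + used, cap - used, "%sid:%s", sep, ids[i]);
    }
    return i;
}
```

A caller would loop until all ids are consumed, treating a return value of 0 with ids still remaining as an error (a single id longer than the buffer).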
Re: bouncing messages
On Thu, 15 Apr 2010 17:27:17 -0400, Jameson Rollins jroll...@finestructure.net wrote:

Does anyone know how to bounce a message,

pipe the message to sendmail u...@axample.com

Well, ok, mutt adds Resent-* headers to the bounced message, so there it's not unaltered.

Peter
Re: bouncing messages
On Fri, 16 Apr 2010 10:34:53 +0200, Peter Wiersig fri...@london087.server4you.de wrote:

On Thu, 15 Apr 2010 17:27:17 -0400, Jameson Rollins jroll...@finestructure.net wrote:

Does anyone know how to bounce a message,

pipe the message to sendmail u...@axample.com

Well, ok, mutt adds Resent-* headers to the bounced message, so there it's not unaltered.

Great, thanks so much for the suggestion, Peter. That's easy enough.

jamie.
improve from-header guessing
The following two patches should address most of the concerns raised about my previous series.

The first patch simply adds an interface to obtain a concatenation of all instances of a specific header from an email.

The second patch uses that to get the full Received: headers. It now looks at the Envelope-to: and X-Original-To: headers, then at the concatenated Received: headers, for either a "for em...@add.res" clause that matches a configured address or a "by" clause that matches the domain of a configured address.

What is still missing is a check of whether the host from which the mail was received in this last case had a routable IP address.
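The lookup order described above can be sketched in plain C. This is only an illustration of the idea under stated assumptions, not the code from the patch: contains_nocase, match_address and guess_from are hypothetical helpers, header values are passed in as ordinary strings rather than fetched through the notmuch API, and the domain matching on the "by" clause is left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive substring test, written out by hand because
 * strcasestr is a non-standard extension. */
static int
contains_nocase (const char *haystack, const char *needle)
{
    size_t nlen = strlen (needle);

    if (nlen == 0)
        return 0;       /* an empty address never matches */
    for (; *haystack; haystack++) {
        size_t k = 0;

        while (k < nlen && haystack[k] &&
               tolower ((unsigned char) haystack[k]) ==
               tolower ((unsigned char) needle[k]))
            k++;
        if (k == nlen)
            return 1;
    }
    return 0;
}

/* Return the first configured address that occurs in one header value,
 * trying the primary address before the others; NULL if none occurs. */
static const char *
match_address (const char *header, const char *primary,
               const char *const *other, size_t other_len)
{
    size_t i;

    if (header == NULL || *header == '\0')
        return NULL;
    if (primary && contains_nocase (header, primary))
        return primary;
    for (i = 0; i < other_len; i++)
        if (contains_nocase (header, other[i]))
            return other[i];
    return NULL;
}

/* Try the header values in priority order (for example Envelope-to,
 * X-Original-To, then the concatenated Received headers). */
static const char *
guess_from (const char *const *headers, size_t n_headers,
            const char *primary, const char *const *other, size_t other_len)
{
    size_t h;

    for (h = 0; h < n_headers; h++) {
        const char *hit = match_address (headers[h], primary, other, other_len);

        if (hit)
            return hit;
    }
    return NULL;
}
```

The sketch keeps separate loop counters for the header list and the address list, so scanning the addresses never disturbs the walk over the headers. Substring matching is deliberately loose: a configured address that is a suffix of another address would also match, which a stricter version would need to rule out.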