[PATCH 2/2] Improve heuristic for guessing best from address in replies

2010-04-16 Thread Dirk Hohndel
We now look at the Envelope-to: and X-Original-To: headers first.
Then we concatenate all of the Received: headers and walk through them to
find either a "for <email@add.res>" clause or a host in a known domain.

This should deal with most of the fetchmail and mail hoster induced
pain (and failure) of the old heuristic.

Signed-off-by: Dirk Hohndel 
---
 notmuch-reply.c |  125 +--
 1 files changed, 94 insertions(+), 31 deletions(-)

diff --git a/notmuch-reply.c b/notmuch-reply.c
index 230cacc..78d3914 100644
--- a/notmuch-reply.c
+++ b/notmuch-reply.c
@@ -305,33 +305,95 @@ add_recipients_from_message (GMimeMessage *reply,
 static const char *
 guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message)
 {
-const char *received,*primary;
-char **other;
-char *by,*mta,*ptr,*token;
+const char *received,*primary,*by;
+char **other,*tohdr;
+char *mta,*ptr,*token;
 char *domain=NULL;
 char *tld=NULL;
 const char *delim=". \t";
 size_t i,other_len;

-received = notmuch_message_get_header (message, "received");
-by = strstr (received, " by ");
-if (by && *(by+4)) {
-   /* sadly, the format of Received: headers is a bit inconsistent,
-* depending on the MTA used. So we try to extract just the MTA
-* here by removing leading whitespace and assuming that the MTA
-* name ends at the next whitespace
-* we test for *(by+4) to be non-'\0' to make sure there's something
-* there at all - and then assume that the first whitespace delimited
-* token that follows is the last receiving server
+const char *to_headers[] = {"Envelope-to", "X-Original-To"};
+
+primary = notmuch_config_get_user_primary_email (config);
+other = notmuch_config_get_user_other_email (config, &other_len);
+
+/* sadly, there is no standard way to find out to which email
+ * address a mail was delivered - what is in the headers depends
+ * on the MTAs used along the way. So we are trying a number of
+ * heuristics which hopefully will answer this question.
+
+ * We only got here if none of the user's email addresses are in
+ * the To: or Cc: header. From here we try the following in order:
+ * 1) check for an Envelope-to: header
+ * 2) check for an X-Original-To: header
+ * 3) check for a (for <email@add.res>) clause in Received: headers
+ * 4) check for the domain part of known email addresses in the 
+ *'by' part of Received headers
+ * If none of these work, we give up and return NULL
+ */
+for (i = 0; i < sizeof(to_headers)/sizeof(*to_headers); i++) {
+   tohdr = xstrdup(notmuch_message_get_header (message, to_headers[i]));
+   if (tohdr && *tohdr) {
+   /* tohdr is potentially a list of email addresses, so here we
+* check if one of the email addresses is a substring of tohdr
+*/
+   if (strcasestr(tohdr, primary)) {
+   free(tohdr);
+   return primary;
+   }
+   for (i = 0; i < other_len; i++)
+   if (strcasestr (tohdr, other[i])) {
+   free(tohdr);
+   return other[i];
+   }
+   free(tohdr);
+   }
+}
+  
+/* We get the concatenated Received: headers and search from the
+ * front (last Received: header added) and try to extract from
+ * them indications to which email address this message was
+ * delivered.
+ */
+received = notmuch_message_get_concat_header (message, "received");
+/* First we look for a " for <email@add.res>" in the received
+ * header
+ */
+ptr = strstr (received, " for ");
+if (ptr) {
+   /* the text following is potentially a list of email addresses,
+* so again we check if one of the email addresses is a
+* substring of ptr
 */
-   mta = strdup (by+4);
-   if (mta == NULL)
-   return NULL;
+   if (strcasestr(ptr, primary)) {
+   return primary;
+   }
+   for (i = 0; i < other_len; i++)
+   if (strcasestr (ptr, other[i])) {
+   return other[i];
+   }
+}
+/* Finally, we parse all the " by MTA ..." headers to guess the
+ * email address that this was originally delivered to.
+ * We extract just the MTA here by removing leading whitespace and
+ * assuming that the MTA name ends at the next whitespace.
+ * We test for *(by+4) to be non-'\0' to make sure there's
+ * something there at all - and then assume that the first
+ * whitespace delimited token that follows is the receiving
+ * system in this step of the receive chain
+ */
+by = received;
+while((by = strstr (by, " by ")) != NULL) {
+   by += 4;
+   if (*by == '\0')
+   break;
+   mta = xstrdup (by);
token = strtok(mta," \t");
if (token == NULL)
-   return NULL;
+   break;
/* 

[PATCH 1/2] Add interface to obtain the concatenation of all instances of a specified header

2010-04-16 Thread Dirk Hohndel
notmuch_message_get_header only returns the first instance of the specified
header in a message.
notmuch_message_get_concat_header concatenates the values from ALL instances
of that header in a message. This is useful for example to get the full
delivery path as captured in all of the Received: headers.
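
The idea of concatenating every instance of a header can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the code from this patch: header_t and concat_header are hypothetical names, the headers are assumed to be already parsed into name/value pairs, and a single space is assumed as the separator between instances.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical parsed header: one name/value pair per header line. */
typedef struct {
    const char *name;
    const char *value;
} header_t;

/* Return a newly allocated string holding the values of every header
 * called name, in message order, separated by single spaces. Returns
 * NULL if no header matches or allocation fails; the caller frees. */
static char *
concat_header (const header_t *headers, size_t n_headers, const char *name)
{
    size_t i, total = 0, used = 0;
    char *result;

    assert (name != NULL);

    /* First pass: size the buffer (value + separator or terminator). */
    for (i = 0; i < n_headers; i++)
        if (strcasecmp (headers[i].name, name) == 0)
            total += strlen (headers[i].value) + 1;
    if (total == 0)
        return NULL;

    result = malloc (total);
    if (result == NULL)
        return NULL;

    /* Second pass: copy the matching values in order. */
    for (i = 0; i < n_headers; i++) {
        size_t len;

        if (strcasecmp (headers[i].name, name) != 0)
            continue;
        if (used > 0)
            result[used++] = ' ';
        len = strlen (headers[i].value);
        memcpy (result + used, headers[i].value, len);
        used += len;
    }
    result[used] = '\0';
    return result;
}
```

Header names are compared case-insensitively, so "Received" and "received" are treated as the same header.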

Signed-off-by: Dirk Hohndel 
---
 lib/database.cc   |   14 +++---
 lib/message-file.c|   49 +++--
 lib/message.cc|   12 +++-
 lib/notmuch-private.h |2 +-
 lib/notmuch.h |   16 
 5 files changed, 70 insertions(+), 23 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 6842faf..d706263 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1289,11 +1289,11 @@ _notmuch_database_link_message_to_parents (notmuch_database_t *notmuch,
 parents = g_hash_table_new_full (g_str_hash, g_str_equal,
 _my_talloc_free_for_g_hash, NULL);

-refs = notmuch_message_file_get_header (message_file, "references");
+refs = notmuch_message_file_get_header (message_file, "references", 0);
 parse_references (message, notmuch_message_get_message_id (message),
  parents, refs);

-in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to");
+in_reply_to = notmuch_message_file_get_header (message_file, "in-reply-to", 0);
 parse_references (message, notmuch_message_get_message_id (message),
  parents, in_reply_to);

@@ -1506,9 +1506,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
 * let's make sure that what we're looking at looks like an
 * actual email message.
 */
-   from = notmuch_message_file_get_header (message_file, "from");
-   subject = notmuch_message_file_get_header (message_file, "subject");
-   to = notmuch_message_file_get_header (message_file, "to");
+   from = notmuch_message_file_get_header (message_file, "from", 0);
+   subject = notmuch_message_file_get_header (message_file, "subject", 0);
+   to = notmuch_message_file_get_header (message_file, "to", 0);

if ((from == NULL || *from == '\0') &&
(subject == NULL || *subject == '\0') &&
@@ -1521,7 +1521,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
/* Now that we're sure it's mail, the first order of business
 * is to find a message ID (or else create one ourselves). */

-   header = notmuch_message_file_get_header (message_file, "message-id");
+   header = notmuch_message_file_get_header (message_file, "message-id", 0);
if (header && *header != '\0') {
message_id = _parse_message_id (message_file, header, NULL);

@@ -1580,7 +1580,7 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (ret)
goto DONE;

-   date = notmuch_message_file_get_header (message_file, "date");
+   date = notmuch_message_file_get_header (message_file, "date", 0);
_notmuch_message_set_date (message, date);

_notmuch_message_index_file (message, filename);
diff --git a/lib/message-file.c b/lib/message-file.c
index 0c152a3..a01adbb 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -209,15 +209,21 @@ copy_header_unfolding (header_value_closure_t *value,

 /* As a special-case, a value of NULL for header_desired will force
  * the entire header to be parsed if it is not parsed already. This is
- * used by the _notmuch_message_file_get_headers_end function. */
+ * used by the _notmuch_message_file_get_headers_end function. 
+ * If concat is 'true' then it parses the whole message and
+ * concatenates all instances of the header in question. This is
+ * currently used to get a complete Received: header when analyzing
+ * the path the mail has taken from sender to recipient.
+ */
 const char *
 notmuch_message_file_get_header (notmuch_message_file_t *message,
-const char *header_desired)
+const char *header_desired,
+int concat)
 {
 int contains;
-char *header, *decoded_value;
+char *header, *decoded_value, *header_sofar, *combined_header;
 const char *s, *colon;
-int match;
+int match, newhdr, hdrsofar;
 static int initialized = 0;

 if (! initialized) {
@@ -227,7 +233,7 @@ notmuch_message_file_get_header (notmuch_message_file_t *message,

 message->parsing_started = 1;

-if (header_desired == NULL)
+if (concat || header_desired == NULL) 
contains = 0;
 else
contains = g_hash_table_lookup_extended (message->headers,
@@ -237,6 +243,9 @@ notmuch_message_file_get_header (notmuch_message_file_t *message,
 if (contains && decoded_value)
return decoded_value;

+if (concat)
+   message->parsing_finished = 0;
+
 if (message->parsing_finished)
return 

improve from-header guessing

2010-04-16 Thread Dirk Hohndel
The following two patches should address most of the concerns raised
about my previous series.

The first patch simply adds an interface to obtain a concatenation of
all instances of a specific header from an email.
The second patch uses that in order to get the full Received: headers.
It now looks at Envelope-to: and X-Original-To: headers, then at the
concatenated Received headers for either a "for email at add.res" clause
that matches a configured address or for a " by " clause that matches
the domain of a configured address.

What is still missing is the check if the host from which the mail was
received in this last case had a routable IP address.
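
The lookup order described above can be sketched in plain C as follows. This is a minimal illustration, not the code from the patch: find_nocase, match_address and guess_from are hypothetical names, the delivery headers and the concatenated Received: value are passed in as strings instead of being read through the notmuch API, and the last step is simplified to a suffix match of the " by " host against the domain part of each configured address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Case-insensitive substring search (portable stand-in for strcasestr). */
static const char *
find_nocase (const char *haystack, const char *needle)
{
    size_t n = strlen (needle);

    for (; *haystack; haystack++)
        if (strncasecmp (haystack, needle, n) == 0)
            return haystack;
    return NULL;
}

/* Return the first configured address that occurs in text, or NULL. */
static const char *
match_address (const char *text, const char *const *addrs, size_t n_addrs)
{
    size_t j;

    assert (addrs != NULL);
    if (text == NULL)
        return NULL;
    for (j = 0; j < n_addrs; j++)
        if (find_nocase (text, addrs[j]))
            return addrs[j];
    return NULL;
}

/* delivery_headers holds the values of Envelope-to: and X-Original-To:
 * (NULL when absent); received is the concatenation of all Received:
 * headers, or NULL. */
static const char *
guess_from (const char *const *delivery_headers, size_t n_headers,
            const char *received,
            const char *const *addrs, size_t n_addrs)
{
    const char *hit, *clause, *by;
    size_t i, j;

    /* Explicit delivery headers come first. */
    for (i = 0; i < n_headers; i++) {
        hit = match_address (delivery_headers[i], addrs, n_addrs);
        if (hit)
            return hit;
    }
    if (received == NULL)
        return NULL;

    /* Next, a " for <address>" clause in the Received: text. */
    clause = strstr (received, " for ");
    if (clause) {
        hit = match_address (clause, addrs, n_addrs);
        if (hit)
            return hit;
    }

    /* Finally, a " by host" clause whose host lies in an address's domain. */
    for (by = received; (by = strstr (by, " by ")) != NULL; ) {
        size_t host_len, dom_len;

        by += 4;
        host_len = strcspn (by, " \t\r\n");
        for (j = 0; j < n_addrs; j++) {
            const char *domain = strchr (addrs[j], '@');

            if (domain == NULL || domain[1] == '\0')
                continue;
            domain++;
            dom_len = strlen (domain);
            if (dom_len > host_len)
                continue;
            if (strncasecmp (by + host_len - dom_len, domain, dom_len) == 0 &&
                (dom_len == host_len || by[host_len - dom_len - 1] == '.'))
                return addrs[j];
        }
    }
    return NULL;
}
```

Note that the two loops use separate index variables (i and j), so scanning the address list never disturbs the position in the header list.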



Opening the merge window for 0.3

2010-04-16 Thread Carl Worth
I'm officially opening the merge window for the upcoming 0.3 release
(about a week from now).

I know that I want to merge David Edmondson's rewrite of the emacs
interface, which is built on top of --format=json and adds a ton of
features (better attachment handling, notmuch-hello, etc.). I think
that's more than enough to justify a new release right there.

But if you have other things you'd like to see in this release, please
send a message to the list, (either as a new message or a reply to a
previous post where the feature/bug-fix was originally proposed). Please
don't send feature requests as replies to this message.

The best merge requests will be things that have existing, tested
patches already.

I know that some of the most desired features right now are folder:
searching and maildir-flag synchronization. Unfortunately, I think both
of these will be postponed until the 0.4 release, but I could be wrong.

-Carl

PS. I still never sent a list of the features which were proposed for
the 0.2 release but postponed. I'll assemble that list soon with my
comments on where each of the features stands.


[Announce] notmuch release 0.2 now available

2010-04-16 Thread Carl Worth
more:

--build --infodir --libexecdir --localstatedir
--disable-maintainer-mode --disable-dependency-tracking

Install emacs client in "make install" rather than requiring a
separate "make install-emacs".

Automatically compute version numbers between releases.

  This support uses the git-describe notation, so a version such as
  0.1-144-g43cbbfc indicates a version that is 144 commits since the
  0.1 release and is available as git commit "43cbbfc".

Add a new "make test" target to run the test suite and actually verify
its results.
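
The git-describe notation mentioned above (0.1-144-g43cbbfc) can be taken apart with a few lines of C. This is a hedged sketch rather than anything notmuch ships: parse_describe is a hypothetical helper, and it assumes the base release name contains no hyphen.

```c
#include <assert.h>
#include <stdio.h>

/* Split a git-describe style version such as "0.1-144-g43cbbfc" into
 * the base release, the number of commits since it, and the abbreviated
 * commit id. Returns 1 on success, 0 if the string has another shape
 * (for example a plain release such as "0.1"). */
static int
parse_describe (const char *version, char base[32], int *commits,
                char commit[41])
{
    assert (version != NULL);
    /* base: up to the first '-'; commits: decimal; commit: hex after "-g" */
    return sscanf (version, "%31[^-]-%d-g%40[0-9a-f]",
                   base, commits, commit) == 3;
}
```

For "0.1-144-g43cbbfc" this yields the base "0.1", a commit count of 144 and the commit id "43cbbfc".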

What is notmuch
===============
Notmuch is a system for indexing, searching, reading, and tagging
large collections of email messages in maildir or mh format. It uses
the Xapian library to provide fast, full-text search with a convenient
search syntax.

For more about notmuch, see http://notmuchmail.org


[PATCH] First tests for JSON output and UTF-8 in mail body and subject

2010-04-16 Thread David Edmondson
On Wed, 14 Apr 2010 17:35:44 -0700, Carl Worth  wrote:
> [*] I say "should" because I don't believe we have any actual
> specification of the data coming out of the JSON output yet. One other
> thing that seems odd is the name of "date_unix" in the show output and
> "timestamp" in the search output for what is effectively the same
> field.

The show output is updated to use `timestamp' in
id:1271418469-19031-1-git-send-email-dme@dme.org (just sent).

dme.
-- 
David Edmondson, http://dme.org


[PATCH] json: Replace `date_unix' with `timestamp' in show output

2010-04-16 Thread David Edmondson
Search output was already using `timestamp' for a very similar field,
so follow that.
---
 notmuch-show.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/notmuch-show.c b/notmuch-show.c
index 76873a1..26449fa 100644
--- a/notmuch-show.c
+++ b/notmuch-show.c
@@ -145,7 +145,7 @@ format_message_json (const void *ctx, notmuch_message_t *message, unused (int in
 date = notmuch_message_get_date (message);
 relative_date = notmuch_time_relative_date (ctx, date);

-printf ("\"id\": %s, \"match\": %s, \"filename\": %s, \"date_unix\": %ld, \"date_relative\": \"%s\", \"tags\": [",
+printf ("\"id\": %s, \"match\": %s, \"filename\": %s, \"timestamp\": %ld, \"date_relative\": \"%s\", \"tags\": [",
json_quote_str (ctx_quote, notmuch_message_get_message_id (message)),
notmuch_message_get_flag (message, NOTMUCH_MESSAGE_FLAG_MATCH) ? "true" : "false",
json_quote_str (ctx_quote, notmuch_message_get_filename (message)),
-- 
1.7.0



notmuchsync --move (was: add a number of new feature ideas to TODO file)

2010-04-16 Thread Sebastian Spaeth
On 2010-04-16, Dirk Hohndel wrote:
> +Thirdparty apps
> +---
> +(not sure this is the best spot to collect requests like this)
> +
> +notmuchsync
> +
> +Add feature to move files in the maildir hierarchy
> +
> + notmuchsync --move "searchstring" "targetfolder"
> + Where searchstring is any valid notmuch search
> +

You can remove that bit from the patch, it is implemented now :-)

notmuchsync --move "querystring" "targetfolder"
(use with --dry-run and -d to preview changes)

Once folder: search is implemented, you can simply do, e.g.:

notmuchsync --move "not tag:inbox and folder:inbox" /home/spaetz/mail/archive/cur

and make your IMAP web clients (or iphones) happy.

This works right now already:

notmuchsync --move "not tag:inbox" /home/spaetz/mail/archive/cur

but it is of course slower (still OK), as it has to traverse most
of your mail.

Sebastian


"bouncing" messages

2010-04-16 Thread Jameson Rollins
On Fri, 16 Apr 2010 10:34:53 +0200, Peter Wiersig wrote:
> On Thu, 15 Apr 2010 17:27:17 -0400, Jameson Rollins wrote:
> > Does anyone know how to "bounce" a message,
> 
> pipe the message to "sendmail user at axample.com"
> 
> Well, ok, mutt adds "Resent-*" headers to the bounced message, so there
> it's not unaltered.

Great, thanks so much for the suggestion, Peter.  That's easy enough.

jamie.


"bouncing" messages

2010-04-16 Thread Peter Wiersig
On Thu, 15 Apr 2010 17:27:17 -0400, Jameson Rollins  wrote:
> Does anyone know how to "bounce" a message,

pipe the message to "sendmail user at axample.com"

Well, ok, mutt adds "Resent-*" headers to the bounced message, so there
it's not unaltered.

Peter


[PATCH] First tests for JSON output and UTF-8 in mail body and subject

2010-04-16 Thread Michal Sojka
> But you might actually like that change since it's one you requested in
> your first version of the modular test suite. I'm dropping the annoying
> execute_expecting macro that both runs notmuch and tests the
> output. There's now a much cleaner separation such as:
> 
>   output=$($NOTMUCH search for-something)
>   pass_if_equal "$output" "something was found"

It's definitely better than before. The current implementation of
pass_if_equal has IMHO one drawback - if it compares multiline text and
there is a difference, it is quite hard to see where.

In my tests for maildir synchronization I use this approach:

  notmuch search tag:inbox | filter_output > actual &&
  diff -u - actual <

[notmuch] Bulk message tagging

2010-04-16 Thread Servilio Afre Puentes
On 15 April 2010 21:46, Carl Worth  wrote:
[...]
> We'll probably need to arrange for notmuch to accept search
> specifications on stdin or so.

Or a daemon mode with a pipe or DBus interface.

Servilio


[PATCH] notmuch.c: Shorten version string

2010-04-16 Thread Sebastian Spaeth
We previously output "notmuch version 0.1" in response to notmuch --version.
Shorten this to "notmuch 0.1", as we know that we will receive a version
number when we explicitly ask for it.

Signed-off-by: Sebastian Spaeth 
---
 notmuch.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/notmuch.c b/notmuch.c
index dcfda32..0eea5e1 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -474,7 +474,7 @@ main (int argc, char *argv[])
return notmuch_help_command (NULL, 0, NULL);

 if (STRNCMP_LITERAL (argv[1], "--version") == 0) {
-   printf ("notmuch version " STRINGIFY(NOTMUCH_VERSION) "\n");
+   printf ("notmuch " STRINGIFY(NOTMUCH_VERSION) "\n");
return 0;
 }

-- 
1.7.0.4
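
The version string in this patch is assembled at compile time from adjacent string literals. The sketch below shows the mechanism, assuming STRINGIFY is the usual two-level stringification macro; the macro definitions, the fallback value of NOTMUCH_VERSION and the version_line helper are written out here for illustration and are not copied from the notmuch sources.

```c
#include <assert.h>

/* Two-level expansion: STRINGIFY_ turns its argument into a string
 * literal, and the outer macro makes sure NOTMUCH_VERSION is replaced
 * by its value before that happens. */
#define STRINGIFY_(s) #s
#define STRINGIFY(s) STRINGIFY_(s)

/* Assumed here for illustration; the build normally supplies it. */
#ifndef NOTMUCH_VERSION
#define NOTMUCH_VERSION 0.1
#endif

static const char *
version_line (void)
{
    /* Adjacent string literals are concatenated by the compiler. */
    static const char line[] = "notmuch " STRINGIFY(NOTMUCH_VERSION) "\n";

    /* A one-level macro would have produced "notmuch NOTMUCH_VERSION". */
    assert (line[8] != 'N');
    return line;
}
```

With NOTMUCH_VERSION defined as 0.1, version_line() returns "notmuch 0.1" followed by a newline.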



[PATCH 3/3] notmuch-tag: don't sort messages before applying tag changes

2010-04-16 Thread Sebastian Spaeth
It's not necessary to sort the results before we apply tags. Xapian
contributor Olly Betts says that the savings might be bigger with a cold
file cache, and that unsorted results (which really means sorted by
document id) give better cache locality when applying tags to messages.

Signed-off-by: Sebastian Spaeth 
---
 notmuch-tag.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/notmuch-tag.c b/notmuch-tag.c
index 8b6f7dc..fd54bc7 100644
--- a/notmuch-tag.c
+++ b/notmuch-tag.c
@@ -107,6 +107,9 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
return 1;
 }

+/* tagging is not interested in any special sort order */
+notmuch_query_set_sort (query, NOTMUCH_SORT_UNSORTED);
+
 for (messages = notmuch_query_search_messages (query);
 notmuch_messages_valid (messages) && !interrupted;
 notmuch_messages_move_to_next (messages))
-- 
1.7.0.4



[PATCH 2/3] notmuch-search: Introduce --sort=unsorted

2010-04-16 Thread Sebastian Spaeth
In some cases, we might not be interested in any special sort order, so
this introduces a --sort=unsorted command line option together with its
documentation.

Signed-off-by: Sebastian Spaeth 
---
 notmuch-search.c |2 ++
 notmuch.1|   10 ++
 notmuch.c|7 ---
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/notmuch-search.c b/notmuch-search.c
index 4e3514b..854a9ae 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -217,6 +217,8 @@ notmuch_search_command (void *ctx, int argc, char *argv[])
sort = NOTMUCH_SORT_OLDEST_FIRST;
} else if (strcmp (opt, "newest-first") == 0) {
sort = NOTMUCH_SORT_NEWEST_FIRST;
+   } else if (strcmp (opt, "unsorted") == 0) {
+   sort = NOTMUCH_SORT_UNSORTED;
} else {
fprintf (stderr, "Invalid value for --sort: %s\n", opt);
return 1;
diff --git a/notmuch.1 b/notmuch.1
index 86830f4..6d4beaf 100644
--- a/notmuch.1
+++ b/notmuch.1
@@ -152,12 +152,14 @@ Presents the results in either JSON or plain-text (default).
 .RE
 .RS 4
 .TP 4
-.BR \-\-sort= ( newest\-first | oldest\-first )
+.BR \-\-sort= ( newest\-first | oldest\-first | unsorted)

 This option can be used to present results in either chronological order
-.RB ( oldest\-first )
-or reverse chronological order
-.RB ( newest\-first ).
+.RB ( oldest\-first ),
+reverse chronological order
+.RB ( newest\-first )
+or without any defined sort order
+.RB ( unsorted ).

 Note: The thread order will be distinct between these two options
 (beyond being simply reversed). When sorting by
diff --git a/notmuch.c b/notmuch.c
index dcfda32..e31dd88 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -165,11 +165,12 @@ command_t commands[] = {
   "\t\tPresents the results in either JSON or\n"
   "\t\tplain-text (default)\n"
   "\n"
-  "\t--sort=(newest-first|oldest-first)\n"
+  "\t--sort=(newest-first|oldest-first|unsorted)\n"
   "\n"
   "\t\tPresent results in either chronological order\n"
-  "\t\t(oldest-first) or reverse chronological order\n"
-  "\t\t(newest-first), which is the default.\n"
+  "\t\t(oldest-first),reverse chronological order\n"
+  "\t\t(newest-first), which is the default or\n"
+  "\t\t(unsorted) without any special sort order.\n"
   "\n"
   "\tSee \"notmuch help search-terms\" for details of the search\n"
   "\tterms syntax." },
-- 
1.7.0.4



[PATCH 1/3] query.cc: allow to return query results unsorted

2010-04-16 Thread Sebastian Spaeth
Previously, we always sorted the returned results by some string value
(newest-to-oldest by default). However, in some cases (such as when
applying tags to a search result) we are not interested in any special
order.

This introduces a NOTMUCH_SORT_UNSORTED value that does just that. It is
not yet used anywhere in the code.

Signed-off-by: Sebastian Spaeth 
---
 lib/notmuch.h |3 ++-
 lib/query.cc  |2 ++
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index a7e66dd..bae48a6 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -346,7 +346,8 @@ notmuch_query_create (notmuch_database_t *database,
 typedef enum {
 NOTMUCH_SORT_OLDEST_FIRST,
 NOTMUCH_SORT_NEWEST_FIRST,
-NOTMUCH_SORT_MESSAGE_ID
+NOTMUCH_SORT_MESSAGE_ID,
+NOTMUCH_SORT_UNSORTED
 } notmuch_sort_t;

 /* Specify the sorting desired for this query. */
diff --git a/lib/query.cc b/lib/query.cc
index 10f8dc8..4148f9b 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -148,6 +148,8 @@ notmuch_query_search_messages (notmuch_query_t *query)
case NOTMUCH_SORT_MESSAGE_ID:
enquire.set_sort_by_value (NOTMUCH_VALUE_MESSAGE_ID, FALSE);
break;
+case NOTMUCH_SORT_UNSORTED:
+   break;
}

 #if DEBUG_QUERY
-- 
1.7.0.4



[PATCH] allow to not sort the search results

2010-04-16 Thread Sebastian Spaeth
On 2010-04-15, Olly Betts wrote:

> > I would be happy to have it called --sort=relevance too, the unsorted
> > points out potential performance improvements a bit better, IMHO
> > (although they seem to be really small with a warm cache).
> 
> When using the results of a search to add/remove tags, there's likely to be
> an additional win from --sort=unsorted as documents will now be processed
> in docid order which will tend to have a more cache friendly locality of
> access.

Olly was right in that even for "notmuch tag" we were sorting the
results by date before applying tag changes. I have slightly reworked my
patch to have notmuch tag avoid doing that. I also split the patch into
three patches that each do one thing.

The patches do:
1: Introduce NOTMUCH_SORT_UNSORTED
2: Introduce notmuch search --sort=unsorted
3: Make notmuch tag not sort results by date

#2 is the one I am least sure about; I don't know if there is a use case
for notmuch search returning unsorted results. But 1 & 3 are useful at
least.

> Also, sorting by relevance requires more calculations and may require fetching
> additional data (document length for example).
> 
> So I think it would make sense for --sort=relevance and --sort=unsorted to be
> separate options.

Now I am a bit confused. The API docs state that sort_by_relevance is
the default. So if we skip every sort_by_value() call, will that incur the
additional calculations (even with our BoolWeight set)? All I want is the
fastest way to return a searched set of docs :-).

Patches 1-3 follow as reply to this one
Sebastian


[PATCH] allow to not sort the search results

2010-04-16 Thread Olly Betts
On Fri, Apr 16, 2010 at 08:37:04AM +0200, Sebastian Spaeth wrote:
> On 2010-04-15, Olly Betts wrote:
> > Also, sorting by relevance requires more calculations and may require
> > fetching additional data (document length for example).
> > 
> > So I think it would make sense for --sort=relevance and --sort=unsorted to
> > be separate options.
> 
> Now I am a bit confused. The API docs state that sort_by_relevance is
> the default. So by skipping any sort_by_value() will that incur the additional
> calculations (with our BoolWeight set?). All I want is the fastest way
> to return a searched set of docs :-).

Yes, sort_by_relevance() is the default.  But if you set BoolWeight as the
weighting scheme then the relevance is simply zero, and Xapian doesn't have
to fetch any statistics and calculate a score from them.  When documents
have exactly equal relevance weight, then the docid order is used.  So
although sort_by_relevance() is technically still on with BoolWeight, by
"sorting by relevance" I wasn't talking about this case.

So --sort=unsorted and --sort=relevance would only differ in code by the former
setting BoolWeight and the latter not.

Cheers,
Olly
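
The tie-breaking behaviour described in this message can be modelled with a small comparator. This is a toy model in plain C, not Xapian code: match_t, compare_matches and sort_matches are hypothetical names, and the boolean_weights flag merely stands in for selecting a weighting scheme that gives every document a weight of zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of a match: a document id plus its relevance weight. */
typedef struct {
    unsigned docid;
    double weight;
} match_t;

/* Order by descending weight; equal weights fall back to ascending docid. */
static int
compare_matches (const void *a, const void *b)
{
    const match_t *x = a, *y = b;

    if (x->weight != y->weight)
        return x->weight > y->weight ? -1 : 1;
    if (x->docid != y->docid)
        return x->docid < y->docid ? -1 : 1;
    return 0;
}

/* With boolean weights every weight is zero, so sorting "by relevance"
 * degenerates to plain docid order. */
static void
sort_matches (match_t *matches, size_t n, int boolean_weights)
{
    size_t i;

    assert (matches != NULL);
    if (boolean_weights)
        for (i = 0; i < n; i++)
            matches[i].weight = 0.0;
    qsort (matches, n, sizeof (*matches), compare_matches);
}
```

In this model the only difference between the two modes is whether the weights are zeroed before sorting, which mirrors the point that the two options would differ only in the weighting scheme.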


[notmuch] Bulk message tagging

2010-04-16 Thread Jesse Rosenthal

On Thu, 15 Apr 2010 18:46:56 -0700, Carl Worth wrote:
> On Thu, 15 Apr 2010 16:04:38 -0400, Jesse Rosenthal wrote:
> > the region command only executes one "notmuch tag" command over
> > "id:X or id:Y or id:Z or ...".
>
> ...this operation is all set up to run into "argument list too long"
> errors.

I've never run into this error. Is there a specific length that triggers
it? If so, we could chunk the tagging command. Or does the max length
depend on the machine and system?
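
For reference, the limit is system dependent: POSIX exposes it as ARG_MAX, which can be queried at run time with sysconf(_SC_ARG_MAX) and covers the combined size of the arguments and the environment. A rough sketch of how a caller might chunk an "id:X or id:Y or ..." query against a byte budget follows. The helper names arg_max and ids_per_chunk are hypothetical, and choosing a budget safely below the reported limit is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Upper bound in bytes for arguments plus environment passed to exec,
 * or -1 if the system reports no fixed limit. */
static long
arg_max (void)
{
    return sysconf (_SC_ARG_MAX);
}

/* Count how many ids, starting at ids[first], fit into one
 * "id:A or id:B or ..." query of at most budget bytes (excluding the
 * terminating NUL). At least one id is always taken so that callers
 * make progress even with a tiny budget. */
static size_t
ids_per_chunk (const char *const *ids, size_t n_ids, size_t first,
               size_t budget)
{
    size_t used = 0, count = 0, i;

    assert (ids != NULL && first <= n_ids);
    for (i = first; i < n_ids; i++) {
        /* "id:" + id, plus " or " before every term but the first. */
        size_t need = 3 + strlen (ids[i]) + (count ? 4 : 0);

        if (count > 0 && used + need > budget)
            break;
        used += need;
        count++;
    }
    return count;
}
```

A caller would loop, advancing first by the returned count and running one tag command per chunk.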



[PATCH] notmuch.c: Shorten version string

2010-04-16 Thread Carl Worth
On Fri, 16 Apr 2010 09:06:02 +0200, Sebastian Spaeth wrote:
> We previously output "notmuch version 0.1" as response to notmuch --version.
> Shorten this to "notmuch 0.1" as we know that we will receive a version
> number when we explicitely ask for it.

Thanks for the reminder. Pushed.

-Carl


Re: [PATCH] allow to not sort the search results

2010-04-16 Thread Olly Betts
On Fri, Apr 16, 2010 at 08:37:04AM +0200, Sebastian Spaeth wrote:
> On 2010-04-15, Olly Betts wrote:
> > Also, sorting by relevance requires more calculations and may require
> > fetching additional data (document length for example).
> >
> > So I think it would make sense for --sort=relevance and --sort=unsorted to
> > be separate options.
>
> Now I am a bit confused. The API docs state that sort_by_relevance is
> the default. So by skipping any sort_by_value() will that incur the additional
> calculations (with our BoolWeight set?). All I want is the fastest way
> to return a searched set of docs :-).

Yes, sort_by_relevance() is the default.  But if you set BoolWeight as the
weighting scheme then the relevance is simply zero, and Xapian doesn't have
to fetch any statistics and calculate a score from them.  When documents
have exactly equal relevance weight, the docid order is used.  So
although sort_by_relevance() is technically still in effect with BoolWeight,
that is not the case I meant when I talked about sorting by relevance.

So --sort=unsorted and --sort=relevance would only differ in code by the former
setting BoolWeight and the latter not.

Cheers,
Olly


[PATCH] notmuch.c: Shorten version string

2010-04-16 Thread Sebastian Spaeth
We previously output "notmuch version 0.1" in response to "notmuch --version".
Shorten this to "notmuch 0.1", since we know that we will receive a version
number when we explicitly ask for it.

Signed-off-by: Sebastian Spaeth sebast...@sspaeth.de
---
 notmuch.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/notmuch.c b/notmuch.c
index dcfda32..0eea5e1 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -474,7 +474,7 @@ main (int argc, char *argv[])
return notmuch_help_command (NULL, 0, NULL);
 
 if (STRNCMP_LITERAL (argv[1], "--version") == 0) {
-   printf ("notmuch version " STRINGIFY(NOTMUCH_VERSION) "\n");
+   printf ("notmuch " STRINGIFY(NOTMUCH_VERSION) "\n");
return 0;
 }
 
-- 
1.7.0.4
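
The shortened string relies on the compiler merging adjacent string
literals after STRINGIFY has expanded the version macro. A minimal,
self-contained sketch of that mechanism follows. The two-level STRINGIFY
macro and the version value 0.1 are illustrative definitions made up for
this example, not copied from the notmuch headers, and version_line() is
a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Illustrative definitions (assumptions); the real ones live in the
 * notmuch sources and build system. The extra level of indirection
 * makes sure NOTMUCH_VERSION is expanded before it is stringified. */
#define STRINGIFY_(s) #s
#define STRINGIFY(s) STRINGIFY_ (s)
#define NOTMUCH_VERSION 0.1

/* Hypothetical helper: adjacent string literals are merged at compile
 * time, so the expression below is a single constant string. */
const char *
version_line (void)
{
    const char *line = "notmuch " STRINGIFY (NOTMUCH_VERSION) "\n";

    /* Sanity check: the prefix is the shortened form. */
    assert (strncmp (line, "notmuch ", 8) == 0);
    return line;
}
```

Passing the result to fputs() or printf ("%s", ...) would print the same
line that the patched printf call produces.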



Re: [PATCH] First tests for JSON output and UTF-8 in mail body and subject

2010-04-16 Thread Michal Sojka
> But you might actually like that change since it's one you requested in
> your first version of the modular test suite. I'm dropping the annoying
> execute_expecting macro that both runs notmuch and tests the
> output. There's now a much cleaner separation such as:
>
>   output=$($NOTMUCH search for-something)
>   pass_if_equal "$output" "something was found"

It's definitely better than before. The current implementation of
pass_if_equal has IMHO one drawback - if it compares multiline text and
there is a difference, it is quite hard to see where.

In my tests for maildir synchronization I use this approach:

  notmuch search tag:inbox | filter_output > actual &&
  diff -u - actual <<EOF
  thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; test message 3 (inbox)
  EOF

Thanks to the use of diff, I immediately see only the differences.

-Michal


notmuchsync --move (was: add a number of new feature ideas to TODO file)

2010-04-16 Thread Sebastian Spaeth
On 2010-04-16, Dirk Hohndel wrote:
> +Thirdparty apps
> +---
> +(not sure this is the best spot to collect requests like this)
> +
> +notmuchsync
> +
> +Add feature to move files in the maildir hierarchy
> +
> + notmuchsync --move searchstring targetfolder
> + Where searchstring is any valid notmuch search
> +

You can remove that bit from the patch, it is implemented now :-)

notmuchsync --move querystring targetfolder
(use with --dry-run and -d to preview changes)

Once folder: search is implemented, you can e.g. simply do:

notmuchsync --move "not tag:inbox and folder:inbox" /home/spaetz/mail/archive/cur

and make your IMAP web clients (or iphones) happy.

This works right now already:

notmuchsync --move "not tag:inbox" /home/spaetz/mail/archive/cur

but it is of course slower (still OK), as it has to traverse most
of your mails.

Sebastian


Re: [notmuch] Bulk message tagging

2010-04-16 Thread Jesse Rosenthal

On Thu, 15 Apr 2010 18:46:56 -0700, Carl Worth cwo...@cworth.org wrote:
> On Thu, 15 Apr 2010 16:04:38 -0400, Jesse Rosenthal jrosent...@jhu.edu wrote:
> > the region command only executes one notmuch tag command over
> > id:X or id:Y or id:Z or 
>
> ...this operation is all set up to run into "argument list too long"
> errors.

I've never run into this error. Is there a specific length that triggers
it? If so, we could chunk the tagging command. Or does the max length
depend on the machine and system?
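
For reference, the limit does depend on the system: on POSIX systems it
can be queried at run time with sysconf(_SC_ARG_MAX), and it covers the
arguments plus the environment passed to exec. Below is a minimal sketch
of the chunking idea in C. The helpers build_id_query() and
query_budget(), and the choice of using a quarter of ARG_MAX per command,
are assumptions made for this example; none of this is taken from notmuch
or its Emacs front end.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of notmuch: build one query of the form
 * "id:A or id:B or ..." from ids[start..count) that fits into buf,
 * including the terminating NUL. Returns the index of the first id that
 * did NOT fit, so the caller can continue from there. A return value
 * equal to start means that a single id was too long for buf. */
size_t
build_id_query (const char *const *ids, size_t start, size_t count,
                char *buf, size_t bufsize)
{
    size_t used = 0;
    size_t i;

    assert (bufsize > 0);
    buf[0] = '\0';

    for (i = start; i < count; i++) {
        const char *sep = (i == start) ? "" : " or ";
        size_t need = strlen (sep) + strlen ("id:") + strlen (ids[i]);

        /* Stop before overflowing; "+ 1" accounts for the NUL. */
        if (used + need + 1 > bufsize)
            break;

        snprintf (buf + used, bufsize - used, "%sid:%s", sep, ids[i]);
        used += need;
    }

    return i;
}

/* Conservative per-command budget (an assumption of this sketch): a
 * quarter of ARG_MAX, leaving room for the environment and the other
 * arguments. Falls back to 4096, the minimum ARG_MAX that POSIX
 * guarantees, if the limit cannot be determined. */
long
query_budget (void)
{
    long arg_max = sysconf (_SC_ARG_MAX);

    if (arg_max <= 0)
        arg_max = 4096;

    return arg_max / 4;
}
```

A caller would loop, passing the returned index back in as start until it
reaches count, and run one notmuch tag command per chunk.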



Re: bouncing messages

2010-04-16 Thread Peter Wiersig
On Thu, 15 Apr 2010 17:27:17 -0400, Jameson Rollins jroll...@finestructure.net wrote:
> Does anyone know how to bounce a message,

pipe the message to sendmail u...@axample.com

Well, ok, mutt adds Resent-* headers to the bounced message, so there
it's not unaltered.

Peter


Re: bouncing messages

2010-04-16 Thread Jameson Rollins
On Fri, 16 Apr 2010 10:34:53 +0200, Peter Wiersig fri...@london087.server4you.de wrote:
> On Thu, 15 Apr 2010 17:27:17 -0400, Jameson Rollins jroll...@finestructure.net wrote:
> > Does anyone know how to bounce a message,
>
> pipe the message to sendmail u...@axample.com
>
> Well, ok, mutt adds Resent-* headers to the bounced message, so there
> it's not unaltered.

Great, thanks so much for the suggestion, Peter.  That's easy enough.

jamie.




improve from-header guessing

2010-04-16 Thread Dirk Hohndel
The following two patches should address most of the concerns raised
about my previous series.

The first patch simply adds an interface to obtain a concatenation of
all instances of a specific header from an email.
The second patch uses that in order to get the full Received: headers.
It now looks at the Envelope-to: and X-Original-To: headers, then at the
concatenated Received: headers for either a "for em...@add.res" clause
that matches a configured address or a " by " clause that matches
the domain of a configured address.
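
To illustrate the "for" clause check only, here is a small sketch in C.
It is not the code from the patch: match_for_clause() is a hypothetical
helper, the addresses used with it are placeholders, and it assumes the
Received: headers have already been concatenated and unfolded so that the
clause appears as " for " followed by an address, with optional angle
brackets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper (not the patch's implementation): scan the
 * concatenated Received: headers for " for <address>" and return the
 * first configured address that matches, or NULL if none does.
 * The comparison ignores case. */
const char *
match_for_clause (const char *received,
                  const char *const *addresses, size_t n_addresses)
{
    const char *p = received;

    assert (received != NULL);

    while ((p = strstr (p, " for ")) != NULL) {
        size_t i;

        p += strlen (" for ");
        if (*p == '<')
            p++;

        for (i = 0; i < n_addresses; i++) {
            size_t len = strlen (addresses[i]);

            /* The address must match completely: the character after
             * it has to end the token (or the string), otherwise
             * "me@example.com" would also match "me@example.com.au". */
            if (strncasecmp (p, addresses[i], len) == 0 &&
                strchr ("> ;\t\r\n", p[len]) != NULL)
                return addresses[i];
        }
    }

    return NULL;
}
```

The returned pointer is one of the entries of the configured list, so a
caller can use it directly as the From address of the reply and fall back
to the " by " domain check when the result is NULL.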

What is still missing is a check whether the host from which the mail was
received in this last case had a routable IP address.
