Strange match to my query

2011-01-25 Thread Austin Clements
Well-constructed test message.  Xapian's query parser is actually doing the
right thing [1] and this is a bug in the way notmuch indexes address list
headers.  For each address, _notmuch_message_gen_terms resets the term
generator's term position, so your To header indexes with positions as
  c:1 hello:2 com:3 K:1 R:2 world:3 com:4
Thus, the phrase query "hello world" matches hello in position 2 and world
in position 3.  Probably the right thing for notmuch to do is to jump up the
term generator position between each address so phrase queries don't cross
them or span them.

[1] Your to:\'$WORD1@$WORD2\' query didn't work because Xapian doesn't
accept a single quote after a prefix.
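
To make the position problem concrete, here is a small simulation. It is only a sketch and not notmuch or Xapian code: the tokenizer, the function names and the gap of 100 are assumptions chosen for illustration. With gap=0 the counter restarts for every address, which reproduces the positions listed above; with a positive gap the counter keeps running and skips ahead between addresses.

```python
import re

def index_addresses(addresses, gap=0):
    """Return (term, position) pairs for a list of addresses.

    gap=0 restarts the position counter at 1 for every address (the
    behaviour described above). A positive gap keeps counting and skips
    `gap` positions before each address, so terms from different
    addresses are never adjacent.
    """
    postings = []
    position = 0
    for address in addresses:
        if gap == 0:
            position = 0
        else:
            position += gap
        for term in re.findall(r"[A-Za-z0-9]+", address):
            position += 1
            postings.append((term, position))
    return postings

def phrase_matches(postings, words):
    """True if the words occur at consecutive positions."""
    seen = set(postings)
    starts = [pos for term, pos in postings if term == words[0]]
    return any(
        all((word, start + i) in seen for i, word in enumerate(words))
        for start in starts
    )

addresses = ["c@hello.com", "K-R@world.com"]
buggy = index_addresses(addresses)
fixed = index_addresses(addresses, gap=100)
print(buggy)
print(phrase_matches(buggy, ["hello", "world"]))
print(phrase_matches(fixed, ["hello", "world"]))
```

In the gap=0 case "hello" (position 2) and "world" (position 3) look adjacent even though they come from different addresses, which is why the phrase query matches.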

On Tue, Jan 25, 2011 at 6:29 PM, Mark Anderson wrote:

> Hi guys, What's up? ("Notmuch")
>
> Apparently matching on email addresses doesn't work the way I hoped.
>
> While debugging why my to:x at y.com search was matching far too many
> entries, I whittled it down to this:
>
> WORD1=hello
> WORD2=goodbye
> MSGID=junk$(date +%s)
> TESTDIR=$(notmuch config get database.path)/.tmp/new
> TESTMAIL=$TESTDIR/$MSGID:2,
>
> mkdir -p $TESTDIR
>
> echo Testcase for $WORD1@$WORD2, msgid: $MSGID at junk.com
>
> echo "From: nobody at nobody.com
> To: c@${WORD1}.com, K-R@${WORD2}.com
> Date: Mon, 24 Jan 2011 23:41:34 -0600
> Subject: Error
> Message-ID: <$MSGID at junk.com>
>
> Not empty body.=
>
> " > $TESTMAIL
>
> notmuch new
> notmuch search --output=files to:$WORD1@$WORD2
> notmuch search --output=files to:\"$WORD1@$WORD2\"
>
> Why does that match, but this doesn't?
>
> notmuch search --output=files to:\'$WORD1@$WORD2\'
>
> Apparently single quotes are the only quote for Xapian's parser?
>
> I guess this is a strong vote for the quick integration of the custom
> parser with optimization passes that turn emails into phrases that can't
> match across multiple emails.
>
> This was just an egregious example of notmuch giving me notmuch of what
> I wanted, or actually, far too much of what I didn't want.
>
> Thanks,
> -Mark
>
> ___
> notmuch mailing list
> notmuch at notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch
>


[PATCH 1/3] new: Do not defer maildir flag synchronization during the first run

2011-01-25 Thread Austin Clements
 * flags immediately, while the message is hot in
> +* disk cache. */
> +   notmuch_message_maildir_flags_to_tags (message);
> +   }
> +   }
>break;
>case NOTMUCH_STATUS_FILE_NOT_EMAIL:
>fprintf (stderr, "Note: Ignoring non-mail file: %s\n",
> --
> 1.7.2.3
>


[PATCH] Add --include-duplicates option to a couple of commands.

2011-01-25 Thread Carl Worth
This adds new functionality under the names of:

notmuch search --output=files --include-duplicates
notmuch show --include-duplicates
notmuch show --format=json --include-duplicates

These new commands behave similarly to the existing commands without
the --include-duplicates argument. The difference is that with the new
argument any duplicate mail files will be included in the
output. Here, files are considered duplicates if they contain
identical contents for the Message-Id header, (regardless of any other
differences in the content of the file). Without the
--include-duplicates argument, these commands would emit a single,
arbitrary file in the face of duplicates.
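
As an illustration of the duplicate rule described above (same Message-Id header, regardless of other content), here is a rough sketch. It is not the notmuch implementation: the function names and the in-memory mapping of filename to message text are assumptions made for the example.

```python
from collections import defaultdict
from email import message_from_string

def group_by_message_id(files):
    """Group filenames by Message-Id; `files` maps filename -> raw text."""
    groups = defaultdict(list)
    for name, text in files.items():
        # Header lookup in the email package is case-insensitive.
        msg_id = message_from_string(text)["Message-Id"]
        groups[msg_id].append(name)
    return dict(groups)

def files_to_print(groups, include_duplicates=False):
    """Every file per message with the option, otherwise just one."""
    out = []
    for names in groups.values():
        out.extend(names if include_duplicates else names[:1])
    return out

files = {
    "cur/a:2,S": "Message-ID: <1@example.com>\nSubject: x\n\nbody one\n",
    "cur/b:2,": "Message-ID: <1@example.com>\nSubject: x\n\nbody two\n",
    "cur/c:2,": "Message-ID: <2@example.com>\nSubject: y\n\nother\n",
}
groups = group_by_message_id(files)
print(files_to_print(groups))
print(files_to_print(groups, include_duplicates=True))
```

The two files with differing bodies still count as one message because only the Message-Id is compared.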

WARNING: This commit is not yet ready to be pushed to the notmuch
repository. There are at least two problems with the commit so far:

1. Nothing has been documented yet.

   Fixing this shouldn't be too hard. It's mostly just taking
   the text from above and shoving it into the
   documentation. I can do this easily enough myself.

2. show --format=json --include-duplicates doesn't work yet

   This is a more serious problem. I believe the JSON output
   with this patch is not correct and will likely break a
   client trying to consume it. It inserts the duplicate
   message into an array next to the existing message. Our
   current JSON schema isn't documented formally that I could
   find, except for a comment in the emacs code that consumes
   it:

A thread is a forest or list of trees. A tree is a two
element list where the first element is a message, and
the second element is a possibly empty forest of
replies.

   I believe this commit breaks the "two-element list"
   expectation. What we would want instead is the duplicate
   message to appear as a peer next to the original message,
   (and then perhaps have replies appear only to one of the
   messages).
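
   The "two-element list" expectation from the quoted comment can be
   checked mechanically. The following is a sketch only; the function
   names and the tiny sample documents are made up for illustration and
   are not taken from the emacs code or the test suite.

```python
import json

def is_forest(value):
    """A forest is a (possibly empty) list of trees."""
    return isinstance(value, list) and all(is_tree(t) for t in value)

def is_tree(value):
    """A tree is [message, forest-of-replies], exactly two elements."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], dict)
        and is_forest(value[1])
    )

# One message with one reply: conforms to the description.
good = json.loads('[[{"id": "a"}, [[{"id": "b"}, []]]]]')
# Duplicate squeezed in next to the message: a three-element "tree".
broken = json.loads('[[{"id": "a"}, {"id": "a"}, []]]')
# Duplicate emitted as a peer tree instead: conforms again.
peer = json.loads('[[{"id": "a"}, []], [{"id": "a"}, []]]')

print(is_forest(good), is_forest(broken), is_forest(peer))
```

   The third sample shows the "peer" layout suggested above, which keeps
   every tree at two elements.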

My current need for --include-duplicates was recently satisfied, so I
won't likely pursue this further for now. But I wanted to put this
code out rather than losing it.

If someone wants to fix the patch to do the "right thing" with the
JSON output, then that would be great.

ALSO NOTE: I left the
json.expected-output/notmuch-show-thread-format-json-maildir-storage
out of this commit. It has lines in it that are too long to be sent
via git-send-email.
---
 notmuch-search.c   |   30 +-
 notmuch-show.c |   61 +--
 test/basic |2 +-
 test/json  |   33 ++-
 ...-show-thread-include-duplicates-maildir-storage |   94 
 .../notmuch-show-thread-maildir-storage|   47 
 test/search-output |  113 
 7 files changed, 361 insertions(+), 19 deletions(-)
 create mode 100644 
test/json.expected-output/notmuch-show-thread-include-duplicates-maildir-storage
 create mode 100644 
test/json.expected-output/notmuch-show-thread-maildir-storage

diff --git a/notmuch-search.c b/notmuch-search.c
index c628b36..6d032c2 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -247,7 +247,8 @@ static int
 do_search_messages (const void *ctx,
const search_format_t *format,
notmuch_query_t *query,
-   output_t output)
+   output_t output,
+   notmuch_bool_t include_duplicates)
 {
 notmuch_message_t *message;
 notmuch_messages_t *messages;
@@ -269,8 +270,25 @@ do_search_messages (const void *ctx,
fputs (format->item_sep, stdout);

if (output == OUTPUT_FILES) {
-   format->item_id (ctx, "",
-notmuch_message_get_filename (message));
+   if (include_duplicates) {
+   notmuch_filenames_t *filenames;
+   int first_filename = 1;
+
+   for (filenames = notmuch_message_get_filenames (message);
+notmuch_filenames_valid (filenames);
+notmuch_filenames_move_to_next (filenames))
+   {
+   if (! first_filename)
+   fputs (format->item_sep, stdout);
+   first_filename = 0;
+
+   format->item_id (ctx, "",
+notmuch_filenames_get (filenames));
+   }
+   } else {
+   format->item_id (ctx, "",
+notmuch_message_get_filename (message));
+   }
} else { /* output == OUTPUT_MESSAGES */
format->item_id (ctx, "id:",
 notmuch_message_get_message_id (message));
@@ -352,6 +370,7 @@ notmuch_search_command (void 

Strange match to my query

2011-01-25 Thread Mark Anderson
Hi guys, What's up? ("Notmuch")

Apparently matching on email addresses doesn't work the way I hoped.

While debugging why my to:x at y.com search was matching far too many
entries, I whittled it down to this:

WORD1=hello
WORD2=goodbye
MSGID=junk$(date +%s)
TESTDIR=$(notmuch config get database.path)/.tmp/new
TESTMAIL=$TESTDIR/$MSGID:2,

mkdir -p $TESTDIR

echo Testcase for $WORD1@$WORD2, msgid: $MSGID at junk.com

echo "From: nobody at nobody.com
To: c@${WORD1}.com, K-R@${WORD2}.com
Date: Mon, 24 Jan 2011 23:41:34 -0600
Subject: Error
Message-ID: <$MSGID at junk.com>

Not empty body.=

" > $TESTMAIL

notmuch new
notmuch search --output=files to:$WORD1@$WORD2
notmuch search --output=files to:\"$WORD1@$WORD2\"

Why does that match, but this doesn't?

notmuch search --output=files to:\'$WORD1@$WORD2\'

Apparently single quotes are the only quote for Xapian's parser?

I guess this is a strong vote for the quick integration of the custom
parser with optimization passes that turn emails into phrases that can't
match across multiple emails.

This was just an egregious example of notmuch giving me notmuch of what
I wanted, or actually, far too much of what I didn't want.

Thanks,
-Mark



Tag timestamps and synchronization

2011-01-25 Thread Michal Sojka
On Mon, 24 Jan 2011, dm-list-email-notmuch at scs.stanford.edu wrote:
> One of the features I would like to see from notmuch is an easier
> ability to synchronize tags across machines.  At the very least, I
> would need either incremental dump and restore, or some way to
> communicate arbitrary tags to a local imap server that shares
> notmuch's maildir (much as notmuch currently syncs the standard tags),
> so that I synchronize two maildirs with a tool like offlineimap.

[...]

> In the case of dovecot, the arbitrary tag format is very simple.  Each
> maildir has a file called dovecot-keywords mapping numbers 0, 1,
> ... to keywords.  Then mail file names contain lower-case letters for
> the flags they are marked with--0 => a, 1 => b, etc.--allowing up to
> 26 arbitrary tags for each maildir.  One could probably sync to
> dovecot's maildir format relatively easily in a script given
> incremental dump and restore of tags.  Or possibly notmuch could
> natively support dovecot as one of multiple back-end tag storage
> schemes.
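
For reference, the letter mapping described in the quoted paragraph could be decoded roughly as follows. This is a sketch under assumptions: it assumes each line of dovecot-keywords has the form "<number> <keyword>" and that flags follow ":2," in the file name; the function names are invented for the example.

```python
import string

def parse_keywords(text):
    """Parse dovecot-keywords content into {index: keyword}.

    Assumes one "<number> <keyword>" pair per line.
    """
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        index, keyword = line.split(None, 1)
        mapping[int(index)] = keyword.strip()
    return mapping

def tags_from_filename(filename, mapping):
    """Translate lower-case flag letters (a => 0, b => 1, ...) to keywords."""
    if ":2," not in filename:
        return []
    flags = filename.rpartition(":2,")[2]
    tags = []
    for ch in flags:
        index = ord(ch) - ord("a")
        if ch in string.ascii_lowercase and index in mapping:
            tags.append(mapping[index])
    return tags

keywords = parse_keywords("0 todo\n1 meeting\n")
print(tags_from_filename("1295900000.M1P1.host:2,Sab", keywords))
```

Upper-case letters such as S are the standard maildir flags and are skipped here.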

Hi David,

here is my idea of solving the problem of synchronizing tags and all
message metadata. The problem, it seems, is that every program uses a
different format for message metadata. Maybe, it would be useful to
define a simple metadata format that could be used by multiple programs
(at least by notmuch, dovecot and maybe mutt) and base the
synchronization on this format. Currently, I'm thinking about a separate
file with the same base name as the message storing message metadata in
the same format as message headers so it could look like:

tag: inbox
tag: notmuch
timestamp: 2011-01-25 10:48:00 GMT
spam: no
...

Then, any program could do whatever it wants with the metadata, e.g.
index them in a database etc.
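
Because the proposed file reuses the message header syntax, an ordinary header parser can read it. A minimal sketch, assuming the field names from the example above (tag, timestamp, spam); the function name is hypothetical:

```python
from email.parser import HeaderParser

def read_metadata(text):
    """Parse a header-style metadata file into a plain dict."""
    headers = HeaderParser().parsestr(text)
    return {
        "tags": headers.get_all("tag", []),   # repeated fields become a list
        "timestamp": headers["timestamp"],
        "spam": headers["spam"],
    }

sample = (
    "tag: inbox\n"
    "tag: notmuch\n"
    "timestamp: 2011-01-25 10:48:00 GMT\n"
    "spam: no\n"
)
meta = read_metadata(sample)
print(meta)
```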

Ideally, it would work like this: Dovecot would store the metadata
in a file like the one described above. The IMAP protocol would be
extended to send the metadata corresponding to a particular UID.
offlineimap would retrieve (and synchronize) the metadata
files with the IMAP server, and notmuch would index the metadata
much as it indexes messages and would modify the files when it changes
tags.

What do you (and others) think? Is this too wild? Too longterm?

Cheers
Michal



Tag timestamps and synchronization

2011-01-25 Thread Tim Stoakes
dm-list-email-notmuch at scs.stanford.edu(dm-list-email-notmuch at scs.stanford.edu)@240111-11:10:
> One of the features I would like to see from notmuch is an easier
> ability to synchronize tags across machines.  At the very least, I
> would need either incremental dump and restore, or some way to
> communicate arbitrary tags to a local imap server that shares
> notmuch's maildir (much as notmuch currently syncs the standard tags),
> so that I synchronize two maildirs with a tool like offlineimap.

David,

I do something like this by using some shell scripts with formail, to
'store' notmuch tags into the X-Label headers of the individual mails.
Offlineimap then syncs these headers. If I need the tags to become
notmuch-ified on the target, I just scan all the mail's X-Label headers.

(Actually it's better than this, since I use maildrop to set notmuch
tags with notmuch-deliver, *and* set X-Label headers to the same thing,
at mail delivery time. Then I use keybindings and shell scripts in mutt
such that whenever I retag a message, it is pushed to both notmuch and
X-Label.)
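
The idea of carrying tags in an X-Label header can be sketched like this. It is not the formail/maildrop glue mentioned above, just an illustration; storing the tags as a space-separated, sorted list is an assumption, as are the function names.

```python
from email import message_from_string

def store_tags(raw, tags):
    """Return the message text with X-Label replaced by the given tags."""
    msg = message_from_string(raw)
    del msg["X-Label"]                      # no error if the header is absent
    msg["X-Label"] = " ".join(sorted(tags))
    return msg.as_string()

def load_tags(raw):
    """Read the tag set back from X-Label (empty set if missing)."""
    value = message_from_string(raw)["X-Label"]
    return set(value.split()) if value else set()

raw = "From: nobody@example.com\nSubject: Error\n\nNot empty body.\n"
labelled = store_tags(raw, {"todo", "inbox"})
print(load_tags(labelled))
```

Retagging is then a matter of calling store_tags again; the old header is removed before the new one is written.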

I'm happy to share this hack glue if it would help.

This is not great for a few reasons - there are a ton of moving parts,
and some double-work. If notmuch could index X-Label headers (a coming
feature I hear) then this would be much cleaner.

This is just one way of doing it, that works for me...

Tim

> As Carl pointed out to me in private email, there has been some
> previous discussion in the following thread:
> 
>   notmuch show id:87hbfnmiux.fsf at yoom.home.cworth.org
> 
> Based on that thread, there seems to be some desire for notmuch to
> keep track of a per-message timestamp when the flags were last
> updated.  This would allow much easier expiration for people who want
> the deleted tag.  It would also allow incremental dump and restore of
> tags, which is exactly what I need to sync tags across servers with
> reasonable amounts of bandwidth.
> 
> Metadata timestamps are one of those things that probably have a lot
> of different applications, so since Carl is considering a new database
> format for the next release anyway, perhaps it also makes sense to add
> a metadata change time for each message.
> 
> The timestamp would be included in "dump" output, and you could
> request a dump of changes since a particular time.  On restore, you
> might have several options:
> 
>   - overwrite: always set the new tags and timestamp in the database
> to the value in the restore data.
> 
>   - update: always set the tags, but update the timestamp to the current time.
> 
>   - conditional T: update only if the message metadata has not been
> updated since time T.
> 
> To sync flags, then you just need to keep track of the last time you
> synced with a particular server--call this time T.  Do a dump since
> time T, upload to server, do a conditional restore for time T on
> server.  Finally do a partial dump from time T on the server and an
> overwrite import on the client.  (This policy makes changes on the
> server always override conflicting ones on the client--perhaps people
> want other policies, like union of the tags, etc.)
> 
> 
> Second, there seems to be some desire in that thread to sync with IMAP
> flags.  This would be particularly great, but the easiest way to do it
> is probably *not* to try to implement IMAP, but rather to use an
> existing IMAP server and just modify the maildir so that the IMAP
> server will pick up the flags.
> 
> In the case of dovecot, the arbitrary tag format is very simple.  Each
> maildir has a file called dovecot-keywords mapping numbers 0, 1,
> ... to keywords.  Then mail file names contain lower-case letters for
> the flags they are marked with--0 => a, 1 => b, etc.--allowing up to
> 26 arbitrary tags for each maildir.  One could probably sync to
> dovecot's maildir format relatively easily in a script given
> incremental dump and restore of tags.  Or possibly notmuch could
> natively support dovecot as one of multiple back-end tag storage
> schemes.
> 
> Having a static tag mapping in the .notmuch-config file would be much
> better than hard-coding flag2tag.  However, I'm not sure it's
> sufficient.  The reason is that if you ever completely delete a tag
> (e.g., you have "todo", and "meeting" tags and periodically have no
> messages in either category in a given mail folder), then an IMAP
> server like dovecot might end up re-allocating the letters
> corresponding to those tags in a different order.  Also, at least for
> dovecot, the flag mappings are per-folder, which you kind of want
> since you are limited to 26 non-standard tags, so global values might
> not work.
> 
> I'm curious to hear people's thoughts/reactions?
> 
> David
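
The restore modes and the sync sequence from the quoted proposal can be modelled in a few lines. This is a sketch of the idea only, not an existing notmuch feature: the in-memory dict standing in for the database, the function names, and the choice to stamp conditional restores with the current time are all assumptions.

```python
def dump_since(db, since):
    """Entries whose metadata changed after `since`."""
    return {k: v for k, v in db.items() if v[1] > since}

def restore(db, dump, mode, now, since=None):
    """Apply a dump to db; both map message-id -> (tags, mtime)."""
    for msg_id, (tags, mtime) in dump.items():
        if mode == "overwrite":
            db[msg_id] = (set(tags), mtime)       # take tags and timestamp
        elif mode == "update":
            db[msg_id] = (set(tags), now)         # take tags, stamp with now
        elif mode == "conditional":
            current = db.get(msg_id)
            if current is None or current[1] <= since:
                db[msg_id] = (set(tags), now)     # untouched since T
        else:
            raise ValueError(mode)

def sync(client, server, last_sync, now):
    """Client to server (conditional), then server to client (overwrite)."""
    restore(server, dump_since(client, last_sync), "conditional", now, last_sync)
    restore(client, dump_since(server, last_sync), "overwrite", now)

client = {"m1": ({"inbox", "todo"}, 150), "m3": ({"done"}, 160)}
server = {"m1": ({"inbox", "spam"}, 120), "m2": ({"list"}, 130), "m3": ({"inbox"}, 50)}
sync(client, server, last_sync=100, now=200)
print(client)
```

Here m1 was changed on both sides after the last sync, so the server's version wins, while m3 was changed only on the client and is accepted by the server.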

-- 
Tim Stoakes


[PATCH 1/4] Import date/time parser from GNU coreutils

2011-01-25 Thread Michal Sojka
On Mon, 24 Jan 2011, Jameson Rollins wrote:
> On Sun, 23 Jan 2011 12:47:24 +0100, Michal Sojka wrote:
> > This function has quite a lot of dependencies. We may reduce them later if
> > it is a problem.
> > ---
> >  lib/c-ctype.c  |  398 +++
> >  lib/c-ctype.h  |  297 +
> >  lib/getdate.c  | 3497 
> > 
> >  lib/getdate.h  |   22 +
> >  lib/getdate.y  | 1572 +
> >  lib/gettime.c  |   48 +
> >  lib/intprops.h |   83 ++
> >  lib/timespec.h |   39 +
> >  lib/verify.h   |  140 +++
> >  9 files changed, 6096 insertions(+), 0 deletions(-)
> >  create mode 100644 lib/c-ctype.c
> >  create mode 100644 lib/c-ctype.h
> >  create mode 100644 lib/getdate.c
> >  create mode 100644 lib/getdate.h
> >  create mode 100644 lib/getdate.y
> >  create mode 100644 lib/gettime.c
> >  create mode 100644 lib/gettime.h
> >  create mode 100644 lib/intprops.h
> >  create mode 100644 lib/timespec.h
> >  create mode 100644 lib/verify.h
> 
> Hi, Michal.  I don't fully understand what's going on here, but it seems
> like you're embedding code copies from somewhere else.  If that's the
> case, is there a reason that we would need to do that, rather than just
> linking against an external library?

Well, if the embedded code is available in a library, it would be
definitely better to just use the library. But the above code is
statically linked to things like `date` command and is not available
separately.

Most of the dependencies could be eliminated since they usually
replicate functionality which is available in modern C library and are
there only for compatibility reasons.

On the other hand, if anybody knows a better date parser, perhaps in a
separate library, let me know.

-Michal


Re: Strange match to my query

2011-01-25 Thread Carl Worth
On Tue, 25 Jan 2011 19:51:14 -0500, Austin Clements amdra...@gmail.com wrote:
> Well-constructed test message.  Xapian's query parser is actually doing the
> right thing [1] and this is a bug in the way notmuch indexes address list
> headers.  For each address, _notmuch_message_gen_terms resets the term
> generator's term position, so your To header indexes with positions as
>   c:1 hello:2 com:3 K:1 R:2 world:3 com:4

Thanks, Austin!

I was actually giving a demo of notmuch to someone yesterday who was
really interested in the details of how Xapian actually stores things.

I dug around a bit with delve and we were both really surprised by the
position results we were seeing. Neither of us could make any sense of
them at all.

And thanks, Mark for the bug report and the nice test case. I'll add
this to the test suite, and fix it. And that will give us yet one more
reason for all of us to rebuild our databases after the upcoming
release.

-Carl

-- 
carl.d.wo...@intel.com




fix notmuch.vim NM_compose_get_user_email() (Patch)

2011-01-25 Thread Peter John Hartman
Here's a bitty patch to the vim plugin; it now calculates the primary email
of the user based on a call to notmuch config.  There's still a lot of work
that needs to get done on notmuch.vim, e.g., the ability to have multiple
emails/accounts.

Best, Peter

--- notmuch.vim 2010-11-18 17:26:14.0 -0500
+++ notmuch.vim.mine 2011-01-25 23:57:50.0 -0500
@@ -18,7 +18,8 @@
  along with Notmuch.  If not, see http://www.gnu.org/licenses/.
 
  Authors: Bart Trojanowski b...@jukie.net
-
+ Contributors: Peter Hartman peterjohnhart...@gmail.com
+
  --- configuration defaults {{{1
 
 let s:notmuch_defaults = {
@@ -1024,11 +1025,9 @@
  --- --- compose screen helper functions {{{2
 
 function! s:NM_compose_get_user_email()
-let name = substitute(system('id -u -n'), '\v(^\s*|\s*$|\n)', '', 'g')
-let fqdn = substitute(system('hostname -f'), '\v(^\s*|\s*$|\n)', '', 'g')
-
- TODO: do this properly
-return name . '@' . fqdn
+ TODO: do this properly (still), i.e., allow for multiple email accounts
+let email = substitute(system('notmuch config get user.primary_email'), '\v(^\s*|\s*$|\n)', '', 'g')
+   return email
 endfunction
 
 function! s:NM_compose_find_line_match(start, pattern, failure)

-- 
sic dicit magister P
PhD Candidate
Collaborative Programme in Ancient and Medieval Philosophy
University of Toronto
http://individual.utoronto.ca/peterjh