Re: [Paul Wise] Bug#843127: notmuch: race condition in `notmuch new`?

2016-11-12 Thread Austin Clements
Quoth David Bremner on Nov 04 at  1:26 pm:
> 
> Paul Wise wrote:
> 
> > Last night I got this errorĀ from my `notmuch new --quiet` cron job. The
> > file that the error message complains about is now in the cur directory
> > of the maildir at the following path.
> >
> > /path/to/mail/cur/1478190211.H80553P18378.chianamo:2,
> >
> > I wonder if this some kind of race condition in `notmuch new` processing.
> > Perhaps it should be using inotify to find out about file movements?
> >
> > Unexpected error with file 
> > /path/to/mail/new/1478190211.H80553P18378.chianamo
> > add_file: Something went wrong trying to read or write a file
> > Error opening /path/to/mail/new/1478190211.H80553P18378.chianamo: No such 
> > file or directory
> > Note: A fatal error was encountered: Something went wrong trying to read or 
> > write a file
> 
> I agree it looks like a race condition. inotify sounds a bit
> overcomplicated and perhaps non-portable? It should probably just
> tolerate disappearing files better, consider that a warning.

Inotify really *is* the solution. This is a symptom of a much bigger
problem: scandir makes no guarantees in the presence of concurrent
directory modification. If you delete or rename a file while notmuch
new is running, it may think *completely unrelated* files in the same
directory were also deleted. Even if scandir were atomic, if you move
a mail from one directory to another between notmuch scanning the
destination directory and notmuch scanning the source directory, it'll
think the mail has been deleted and potentially remove it from the DB.

The "recommended" solution is to scandir is to start an inotify watch
before the scan and redo (or update) the scan if there are any
changes. For notmuch, it would make sense to extend that to watching
all directories to make sure it can catch renames during the scan.

A possible alternative, though I haven't worked out the details, might
be to keep a close eye on the directory mtimes. Roughly, for each
directory, check the mtime before scanning, wait if necessary until
the mtime != the current time, do the scan and process the files
optimistically. Once all directories are processed, re-check all of
the mtimes and if any have changed, do something like starting over
but hopefully more intelligent.

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: searching: '*analysis' vs 'reanalysis'

2016-06-06 Thread Austin Clements
Quoth Gaute Hope on Jun 06 at  8:08 pm:
> Austin Clements writes on juni 6, 2016 21:20:
> >
> >The experiment was specifically for regexp matching subject, but it should
> >work for any header we store a literal copy of in the database.
> 
> Does it work for terms in the body of the message?

No. It's not impossible that it could be made to work, but it might be
slow and unintuitive. It would have to iterate over all of the terms
in the database and see which ones match the regexp. These are
available, but I don't know how much time it takes to iterate over all
of them. It might be okay. It might not.

It could also expand to a very large query if the regexp matches many
terms, akin to how searching for "a*" can be quite expensive.

And it might not match what you expect. It could only match individual
terms, so a regexp containing any punctuation (including but not
limited to a space) simply wouldn't match anything.
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: searching: '*analysis' vs 'reanalysis'

2016-06-06 Thread Austin Clements
On Mon, Jun 6, 2016 at 1:29 PM, David Bremner  wrote:

> Sebastian Fischmeister  writes:
>
> >
> > I ran into this problem before as well. Storage is cheap. Notmuch could
> > index all emails with reversed text to get around some of this
> > problem. It doesn't solve the problem of *analysis*, but it's still an
> > improvement.
>
> It would probably be more useful to have brute force regexp searches on
> headers.  Austin did some experiments that sounded promising, where you
> basically postprocess the result of a xapian query with a regexp. OTOH,
> I don't know what kept him from proposing this for mainline. If it was
> just parser issues, those are probably more or less solved now, at least
> for people using xapian 1.3+
>

The experiment was specifically for regexp matching subject, but it should
work for any header we store a literal copy of in the database. The code is
here, though in its current form it builds on my custom query parser:
https://github.com/aclements/notmuch/commit/ce41b29aba4d9b84e2f1eb6ed8df67065196c960.
Based on my understanding of Xapian 1.3+ field processors, these days it
should be quite easy to hook the PostingSource in that commit into the
Xapian QueryProcessor.
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists

2016-04-19 Thread Austin Clements
On Mon, 11 Apr 2016, Daniel Kahn Gillmor  wrote:
> On Sun 2016-04-10 20:33:02 -0400, David Bremner  wrote:
>> Daniel Kahn Gillmor  writes:
>>
>>> To fully complete the ghost-on-removal-when-shared-thread-exists
>>> proposal, we need to clear all ghost messages when the last active
>>> message is removed from a thread.
>>
>> For me this patch breaks T530 as follows
>>
>> T530-upgrade: Testing database upgrade
>>  FAIL   ghost message retains thread ID
>>  --- T530-upgrade.13.expected2016-04-11 00:25:19.482196274 +
>>  +++ T530-upgrade.13.output  2016-04-11 00:25:19.482196274 +
>>  @@ -1 +1 @@
>>  -thread:001d
>>  +thread:0011
>> No new mail.
>> No new mail. Removed 1 message.
>>
>> I pushed my rebased version of the patches to
>>
>>   
>> https://git.notmuchmail.org/git?p=notmuch;a=shortlog;h=refs/heads/thread-fix
>>
>> in case the problem is some mistake on my part.
>
> After having done "make download-test-databases", I can confirm that
> this is happening for me as well: it's not an issue with bremner's
> config.
>
> however, i think this particular test is wrong, and should probably be
> removed.
>
> For review, here's the final test:
>
> --
> # Ghost messages are difficult to test since they're nearly invisible.
> # However, if the upgrade works correctly, the ghost message should
> # retain the right thread ID even if all of the original messages in
> # the thread are deleted.  That's what we test.  This won't detect if
> # the upgrade just plain didn't happen, but it should detect if
> # something went wrong.
> test_begin_subtest "ghost message retains thread ID"
> # Upgrade database
> notmuch new
> # Get thread ID of real message
> thread=$(notmuch search --output=threads id:4efc743a.3060...@april.org)
> # Delete all real messages in that thread
> rm $(notmuch search --output=files $thread)
> notmuch new
> # "Deliver" ghost message
> add_message '[subject]=Ghost' '[id]=4efc3931.6030...@april.org'
> # If the ghost upgrade worked, the new message should be attached to
> # the existing thread ID.
> nthread=$(notmuch search --output=threads id:4efc3931.6030...@april.org)
> test_expect_equal "$thread" "$nthread"
> -
>
> I don't think this reasoning is sensible.  If the entire thread is
> deleted, and a new message comes in, it should *not* get the same mesage
> ID.  ghosts should only exist in the database when other messages point
> to them.
>
> So i'd be fine with killing this entire last test, unless someone can
> propose a good reason to keep it.

I agree that it's fine to remove this test. From what I recall, it
wasn't intended to test that ghost messages retained their thread IDs;
this was just the implementation property it exploited to test that
ghosts were working.

I haven't looked through this series, but it would be nice if there was
still some tests that ghosts were working across upgrade.
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 2/6] lib: Add per-message last modification tracking

2015-08-08 Thread Austin Clements
Hi David. I haven't had a chance to look back at the original code, but
your follow-up expanded comment agrees with how I remember this code
working.
On Aug 7, 2015 4:41 PM, "David Bremner"  wrote:

> Daniel Schoepe  writes:
>
>
> > On Fri, 05 Jun 2015 19:28 +0200, David Bremner wrote:
> >> +/* Prior to NOTMUCH_FEATURE_LAST_MOD, messages did not
> >> + * track modification revisions.  Give all messages a
> >> + * revision of 1.
> >> + */
> >> +if (new_features & NOTMUCH_FEATURE_LAST_MOD)
> >> +_notmuch_message_upgrade_last_mod (message);
> >> [..]
> >> +/* Upgrade a message to support NOTMUCH_FEATURE_LAST_MOD.  The caller
> >> + * must call _notmuch_message_sync. */
> >> +void
> >> +_notmuch_message_upgrade_last_mod (notmuch_message_t *message)
> >> +{
> >> +/* _notmuch_message_sync will update the last modification
> >> + * revision; we just have to ask it to. */
> >> +message->modified = TRUE;
> >> +}
> >> +
> >
> > The comment in the first part says that message without LAST_MOD get a
> > revision of 1, but as far as I can tell, _notmuch_message_sync will
> > actually put the next revision number on the message. If this is what's
> > happening, either the comment or the behavior should be changed,
> > depending on what's intended.
>
> I think the behaviour is OK, but you're correct the comment is
> wrong. I'll see if Austin has any comment on that. I guess it would be
> harmless to initialize upgraded messages to some low revision number,
> but I don't see the benefit.
>
-- next part --
An HTML attachment was scrubbed...
URL: 



Re: [PATCH 2/6] lib: Add per-message last modification tracking

2015-08-08 Thread Austin Clements
Hi David. I haven't had a chance to look back at the original code, but
your follow-up expanded comment agrees with how I remember this code
working.
On Aug 7, 2015 4:41 PM, David Bremner da...@tethera.net wrote:

 Daniel Schoepe dan...@schoepe.org writes:


  On Fri, 05 Jun 2015 19:28 +0200, David Bremner wrote:
  +/* Prior to NOTMUCH_FEATURE_LAST_MOD, messages did not
  + * track modification revisions.  Give all messages a
  + * revision of 1.
  + */
  +if (new_features  NOTMUCH_FEATURE_LAST_MOD)
  +_notmuch_message_upgrade_last_mod (message);
  [..]
  +/* Upgrade a message to support NOTMUCH_FEATURE_LAST_MOD.  The caller
  + * must call _notmuch_message_sync. */
  +void
  +_notmuch_message_upgrade_last_mod (notmuch_message_t *message)
  +{
  +/* _notmuch_message_sync will update the last modification
  + * revision; we just have to ask it to. */
  +message-modified = TRUE;
  +}
  +
 
  The comment in the first part says that message without LAST_MOD get a
  revision of 1, but as far as I can tell, _notmuch_message_sync will
  actually put the next revision number on the message. If this is what's
  happening, either the comment or the behavior should be changed,
  depending on what's intended.

 I think the behaviour is OK, but you're correct the comment is
 wrong. I'll see if Austin has any comment on that. I guess it would be
 harmless to initialize upgraded messages to some low revision number,
 but I don't see the benefit.

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Understanding the "replied" tag

2015-06-13 Thread Austin Clements
Hi Xu. I may be misunderstanding your email, but it sounds like you want to 
know if a message has *any* reply message. That's not what the replied tag 
indicates. The replied tag indicates that *you* have sent a reply to a message. 
Mechanically, when you hit, say, r to start a reply and then send that message, 
notmuch tags the message you hit r on as "replied". That's the only time 
notmuch automatically sets this tag.

On June 11, 2015 10:25:44 AM PDT, Xu Wang  wrote:
>Dear all,
>
>First, I am extremely excited to be a part of this list now. notmuch
>has really helped me. Thank you go all individuals working to improve
>it and to help others to know how to use it.
>
>I would really like to know if a message has been replied to (e.g.
>using a certain message id). It seems that all I need to do is check
>for the "replied" tag. But often this tag is not there, even when
>there has been a reply (I have confirmed this through the thread
>display and checking the message that replied to the message to make
>sure it indeed has header "replied-to:").
>
>I have looked in mutt, and also I see many situations where there is
>no 'r' flag, especially for emails sent from me.
>
>The following returns true:
>notmuch config get maildir.synchronize_flags
>
>So at least both are giving same answer, but I'm not sure why not
>saying "replied" thread is correct.
>
>What is incorrect for my way of thinking about "replied"? Do I
>misunderstand what it is supposed to do or am I not updating the flags
>correctly?
>
>Kind regards,
>
>Xu
>___
>notmuch mailing list
>notmuch at notmuchmail.org
>http://notmuchmail.org/mailman/listinfo/notmuch



Re: Understanding the replied tag

2015-06-13 Thread Austin Clements
Hi Xu. I may be misunderstanding your email, but it sounds like you want to 
know if a message has *any* reply message. That's not what the replied tag 
indicates. The replied tag indicates that *you* have sent a reply to a message. 
Mechanically, when you hit, say, r to start a reply and then send that message, 
notmuch tags the message you hit r on as replied. That's the only time 
notmuch automatically sets this tag.

On June 11, 2015 10:25:44 AM PDT, Xu Wang xuwang...@gmail.com wrote:
Dear all,

First, I am extremely excited to be a part of this list now. notmuch
has really helped me. Thank you go all individuals working to improve
it and to help others to know how to use it.

I would really like to know if a message has been replied to (e.g.
using a certain message id). It seems that all I need to do is check
for the replied tag. But often this tag is not there, even when
there has been a reply (I have confirmed this through the thread
display and checking the message that replied to the message to make
sure it indeed has header replied-to:MSG-ID).

I have looked in mutt, and also I see many situations where there is
no 'r' flag, especially for emails sent from me.

The following returns true:
notmuch config get maildir.synchronize_flags

So at least both are giving same answer, but I'm not sure why not
saying replied thread is correct.

What is incorrect for my way of thinking about replied? Do I
misunderstand what it is supposed to do or am I not updating the flags
correctly?

Kind regards,

Xu
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


notmuch_thread_get_authors

2015-04-21 Thread Austin Clements
Quoth Ronny Chevalier on Apr 22 at  4:01 am:
> On Wed, Apr 22, 2015 at 3:28 AM, Austin Clements
>  wrote:
> > On Tue, 21 Apr 2015, Ronny Chevalier  wrote:
> >> On Tue, Apr 21, 2015 at 1:35 AM, David Bremner  
> >> wrote:
> >>> Ronny Chevalier  writes:
> >> Austin Clements wrote:
> >>> And I think there's a fairly easy way to do it in C code that will
> >>> also prevent library interface bloat: instead of introducing new
> >>> library APIs to get at this information, just use the existing
> >>> notmuch_thread_get_messages API and construct the matched and
> >>> non-matched lists in the CLI.  Doing it in the CLI wouldn't require
> >>> the library to export yet another string list structure, which is
> >>> always a huge pain (thanks C!), and wouldn't introduce more "helper"
> >>> functions into the library API.
> >>
> >> I disagree with what Austin said. Because this does not solve the
> >> issue at all (or I misunderstood). The issue is with the notmuch API,
> >> if someone is using this library there no way it can parse properly
> >> the authors.
> >> In my case I am not using the CLI but the notmuch library, fixing this
> >> in the CLI is just an hack, and it does not fix the issue for the
> >> library users.
> >
> > My suggestion was in no way specific to the CLI. That was the context of
> > the discussion at the time, but for the purposes of this discussion, the
> > CLI is just another library user.
> 
> Ok, sorry for misunderstanding.
> 
> >
> > You're completely right that there's no way to reliably parse the
> > authors list returned by notmuch_thread_get_authors. So don't do
> > that. Just use notmuch_thread_get_messages, walk the messages list, and
> > build your own authors list. There's no need to introduce additional
> > complexity and surface area into the library API for this specific use
> > case (IMO, even notmuch_thread_get_authors shouldn't exist, but it's
> > there for legacy reasons.) Then you can get author lists for matched,
> > non-matched, matching a specific tag, just the to, just the from, counts
> > of how many times each author appeared, whatever you want.
> >
> 
> Ok thanks!
> 
> If I read the code correctly, _notmuch_thread_create in lib/thread.cc
> process every message to get information like tags, subject and
> authors. Since notmuch_thread_get_authors is here for legacy reasons,
> would it be better to generate the list of authors only when requested
> with notmuch_thread_get_authors (and cache the result of course)?
> Because, new code will not use this and will do this work manually,
> the generation of the list in intern consumes resources for nothing.

It might be worth making this lazy. I'd be surprised if this has
noticeable CPU or memory cost in the grand scheme of putting together
a thread, but I don't have any numbers to back this up.


notmuch_thread_get_authors

2015-04-21 Thread Austin Clements
On Tue, 21 Apr 2015, Ronny Chevalier  wrote:
> On Tue, Apr 21, 2015 at 1:35 AM, David Bremner  wrote:
>> Ronny Chevalier  writes:
> Austin Clements wrote:
>> And I think there's a fairly easy way to do it in C code that will
>> also prevent library interface bloat: instead of introducing new
>> library APIs to get at this information, just use the existing
>> notmuch_thread_get_messages API and construct the matched and
>> non-matched lists in the CLI.  Doing it in the CLI wouldn't require
>> the library to export yet another string list structure, which is
>> always a huge pain (thanks C!), and wouldn't introduce more "helper"
>> functions into the library API.
>
> I disagree with what Austin said. Because this does not solve the
> issue at all (or I misunderstood). The issue is with the notmuch API,
> if someone is using this library there no way it can parse properly
> the authors.
> In my case I am not using the CLI but the notmuch library, fixing this
> in the CLI is just an hack, and it does not fix the issue for the
> library users.

My suggestion was in no way specific to the CLI. That was the context of
the discussion at the time, but for the purposes of this discussion, the
CLI is just another library user.

You're completely right that there's no way to reliably parse the
authors list returned by notmuch_thread_get_authors. So don't do
that. Just use notmuch_thread_get_messages, walk the messages list, and
build your own authors list. There's no need to introduce additional
complexity and surface area into the library API for this specific use
case (IMO, even notmuch_thread_get_authors shouldn't exist, but it's
there for legacy reasons.) Then you can get author lists for matched,
non-matched, matching a specific tag, just the to, just the from, counts
of how many times each author appeared, whatever you want.

> Furthermore, I do not see why providing a string list NULL-terminated
> in C is a huge pain?

See the notmuch_tags_* and notmuch_filesnames_* APIs. Those are just
string lists.

> Otherwise, I agree with Mark Walters comments on the patch.
>
> If no one is working to fix this at the moment, I can send a patch?
>
> Ronny


Re: notmuch_thread_get_authors

2015-04-21 Thread Austin Clements
Quoth Ronny Chevalier on Apr 22 at  4:01 am:
 On Wed, Apr 22, 2015 at 3:28 AM, Austin Clements
 acleme...@csail.mit.edu wrote:
  On Tue, 21 Apr 2015, Ronny Chevalier chevalier.ro...@gmail.com wrote:
  On Tue, Apr 21, 2015 at 1:35 AM, David Bremner da...@tethera.net wrote:
  Ronny Chevalier chevalier.ro...@gmail.com writes:
  Austin Clements wrote:
  And I think there's a fairly easy way to do it in C code that will
  also prevent library interface bloat: instead of introducing new
  library APIs to get at this information, just use the existing
  notmuch_thread_get_messages API and construct the matched and
  non-matched lists in the CLI.  Doing it in the CLI wouldn't require
  the library to export yet another string list structure, which is
  always a huge pain (thanks C!), and wouldn't introduce more helper
  functions into the library API.
 
  I disagree with what Austin said. Because this does not solve the
  issue at all (or I misunderstood). The issue is with the notmuch API,
  if someone is using this library there no way it can parse properly
  the authors.
  In my case I am not using the CLI but the notmuch library, fixing this
  in the CLI is just an hack, and it does not fix the issue for the
  library users.
 
  My suggestion was in no way specific to the CLI. That was the context of
  the discussion at the time, but for the purposes of this discussion, the
  CLI is just another library user.
 
 Ok, sorry for misunderstanding.
 
 
  You're completely right that there's no way to reliably parse the
  authors list returned by notmuch_thread_get_authors. So don't do
  that. Just use notmuch_thread_get_messages, walk the messages list, and
  build your own authors list. There's no need to introduce additional
  complexity and surface area into the library API for this specific use
  case (IMO, even notmuch_thread_get_authors shouldn't exist, but it's
  there for legacy reasons.) Then you can get author lists for matched,
  non-matched, matching a specific tag, just the to, just the from, counts
  of how many times each author appeared, whatever you want.
 
 
 Ok thanks!
 
 If I read the code correctly, _notmuch_thread_create in lib/thread.cc
 process every message to get information like tags, subject and
 authors. Since notmuch_thread_get_authors is here for legacy reasons,
 would it be better to generate the list of authors only when requested
 with notmuch_thread_get_authors (and cache the result of course)?
 Because, new code will not use this and will do this work manually,
 the generation of the list in intern consumes resources for nothing.

It might be worth making this lazy. I'd be surprised if this has
noticeable CPU or memory cost in the grand scheme of putting together
a thread, but I don't have any numbers to back this up.
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: notmuch_thread_get_authors

2015-04-21 Thread Austin Clements
On Tue, 21 Apr 2015, Ronny Chevalier chevalier.ro...@gmail.com wrote:
 On Tue, Apr 21, 2015 at 1:35 AM, David Bremner da...@tethera.net wrote:
 Ronny Chevalier chevalier.ro...@gmail.com writes:
 Austin Clements wrote:
 And I think there's a fairly easy way to do it in C code that will
 also prevent library interface bloat: instead of introducing new
 library APIs to get at this information, just use the existing
 notmuch_thread_get_messages API and construct the matched and
 non-matched lists in the CLI.  Doing it in the CLI wouldn't require
 the library to export yet another string list structure, which is
 always a huge pain (thanks C!), and wouldn't introduce more helper
 functions into the library API.

 I disagree with what Austin said. Because this does not solve the
 issue at all (or I misunderstood). The issue is with the notmuch API,
 if someone is using this library there no way it can parse properly
 the authors.
 In my case I am not using the CLI but the notmuch library, fixing this
 in the CLI is just an hack, and it does not fix the issue for the
 library users.

My suggestion was in no way specific to the CLI. That was the context of
the discussion at the time, but for the purposes of this discussion, the
CLI is just another library user.

You're completely right that there's no way to reliably parse the
authors list returned by notmuch_thread_get_authors. So don't do
that. Just use notmuch_thread_get_messages, walk the messages list, and
build your own authors list. There's no need to introduce additional
complexity and surface area into the library API for this specific use
case (IMO, even notmuch_thread_get_authors shouldn't exist, but it's
there for legacy reasons.) Then you can get author lists for matched,
non-matched, matching a specific tag, just the to, just the from, counts
of how many times each author appeared, whatever you want.

 Furthermore, I do not see why providing a string list NULL-terminated
 in C is a huge pain?

See the notmuch_tags_* and notmuch_filesnames_* APIs. Those are just
string lists.

 Otherwise, I agree with Mark Walters comments on the patch.

 If no one is working to fix this at the moment, I can send a patch?

 Ronny
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v2 8/8] emacs: Support cid: references with shr renderer

2015-01-24 Thread Austin Clements
shr has really nice support for inline image rendering, but previously
we only had the hooks for w3m cid: references.
---
 emacs/notmuch-show.el | 45 +
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 34dcedd..66350d4 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -767,14 +767,43 @@ (defun 
notmuch-show-get-mime-type-of-application/octet-stream (part)
  nil

 (defun notmuch-show-insert-part-text/html (msg part content-type nth depth 
button)
-  ;; text/html handler to work around bugs in renderers and our
-  ;; invisibile parts code. In particular w3m sets up a keymap which
-  ;; "leaks" outside the invisible region and causes strange effects
-  ;; in notmuch. We set mm-inline-text-html-with-w3m-keymap to nil to
-  ;; tell w3m not to set a keymap (so the normal notmuch-show-mode-map
-  ;; remains).
-  (let ((mm-inline-text-html-with-w3m-keymap nil))
-(notmuch-show-insert-part-*/* msg part content-type nth depth button)))
+  (if (eq mm-text-html-renderer 'shr)
+  ;; It's easier to drive shr ourselves than to work around the
+  ;; goofy things `mm-shr' does (like irreversibly taking over
+  ;; content ID handling).
+  (notmuch-show--insert-part-text/html-shr msg part)
+;; Otherwise, let message-mode do the heavy lifting
+;;
+;; w3m sets up a keymap which "leaks" outside the invisible region
+;; and causes strange effects in notmuch. We set
+;; mm-inline-text-html-with-w3m-keymap to nil to tell w3m not to
+;; set a keymap (so the normal notmuch-show-mode-map remains).
+(let ((mm-inline-text-html-with-w3m-keymap nil))
+  (notmuch-show-insert-part-*/* msg part content-type nth depth button
+
+;; These functions are used by notmuch-show--insert-part-text/html-shr
+(declare-function libxml-parse-html-region "xml.c")
+(declare-function shr-insert-document "shr")
+
+(defun notmuch-show--insert-part-text/html-shr (msg part)
+  ;; Make sure shr is loaded before we start let-binding its globals
+  (require 'shr)
+  (let ((dom (let ((process-crypto notmuch-show-process-crypto))
+  (with-temp-buffer
+(insert (notmuch-get-bodypart-text msg part process-crypto))
+(libxml-parse-html-region (point-min) (point-max)
+   (shr-content-function
+(lambda (url)
+  ;; shr strips the "cid:" part of URL, but doesn't
+  ;; URL-decode it (see RFC 2392).
+  (let ((cid (url-unhex-string url)))
+(first (notmuch-show--get-cid-content cid)
+   ;; Block all external images to prevent privacy leaks and
+   ;; potential attacks.  FIXME: If we block an image, offer a
+   ;; button to load external images.
+   (shr-blocked-images "."))
+(shr-insert-document dom)
+t))

 (defun notmuch-show-insert-part-*/* (msg part content-type nth depth button)
   ;; This handler _must_ succeed - it is the handler of last resort.
-- 
2.1.3



[PATCH v2 7/8] emacs: Rewrite content ID handling

2015-01-24 Thread Austin Clements
Besides generally cleaning up the code and separating the general
content ID handling from the w3m-specific code, this fixes several
problems.

Foremost is that, previously, the code roughly assumed that referenced
parts would be in the same multipart/related as the reference.
According to RFC 2392, nothing could be further from the truth:
content IDs are supposed to be globally unique and globally
addressable.  This is nonsense, but this patch at least fixes things
so content IDs can be anywhere in the same message.

As a side-effect of the above, this handles multipart/alternate
content-IDs more in line with RFC 2046 section 5.1.2 (not that I've
ever seen this in the wild).  This also properly URL-decodes cid:
URLs, as per RFC 2392 (the previous code did not), and applies crypto
settings from the show buffer (the previous code used the global
crypto settings).
---
 emacs/notmuch-show.el | 120 +++---
 1 file changed, 74 insertions(+), 46 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 11eac5f..34dcedd 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -525,6 +525,73 @@ (defun notmuch-show-toggle-part-invisibility ( 
button)
  (overlay-put overlay 'invisible (not show))
  t)

+;; Part content ID handling
+
+(defvar notmuch-show--cids nil
+  "Alist from raw content ID to (MSG PART).")
+(make-variable-buffer-local 'notmuch-show--cids)
+
+(defun notmuch-show--register-cids (msg part)
+  "Register content-IDs in PART and all of PART's sub-parts."
+  (let ((content-id (plist-get part :content-id)))
+(when content-id
+  ;; Note that content-IDs are globally unique, except when they
+  ;; aren't: RFC 2046 section 5.1.4 permits children of a
+  ;; multipart/alternative to have the same content-ID, in which
+  ;; case the MUA is supposed to pick the best one it can render.
+  ;; We simply add the content-ID to the beginning of our alist;
+  ;; so if this happens, we'll take the last (and "best")
+  ;; alternative (even if we can't render it).
+  (push (list content-id msg part) notmuch-show--cids)))
+  ;; Recurse on sub-parts
+  (let ((ctype (notmuch-split-content-type
+   (downcase (plist-get part :content-type)
+(cond ((equal (first ctype) "multipart")
+  (mapc (apply-partially #'notmuch-show--register-cids msg)
+(plist-get part :content)))
+ ((equal ctype '("message" "rfc822"))
+  (notmuch-show--register-cids
+   msg
+   (first (plist-get (first (plist-get part :content)) :body)))
+
+(defun notmuch-show--get-cid-content (cid)
+  "Return a list (CID-content content-type) or nil.
+
+This will only find parts from messages that have been inserted
+into the current buffer.  CID must be a raw content ID, without
+enclosing angle brackets, a cid: prefix, or URL encoding.  This
+will return nil if the CID is unknown or cannot be retrieved."
+  (let ((descriptor (cdr (assoc cid notmuch-show--cids
+(when descriptor
+  (let* ((msg (first descriptor))
+(part (second descriptor))
+;; Request caching for this content, as some messages
+;; reference the same cid: part many times (hundreds!).
+(content (notmuch-get-bodypart-binary
+  msg part notmuch-show-process-crypto 'cache))
+(content-type (plist-get part :content-type)))
+   (list content content-type)
+
+(defun notmuch-show-setup-w3m ()
+  "Instruct w3m how to retrieve content from a \"related\" part of a message."
+  (interactive)
+  (if (boundp 'w3m-cid-retrieve-function-alist)
+(unless (assq 'notmuch-show-mode w3m-cid-retrieve-function-alist)
+  (push (cons 'notmuch-show-mode #'notmuch-show--cid-w3m-retrieve)
+   w3m-cid-retrieve-function-alist)))
+  (setq mm-inline-text-html-with-images t))
+
+(defvar w3m-current-buffer) ;; From `w3m.el'.
+(defun notmuch-show--cid-w3m-retrieve (url  args)
+  ;; url includes the cid: prefix and is URL encoded (see RFC 2392).
+  (let* ((cid (url-unhex-string (substring url 4)))
+(content-and-type
+ (with-current-buffer w3m-current-buffer
+   (notmuch-show--get-cid-content cid
+(when content-and-type
+  (insert (first content-and-type))
+  (second content-and-type
+
 ;; MIME part renderers

 (defun notmuch-show-multipart/*-to-list (part)
@@ -549,56 +616,11 @@ (defun notmuch-show-insert-part-multipart/alternative 
(msg part content-type nth
   (indent-rigidly start (point) 1)))
   t)

-(defun notmuch-show-setup-w3m ()
-  "Instruct w3m how to retrieve content from a \"related\" part of a message."
-  (interactive)
-  (if (boundp 'w3m-cid-retrieve-function-alist)
-(unless (assq 'notmuch-show-mode w3m-cid-retrieve-function-alist)
-  (push (cons 'notmuch-show-mode 'notmuch-show-w3m-cid-retrieve)
-   w3m-cid-retrieve-function-alist)))
-  (setq 

[PATCH v2 6/8] emacs: Use generalized content caching in w3m CID code

2015-01-24 Thread Austin Clements
Previously this did its own caching, but this is now supported by more
generally by `notmuch-get-bodypart-binary'.
---
 emacs/notmuch-show.el | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index f29428a..11eac5f 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -562,15 +562,14 @@ (defvar w3m-current-buffer) ;; From `w3m.el'.
 (defvar notmuch-show-w3m-cid-store nil)
 (make-variable-buffer-local 'notmuch-show-w3m-cid-store)

-(defun notmuch-show-w3m-cid-store-internal (content-id msg part content)
-  (push (list content-id msg part content)
-   notmuch-show-w3m-cid-store))
+(defun notmuch-show-w3m-cid-store-internal (content-id msg part)
+  (push (list content-id msg part) notmuch-show-w3m-cid-store))

 (defun notmuch-show-w3m-cid-store (msg part)
   (let ((content-id (plist-get part :content-id)))
 (when content-id
   (notmuch-show-w3m-cid-store-internal (concat "cid:" content-id)
-  msg part nil
+  msg part

 (defun notmuch-show-w3m-cid-retrieve (url  args)
   (let ((matching-part (with-current-buffer w3m-current-buffer
@@ -578,18 +577,12 @@ (defun notmuch-show-w3m-cid-retrieve (url  args)
 (if matching-part
(let* ((msg (nth 1 matching-part))
   (part (nth 2 matching-part))
-  (content (nth 3 matching-part))
   (content-type (plist-get part :content-type)))
- ;; If we don't already have the content, get it and cache
- ;; it, as some messages reference the same cid: part many
- ;; times (hundreds!), which results in many calls to
- ;; `notmuch part'.
- (unless content
-   (setq content (notmuch-get-bodypart-binary
-  msg part notmuch-show-process-crypto))
-   (with-current-buffer w3m-current-buffer
- (notmuch-show-w3m-cid-store-internal url msg part content)))
- (insert content)
+ ;; Request content caching, as some messages reference the
+ ;; same cid: part many times (hundreds!), which results in
+ ;; many calls to `notmuch show'.
+ (insert (notmuch-get-bodypart-binary
+  msg part notmuch-show-process-crypto 'cache))
  content-type)
   nil)))

-- 
2.1.3



[PATCH v2 5/8] emacs: Support caching in notmuch-get-bodypart-{binary, text}

2015-01-24 Thread Austin Clements
(The actual code change here is small, but requires re-indenting
existing code.)
---
 emacs/notmuch-lib.el | 64 
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 3154725..f8e5165 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -529,39 +529,53 @@ (defun notmuch-parts-filter-by-type (parts type)
(lambda (part) (notmuch-match-content-type (plist-get part :content-type) 
type))
parts))

-(defun notmuch-get-bodypart-binary (msg part process-crypto)
+(defun notmuch-get-bodypart-binary (msg part process-crypto  cache)
   "Return the unprocessed content of PART in MSG as a unibyte string.

 This returns the \"raw\" content of the given part after content
 transfer decoding, but with no further processing (see the
 discussion of --format=raw in man notmuch-show).  In particular,
-this does no charset conversion."
-  (let ((args `("show" "--format=raw"
-   ,(format "--part=%d" (plist-get part :id))
-   ,@(when process-crypto '("--decrypt"))
-   ,(notmuch-id-to-query (plist-get msg :id)
-(with-temp-buffer
-  ;; Emacs internally uses a UTF-8-like multibyte string
-  ;; representation by default (regardless of the coding system,
-  ;; which only affects how it goes from outside data to this
-  ;; internal representation).  This *almost* never matters.
-  ;; Annoyingly, it does matter if we use this data in an image
-  ;; descriptor, since Emacs will use its internal data buffer
-  ;; directly and this multibyte representation corrupts binary
-  ;; image formats.  Since the caller is asking for binary data, a
-  ;; unibyte string is a more appropriate representation anyway.
-  (set-buffer-multibyte nil)
-  (let ((coding-system-for-read 'no-conversion))
-   (apply #'call-process notmuch-command nil '(t nil) nil args)
-   (buffer-string)
-
-(defun notmuch-get-bodypart-text (msg part process-crypto)
+this does no charset conversion.
+
+If CACHE is non-nil, the content of this part will be saved in
+MSG (if it isn't already)."
+  (let ((data (plist-get part :binary-content)))
+(when (not data)
+  (let ((args `("show" "--format=raw"
+   ,(format "--part=%d" (plist-get part :id))
+   ,@(when process-crypto '("--decrypt"))
+   ,(notmuch-id-to-query (plist-get msg :id)
+   (with-temp-buffer
+ ;; Emacs internally uses a UTF-8-like multibyte string
+ ;; representation by default (regardless of the coding
+ ;; system, which only affects how it goes from outside data
+ ;; to this internal representation).  This *almost* never
+ ;; matters.  Annoyingly, it does matter if we use this data
+ ;; in an image descriptor, since Emacs will use its internal
+ ;; data buffer directly and this multibyte representation
+ ;; corrupts binary image formats.  Since the caller is
+ ;; asking for binary data, a unibyte string is a more
+ ;; appropriate representation anyway.
+ (set-buffer-multibyte nil)
+ (let ((coding-system-for-read 'no-conversion))
+   (apply #'call-process notmuch-command nil '(t nil) nil args)
+   (setq data (buffer-string)
+  (when cache
+   ;; Cheat.  part is non-nil, and `plist-put' always modifies
+   ;; the list in place if it's non-nil.
+   (plist-put part :binary-content data)))
+data))
+
+(defun notmuch-get-bodypart-text (msg part process-crypto  cache)
   "Return the text content of PART in MSG.

 This returns the content of the given part as a multibyte Lisp
 string after performing content transfer decoding and any
 necessary charset decoding.  It is an error to use this for
-non-text/* parts."
+non-text/* parts.
+
+If CACHE is non-nil, the content of this part will be saved in
+MSG (if it isn't already)."
   (let ((content (plist-get part :content)))
 (when (not content)
   ;; Use show --format=sexp to fetch decoded content
@@ -572,7 +586,9 @@ (defun notmuch-get-bodypart-text (msg part process-crypto)
 (npart (apply #'notmuch-call-notmuch-sexp args)))
(setq content (plist-get npart :content))
(when (not content)
- (error "Internal error: No :content from %S" args
+ (error "Internal error: No :content from %S" args)))
+  (when cache
+   (plist-put part :content content)))
 content))

 ;; Workaround: The call to `mm-display-part' below triggers a bug in
-- 
2.1.3



[PATCH v2 4/8] emacs: Return unibyte strings for binary part data

2015-01-24 Thread Austin Clements
Unibyte strings are meant for representing binary data.  In practice,
using unibyte versus multibyte strings affects *almost* nothing.  It
does happen to matter if we use the binary data in an image descriptor
(which is, helpfully, not documented anywhere and getting it wrong
results in opaque errors like "Not a PNG image: ").
---
 emacs/notmuch-lib.el | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index a0a95d8..3154725 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -530,7 +530,7 @@ (defun notmuch-parts-filter-by-type (parts type)
parts))

 (defun notmuch-get-bodypart-binary (msg part process-crypto)
-  "Return the unprocessed content of PART in MSG.
+  "Return the unprocessed content of PART in MSG as a unibyte string.

 This returns the \"raw\" content of the given part after content
 transfer decoding, but with no further processing (see the
@@ -541,6 +541,16 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto)
,@(when process-crypto '("--decrypt"))
,(notmuch-id-to-query (plist-get msg :id)
 (with-temp-buffer
+  ;; Emacs internally uses a UTF-8-like multibyte string
+  ;; representation by default (regardless of the coding system,
+  ;; which only affects how it goes from outside data to this
+  ;; internal representation).  This *almost* never matters.
+  ;; Annoyingly, it does matter if we use this data in an image
+  ;; descriptor, since Emacs will use its internal data buffer
+  ;; directly and this multibyte representation corrupts binary
+  ;; image formats.  Since the caller is asking for binary data, a
+  ;; unibyte string is a more appropriate representation anyway.
+  (set-buffer-multibyte nil)
   (let ((coding-system-for-read 'no-conversion))
(apply #'call-process notmuch-command nil '(t nil) nil args)
(buffer-string)
-- 
2.1.3



[PATCH v2 2/8] emacs: Create an API for fetching parts as undecoded binary

2015-01-24 Thread Austin Clements
The new function, `notmuch-get-bodypart-binary', replaces
`notmuch-get-bodypart-internal'.  Whereas the old function was really
meant for internal use in `notmuch-get-bodypart-content', it was used
in a few other places.  Since the difference between
`notmuch-get-bodypart-content' and `notmuch-get-bodypart-internal' was
unclear, these other uses were always confusing and potentially
inconsistent.  The new call clearly requests the part as undecoded
binary.

This is step 1 of 2 in separating `notmuch-get-bodypart-content' into
two APIs for retrieving either undecoded binary or decoded text.
---
 emacs/notmuch-lib.el  | 28 ++--
 emacs/notmuch-show.el | 22 +-
 2 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index fd25f7c..d4b6684 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -529,25 +529,25 @@ (defun notmuch-parts-filter-by-type (parts type)
(lambda (part) (notmuch-match-content-type (plist-get part :content-type) 
type))
parts))

-;; Helper for parts which are generally not included in the default
-;; SEXP output.
-(defun notmuch-get-bodypart-internal (query part-number process-crypto)
-  (let ((args '("show" "--format=raw"))
-   (part-arg (format "--part=%s" part-number)))
-(setq args (append args (list part-arg)))
-(if process-crypto
-   (setq args (append args '("--decrypt"
-(setq args (append args (list query)))
+(defun notmuch-get-bodypart-binary (msg part process-crypto)
+  "Return the unprocessed content of PART in MSG.
+
+This returns the \"raw\" content of the given part after content
+transfer decoding, but with no further processing (see the
+discussion of --format=raw in man notmuch-show).  In particular,
+this does no charset conversion."
+  (let ((args `("show" "--format=raw"
+   ,(format "--part=%d" (plist-get part :id))
+   ,@(when process-crypto '("--decrypt"))
+   ,(notmuch-id-to-query (plist-get msg :id)
 (with-temp-buffer
   (let ((coding-system-for-read 'no-conversion))
-   (progn
- (apply 'call-process (append (list notmuch-command nil (list t nil) 
nil) args))
- (buffer-string))
+   (apply #'call-process notmuch-command nil '(t nil) nil args)
+   (buffer-string)

 (defun notmuch-get-bodypart-content (msg part process-crypto)
   (or (plist-get part :content)
-  (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id))
-(plist-get part :id) process-crypto)))
+  (notmuch-get-bodypart-binary msg part process-crypto)))

 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index df2389e..b3e339e 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -579,16 +579,14 @@ (defun notmuch-show-w3m-cid-retrieve (url  args)
(let* ((msg (nth 1 matching-part))
   (part (nth 2 matching-part))
   (content (nth 3 matching-part))
-  (message-id (plist-get msg :id))
-  (part-number (plist-get part :id))
   (content-type (plist-get part :content-type)))
  ;; If we don't already have the content, get it and cache
  ;; it, as some messages reference the same cid: part many
  ;; times (hundreds!), which results in many calls to
  ;; `notmuch part'.
  (unless content
-   (setq content (notmuch-get-bodypart-internal (notmuch-id-to-query 
message-id)
- part-number 
notmuch-show-process-crypto))
+   (setq content (notmuch-get-bodypart-binary
+  msg part notmuch-show-process-crypto))
(with-current-buffer w3m-current-buffer
  (notmuch-show-w3m-cid-store-internal url msg part content)))
  (insert content)
@@ -2162,15 +2160,14 @@ (defun notmuch-show-stash-git-send-email ( 
no-in-reply-to)

 ;; Interactive part functions and their helpers

-(defun notmuch-show-generate-part-buffer (message-id nth)
+(defun notmuch-show-generate-part-buffer (msg part)
   "Return a temporary buffer containing the specified part's content."
   (let ((buf (generate-new-buffer " *notmuch-part*"))
(process-crypto notmuch-show-process-crypto))
 (with-current-buffer buf
-  (setq notmuch-show-process-crypto process-crypto)
-  ;; Always acquires the part via `notmuch part', even if it is
-  ;; available in the SEXP output.
-  (insert (notmuch-get-bodypart-internal message-id nth 
notmuch-show-process-crypto)))
+  ;; This is always used in the content of mm handles, which
+  ;; expect undecoded, binary part content.
+  (insert (notmuch-get-bodypart-binary msg part process-crypto)))
 buf))

 (defun 

[PATCH v2 1/8] emacs: Track full message and part descriptor in w3m CID store

2015-01-24 Thread Austin Clements
This will simplify later changes.
---
 emacs/notmuch-show.el | 33 ++---
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 87b4881..df2389e 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -562,35 +562,26 @@ (defvar w3m-current-buffer) ;; From `w3m.el'.
 (defvar notmuch-show-w3m-cid-store nil)
 (make-variable-buffer-local 'notmuch-show-w3m-cid-store)

-(defun notmuch-show-w3m-cid-store-internal (content-id
-   message-id
-   part-number
-   content-type
-   content)
-  (push (list content-id
- message-id
- part-number
- content-type
- content)
+(defun notmuch-show-w3m-cid-store-internal (content-id msg part content)
+  (push (list content-id msg part content)
notmuch-show-w3m-cid-store))

 (defun notmuch-show-w3m-cid-store (msg part)
   (let ((content-id (plist-get part :content-id)))
 (when content-id
   (notmuch-show-w3m-cid-store-internal (concat "cid:" content-id)
-  (plist-get msg :id)
-  (plist-get part :id)
-  (plist-get part :content-type)
-  nil
+  msg part nil

 (defun notmuch-show-w3m-cid-retrieve (url  args)
   (let ((matching-part (with-current-buffer w3m-current-buffer
 (assoc url notmuch-show-w3m-cid-store
 (if matching-part
-   (let ((message-id (nth 1 matching-part))
- (part-number (nth 2 matching-part))
- (content-type (nth 3 matching-part))
- (content (nth 4 matching-part)))
+   (let* ((msg (nth 1 matching-part))
+  (part (nth 2 matching-part))
+  (content (nth 3 matching-part))
+  (message-id (plist-get msg :id))
+  (part-number (plist-get part :id))
+  (content-type (plist-get part :content-type)))
  ;; If we don't already have the content, get it and cache
  ;; it, as some messages reference the same cid: part many
  ;; times (hundreds!), which results in many calls to
@@ -599,11 +590,7 @@ (defun notmuch-show-w3m-cid-retrieve (url  args)
(setq content (notmuch-get-bodypart-internal (notmuch-id-to-query 
message-id)
  part-number 
notmuch-show-process-crypto))
(with-current-buffer w3m-current-buffer
- (notmuch-show-w3m-cid-store-internal url
-  message-id
-  part-number
-  content-type
-  content)))
+ (notmuch-show-w3m-cid-store-internal url msg part content)))
  (insert content)
  content-type)
   nil)))
-- 
2.1.3



[PATCH v2 0/8] Improve charset and cid: handling

2015-01-24 Thread Austin Clements
This is v2 of id:1398105468-14317-1-git-send-email-amdragon at mit.edu.
This improves some comments/documentation, fixes a bug that caused
cryptographic processing to not happen on HTML parts, and addresses
some byte compiler warnings on Emacs 23.  This version has also been
rebased against the several months of changes that happened on master
since v1 (which, remarkably, was trivial).

The diff from v1 is below.

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 83cbf2f..f8e5165 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -535,7 +535,10 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto  cache)
 This returns the \"raw\" content of the given part after content
 transfer decoding, but with no further processing (see the
 discussion of --format=raw in man notmuch-show).  In particular,
-this does no charset conversion."
+this does no charset conversion.
+
+If CACHE is non-nil, the content of this part will be saved in
+MSG (if it isn't already)."
   (let ((data (plist-get part :binary-content)))
 (when (not data)
   (let ((args `("show" "--format=raw"
@@ -558,6 +561,8 @@ (defun notmuch-get-bodypart-binary (msg part process-crypto 
 cache)
(apply #'call-process notmuch-command nil '(t nil) nil args)
(setq data (buffer-string)
   (when cache
+   ;; Cheat.  part is non-nil, and `plist-put' always modifies
+   ;; the list in place if it's non-nil.
(plist-put part :binary-content data)))
 data))

@@ -567,7 +572,10 @@ (defun notmuch-get-bodypart-text (msg part process-crypto 
 cache)
 This returns the content of the given part as a multibyte Lisp
 string after performing content transfer decoding and any
 necessary charset decoding.  It is an error to use this for
-non-text/* parts."
+non-text/* parts.
+
+If CACHE is non-nil, the content of this part will be saved in
+MSG (if it isn't already)."
   (let ((content (plist-get part :content)))
 (when (not content)
   ;; Use show --format=sexp to fetch decoded content
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index d190711..66350d4 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -781,10 +781,14 @@ (defun notmuch-show-insert-part-text/html (msg part 
content-type nth depth butto
 (let ((mm-inline-text-html-with-w3m-keymap nil))
   (notmuch-show-insert-part-*/* msg part content-type nth depth button

+;; These functions are used by notmuch-show--insert-part-text/html-shr
+(declare-function libxml-parse-html-region "xml.c")
+(declare-function shr-insert-document "shr")
+
 (defun notmuch-show--insert-part-text/html-shr (msg part)
   ;; Make sure shr is loaded before we start let-binding its globals
   (require 'shr)
-  (let ((dom (let (process-crypto notmuch-show-process-crypto)
+  (let ((dom (let ((process-crypto notmuch-show-process-crypto))
   (with-temp-buffer
 (insert (notmuch-get-bodypart-text msg part process-crypto))
 (libxml-parse-html-region (point-min) (point-max)




[PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}

2015-01-24 Thread Austin Clements
On Fri, 25 Apr 2014, Mark Walters  wrote:
> On Thu, 24 Apr 2014, Austin Clements  wrote:
>> Quoth Mark Walters on Apr 24 at 11:46 am:
>>> 
>>> On Mon, 21 Apr 2014, Austin Clements  wrote:
>>> > (The actual code change here is small, but requires re-indenting
>>> > existing code.)
>>> > ---
>>> >  emacs/notmuch-lib.el | 52 
>>> > ++--
>>> >  1 file changed, 30 insertions(+), 22 deletions(-)
>>> >
>>> > diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
>>> > index fc67b14..fee8512 100644
>>> > --- a/emacs/notmuch-lib.el
>>> > +++ b/emacs/notmuch-lib.el
>>> > @@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type)
>>> > (lambda (part) (notmuch-match-content-type (plist-get part 
>>> > :content-type) type))
>>> > parts))
>>> >  
>>> > -(defun notmuch-get-bodypart-binary (msg part process-crypto)
>>> > +(defun notmuch-get-bodypart-binary (msg part process-crypto  
>>> > cache)
>>> >"Return the unprocessed content of PART in MSG as a unibyte string.
>>> >  
>>> >  This returns the \"raw\" content of the given part after content
>>> >  transfer decoding, but with no further processing (see the
>>> >  discussion of --format=raw in man notmuch-show).  In particular,
>>> >  this does no charset conversion."
>>> > -  (let ((args `("show" "--format=raw"
>>> > - ,(format "--part=%d" (plist-get part :id))
>>> > - ,@(when process-crypto '("--decrypt"))
>>> > - ,(notmuch-id-to-query (plist-get msg :id)
>>> > -(with-temp-buffer
>>> > -  ;; Emacs internally uses a UTF-8-like multibyte string
>>> > -  ;; representation by default (regardless of the coding system,
>>> > -  ;; which only affects how it goes from outside data to this
>>> > -  ;; internal representation).  This *almost* never matters.
>>> > -  ;; Annoyingly, it does matter if we use this data in an image
>>> > -  ;; descriptor, since Emacs will use its internal data buffer
>>> > -  ;; directly and this multibyte representation corrupts binary
>>> > -  ;; image formats.  Since the caller is asking for binary data, a
>>> > -  ;; unibyte string is a more appropriate representation anyway.
>>> > -  (set-buffer-multibyte nil)
>>> > -  (let ((coding-system-for-read 'no-conversion))
>>> > - (apply #'call-process notmuch-command nil '(t nil) nil args)
>>> > - (buffer-string)
>>> > -
>>> > -(defun notmuch-get-bodypart-text (msg part process-crypto)
>>> > +  (let ((data (plist-get part :binary-content)))
>>> > +(when (not data)
>>> > +  (let ((args `("show" "--format=raw"
>>> > + ,(format "--part=%d" (plist-get part :id))
>>> > + ,@(when process-crypto '("--decrypt"))
>>> > + ,(notmuch-id-to-query (plist-get msg :id)
>>> > + (with-temp-buffer
>>> > +   ;; Emacs internally uses a UTF-8-like multibyte string
>>> > +   ;; representation by default (regardless of the coding
>>> > +   ;; system, which only affects how it goes from outside data
>>> > +   ;; to this internal representation).  This *almost* never
>>> > +   ;; matters.  Annoyingly, it does matter if we use this data
>>> > +   ;; in an image descriptor, since Emacs will use its internal
>>> > +   ;; data buffer directly and this multibyte representation
>>> > +   ;; corrupts binary image formats.  Since the caller is
>>> > +   ;; asking for binary data, a unibyte string is a more
>>> > +   ;; appropriate representation anyway.
>>> > +   (set-buffer-multibyte nil)
>>> > +   (let ((coding-system-for-read 'no-conversion))
>>> > + (apply #'call-process notmuch-command nil '(t nil) nil args)
>>> > + (setq data (buffer-string)
>>> > +  (when cache
>>> > + (plist-put part :binary-content data)))
>>> > +data))
>>> 
>>> I am a little puzzled by this but that could be lack of familiarity with
>>> elisp. As far as I can see plist-put will sometimes modify the original
>>> plist and sometimes return a new plist. If the latt

[PATCH 00/11] Improve charset and cid: handling

2015-01-24 Thread Austin Clements
I added declare-functions for both of these, which should take care of
the Emacs 23 warnings and be more robust on Emacs 24.  We can't reach
the function that calls these unless shr is actually available, but the
byte compiler doesn't know that.

On Sat, 26 Apr 2014, Mark Walters  wrote:
> Aside from the minor comments I mentioned in previous emails and one
> more comment below this looks good. 
>
> The extra comment is that on emacs23 I get the following when compiling:
>
> In end of data:
> notmuch-show.el:2188:1:Warning: the following functions are not known to be
> defined: libxml-parse-html-region, shr-insert-document
>
> Finally, I have not really tested it as I mainly use emacs23
>
> Best wishes
>
> Mark
>
>
>
>
> On Mon, 21 Apr 2014, Austin Clements  wrote:
>> I set out to quickly add support for cid: links in the shr renderer
>> and wound up making our charset handling more robust and rewriting our
>> content-ID handling.  The test introduced in patch 2 passes in all but
>> one really obscure case, but only because of many unwritten and
>> potentially fragile assumptions that Emacs and the CLI make about each
>> other.
>>
>> The first three patches could reasonably go in to 0.18.  The rest of
>> this series is certainly post-0.18, but I didn't want to lose track of
>> it.
>>
>> This series comes in three stages.  Each depends on the earlier ones,
>> but each prefix makes sense on its own and could be pushed without the
>> later stages.
>>
>> Patch 1 is a simple clean up patch.
>>
>> Patches 2 through 7 robust-ify our charset handling in Emacs, mostly
>> by splitting the broken `notmuch-get-bodypart-content' API into
>> `notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text' so a
>> caller can explicitly convey their requirements.
>>
>> The remaining patches improve our content-ID handling and add support
>> for cid: links for shr.
>>
>> ___
>> notmuch mailing list
>> notmuch at notmuchmail.org
>> http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 11/11] emacs: Support cid: references with shr renderer

2015-01-24 Thread Austin Clements
On Thu, 01 May 2014, David Edmondson  wrote:
> On Mon, Apr 21 2014, Austin Clements wrote:
>> +(defun notmuch-show--insert-part-text/html-shr (msg part)
>> +  ;; Make sure shr is loaded before we start let-binding its globals
>> +  (require 'shr)
>> +  (let ((dom (let (process-crypto notmuch-show-process-crypto)
>
> Missing brackets? (You let-bind both `process-crypto' and
> `notmuch-show-process-crypto' as nil.)

'Doh.  Thanks.


[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API

2015-01-24 Thread Austin Clements
On Fri, 11 Jul 2014, David Bremner  wrote:
> Austin Clements  writes:
>
>> +This returns the content of the given part as a multibyte Lisp
>
> What does "multibyte" mean here? utf8? current encoding?

Elisp has two kinds of stings: "unibyte strings" and "multibyte
strings".

  
https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html

You can think of unibyte strings as binary data; they're just vectors of
bytes without any particular encoding semantics (though when you use a
unibyte string you can endow it with encoding).  Multibyte strings,
however, are text; they're vectors of Unicode code points.

>> +string after performing content transfer decoding and any
>> +necessary charset decoding.  It is an error to use this for
>> +non-text/* parts."
>> +  (let ((content (plist-get part :content)))
>> +(when (not content)
>> +  ;; Use show --format=sexp to fetch decoded content
>> +  (let* ((args `("show" "--format=sexp" "--include-html"
>> + ,(format "--part=%s" (plist-get part :id))
>> + ,@(when process-crypto '("--decrypt"))
>> + ,(notmuch-id-to-query (plist-get msg :id
>> + (npart (apply #'notmuch-call-notmuch-sexp args)))
>> +(setq content (plist-get npart :content))
>> +(when (not content)
>> +  (error "Internal error: No :content from %S" args
>> +content))
>
> I'm a bit curious at the lack of setting "coding-system-for-read" here.
> Are we assuming the user has their environment set up correctly? Not so
> much a criticism as being nervous about everything coding-system
> related.

That is interesting.  coding-system-for-read should really go in
notmuch-call-notmuch-sexp, but I worry that, while *almost* all strings
the CLI outputs are UTF-8, not quite all of them are.  For example, we
output filenames exactly at the OS reports the bytes to us (which is
necessary, in a sense, because POSIX enforces no particular encoding on
file names, but still really unfortunate).

We could set coding-system-for-read, but a full solution needs more
cooperation from the CLI.  Possibly the right answer, at least for the
sexp format, is to do our own UTF-8 to "\u" escapes for strings that
are known to be UTF-8 and leave the raw bytes for the few that aren't.
Then we would set the coding-system-for-read to 'no-conversion and I
think everything would Just Work.

That doesn't help for JSON, which is supposed to be all UTF-8 all the
time.  I can think of solutions there, but they're all ugly and involve
things like encoding filenames as base64 when they aren't valid UTF-8.

So...  I don't think I'm going to do anything about this at this moment.

> I didn't see anything else to object to in this patch or the previous
> one.


[PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store

2015-01-24 Thread Austin Clements
On Thu, 10 Jul 2014, David Bremner  wrote:
> Austin Clements  writes:
>
>> This will simplify later changes.
>
> I'd have preferred the whitespace changes as a seperate patch, but OK.

Not sure which whitespace changes you're referring to.  Everything in
this diff is actual (minor) code changes.


Re: [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}

2015-01-24 Thread Austin Clements
On Fri, 25 Apr 2014, Mark Walters markwalters1...@gmail.com wrote:
 On Thu, 24 Apr 2014, Austin Clements amdra...@mit.edu wrote:
 Quoth Mark Walters on Apr 24 at 11:46 am:
 
 On Mon, 21 Apr 2014, Austin Clements amdra...@mit.edu wrote:
  (The actual code change here is small, but requires re-indenting
  existing code.)
  ---
   emacs/notmuch-lib.el | 52 
  ++--
   1 file changed, 30 insertions(+), 22 deletions(-)
 
  diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
  index fc67b14..fee8512 100644
  --- a/emacs/notmuch-lib.el
  +++ b/emacs/notmuch-lib.el
  @@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type)
  (lambda (part) (notmuch-match-content-type (plist-get part 
  :content-type) type))
  parts))
   
  -(defun notmuch-get-bodypart-binary (msg part process-crypto)
  +(defun notmuch-get-bodypart-binary (msg part process-crypto optional 
  cache)
 Return the unprocessed content of PART in MSG as a unibyte string.
   
   This returns the \raw\ content of the given part after content
   transfer decoding, but with no further processing (see the
   discussion of --format=raw in man notmuch-show).  In particular,
   this does no charset conversion.
  -  (let ((args `(show --format=raw
  - ,(format --part=%d (plist-get part :id))
  - ,@(when process-crypto '(--decrypt))
  - ,(notmuch-id-to-query (plist-get msg :id)
  -(with-temp-buffer
  -  ;; Emacs internally uses a UTF-8-like multibyte string
  -  ;; representation by default (regardless of the coding system,
  -  ;; which only affects how it goes from outside data to this
  -  ;; internal representation).  This *almost* never matters.
  -  ;; Annoyingly, it does matter if we use this data in an image
  -  ;; descriptor, since Emacs will use its internal data buffer
  -  ;; directly and this multibyte representation corrupts binary
  -  ;; image formats.  Since the caller is asking for binary data, a
  -  ;; unibyte string is a more appropriate representation anyway.
  -  (set-buffer-multibyte nil)
  -  (let ((coding-system-for-read 'no-conversion))
  - (apply #'call-process notmuch-command nil '(t nil) nil args)
  - (buffer-string)
  -
  -(defun notmuch-get-bodypart-text (msg part process-crypto)
  +  (let ((data (plist-get part :binary-content)))
  +(when (not data)
  +  (let ((args `(show --format=raw
  + ,(format --part=%d (plist-get part :id))
  + ,@(when process-crypto '(--decrypt))
  + ,(notmuch-id-to-query (plist-get msg :id)
  + (with-temp-buffer
  +   ;; Emacs internally uses a UTF-8-like multibyte string
  +   ;; representation by default (regardless of the coding
  +   ;; system, which only affects how it goes from outside data
  +   ;; to this internal representation).  This *almost* never
  +   ;; matters.  Annoyingly, it does matter if we use this data
  +   ;; in an image descriptor, since Emacs will use its internal
  +   ;; data buffer directly and this multibyte representation
  +   ;; corrupts binary image formats.  Since the caller is
  +   ;; asking for binary data, a unibyte string is a more
  +   ;; appropriate representation anyway.
  +   (set-buffer-multibyte nil)
  +   (let ((coding-system-for-read 'no-conversion))
  + (apply #'call-process notmuch-command nil '(t nil) nil args)
  + (setq data (buffer-string)
  +  (when cache
  + (plist-put part :binary-content data)))
  +data))
 
 I am a little puzzled by this but that could be lack of familiarity with
 elisp. As far as I can see plist-put will sometimes modify the original
 plist and sometimes return a new plist. If the latter happens then I
 think it works out as if we hadn't cached anything as the part passed to
 the function is unmodified. That might not matter in this case (though I
 find the lack of determinism disturbing).
 
 Or is something else going on?

 No, your familiarity with Elisp serves you well.  I'm completely
 cheating here.  According to the specification of plist-put, it's
 allowed to return a new list but in reality this only happens when the
 original plist is nil.  We lean on this already all over the
 notmuch-emacs code, but maybe that doesn't excuse me adding one more
 cheat.

 I could add a comment here explaining what's going on, I could
 manually do the list insertion in a way that's guaranteed to mutate it
 in place, or I could add a nil :binary-content property when parts are
 created (since plist-put is guaranteed to mutate existing keys in
 place).

 I think a comment is fine. 

Done.

 (Incidentally what is the best way of telling if emacs has changed an
 object or returned a new one for other commands? Something like (setq
 oldobject object) (setq object (operation-on object)) (if (eq object
 oldobject) ... ))

If `eq' returns t, it definitely returned the original object, though it
may or may not have modified

Re: [PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store

2015-01-24 Thread Austin Clements
On Thu, 10 Jul 2014, David Bremner da...@tethera.net wrote:
 Austin Clements amdra...@mit.edu writes:

 This will simplify later changes.

 I'd have preferred the whitespace changes as a seperate patch, but OK.

Not sure which whitespace changes you're referring to.  Everything in
this diff is actual (minor) code changes.
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 11/11] emacs: Support cid: references with shr renderer

2015-01-24 Thread Austin Clements
On Thu, 01 May 2014, David Edmondson d...@dme.org wrote:
 On Mon, Apr 21 2014, Austin Clements wrote:
 +(defun notmuch-show--insert-part-text/html-shr (msg part)
 +  ;; Make sure shr is loaded before we start let-binding its globals
 +  (require 'shr)
 +  (let ((dom (let (process-crypto notmuch-show-process-crypto)

 Missing brackets? (You let-bind both `process-crypto' and
 `notmuch-show-process-crypto' as nil.)

'Doh.  Thanks.
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v2 8/8] emacs: Support cid: references with shr renderer

2015-01-24 Thread Austin Clements
shr has really nice support for inline image rendering, but previously
we only had the hooks for w3m cid: references.
---
 emacs/notmuch-show.el | 45 +
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 34dcedd..66350d4 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -767,14 +767,43 @@ (defun 
notmuch-show-get-mime-type-of-application/octet-stream (part)
  nil
 
 (defun notmuch-show-insert-part-text/html (msg part content-type nth depth 
button)
-  ;; text/html handler to work around bugs in renderers and our
-  ;; invisibile parts code. In particular w3m sets up a keymap which
-  ;; leaks outside the invisible region and causes strange effects
-  ;; in notmuch. We set mm-inline-text-html-with-w3m-keymap to nil to
-  ;; tell w3m not to set a keymap (so the normal notmuch-show-mode-map
-  ;; remains).
-  (let ((mm-inline-text-html-with-w3m-keymap nil))
-(notmuch-show-insert-part-*/* msg part content-type nth depth button)))
+  (if (eq mm-text-html-renderer 'shr)
+  ;; It's easier to drive shr ourselves than to work around the
+  ;; goofy things `mm-shr' does (like irreversibly taking over
+  ;; content ID handling).
+  (notmuch-show--insert-part-text/html-shr msg part)
+;; Otherwise, let message-mode do the heavy lifting
+;;
+;; w3m sets up a keymap which leaks outside the invisible region
+;; and causes strange effects in notmuch. We set
+;; mm-inline-text-html-with-w3m-keymap to nil to tell w3m not to
+;; set a keymap (so the normal notmuch-show-mode-map remains).
+(let ((mm-inline-text-html-with-w3m-keymap nil))
+  (notmuch-show-insert-part-*/* msg part content-type nth depth button
+
+;; These functions are used by notmuch-show--insert-part-text/html-shr
+(declare-function libxml-parse-html-region xml.c)
+(declare-function shr-insert-document shr)
+
+(defun notmuch-show--insert-part-text/html-shr (msg part)
+  ;; Make sure shr is loaded before we start let-binding its globals
+  (require 'shr)
+  (let ((dom (let ((process-crypto notmuch-show-process-crypto))
+  (with-temp-buffer
+(insert (notmuch-get-bodypart-text msg part process-crypto))
+(libxml-parse-html-region (point-min) (point-max)
+   (shr-content-function
+(lambda (url)
+  ;; shr strips the cid: part of URL, but doesn't
+  ;; URL-decode it (see RFC 2392).
+  (let ((cid (url-unhex-string url)))
+(first (notmuch-show--get-cid-content cid)
+   ;; Block all external images to prevent privacy leaks and
+   ;; potential attacks.  FIXME: If we block an image, offer a
+   ;; button to load external images.
+   (shr-blocked-images .))
+(shr-insert-document dom)
+t))
 
 (defun notmuch-show-insert-part-*/* (msg part content-type nth depth button)
   ;; This handler _must_ succeed - it is the handler of last resort.
-- 
2.1.3

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v2 2/8] emacs: Create an API for fetching parts as undecoded binary

2015-01-24 Thread Austin Clements
The new function, `notmuch-get-bodypart-binary', replaces
`notmuch-get-bodypart-internal'.  Whereas the old function was really
meant for internal use in `notmuch-get-bodypart-content', it was used
in a few other places.  Since the difference between
`notmuch-get-bodypart-content' and `notmuch-get-bodypart-internal' was
unclear, these other uses were always confusing and potentially
inconsistent.  The new call clearly requests the part as undecoded
binary.

This is step 1 of 2 in separating `notmuch-get-bodypart-content' into
two APIs for retrieving either undecoded binary or decoded text.
---
 emacs/notmuch-lib.el  | 28 ++--
 emacs/notmuch-show.el | 22 +-
 2 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index fd25f7c..d4b6684 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -529,25 +529,25 @@ (defun notmuch-parts-filter-by-type (parts type)
(lambda (part) (notmuch-match-content-type (plist-get part :content-type) 
type))
parts))
 
-;; Helper for parts which are generally not included in the default
-;; SEXP output.
-(defun notmuch-get-bodypart-internal (query part-number process-crypto)
-  (let ((args '(show --format=raw))
-   (part-arg (format --part=%s part-number)))
-(setq args (append args (list part-arg)))
-(if process-crypto
-   (setq args (append args '(--decrypt
-(setq args (append args (list query)))
+(defun notmuch-get-bodypart-binary (msg part process-crypto)
+  Return the unprocessed content of PART in MSG.
+
+This returns the \raw\ content of the given part after content
+transfer decoding, but with no further processing (see the
+discussion of --format=raw in man notmuch-show).  In particular,
+this does no charset conversion.
+  (let ((args `(show --format=raw
+   ,(format --part=%d (plist-get part :id))
+   ,@(when process-crypto '(--decrypt))
+   ,(notmuch-id-to-query (plist-get msg :id)
 (with-temp-buffer
   (let ((coding-system-for-read 'no-conversion))
-   (progn
- (apply 'call-process (append (list notmuch-command nil (list t nil) 
nil) args))
- (buffer-string))
+   (apply #'call-process notmuch-command nil '(t nil) nil args)
+   (buffer-string)
 
 (defun notmuch-get-bodypart-content (msg part process-crypto)
   (or (plist-get part :content)
-  (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id))
-(plist-get part :id) process-crypto)))
+  (notmuch-get-bodypart-binary msg part process-crypto)))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index df2389e..b3e339e 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -579,16 +579,14 @@ (defun notmuch-show-w3m-cid-retrieve (url rest args)
(let* ((msg (nth 1 matching-part))
   (part (nth 2 matching-part))
   (content (nth 3 matching-part))
-  (message-id (plist-get msg :id))
-  (part-number (plist-get part :id))
   (content-type (plist-get part :content-type)))
  ;; If we don't already have the content, get it and cache
  ;; it, as some messages reference the same cid: part many
  ;; times (hundreds!), which results in many calls to
  ;; `notmuch part'.
  (unless content
-   (setq content (notmuch-get-bodypart-internal (notmuch-id-to-query 
message-id)
- part-number 
notmuch-show-process-crypto))
+   (setq content (notmuch-get-bodypart-binary
+  msg part notmuch-show-process-crypto))
(with-current-buffer w3m-current-buffer
  (notmuch-show-w3m-cid-store-internal url msg part content)))
  (insert content)
@@ -2162,15 +2160,14 @@ (defun notmuch-show-stash-git-send-email (optional 
no-in-reply-to)
 
 ;; Interactive part functions and their helpers
 
-(defun notmuch-show-generate-part-buffer (message-id nth)
+(defun notmuch-show-generate-part-buffer (msg part)
   Return a temporary buffer containing the specified part's content.
   (let ((buf (generate-new-buffer  *notmuch-part*))
(process-crypto notmuch-show-process-crypto))
 (with-current-buffer buf
-  (setq notmuch-show-process-crypto process-crypto)
-  ;; Always acquires the part via `notmuch part', even if it is
-  ;; available in the SEXP output.
-  (insert (notmuch-get-bodypart-internal message-id nth 
notmuch-show-process-crypto)))
+  ;; This is always used in the content of mm handles, which
+  ;; expect undecoded, binary part content.
+  (insert (notmuch-get-bodypart-binary msg part process-crypto)))
 buf))
 
 (defun 

[PATCH v2 3/8] emacs: Remove broken `notmuch-get-bodypart-content' API

2015-01-24 Thread Austin Clements
`notmuch-get-bodypart-content' could do two very different things,
depending on conditions: for text/* parts other than text/html, it
would return the part content as a multibyte Lisp string *after*
charset conversion, while for other parts (including text/html), it
would return binary part content without charset conversion.

This commit completes the split of `notmuch-get-bodypart-content' into
two different and explicit APIs: `notmuch-get-bodypart-binary' and
`notmuch-get-bodypart-text'.  It updates all callers to use one or the
other depending on what's appropriate.
---
 emacs/notmuch-lib.el  | 37 -
 emacs/notmuch-show.el |  8 
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index d4b6684..a0a95d8 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -545,9 +545,25 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto)
(apply #'call-process notmuch-command nil '(t nil) nil args)
(buffer-string)
 
-(defun notmuch-get-bodypart-content (msg part process-crypto)
-  (or (plist-get part :content)
-  (notmuch-get-bodypart-binary msg part process-crypto)))
+(defun notmuch-get-bodypart-text (msg part process-crypto)
+  Return the text content of PART in MSG.
+
+This returns the content of the given part as a multibyte Lisp
+string after performing content transfer decoding and any
+necessary charset decoding.  It is an error to use this for
+non-text/* parts.
+  (let ((content (plist-get part :content)))
+(when (not content)
+  ;; Use show --format=sexp to fetch decoded content
+  (let* ((args `(show --format=sexp --include-html
+,(format --part=%s (plist-get part :id))
+,@(when process-crypto '(--decrypt))
+,(notmuch-id-to-query (plist-get msg :id
+(npart (apply #'notmuch-call-notmuch-sexp args)))
+   (setq content (plist-get npart :content))
+   (when (not content)
+ (error Internal error: No :content from %S args
+content))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
@@ -568,18 +584,21 @@ (defun notmuch-mm-display-part-inline (msg part 
content-type process-crypto)
 current buffer, if possible.
   (let ((display-buffer (current-buffer)))
 (with-temp-buffer
-  ;; In case there is :content, the content string is already converted
-  ;; into emacs internal format. `gnus-decoded' is a fake charset,
-  ;; which means no further decoding (to be done by mm- functions).
-  (let* ((charset (if (plist-member part :content)
- 'gnus-decoded
+  ;; In case we already have :content, use it and tell mm-* that
+  ;; it's already been charset-decoded by using the fake
+  ;; `gnus-decoded' charset.  Otherwise, we'll fetch the binary
+  ;; part content and let mm-* decode it.
+  (let* ((have-content (plist-member part :content))
+(charset (if have-content 'gnus-decoded
(plist-get part :content-charset)))
 (handle (mm-make-handle (current-buffer) `(,content-type (charset 
. ,charset)
;; If the user wants the part inlined, insert the content and
;; test whether we are able to inline it (which includes both
;; capability and suitability tests).
(when (mm-inlined-p handle)
- (insert (notmuch-get-bodypart-content msg part process-crypto))
+ (if have-content
+ (insert (notmuch-get-bodypart-text msg part process-crypto))
+   (insert (notmuch-get-bodypart-binary msg part process-crypto)))
  (when (mm-inlinable-p handle)
(set-buffer display-buffer)
(mm-display-part handle)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index b3e339e..f29428a 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -702,7 +702,7 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
   (let ((start (if button
   (button-start button)
 (point
-(insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
+(insert (notmuch-get-bodypart-text msg part notmuch-show-process-crypto))
 (save-excursion
   (save-restriction
(narrow-to-region start (point-max))
@@ -711,9 +711,9 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
 
 (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth 
button)
   (insert (with-temp-buffer
-   (insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
-   ;; notmuch-get-bodypart-content provides raw, non-converted
-   ;; data. Replace CRLF with LF before icalendar can use it.
+   (insert (notmuch-get-bodypart-text msg part 

[PATCH v2 4/8] emacs: Return unibyte strings for binary part data

2015-01-24 Thread Austin Clements
Unibyte strings are meant for representing binary data.  In practice,
using unibyte versus multibyte strings affects *almost* nothing.  It
does happen to matter if we use the binary data in an image descriptor
(which is, helpfully, not documented anywhere and getting it wrong
results in opaque errors like Not a PNG image: giant binary spew
that is, in fact, a PNG image).
---
 emacs/notmuch-lib.el | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index a0a95d8..3154725 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -530,7 +530,7 @@ (defun notmuch-parts-filter-by-type (parts type)
parts))
 
 (defun notmuch-get-bodypart-binary (msg part process-crypto)
-  Return the unprocessed content of PART in MSG.
+  Return the unprocessed content of PART in MSG as a unibyte string.
 
 This returns the \raw\ content of the given part after content
 transfer decoding, but with no further processing (see the
@@ -541,6 +541,16 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto)
,@(when process-crypto '(--decrypt))
,(notmuch-id-to-query (plist-get msg :id)
 (with-temp-buffer
+  ;; Emacs internally uses a UTF-8-like multibyte string
+  ;; representation by default (regardless of the coding system,
+  ;; which only affects how it goes from outside data to this
+  ;; internal representation).  This *almost* never matters.
+  ;; Annoyingly, it does matter if we use this data in an image
+  ;; descriptor, since Emacs will use its internal data buffer
+  ;; directly and this multibyte representation corrupts binary
+  ;; image formats.  Since the caller is asking for binary data, a
+  ;; unibyte string is a more appropriate representation anyway.
+  (set-buffer-multibyte nil)
   (let ((coding-system-for-read 'no-conversion))
(apply #'call-process notmuch-command nil '(t nil) nil args)
(buffer-string)
-- 
2.1.3

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v2 0/8] Improve charset and cid: handling

2015-01-24 Thread Austin Clements
This is v2 of id:1398105468-14317-1-git-send-email-amdra...@mit.edu.
This improves some comments/documentation, fixes a bug that caused
cryptographic processing to not happen on HTML parts, and addresses
some byte compiler warnings on Emacs 23.  This version has also been
rebased against the several months of changes that happened on master
since v1 (which, remarkably, was trivial).

The diff from v1 is below.

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 83cbf2f..f8e5165 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -535,7 +535,10 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto optional cache)
 This returns the \raw\ content of the given part after content
 transfer decoding, but with no further processing (see the
 discussion of --format=raw in man notmuch-show).  In particular,
-this does no charset conversion.
+this does no charset conversion.
+
+If CACHE is non-nil, the content of this part will be saved in
+MSG (if it isn't already).
   (let ((data (plist-get part :binary-content)))
 (when (not data)
   (let ((args `(show --format=raw
@@ -558,6 +561,8 @@ (defun notmuch-get-bodypart-binary (msg part process-crypto 
optional cache)
(apply #'call-process notmuch-command nil '(t nil) nil args)
(setq data (buffer-string)
   (when cache
+   ;; Cheat.  part is non-nil, and `plist-put' always modifies
+   ;; the list in place if it's non-nil.
(plist-put part :binary-content data)))
 data))
 
@@ -567,7 +572,10 @@ (defun notmuch-get-bodypart-text (msg part process-crypto 
optional cache)
 This returns the content of the given part as a multibyte Lisp
 string after performing content transfer decoding and any
 necessary charset decoding.  It is an error to use this for
-non-text/* parts.
+non-text/* parts.
+
+If CACHE is non-nil, the content of this part will be saved in
+MSG (if it isn't already).
   (let ((content (plist-get part :content)))
 (when (not content)
   ;; Use show --format=sexp to fetch decoded content
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index d190711..66350d4 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -781,10 +781,14 @@ (defun notmuch-show-insert-part-text/html (msg part 
content-type nth depth butto
 (let ((mm-inline-text-html-with-w3m-keymap nil))
   (notmuch-show-insert-part-*/* msg part content-type nth depth button
 
+;; These functions are used by notmuch-show--insert-part-text/html-shr
+(declare-function libxml-parse-html-region xml.c)
+(declare-function shr-insert-document shr)
+
 (defun notmuch-show--insert-part-text/html-shr (msg part)
   ;; Make sure shr is loaded before we start let-binding its globals
   (require 'shr)
-  (let ((dom (let (process-crypto notmuch-show-process-crypto)
+  (let ((dom (let ((process-crypto notmuch-show-process-crypto))
   (with-temp-buffer
 (insert (notmuch-get-bodypart-text msg part process-crypto))
 (libxml-parse-html-region (point-min) (point-max)


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v2 6/8] emacs: Use generalized content caching in w3m CID code

2015-01-24 Thread Austin Clements
Previously this did its own caching, but this is now supported by more
generally by `notmuch-get-bodypart-binary'.
---
 emacs/notmuch-show.el | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index f29428a..11eac5f 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -562,15 +562,14 @@ (defvar w3m-current-buffer) ;; From `w3m.el'.
 (defvar notmuch-show-w3m-cid-store nil)
 (make-variable-buffer-local 'notmuch-show-w3m-cid-store)
 
-(defun notmuch-show-w3m-cid-store-internal (content-id msg part content)
-  (push (list content-id msg part content)
-   notmuch-show-w3m-cid-store))
+(defun notmuch-show-w3m-cid-store-internal (content-id msg part)
+  (push (list content-id msg part) notmuch-show-w3m-cid-store))
 
 (defun notmuch-show-w3m-cid-store (msg part)
   (let ((content-id (plist-get part :content-id)))
 (when content-id
   (notmuch-show-w3m-cid-store-internal (concat cid: content-id)
-  msg part nil
+  msg part
 
 (defun notmuch-show-w3m-cid-retrieve (url rest args)
   (let ((matching-part (with-current-buffer w3m-current-buffer
@@ -578,18 +577,12 @@ (defun notmuch-show-w3m-cid-retrieve (url rest args)
 (if matching-part
(let* ((msg (nth 1 matching-part))
   (part (nth 2 matching-part))
-  (content (nth 3 matching-part))
   (content-type (plist-get part :content-type)))
- ;; If we don't already have the content, get it and cache
- ;; it, as some messages reference the same cid: part many
- ;; times (hundreds!), which results in many calls to
- ;; `notmuch part'.
- (unless content
-   (setq content (notmuch-get-bodypart-binary
-  msg part notmuch-show-process-crypto))
-   (with-current-buffer w3m-current-buffer
- (notmuch-show-w3m-cid-store-internal url msg part content)))
- (insert content)
+ ;; Request content caching, as some messages reference the
+ ;; same cid: part many times (hundreds!), which results in
+ ;; many calls to `notmuch show'.
+ (insert (notmuch-get-bodypart-binary
+  msg part notmuch-show-process-crypto 'cache))
  content-type)
   nil)))
 
-- 
2.1.3

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v2 1/8] emacs: Track full message and part descriptor in w3m CID store

2015-01-24 Thread Austin Clements
This will simplify later changes.
---
 emacs/notmuch-show.el | 33 ++---
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 87b4881..df2389e 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -562,35 +562,26 @@ (defvar w3m-current-buffer) ;; From `w3m.el'.
 (defvar notmuch-show-w3m-cid-store nil)
 (make-variable-buffer-local 'notmuch-show-w3m-cid-store)
 
-(defun notmuch-show-w3m-cid-store-internal (content-id
-   message-id
-   part-number
-   content-type
-   content)
-  (push (list content-id
- message-id
- part-number
- content-type
- content)
+(defun notmuch-show-w3m-cid-store-internal (content-id msg part content)
+  (push (list content-id msg part content)
notmuch-show-w3m-cid-store))
 
 (defun notmuch-show-w3m-cid-store (msg part)
   (let ((content-id (plist-get part :content-id)))
 (when content-id
   (notmuch-show-w3m-cid-store-internal (concat cid: content-id)
-  (plist-get msg :id)
-  (plist-get part :id)
-  (plist-get part :content-type)
-  nil
+  msg part nil
 
 (defun notmuch-show-w3m-cid-retrieve (url rest args)
   (let ((matching-part (with-current-buffer w3m-current-buffer
 (assoc url notmuch-show-w3m-cid-store
 (if matching-part
-   (let ((message-id (nth 1 matching-part))
- (part-number (nth 2 matching-part))
- (content-type (nth 3 matching-part))
- (content (nth 4 matching-part)))
+   (let* ((msg (nth 1 matching-part))
+  (part (nth 2 matching-part))
+  (content (nth 3 matching-part))
+  (message-id (plist-get msg :id))
+  (part-number (plist-get part :id))
+  (content-type (plist-get part :content-type)))
  ;; If we don't already have the content, get it and cache
  ;; it, as some messages reference the same cid: part many
  ;; times (hundreds!), which results in many calls to
@@ -599,11 +590,7 @@ (defun notmuch-show-w3m-cid-retrieve (url rest args)
(setq content (notmuch-get-bodypart-internal (notmuch-id-to-query 
message-id)
  part-number 
notmuch-show-process-crypto))
(with-current-buffer w3m-current-buffer
- (notmuch-show-w3m-cid-store-internal url
-  message-id
-  part-number
-  content-type
-  content)))
+ (notmuch-show-w3m-cid-store-internal url msg part content)))
  (insert content)
  content-type)
   nil)))
-- 
2.1.3

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v2 5/8] emacs: Support caching in notmuch-get-bodypart-{binary, text}

2015-01-24 Thread Austin Clements
(The actual code change here is small, but requires re-indenting
existing code.)
---
 emacs/notmuch-lib.el | 64 
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 3154725..f8e5165 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -529,39 +529,53 @@ (defun notmuch-parts-filter-by-type (parts type)
(lambda (part) (notmuch-match-content-type (plist-get part :content-type) 
type))
parts))
 
-(defun notmuch-get-bodypart-binary (msg part process-crypto)
+(defun notmuch-get-bodypart-binary (msg part process-crypto optional cache)
   Return the unprocessed content of PART in MSG as a unibyte string.
 
 This returns the \raw\ content of the given part after content
 transfer decoding, but with no further processing (see the
 discussion of --format=raw in man notmuch-show).  In particular,
-this does no charset conversion.
-  (let ((args `(show --format=raw
-   ,(format --part=%d (plist-get part :id))
-   ,@(when process-crypto '(--decrypt))
-   ,(notmuch-id-to-query (plist-get msg :id)
-(with-temp-buffer
-  ;; Emacs internally uses a UTF-8-like multibyte string
-  ;; representation by default (regardless of the coding system,
-  ;; which only affects how it goes from outside data to this
-  ;; internal representation).  This *almost* never matters.
-  ;; Annoyingly, it does matter if we use this data in an image
-  ;; descriptor, since Emacs will use its internal data buffer
-  ;; directly and this multibyte representation corrupts binary
-  ;; image formats.  Since the caller is asking for binary data, a
-  ;; unibyte string is a more appropriate representation anyway.
-  (set-buffer-multibyte nil)
-  (let ((coding-system-for-read 'no-conversion))
-   (apply #'call-process notmuch-command nil '(t nil) nil args)
-   (buffer-string)
-
-(defun notmuch-get-bodypart-text (msg part process-crypto)
+this does no charset conversion.
+
+If CACHE is non-nil, the content of this part will be saved in
+MSG (if it isn't already).
+  (let ((data (plist-get part :binary-content)))
+(when (not data)
+  (let ((args `(show --format=raw
+   ,(format --part=%d (plist-get part :id))
+   ,@(when process-crypto '(--decrypt))
+   ,(notmuch-id-to-query (plist-get msg :id)
+   (with-temp-buffer
+ ;; Emacs internally uses a UTF-8-like multibyte string
+ ;; representation by default (regardless of the coding
+ ;; system, which only affects how it goes from outside data
+ ;; to this internal representation).  This *almost* never
+ ;; matters.  Annoyingly, it does matter if we use this data
+ ;; in an image descriptor, since Emacs will use its internal
+ ;; data buffer directly and this multibyte representation
+ ;; corrupts binary image formats.  Since the caller is
+ ;; asking for binary data, a unibyte string is a more
+ ;; appropriate representation anyway.
+ (set-buffer-multibyte nil)
+ (let ((coding-system-for-read 'no-conversion))
+   (apply #'call-process notmuch-command nil '(t nil) nil args)
+   (setq data (buffer-string)
+  (when cache
+   ;; Cheat.  part is non-nil, and `plist-put' always modifies
+   ;; the list in place if it's non-nil.
+   (plist-put part :binary-content data)))
+data))
+
+(defun notmuch-get-bodypart-text (msg part process-crypto optional cache)
   Return the text content of PART in MSG.
 
 This returns the content of the given part as a multibyte Lisp
 string after performing content transfer decoding and any
 necessary charset decoding.  It is an error to use this for
-non-text/* parts.
+non-text/* parts.
+
+If CACHE is non-nil, the content of this part will be saved in
+MSG (if it isn't already).
   (let ((content (plist-get part :content)))
 (when (not content)
   ;; Use show --format=sexp to fetch decoded content
@@ -572,7 +586,9 @@ (defun notmuch-get-bodypart-text (msg part process-crypto)
 (npart (apply #'notmuch-call-notmuch-sexp args)))
(setq content (plist-get npart :content))
(when (not content)
- (error Internal error: No :content from %S args
+ (error Internal error: No :content from %S args)))
+  (when cache
+   (plist-put part :content content)))
 content))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
-- 
2.1.3

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API

2015-01-24 Thread Austin Clements
On Fri, 11 Jul 2014, David Bremner da...@tethera.net wrote:
 Austin Clements amdra...@mit.edu writes:

 +This returns the content of the given part as a multibyte Lisp

 What does multibyte mean here? utf8? current encoding?

Elisp has two kinds of stings: unibyte strings and multibyte
strings.

  
https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html

You can think of unibyte strings as binary data; they're just vectors of
bytes without any particular encoding semantics (though when you use a
unibyte string you can endow it with encoding).  Multibyte strings,
however, are text; they're vectors of Unicode code points.

 +string after performing content transfer decoding and any
 +necessary charset decoding.  It is an error to use this for
 +non-text/* parts.
 +  (let ((content (plist-get part :content)))
 +(when (not content)
 +  ;; Use show --format=sexp to fetch decoded content
 +  (let* ((args `(show --format=sexp --include-html
 + ,(format --part=%s (plist-get part :id))
 + ,@(when process-crypto '(--decrypt))
 + ,(notmuch-id-to-query (plist-get msg :id
 + (npart (apply #'notmuch-call-notmuch-sexp args)))
 +(setq content (plist-get npart :content))
 +(when (not content)
 +  (error Internal error: No :content from %S args
 +content))

 I'm a bit curious at the lack of setting coding-system-for-read here.
 Are we assuming the user has their environment set up correctly? Not so
 much a criticism as being nervous about everything coding-system
 related.

That is interesting.  coding-system-for-read should really go in
notmuch-call-notmuch-sexp, but I worry that, while *almost* all strings
the CLI outputs are UTF-8, not quite all of them are.  For example, we
output filenames exactly at the OS reports the bytes to us (which is
necessary, in a sense, because POSIX enforces no particular encoding on
file names, but still really unfortunate).

We could set coding-system-for-read, but a full solution needs more
cooperation from the CLI.  Possibly the right answer, at least for the
sexp format, is to do our own UTF-8 to \u escapes for strings that
are known to be UTF-8 and leave the raw bytes for the few that aren't.
Then we would set the coding-system-for-read to 'no-conversion and I
think everything would Just Work.

That doesn't help for JSON, which is supposed to be all UTF-8 all the
time.  I can think of solutions there, but they're all ugly and involve
things like encoding filenames as base64 when they aren't valid UTF-8.

So...  I don't think I'm going to do anything about this at this moment.

 I didn't see anything else to object to in this patch or the previous
 one.
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 00/11] Improve charset and cid: handling

2015-01-24 Thread Austin Clements
I added declare-functions for both of these, which should take care of
the Emacs 23 warnings and be more robust on Emacs 24.  We can't reach
the function that calls these unless shr is actually available, but the
byte compiler doesn't know that.

On Sat, 26 Apr 2014, Mark Walters markwalters1...@gmail.com wrote:
 Aside from the minor comments I mentioned in previous emails and one
 more comment below this looks good. 

 The extra comment is that on emacs23 I get the following when compiling:

 In end of data:
 notmuch-show.el:2188:1:Warning: the following functions are not known to be
 defined: libxml-parse-html-region, shr-insert-document

 Finally, I have not really tested it as I mainly use emacs23

 Best wishes

 Mark




 On Mon, 21 Apr 2014, Austin Clements amdra...@mit.edu wrote:
 I set out to quickly add support for cid: links in the shr renderer
 and wound up making our charset handling more robust and rewriting our
 content-ID handling.  The test introduced in patch 2 passes in all but
 one really obscure case, but only because of many unwritten and
 potentially fragile assumptions that Emacs and the CLI make about each
 other.

 The first three patches could reasonably go in to 0.18.  The rest of
 this series is certainly post-0.18, but I didn't want to lose track of
 it.

 This series comes in three stages.  Each depends on the earlier ones,
 but each prefix makes sense on its own and could be pushed without the
 later stages.

 Patch 1 is a simple clean up patch.

 Patches 2 through 7 robust-ify our charset handling in Emacs, mostly
 by splitting the broken `notmuch-get-bodypart-content' API into
 `notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text' so a
 caller can explicitly convey their requirements.

 The remaining patches improve our content-ID handling and add support
 for cid: links for shr.

 ___
 notmuch mailing list
 notmuch@notmuchmail.org
 http://notmuchmail.org/mailman/listinfo/notmuch
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


privacy problem: text/html parts pull in network resources

2015-01-21 Thread Austin Clements
Quoth Daniel Kahn Gillmor on Jan 21 at  4:36 pm:
> On Wed 2015-01-21 16:14:07 -0500, Austin Clements wrote:
> > I have a fix for this on shr buried deep in an old patch series that I
> > never got back to: id:1398105468-14317-12-git-send-email-amdragon at mit.edu
> >
> > For shr, the key is to set shr-blocked-images to ".".
> 
> I've just done this, but it doesn't seem to help.
> 
> > However, IIRC, in the current notmuch message rendering pipeline, mm
> > overrides this variable with something computed from
> > gnus-blocked-images.  That said, I'm not sure why gnus-blocked-images
> > isn't *already* taking care of this, but that's probably the place to
> > start digging.
> 
> gnus-blocked-images is set for me to the function
> gnus-block-private-groups, but i don't know what that is (the function
> is undocumented afaict).  Setting gnus-blocked-images to a regexp of "."
> seems to work for me, though.

In notmuch, mm will wind up calling (gnus-block-private-groups nil).
Unfortunately, gnus apparently considers nil to be a news group rather
than a "private group" (gnus speak for email, I think), so
gnus-block-private-groups returns nil (meaning *don't* block images)
rather than ".".

Probably notmuch should override the gnus-blocked-images variable,
since the default value is simply wrong for notmuch.  Maybe something
along the lines of the following should go around our text/html
handler?

  (let ((gnus-blocked-images
 (if (eq gnus-blocked-images 'gnus-block-private-groups)
 ;; mm uses gnus-blocked-images to control image loading.
 ;; However, the default value of gnus-blocked-images
 ;; doesn't work for notmuch because
 ;; gnus-block-private-groups depends on gnus variables we
 ;; don't set.  Override it to disallow network image
 ;; loading.
 "."
   ;; Use the user's customized value.
   gnus-blocked-images)))
...)

Long live abstraction!


privacy problem: text/html parts pull in network resources

2015-01-21 Thread Austin Clements
I have a fix for this on shr buried deep in an old patch series that I
never got back to: id:1398105468-14317-12-git-send-email-amdragon at mit.edu

For shr, the key is to set shr-blocked-images to ".".  However, IIRC,
in the current notmuch message rendering pipeline, mm overrides this
variable with something computed from gnus-blocked-images.  That said,
I'm not sure why gnus-blocked-images isn't *already* taking care of
this, but that's probably the place to start digging.

Quoth Daniel Kahn Gillmor on Jan 21 at  4:00 pm:
> If i send a message with a text/html part (either it's only text/html,
> or all parts are rendered, or it's multipart/alternative with only a
> text/html subpart) and that HTML has  src="http://example.org/test.png"/> in it, then notmuch will make a
> network request for that image.
> 
> This is a privacy disaster, because it enables an e-mail sender to use
> "web bugs" to tell when a given notmuch user has opened their e-mail.
> 
> It's also a bit of a consistency/storage/indexing disaster because it
> means that what you see when you open a given message will change
> depending on the network environment you're in when you open it.
> 
> It's also potentially a security problem because it means that anyone in
> control of the remote server (or the network between you and the remote
> server if the image isn't sourced over https) can feed arbitrary data
> into whatever emacs image rendering library is being used.  (granted,
> this is not a unique problem because this can already be done by the
> original message sender with a multipart/mixed message, but it's an
> additional exposure of attack surface)
> 
> I just raised this on #notmuch, and i don't have the time or the
> knowledge to look into it now, but i think the defaults here need to be
> to avoid network access entirely unless the user explicitly requests it.
> 
>--dkg



> ___
> notmuch mailing list
> notmuch at notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch



Re: privacy problem: text/html parts pull in network resources

2015-01-21 Thread Austin Clements
I have a fix for this on shr buried deep in an old patch series that I
never got back to: id:1398105468-14317-12-git-send-email-amdra...@mit.edu

For shr, the key is to set shr-blocked-images to ..  However, IIRC,
in the current notmuch message rendering pipeline, mm overrides this
variable with something computed from gnus-blocked-images.  That said,
I'm not sure why gnus-blocked-images isn't *already* taking care of
this, but that's probably the place to start digging.

Quoth Daniel Kahn Gillmor on Jan 21 at  4:00 pm:
 If i send a message with a text/html part (either it's only text/html,
 or all parts are rendered, or it's multipart/alternative with only a
 text/html subpart) and that HTML has img
 src=http://example.org/test.png/ in it, then notmuch will make a
 network request for that image.
 
 This is a privacy disaster, because it enables an e-mail sender to use
 web bugs to tell when a given notmuch user has opened their e-mail.
 
 It's also a bit of a consistency/storage/indexing disaster because it
 means that what you see when you open a given message will change
 depending on the network environment you're in when you open it.
 
 It's also potentially a security problem because it means that anyone in
 control of the remote server (or the network between you and the remote
 server if the image isn't sourced over https) can feed arbitrary data
 into whatever emacs image rendering library is being used.  (granted,
 this is not a unique problem because this can already be done by the
 original message sender with a multipart/mixed message, but it's an
 additional exposure of attack surface)
 
 I just raised this on #notmuch, and i don't have the time or the
 knowledge to look into it now, but i think the defaults here need to be
 to avoid network access entirely unless the user explicitly requests it.
 
--dkg



 ___
 notmuch mailing list
 notmuch@notmuchmail.org
 http://notmuchmail.org/mailman/listinfo/notmuch

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: privacy problem: text/html parts pull in network resources

2015-01-21 Thread Austin Clements
Quoth Daniel Kahn Gillmor on Jan 21 at  4:36 pm:
 On Wed 2015-01-21 16:14:07 -0500, Austin Clements wrote:
  I have a fix for this on shr buried deep in an old patch series that I
  never got back to: id:1398105468-14317-12-git-send-email-amdra...@mit.edu
 
  For shr, the key is to set shr-blocked-images to ..
 
 I've just done this, but it doesn't seem to help.
 
  However, IIRC, in the current notmuch message rendering pipeline, mm
  overrides this variable with something computed from
  gnus-blocked-images.  That said, I'm not sure why gnus-blocked-images
  isn't *already* taking care of this, but that's probably the place to
  start digging.
 
 gnus-blocked-images is set for me to the function
 gnus-block-private-groups, but i don't know what that is (the function
 is undocumented afaict).  Setting gnus-blocked-images to a regexp of .
 seems to work for me, though.

In notmuch, mm will wind up calling (gnus-block-private-groups nil).
Unfortunately, gnus apparently considers nil to be a news group rather
than a private group (gnus speak for email, I think), so
gnus-block-private-groups returns nil (meaning *don't* block images)
rather than ..

Probably notmuch should override the gnus-blocked-images variable,
since the default value is simply wrong for notmuch.  Maybe something
along the lines of the following should go around our text/html
handler?

  (let ((gnus-blocked-images
 (if (eq gnus-blocked-images 'gnus-block-private-groups)
 ;; mm uses gnus-blocked-images to control image loading.
 ;; However, the default value of gnus-blocked-images
 ;; doesn't work for notmuch because
 ;; gnus-block-private-groups depends on gnus variables we
 ;; don't set.  Override it to disallow network image
 ;; loading.
 .
   ;; Use the user's customized value.
   gnus-blocked-images)))
...)

Long live abstraction!
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[WIP PATCH 4/4] lib: Add "lastmod:" queries for filtering by last modification

2015-01-15 Thread Austin Clements
Quoth David Bremner on Jan 15 at 10:08 pm:
> Austin Clements  writes:
> 
> > From: Austin Clements 
> >
> > XXX Includes reference to notmuch search --db-revision, which doesn't
> > exist.
> 
> What would --db-revision do if it was implimented? Did you want to pass
> a UUID in?

Yes, exactly.  And have the CLI abort if the UUID didn't match
(meaning the lastmod values are no longer in the same sequence).

> d


[PATCH v2 2/5] Add the NOTMUCH_FEATURE_INDEXED_MIMETYPES database feature

2015-01-15 Thread Austin Clements
Just one nit. Otherwise this patch LGTM.

On January 15, 2015 12:20:08 PM EST, Jani Nikula  wrote:
>
>Austin, would you mind having a look at this one please?
>
>Thanks,
>Jani.
>
>On Wed, 14 Jan 2015, Todd  wrote:
>> ---
>>  lib/database-private.h | 15 ---
>>  lib/database.cc| 10 --
>>  2 files changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/lib/database-private.h b/lib/database-private.h
>> index 15e03cc..6d6fa2c 100644
>> --- a/lib/database-private.h
>> +++ b/lib/database-private.h
>> @@ -92,6 +92,14 @@ enum _notmuch_features {
>>   *
>>   * Introduced: version 3. */
>>  NOTMUCH_FEATURE_GHOSTS = 1 << 4,
>> +
>> +
>> +/* If set, then the database was created after the introduction
>of
>> + * indexed mime types. If unset, then the database may contain a
>> + * mixture of messages with indexed and non-indexed mime types.
>> + *
>> + * Introduced: version 3. */
>> +NOTMUCH_FEATURE_INDEXED_MIMETYPES = 1 << 5,
>>  };
>>  
>>  /* In C++, a named enum is its own type, so define bitwise operators
>> @@ -161,9 +169,10 @@ struct _notmuch_database {
>>  
>>  /* Current database features.  If any of these are missing from a
>>   * database, request an upgrade.
>> - * NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES is not included because
>> - * upgrade doesn't currently introduce the feature (though brand new
>> - * databases will have it). */
>> + * NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES and
>> + * NOTMUCH_FEATURE_INDEXED_MIMETYPES are not included because
>upgrade
>> + * doesn't currently introduce the features (though brand new
>databases
>> + * will have it). */
>>  #define NOTMUCH_FEATURES_CURRENT \
>>  (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
>>   NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS)
>> diff --git a/lib/database.cc b/lib/database.cc
>> index 3601f9d..2de60f8 100644
>> --- a/lib/database.cc
>> +++ b/lib/database.cc
>> @@ -304,6 +304,11 @@ static const struct {
>>"exact folder:/path: search", "rw" },
>>  { NOTMUCH_FEATURE_GHOSTS,
>>"mail documents for missing messages", "w"},
>> +/* Knowledge of the index mime-types are not required for
>reading
>> + * a database because a reader will just be unable to query
>> + * them. */
>> +{ NOTMUCH_FEATURE_INDEXED_MIMETYPES,
>> +  "mime-types in database", "w"},

I would label this "indexed MIME types" to be closer to the enum and because 
"MIME" is an acronym and hence should be capitalized.

>>  };
>>  
>>  const char *
>> @@ -646,9 +651,10 @@ notmuch_database_create (const char *path,
>notmuch_database_t **database)
>>  if (status)
>>  goto DONE;
>>  
>> -/* Upgrade doesn't add this feature to existing databases, but
>new
>> - * databases have it. */
>> +/* Upgrade doesn't add these feature to existing databases, but
>> + * new databases have them. */
>>  notmuch->features |= NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES;
>> +notmuch->features |= NOTMUCH_FEATURE_INDEXED_MIMETYPES;
>>  
>>  status = notmuch_database_upgrade (notmuch, NULL, NULL);
>>  if (status) {
>> -- 
>> 1.9.1
>>
>> ___
>> notmuch mailing list
>> notmuch at notmuchmail.org
>> http://notmuchmail.org/mailman/listinfo/notmuch



Re: [PATCH v2 2/5] Add the NOTMUCH_FEATURE_INDEXED_MIMETYPES database feature

2015-01-15 Thread Austin Clements
Just one nit. Otherwise this patch LGTM.

On January 15, 2015 12:20:08 PM EST, Jani Nikula j...@nikula.org wrote:

Austin, would you mind having a look at this one please?

Thanks,
Jani.

On Wed, 14 Jan 2015, Todd t...@electricoding.com wrote:
 ---
  lib/database-private.h | 15 ---
  lib/database.cc| 10 --
  2 files changed, 20 insertions(+), 5 deletions(-)

 diff --git a/lib/database-private.h b/lib/database-private.h
 index 15e03cc..6d6fa2c 100644
 --- a/lib/database-private.h
 +++ b/lib/database-private.h
 @@ -92,6 +92,14 @@ enum _notmuch_features {
   *
   * Introduced: version 3. */
  NOTMUCH_FEATURE_GHOSTS = 1  4,
 +
 +
 +/* If set, then the database was created after the introduction
of
 + * indexed mime types. If unset, then the database may contain a
 + * mixture of messages with indexed and non-indexed mime types.
 + *
 + * Introduced: version 3. */
 +NOTMUCH_FEATURE_INDEXED_MIMETYPES = 1  5,
  };
  
  /* In C++, a named enum is its own type, so define bitwise operators
 @@ -161,9 +169,10 @@ struct _notmuch_database {
  
  /* Current database features.  If any of these are missing from a
   * database, request an upgrade.
 - * NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES is not included because
 - * upgrade doesn't currently introduce the feature (though brand new
 - * databases will have it). */
 + * NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES and
 + * NOTMUCH_FEATURE_INDEXED_MIMETYPES are not included because
upgrade
 + * doesn't currently introduce the features (though brand new
databases
 + * will have it). */
  #define NOTMUCH_FEATURES_CURRENT \
  (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
   NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS)
 diff --git a/lib/database.cc b/lib/database.cc
 index 3601f9d..2de60f8 100644
 --- a/lib/database.cc
 +++ b/lib/database.cc
 @@ -304,6 +304,11 @@ static const struct {
exact folder:/path: search, rw },
  { NOTMUCH_FEATURE_GHOSTS,
mail documents for missing messages, w},
 +/* Knowledge of the index mime-types are not required for
reading
 + * a database because a reader will just be unable to query
 + * them. */
 +{ NOTMUCH_FEATURE_INDEXED_MIMETYPES,
 +  mime-types in database, w},

I would label this indexed MIME types to be closer to the enum and because 
MIME is an acronym and hence should be capitalized.

  };
  
  const char *
 @@ -646,9 +651,10 @@ notmuch_database_create (const char *path,
notmuch_database_t **database)
  if (status)
  goto DONE;
  
 -/* Upgrade doesn't add this feature to existing databases, but
new
 - * databases have it. */
 +/* Upgrade doesn't add these feature to existing databases, but
 + * new databases have them. */
  notmuch-features |= NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES;
 +notmuch-features |= NOTMUCH_FEATURE_INDEXED_MIMETYPES;
  
  status = notmuch_database_upgrade (notmuch, NULL, NULL);
  if (status) {
 -- 
 1.9.1

 ___
 notmuch mailing list
 notmuch@notmuchmail.org
 http://notmuchmail.org/mailman/listinfo/notmuch

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [WIP PATCH 4/4] lib: Add lastmod: queries for filtering by last modification

2015-01-15 Thread Austin Clements
Quoth David Bremner on Jan 15 at 10:08 pm:
 Austin Clements acleme...@csail.mit.edu writes:
 
  From: Austin Clements amdra...@mit.edu
 
  XXX Includes reference to notmuch search --db-revision, which doesn't
  exist.
 
 What would --db-revision do if it was implimented? Did you want to pass
 a UUID in?

Yes, exactly.  And have the CLI abort if the UUID didn't match
(meaning the lastmod values are no longer in the same sequence).

 d
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Negating searches in notmuch running via emacs?

2015-01-06 Thread Austin Clements
Hi David.

In general, "-" will negate a query term.  However, Xapian's query
syntax (to which notmuch is beholden) ignores "-" at the beginning of
a query (and immediately after an open paren).  Hence, if you want to
negate the first term in a query, you'll have to use "not" instead.

Hope that helps.

Quoth David Creelman on Jan 06 at  8:46 am:
> Hi Carl (et al),
> Thanks for notmuch. It's making my email setup a bit more efficient. I used 
> to be a filter to differnt
> mbox kind of person, but I'm a changed man now !!
> 
> Just wondered, is it possible to negate a search in notmuch for emacs?
> 
> I have a bunch of stuff that is in my inbox that I don't want to see and have 
> tried doing things like
> 
>   -googlealerts-noreply
>   -from:googlealerts-noreply
> 
> But it looks like the '-' is ignored
> 
> Is there a way of getting filtering? It looks like it's possible from the 
> command line application (with -
> and + on search terms).
> 
> Regards,
> David
> ___
> notmuch mailing list
> notmuch at notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch


Re: Negating searches in notmuch running via emacs?

2015-01-06 Thread Austin Clements
Hi David.

In general, - will negate a query term.  However, Xapian's query
syntax (to which notmuch is beholden) ignores - at the beginning of
a query (and immediately after an open paren).  Hence, if you want to
negate the first term in a query, you'll have to use not instead.

Hope that helps.

Quoth David Creelman on Jan 06 at  8:46 am:
 Hi Carl (et al),
 Thanks for notmuch. It's making my email setup a bit more efficient. I used 
 to be a filter to differnt
 mbox kind of person, but I'm a changed man now !!
 
 Just wondered, is it possible to negate a search in notmuch for emacs?
 
 I have a bunch of stuff that is in my inbox that I don't want to see and have 
 tried doing things like
 
   -googlealerts-noreply
   -from:googlealerts-noreply
 
 But it looks like the '-' is ignored
 
 Is there a way of getting filtering? It looks like it's possible from the 
 command line application (with -
 and + on search terms).
 
 Regards,
 David
 ___
 notmuch mailing list
 notmuch@notmuchmail.org
 http://notmuchmail.org/mailman/listinfo/notmuch
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH] NEWS: Database version 3, API improvements, and ghost messages

2014-11-10 Thread Austin Clements
---
 NEWS | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/NEWS b/NEWS
index b30ed1b..7a121e4 100644
--- a/NEWS
+++ b/NEWS
@@ -36,9 +36,39 @@ Improved `q` binding in notmuch buffers
 Library changes
 ---

+Introduced database version 3 with support for "database features."
+
+  Features are independent aspects of the database schema.
+  Representing these independently of the database version number will
+  let us evolve the database format faster and more incrementally,
+  while maintaining better forwards and backwards compatibility.
+
+Library users are no longer required to call `notmuch_database_upgrade`
+
+  Previously, library users were required to call
+  `notmuch_database_needs_upgrade` and `notmuch_database_upgrade`
+  before using a writable database.  Even the CLI didn't get this
+  right, and it is no longer required.  Now, individual APIs may
+  return `NOTMUCH_STATUS_UPGRADE_REQUIRED` if the database format is
+  too out of date for that API.
+
+Library users can now abort an atomic section by closing the database
+
+  Previously there was no supported way to abort an atomic section.
+  Callers can now simply close the database, and any outstanding
+  atomic section will be aborted.
+
 Add return status to notmuch_database_close and
 notmuch_database_destroy

+Bug fixes and performance improvements for thread linking
+
+  The database now represents missing-but-referenced messages ("ghost
+  messages") similarly to how it represents regular messages.  This
+  enables an improved thread linking algorithm that performs better
+  and fixes a bug that sometimes prevented notmuch from linking
+  messages into the same thread.
+
 nmbug
 -

-- 
2.1.1



[PATCH] NEWS: Database version 3, API improvements, and ghost messages

2014-11-10 Thread Austin Clements
---
 NEWS | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/NEWS b/NEWS
index b30ed1b..7a121e4 100644
--- a/NEWS
+++ b/NEWS
@@ -36,9 +36,39 @@ Improved `q` binding in notmuch buffers
 Library changes
 ---
 
+Introduced database version 3 with support for database features.
+
+  Features are independent aspects of the database schema.
+  Representing these independently of the database version number will
+  let us evolve the database format faster and more incrementally,
+  while maintaining better forwards and backwards compatibility.
+
+Library users are no longer required to call `notmuch_database_upgrade`
+
+  Previously, library users were required to call
+  `notmuch_database_needs_upgrade` and `notmuch_database_upgrade`
+  before using a writable database.  Even the CLI didn't get this
+  right, and it is no longer required.  Now, individual APIs may
+  return `NOTMUCH_STATUS_UPGRADE_REQUIRED` if the database format is
+  too out of date for that API.
+
+Library users can now abort an atomic section by closing the database
+
+  Previously there was no supported way to abort an atomic section.
+  Callers can now simply close the database, and any outstanding
+  atomic section will be aborted.
+
 Add return status to notmuch_database_close and
 notmuch_database_destroy
 
+Bug fixes and performance improvements for thread linking
+
+  The database now represents missing-but-referenced messages (ghost
+  messages) similarly to how it represents regular messages.  This
+  enables an improved thread linking algorithm that performs better
+  and fixes a bug that sometimes prevented notmuch from linking
+  messages into the same thread.
+
 nmbug
 -
 
-- 
2.1.1

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3.1 3/9] lib: Introduce macros for bit operations

2014-10-24 Thread Austin Clements
These macros help clarify basic bit-twiddling code and are written to
be robust against C undefined behavior of shift operators.
---
 lib/message.cc|  6 +++---
 lib/notmuch-private.h | 11 +++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 38bc929..55d2ff6 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -869,7 +869,7 @@ notmuch_bool_t
 notmuch_message_get_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag)
 {
-return message->flags & (1 << flag);
+return NOTMUCH_TEST_BIT (message->flags, flag);
 }

 void
@@ -877,9 +877,9 @@ notmuch_message_set_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag, notmuch_bool_t enable)
 {
 if (enable)
-   message->flags |= (1 << flag);
+   NOTMUCH_SET_BIT (>flags, flag);
 else
-   message->flags &= ~(1 << flag);
+   NOTMUCH_CLEAR_BIT (>flags, flag);
 }

 time_t
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 36cc12b..b86897c 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -63,6 +63,17 @@ NOTMUCH_BEGIN_DECLS
 #define STRNCMP_LITERAL(var, literal) \
 strncmp ((var), (literal), sizeof (literal) - 1)

+/* Robust bit test/set/reset macros */
+#define NOTMUCH_TEST_BIT(val, bit) \
+(((bit) < 0 || (bit) >= CHAR_BIT * sizeof (unsigned long long)) ? 0
\
+ : !!((val) & (1ull << (bit
+#define NOTMUCH_SET_BIT(valp, bit) \
+(((bit) < 0 || (bit) >= CHAR_BIT * sizeof (unsigned long long)) ? *(valp) \
+ : (*(valp) |= (1ull << (bit
+#define NOTMUCH_CLEAR_BIT(valp,  bit) \
+(((bit) < 0 || (bit) >= CHAR_BIT * sizeof (unsigned long long)) ? *(valp) \
+ : (*(valp) &= ~(1ull << (bit
+
 #define unused(x) x __attribute__ ((unused))

 #ifdef __cplusplus
-- 
2.1.0



[PATCH v1 1/3] search: Seperately report matching and non-matching authors.

2014-10-24 Thread Austin Clements
Quoth Mark Walters on Oct 24 at 10:23 am:
> 
> Hi
> 
> I definitely like the idea: some comments below.

Agreed.

> On Fri, 24 Oct 2014, David Edmondson  wrote:
> > In addition to the :authors attribute of each search result, include
> > :authors_matched and :authors_non_matched attributes. Both attributes
> > are always included. If there are no non-matching authors, the
> > :authors_non_matched attribute is set to the empty string.
> 
> What about having both authors_matched and authors_not_matched as lists
> of authors (ie one string for each author)? Then emacs, for example,
> wouldn't try and parse the string back into authors before
> splitting. And authors_non_matched could be an empty list when
> appropriate which seems more natural than the empty string.
> 
> All the above is based on what a client might want in the output rather
> than what is easy or sensible to implement in the C code.

I was going to suggest exactly the same thing.

And I think there's a fairly easy way to do it in C code that will
also prevent library interface bloat: instead of introducing new
library APIs to get at this information, just use the existing
notmuch_thread_get_messages API and construct the matched and
non-matched lists in the CLI.  Doing it in the CLI wouldn't require
the library to export yet another string list structure, which is
always a huge pain (thanks C!), and wouldn't introduce more "helper"
functions into the library API.

I think the only disadvantage to doing this would be some duplication
of the {matched_,}authors_{array,hash} logic into the CLI, but those
are only there to support notmuch_thread_get_authors, which I think is
a huge hack and should go away in the long term.  (And, by doing this
list building in the CLI, we avoid further embellishing the
unnecessary "get thread authors" API).


Re: [PATCH v1 1/3] search: Seperately report matching and non-matching authors.

2014-10-24 Thread Austin Clements
Quoth Mark Walters on Oct 24 at 10:23 am:
 
 Hi
 
 I definitely like the idea: some comments below.

Agreed.

 On Fri, 24 Oct 2014, David Edmondson d...@dme.org wrote:
  In addition to the :authors attribute of each search result, include
  :authors_matched and :authors_non_matched attributes. Both attributes
  are always included. If there are no non-matching authors, the
  :authors_non_matched attribute is set to the empty string.
 
 What about having both authors_matched and authors_not_matched as lists
 of authors (ie one string for each author)? Then emacs, for example,
 wouldn't try and parse the string back into authors before
 splitting. And authors_non_matched could be an empty list when
 appropriate which seems more natural than the empty string.
 
 All the above is based on what a client might want in the output rather
 than what is easy or sensible to implement in the C code.

I was going to suggest exactly the same thing.

And I think there's a fairly easy way to do it in C code that will
also prevent library interface bloat: instead of introducing new
library APIs to get at this information, just use the existing
notmuch_thread_get_messages API and construct the matched and
non-matched lists in the CLI.  Doing it in the CLI wouldn't require
the library to export yet another string list structure, which is
always a huge pain (thanks C!), and wouldn't introduce more helper
functions into the library API.

I think the only disadvantage to doing this would be some duplication
of the {matched_,}authors_{array,hash} logic into the CLI, but those
are only there to support notmuch_thread_get_authors, which I think is
a huge hack and should go away in the long term.  (And, by doing this
list building in the CLI, we avoid further embellishing the
unnecessary get thread authors API).
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3.1 3/9] lib: Introduce macros for bit operations

2014-10-24 Thread Austin Clements
These macros help clarify basic bit-twiddling code and are written to
be robust against C undefined behavior of shift operators.
---
 lib/message.cc|  6 +++---
 lib/notmuch-private.h | 11 +++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 38bc929..55d2ff6 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -869,7 +869,7 @@ notmuch_bool_t
 notmuch_message_get_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag)
 {
-return message-flags  (1  flag);
+return NOTMUCH_TEST_BIT (message-flags, flag);
 }
 
 void
@@ -877,9 +877,9 @@ notmuch_message_set_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag, notmuch_bool_t enable)
 {
 if (enable)
-   message-flags |= (1  flag);
+   NOTMUCH_SET_BIT (message-flags, flag);
 else
-   message-flags = ~(1  flag);
+   NOTMUCH_CLEAR_BIT (message-flags, flag);
 }
 
 time_t
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 36cc12b..b86897c 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -63,6 +63,17 @@ NOTMUCH_BEGIN_DECLS
 #define STRNCMP_LITERAL(var, literal) \
 strncmp ((var), (literal), sizeof (literal) - 1)
 
+/* Robust bit test/set/reset macros */
+#define NOTMUCH_TEST_BIT(val, bit) \
+(((bit)  0 || (bit) = CHAR_BIT * sizeof (unsigned long long)) ? 0
\
+ : !!((val)  (1ull  (bit
+#define NOTMUCH_SET_BIT(valp, bit) \
+(((bit)  0 || (bit) = CHAR_BIT * sizeof (unsigned long long)) ? *(valp) \
+ : (*(valp) |= (1ull  (bit
+#define NOTMUCH_CLEAR_BIT(valp,  bit) \
+(((bit)  0 || (bit) = CHAR_BIT * sizeof (unsigned long long)) ? *(valp) \
+ : (*(valp) = ~(1ull  (bit
+
 #define unused(x) x __attribute__ ((unused))
 
 #ifdef __cplusplus
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 9/9] lib: Remove unnecessary thread linking steps when using ghost messages

2014-10-23 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

Previously, it was necessary to link new messages to children to work
around some (though not all) problems with the old metadata-based
approach to stored thread IDs.  With ghost messages, this is no longer
necessary, so don't bother with child linking when ghost messages are
in use.
---
 lib/database.cc | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index b673718..3601f9d 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2136,11 +2136,11 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
  * reference 'message'.
  *
  * In all cases, we assign to the current message the first thread ID
- * found (through either parent or child). We will also merge any
- * existing, distinct threads where this message belongs to both,
- * (which is not uncommon when messages are processed out of order).
+ * found. We will also merge any existing, distinct threads where this
+ * message belongs to both, (which is not uncommon when messages are
+ * processed out of order).
  *
- * Finally, if no thread ID has been found through parent or child, we
+ * Finally, if no thread ID has been found through referenced messages, we
  * call _notmuch_message_generate_thread_id to generate a new thread
  * ID. This should only happen for new, top-level messages, (no
  * References or In-Reply-To header in this message, and no previously
@@ -2172,10 +2172,23 @@ _notmuch_database_link_message (notmuch_database_t 
*notmuch,
 if (status)
goto DONE;

-status = _notmuch_database_link_message_to_children (notmuch, message,
-_id);
-if (status)
-   goto DONE;
+if (! (notmuch->features & NOTMUCH_FEATURE_GHOSTS)) {
+   /* In general, it shouldn't be necessary to link children,
+* since the earlier indexing of those children will have
+* stored a thread ID for the missing parent.  However, prior
+* to ghost messages, these stored thread IDs were NOT
+* rewritten during thread merging (and there was no
+* performant way to do so), so if indexed children were
+* pulled into a different thread ID by a merge, it was
+* necessary to pull them *back* into the stored thread ID of
+* the parent.  With ghost messages, we just rewrite the
+* stored thread IDs during merging, so this workaround isn't
+* necessary. */
+   status = _notmuch_database_link_message_to_children (notmuch, message,
+_id);
+   if (status)
+   goto DONE;
+}

 /* If not part of any existing thread, generate a new thread ID. */
 if (thread_id == NULL) {
-- 
2.1.0



[PATCH v3 8/9] test: Test upgrade to ghost messages feature

2014-10-23 Thread Austin Clements
---
 test/T530-upgrade.sh | 21 +
 1 file changed, 21 insertions(+)

diff --git a/test/T530-upgrade.sh b/test/T530-upgrade.sh
index c4c4ac8..6b42a69 100755
--- a/test/T530-upgrade.sh
+++ b/test/T530-upgrade.sh
@@ -116,4 +116,25 @@ MAIL_DIR/bar/new/21:2,
 MAIL_DIR/bar/new/22:2,
 MAIL_DIR/cur/51:2,"

+# Ghost messages are difficult to test since they're nearly invisible.
+# However, if the upgrade works correctly, the ghost message should
+# retain the right thread ID even if all of the original messages in
+# the thread are deleted.  That's what we test.  This won't detect if
+# the upgrade just plain didn't happen, but it should detect if
+# something went wrong.
+test_begin_subtest "ghost message retains thread ID"
+# Upgrade database
+notmuch new
+# Get thread ID of real message
+thread=$(notmuch search --output=threads id:4EFC743A.3060609 at april.org)
+# Delete all real messages in that thread
+rm $(notmuch search --output=files $thread)
+notmuch new
+# "Deliver" ghost message
+add_message '[subject]=Ghost' '[id]=4EFC3931.6030007 at april.org'
+# If the ghost upgrade worked, the new message should be attached to
+# the existing thread ID.
+nthread=$(notmuch search --output=threads id:4EFC3931.6030007 at april.org)
+test_expect_equal "$thread" "$nthread"
+
 test_done
-- 
2.1.0



[PATCH v3 7/9] lib: Enable ghost messages feature

2014-10-23 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This fixes the broken thread order test.
---
 lib/database-private.h| 2 +-
 test/T260-thread-order.sh | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/database-private.h b/lib/database-private.h
index e2e4bc8..15e03cc 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -166,7 +166,7 @@ struct _notmuch_database {
  * databases will have it). */
 #define NOTMUCH_FEATURES_CURRENT \
 (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
- NOTMUCH_FEATURE_BOOL_FOLDER)
+ NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS)

 /* Return the list of terms from the given iterator matching a prefix.
  * The prefix will be stripped from the strings in the returned list.
diff --git a/test/T260-thread-order.sh b/test/T260-thread-order.sh
index b435d79..99f5833 100755
--- a/test/T260-thread-order.sh
+++ b/test/T260-thread-order.sh
@@ -30,7 +30,6 @@ expected=$(for ((i = 0; i < $nthreads; i++)); do
 test_expect_equal "$output" "$expected"

 test_begin_subtest "Messages with all parents get linked in all delivery 
orders"
-test_subtest_known_broken
 # Here we do the same thing as the previous test, but each message
 # references all of its parents.  Since every message references the
 # root of the thread, each thread should always be fully joined.  This
-- 
2.1.0



[PATCH v3 6/9] lib: Implement upgrade to ghost messages feature

2014-10-23 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

Somehow this is the first upgrade pass that actually does *any* error
checking, so this also adds the bit of necessary infrastructure to
handle that.
---
 lib/database.cc | 66 +++--
 1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 92a92d9..b673718 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1231,6 +1231,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
 notmuch_bool_t timer_is_active = FALSE;
 enum _notmuch_features target_features, new_features;
 notmuch_status_t status;
+notmuch_private_status_t private_status;
 unsigned int count = 0, total = 0;

 status = _notmuch_database_ensure_writable (notmuch);
@@ -1275,6 +1276,13 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
for (t = db->allterms_begin ("XTIMESTAMP"); t != t_end; t++)
++total;
 }
+if (new_features & NOTMUCH_FEATURE_GHOSTS) {
+   /* The ghost message upgrade converts all thread_id_*
+* metadata values into ghost message documents. */
+   t_end = db->metadata_keys_end ("thread_id_");
+   for (t = db->metadata_keys_begin ("thread_id_"); t != t_end; ++t)
+   ++total;
+}

 /* Perform the upgrade in a transaction. */
 db->begin_transaction (true);
@@ -1378,10 +1386,64 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
}
 }

+/* Perform metadata upgrades. */
+
+/* Prior to NOTMUCH_FEATURE_GHOSTS, thread IDs for missing
+ * messages were stored as database metadata. Change these to
+ * ghost messages.
+ */
+if (new_features & NOTMUCH_FEATURE_GHOSTS) {
+   notmuch_message_t *message;
+   std::string message_id, thread_id;
+
+   t_end = db->metadata_keys_end (NOTMUCH_METADATA_THREAD_ID_PREFIX);
+   for (t = db->metadata_keys_begin (NOTMUCH_METADATA_THREAD_ID_PREFIX);
+t != t_end; ++t) {
+   if (do_progress_notify) {
+   progress_notify (closure, (double) count / total);
+   do_progress_notify = 0;
+   }
+
+   message_id = (*t).substr (
+   strlen (NOTMUCH_METADATA_THREAD_ID_PREFIX));
+   thread_id = db->get_metadata (*t);
+
+   /* Create ghost message */
+   message = _notmuch_message_create_for_message_id (
+   notmuch, message_id.c_str (), _status);
+   if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+   /* Document already exists; ignore the stored thread ID */
+   } else if (private_status ==
+  NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   private_status = _notmuch_message_initialize_ghost (
+   message, thread_id.c_str ());
+   if (! private_status)
+   _notmuch_message_sync (message);
+   }
+
+   if (private_status) {
+   fprintf (stderr,
+"Upgrade failed while creating ghost messages.\n");
+   status = COERCE_STATUS (private_status, "Unexpected status from 
_notmuch_message_initialize_ghost");
+   goto DONE;
+   }
+
+   /* Clear saved metadata thread ID */
+   db->set_metadata (*t, "");
+
+   ++count;
+   }
+}
+
+status = NOTMUCH_STATUS_SUCCESS;
 db->set_metadata ("features", _print_features (local, notmuch->features));
 db->set_metadata ("version", STRINGIFY (NOTMUCH_DATABASE_VERSION));

-db->commit_transaction ();
+ DONE:
+if (status == NOTMUCH_STATUS_SUCCESS)
+   db->commit_transaction ();
+else
+   db->cancel_transaction ();

 if (timer_is_active) {
/* Now stop the timer. */
@@ -1397,7 +1459,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
 }

 talloc_free (local);
-return NOTMUCH_STATUS_SUCCESS;
+return status;
 }

 notmuch_status_t
-- 
2.1.0



[PATCH v3 5/9] lib: Implement ghost-based thread linking

2014-10-23 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This updates the thread linking code to use ghost messages instead of
user metadata to link messages into threads.

In contrast with the old approach, this is actually correct.
Previously, thread merging updated only the thread IDs of message
documents, not thread IDs stored in user metadata.  As originally
diagnosed by Mark Walters [1] and as demonstrated by the broken
T260-thread-order test, this can cause notmuch to fail to link
messages even though they're in the same thread.  In principle the old
approach could have been fixed by updating the user metadata thread
IDs as well, but these are not indexed and hence this would have
required a full scan of all stored thread IDs.  Ghost messages solve
this problem naturally by reusing the exact same thread ID and message
ID representation and indexing as regular messages.

Furthermore, thanks to this greater symmetry, ghost messages are also
algorithmically simpler.  We continue to support the old user metadata
format, so this patch can't delete any code, but when we do remove
support for the old format, several functions can simply be deleted.

[1] id:8738h7kv2q.fsf at qmul.ac.uk
---
 lib/database.cc | 99 +++--
 1 file changed, 83 insertions(+), 16 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index c641bcd..92a92d9 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1752,6 +1752,12 @@ _get_metadata_thread_id_key (void *ctx, const char 
*message_id)
message_id);
 }

+static notmuch_status_t
+_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch,
+ void *ctx,
+ const char *message_id,
+ const char **thread_id_ret);
+
 /* Find the thread ID to which the message with 'message_id' belongs.
  *
  * Note: 'thread_id_ret' must not be NULL!
@@ -1760,9 +1766,9 @@ _get_metadata_thread_id_key (void *ctx, const char 
*message_id)
  *
  * Note: If there is no message in the database with the given
  * 'message_id' then a new thread_id will be allocated for this
- * message and stored in the database metadata, (where this same
+ * message ID and stored in the database metadata so that the
  * thread ID can be looked up if the message is added to the database
- * later).
+ * later.
  */
 static notmuch_status_t
 _resolve_message_id_to_thread_id (notmuch_database_t *notmuch,
@@ -1770,6 +1776,49 @@ _resolve_message_id_to_thread_id (notmuch_database_t 
*notmuch,
  const char *message_id,
  const char **thread_id_ret)
 {
+notmuch_private_status_t status;
+notmuch_message_t *message;
+
+if (! (notmuch->features & NOTMUCH_FEATURE_GHOSTS))
+   return _resolve_message_id_to_thread_id_old (notmuch, ctx, message_id,
+thread_id_ret);
+
+/* Look for this message (regular or ghost) */
+message = _notmuch_message_create_for_message_id (
+   notmuch, message_id, );
+if (status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+   /* Message exists */
+   *thread_id_ret = talloc_steal (
+   ctx, notmuch_message_get_thread_id (message));
+} else if (status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   /* Message did not exist.  Give it a fresh thread ID and
+* populate this message as a ghost message. */
+   *thread_id_ret = talloc_strdup (
+   ctx, _notmuch_database_generate_thread_id (notmuch));
+   if (! *thread_id_ret) {
+   status = NOTMUCH_PRIVATE_STATUS_OUT_OF_MEMORY;
+   } else {
+   status = _notmuch_message_initialize_ghost (message, 
*thread_id_ret);
+   if (status == 0)
+   /* Commit the new ghost message */
+   _notmuch_message_sync (message);
+   }
+} else {
+   /* Create failed. Fall through. */
+}
+
+notmuch_message_destroy (message);
+
+return COERCE_STATUS (status, "Error creating ghost message");
+}
+
+/* Pre-ghost messages _resolve_message_id_to_thread_id */
+static notmuch_status_t
+_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch,
+ void *ctx,
+ const char *message_id,
+ const char **thread_id_ret)
+{
 notmuch_status_t status;
 notmuch_message_t *message;
 string thread_id_string;
@@ -2007,13 +2056,16 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
 }
 }

-/* Given a (mostly empty) 'message' and its corresponding
+/* Given a blank or ghost 'message' and its corresponding
  * 'message_file' link it to existing threads in the database.
  *
- * The first check is in the metadata of the database to see if we
- * have pre-allocated a thread_id in advance for this message

[PATCH v3 4/9] lib: Internal support for querying and creating ghost messages

2014-10-23 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This updates the message abstraction to support ghost messages: it
adds a message flag that distinguishes regular messages from ghost
messages, and an internal function for initializing a newly created
(blank) message as a ghost message.
---
 lib/message.cc| 52 +--
 lib/notmuch-private.h |  4 
 lib/notmuch.h |  9 -
 3 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 55d2ff6..a7a13cc 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -39,6 +39,9 @@ struct visible _notmuch_message {
 notmuch_message_file_t *message_file;
 notmuch_message_list_t *replies;
 unsigned long flags;
+/* For flags that are initialized on-demand, lazy_flags indicates
+ * if each flag has been initialized. */
+unsigned long lazy_flags;

 Xapian::Document doc;
 Xapian::termcount termpos;
@@ -99,6 +102,7 @@ _notmuch_message_create_for_document (const void 
*talloc_owner,

 message->frozen = 0;
 message->flags = 0;
+message->lazy_flags = 0;

 /* Each of these will be lazily created as needed. */
 message->message_id = NULL;
@@ -192,7 +196,7 @@ _notmuch_message_create (const void *talloc_owner,
  *
  * There is already a document with message ID 'message_id' in the
  * database. The returned message can be used to query/modify the
- * document.
+ * document. The message may be a ghost message.
  *
  *   NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND:
  *
@@ -305,6 +309,7 @@ _notmuch_message_ensure_metadata (notmuch_message_t 
*message)
 const char *thread_prefix = _find_prefix ("thread"),
*tag_prefix = _find_prefix ("tag"),
*id_prefix = _find_prefix ("id"),
+   *type_prefix = _find_prefix ("type"),
*filename_prefix = _find_prefix ("file-direntry"),
*replyto_prefix = _find_prefix ("replyto");

@@ -337,10 +342,25 @@ _notmuch_message_ensure_metadata (notmuch_message_t 
*message)
message->message_id =
_notmuch_message_get_term (message, i, end, id_prefix);

+/* Get document type */
+assert (strcmp (id_prefix, type_prefix) < 0);
+if (! NOTMUCH_TEST_BIT (message->lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST)) {
+   i.skip_to (type_prefix);
+   /* "T" is the prefix "type" fields.  See
+* BOOLEAN_PREFIX_INTERNAL. */
+   if (*i == "Tmail")
+   NOTMUCH_CLEAR_BIT (>flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+   else if (*i == "Tghost")
+   NOTMUCH_SET_BIT (>flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+   else
+   INTERNAL_ERROR ("Message without type term");
+   NOTMUCH_SET_BIT (>lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+}
+
 /* Get filename list.  Here we get only the terms.  We lazily
  * expand them to full file names when needed in
  * _notmuch_message_ensure_filename_list. */
-assert (strcmp (id_prefix, filename_prefix) < 0);
+assert (strcmp (type_prefix, filename_prefix) < 0);
 if (!message->filename_term_list && !message->filename_list)
message->filename_term_list =
_notmuch_database_get_terms_with_prefix (message, i, end,
@@ -371,6 +391,11 @@ _notmuch_message_invalidate_metadata (notmuch_message_t 
*message,
message->tag_list = NULL;
 }

+if (strcmp ("type", prefix_name) == 0) {
+   NOTMUCH_CLEAR_BIT (>flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+   NOTMUCH_CLEAR_BIT (>lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+}
+
 if (strcmp ("file-direntry", prefix_name) == 0) {
talloc_free (message->filename_term_list);
talloc_free (message->filename_list);
@@ -869,6 +894,10 @@ notmuch_bool_t
 notmuch_message_get_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag)
 {
+if (flag == NOTMUCH_MESSAGE_FLAG_GHOST &&
+   ! NOTMUCH_TEST_BIT (message->lazy_flags, flag))
+   _notmuch_message_ensure_metadata (message);
+
 return NOTMUCH_TEST_BIT (message->flags, flag);
 }

@@ -880,6 +909,7 @@ notmuch_message_set_flag (notmuch_message_t *message,
NOTMUCH_SET_BIT (>flags, flag);
 else
NOTMUCH_CLEAR_BIT (>flags, flag);
+NOTMUCH_SET_BIT (>lazy_flags, flag);
 }

 time_t
@@ -989,6 +1019,24 @@ _notmuch_message_delete (notmuch_message_t *message)
 return NOTMUCH_STATUS_SUCCESS;
 }

+/* Transform a blank message into a ghost message.  The caller must
+ * _notmuch_message_sync the message. */
+notmuch_private_status_t
+_notmuch_message_initialize_ghost (notmuch_message_t *message,
+  const char *thread_id)
+{
+notmuch_private_status_t status;
+
+status = _notmuch_message_add_term (message, "type", 

[PATCH v3 3/9] lib: Introduce macros for bit operations

2014-10-23 Thread Austin Clements
These macros help clarify basic bit-twiddling code and are written to
be robust against C undefined behavior of shift operators.
---
 lib/message.cc|  6 +++---
 lib/notmuch-private.h | 11 +++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 38bc929..55d2ff6 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -869,7 +869,7 @@ notmuch_bool_t
 notmuch_message_get_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag)
 {
-return message->flags & (1 << flag);
+return NOTMUCH_TEST_BIT (message->flags, flag);
 }

 void
@@ -877,9 +877,9 @@ notmuch_message_set_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag, notmuch_bool_t enable)
 {
 if (enable)
-   message->flags |= (1 << flag);
+   NOTMUCH_SET_BIT (>flags, flag);
 else
-   message->flags &= ~(1 << flag);
+   NOTMUCH_CLEAR_BIT (>flags, flag);
 }

 time_t
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 36cc12b..7250291 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -63,6 +63,17 @@ NOTMUCH_BEGIN_DECLS
 #define STRNCMP_LITERAL(var, literal) \
 strncmp ((var), (literal), sizeof (literal) - 1)

+/* Robust bit test/set/reset macros */
+#define NOTMUCH_TEST_BIT(val, bit) \
+((bit < 0 || bit >= CHAR_BIT * sizeof (unsigned long long)) ? 0\
+ : !!((val) & (1ull << bit)))
+#define NOTMUCH_SET_BIT(valp, bit) \
+((bit < 0 || bit >= CHAR_BIT * sizeof (unsigned long long)) ? *(valp) \
+ : (*(valp) |= (1ull << bit)))
+#define NOTMUCH_CLEAR_BIT(valp,  bit) \
+((bit < 0 || bit >= CHAR_BIT * sizeof (unsigned long long)) ? *(valp) \
+ : (*(valp) &= ~(1ull << bit)))
+
 #define unused(x) x __attribute__ ((unused))

 #ifdef __cplusplus
-- 
2.1.0



[PATCH v3 2/9] lib: Update database schema doc for ghost messages

2014-10-23 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This describes the structure of ghost mail documents.  Ghost messages
are not yet implemented.
---
 lib/database.cc | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 8fd7fad..c641bcd 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -50,8 +50,8 @@ typedef struct {

 /* Here's the current schema for our database (for NOTMUCH_DATABASE_VERSION):
  *
- * We currently have two different types of documents (mail and
- * directory) and also some metadata.
+ * We currently have three different types of documents (mail, ghost,
+ * and directory) and also some metadata.
  *
  * Mail document
  * -
@@ -109,6 +109,15 @@ typedef struct {
  *
  * The data portion of a mail document is empty.
  *
+ * Ghost mail document [if NOTMUCH_FEATURE_GHOSTS]
+ * ---
+ * A ghost mail document is like a mail document, but where we don't
+ * have the message content.  These are used to track thread reference
+ * information for messages we haven't received.
+ *
+ * A ghost mail document has type: ghost; id and thread fields that
+ * are identical to the mail document fields; and a MESSAGE_ID value.
+ *
  * Directory document
  * --
  * A directory document is used by a client of the notmuch library to
@@ -172,6 +181,13 @@ typedef struct {
  * generated is 1 and the value will be
  * incremented for each thread ID.
  *
+ * Obsolete metadata
+ * -
+ *
+ * If ! NOTMUCH_FEATURE_GHOSTS, there are no ghost mail documents.
+ * Instead, the database has the following additional database
+ * metadata:
+ *
  * thread_id_* A pre-allocated thread ID for a particular
  * message. This is actually an arbitrarily large
  * family of metadata name. Any particular name is
-- 
2.1.0



[PATCH v3 1/9] lib: Add a ghost messages database feature

2014-10-23 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This will be implemented over the next several patches.  The feature
is not yet "enabled" (this does not add it to
NOTMUCH_FEATURES_CURRENT).
---
 lib/database-private.h | 7 +++
 lib/database.cc| 2 ++
 2 files changed, 9 insertions(+)

diff --git a/lib/database-private.h b/lib/database-private.h
index ca0751c..e2e4bc8 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -85,6 +85,13 @@ enum _notmuch_features {
  *
  * Introduced: version 2. */
 NOTMUCH_FEATURE_BOOL_FOLDER = 1 << 3,
+
+/* If set, missing messages are stored in ghost mail documents.
+ * If unset, thread IDs of ghost messages are stored as database
+ * metadata instead of in ghost documents.
+ *
+ * Introduced: version 3. */
+NOTMUCH_FEATURE_GHOSTS = 1 << 4,
 };

 /* In C++, a named enum is its own type, so define bitwise operators
diff --git a/lib/database.cc b/lib/database.cc
index 1c6ffc5..8fd7fad 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -286,6 +286,8 @@ static const struct {
   "from/subject/message-ID in database", "w" },
 { NOTMUCH_FEATURE_BOOL_FOLDER,
   "exact folder:/path: search", "rw" },
+{ NOTMUCH_FEATURE_GHOSTS,
+  "mail documents for missing messages", "w"},
 };

 const char *
-- 
2.1.0



[PATCH v3 0/9] Add ghost messages and fix thread linking

2014-10-23 Thread Austin Clements
This is v3 of
id:1412637438-4821-1-git-send-email-aclements at csail.mit.edu.
This fixes some comments, as suggested by Mark.  There are no
code changes relative to v2.  The diff from v2 is below.

diff --git a/lib/database.cc b/lib/database.cc
index 6e51a72..3601f9d 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2121,10 +2121,13 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
 /* Given a blank or ghost 'message' and its corresponding
  * 'message_file' link it to existing threads in the database.
  *
- * The first check is in the metadata of the database to see if we
- * have pre-allocated a thread_id in advance for this message, (which
- * would have happened if a message was previously added that
- * referenced this one).
+ * First, if is_ghost, this retrieves the thread ID already stored in
+ * the message (which will be the case if a message was previously
+ * added that referenced this one).  If the message is blank
+ * (!is_ghost), it doesn't have a thread ID yet (we'll generate one
+ * later in this function).  If the database does not support ghost
+ * messages, this checks for a thread ID stored in database metadata
+ * for this message ID.
  *
  * Second, we look at 'message_file' and its link-relevant headers
  * (References and In-Reply-To) for message IDs.
@@ -2132,12 +2135,12 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
  * Finally, we look in the database for existing message that
  * reference 'message'.
  *
- * In all cases, we assign to the current message the first thread_id
- * found (through either parent or child). We will also merge any
- * existing, distinct threads where this message belongs to both,
- * (which is not uncommon when messages are processed out of order).
+ * In all cases, we assign to the current message the first thread ID
+ * found. We will also merge any existing, distinct threads where this
+ * message belongs to both, (which is not uncommon when messages are
+ * processed out of order).
  *
- * Finally, if no thread ID has been found through parent or child, we
+ * Finally, if no thread ID has been found through referenced messages, we
  * call _notmuch_message_generate_thread_id to generate a new thread
  * ID. This should only happen for new, top-level messages, (no
  * References or In-Reply-To header in this message, and no previously



[PATCH v3 0/9] Add ghost messages and fix thread linking

2014-10-23 Thread Austin Clements
This is v3 of
id:1412637438-4821-1-git-send-email-acleme...@csail.mit.edu.
This fixes some comments, as suggested by Mark.  There are no
code changes relative to v2.  The diff from v2 is below.

diff --git a/lib/database.cc b/lib/database.cc
index 6e51a72..3601f9d 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2121,10 +2121,13 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
 /* Given a blank or ghost 'message' and its corresponding
  * 'message_file' link it to existing threads in the database.
  *
- * The first check is in the metadata of the database to see if we
- * have pre-allocated a thread_id in advance for this message, (which
- * would have happened if a message was previously added that
- * referenced this one).
+ * First, if is_ghost, this retrieves the thread ID already stored in
+ * the message (which will be the case if a message was previously
+ * added that referenced this one).  If the message is blank
+ * (!is_ghost), it doesn't have a thread ID yet (we'll generate one
+ * later in this function).  If the database does not support ghost
+ * messages, this checks for a thread ID stored in database metadata
+ * for this message ID.
  *
  * Second, we look at 'message_file' and its link-relevant headers
  * (References and In-Reply-To) for message IDs.
@@ -2132,12 +2135,12 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
  * Finally, we look in the database for existing message that
  * reference 'message'.
  *
- * In all cases, we assign to the current message the first thread_id
- * found (through either parent or child). We will also merge any
- * existing, distinct threads where this message belongs to both,
- * (which is not uncommon when messages are processed out of order).
+ * In all cases, we assign to the current message the first thread ID
+ * found. We will also merge any existing, distinct threads where this
+ * message belongs to both, (which is not uncommon when messages are
+ * processed out of order).
  *
- * Finally, if no thread ID has been found through parent or child, we
+ * Finally, if no thread ID has been found through referenced messages, we
  * call _notmuch_message_generate_thread_id to generate a new thread
  * ID. This should only happen for new, top-level messages, (no
  * References or In-Reply-To header in this message, and no previously

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 3/9] lib: Introduce macros for bit operations

2014-10-23 Thread Austin Clements
These macros help clarify basic bit-twiddling code and are written to
be robust against C undefined behavior of shift operators.
---
 lib/message.cc|  6 +++---
 lib/notmuch-private.h | 11 +++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 38bc929..55d2ff6 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -869,7 +869,7 @@ notmuch_bool_t
 notmuch_message_get_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag)
 {
-return message-flags  (1  flag);
+return NOTMUCH_TEST_BIT (message-flags, flag);
 }
 
 void
@@ -877,9 +877,9 @@ notmuch_message_set_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag, notmuch_bool_t enable)
 {
 if (enable)
-   message-flags |= (1  flag);
+   NOTMUCH_SET_BIT (message-flags, flag);
 else
-   message-flags = ~(1  flag);
+   NOTMUCH_CLEAR_BIT (message-flags, flag);
 }
 
 time_t
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 36cc12b..7250291 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -63,6 +63,17 @@ NOTMUCH_BEGIN_DECLS
 #define STRNCMP_LITERAL(var, literal) \
 strncmp ((var), (literal), sizeof (literal) - 1)
 
+/* Robust bit test/set/reset macros */
+#define NOTMUCH_TEST_BIT(val, bit) \
+((bit  0 || bit = CHAR_BIT * sizeof (unsigned long long)) ? 0\
+ : !!((val)  (1ull  bit)))
+#define NOTMUCH_SET_BIT(valp, bit) \
+((bit  0 || bit = CHAR_BIT * sizeof (unsigned long long)) ? *(valp) \
+ : (*(valp) |= (1ull  bit)))
+#define NOTMUCH_CLEAR_BIT(valp,  bit) \
+((bit  0 || bit = CHAR_BIT * sizeof (unsigned long long)) ? *(valp) \
+ : (*(valp) = ~(1ull  bit)))
+
 #define unused(x) x __attribute__ ((unused))
 
 #ifdef __cplusplus
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 2/9] lib: Update database schema doc for ghost messages

2014-10-23 Thread Austin Clements
From: Austin Clements amdra...@mit.edu

This describes the structure of ghost mail documents.  Ghost messages
are not yet implemented.
---
 lib/database.cc | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 8fd7fad..c641bcd 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -50,8 +50,8 @@ typedef struct {
 
 /* Here's the current schema for our database (for NOTMUCH_DATABASE_VERSION):
  *
- * We currently have two different types of documents (mail and
- * directory) and also some metadata.
+ * We currently have three different types of documents (mail, ghost,
+ * and directory) and also some metadata.
  *
  * Mail document
  * -
@@ -109,6 +109,15 @@ typedef struct {
  *
  * The data portion of a mail document is empty.
  *
+ * Ghost mail document [if NOTMUCH_FEATURE_GHOSTS]
+ * ---
+ * A ghost mail document is like a mail document, but where we don't
+ * have the message content.  These are used to track thread reference
+ * information for messages we haven't received.
+ *
+ * A ghost mail document has type: ghost; id and thread fields that
+ * are identical to the mail document fields; and a MESSAGE_ID value.
+ *
  * Directory document
  * --
  * A directory document is used by a client of the notmuch library to
@@ -172,6 +181,13 @@ typedef struct {
  * generated is 1 and the value will be
  * incremented for each thread ID.
  *
+ * Obsolete metadata
+ * -
+ *
+ * If ! NOTMUCH_FEATURE_GHOSTS, there are no ghost mail documents.
+ * Instead, the database has the following additional database
+ * metadata:
+ *
  * thread_id_* A pre-allocated thread ID for a particular
  * message. This is actually an arbitrarily large
  * family of metadata name. Any particular name is
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 1/9] lib: Add a ghost messages database feature

2014-10-23 Thread Austin Clements
From: Austin Clements amdra...@mit.edu

This will be implemented over the next several patches.  The feature
is not yet enabled (this does not add it to
NOTMUCH_FEATURES_CURRENT).
---
 lib/database-private.h | 7 +++
 lib/database.cc| 2 ++
 2 files changed, 9 insertions(+)

diff --git a/lib/database-private.h b/lib/database-private.h
index ca0751c..e2e4bc8 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -85,6 +85,13 @@ enum _notmuch_features {
  *
  * Introduced: version 2. */
 NOTMUCH_FEATURE_BOOL_FOLDER = 1  3,
+
+/* If set, missing messages are stored in ghost mail documents.
+ * If unset, thread IDs of ghost messages are stored as database
+ * metadata instead of in ghost documents.
+ *
+ * Introduced: version 3. */
+NOTMUCH_FEATURE_GHOSTS = 1  4,
 };
 
 /* In C++, a named enum is its own type, so define bitwise operators
diff --git a/lib/database.cc b/lib/database.cc
index 1c6ffc5..8fd7fad 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -286,6 +286,8 @@ static const struct {
   from/subject/message-ID in database, w },
 { NOTMUCH_FEATURE_BOOL_FOLDER,
   exact folder:/path: search, rw },
+{ NOTMUCH_FEATURE_GHOSTS,
+  mail documents for missing messages, w},
 };
 
 const char *
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 6/9] lib: Implement upgrade to ghost messages feature

2014-10-23 Thread Austin Clements
From: Austin Clements amdra...@mit.edu

Somehow this is the first upgrade pass that actually does *any* error
checking, so this also adds the bit of necessary infrastructure to
handle that.
---
 lib/database.cc | 66 +++--
 1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 92a92d9..b673718 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1231,6 +1231,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
 notmuch_bool_t timer_is_active = FALSE;
 enum _notmuch_features target_features, new_features;
 notmuch_status_t status;
+notmuch_private_status_t private_status;
 unsigned int count = 0, total = 0;
 
 status = _notmuch_database_ensure_writable (notmuch);
@@ -1275,6 +1276,13 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
for (t = db-allterms_begin (XTIMESTAMP); t != t_end; t++)
++total;
 }
+if (new_features  NOTMUCH_FEATURE_GHOSTS) {
+   /* The ghost message upgrade converts all thread_id_*
+* metadata values into ghost message documents. */
+   t_end = db-metadata_keys_end (thread_id_);
+   for (t = db-metadata_keys_begin (thread_id_); t != t_end; ++t)
+   ++total;
+}
 
 /* Perform the upgrade in a transaction. */
 db-begin_transaction (true);
@@ -1378,10 +1386,64 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
}
 }
 
+/* Perform metadata upgrades. */
+
+/* Prior to NOTMUCH_FEATURE_GHOSTS, thread IDs for missing
+ * messages were stored as database metadata. Change these to
+ * ghost messages.
+ */
+if (new_features  NOTMUCH_FEATURE_GHOSTS) {
+   notmuch_message_t *message;
+   std::string message_id, thread_id;
+
+   t_end = db-metadata_keys_end (NOTMUCH_METADATA_THREAD_ID_PREFIX);
+   for (t = db-metadata_keys_begin (NOTMUCH_METADATA_THREAD_ID_PREFIX);
+t != t_end; ++t) {
+   if (do_progress_notify) {
+   progress_notify (closure, (double) count / total);
+   do_progress_notify = 0;
+   }
+
+   message_id = (*t).substr (
+   strlen (NOTMUCH_METADATA_THREAD_ID_PREFIX));
+   thread_id = db-get_metadata (*t);
+
+   /* Create ghost message */
+   message = _notmuch_message_create_for_message_id (
+   notmuch, message_id.c_str (), private_status);
+   if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+   /* Document already exists; ignore the stored thread ID */
+   } else if (private_status ==
+  NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   private_status = _notmuch_message_initialize_ghost (
+   message, thread_id.c_str ());
+   if (! private_status)
+   _notmuch_message_sync (message);
+   }
+
+   if (private_status) {
+   fprintf (stderr,
+Upgrade failed while creating ghost messages.\n);
+   status = COERCE_STATUS (private_status, Unexpected status from 
_notmuch_message_initialize_ghost);
+   goto DONE;
+   }
+
+   /* Clear saved metadata thread ID */
+   db-set_metadata (*t, );
+
+   ++count;
+   }
+}
+
+status = NOTMUCH_STATUS_SUCCESS;
 db-set_metadata (features, _print_features (local, notmuch-features));
 db-set_metadata (version, STRINGIFY (NOTMUCH_DATABASE_VERSION));
 
-db-commit_transaction ();
+ DONE:
+if (status == NOTMUCH_STATUS_SUCCESS)
+   db-commit_transaction ();
+else
+   db-cancel_transaction ();
 
 if (timer_is_active) {
/* Now stop the timer. */
@@ -1397,7 +1459,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
 }
 
 talloc_free (local);
-return NOTMUCH_STATUS_SUCCESS;
+return status;
 }
 
 notmuch_status_t
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 8/9] test: Test upgrade to ghost messages feature

2014-10-23 Thread Austin Clements
---
 test/T530-upgrade.sh | 21 +
 1 file changed, 21 insertions(+)

diff --git a/test/T530-upgrade.sh b/test/T530-upgrade.sh
index c4c4ac8..6b42a69 100755
--- a/test/T530-upgrade.sh
+++ b/test/T530-upgrade.sh
@@ -116,4 +116,25 @@ MAIL_DIR/bar/new/21:2,
 MAIL_DIR/bar/new/22:2,
 MAIL_DIR/cur/51:2,
 
+# Ghost messages are difficult to test since they're nearly invisible.
+# However, if the upgrade works correctly, the ghost message should
+# retain the right thread ID even if all of the original messages in
+# the thread are deleted.  That's what we test.  This won't detect if
+# the upgrade just plain didn't happen, but it should detect if
+# something went wrong.
+test_begin_subtest ghost message retains thread ID
+# Upgrade database
+notmuch new
+# Get thread ID of real message
+thread=$(notmuch search --output=threads id:4efc743a.3060...@april.org)
+# Delete all real messages in that thread
+rm $(notmuch search --output=files $thread)
+notmuch new
+# Deliver ghost message
+add_message '[subject]=Ghost' '[id]=4efc3931.6030...@april.org'
+# If the ghost upgrade worked, the new message should be attached to
+# the existing thread ID.
+nthread=$(notmuch search --output=threads id:4efc3931.6030...@april.org)
+test_expect_equal $thread $nthread
+
 test_done
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 4/9] lib: Internal support for querying and creating ghost messages

2014-10-23 Thread Austin Clements
From: Austin Clements amdra...@mit.edu

This updates the message abstraction to support ghost messages: it
adds a message flag that distinguishes regular messages from ghost
messages, and an internal function for initializing a newly created
(blank) message as a ghost message.
---
 lib/message.cc| 52 +--
 lib/notmuch-private.h |  4 
 lib/notmuch.h |  9 -
 3 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 55d2ff6..a7a13cc 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -39,6 +39,9 @@ struct visible _notmuch_message {
 notmuch_message_file_t *message_file;
 notmuch_message_list_t *replies;
 unsigned long flags;
+/* For flags that are initialized on-demand, lazy_flags indicates
+ * if each flag has been initialized. */
+unsigned long lazy_flags;
 
 Xapian::Document doc;
 Xapian::termcount termpos;
@@ -99,6 +102,7 @@ _notmuch_message_create_for_document (const void 
*talloc_owner,
 
 message-frozen = 0;
 message-flags = 0;
+message-lazy_flags = 0;
 
 /* Each of these will be lazily created as needed. */
 message-message_id = NULL;
@@ -192,7 +196,7 @@ _notmuch_message_create (const void *talloc_owner,
  *
  * There is already a document with message ID 'message_id' in the
  * database. The returned message can be used to query/modify the
- * document.
+ * document. The message may be a ghost message.
  *
  *   NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND:
  *
@@ -305,6 +309,7 @@ _notmuch_message_ensure_metadata (notmuch_message_t 
*message)
 const char *thread_prefix = _find_prefix (thread),
*tag_prefix = _find_prefix (tag),
*id_prefix = _find_prefix (id),
+   *type_prefix = _find_prefix (type),
*filename_prefix = _find_prefix (file-direntry),
*replyto_prefix = _find_prefix (replyto);
 
@@ -337,10 +342,25 @@ _notmuch_message_ensure_metadata (notmuch_message_t 
*message)
message-message_id =
_notmuch_message_get_term (message, i, end, id_prefix);
 
+/* Get document type */
+assert (strcmp (id_prefix, type_prefix)  0);
+if (! NOTMUCH_TEST_BIT (message-lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST)) {
+   i.skip_to (type_prefix);
+   /* T is the prefix type fields.  See
+* BOOLEAN_PREFIX_INTERNAL. */
+   if (*i == Tmail)
+   NOTMUCH_CLEAR_BIT (message-flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+   else if (*i == Tghost)
+   NOTMUCH_SET_BIT (message-flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+   else
+   INTERNAL_ERROR (Message without type term);
+   NOTMUCH_SET_BIT (message-lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+}
+
 /* Get filename list.  Here we get only the terms.  We lazily
  * expand them to full file names when needed in
  * _notmuch_message_ensure_filename_list. */
-assert (strcmp (id_prefix, filename_prefix)  0);
+assert (strcmp (type_prefix, filename_prefix)  0);
 if (!message-filename_term_list  !message-filename_list)
message-filename_term_list =
_notmuch_database_get_terms_with_prefix (message, i, end,
@@ -371,6 +391,11 @@ _notmuch_message_invalidate_metadata (notmuch_message_t 
*message,
message-tag_list = NULL;
 }
 
+if (strcmp (type, prefix_name) == 0) {
+   NOTMUCH_CLEAR_BIT (message-flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+   NOTMUCH_CLEAR_BIT (message-lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+}
+
 if (strcmp (file-direntry, prefix_name) == 0) {
talloc_free (message-filename_term_list);
talloc_free (message-filename_list);
@@ -869,6 +894,10 @@ notmuch_bool_t
 notmuch_message_get_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag)
 {
+if (flag == NOTMUCH_MESSAGE_FLAG_GHOST 
+   ! NOTMUCH_TEST_BIT (message-lazy_flags, flag))
+   _notmuch_message_ensure_metadata (message);
+
 return NOTMUCH_TEST_BIT (message-flags, flag);
 }
 
@@ -880,6 +909,7 @@ notmuch_message_set_flag (notmuch_message_t *message,
NOTMUCH_SET_BIT (message-flags, flag);
 else
NOTMUCH_CLEAR_BIT (message-flags, flag);
+NOTMUCH_SET_BIT (message-lazy_flags, flag);
 }
 
 time_t
@@ -989,6 +1019,24 @@ _notmuch_message_delete (notmuch_message_t *message)
 return NOTMUCH_STATUS_SUCCESS;
 }
 
+/* Transform a blank message into a ghost message.  The caller must
+ * _notmuch_message_sync the message. */
+notmuch_private_status_t
+_notmuch_message_initialize_ghost (notmuch_message_t *message,
+  const char *thread_id)
+{
+notmuch_private_status_t status;
+
+status = _notmuch_message_add_term (message, type, ghost);
+if (status)
+   return status;
+status = _notmuch_message_add_term (message, thread, thread_id);
+if (status)
+   return status;
+
+return NOTMUCH_PRIVATE_STATUS_SUCCESS;
+}
+
 /* Ensure

[PATCH v3 5/9] lib: Implement ghost-based thread linking

2014-10-23 Thread Austin Clements
From: Austin Clements amdra...@mit.edu

This updates the thread linking code to use ghost messages instead of
user metadata to link messages into threads.

In contrast with the old approach, this is actually correct.
Previously, thread merging updated only the thread IDs of message
documents, not thread IDs stored in user metadata.  As originally
diagnosed by Mark Walters [1] and as demonstrated by the broken
T260-thread-order test, this can cause notmuch to fail to link
messages even though they're in the same thread.  In principle the old
approach could have been fixed by updating the user metadata thread
IDs as well, but these are not indexed and hence this would have
required a full scan of all stored thread IDs.  Ghost messages solve
this problem naturally by reusing the exact same thread ID and message
ID representation and indexing as regular messages.

Furthermore, thanks to this greater symmetry, ghost messages are also
algorithmically simpler.  We continue to support the old user metadata
format, so this patch can't delete any code, but when we do remove
support for the old format, several functions can simply be deleted.

[1] id:8738h7kv2q@qmul.ac.uk
---
 lib/database.cc | 99 +++--
 1 file changed, 83 insertions(+), 16 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index c641bcd..92a92d9 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1752,6 +1752,12 @@ _get_metadata_thread_id_key (void *ctx, const char 
*message_id)
message_id);
 }
 
+static notmuch_status_t
+_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch,
+ void *ctx,
+ const char *message_id,
+ const char **thread_id_ret);
+
 /* Find the thread ID to which the message with 'message_id' belongs.
  *
  * Note: 'thread_id_ret' must not be NULL!
@@ -1760,9 +1766,9 @@ _get_metadata_thread_id_key (void *ctx, const char 
*message_id)
  *
  * Note: If there is no message in the database with the given
  * 'message_id' then a new thread_id will be allocated for this
- * message and stored in the database metadata, (where this same
+ * message ID and stored in the database metadata so that the
  * thread ID can be looked up if the message is added to the database
- * later).
+ * later.
  */
 static notmuch_status_t
 _resolve_message_id_to_thread_id (notmuch_database_t *notmuch,
@@ -1770,6 +1776,49 @@ _resolve_message_id_to_thread_id (notmuch_database_t 
*notmuch,
  const char *message_id,
  const char **thread_id_ret)
 {
+notmuch_private_status_t status;
+notmuch_message_t *message;
+
+if (! (notmuch-features  NOTMUCH_FEATURE_GHOSTS))
+   return _resolve_message_id_to_thread_id_old (notmuch, ctx, message_id,
+thread_id_ret);
+
+/* Look for this message (regular or ghost) */
+message = _notmuch_message_create_for_message_id (
+   notmuch, message_id, status);
+if (status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+   /* Message exists */
+   *thread_id_ret = talloc_steal (
+   ctx, notmuch_message_get_thread_id (message));
+} else if (status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   /* Message did not exist.  Give it a fresh thread ID and
+* populate this message as a ghost message. */
+   *thread_id_ret = talloc_strdup (
+   ctx, _notmuch_database_generate_thread_id (notmuch));
+   if (! *thread_id_ret) {
+   status = NOTMUCH_PRIVATE_STATUS_OUT_OF_MEMORY;
+   } else {
+   status = _notmuch_message_initialize_ghost (message, 
*thread_id_ret);
+   if (status == 0)
+   /* Commit the new ghost message */
+   _notmuch_message_sync (message);
+   }
+} else {
+   /* Create failed. Fall through. */
+}
+
+notmuch_message_destroy (message);
+
+return COERCE_STATUS (status, Error creating ghost message);
+}
+
+/* Pre-ghost messages _resolve_message_id_to_thread_id */
+static notmuch_status_t
+_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch,
+ void *ctx,
+ const char *message_id,
+ const char **thread_id_ret)
+{
 notmuch_status_t status;
 notmuch_message_t *message;
 string thread_id_string;
@@ -2007,13 +2056,16 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
 }
 }
 
-/* Given a (mostly empty) 'message' and its corresponding
+/* Given a blank or ghost 'message' and its corresponding
  * 'message_file' link it to existing threads in the database.
  *
- * The first check is in the metadata of the database to see if we
- * have pre-allocated a thread_id in advance for this message, (which
- * would have

[PATCH v3 7/9] lib: Enable ghost messages feature

2014-10-23 Thread Austin Clements
From: Austin Clements amdra...@mit.edu

This fixes the broken thread order test.
---
 lib/database-private.h| 2 +-
 test/T260-thread-order.sh | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/database-private.h b/lib/database-private.h
index e2e4bc8..15e03cc 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -166,7 +166,7 @@ struct _notmuch_database {
  * databases will have it). */
 #define NOTMUCH_FEATURES_CURRENT \
 (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
- NOTMUCH_FEATURE_BOOL_FOLDER)
+ NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS)
 
 /* Return the list of terms from the given iterator matching a prefix.
  * The prefix will be stripped from the strings in the returned list.
diff --git a/test/T260-thread-order.sh b/test/T260-thread-order.sh
index b435d79..99f5833 100755
--- a/test/T260-thread-order.sh
+++ b/test/T260-thread-order.sh
@@ -30,7 +30,6 @@ expected=$(for ((i = 0; i  $nthreads; i++)); do
 test_expect_equal $output $expected
 
 test_begin_subtest Messages with all parents get linked in all delivery 
orders
-test_subtest_known_broken
 # Here we do the same thing as the previous test, but each message
 # references all of its parents.  Since every message references the
 # root of the thread, each thread should always be fully joined.  This
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 9/9] lib: Remove unnecessary thread linking steps when using ghost messages

2014-10-23 Thread Austin Clements
From: Austin Clements amdra...@mit.edu

Previously, it was necessary to link new messages to children to work
around some (though not all) problems with the old metadata-based
approach to stored thread IDs.  With ghost messages, this is no longer
necessary, so don't bother with child linking when ghost messages are
in use.
---
 lib/database.cc | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index b673718..3601f9d 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2136,11 +2136,11 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
  * reference 'message'.
  *
  * In all cases, we assign to the current message the first thread ID
- * found (through either parent or child). We will also merge any
- * existing, distinct threads where this message belongs to both,
- * (which is not uncommon when messages are processed out of order).
+ * found. We will also merge any existing, distinct threads where this
+ * message belongs to both, (which is not uncommon when messages are
+ * processed out of order).
  *
- * Finally, if no thread ID has been found through parent or child, we
+ * Finally, if no thread ID has been found through referenced messages, we
  * call _notmuch_message_generate_thread_id to generate a new thread
  * ID. This should only happen for new, top-level messages, (no
  * References or In-Reply-To header in this message, and no previously
@@ -2172,10 +2172,23 @@ _notmuch_database_link_message (notmuch_database_t 
*notmuch,
 if (status)
goto DONE;
 
-status = _notmuch_database_link_message_to_children (notmuch, message,
-thread_id);
-if (status)
-   goto DONE;
+if (! (notmuch-features  NOTMUCH_FEATURE_GHOSTS)) {
+   /* In general, it shouldn't be necessary to link children,
+* since the earlier indexing of those children will have
+* stored a thread ID for the missing parent.  However, prior
+* to ghost messages, these stored thread IDs were NOT
+* rewritten during thread merging (and there was no
+* performant way to do so), so if indexed children were
+* pulled into a different thread ID by a merge, it was
+* necessary to pull them *back* into the stored thread ID of
+* the parent.  With ghost messages, we just rewrite the
+* stored thread IDs during merging, so this workaround isn't
+* necessary. */
+   status = _notmuch_database_link_message_to_children (notmuch, message,
+thread_id);
+   if (status)
+   goto DONE;
+}
 
 /* If not part of any existing thread, generate a new thread ID. */
 if (thread_id == NULL) {
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v2 12/12] lib: Remove unnecessary thread linking steps when using ghost messages

2014-10-21 Thread Austin Clements
Quoth Mark Walters on Oct 22 at 12:17 am:
> On Tue, 07 Oct 2014, Austin Clements  wrote:
> > From: Austin Clements 
> >
> > Previously, it was necessary to link new messages to children to work
> > around some (though not all) problems with the old metadata-based
> > approach to stored thread IDs.  With ghost messages, this is no longer
> > necessary, so don't bother with child linking when ghost messages are
> > in use.
> > ---
> >  lib/database.cc | 21 +
> >  1 file changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/database.cc b/lib/database.cc
> > index 1316529..6e51a72 100644
> > --- a/lib/database.cc
> > +++ b/lib/database.cc
> > @@ -2169,10 +2169,23 @@ _notmuch_database_link_message (notmuch_database_t 
> > *notmuch,
> >  if (status)
> > goto DONE;
> >  
> > -status = _notmuch_database_link_message_to_children (notmuch, message,
> > -_id);
> > -if (status)
> > -   goto DONE;
> > +if (! (notmuch->features & NOTMUCH_FEATURE_GHOSTS)) {
> > +   /* In general, it shouldn't be necessary to link children,
> > +* since the earlier indexing of those children will have
> > +* stored a thread ID for the missing parent.  However, prior
> > +* to ghost messages, these stored thread IDs were NOT
> > +* rewritten during thread merging (and there was no
> > +* performant way to do so), so if indexed children were
> > +* pulled into a different thread ID by a merge, it was
> > +* necessary to pull them *back* into the stored thread ID of
> > +* the parent.  With ghost messages, we just rewrite the
> > +* stored thread IDs during merging, so this workaround isn't
> > +* necessary. */
> > +   status = _notmuch_database_link_message_to_children (notmuch, message,
> > +_id);
> > +   if (status)
> > +   goto DONE;
> > +}
> 
> Ok so this basically answers my earlier comment. It might be worth
> updating the big comment at the start of the function to match the new
> code though.

Like so?

diff --git a/lib/database.cc b/lib/database.cc
index d063ec8..3601f9d 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2136,11 +2136,11 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
  * reference 'message'.
  *
  * In all cases, we assign to the current message the first thread ID
- * found (through either parent or child). We will also merge any
- * existing, distinct threads where this message belongs to both,
- * (which is not uncommon when messages are processed out of order).
+ * found. We will also merge any existing, distinct threads where this
+ * message belongs to both, (which is not uncommon when messages are
+ * processed out of order).
  *
- * Finally, if no thread ID has been found through parent or child, we
+ * Finally, if no thread ID has been found through referenced messages, we
  * call _notmuch_message_generate_thread_id to generate a new thread
  * ID. This should only happen for new, top-level messages, (no
  * References or In-Reply-To header in this message, and no previously


> Best wishes
> 
> Mark
> 
> >  
> >  /* If not part of any existing thread, generate a new thread ID. */
> >  if (thread_id == NULL) {
> >
> > ___
> > notmuch mailing list
> > notmuch at notmuchmail.org
> > http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v2 08/12] lib: Implement ghost-based thread linking

2014-10-21 Thread Austin Clements
Quoth Mark Walters on Oct 22 at 12:10 am:
> On Tue, 07 Oct 2014, Austin Clements  wrote:
> > From: Austin Clements 
> >
> > This updates the thread linking code to use ghost messages instead of
> > user metadata to link messages into threads.
> >
> > In contrast with the old approach, this is actually correct.
> > Previously, thread merging updated only the thread IDs of message
> > documents, not thread IDs stored in user metadata.  As originally
> > diagnosed by Mark Walters [1] and as demonstrated by the broken
> > T260-thread-order test, this can cause notmuch to fail to link
> > messages even though they're in the same thread.  In principle the old
> > approach could have been fixed by updating the user metadata thread
> > IDs as well, but these are not indexed and hence this would have
> > required a full scan of all stored thread IDs.  Ghost messages solve
> > this problem naturally by reusing the exact same thread ID and message
> > ID representation and indexing as regular messages.
> >
> > Furthermore, thanks to this greater symmetry, ghost messages are also
> > algorithmically simpler.  We continue to support the old user metadata
> > format, so this patch can't delete any code, but when we do remove
> > support for the old format, several functions can simply be deleted.
> >
> > [1] id:8738h7kv2q.fsf at qmul.ac.uk
> > ---
> >  lib/database.cc | 86 
> > +
> >  1 file changed, 75 insertions(+), 11 deletions(-)
> >
> > diff --git a/lib/database.cc b/lib/database.cc
> > index c641bcd..fdcc526 100644
> > --- a/lib/database.cc
> > +++ b/lib/database.cc
> > @@ -1752,6 +1752,12 @@ _get_metadata_thread_id_key (void *ctx, const char 
> > *message_id)
> > message_id);
> >  }
> >  
> > +static notmuch_status_t
> > +_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch,
> > + void *ctx,
> > + const char *message_id,
> > + const char **thread_id_ret);
> > +
> >  /* Find the thread ID to which the message with 'message_id' belongs.
> >   *
> >   * Note: 'thread_id_ret' must not be NULL!
> > @@ -1760,9 +1766,9 @@ _get_metadata_thread_id_key (void *ctx, const char 
> > *message_id)
> >   *
> >   * Note: If there is no message in the database with the given
> >   * 'message_id' then a new thread_id will be allocated for this
> > - * message and stored in the database metadata, (where this same
> > + * message ID and stored in the database metadata so that the
> >   * thread ID can be looked up if the message is added to the database
> > - * later).
> > + * later.
> >   */
> >  static notmuch_status_t
> >  _resolve_message_id_to_thread_id (notmuch_database_t *notmuch,
> > @@ -1770,6 +1776,49 @@ _resolve_message_id_to_thread_id (notmuch_database_t 
> > *notmuch,
> >   const char *message_id,
> >   const char **thread_id_ret)
> >  {
> > +notmuch_private_status_t status;
> > +notmuch_message_t *message;
> > +
> > +if (! (notmuch->features & NOTMUCH_FEATURE_GHOSTS))
> > +   return _resolve_message_id_to_thread_id_old (notmuch, ctx, message_id,
> > +thread_id_ret);
> > +
> > +/* Look for this message (regular or ghost) */
> > +message = _notmuch_message_create_for_message_id (
> > +   notmuch, message_id, );
> > +if (status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
> > +   /* Message exists */
> > +   *thread_id_ret = talloc_steal (
> > +   ctx, notmuch_message_get_thread_id (message));
> > +} else if (status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
> > +   /* Message did not exist.  Give it a fresh thread ID and
> > +* populate this message as a ghost message. */
> > +   *thread_id_ret = talloc_strdup (
> > +   ctx, _notmuch_database_generate_thread_id (notmuch));
> > +   if (! *thread_id_ret) {
> > +   status = NOTMUCH_PRIVATE_STATUS_OUT_OF_MEMORY;
> > +   } else {
> > +   status = _notmuch_message_initialize_ghost (message, 
> > *thread_id_ret);
> > +   if (status == 0)
> > +   /* Commit the new ghost message */
> > +   _notmuch_message_sync (message);
> > +   }
> > +} else {
> > +   /* Create failed. Fall through. */
> > +}
> > +
> > +notmuch_m

[PATCH v2 07/12] lib: Internal support for querying and creating ghost messages

2014-10-21 Thread Austin Clements
Quoth Mark Walters on Oct 22 at 12:05 am:
> 
> Hi 
> 
> I am slowly working my way through this series: only two trivial queries
> so far.
> 
> On Tue, 07 Oct 2014, Austin Clements  wrote:
> > From: Austin Clements 
> >
> > This updates the message abstraction to support ghost messages: it
> > adds a message flag that distinguishes regular messages from ghost
> > messages, and an internal function for initializing a newly created
> > (blank) message as a ghost message.
> > ---
> >  lib/message.cc| 52 
> > +--
> >  lib/notmuch-private.h |  4 
> >  lib/notmuch.h |  9 -
> >  3 files changed, 62 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/message.cc b/lib/message.cc
> > index 55d2ff6..a7a13cc 100644
> > --- a/lib/message.cc
> > +++ b/lib/message.cc
> > @@ -39,6 +39,9 @@ struct visible _notmuch_message {
> >  notmuch_message_file_t *message_file;
> >  notmuch_message_list_t *replies;
> >  unsigned long flags;
> > +/* For flags that are initialized on-demand, lazy_flags indicates
> > + * if each flag has been initialized. */
> > +unsigned long lazy_flags;
> 
> I wonder if valid_flags might be better here as, as far as I can see,
> the reason for these is so we can invalidate a flag more than an
> optimisation (which is what I thought the lazy implied).

I do think of this as an optimization.  If we were to compute the
value of this flag when a message was created (and keep it
up-to-date), there would be no need for lazy_flags.  But, unlike the
other flags, computing this is expensive.

> >  
> >  Xapian::Document doc;
> >  Xapian::termcount termpos;
> > @@ -99,6 +102,7 @@ _notmuch_message_create_for_document (const void 
> > *talloc_owner,
> >  
> >  message->frozen = 0;
> >  message->flags = 0;
> > +message->lazy_flags = 0;
> >  
> >  /* Each of these will be lazily created as needed. */
> >  message->message_id = NULL;
> > @@ -192,7 +196,7 @@ _notmuch_message_create (const void *talloc_owner,
> >   *
> >   * There is already a document with message ID 'message_id' in the
> >   * database. The returned message can be used to query/modify the
> > - * document.
> > + * document. The message may be a ghost message.
> >   *
> >   *   NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND:
> >   *
> > @@ -305,6 +309,7 @@ _notmuch_message_ensure_metadata (notmuch_message_t 
> > *message)
> >  const char *thread_prefix = _find_prefix ("thread"),
> > *tag_prefix = _find_prefix ("tag"),
> > *id_prefix = _find_prefix ("id"),
> > +   *type_prefix = _find_prefix ("type"),
> > *filename_prefix = _find_prefix ("file-direntry"),
> > *replyto_prefix = _find_prefix ("replyto");
> >  
> > @@ -337,10 +342,25 @@ _notmuch_message_ensure_metadata (notmuch_message_t 
> > *message)
> > message->message_id =
> > _notmuch_message_get_term (message, i, end, id_prefix);
> >  
> > +/* Get document type */
> > +assert (strcmp (id_prefix, type_prefix) < 0);
> > +if (! NOTMUCH_TEST_BIT (message->lazy_flags, 
> > NOTMUCH_MESSAGE_FLAG_GHOST)) {
> > +   i.skip_to (type_prefix);
> > +   /* "T" is the prefix "type" fields.  See
> > +* BOOLEAN_PREFIX_INTERNAL. */
> > +   if (*i == "Tmail")
> > +   NOTMUCH_CLEAR_BIT (>flags, NOTMUCH_MESSAGE_FLAG_GHOST);
> > +   else if (*i == "Tghost")
> > +   NOTMUCH_SET_BIT (>flags, NOTMUCH_MESSAGE_FLAG_GHOST);
> > +   else
> > +   INTERNAL_ERROR ("Message without type term");
> > +   NOTMUCH_SET_BIT (>lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST);
> > +}
> > +
> >  /* Get filename list.  Here we get only the terms.  We lazily
> >   * expand them to full file names when needed in
> >   * _notmuch_message_ensure_filename_list. */
> > -assert (strcmp (id_prefix, filename_prefix) < 0);
> > +assert (strcmp (type_prefix, filename_prefix) < 0);
> >  if (!message->filename_term_list && !message->filename_list)
> > message->filename_term_list =
> > _notmuch_database_get_terms_with_prefix (message, i, end,
> > @@ -371,6 +391,11 @@ _notmuch_message_invalidate_metadata 
> > (notmuch_message_t *message,
> > message->tag_list = NULL;
> >  }
> >  
> >

Re: [PATCH v2 07/12] lib: Internal support for querying and creating ghost messages

2014-10-21 Thread Austin Clements
Quoth Mark Walters on Oct 22 at 12:05 am:
 
 Hi 
 
 I am slowly working my way through this series: only two trivial queries
 so far.
 
 On Tue, 07 Oct 2014, Austin Clements acleme...@csail.mit.edu wrote:
  From: Austin Clements amdra...@mit.edu
 
  This updates the message abstraction to support ghost messages: it
  adds a message flag that distinguishes regular messages from ghost
  messages, and an internal function for initializing a newly created
  (blank) message as a ghost message.
  ---
   lib/message.cc| 52 
  +--
   lib/notmuch-private.h |  4 
   lib/notmuch.h |  9 -
   3 files changed, 62 insertions(+), 3 deletions(-)
 
  diff --git a/lib/message.cc b/lib/message.cc
  index 55d2ff6..a7a13cc 100644
  --- a/lib/message.cc
  +++ b/lib/message.cc
  @@ -39,6 +39,9 @@ struct visible _notmuch_message {
   notmuch_message_file_t *message_file;
   notmuch_message_list_t *replies;
   unsigned long flags;
  +/* For flags that are initialized on-demand, lazy_flags indicates
  + * if each flag has been initialized. */
  +unsigned long lazy_flags;
 
 I wonder if valid_flags might be better here as, as far as I can see,
 the reason for these is so we can invalidate a flag more than an
 optimisation (which is what I thought the lazy implied).

I do think of this as an optimization.  If we were to compute the
value of this flag when a message was created (and keep it
up-to-date), there would be no need for lazy_flags.  But, unlike the
other flags, computing this is expensive.

   
   Xapian::Document doc;
   Xapian::termcount termpos;
  @@ -99,6 +102,7 @@ _notmuch_message_create_for_document (const void 
  *talloc_owner,
   
   message-frozen = 0;
   message-flags = 0;
  +message-lazy_flags = 0;
   
   /* Each of these will be lazily created as needed. */
   message-message_id = NULL;
  @@ -192,7 +196,7 @@ _notmuch_message_create (const void *talloc_owner,
*
* There is already a document with message ID 'message_id' in the
* database. The returned message can be used to query/modify the
  - * document.
  + * document. The message may be a ghost message.
*
*   NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND:
*
  @@ -305,6 +309,7 @@ _notmuch_message_ensure_metadata (notmuch_message_t 
  *message)
   const char *thread_prefix = _find_prefix (thread),
  *tag_prefix = _find_prefix (tag),
  *id_prefix = _find_prefix (id),
  +   *type_prefix = _find_prefix (type),
  *filename_prefix = _find_prefix (file-direntry),
  *replyto_prefix = _find_prefix (replyto);
   
  @@ -337,10 +342,25 @@ _notmuch_message_ensure_metadata (notmuch_message_t 
  *message)
  message-message_id =
  _notmuch_message_get_term (message, i, end, id_prefix);
   
  +/* Get document type */
  +assert (strcmp (id_prefix, type_prefix)  0);
  +if (! NOTMUCH_TEST_BIT (message-lazy_flags, 
  NOTMUCH_MESSAGE_FLAG_GHOST)) {
  +   i.skip_to (type_prefix);
  +   /* T is the prefix type fields.  See
  +* BOOLEAN_PREFIX_INTERNAL. */
  +   if (*i == Tmail)
  +   NOTMUCH_CLEAR_BIT (message-flags, NOTMUCH_MESSAGE_FLAG_GHOST);
  +   else if (*i == Tghost)
  +   NOTMUCH_SET_BIT (message-flags, NOTMUCH_MESSAGE_FLAG_GHOST);
  +   else
  +   INTERNAL_ERROR (Message without type term);
  +   NOTMUCH_SET_BIT (message-lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST);
  +}
  +
   /* Get filename list.  Here we get only the terms.  We lazily
* expand them to full file names when needed in
* _notmuch_message_ensure_filename_list. */
  -assert (strcmp (id_prefix, filename_prefix)  0);
  +assert (strcmp (type_prefix, filename_prefix)  0);
   if (!message-filename_term_list  !message-filename_list)
  message-filename_term_list =
  _notmuch_database_get_terms_with_prefix (message, i, end,
  @@ -371,6 +391,11 @@ _notmuch_message_invalidate_metadata 
  (notmuch_message_t *message,
  message-tag_list = NULL;
   }
   
  +if (strcmp (type, prefix_name) == 0) {
  +   NOTMUCH_CLEAR_BIT (message-flags, NOTMUCH_MESSAGE_FLAG_GHOST);
  +   NOTMUCH_CLEAR_BIT (message-lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST);
  +}
  +
   if (strcmp (file-direntry, prefix_name) == 0) {
  talloc_free (message-filename_term_list);
  talloc_free (message-filename_list);
  @@ -869,6 +894,10 @@ notmuch_bool_t
   notmuch_message_get_flag (notmuch_message_t *message,
notmuch_message_flag_t flag)
   {
  +if (flag == NOTMUCH_MESSAGE_FLAG_GHOST 
  +   ! NOTMUCH_TEST_BIT (message-lazy_flags, flag))
  +   _notmuch_message_ensure_metadata (message);
  +
   return NOTMUCH_TEST_BIT (message-flags, flag);
   }
   
  @@ -880,6 +909,7 @@ notmuch_message_set_flag (notmuch_message_t *message,
  NOTMUCH_SET_BIT (message-flags, flag);
   else
  NOTMUCH_CLEAR_BIT (message-flags, flag

Re: [PATCH v2 08/12] lib: Implement ghost-based thread linking

2014-10-21 Thread Austin Clements
Quoth Mark Walters on Oct 22 at 12:10 am:
 On Tue, 07 Oct 2014, Austin Clements acleme...@csail.mit.edu wrote:
  From: Austin Clements amdra...@mit.edu
 
  This updates the thread linking code to use ghost messages instead of
  user metadata to link messages into threads.
 
  In contrast with the old approach, this is actually correct.
  Previously, thread merging updated only the thread IDs of message
  documents, not thread IDs stored in user metadata.  As originally
  diagnosed by Mark Walters [1] and as demonstrated by the broken
  T260-thread-order test, this can cause notmuch to fail to link
  messages even though they're in the same thread.  In principle the old
  approach could have been fixed by updating the user metadata thread
  IDs as well, but these are not indexed and hence this would have
  required a full scan of all stored thread IDs.  Ghost messages solve
  this problem naturally by reusing the exact same thread ID and message
  ID representation and indexing as regular messages.
 
  Furthermore, thanks to this greater symmetry, ghost messages are also
  algorithmically simpler.  We continue to support the old user metadata
  format, so this patch can't delete any code, but when we do remove
  support for the old format, several functions can simply be deleted.
 
  [1] id:8738h7kv2q@qmul.ac.uk
  ---
   lib/database.cc | 86 
  +
   1 file changed, 75 insertions(+), 11 deletions(-)
 
  diff --git a/lib/database.cc b/lib/database.cc
  index c641bcd..fdcc526 100644
  --- a/lib/database.cc
  +++ b/lib/database.cc
  @@ -1752,6 +1752,12 @@ _get_metadata_thread_id_key (void *ctx, const char 
  *message_id)
  message_id);
   }
   
  +static notmuch_status_t
  +_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch,
  + void *ctx,
  + const char *message_id,
  + const char **thread_id_ret);
  +
   /* Find the thread ID to which the message with 'message_id' belongs.
*
* Note: 'thread_id_ret' must not be NULL!
  @@ -1760,9 +1766,9 @@ _get_metadata_thread_id_key (void *ctx, const char 
  *message_id)
*
* Note: If there is no message in the database with the given
* 'message_id' then a new thread_id will be allocated for this
  - * message and stored in the database metadata, (where this same
  + * message ID and stored in the database metadata so that the
* thread ID can be looked up if the message is added to the database
  - * later).
  + * later.
*/
   static notmuch_status_t
   _resolve_message_id_to_thread_id (notmuch_database_t *notmuch,
  @@ -1770,6 +1776,49 @@ _resolve_message_id_to_thread_id (notmuch_database_t 
  *notmuch,
const char *message_id,
const char **thread_id_ret)
   {
  +notmuch_private_status_t status;
  +notmuch_message_t *message;
  +
  +if (! (notmuch-features  NOTMUCH_FEATURE_GHOSTS))
  +   return _resolve_message_id_to_thread_id_old (notmuch, ctx, message_id,
  +thread_id_ret);
  +
  +/* Look for this message (regular or ghost) */
  +message = _notmuch_message_create_for_message_id (
  +   notmuch, message_id, status);
  +if (status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
  +   /* Message exists */
  +   *thread_id_ret = talloc_steal (
  +   ctx, notmuch_message_get_thread_id (message));
  +} else if (status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
  +   /* Message did not exist.  Give it a fresh thread ID and
  +* populate this message as a ghost message. */
  +   *thread_id_ret = talloc_strdup (
  +   ctx, _notmuch_database_generate_thread_id (notmuch));
  +   if (! *thread_id_ret) {
  +   status = NOTMUCH_PRIVATE_STATUS_OUT_OF_MEMORY;
  +   } else {
  +   status = _notmuch_message_initialize_ghost (message, 
  *thread_id_ret);
  +   if (status == 0)
  +   /* Commit the new ghost message */
  +   _notmuch_message_sync (message);
  +   }
  +} else {
  +   /* Create failed. Fall through. */
  +}
  +
  +notmuch_message_destroy (message);
  +
  +return COERCE_STATUS (status, Error creating ghost message);
  +}
  +
  +/* Pre-ghost messages _resolve_message_id_to_thread_id */
  +static notmuch_status_t
  +_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch,
  + void *ctx,
  + const char *message_id,
  + const char **thread_id_ret)
  +{
   notmuch_status_t status;
   notmuch_message_t *message;
   string thread_id_string;
  @@ -2007,7 +2056,7 @@ _consume_metadata_thread_id (void *ctx, 
  notmuch_database_t *notmuch,
   }
   }
   
  -/* Given a (mostly empty) 'message' and its corresponding
  +/* Given a blank or ghost 'message' and its

Re: [PATCH v2 12/12] lib: Remove unnecessary thread linking steps when using ghost messages

2014-10-21 Thread Austin Clements
Quoth Mark Walters on Oct 22 at 12:17 am:
 On Tue, 07 Oct 2014, Austin Clements acleme...@csail.mit.edu wrote:
  From: Austin Clements amdra...@mit.edu
 
  Previously, it was necessary to link new messages to children to work
  around some (though not all) problems with the old metadata-based
  approach to stored thread IDs.  With ghost messages, this is no longer
  necessary, so don't bother with child linking when ghost messages are
  in use.
  ---
   lib/database.cc | 21 +
   1 file changed, 17 insertions(+), 4 deletions(-)
 
  diff --git a/lib/database.cc b/lib/database.cc
  index 1316529..6e51a72 100644
  --- a/lib/database.cc
  +++ b/lib/database.cc
  @@ -2169,10 +2169,23 @@ _notmuch_database_link_message (notmuch_database_t 
  *notmuch,
   if (status)
  goto DONE;
   
  -status = _notmuch_database_link_message_to_children (notmuch, message,
  -thread_id);
  -if (status)
  -   goto DONE;
  +if (! (notmuch-features  NOTMUCH_FEATURE_GHOSTS)) {
  +   /* In general, it shouldn't be necessary to link children,
  +* since the earlier indexing of those children will have
  +* stored a thread ID for the missing parent.  However, prior
  +* to ghost messages, these stored thread IDs were NOT
  +* rewritten during thread merging (and there was no
  +* performant way to do so), so if indexed children were
  +* pulled into a different thread ID by a merge, it was
  +* necessary to pull them *back* into the stored thread ID of
  +* the parent.  With ghost messages, we just rewrite the
  +* stored thread IDs during merging, so this workaround isn't
  +* necessary. */
  +   status = _notmuch_database_link_message_to_children (notmuch, message,
  +thread_id);
  +   if (status)
  +   goto DONE;
  +}
 
 Ok so this basically answers my earlier comment. It might be worth
 updating the big comment at the start of the function to match the new
 code though.

Like so?

diff --git a/lib/database.cc b/lib/database.cc
index d063ec8..3601f9d 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2136,11 +2136,11 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
  * reference 'message'.
  *
  * In all cases, we assign to the current message the first thread ID
- * found (through either parent or child). We will also merge any
- * existing, distinct threads where this message belongs to both,
- * (which is not uncommon when messages are processed out of order).
+ * found. We will also merge any existing, distinct threads where this
+ * message belongs to both, (which is not uncommon when messages are
+ * processed out of order).
  *
- * Finally, if no thread ID has been found through parent or child, we
+ * Finally, if no thread ID has been found through referenced messages, we
  * call _notmuch_message_generate_thread_id to generate a new thread
  * ID. This should only happen for new, top-level messages, (no
  * References or In-Reply-To header in this message, and no previously


 Best wishes
 
 Mark
 
   
   /* If not part of any existing thread, generate a new thread ID. */
   if (thread_id == NULL) {
 
  ___
  notmuch mailing list
  notmuch@notmuchmail.org
  http://notmuchmail.org/mailman/listinfo/notmuch
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[WIP PATCH 4/4] lib: Add "lastmod:" queries for filtering by last modification

2014-10-13 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

XXX Includes reference to notmuch search --db-revision, which doesn't
exist.
---
 doc/man7/notmuch-search-terms.rst | 8 
 lib/database-private.h| 1 +
 lib/database.cc   | 4 
 3 files changed, 13 insertions(+)

diff --git a/doc/man7/notmuch-search-terms.rst 
b/doc/man7/notmuch-search-terms.rst
index 1acdaa0..df76e39 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -52,6 +52,8 @@ indicate user-supplied values):

 -  date:..

+-  lastmod:..
+
 The **from:** prefix is used to match the name or address of the sender
 of an email message.

@@ -118,6 +120,12 @@ The time range can also be specified using timestamps with 
a syntax of:
 Each timestamp is a number representing the number of seconds since
 1970-01-01 00:00:00 UTC.

+The **lastmod:** prefix can be used to restrict the result by the
+database revision number of when messages were last modified (tags
+were added/removed or filenames changed).  This is usually used in
+conjunction with the **--db-revision** argument to **notmuch search**
+to find messages that have changed since an earlier query.
+
 In addition to individual terms, multiple terms can be combined with
 Boolean operators ( **and**, **or**, **not** , etc.). Each term in the
 query will be implicitly connected by a logical AND if no explicit
diff --git a/lib/database-private.h b/lib/database-private.h
index 0977229..cbca1de 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -163,6 +163,7 @@ struct _notmuch_database {
 Xapian::TermGenerator *term_gen;
 Xapian::ValueRangeProcessor *value_range_processor;
 Xapian::ValueRangeProcessor *date_range_processor;
+Xapian::ValueRangeProcessor *last_mod_range_processor;
 };

 /* Prior to database version 3, features were implied by the database
diff --git a/lib/database.cc b/lib/database.cc
index 9bec170..f9aa45d 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -913,6 +913,7 @@ notmuch_database_open (const char *path,
notmuch->term_gen->set_stemmer (Xapian::Stem ("english"));
notmuch->value_range_processor = new Xapian::NumberValueRangeProcessor 
(NOTMUCH_VALUE_TIMESTAMP);
notmuch->date_range_processor = new ParseTimeValueRangeProcessor 
(NOTMUCH_VALUE_TIMESTAMP);
+   notmuch->last_mod_range_processor = new 
Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_LAST_MOD, "lastmod:");

notmuch->query_parser->set_default_op (Xapian::Query::OP_AND);
notmuch->query_parser->set_database (*notmuch->xapian_db);
@@ -920,6 +921,7 @@ notmuch_database_open (const char *path,
notmuch->query_parser->set_stemming_strategy 
(Xapian::QueryParser::STEM_SOME);
notmuch->query_parser->add_valuerangeprocessor 
(notmuch->value_range_processor);
notmuch->query_parser->add_valuerangeprocessor 
(notmuch->date_range_processor);
+   notmuch->query_parser->add_valuerangeprocessor 
(notmuch->last_mod_range_processor);

for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
prefix_t *prefix = _PREFIX_EXTERNAL[i];
@@ -991,6 +993,8 @@ notmuch_database_close (notmuch_database_t *notmuch)
 notmuch->value_range_processor = NULL;
 delete notmuch->date_range_processor;
 notmuch->date_range_processor = NULL;
+delete notmuch->last_mod_range_processor;
+notmuch->last_mod_range_processor = NULL;

 return status;
 }
-- 
2.1.0



[WIP PATCH 3/4] lib: API to retrieve database revision and UUID

2014-10-13 Thread Austin Clements
This exposes the committed database revision to library users along
with a UUID that can be used to detect when revision numbers are no
longer comparable (e.g., because the database has been replaced).
---
 lib/database-private.h |  1 +
 lib/database.cc| 11 +++
 lib/notmuch.h  | 18 ++
 3 files changed, 30 insertions(+)

diff --git a/lib/database-private.h b/lib/database-private.h
index 465065d..0977229 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -157,6 +157,7 @@ struct _notmuch_database {
  * under a higher revision number, which can be generated with
  * notmuch_database_new_revision. */
 unsigned long revision;
+const char *uuid;

 Xapian::QueryParser *query_parser;
 Xapian::TermGenerator *term_gen;
diff --git a/lib/database.cc b/lib/database.cc
index 45d32ab..9bec170 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -905,6 +905,8 @@ notmuch_database_open (const char *path,
notmuch->revision = 0;
else
notmuch->revision = Xapian::sortable_unserialise (last_mod);
+   notmuch->uuid = talloc_strdup (
+   notmuch, notmuch->xapian_db->get_uuid ().c_str ());

notmuch->query_parser = new Xapian::QueryParser;
notmuch->term_gen = new Xapian::TermGenerator;
@@ -1562,6 +1564,15 @@ DONE:
 return NOTMUCH_STATUS_SUCCESS;
 }

+unsigned long
+notmuch_database_get_revisison (notmuch_database_t *notmuch,
+   const char **uuid)
+{
+if (*uuid)
+   *uuid = notmuch->uuid;
+return notmuch->revision;
+}
+
 /* We allow the user to use arbitrarily long paths for directories. But
  * we have a term-length limit. So if we exceed that, we'll use the
  * SHA-1 of the path for the database term.
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 92594b9..898f7b9 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -433,6 +433,24 @@ notmuch_status_t
 notmuch_database_end_atomic (notmuch_database_t *notmuch);

 /**
+ * Return the committed database revision and UUID.
+ *
+ * The database revision number increases monotonically with each
+ * commit to the database.  Hence, all messages and message changes
+ * committed to the database (that is, visible to readers) have a last
+ * modification revision <= the committed database revision.  Any
+ * messages committed in the future will be assigned a modification
+ * revision > the committed database revision.
+ *
+ * The UUID is a NUL-terminated opaque string that uniquely identifies
+ * this database.  Two revision numbers are only comparable if they
+ * have the same database UUID.
+ */
+unsigned long
+notmuch_database_get_revisison (notmuch_database_t *notmuch,
+   const char **uuid);
+
+/**
  * Retrieve a directory object from the database for 'path'.
  *
  * Here, 'path' should be a path relative to the path of 'database'
-- 
2.1.0



[WIP PATCH 2/4] lib: Add per-message last modification tracking

2014-10-13 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This adds a new document value that stores the revision of the last
modification to message metadata, where the revision number increases
monotonically with each database commit.

An alternative would be to store the wall-clock time of the last
modification of each message.  In principle this is simpler and has
the advantage that any process can determine the current timestamp
without support from libnotmuch.  However, even assuming a computer's
clock never goes backward and ignoring clock skew in networked
environments, this has a fatal flaw.  Xapian uses (optimistic)
snapshot isolation, which means reads can be concurrent with writes.
Given this, consider the following time line with a write and two read
transactions:

   write  |-X-A--|
   read 1   |---B---|
   read 2  |---|

The write transaction modifies message X and records the wall-clock
time of the modification at A.  The writer hangs around for a while
and later commits its change.  Read 1 is concurrent with the write, so
it doesn't see the change to X.  It does some query and records the
wall-clock time of its results at B.  Transaction read 2 later starts
after the write commits and queries for changes since wall-clock time
B (say the reads are performing an incremental backup).  Even though
read 1 could not see the change to X, read 2 is told (correctly) that
X has not changed since B, the time of the last read.  In fact, X
changed before wall-clock time A, but the change was not visible until
*after* wall-clock time B, so read 2 misses the change to X.

This is tricky to solve in full-blown snapshot isolation, but because
Xapian serializes writes, we can use a simple, monotonically
increasing database revision number.  Furthermore, maintaining this
revision number requires no more IO than a wall-clock time solution
because Xapian already maintains statistics on the upper (and lower)
bound of each value stream.
---
 lib/database-private.h | 15 ++-
 lib/database.cc| 49 +++--
 lib/message.cc | 22 ++
 lib/notmuch-private.h  | 10 +-
 4 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/lib/database-private.h b/lib/database-private.h
index 15e03cc..465065d 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -92,6 +92,12 @@ enum _notmuch_features {
  *
  * Introduced: version 3. */
 NOTMUCH_FEATURE_GHOSTS = 1 << 4,
+
+/* If set, messages store the revision number of the last
+ * modification in NOTMUCH_VALUE_LAST_MOD.
+ *
+ * Introduced: version 3. */
+NOTMUCH_FEATURE_LAST_MOD = 1 << 5,
 };

 /* In C++, a named enum is its own type, so define bitwise operators
@@ -137,6 +143,8 @@ struct _notmuch_database {

 notmuch_database_mode_t mode;
 int atomic_nesting;
+/* TRUE if changes have been made in this atomic section */
+notmuch_bool_t atomic_dirty;
 Xapian::Database *xapian_db;

 /* Bit mask of features used by this database.  This is a
@@ -145,6 +153,10 @@ struct _notmuch_database {

 unsigned int last_doc_id;
 uint64_t last_thread_id;
+/* Highest committed revision number.  Modifications are recorded
+ * under a higher revision number, which can be generated with
+ * notmuch_database_new_revision. */
+unsigned long revision;

 Xapian::QueryParser *query_parser;
 Xapian::TermGenerator *term_gen;
@@ -166,7 +178,8 @@ struct _notmuch_database {
  * databases will have it). */
 #define NOTMUCH_FEATURES_CURRENT \
 (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
- NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS)
+ NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS | \
+ NOTMUCH_FEATURE_LAST_MOD)

 /* Return the list of terms from the given iterator matching a prefix.
  * The prefix will be stripped from the strings in the returned list.
diff --git a/lib/database.cc b/lib/database.cc
index 6e51a72..45d32ab 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -101,6 +101,9 @@ typedef struct {
  *
  * SUBJECT:The value of the "Subject" header
  *
+ * LAST_MOD:   The revision number as of the last tag or
+ * filename change.
+ *
  * In addition, terms from the content of the message are added with
  * "from", "to", "attachment", and "subject" prefixes for use by the
  * user in searching. Similarly, terms from the path of the mail
@@ -304,6 +307,8 @@ static const struct {
   "exact folder:/path: search", "rw" },
 { NOTMUCH_FEATURE_GHOSTS,
   "mail documents for missing messages", "w"},
+{ NOTMUCH_FEATURE_LAST_MOD,
+  "modification tracking", "w"},
 };

 const char *
@@ -678,6 +683,23 @@ _notmuch_database_ensure_writable (notmuch_dat

[WIP PATCH 1/4] lib: Only sync modified message documents

2014-10-13 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

Previously, we updated the database copy of a message on every call to
_notmuch_message_sync, even if nothing had changed.  In particular,
this always happens on a thaw, so a freeze/thaw pair with no
modifications between still caused a database update.

We only modify message documents in a handful of places, so keep track
of whether the document has been modified and only sync it when
necessary.  This will be particularly important when we add message
revision tracking.
---
 lib/message.cc | 12 
 1 file changed, 12 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index a7a13cc..cf2fd7c 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -43,6 +43,9 @@ struct visible _notmuch_message {
  * if each flag has been initialized. */
 unsigned long lazy_flags;

+/* Message document modified since last sync */
+notmuch_bool_t modified;
+
 Xapian::Document doc;
 Xapian::termcount termpos;
 };
@@ -538,6 +541,7 @@ _notmuch_message_remove_terms (notmuch_message_t *message, 
const char *prefix)

try {
message->doc.remove_term ((*i));
+   message->modified = TRUE;
} catch (const Xapian::InvalidArgumentError) {
/* Ignore failure to remove non-existent term. */
}
@@ -791,6 +795,7 @@ void
 _notmuch_message_clear_data (notmuch_message_t *message)
 {
 message->doc.set_data ("");
+message->modified = TRUE;
 }

 static void
@@ -988,6 +993,7 @@ _notmuch_message_set_header_values (notmuch_message_t 
*message,
Xapian::sortable_serialise (time_value));
 message->doc.add_value (NOTMUCH_VALUE_FROM, from);
 message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
+message->modified = TRUE;
 }

 /* Synchronize changes made to message->doc out into the database. */
@@ -999,8 +1005,12 @@ _notmuch_message_sync (notmuch_message_t *message)
 if (message->notmuch->mode == NOTMUCH_DATABASE_MODE_READ_ONLY)
return;

+if (! message->modified)
+   return;
+
 db = static_cast  
(message->notmuch->xapian_db);
 db->replace_document (message->doc_id, message->doc);
+message->modified = FALSE;
 }

 /* Delete a message document from the database. */
@@ -1075,6 +1085,7 @@ _notmuch_message_add_term (notmuch_message_t *message,
return NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG;

 message->doc.add_term (term, 0);
+message->modified = TRUE;

 talloc_free (term);

@@ -1143,6 +1154,7 @@ _notmuch_message_remove_term (notmuch_message_t *message,

 try {
message->doc.remove_term (term);
+   message->modified = TRUE;
 } catch (const Xapian::InvalidArgumentError) {
/* We'll let the philosopher's try to wrestle with the
 * question of whether failing to remove that which was not
-- 
2.1.0



[WIP PATCH 0/4] Add message revision tracking

2014-10-13 Thread Austin Clements
This implements message revision tracking.  This is definitely a
work-in-progress, but I wanted to post it since I don't know when I'll
be able to work on it next (and maybe someone else can run with it in
the mean time).  I think this makes all of the necessary library-side
changes, but doesn't do anything in the CLI to expose current revision
information other than adding support for a "lastmod" query.

This series applies on top of the ghost message series, but only
because of trivial conflicts in the lists of features in database.cc
and database-private.h.  There's no code dependency.



[WIP PATCH 0/4] Add message revision tracking

2014-10-13 Thread Austin Clements
This implements message revision tracking.  This is definitely a
work-in-progress, but I wanted to post it since I don't know when I'll
be able to work on it next (and maybe someone else can run with it in
the mean time).  I think this makes all of the necessary library-side
changes, but doesn't do anything in the CLI to expose current revision
information other than adding support for a lastmod query.

This series applies on top of the ghost message series, but only
because of trivial conflicts in the lists of features in database.cc
and database-private.h.  There's no code dependency.

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[WIP PATCH 3/4] lib: API to retrieve database revision and UUID

2014-10-13 Thread Austin Clements
This exposes the committed database revision to library users along
with a UUID that can be used to detect when revision numbers are no
longer comparable (e.g., because the database has been replaced).
---
 lib/database-private.h |  1 +
 lib/database.cc| 11 +++
 lib/notmuch.h  | 18 ++
 3 files changed, 30 insertions(+)

diff --git a/lib/database-private.h b/lib/database-private.h
index 465065d..0977229 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -157,6 +157,7 @@ struct _notmuch_database {
  * under a higher revision number, which can be generated with
  * notmuch_database_new_revision. */
 unsigned long revision;
+const char *uuid;
 
 Xapian::QueryParser *query_parser;
 Xapian::TermGenerator *term_gen;
diff --git a/lib/database.cc b/lib/database.cc
index 45d32ab..9bec170 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -905,6 +905,8 @@ notmuch_database_open (const char *path,
notmuch-revision = 0;
else
notmuch-revision = Xapian::sortable_unserialise (last_mod);
+   notmuch-uuid = talloc_strdup (
+   notmuch, notmuch-xapian_db-get_uuid ().c_str ());
 
notmuch-query_parser = new Xapian::QueryParser;
notmuch-term_gen = new Xapian::TermGenerator;
@@ -1562,6 +1564,15 @@ DONE:
 return NOTMUCH_STATUS_SUCCESS;
 }
 
+unsigned long
+notmuch_database_get_revisison (notmuch_database_t *notmuch,
+   const char **uuid)
+{
+if (*uuid)
+   *uuid = notmuch-uuid;
+return notmuch-revision;
+}
+
 /* We allow the user to use arbitrarily long paths for directories. But
  * we have a term-length limit. So if we exceed that, we'll use the
  * SHA-1 of the path for the database term.
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 92594b9..898f7b9 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -433,6 +433,24 @@ notmuch_status_t
 notmuch_database_end_atomic (notmuch_database_t *notmuch);
 
 /**
+ * Return the committed database revision and UUID.
+ *
+ * The database revision number increases monotonically with each
+ * commit to the database.  Hence, all messages and message changes
+ * committed to the database (that is, visible to readers) have a last
+ * modification revision = the committed database revision.  Any
+ * messages committed in the future will be assigned a modification
+ * revision  the committed database revision.
+ *
+ * The UUID is a NUL-terminated opaque string that uniquely identifies
+ * this database.  Two revision numbers are only comparable if they
+ * have the same database UUID.
+ */
+unsigned long
+notmuch_database_get_revisison (notmuch_database_t *notmuch,
+   const char **uuid);
+
+/**
  * Retrieve a directory object from the database for 'path'.
  *
  * Here, 'path' should be a path relative to the path of 'database'
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[WIP PATCH 4/4] lib: Add lastmod: queries for filtering by last modification

2014-10-13 Thread Austin Clements
From: Austin Clements amdra...@mit.edu

XXX Includes reference to notmuch search --db-revision, which doesn't
exist.
---
 doc/man7/notmuch-search-terms.rst | 8 
 lib/database-private.h| 1 +
 lib/database.cc   | 4 
 3 files changed, 13 insertions(+)

diff --git a/doc/man7/notmuch-search-terms.rst 
b/doc/man7/notmuch-search-terms.rst
index 1acdaa0..df76e39 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -52,6 +52,8 @@ indicate user-supplied values):
 
 -  date:since..until
 
+-  lastmod:since..until
+
 The **from:** prefix is used to match the name or address of the sender
 of an email message.
 
@@ -118,6 +120,12 @@ The time range can also be specified using timestamps with 
a syntax of:
 Each timestamp is a number representing the number of seconds since
 1970-01-01 00:00:00 UTC.
 
+The **lastmod:** prefix can be used to restrict the result by the
+database revision number of when messages were last modified (tags
+were added/removed or filenames changed).  This is usually used in
+conjunction with the **--db-revision** argument to **notmuch search**
+to find messages that have changed since an earlier query.
+
 In addition to individual terms, multiple terms can be combined with
 Boolean operators ( **and**, **or**, **not** , etc.). Each term in the
 query will be implicitly connected by a logical AND if no explicit
diff --git a/lib/database-private.h b/lib/database-private.h
index 0977229..cbca1de 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -163,6 +163,7 @@ struct _notmuch_database {
 Xapian::TermGenerator *term_gen;
 Xapian::ValueRangeProcessor *value_range_processor;
 Xapian::ValueRangeProcessor *date_range_processor;
+Xapian::ValueRangeProcessor *last_mod_range_processor;
 };
 
 /* Prior to database version 3, features were implied by the database
diff --git a/lib/database.cc b/lib/database.cc
index 9bec170..f9aa45d 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -913,6 +913,7 @@ notmuch_database_open (const char *path,
notmuch-term_gen-set_stemmer (Xapian::Stem (english));
notmuch-value_range_processor = new Xapian::NumberValueRangeProcessor 
(NOTMUCH_VALUE_TIMESTAMP);
notmuch-date_range_processor = new ParseTimeValueRangeProcessor 
(NOTMUCH_VALUE_TIMESTAMP);
+   notmuch-last_mod_range_processor = new 
Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_LAST_MOD, lastmod:);
 
notmuch-query_parser-set_default_op (Xapian::Query::OP_AND);
notmuch-query_parser-set_database (*notmuch-xapian_db);
@@ -920,6 +921,7 @@ notmuch_database_open (const char *path,
notmuch-query_parser-set_stemming_strategy 
(Xapian::QueryParser::STEM_SOME);
notmuch-query_parser-add_valuerangeprocessor 
(notmuch-value_range_processor);
notmuch-query_parser-add_valuerangeprocessor 
(notmuch-date_range_processor);
+   notmuch-query_parser-add_valuerangeprocessor 
(notmuch-last_mod_range_processor);
 
for (i = 0; i  ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
prefix_t *prefix = BOOLEAN_PREFIX_EXTERNAL[i];
@@ -991,6 +993,8 @@ notmuch_database_close (notmuch_database_t *notmuch)
 notmuch-value_range_processor = NULL;
 delete notmuch-date_range_processor;
 notmuch-date_range_processor = NULL;
+delete notmuch-last_mod_range_processor;
+notmuch-last_mod_range_processor = NULL;
 
 return status;
 }
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[WIP PATCH 2/4] lib: Add per-message last modification tracking

2014-10-13 Thread Austin Clements
From: Austin Clements amdra...@mit.edu

This adds a new document value that stores the revision of the last
modification to message metadata, where the revision number increases
monotonically with each database commit.

An alternative would be to store the wall-clock time of the last
modification of each message.  In principle this is simpler and has
the advantage that any process can determine the current timestamp
without support from libnotmuch.  However, even assuming a computer's
clock never goes backward and ignoring clock skew in networked
environments, this has a fatal flaw.  Xapian uses (optimistic)
snapshot isolation, which means reads can be concurrent with writes.
Given this, consider the following time line with a write and two read
transactions:

   write  |-X-A--|
   read 1   |---B---|
   read 2  |---|

The write transaction modifies message X and records the wall-clock
time of the modification at A.  The writer hangs around for a while
and later commits its change.  Read 1 is concurrent with the write, so
it doesn't see the change to X.  It does some query and records the
wall-clock time of its results at B.  Transaction read 2 later starts
after the write commits and queries for changes since wall-clock time
B (say the reads are performing an incremental backup).  Even though
read 1 could not see the change to X, read 2 is told (correctly) that
X has not changed since B, the time of the last read.  In fact, X
changed before wall-clock time A, but the change was not visible until
*after* wall-clock time B, so read 2 misses the change to X.

This is tricky to solve in full-blown snapshot isolation, but because
Xapian serializes writes, we can use a simple, monotonically
increasing database revision number.  Furthermore, maintaining this
revision number requires no more IO than a wall-clock time solution
because Xapian already maintains statistics on the upper (and lower)
bound of each value stream.
---
 lib/database-private.h | 15 ++-
 lib/database.cc| 49 +++--
 lib/message.cc | 22 ++
 lib/notmuch-private.h  | 10 +-
 4 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/lib/database-private.h b/lib/database-private.h
index 15e03cc..465065d 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -92,6 +92,12 @@ enum _notmuch_features {
  *
  * Introduced: version 3. */
 NOTMUCH_FEATURE_GHOSTS = 1  4,
+
+/* If set, messages store the revision number of the last
+ * modification in NOTMUCH_VALUE_LAST_MOD.
+ *
+ * Introduced: version 3. */
+NOTMUCH_FEATURE_LAST_MOD = 1  5,
 };
 
 /* In C++, a named enum is its own type, so define bitwise operators
@@ -137,6 +143,8 @@ struct _notmuch_database {
 
 notmuch_database_mode_t mode;
 int atomic_nesting;
+/* TRUE if changes have been made in this atomic section */
+notmuch_bool_t atomic_dirty;
 Xapian::Database *xapian_db;
 
 /* Bit mask of features used by this database.  This is a
@@ -145,6 +153,10 @@ struct _notmuch_database {
 
 unsigned int last_doc_id;
 uint64_t last_thread_id;
+/* Highest committed revision number.  Modifications are recorded
+ * under a higher revision number, which can be generated with
+ * notmuch_database_new_revision. */
+unsigned long revision;
 
 Xapian::QueryParser *query_parser;
 Xapian::TermGenerator *term_gen;
@@ -166,7 +178,8 @@ struct _notmuch_database {
  * databases will have it). */
 #define NOTMUCH_FEATURES_CURRENT \
 (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
- NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS)
+ NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS | \
+ NOTMUCH_FEATURE_LAST_MOD)
 
 /* Return the list of terms from the given iterator matching a prefix.
  * The prefix will be stripped from the strings in the returned list.
diff --git a/lib/database.cc b/lib/database.cc
index 6e51a72..45d32ab 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -101,6 +101,9 @@ typedef struct {
  *
  * SUBJECT:The value of the Subject header
  *
+ * LAST_MOD:   The revision number as of the last tag or
+ * filename change.
+ *
  * In addition, terms from the content of the message are added with
  * from, to, attachment, and subject prefixes for use by the
  * user in searching. Similarly, terms from the path of the mail
@@ -304,6 +307,8 @@ static const struct {
   exact folder:/path: search, rw },
 { NOTMUCH_FEATURE_GHOSTS,
   mail documents for missing messages, w},
+{ NOTMUCH_FEATURE_LAST_MOD,
+  modification tracking, w},
 };
 
 const char *
@@ -678,6 +683,23 @@ _notmuch_database_ensure_writable (notmuch_database_t 
*notmuch)
 return NOTMUCH_STATUS_SUCCESS;
 }
 
+/* Allocate a revision number for the next change. */
+unsigned long
+_notmuch_database_new_revision

[WIP PATCH 1/4] lib: Only sync modified message documents

2014-10-13 Thread Austin Clements
From: Austin Clements amdra...@mit.edu

Previously, we updated the database copy of a message on every call to
_notmuch_message_sync, even if nothing had changed.  In particular,
this always happens on a thaw, so a freeze/thaw pair with no
modifications between still caused a database update.

We only modify message documents in a handful of places, so keep track
of whether the document has been modified and only sync it when
necessary.  This will be particularly important when we add message
revision tracking.
---
 lib/message.cc | 12 
 1 file changed, 12 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index a7a13cc..cf2fd7c 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -43,6 +43,9 @@ struct visible _notmuch_message {
  * if each flag has been initialized. */
 unsigned long lazy_flags;
 
+/* Message document modified since last sync */
+notmuch_bool_t modified;
+
 Xapian::Document doc;
 Xapian::termcount termpos;
 };
@@ -538,6 +541,7 @@ _notmuch_message_remove_terms (notmuch_message_t *message, 
const char *prefix)
 
try {
message-doc.remove_term ((*i));
+   message-modified = TRUE;
} catch (const Xapian::InvalidArgumentError) {
/* Ignore failure to remove non-existent term. */
}
@@ -791,6 +795,7 @@ void
 _notmuch_message_clear_data (notmuch_message_t *message)
 {
 message-doc.set_data ();
+message-modified = TRUE;
 }
 
 static void
@@ -988,6 +993,7 @@ _notmuch_message_set_header_values (notmuch_message_t 
*message,
Xapian::sortable_serialise (time_value));
 message-doc.add_value (NOTMUCH_VALUE_FROM, from);
 message-doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
+message-modified = TRUE;
 }
 
 /* Synchronize changes made to message-doc out into the database. */
@@ -999,8 +1005,12 @@ _notmuch_message_sync (notmuch_message_t *message)
 if (message-notmuch-mode == NOTMUCH_DATABASE_MODE_READ_ONLY)
return;
 
+if (! message-modified)
+   return;
+
 db = static_cast Xapian::WritableDatabase * 
(message-notmuch-xapian_db);
 db-replace_document (message-doc_id, message-doc);
+message-modified = FALSE;
 }
 
 /* Delete a message document from the database. */
@@ -1075,6 +1085,7 @@ _notmuch_message_add_term (notmuch_message_t *message,
return NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG;
 
 message-doc.add_term (term, 0);
+message-modified = TRUE;
 
 talloc_free (term);
 
@@ -1143,6 +1154,7 @@ _notmuch_message_remove_term (notmuch_message_t *message,
 
 try {
message-doc.remove_term (term);
+   message-modified = TRUE;
 } catch (const Xapian::InvalidArgumentError) {
/* We'll let the philosopher's try to wrestle with the
 * question of whether failing to remove that which was not
-- 
2.1.0

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v2 12/12] lib: Remove unnecessary thread linking steps when using ghost messages

2014-10-06 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

Previously, it was necessary to link new messages to children to work
around some (though not all) problems with the old metadata-based
approach to stored thread IDs.  With ghost messages, this is no longer
necessary, so don't bother with child linking when ghost messages are
in use.
---
 lib/database.cc | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 1316529..6e51a72 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2169,10 +2169,23 @@ _notmuch_database_link_message (notmuch_database_t 
*notmuch,
 if (status)
goto DONE;

-status = _notmuch_database_link_message_to_children (notmuch, message,
-_id);
-if (status)
-   goto DONE;
+if (! (notmuch->features & NOTMUCH_FEATURE_GHOSTS)) {
+   /* In general, it shouldn't be necessary to link children,
+* since the earlier indexing of those children will have
+* stored a thread ID for the missing parent.  However, prior
+* to ghost messages, these stored thread IDs were NOT
+* rewritten during thread merging (and there was no
+* performant way to do so), so if indexed children were
+* pulled into a different thread ID by a merge, it was
+* necessary to pull them *back* into the stored thread ID of
+* the parent.  With ghost messages, we just rewrite the
+* stored thread IDs during merging, so this workaround isn't
+* necessary. */
+   status = _notmuch_database_link_message_to_children (notmuch, message,
+_id);
+   if (status)
+   goto DONE;
+}

 /* If not part of any existing thread, generate a new thread ID. */
 if (thread_id == NULL) {
-- 
2.1.0



[PATCH v2 11/12] test: Test upgrade to ghost messages feature

2014-10-06 Thread Austin Clements
---
 test/T530-upgrade.sh | 21 +
 1 file changed, 21 insertions(+)

diff --git a/test/T530-upgrade.sh b/test/T530-upgrade.sh
index c4c4ac8..6b42a69 100755
--- a/test/T530-upgrade.sh
+++ b/test/T530-upgrade.sh
@@ -116,4 +116,25 @@ MAIL_DIR/bar/new/21:2,
 MAIL_DIR/bar/new/22:2,
 MAIL_DIR/cur/51:2,"

+# Ghost messages are difficult to test since they're nearly invisible.
+# However, if the upgrade works correctly, the ghost message should
+# retain the right thread ID even if all of the original messages in
+# the thread are deleted.  That's what we test.  This won't detect if
+# the upgrade just plain didn't happen, but it should detect if
+# something went wrong.
+test_begin_subtest "ghost message retains thread ID"
+# Upgrade database
+notmuch new
+# Get thread ID of real message
+thread=$(notmuch search --output=threads id:4EFC743A.3060609 at april.org)
+# Delete all real messages in that thread
+rm $(notmuch search --output=files $thread)
+notmuch new
+# "Deliver" ghost message
+add_message '[subject]=Ghost' '[id]=4EFC3931.6030007 at april.org'
+# If the ghost upgrade worked, the new message should be attached to
+# the existing thread ID.
+nthread=$(notmuch search --output=threads id:4EFC3931.6030007 at april.org)
+test_expect_equal "$thread" "$nthread"
+
 test_done
-- 
2.1.0



[PATCH v2 10/12] lib: Enable ghost messages feature

2014-10-06 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This fixes the broken thread order test.
---
 lib/database-private.h| 2 +-
 test/T260-thread-order.sh | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/database-private.h b/lib/database-private.h
index e2e4bc8..15e03cc 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -166,7 +166,7 @@ struct _notmuch_database {
  * databases will have it). */
 #define NOTMUCH_FEATURES_CURRENT \
 (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
- NOTMUCH_FEATURE_BOOL_FOLDER)
+ NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS)

 /* Return the list of terms from the given iterator matching a prefix.
  * The prefix will be stripped from the strings in the returned list.
diff --git a/test/T260-thread-order.sh b/test/T260-thread-order.sh
index b435d79..99f5833 100755
--- a/test/T260-thread-order.sh
+++ b/test/T260-thread-order.sh
@@ -30,7 +30,6 @@ expected=$(for ((i = 0; i < $nthreads; i++)); do
 test_expect_equal "$output" "$expected"

 test_begin_subtest "Messages with all parents get linked in all delivery 
orders"
-test_subtest_known_broken
 # Here we do the same thing as the previous test, but each message
 # references all of its parents.  Since every message references the
 # root of the thread, each thread should always be fully joined.  This
-- 
2.1.0



[PATCH v2 09/12] lib: Implement upgrade to ghost messages feature

2014-10-06 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

Somehow this is the first upgrade pass that actually does *any* error
checking, so this also adds the bit of necessary infrastructure to
handle that.
---
 lib/database.cc | 66 +++--
 1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index fdcc526..1316529 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1231,6 +1231,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
 notmuch_bool_t timer_is_active = FALSE;
 enum _notmuch_features target_features, new_features;
 notmuch_status_t status;
+notmuch_private_status_t private_status;
 unsigned int count = 0, total = 0;

 status = _notmuch_database_ensure_writable (notmuch);
@@ -1275,6 +1276,13 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
for (t = db->allterms_begin ("XTIMESTAMP"); t != t_end; t++)
++total;
 }
+if (new_features & NOTMUCH_FEATURE_GHOSTS) {
+   /* The ghost message upgrade converts all thread_id_*
+* metadata values into ghost message documents. */
+   t_end = db->metadata_keys_end ("thread_id_");
+   for (t = db->metadata_keys_begin ("thread_id_"); t != t_end; ++t)
+   ++total;
+}

 /* Perform the upgrade in a transaction. */
 db->begin_transaction (true);
@@ -1378,10 +1386,64 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
}
 }

+/* Perform metadata upgrades. */
+
+/* Prior to NOTMUCH_FEATURE_GHOSTS, thread IDs for missing
+ * messages were stored as database metadata. Change these to
+ * ghost messages.
+ */
+if (new_features & NOTMUCH_FEATURE_GHOSTS) {
+   notmuch_message_t *message;
+   std::string message_id, thread_id;
+
+   t_end = db->metadata_keys_end (NOTMUCH_METADATA_THREAD_ID_PREFIX);
+   for (t = db->metadata_keys_begin (NOTMUCH_METADATA_THREAD_ID_PREFIX);
+t != t_end; ++t) {
+   if (do_progress_notify) {
+   progress_notify (closure, (double) count / total);
+   do_progress_notify = 0;
+   }
+
+   message_id = (*t).substr (
+   strlen (NOTMUCH_METADATA_THREAD_ID_PREFIX));
+   thread_id = db->get_metadata (*t);
+
+   /* Create ghost message */
+   message = _notmuch_message_create_for_message_id (
+   notmuch, message_id.c_str (), _status);
+   if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+   /* Document already exists; ignore the stored thread ID */
+   } else if (private_status ==
+  NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   private_status = _notmuch_message_initialize_ghost (
+   message, thread_id.c_str ());
+   if (! private_status)
+   _notmuch_message_sync (message);
+   }
+
+   if (private_status) {
+   fprintf (stderr,
+"Upgrade failed while creating ghost messages.\n");
+   status = COERCE_STATUS (private_status, "Unexpected status from 
_notmuch_message_initialize_ghost");
+   goto DONE;
+   }
+
+   /* Clear saved metadata thread ID */
+   db->set_metadata (*t, "");
+
+   ++count;
+   }
+}
+
+status = NOTMUCH_STATUS_SUCCESS;
 db->set_metadata ("features", _print_features (local, notmuch->features));
 db->set_metadata ("version", STRINGIFY (NOTMUCH_DATABASE_VERSION));

-db->commit_transaction ();
+ DONE:
+if (status == NOTMUCH_STATUS_SUCCESS)
+   db->commit_transaction ();
+else
+   db->cancel_transaction ();

 if (timer_is_active) {
/* Now stop the timer. */
@@ -1397,7 +1459,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
 }

 talloc_free (local);
-return NOTMUCH_STATUS_SUCCESS;
+return status;
 }

 notmuch_status_t
-- 
2.1.0



[PATCH v2 08/12] lib: Implement ghost-based thread linking

2014-10-06 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This updates the thread linking code to use ghost messages instead of
user metadata to link messages into threads.

In contrast with the old approach, this is actually correct.
Previously, thread merging updated only the thread IDs of message
documents, not thread IDs stored in user metadata.  As originally
diagnosed by Mark Walters [1] and as demonstrated by the broken
T260-thread-order test, this can cause notmuch to fail to link
messages even though they're in the same thread.  In principle the old
approach could have been fixed by updating the user metadata thread
IDs as well, but these are not indexed and hence this would have
required a full scan of all stored thread IDs.  Ghost messages solve
this problem naturally by reusing the exact same thread ID and message
ID representation and indexing as regular messages.

Furthermore, thanks to this greater symmetry, ghost messages are also
algorithmically simpler.  We continue to support the old user metadata
format, so this patch can't delete any code, but when we do remove
support for the old format, several functions can simply be deleted.

[1] id:8738h7kv2q.fsf at qmul.ac.uk
---
 lib/database.cc | 86 +
 1 file changed, 75 insertions(+), 11 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index c641bcd..fdcc526 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1752,6 +1752,12 @@ _get_metadata_thread_id_key (void *ctx, const char 
*message_id)
message_id);
 }

+static notmuch_status_t
+_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch,
+ void *ctx,
+ const char *message_id,
+ const char **thread_id_ret);
+
 /* Find the thread ID to which the message with 'message_id' belongs.
  *
  * Note: 'thread_id_ret' must not be NULL!
@@ -1760,9 +1766,9 @@ _get_metadata_thread_id_key (void *ctx, const char 
*message_id)
  *
  * Note: If there is no message in the database with the given
  * 'message_id' then a new thread_id will be allocated for this
- * message and stored in the database metadata, (where this same
+ * message ID and stored in the database metadata so that the
  * thread ID can be looked up if the message is added to the database
- * later).
+ * later.
  */
 static notmuch_status_t
 _resolve_message_id_to_thread_id (notmuch_database_t *notmuch,
@@ -1770,6 +1776,49 @@ _resolve_message_id_to_thread_id (notmuch_database_t 
*notmuch,
  const char *message_id,
  const char **thread_id_ret)
 {
+notmuch_private_status_t status;
+notmuch_message_t *message;
+
+if (! (notmuch->features & NOTMUCH_FEATURE_GHOSTS))
+   return _resolve_message_id_to_thread_id_old (notmuch, ctx, message_id,
+thread_id_ret);
+
+/* Look for this message (regular or ghost) */
+message = _notmuch_message_create_for_message_id (
+   notmuch, message_id, );
+if (status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+   /* Message exists */
+   *thread_id_ret = talloc_steal (
+   ctx, notmuch_message_get_thread_id (message));
+} else if (status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   /* Message did not exist.  Give it a fresh thread ID and
+* populate this message as a ghost message. */
+   *thread_id_ret = talloc_strdup (
+   ctx, _notmuch_database_generate_thread_id (notmuch));
+   if (! *thread_id_ret) {
+   status = NOTMUCH_PRIVATE_STATUS_OUT_OF_MEMORY;
+   } else {
+   status = _notmuch_message_initialize_ghost (message, 
*thread_id_ret);
+   if (status == 0)
+   /* Commit the new ghost message */
+   _notmuch_message_sync (message);
+   }
+} else {
+   /* Create failed. Fall through. */
+}
+
+notmuch_message_destroy (message);
+
+return COERCE_STATUS (status, "Error creating ghost message");
+}
+
+/* Pre-ghost messages _resolve_message_id_to_thread_id */
+static notmuch_status_t
+_resolve_message_id_to_thread_id_old (notmuch_database_t *notmuch,
+ void *ctx,
+ const char *message_id,
+ const char **thread_id_ret)
+{
 notmuch_status_t status;
 notmuch_message_t *message;
 string thread_id_string;
@@ -2007,7 +2056,7 @@ _consume_metadata_thread_id (void *ctx, 
notmuch_database_t *notmuch,
 }
 }

-/* Given a (mostly empty) 'message' and its corresponding
+/* Given a blank or ghost 'message' and its corresponding
  * 'message_file' link it to existing threads in the database.
  *
  * The first check is in the metadata of the database to see if we
@@ -2035,16 +2084,22 @@ _consume_metadata_thread_id (void *c

[PATCH v2 07/12] lib: Internal support for querying and creating ghost messages

2014-10-06 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This updates the message abstraction to support ghost messages: it
adds a message flag that distinguishes regular messages from ghost
messages, and an internal function for initializing a newly created
(blank) message as a ghost message.
---
 lib/message.cc| 52 +--
 lib/notmuch-private.h |  4 
 lib/notmuch.h |  9 -
 3 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 55d2ff6..a7a13cc 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -39,6 +39,9 @@ struct visible _notmuch_message {
 notmuch_message_file_t *message_file;
 notmuch_message_list_t *replies;
 unsigned long flags;
+/* For flags that are initialized on-demand, lazy_flags indicates
+ * if each flag has been initialized. */
+unsigned long lazy_flags;

 Xapian::Document doc;
 Xapian::termcount termpos;
@@ -99,6 +102,7 @@ _notmuch_message_create_for_document (const void 
*talloc_owner,

 message->frozen = 0;
 message->flags = 0;
+message->lazy_flags = 0;

 /* Each of these will be lazily created as needed. */
 message->message_id = NULL;
@@ -192,7 +196,7 @@ _notmuch_message_create (const void *talloc_owner,
  *
  * There is already a document with message ID 'message_id' in the
  * database. The returned message can be used to query/modify the
- * document.
+ * document. The message may be a ghost message.
  *
  *   NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND:
  *
@@ -305,6 +309,7 @@ _notmuch_message_ensure_metadata (notmuch_message_t 
*message)
 const char *thread_prefix = _find_prefix ("thread"),
*tag_prefix = _find_prefix ("tag"),
*id_prefix = _find_prefix ("id"),
+   *type_prefix = _find_prefix ("type"),
*filename_prefix = _find_prefix ("file-direntry"),
*replyto_prefix = _find_prefix ("replyto");

@@ -337,10 +342,25 @@ _notmuch_message_ensure_metadata (notmuch_message_t 
*message)
message->message_id =
_notmuch_message_get_term (message, i, end, id_prefix);

+/* Get document type */
+assert (strcmp (id_prefix, type_prefix) < 0);
+if (! NOTMUCH_TEST_BIT (message->lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST)) {
+   i.skip_to (type_prefix);
+   /* "T" is the prefix "type" fields.  See
+* BOOLEAN_PREFIX_INTERNAL. */
+   if (*i == "Tmail")
+   NOTMUCH_CLEAR_BIT (>flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+   else if (*i == "Tghost")
+   NOTMUCH_SET_BIT (>flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+   else
+   INTERNAL_ERROR ("Message without type term");
+   NOTMUCH_SET_BIT (>lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+}
+
 /* Get filename list.  Here we get only the terms.  We lazily
  * expand them to full file names when needed in
  * _notmuch_message_ensure_filename_list. */
-assert (strcmp (id_prefix, filename_prefix) < 0);
+assert (strcmp (type_prefix, filename_prefix) < 0);
 if (!message->filename_term_list && !message->filename_list)
message->filename_term_list =
_notmuch_database_get_terms_with_prefix (message, i, end,
@@ -371,6 +391,11 @@ _notmuch_message_invalidate_metadata (notmuch_message_t 
*message,
message->tag_list = NULL;
 }

+if (strcmp ("type", prefix_name) == 0) {
+   NOTMUCH_CLEAR_BIT (>flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+   NOTMUCH_CLEAR_BIT (>lazy_flags, NOTMUCH_MESSAGE_FLAG_GHOST);
+}
+
 if (strcmp ("file-direntry", prefix_name) == 0) {
talloc_free (message->filename_term_list);
talloc_free (message->filename_list);
@@ -869,6 +894,10 @@ notmuch_bool_t
 notmuch_message_get_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag)
 {
+if (flag == NOTMUCH_MESSAGE_FLAG_GHOST &&
+   ! NOTMUCH_TEST_BIT (message->lazy_flags, flag))
+   _notmuch_message_ensure_metadata (message);
+
 return NOTMUCH_TEST_BIT (message->flags, flag);
 }

@@ -880,6 +909,7 @@ notmuch_message_set_flag (notmuch_message_t *message,
NOTMUCH_SET_BIT (>flags, flag);
 else
NOTMUCH_CLEAR_BIT (>flags, flag);
+NOTMUCH_SET_BIT (>lazy_flags, flag);
 }

 time_t
@@ -989,6 +1019,24 @@ _notmuch_message_delete (notmuch_message_t *message)
 return NOTMUCH_STATUS_SUCCESS;
 }

+/* Transform a blank message into a ghost message.  The caller must
+ * _notmuch_message_sync the message. */
+notmuch_private_status_t
+_notmuch_message_initialize_ghost (notmuch_message_t *message,
+  const char *thread_id)
+{
+notmuch_private_status_t status;
+
+status = _notmuch_message_add_term (message, "type", 

[PATCH v2 06/12] lib: Introduce macros for bit operations

2014-10-06 Thread Austin Clements
These macros help clarify basic bit-twiddling code and are written to
be robust against C undefined behavior of shift operators.
---
 lib/message.cc|  6 +++---
 lib/notmuch-private.h | 11 +++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 38bc929..55d2ff6 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -869,7 +869,7 @@ notmuch_bool_t
 notmuch_message_get_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag)
 {
-return message->flags & (1 << flag);
+return NOTMUCH_TEST_BIT (message->flags, flag);
 }

 void
@@ -877,9 +877,9 @@ notmuch_message_set_flag (notmuch_message_t *message,
  notmuch_message_flag_t flag, notmuch_bool_t enable)
 {
 if (enable)
-   message->flags |= (1 << flag);
+   NOTMUCH_SET_BIT (>flags, flag);
 else
-   message->flags &= ~(1 << flag);
+   NOTMUCH_CLEAR_BIT (>flags, flag);
 }

 time_t
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 36cc12b..7250291 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -63,6 +63,17 @@ NOTMUCH_BEGIN_DECLS
 #define STRNCMP_LITERAL(var, literal) \
 strncmp ((var), (literal), sizeof (literal) - 1)

+/* Robust bit test/set/reset macros */
+#define NOTMUCH_TEST_BIT(val, bit) \
+((bit < 0 || bit >= CHAR_BIT * sizeof (unsigned long long)) ? 0\
+ : !!((val) & (1ull << bit)))
+#define NOTMUCH_SET_BIT(valp, bit) \
+((bit < 0 || bit >= CHAR_BIT * sizeof (unsigned long long)) ? *(valp) \
+ : (*(valp) |= (1ull << bit)))
+#define NOTMUCH_CLEAR_BIT(valp,  bit) \
+((bit < 0 || bit >= CHAR_BIT * sizeof (unsigned long long)) ? *(valp) \
+ : (*(valp) &= ~(1ull << bit)))
+
 #define unused(x) x __attribute__ ((unused))

 #ifdef __cplusplus
-- 
2.1.0



[PATCH v2 05/12] lib: Update database schema doc for ghost messages

2014-10-06 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This describes the structure of ghost mail documents.  Ghost messages
are not yet implemented.
---
 lib/database.cc | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 8fd7fad..c641bcd 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -50,8 +50,8 @@ typedef struct {

 /* Here's the current schema for our database (for NOTMUCH_DATABASE_VERSION):
  *
- * We currently have two different types of documents (mail and
- * directory) and also some metadata.
+ * We currently have three different types of documents (mail, ghost,
+ * and directory) and also some metadata.
  *
  * Mail document
  * -
@@ -109,6 +109,15 @@ typedef struct {
  *
  * The data portion of a mail document is empty.
  *
+ * Ghost mail document [if NOTMUCH_FEATURE_GHOSTS]
+ * ---
+ * A ghost mail document is like a mail document, but where we don't
+ * have the message content.  These are used to track thread reference
+ * information for messages we haven't received.
+ *
+ * A ghost mail document has type: ghost; id and thread fields that
+ * are identical to the mail document fields; and a MESSAGE_ID value.
+ *
  * Directory document
  * --
  * A directory document is used by a client of the notmuch library to
@@ -172,6 +181,13 @@ typedef struct {
  * generated is 1 and the value will be
  * incremented for each thread ID.
  *
+ * Obsolete metadata
+ * -
+ *
+ * If ! NOTMUCH_FEATURE_GHOSTS, there are no ghost mail documents.
+ * Instead, the database has the following additional database
+ * metadata:
+ *
  * thread_id_* A pre-allocated thread ID for a particular
  * message. This is actually an arbitrarily large
  * family of metadata name. Any particular name is
-- 
2.1.0



[PATCH v2 04/12] lib: Add a ghost messages database feature

2014-10-06 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

This will be implemented over the next several patches.  The feature
is not yet "enabled" (this does not add it to
NOTMUCH_FEATURES_CURRENT).
---
 lib/database-private.h | 7 +++
 lib/database.cc| 2 ++
 2 files changed, 9 insertions(+)

diff --git a/lib/database-private.h b/lib/database-private.h
index ca0751c..e2e4bc8 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -85,6 +85,13 @@ enum _notmuch_features {
  *
  * Introduced: version 2. */
 NOTMUCH_FEATURE_BOOL_FOLDER = 1 << 3,
+
+/* If set, missing messages are stored in ghost mail documents.
+ * If unset, thread IDs of ghost messages are stored as database
+ * metadata instead of in ghost documents.
+ *
+ * Introduced: version 3. */
+NOTMUCH_FEATURE_GHOSTS = 1 << 4,
 };

 /* In C++, a named enum is its own type, so define bitwise operators
diff --git a/lib/database.cc b/lib/database.cc
index 1c6ffc5..8fd7fad 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -286,6 +286,8 @@ static const struct {
   "from/subject/message-ID in database", "w" },
 { NOTMUCH_FEATURE_BOOL_FOLDER,
   "exact folder:/path: search", "rw" },
+{ NOTMUCH_FEATURE_GHOSTS,
+  "mail documents for missing messages", "w"},
 };

 const char *
-- 
2.1.0



[PATCH v2 03/12] lib: Handle empty date value

2014-10-06 Thread Austin Clements
From: Austin Clements <amdra...@mit.edu>

In the interest of robustness, avoid undefined behavior of
sortable_unserialise if the date value is missing.  This shouldn't
happen now, but ghost messages will have blank date values.
---
 lib/message.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index bbfc250..38bc929 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -896,6 +896,9 @@ notmuch_message_get_date (notmuch_message_t *message)
return 0;
 }

+if (value.empty ())
+   /* sortable_unserialise is undefined on empty string */
+   return 0;
 return Xapian::sortable_unserialise (value);
 }

-- 
2.1.0



  1   2   3   4   5   6   7   8   9   10   >