[PATCH] lib: return "" rather than NULL from notmuch_thread_get_authors

2017-12-14 Thread David Bremner
The current beheviour is at best underdocumented. The modified test in
T470-missing-headers.sh previously relied on printf doing the right
thing with NULL, which seems ick.

The use of talloc_strdup here is probably overkill, but it avoids
having to enforce that thread->authors is never mutated outside
_resolve_thread_authors_string.
---
 lib/thread.cc| 3 +++
 test/T470-missing-headers.sh | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/thread.cc b/lib/thread.cc
index 1632da4c..3561b27f 100644
--- a/lib/thread.cc
+++ b/lib/thread.cc
@@ -160,6 +160,9 @@ _resolve_thread_authors_string (notmuch_thread_t *thread)
 thread->authors_array = NULL;
 g_ptr_array_free (thread->matched_authors_array, true);
 thread->matched_authors_array = NULL;
+
+if (!thread->authors)
+   thread->authors = talloc_strdup(thread, "");
 }
 
 /* clean up the ugly "Lastname, Firstname" format that some mail systems
diff --git a/test/T470-missing-headers.sh b/test/T470-missing-headers.sh
index 4bf5d285..555fd4e9 100755
--- a/test/T470-missing-headers.sh
+++ b/test/T470-missing-headers.sh
@@ -25,7 +25,7 @@ NOTMUCH_NEW >/dev/null
 test_begin_subtest "Search: text"
 output=$(notmuch search '*' | notmuch_search_sanitize)
 test_expect_equal "$output" "\
-thread:XXX   2001-01-05 [1/1] (null);  (inbox unread)
+thread:XXX   2001-01-05 [1/1] ;  (inbox unread)
 thread:XXX   1970-01-01 [1/1] Notmuch Test Suite;  (inbox unread)"
 
 test_begin_subtest "Search: json"
-- 
2.15.1

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH] doc: unify definition list usage across man pages

2017-12-14 Thread David Bremner
David Bremner  writes:

> Jani Nikula  writes:
>
>> Make all parameter descriptions etc. use reStructuredText definition
>> lists with uniform style and indentation. Remove redundant indentation
>> from around the lists. Remove blank lines between term lines and
>> definition blocks. Use four spaces for indentation.
>>
>> This is almost completely whitespace and paragraph reflow changes.
>
> Of course this doesn't apply anymore, and I don't really understand the
> conflicts. I'm not totally sure about the benefits, but if you think
> it's worth rebasing your patch, I'm fine with it.
>
> d

and I meant to reply to v2.

d
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH] doc: unify definition list usage across man pages

2017-12-14 Thread David Bremner
Jani Nikula  writes:

> Make all parameter descriptions etc. use reStructuredText definition
> lists with uniform style and indentation. Remove redundant indentation
> from around the lists. Remove blank lines between term lines and
> definition blocks. Use four spaces for indentation.
>
> This is almost completely whitespace and paragraph reflow changes.

Of course this doesn't apply anymore, and I don't really understand the
conflicts. I'm not totally sure about the benefits, but if you think
it's worth rebasing your patch, I'm fine with it.

d
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: subjects and duplicated message id's

2017-12-14 Thread David Bremner

>>- we could also force the value slot to have the lexico first files'
>>  subject during indexing. This would be a bit fiddly, but localized.
>>  It would have the surprising effect of having the subject updated
>>  when new messages arrived.
>
> This is a bit weird, unless we also force "notmuch show" to always show
> the lexicographically-first file as well, no?

AFAIU, this is the current behaviour, see the second test in T670-duplicate-mid.

> Obviously, i'd prefer the original subject, so that it's searchable.
> I'd also like it if "notmuch show" displayed the original subject when
> showing the encrypted message.
>
> However, if someone does "notmuch show --decrypt=false" i'd want it to
> display the as-delivered header.
>
> Does this help push you toward any specific decision?  I'm also not sure
> what the correct solution is right now.

I don't think it really pushes one way or the other. The --decrypt=false
case could just look at the file instead of the database (that's what
notmuch-show does now.

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: subjects and duplicated message id's

2017-12-14 Thread Daniel Kahn Gillmor
On Thu 2017-12-14 10:03:12 -0400, David Bremner wrote:
> There are currently several somewhat related issues with notmuch's
> handling of subject headers for messages with duplicate message-ids
> (i.e. several files on disk with the same message id).  These are all
> reflections of the fact that we use a value slot for subjects in the
> database message document (i.e. the database object keyed by the
> message-id).  Among other things, using a value slot is what makes
> regular expression searching (and potentially sorting) by subject work.
>
> When we have multiple files with the same message-id, but different
> subjects (probably indicating a "real" mid collision).
>
> 1. The output of notmuch-show can be inconsistent with notmuch-search
>
>- this is because show is reading from the lexicographically first
>  file, while show is reading the database value slot.

you've got two "show"s here.  i think the second "show" is meant to be
"search".

>- in principle this could be fixed by making show read the value
>  slot; but then the subject might not be consistent with the rest of
>  the message content. Also, it looks like a bit of a pain to refactor
>  so all that sprinter code has database access.
>
>- we could also force the value slot to have the lexico first files'
>  subject during indexing. This would be a bit fiddly, but localized.
>  It would have the surprising effect of having the subject updated
>  when new messages arrived.

This is a bit weird, unless we also force "notmuch show" to always show
the lexicographically-first file as well, no?

> 2. Regular expression search doesn't work for subjects not in the value
>slot.
>
>- this could be fixed by putting all subjects in the value slot,
>  perhaps as newline seperated strings. This would also be a
>  potential solution for the "subject hiding" issue mentioned above,
>  although it would take some front-end effort as well to deal with
>  "multi-subjects".  This could be reported in e.g. json output as an
>  array of subjects.
>
> I'm open to other, better ideas of how to do this. I'm also curious how
> important people think these bugs are.

I think this is important to get right, thanks for raising it. I'll add
my own wrinkle below:

I'm looking at implementing "protected headers" (for recieving messages)
right now, where Subject: is the most important header that is typically
sent under encrypted cover (e.g. enigmail's "memory hole" implementation
does this).

With indexing of cleartext, we have a decision about which thing to put
in the value slot -- the "original" subject (that is, the subject that
the message sender wrote, which arrives inside the message encryption
for messages with protected headers) or the as-delivered external "stub"
header (typically the literal string "encrypted message").

Obviously, i'd prefer the original subject, so that it's searchable.
I'd also like it if "notmuch show" displayed the original subject when
showing the encrypted message.

However, if someone does "notmuch show --decrypt=false" i'd want it to
display the as-delivered header.

Does this help push you toward any specific decision?  I'm also not sure
what the correct solution is right now.

 --dkg


signature.asc
Description: PGP signature
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH] test: add known broken test for regexp search of second subject

2017-12-14 Thread David Bremner
We expect this to give the same answer as the non-regexp subject
search. It does not because the regexp search relies on the value
slot, which currently contains only one subject.
---
 test/T670-duplicate-mid.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index c198c506..bf8cc3a8 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -47,6 +47,16 @@ EOF
 notmuch search --output=files subject:'"message 2"' | notmuch_dir_sanitize > 
OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+test_begin_subtest 'Regexp search for second subject'
+test_subtest_known_broken
+cat  
OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 add_message '[id]="duplicate"' '[body]="sekrit" [filename]=copy3'
 test_begin_subtest 'search for body in duplicate file'
 cat 

subjects and duplicated message id's

2017-12-14 Thread David Bremner

There are currently several somewhat related issues with notmuch's
handling of subject headers for messages with duplicate message-ids
(i.e. several files on disk with the same message id).  These are all
reflections of the fact that we use a value slot for subjects in the
database message document (i.e. the database object keyed by the
message-id).  Among other things, using a value slot is what makes
regular expression searching (and potentially sorting) by subject work.

When we have multiple files with the same message-id, but different
subjects (probably indicating a "real" mid collision).

1. The output of notmuch-show can be inconsistent with notmuch-search

   - this is because show is reading from the lexicographically first
 file, while show is reading the database value slot.

   - in principle this could be fixed by making show read the value
 slot; but then the subject might not be consistent with the rest of
 the message content. Also, it looks like a bit of a pain to refactor
 so all that sprinter code has database access.

   - we could also force the value slot to have the lexico first files'
 subject during indexing. This would be a bit fiddly, but localized.
 It would have the surprising effect of having the subject updated
 when new messages arrived.

2. Regular expression search doesn't work for subjects not in the value
   slot.

   - this could be fixed by putting all subjects in the value slot,
 perhaps as newline seperated strings. This would also be a
 potential solution for the "subject hiding" issue mentioned above,
 although it would take some front-end effort as well to deal with
 "multi-subjects".  This could be reported in e.g. json output as an
 array of subjects.

I'm open to other, better ideas of how to do this. I'm also curious how
important people think these bugs are.

d
   
 
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch