Re: [patch v3 06/12] lib: index message files with duplicate message-ids

2017-06-09 Thread David Bremner
David Bremner  writes:

> Daniel Kahn Gillmor  writes:

>> for example, i could follow up on the current message with another
>> message with Message-Id: 20170604123235.24466-7-da...@tethera.net and
>> give it a subject "Re: [patch v3 06/12] lib: do *not* index message
>> files with duplicate message-ids".  that's a bit odd, no?
>
> Yes, I agree that's a bit strange.  We should make some effort to
> display the subject that belongs with a given message body. I think it's
> not too hard [1] to preserve the old behaviour of keeping the first
> subject, date, and from. This leaves us with a version of the original
> hiding message attack, but only for the special case of regex searches,
> since those rely exclusively on the value slots.

I had a slightly radical idea for how to deal with that. Subject/from
from extra files could be appended to the value slot (e.g. separated by
newlines). Then regexp searches would behave similarly to term based
searches in that matching any file would match the message. We'd have to
be slightly careful about what anchors meant.  A further enhancement
would be to expose the search result as an array. This kind of approach
doesn't really make sense for dates, as we essentially search for those
as numbers, and such a hack would break the built-in xapian range
search.  
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: [patch v3 06/12] lib: index message files with duplicate message-ids

2017-06-05 Thread David Bremner
Daniel Kahn Gillmor  writes:

> On Sun 2017-06-04 09:32:29 -0300, David Bremner wrote:
>> The corresponding xapian document just gets more terms added to it,
>> but this doesn't seem to break anything. Values on the other hand get
>> overwritten, which is a bit annoying, but arguably it is not worse to
>> take the values (from, subject, date) from the last file indexed
>> rather than the first.
[snip]
> for example, i could follow up on the current message with another
> message with Message-Id: 20170604123235.24466-7-da...@tethera.net and
> give it a subject "Re: [patch v3 06/12] lib: do *not* index message
> files with duplicate message-ids".  that's a bit odd, no?

Yes, I agree that's a bit strange.  We should make some effort to
display the subject that belongs with a given message body. I think it's
not too hard [1] to preserve the old behaviour of keeping the first
subject, date, and from. This leaves us with a version of the original
hiding message attack, but only for the special case of regex searches,
since those rely exclusively on the value slots.

[1]: should be just a matter of guarding the call to
_notmuch_message_set_header_values() with if (is_new || is_ghost), but
that needs testing.
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: [patch v3 06/12] lib: index message files with duplicate message-ids

2017-06-05 Thread Daniel Kahn Gillmor
On Sun 2017-06-04 09:32:29 -0300, David Bremner wrote:
> The corresponding xapian document just gets more terms added to it,
> but this doesn't seem to break anything. Values on the other hand get
> overwritten, which is a bit annoying, but arguably it is not worse to
> take the values (from, subject, date) from the last file indexed
> rather than the first.

It's certainly a change in behavior, though.  This suggests that i can
send you mail and have it change how an existing message shows up in the
summary view, for example.

for example, i could follow up on the current message with another
message with Message-Id: 20170604123235.24466-7-da...@tethera.net and
give it a subject "Re: [patch v3 06/12] lib: do *not* index message
files with duplicate message-ids".  that's a bit odd, no?

I'm not saying it's wrong, i'm just hoping to acknowledge that this
might be a controversial change.

this whole series is pretty elaborate, of course, but so far it's
looking like reasonable steps toward a clearer and more hackable
codebase.

  --dkg


signature.asc
Description: PGP signature
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[patch v3 06/12] lib: index message files with duplicate message-ids

2017-06-04 Thread David Bremner
The corresponding xapian document just gets more terms added to it,
but this doesn't seem to break anything. Values on the other hand get
overwritten, which is a bit annoying, but arguably it is not worse to
take the values (from, subject, date) from the last file indexed
rather than the first.
---
 lib/add-message.cc | 20 +++-
 test/T160-json.sh  |  4 ++--
 test/T670-duplicate-mid.sh |  2 --
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/lib/add-message.cc b/lib/add-message.cc
index 2922eaa9..ae9b14a7 100644
--- a/lib/add-message.cc
+++ b/lib/add-message.cc
@@ -529,19 +529,21 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (is_ghost)
/* Convert ghost message to a regular message */
_notmuch_message_remove_term (message, "type", "ghost");
-   ret = _notmuch_database_link_message (notmuch, message,
+   }
+
+   ret = _notmuch_database_link_message (notmuch, message,
  message_file, is_ghost);
-   if (ret)
-   goto DONE;
+   if (ret)
+   goto DONE;
 
-   _notmuch_message_set_header_values (message, date, from, subject);
+   _notmuch_message_set_header_values (message, date, from, subject);
 
-   ret = _notmuch_message_index_file (message, message_file);
-   if (ret)
-   goto DONE;
-   } else {
+   ret = _notmuch_message_index_file (message, message_file);
+   if (ret)
+   goto DONE;
+
+   if (! is_new && !is_ghost)
ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
-   }
 
_notmuch_message_sync (message);
 } catch (const Xapian::Error ) {
diff --git a/test/T160-json.sh b/test/T160-json.sh
index ac51895e..07955a2b 100755
--- a/test/T160-json.sh
+++ b/test/T160-json.sh
@@ -71,8 +71,8 @@ test_begin_subtest "Format version: too high"
 test_expect_code 21 "notmuch search --format-version=999 \\*"
 
 test_begin_subtest "Show message: multiple filenames"
-add_message "[id]=message...@example.com [filename]=copy1"
-add_message "[id]=message...@example.com [filename]=copy2"
+add_message '[id]=message...@example.com [filename]=copy1 [date]="Fri, 05 Jan 
2001 15:43:52 +"'
+add_message '[id]=message...@example.com [filename]=copy2 [date]="Fri, 05 Jan 
2001 15:43:52 +"'
 cat < EXPECTED
 [
 [
diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index ced28a21..f1952555 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -6,7 +6,6 @@ add_message '[id]="duplicate"' '[subject]="message 1" 
[filename]=copy1'
 add_message '[id]="duplicate"' '[subject]="message 2" [filename]=copy2'
 
 test_begin_subtest 'Search for second subject'
-test_subtest_known_broken
 cat