Re: [patch v3 06/12] lib: index message files with duplicate message-ids
David Bremnerwrites: > Daniel Kahn Gillmor writes: >> for example, i could follow up on the current message with another >> message with Message-Id: 20170604123235.24466-7-da...@tethera.net and >> give it a subject "Re: [patch v3 06/12] lib: do *not* index message >> files with duplicate message-ids". that's a bit odd, no? > > Yes, I agree that's a bit strange. We should make some effort to > display the subject that belongs with a given message body. I think it's > not too hard [1] to preserve the old behaviour of keeping the first > subject, date, and from. This leaves us with a version of the original > hiding message attack, but only for the special case of regex searches, > since those rely exclusively on the value slots. I had a slightly radical idea for how to deal with that. Subject/from from extra files could be appended to the value slot (e.g. separated by newlines). Then regexp searches would behave similarly to term based searches in that matching any file would match the message. We'd have to be slightly careful about what anchors meant. A further enhancement would be to expose the search result as an array. This kind of approach doesn't really make sense for dates, as we essentially search for those as numbers, and such a hack would break the built-in xapian range search. ___ notmuch mailing list notmuch@notmuchmail.org https://notmuchmail.org/mailman/listinfo/notmuch
Re: [patch v3 06/12] lib: index message files with duplicate message-ids
Daniel Kahn Gillmorwrites: > On Sun 2017-06-04 09:32:29 -0300, David Bremner wrote: >> The corresponding xapian document just gets more terms added to it, >> but this doesn't seem to break anything. Values on the other hand get >> overwritten, which is a bit annoying, but arguably it is not worse to >> take the values (from, subject, date) from the last file indexed >> rather than the first. [snip] > for example, i could follow up on the current message with another > message with Message-Id: 20170604123235.24466-7-da...@tethera.net and > give it a subject "Re: [patch v3 06/12] lib: do *not* index message > files with duplicate message-ids". that's a bit odd, no? Yes, I agree that's a bit strange. We should make some effort to display the subject that belongs with a given message body. I think it's not too hard [1] to preserve the old behaviour of keeping the first subject, date, and from. This leaves us with a version of the original hiding message attack, but only for the special case of regex searches, since those rely exclusively on the value slots. [1]: should be just a matter of guarding the call to _notmuch_message_set_header_values() with if (is_new || is_ghost), but that needs testing. ___ notmuch mailing list notmuch@notmuchmail.org https://notmuchmail.org/mailman/listinfo/notmuch
Re: [patch v3 06/12] lib: index message files with duplicate message-ids
On Sun 2017-06-04 09:32:29 -0300, David Bremner wrote: > The corresponding xapian document just gets more terms added to it, > but this doesn't seem to break anything. Values on the other hand get > overwritten, which is a bit annoying, but arguably it is not worse to > take the values (from, subject, date) from the last file indexed > rather than the first. It's certainly a change in behavior, though. This suggests that i can send you mail and have it change how an existing message shows up in the summary view, for example. for example, i could follow up on the current message with another message with Message-Id: 20170604123235.24466-7-da...@tethera.net and give it a subject "Re: [patch v3 06/12] lib: do *not* index message files with duplicate message-ids". that's a bit odd, no? I'm not saying it's wrong, i'm just hoping to acknowledge that this might be a controversial change. this whole series is pretty elaborate, of course, but so far it's looking like reasonable steps toward a clearer and more hackable codebase. --dkg signature.asc Description: PGP signature ___ notmuch mailing list notmuch@notmuchmail.org https://notmuchmail.org/mailman/listinfo/notmuch
[patch v3 06/12] lib: index message files with duplicate message-ids
The corresponding xapian document just gets more terms added to it, but this doesn't seem to break anything. Values on the other hand get overwritten, which is a bit annoying, but arguably it is not worse to take the values (from, subject, date) from the last file indexed rather than the first. --- lib/add-message.cc | 20 +++- test/T160-json.sh | 4 ++-- test/T670-duplicate-mid.sh | 2 -- 3 files changed, 13 insertions(+), 13 deletions(-) diff --git a/lib/add-message.cc b/lib/add-message.cc index 2922eaa9..ae9b14a7 100644 --- a/lib/add-message.cc +++ b/lib/add-message.cc @@ -529,19 +529,21 @@ notmuch_database_add_message (notmuch_database_t *notmuch, if (is_ghost) /* Convert ghost message to a regular message */ _notmuch_message_remove_term (message, "type", "ghost"); - ret = _notmuch_database_link_message (notmuch, message, + } + + ret = _notmuch_database_link_message (notmuch, message, message_file, is_ghost); - if (ret) - goto DONE; + if (ret) + goto DONE; - _notmuch_message_set_header_values (message, date, from, subject); + _notmuch_message_set_header_values (message, date, from, subject); - ret = _notmuch_message_index_file (message, message_file); - if (ret) - goto DONE; - } else { + ret = _notmuch_message_index_file (message, message_file); + if (ret) + goto DONE; + + if (! is_new && !is_ghost) ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID; - } _notmuch_message_sync (message); } catch (const Xapian::Error ) { diff --git a/test/T160-json.sh b/test/T160-json.sh index ac51895e..07955a2b 100755 --- a/test/T160-json.sh +++ b/test/T160-json.sh @@ -71,8 +71,8 @@ test_begin_subtest "Format version: too high" test_expect_code 21 "notmuch search --format-version=999 \\*" test_begin_subtest "Show message: multiple filenames" -add_message "[id]=message...@example.com [filename]=copy1" -add_message "[id]=message...@example.com [filename]=copy2" +add_message '[id]=message...@example.com [filename]=copy1 [date]="Fri, 05 Jan 2001 15:43:52 +"' +add_message '[id]=message...@example.com [filename]=copy2 [date]="Fri, 05 Jan 2001 15:43:52 +"' cat < EXPECTED [ [ diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh index ced28a21..f1952555 100755 --- a/test/T670-duplicate-mid.sh +++ b/test/T670-duplicate-mid.sh @@ -6,7 +6,6 @@ add_message '[id]="duplicate"' '[subject]="message 1" [filename]=copy1' add_message '[id]="duplicate"' '[subject]="message 2" [filename]=copy2' test_begin_subtest 'Search for second subject' -test_subtest_known_broken cat