[RFC patch 2/2] lib: index message files with duplicate message-ids
The corresponding xapian document just gets more terms added to it,
but this doesn't seem to break anything.
---
 lib/database.cc            | 3 +++
 test/T670-duplicate-mid.sh | 1 -
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/database.cc b/lib/database.cc
index a679cbab..e83017ed 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2582,6 +2582,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
             if (ret)
                 goto DONE;
         } else {
+            ret = _notmuch_message_index_file (message, message_file);
+            if (ret)
+                goto DONE;
             ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
         }
diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index d28afc91..41c53bc8 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -6,7 +6,6 @@ add_message [id]=id:duplicate '[subject]="message 1"'
 add_message [id]=id:duplicate '[subject]="message 2"'

 test_begin_subtest 'Search for second subject'
-test_subtest_known_broken
 cat
a first step for the duplicate message-id dilemma
These are mainly RFC because I'm not 100% sure about the performance
impact. It seems OK for me: indexing my 500k messages, about 35k of
them duplicates, is about 3% slower. I didn't see a noticeable
increase in database size (in both cases it's 5.8G / 3.5G
before/after notmuch compact).

There are also tons of UI issues: for example, in the test case here,

    notmuch search subject:'"message 2"'

will happily print

    thread:0001 2001-01-05 [1/1] Notmuch Test Suite; message 1 (inbox unread)

I claim it's still an improvement over the current code, where that
second message is not findable by any terms unique to it.
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
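The behaviour described above can be illustrated with a toy model. This is only a sketch under loud assumptions: ToyDocument and ToyIndex are hypothetical names, not Xapian or notmuch types, and terms are just whitespace-separated subject words. It shows why a search on a term unique to the second file now finds the document, while the displayed subject still comes from the first file.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for one index document. NOT the Xapian or notmuch API.
struct ToyDocument {
    std::string subject;          // subject shown in results (first file wins)
    std::set<std::string> terms;  // searchable terms from every indexed file
};

// Hypothetical index keyed by message-id.
class ToyIndex {
public:
    // Returns true if the message-id was new, false for a duplicate.
    // In both cases the subject words are added as terms, which mirrors
    // the idea of indexing the duplicate file into the existing document.
    bool add_message(const std::string &mid, const std::string &subject) {
        auto it = docs_.find(mid);
        const bool is_new = (it == docs_.end());
        if (is_new)
            it = docs_.emplace(mid, ToyDocument{subject, {}}).first;

        std::istringstream words(subject);
        std::string word;
        while (words >> word)
            it->second.terms.insert(word);
        return is_new;
    }

    // Returns the displayed subject of every document containing the term.
    std::vector<std::string> search(const std::string &term) const {
        std::vector<std::string> out;
        for (const auto &kv : docs_)
            if (kv.second.terms.count(term))
                out.push_back(kv.second.subject);
        return out;
    }

private:
    std::map<std::string, ToyDocument> docs_;
};
```

Adding "message 1" and then "message 2" under the same id and searching for the term "2" returns one hit whose displayed subject is "message 1", which is the UI oddity mentioned in the cover letter.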
[RFC patch 1/2] test: add known broken test for duplicate message id
There are many other problems that could be tested, but this one we
have some hope of fixing because it doesn't require UI changes, just
indexing changes.
---
 test/T670-duplicate-mid.sh | 17 +
 1 file changed, 17 insertions(+)
 create mode 100755 test/T670-duplicate-mid.sh

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
new file mode 100755
index ..d28afc91
--- /dev/null
+++ b/test/T670-duplicate-mid.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+test_description="duplicate message ids"
+. ./test-lib.sh || exit 1
+
+add_message [id]=id:duplicate '[subject]="message 1"'
+add_message [id]=id:duplicate '[subject]="message 2"'
+
+test_begin_subtest 'Search for second subject'
+test_subtest_known_broken
+catOUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
--
2.11.0
Re: [PATCH 2/2] lib: clamp return value of g_mime_utils_header_decode_date to >=0
David Bremner writes:

> For reasons not completely understood at this time, gmime (as of
> 2.6.22) is returning a date before 1900 on bad date input. Since this
> confuses some other software, we clamp such dates to 0,
> i.e. 1970-01-01.

Series pushed, amended per Tomi's suggestion.

It's possible I've been writing an unhealthy amount of Scheme
lately. Dunno what else would make the ternary if operator look
sensible.

d
Re: [PATCH 2/2] lib: clamp return value of g_mime_utils_header_decode_date to >=0
On Sun, Mar 12 2017, David Bremner wrote:

> For reasons not completely understood at this time, gmime (as of
> 2.6.22) is returning a date before 1900 on bad date input. Since this
> confuses some other software, we clamp such dates to 0,
> i.e. 1970-01-01.
> ---
>  lib/message.cc | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/lib/message.cc b/lib/message.cc
> index 007f1171..8a8a25b4 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -1034,10 +1034,15 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
>
>      /* GMime really doesn't want to see a NULL date, so protect its
>       * sensibilities. */
> -    if (date == NULL || *date == '\0')
> +    if (date == NULL || *date == '\0') {
>          time_value = 0;

"Too bad" we already do this time_value = 0, otherwise I'd have
suggested -21

    $ perl -le 'print scalar localtime -21'
    Sat Feb 7 21:54:38 1903

That is something where the Julian calendar is also in the 20th century ;)

> -    else
> +    } else {
>          time_value = g_mime_utils_header_decode_date (date, NULL);
> +        /*
> +         * Workaround for https://bugzilla.gnome.org/show_bug.cgi?id=779923
> +         */
> +        time_value = (time_value < 0) ? 0 : time_value;

Although the above probably realizes as..., I'd propose (IMO for
clarity)

    if (time_value < 0) time_value = 0;

Anyway, LGTM.

Tomi

Btw: I added

    notmuch show --format=json '*' >&6

to the test script, and it printed:

    [[[{"id": "msg-001@notmuch-test-suite", "match": true,
    "excluded": false,
    "filename": ["/home/too/vc/ext/notmuch/test/tmp.T111-x/mail/msg-001"],
    "timestamp": 2085892096, "date_relative": "1899-12-31",
    "tags": ["inbox", "unread"],
    "headers": {"Subject": "Test message #1",
    "From": "Notmuch Test Suite ", "To": "Notmuch Test Suite ",
    "Date": "Sun, 31 Dec 1899 00:00:00 +"},
    "body": [{"id": 1, "content-type": "text/plain",
    "content": "This is just a test message (#1)\n"}]}, [

(... which one can see I just pasted to a new file... ;)

    $ perl -le 'print scalar localtime 2085892096'
    Wed Feb 6 08:28:16 2036

So, it looks like we store the large negative time_value to a 32-bit
signed integer...

> +    }
>
>      message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
>                              Xapian::sortable_serialise (time_value));
> --
> 2.11.0
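
The numbers in the reply above are consistent with a 32-bit wraparound, and a small sketch can check the arithmetic. It assumes the bad date decoded to 1899-12-31 00:00:00 UTC, i.e. -2209075200 seconds from the epoch, and that only the low 32 bits of that value survived somewhere; where the truncation actually happens is not established in this thread. The helper names are hypothetical, not notmuch or GMime functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the clamp discussed in the patch:
// negative timestamps are replaced by 0, i.e. 1970-01-01.
inline std::int64_t clamp_timestamp(std::int64_t t)
{
    return (t < 0) ? 0 : t;
}

// Hypothetical helper: keep only the low 32 bits of a timestamp and
// read them back as a signed 32-bit integer. The conversion to
// uint32_t is well defined (reduction modulo 2^32).
inline std::int32_t truncate_to_int32(std::int64_t t)
{
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(t));
}

// Assumed value: 1899-12-31 00:00:00 UTC in seconds from the epoch.
constexpr std::int64_t kBadDate = -2209075200LL;

inline void check_thread_numbers()
{
    // -2209075200 + 2^32 == 2085892096, the "timestamp" in the JSON output.
    assert(truncate_to_int32(kBadDate) == 2085892096);
    // With the clamp applied first, the stored value would be 0 instead.
    assert(truncate_to_int32(clamp_timestamp(kBadDate)) == 0);
}
```

Calling check_thread_numbers() passes both assertions under the stated assumption, which supports the reading that the 2036 date is the pre-1900 value wrapped into 32 bits.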