[PATCH 3/3] lib: Reject multi-message mboxes and deprecate single-message mbox
On Sun, Nov 25 2012, Austin Clements wrote: > Quoth Tomi Ollila on Nov 25 at 3:26 pm: >> On Sun, Nov 25 2012, Austin Clements wrote: >> >> > Previously, we would treat multi-message mboxes as one giant email, >> > which, besides the obvious incorrect indexing, often led to >> > out-of-memory errors for archival mboxes. Now we explicitly reject >> > multi-message mboxes. For historical reasons, we retain support for >> > single-message mboxes, but official deprecate this behavior. >> >> >> The series looks good to me -- but I don't know about deprecating >> single-message mboxes: >> >> * If we someday support (read-only?) mbox format, then single-message >> mboxes are "normal" again. > > If notmuch does gain mbox support, then its handling of single-message > mboxes will *definitely* change because it will stop doing > maildir-like things to them (flag sync, moving from new to cur, etc), > which people may currently be depending on. This was one of the > motivations for deprecating the current handling of single-message > mboxes. > >> * Some na?ve mb2md scripts could leave the 'From ' -line intact: for >> example `formail -bz -s head -3 < $MAIL`(*) can be used to demonstrate this > > I would call that "buggy", rather than "na?ve". ]:--8) > >> * Some people may have large collection of single-file messages starting >> with 'From ' currently indexed. If those are to be re-indexed later >> without "single-message mbox" support that is somewhat of a burden to >> the users (**) > > That's why this only deprecates them (with a warning) and doesn't drop > support for them. The idea is to keep the historical handling for a > few releases and then we'll have the flexibility to do what we want > with single-message mboxes (including supporting them as real mbox). > > It's probably a good idea to include a script or a wiki pointer for > fixing single-message mboxes in the NEWS. As long as the file name is > kept the same, notmuch won't reindex it. Ok, I'm convinced. +1 Tomi > >> (*) my "mb2md" wannabe does gnus-like "$formail" -bz -R 'From ' X-From-Line: >> ... >> >> (**) Something like the following could be used to mangle "single-file >> mboxes"... >> find . -type f | xargs perl -e 'foreach (@ARGV) { open IO, "+<", $_ or >> next; sysread IO, $buf, 5; if ($buf eq "From ") { sysseek IO, 0, 0; >> syswrite IO, "Fro:"; }}' >> This breaks the multi-message mbox nicely... >;) >> >> >> Tomi > ___ > notmuch mailing list > notmuch at notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch
[PATCH 3/3] lib: Reject multi-message mboxes and deprecate single-message mbox
On Sun, Nov 25 2012, Austin Clements wrote: > Previously, we would treat multi-message mboxes as one giant email, > which, besides the obvious incorrect indexing, often led to > out-of-memory errors for archival mboxes. Now we explicitly reject > multi-message mboxes. For historical reasons, we retain support for > single-message mboxes, but official deprecate this behavior. The series looks good to me -- but I don't know about deprecating single-message mboxes: * If we someday support (read-only?) mbox format, then single-message mboxes are "normal" again. * Some na?ve mb2md scripts could leave the 'From ' -line intact: for example `formail -bz -s head -3 < $MAIL`(*) can be used to demonstrate this * Some people may have large collection of single-file messages starting with 'From ' currently indexed. If those are to be re-indexed later without "single-message mbox" support that is somewhat of a burden to the users (**) (*) my "mb2md" wannabe does gnus-like "$formail" -bz -R 'From ' X-From-Line: ... (**) Something like the following could be used to mangle "single-file mboxes"... find . -type f | xargs perl -e 'foreach (@ARGV) { open IO, "+<", $_ or next; sysread IO, $buf, 5; if ($buf eq "From ") { sysseek IO, 0, 0; syswrite IO, "Fro:"; }}' This breaks the multi-message mbox nicely... >;) Tomi
[PATCH 3/3] lib: Reject multi-message mboxes and deprecate single-message mbox
Quoth Tomi Ollila on Nov 25 at 3:26 pm: > On Sun, Nov 25 2012, Austin Clements wrote: > > > Previously, we would treat multi-message mboxes as one giant email, > > which, besides the obvious incorrect indexing, often led to > > out-of-memory errors for archival mboxes. Now we explicitly reject > > multi-message mboxes. For historical reasons, we retain support for > > single-message mboxes, but official deprecate this behavior. > > > The series looks good to me -- but I don't know about deprecating > single-message mboxes: > > * If we someday support (read-only?) mbox format, then single-message > mboxes are "normal" again. If notmuch does gain mbox support, then its handling of single-message mboxes will *definitely* change because it will stop doing maildir-like things to them (flag sync, moving from new to cur, etc), which people may currently be depending on. This was one of the motivations for deprecating the current handling of single-message mboxes. > * Some na?ve mb2md scripts could leave the 'From ' -line intact: for > example `formail -bz -s head -3 < $MAIL`(*) can be used to demonstrate this I would call that "buggy", rather than "na?ve". ]:--8) > * Some people may have large collection of single-file messages starting > with 'From ' currently indexed. If those are to be re-indexed later > without "single-message mbox" support that is somewhat of a burden to > the users (**) That's why this only deprecates them (with a warning) and doesn't drop support for them. The idea is to keep the historical handling for a few releases and then we'll have the flexibility to do what we want with single-message mboxes (including supporting them as real mbox). It's probably a good idea to include a script or a wiki pointer for fixing single-message mboxes in the NEWS. As long as the file name is kept the same, notmuch won't reindex it. > (*) my "mb2md" wannabe does gnus-like "$formail" -bz -R 'From ' X-From-Line: > ... > > (**) Something like the following could be used to mangle "single-file > mboxes"... > find . -type f | xargs perl -e 'foreach (@ARGV) { open IO, "+<", $_ or > next; sysread IO, $buf, 5; if ($buf eq "From ") { sysseek IO, 0, 0; > syswrite IO, "Fro:"; }}' > This breaks the multi-message mbox nicely... >;) > > > Tomi
Re: [PATCH 3/3] lib: Reject multi-message mboxes and deprecate single-message mbox
On Sun, Nov 25 2012, Austin Clements wrote: > Quoth Tomi Ollila on Nov 25 at 3:26 pm: >> On Sun, Nov 25 2012, Austin Clements wrote: >> >> > Previously, we would treat multi-message mboxes as one giant email, >> > which, besides the obvious incorrect indexing, often led to >> > out-of-memory errors for archival mboxes. Now we explicitly reject >> > multi-message mboxes. For historical reasons, we retain support for >> > single-message mboxes, but official deprecate this behavior. >> >> >> The series looks good to me -- but I don't know about deprecating >> single-message mboxes: >> >> * If we someday support (read-only?) mbox format, then single-message >> mboxes are "normal" again. > > If notmuch does gain mbox support, then its handling of single-message > mboxes will *definitely* change because it will stop doing > maildir-like things to them (flag sync, moving from new to cur, etc), > which people may currently be depending on. This was one of the > motivations for deprecating the current handling of single-message > mboxes. > >> * Some naïve mb2md scripts could leave the 'From ' -line intact: for >> example `formail -bz -s head -3 < $MAIL`(*) can be used to demonstrate this > > I would call that "buggy", rather than "naïve". ]:--8) > >> * Some people may have large collection of single-file messages starting >> with 'From ' currently indexed. If those are to be re-indexed later >> without "single-message mbox" support that is somewhat of a burden to >> the users (**) > > That's why this only deprecates them (with a warning) and doesn't drop > support for them. The idea is to keep the historical handling for a > few releases and then we'll have the flexibility to do what we want > with single-message mboxes (including supporting them as real mbox). > > It's probably a good idea to include a script or a wiki pointer for > fixing single-message mboxes in the NEWS. As long as the file name is > kept the same, notmuch won't reindex it. Ok, I'm convinced. +1 Tomi > >> (*) my "mb2md" wannabe does gnus-like "$formail" -bz -R 'From ' X-From-Line: >> ... >> >> (**) Something like the following could be used to mangle "single-file >> mboxes"... >> find . -type f | xargs perl -e 'foreach (@ARGV) { open IO, "+<", $_ or >> next; sysread IO, $buf, 5; if ($buf eq "From ") { sysseek IO, 0, 0; >> syswrite IO, "Fro:"; }}' >> This breaks the multi-message mbox nicely... >;) >> >> >> Tomi > ___ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 3/3] lib: Reject multi-message mboxes and deprecate single-message mbox
Quoth Tomi Ollila on Nov 25 at 3:26 pm: > On Sun, Nov 25 2012, Austin Clements wrote: > > > Previously, we would treat multi-message mboxes as one giant email, > > which, besides the obvious incorrect indexing, often led to > > out-of-memory errors for archival mboxes. Now we explicitly reject > > multi-message mboxes. For historical reasons, we retain support for > > single-message mboxes, but official deprecate this behavior. > > > The series looks good to me -- but I don't know about deprecating > single-message mboxes: > > * If we someday support (read-only?) mbox format, then single-message > mboxes are "normal" again. If notmuch does gain mbox support, then its handling of single-message mboxes will *definitely* change because it will stop doing maildir-like things to them (flag sync, moving from new to cur, etc), which people may currently be depending on. This was one of the motivations for deprecating the current handling of single-message mboxes. > * Some naïve mb2md scripts could leave the 'From ' -line intact: for > example `formail -bz -s head -3 < $MAIL`(*) can be used to demonstrate this I would call that "buggy", rather than "naïve". ]:--8) > * Some people may have large collection of single-file messages starting > with 'From ' currently indexed. If those are to be re-indexed later > without "single-message mbox" support that is somewhat of a burden to > the users (**) That's why this only deprecates them (with a warning) and doesn't drop support for them. The idea is to keep the historical handling for a few releases and then we'll have the flexibility to do what we want with single-message mboxes (including supporting them as real mbox). It's probably a good idea to include a script or a wiki pointer for fixing single-message mboxes in the NEWS. As long as the file name is kept the same, notmuch won't reindex it. > (*) my "mb2md" wannabe does gnus-like "$formail" -bz -R 'From ' X-From-Line: > ... > > (**) Something like the following could be used to mangle "single-file > mboxes"... > find . -type f | xargs perl -e 'foreach (@ARGV) { open IO, "+<", $_ or > next; sysread IO, $buf, 5; if ($buf eq "From ") { sysseek IO, 0, 0; > syswrite IO, "Fro:"; }}' > This breaks the multi-message mbox nicely... >;) > > > Tomi ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 3/3] lib: Reject multi-message mboxes and deprecate single-message mbox
On Sun, Nov 25 2012, Austin Clements wrote: > Previously, we would treat multi-message mboxes as one giant email, > which, besides the obvious incorrect indexing, often led to > out-of-memory errors for archival mboxes. Now we explicitly reject > multi-message mboxes. For historical reasons, we retain support for > single-message mboxes, but official deprecate this behavior. The series looks good to me -- but I don't know about deprecating single-message mboxes: * If we someday support (read-only?) mbox format, then single-message mboxes are "normal" again. * Some naïve mb2md scripts could leave the 'From ' -line intact: for example `formail -bz -s head -3 < $MAIL`(*) can be used to demonstrate this * Some people may have large collection of single-file messages starting with 'From ' currently indexed. If those are to be re-indexed later without "single-message mbox" support that is somewhat of a burden to the users (**) (*) my "mb2md" wannabe does gnus-like "$formail" -bz -R 'From ' X-From-Line: ... (**) Something like the following could be used to mangle "single-file mboxes"... find . -type f | xargs perl -e 'foreach (@ARGV) { open IO, "+<", $_ or next; sysread IO, $buf, 5; if ($buf eq "From ") { sysseek IO, 0, 0; syswrite IO, "Fro:"; }}' This breaks the multi-message mbox nicely... >;) Tomi ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
[PATCH 3/3] lib: Reject multi-message mboxes and deprecate single-message mbox
Previously, we would treat multi-message mboxes as one giant email, which, besides the obvious incorrect indexing, often led to out-of-memory errors for archival mboxes. Now we explicitly reject multi-message mboxes. For historical reasons, we retain support for single-message mboxes, but official deprecate this behavior. --- lib/database.cc |4 +++- lib/index.cc| 28 test/new|8 +--- 3 files changed, 36 insertions(+), 4 deletions(-) diff --git a/lib/database.cc b/lib/database.cc index 4df3217..91d4329 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -1821,7 +1821,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch, date = notmuch_message_file_get_header (message_file, "date"); _notmuch_message_set_header_values (message, date, from, subject); - _notmuch_message_index_file (message, filename); + ret = _notmuch_message_index_file (message, filename); + if (ret) + goto DONE; } else { ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID; } diff --git a/lib/index.cc b/lib/index.cc index e377732..da0e6ce 100644 --- a/lib/index.cc +++ b/lib/index.cc @@ -435,6 +435,9 @@ _notmuch_message_index_file (notmuch_message_t *message, const char *from, *subject; notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; static int initialized = 0; +char from_buf[5]; +bool is_mbox = false; +static bool mbox_warning = false; if (! initialized) { g_mime_init (0); @@ -448,13 +451,38 @@ _notmuch_message_index_file (notmuch_message_t *message, goto DONE; } +/* Is this mbox? */ +if (fread (from_buf, sizeof (from_buf), 1, file) == 1 && + strncmp (from_buf, "From ", 5) == 0) + is_mbox = true; +rewind (file); + /* Evil GMime steals my FILE* here so I won't fclose it. */ stream = g_mime_stream_file_new (file); parser = g_mime_parser_new_with_stream (stream); +g_mime_parser_set_scan_from (parser, is_mbox); mime_message = g_mime_parser_construct_message (parser); +if (is_mbox) { + if (!g_mime_parser_eos (parser)) { + /* This is a multi-message mbox. */ + ret = NOTMUCH_STATUS_FILE_NOT_EMAIL; + goto DONE; + } + /* For historical reasons, we support single-message mboxes, +* but this behavior is likely to change in the future, so +* warn. */ + if (!mbox_warning) { + mbox_warning = true; + fprintf (stderr, "\ +Warning: %s is an mbox containing a single message,\n\ +likely caused by misconfigured mail delivery. Support for single-message\n\ +mboxes is deprecated and may be removed in the future.\n", filename); + } +} + from = g_mime_message_get_sender (mime_message); addresses = internet_address_list_parse_string (from); diff --git a/test/new b/test/new index 29f9aff..f562cec 100755 --- a/test/new +++ b/test/new @@ -163,7 +163,7 @@ rm -rf "${MAIL_DIR}"/two output=$(NOTMUCH_NEW) test_expect_equal "$output" "No new mail. Removed 3 messages." -test_begin_subtest "Support single-message mbox" +test_begin_subtest "Support single-message mbox (deprecated)" cat > "${MAIL_DIR}"/mbox_file1 < @@ -174,11 +174,13 @@ Body. EOF output=$(NOTMUCH_NEW 2>&1) test_expect_equal "$output" \ -"Added 1 new message to the database." +"Warning: ${MAIL_DIR}/mbox_file1 is an mbox containing a single message, +likely caused by misconfigured mail delivery. Support for single-message +mboxes is deprecated and may be removed in the future. +Added 1 new message to the database." # This test requires that notmuch new has been run at least once. test_begin_subtest "Skip and report non-mail files" -test_subtest_known_broken generate_message mkdir -p "${MAIL_DIR}"/.git && touch "${MAIL_DIR}"/.git/config touch "${MAIL_DIR}"/ignored_file -- 1.7.10.4
[PATCH 3/3] lib: Reject multi-message mboxes and deprecate single-message mbox
Previously, we would treat multi-message mboxes as one giant email, which, besides the obvious incorrect indexing, often led to out-of-memory errors for archival mboxes. Now we explicitly reject multi-message mboxes. For historical reasons, we retain support for single-message mboxes, but official deprecate this behavior. --- lib/database.cc |4 +++- lib/index.cc| 28 test/new|8 +--- 3 files changed, 36 insertions(+), 4 deletions(-) diff --git a/lib/database.cc b/lib/database.cc index 4df3217..91d4329 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -1821,7 +1821,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch, date = notmuch_message_file_get_header (message_file, "date"); _notmuch_message_set_header_values (message, date, from, subject); - _notmuch_message_index_file (message, filename); + ret = _notmuch_message_index_file (message, filename); + if (ret) + goto DONE; } else { ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID; } diff --git a/lib/index.cc b/lib/index.cc index e377732..da0e6ce 100644 --- a/lib/index.cc +++ b/lib/index.cc @@ -435,6 +435,9 @@ _notmuch_message_index_file (notmuch_message_t *message, const char *from, *subject; notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS; static int initialized = 0; +char from_buf[5]; +bool is_mbox = false; +static bool mbox_warning = false; if (! initialized) { g_mime_init (0); @@ -448,13 +451,38 @@ _notmuch_message_index_file (notmuch_message_t *message, goto DONE; } +/* Is this mbox? */ +if (fread (from_buf, sizeof (from_buf), 1, file) == 1 && + strncmp (from_buf, "From ", 5) == 0) + is_mbox = true; +rewind (file); + /* Evil GMime steals my FILE* here so I won't fclose it. */ stream = g_mime_stream_file_new (file); parser = g_mime_parser_new_with_stream (stream); +g_mime_parser_set_scan_from (parser, is_mbox); mime_message = g_mime_parser_construct_message (parser); +if (is_mbox) { + if (!g_mime_parser_eos (parser)) { + /* This is a multi-message mbox. */ + ret = NOTMUCH_STATUS_FILE_NOT_EMAIL; + goto DONE; + } + /* For historical reasons, we support single-message mboxes, +* but this behavior is likely to change in the future, so +* warn. */ + if (!mbox_warning) { + mbox_warning = true; + fprintf (stderr, "\ +Warning: %s is an mbox containing a single message,\n\ +likely caused by misconfigured mail delivery. Support for single-message\n\ +mboxes is deprecated and may be removed in the future.\n", filename); + } +} + from = g_mime_message_get_sender (mime_message); addresses = internet_address_list_parse_string (from); diff --git a/test/new b/test/new index 29f9aff..f562cec 100755 --- a/test/new +++ b/test/new @@ -163,7 +163,7 @@ rm -rf "${MAIL_DIR}"/two output=$(NOTMUCH_NEW) test_expect_equal "$output" "No new mail. Removed 3 messages." -test_begin_subtest "Support single-message mbox" +test_begin_subtest "Support single-message mbox (deprecated)" cat > "${MAIL_DIR}"/mbox_file1 < @@ -174,11 +174,13 @@ Body. EOF output=$(NOTMUCH_NEW 2>&1) test_expect_equal "$output" \ -"Added 1 new message to the database." +"Warning: ${MAIL_DIR}/mbox_file1 is an mbox containing a single message, +likely caused by misconfigured mail delivery. Support for single-message +mboxes is deprecated and may be removed in the future. +Added 1 new message to the database." # This test requires that notmuch new has been run at least once. test_begin_subtest "Skip and report non-mail files" -test_subtest_known_broken generate_message mkdir -p "${MAIL_DIR}"/.git && touch "${MAIL_DIR}"/.git/config touch "${MAIL_DIR}"/ignored_file -- 1.7.10.4 ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch