Re: [PATCH v2] nmbug: write tags out to a temporary file, not 'nmbug.index'
On Sun, Feb 13 2022, Sean Whitton wrote: > Hello, > > On Sun 13 Feb 2022 at 09:54am -07, Sean Whitton wrote: > >> If more than nmbug process is running at once, then each will try to >> read and write the same file. The particular failure I've seen is >> that the process which finishes first deletes nmbug.index, and then >> the other process dies with a FileNotFoundError. So use a distinct >> temporary file per process. >> --- >> devel/nmbug/nmbug | 44 ++-- >> 1 file changed, 22 insertions(+), 22 deletions(-) >> >> Here is a second attempt, though I'm afraid I have little idea whether it is >> idiomatic Python. > > It would seem this causes 'nmbug status' to output just one result. > > I'll leave the fix to someone with more Python experience. Sorry for > the improperly tested v2 patch. One option would be using the first patch, but instead of mkstemp(), NamedTemporaryFile(dir=NMBGIT, prefix="nmbug.index") to be used instead (and then path.name and index.name in place of path and index...) Tomi > > -- > Sean Whitton ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
Re: [PATCH v2] nmbug: write tags out to a temporary file, not 'nmbug.index'
Hello, On Sun 13 Feb 2022 at 09:54am -07, Sean Whitton wrote: > If more than nmbug process is running at once, then each will try to > read and write the same file. The particular failure I've seen is > that the process which finishes first deletes nmbug.index, and then > the other process dies with a FileNotFoundError. So use a distinct > temporary file per process. > --- > devel/nmbug/nmbug | 44 ++-- > 1 file changed, 22 insertions(+), 22 deletions(-) > > Here is a second attempt, though I'm afraid I have little idea whether it is > idiomatic Python. It would seem this causes 'nmbug status' to output just one result. I'll leave the fix to someone with more Python experience. Sorry for the improperly tested v2 patch. -- Sean Whitton ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
Re: [PATCH v2] nmbug: write tags out to a temporary file, not 'nmbug.index'
[ drive-by comment based on a past mistake :/ ] Sean Whitton writes: > -def _index_tags(): > -"Write notmuch tags to the nmbug.index." > -path = _os.path.join(NMBGIT, 'nmbug.index') > +(_, index) = _tempfile.mkstemp() [...] > _git( > args=['read-tree', '--empty'], > -additional_env={'GIT_INDEX_FILE': path}, wait=True) > +additional_env={'GIT_INDEX_FILE': index}, wait=True) It's better to put the temporary index in $GIT_DIR due to this bit noted in git-read-tree(1): The file must allow to be rename(2)ed into from a temporary file that is created next to the usual index file; typically this means it needs to be on the same filesystem as the index file itself, and you need write permission to the directories the index file and index output file are located in. So, assuming NMBGIT matches $GIT_DIR, perhaps change the above mkstemp call to something like _tempfile.mkstemp(dir=NMBGIT, prefix="nmbug", suffix=".index") ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
[PATCH v2] nmbug: write tags out to a temporary file, not 'nmbug.index'
If more than nmbug process is running at once, then each will try to read and write the same file. The particular failure I've seen is that the process which finishes first deletes nmbug.index, and then the other process dies with a FileNotFoundError. So use a distinct temporary file per process. --- devel/nmbug/nmbug | 44 ++-- 1 file changed, 22 insertions(+), 22 deletions(-) Here is a second attempt, though I'm afraid I have little idea whether it is idiomatic Python. diff --git a/devel/nmbug/nmbug b/devel/nmbug/nmbug index 043c1863..396098d0 100755 --- a/devel/nmbug/nmbug +++ b/devel/nmbug/nmbug @@ -586,37 +586,38 @@ def get_status(): 'deleted': {}, 'missing': {}, } -index = _index_tags() -maybe_deleted = _diff_index(index=index, filter='D') -for id, tags in maybe_deleted.items(): -(_, stdout, stderr) = _spawn( -args=['notmuch', 'search', '--output=files', 'id:{0}'.format(id)], -stdout=_subprocess.PIPE, -wait=True) -if stdout: -status['deleted'][id] = tags -else: -status['missing'][id] = tags -status['added'] = _diff_index(index=index, filter='A') -_os.remove(index) -return status - - -def _index_tags(): -"Write notmuch tags to the nmbug.index." -path = _os.path.join(NMBGIT, 'nmbug.index') +(_, index) = _tempfile.mkstemp() +try: +_index_tags(index) +maybe_deleted = _diff_index(index=index, filter='D') +for id, tags in maybe_deleted.items(): +(_, stdout, stderr) = _spawn( +args=['notmuch', 'search', '--output=files', 'id:{0}'.format(id)], +stdout=_subprocess.PIPE, +wait=True) +if stdout: +status['deleted'][id] = tags +else: +status['missing'][id] = tags +status['added'] = _diff_index(index=index, filter='A') +return status +finally: +_os.remove(index) + +def _index_tags(index): +"Write notmuch tags to a file." query = ' '.join('tag:"{tag}"'.format(tag=tag) for tag in get_tags()) prefix = '+{0}'.format(_ENCODED_TAG_PREFIX) _git( args=['read-tree', '--empty'], -additional_env={'GIT_INDEX_FILE': path}, wait=True) +additional_env={'GIT_INDEX_FILE': index}, wait=True) with _spawn( args=['notmuch', 'dump', '--format=batch-tag', '--', query], stdout=_subprocess.PIPE) as notmuch: with _git( args=['update-index', '--index-info'], stdin=_subprocess.PIPE, -additional_env={'GIT_INDEX_FILE': path}) as git: +additional_env={'GIT_INDEX_FILE': index}) as git: for line in notmuch.stdout: if line.strip().startswith('#'): continue @@ -629,7 +630,6 @@ def _index_tags(): for line in _index_tags_for_message( id=id, status='A', tags=tags): git.stdin.write(line) -return path def _index_tags_for_message(id, status, tags): -- 2.30.2 ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
[PATCH 2/4] test: add known broken test for insert with mbox as input
It seems reasonable that notmuch should try to avoid delivering messages in formats it cannot index. --- test/T070-insert.sh | 6 ++ 1 file changed, 6 insertions(+) diff --git a/test/T070-insert.sh b/test/T070-insert.sh index ec170b30..a297fa73 100755 --- a/test/T070-insert.sh +++ b/test/T070-insert.sh @@ -292,4 +292,10 @@ for code in OUT_OF_MEMORY XAPIAN_EXCEPTION ; do test_expect_code 0 "notmuch_with_shim shim-$code insert --keep < \"$gen_msg_filename\"" done +test_begin_subtest "insert converts mboxes on delivery" +test_subtest_known_broken +notmuch insert +unmboxed < "${TEST_DIRECTORY}"/corpora/indexing/mbox-attachment.eml +output=$(notmuch count tag:unmboxed) +test_expect_equal "${output}" 1 + test_done -- 2.34.1 ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
[PATCH 3/4] CLI/insert: split copy_fd
This helps maintainability and enables code-reuse of our home-brewed buffered-write code. This commit is mostly code movement. --- notmuch-insert.c | 37 + 1 file changed, 21 insertions(+), 16 deletions(-) diff --git a/notmuch-insert.c b/notmuch-insert.c index 214d4d03..bec25a2a 100644 --- a/notmuch-insert.c +++ b/notmuch-insert.c @@ -241,6 +241,24 @@ maildir_mktemp (const void *ctx, const char *maildir, bool world_readable, char return fd; } +static bool +write_buf (const char *buf, int fdout, ssize_t remain) { +const char *p = buf; +do { + ssize_t written = write (fdout, p, remain); + if (written < 0 && errno == EINTR) + continue; + if (written <= 0) { + fprintf (stderr, "Error: writing to temporary file: %s", +strerror (errno)); + return false; + } + p += written; + remain -= written; +} while (remain > 0); +return true; +} + /* * Copy fdin to fdout, return true on success, and false on errors and * empty input. @@ -253,7 +271,6 @@ copy_fd (int fdout, int fdin) while (! interrupted) { ssize_t remain; char buf[4096]; - char *p; remain = read (fdin, buf, sizeof (buf)); if (remain == 0) @@ -265,21 +282,9 @@ copy_fd (int fdout, int fdin) strerror (errno)); return false; } - - p = buf; - do { - ssize_t written = write (fdout, p, remain); - if (written < 0 && errno == EINTR) - continue; - if (written <= 0) { - fprintf (stderr, "Error: writing to temporary file: %s", -strerror (errno)); - return false; - } - p += written; - remain -= written; - empty = false; - } while (remain > 0); + if (! write_buf (buf, fdout, remain)) + return false; + empty = false; } return (! interrupted && ! empty); -- 2.34.1 ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
[PATCH 1/4] test: start new corpus of test messages for indexing code
This particular message is not recognized by notmuch as mail, but is fine according to e.g. mutt. The trigger for this bad behaviour seems to be a second "From " ocurring at the beginning of the line but inside an attachment. --- test/corpora/indexing/mbox-attachment.eml | 83 +++ 1 file changed, 83 insertions(+) create mode 100644 test/corpora/indexing/mbox-attachment.eml diff --git a/test/corpora/indexing/mbox-attachment.eml b/test/corpora/indexing/mbox-attachment.eml new file mode 100644 index ..98a8fc91 --- /dev/null +++ b/test/corpora/indexing/mbox-attachment.eml @@ -0,0 +1,83 @@ +From da...@tethera.net Sat Feb 5 09:19:10 2022 +From: David Bremner +To: David Bremner +Subject: Re: [RFC PATCH v2 12/12] emacs: whitespace cleanup for keybindings +Date: Sat, 05 Feb 2022 10:19:09 -0400 +Message-ID: <87k0e9o0pu@tethera.net> +MIME-Version: 1.0 +Content-Type: multipart/mixed; boundary="=-=-=" + +--=-=-= +Content-Type: text/plain +Content-Disposition: inline + + +I figured out the race condition in the tests. The previous test was +still running when the failing test started, the joys of using a shared +emacs for running all of the tests in one file. + +The attached diff is split into the the commits that introduce the tests +in question in my working series, but you should be able to just apply +it on top of the posted series if you want. + + +--=-=-= +Content-Type: text/x-diff +Content-Disposition: inline; filename=0001-test-fixups.patch + +From fc88cba7f1f37b9cf3b296eace2422dd0e173502 Mon Sep 17 00:00:00 2001 +From: David Bremner +Date: Thu, 3 Feb 2022 21:05:05 -0400 +Subject: [PATCH] test fixups + +--- + test/T315-emacs-tagging.sh | 9 - + 1 file changed, 4 insertions(+), 5 deletions(-) + +diff --git a/test/T315-emacs-tagging.sh b/test/T315-emacs-tagging.sh +index c9e3e53a..c26413ce 100755 +--- a/test/T315-emacs-tagging.sh b/test/T315-emacs-tagging.sh +@@ -119,7 +119,8 @@ for mode in search show tree unthreaded; do + (notmuch-$mode \"$os_x_darwin_thread\") + (notmuch-test-wait) + (execute-kbd-macro \"+tag-to-be-undone-$mode\") +- (notmuch-tag-undo))" ++ (notmuch-tag-undo) ++ (notmuch-test-wait))" + count=$(notmuch count "tag:tag-to-be-undone-$mode") + test_expect_equal "$count" "0" + +@@ -128,9 +129,7 @@ for mode in search show tree unthreaded; do + (notmuch-$mode \"$os_x_darwin_thread\") + (notmuch-test-wait) + (execute-kbd-macro \"+one-$mode\") +- (notmuch-test-wait) + (execute-kbd-macro \"+two-$mode\") +- (notmuch-test-wait) + (notmuch-tag-undo) + (notmuch-test-wait) + (execute-kbd-macro \"+three-$mode\"))" +@@ -143,7 +142,6 @@ for mode in search show tree unthreaded; do + (notmuch-$mode \"$os_x_darwin_thread\") + (notmuch-test-wait) + (execute-kbd-macro \"+one-$mode\") +- (notmuch-test-wait) + (execute-kbd-macro \"+two-$mode\") + (notmuch-tag-undo) + (notmuch-test-wait) +@@ -159,7 +157,8 @@ for mode in search show tree unthreaded; do + (notmuch-$mode \"$os_x_darwin_thread\") + (notmuch-test-wait) + (execute-kbd-macro \"+tag-to-be-undone-$mode\") +- (execute-kbd-macro (kbd \"C-x u\")))" ++ (execute-kbd-macro (kbd \"C-x u\")) ++ (notmuch-test-wait))" + count=$(notmuch count "tag:tag-to-be-undone-$mode") + test_expect_equal "$count" "0" + done +-- +2.30.2 + + +--=-=-=-- -- 2.34.1 ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
[PATCH 4/4] CLI/insert: escape envelope from
The idea is to do as little parsing and modification of the delivered message as possible. Luckily the position of the "envelope header" lets us escape it by replacing the first 5 characters of the stream with a regular header name (with ':'). --- notmuch-insert.c| 15 ++- test/T070-insert.sh | 1 - 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/notmuch-insert.c b/notmuch-insert.c index bec25a2a..d1244384 100644 --- a/notmuch-insert.c +++ b/notmuch-insert.c @@ -267,10 +267,13 @@ static bool copy_fd (int fdout, int fdin) { bool empty = true; +bool first = true; +const char *header="X-Envelope-From: "; while (! interrupted) { ssize_t remain; char buf[4096]; + const char *p = buf; remain = read (fdin, buf, sizeof (buf)); if (remain == 0) @@ -282,7 +285,17 @@ copy_fd (int fdout, int fdin) strerror (errno)); return false; } - if (! write_buf (buf, fdout, remain)) + + if (first && remain >= 5 && 0 == strncmp (buf, "From ", 5)) { + if (! write_buf (header, fdout, strlen (header))) + return false; + p += 5; + remain -= 5; + } + + first = false; + + if (! write_buf (p, fdout, remain)) return false; empty = false; } diff --git a/test/T070-insert.sh b/test/T070-insert.sh index a297fa73..e1e3b151 100755 --- a/test/T070-insert.sh +++ b/test/T070-insert.sh @@ -293,7 +293,6 @@ for code in OUT_OF_MEMORY XAPIAN_EXCEPTION ; do done test_begin_subtest "insert converts mboxes on delivery" -test_subtest_known_broken notmuch insert +unmboxed < "${TEST_DIRECTORY}"/corpora/indexing/mbox-attachment.eml output=$(notmuch count tag:unmboxed) test_expect_equal "${output}" 1 -- 2.34.1 ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
Escape mbox "From " header in notmuch-insert
This obsoletes the WIP patches at [1]. I've settled on this being a reasonable thing to do; this series tries to do it in a more maintainable way. Compared to the WIP series, it is also more careful about writing out the inserted head name. It again repeats the patch adding a test message in mbox form. [1]: id:20220212025503.1690413-1-da...@tethera.net ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
[RFC PATCH] CLI/insert: convert copy_fd to use stdio
Rather than doing our own buffer management, use stdio, and line based I/O. This will simplify doing some simple header filtering. --- As hinted in id:87pmnqx3mk@tethera.net , this looks like a dead end to me at the moment, but perhaps of interest to someone looking at a larger rewrite of notmuch-insert. notmuch-insert.c | 43 +++ 1 file changed, 19 insertions(+), 24 deletions(-) diff --git a/notmuch-insert.c b/notmuch-insert.c index 214d4d03..4ba6f993 100644 --- a/notmuch-insert.c +++ b/notmuch-insert.c @@ -249,38 +249,33 @@ static bool copy_fd (int fdout, int fdin) { bool empty = true; +FILE *fin = fdopen (fdin, "r"); +FILE *fout = fdopen (fdout, "w"); +char *line = NULL; +size_t line_size; +ssize_t line_len; -while (! interrupted) { - ssize_t remain; - char buf[4096]; - char *p; +if (! fin || ! fout) + return false; - remain = read (fdin, buf, sizeof (buf)); - if (remain == 0) - break; - if (remain < 0) { +while (! interrupted && + ( (line_len = getline (, _size, fin)) >= 0)) { + empty = false; + if (fputs (line, fout) < 0) { if (errno == EINTR) continue; - fprintf (stderr, "Error: reading from standard input: %s\n", + fprintf (stderr, "Error: writing to temporary file: %s", strerror (errno)); return false; } - - p = buf; - do { - ssize_t written = write (fdout, p, remain); - if (written < 0 && errno == EINTR) - continue; - if (written <= 0) { - fprintf (stderr, "Error: writing to temporary file: %s", -strerror (errno)); - return false; - } - p += written; - remain -= written; - empty = false; - } while (remain > 0); } +if (line_len < 0 && (errno == EINVAL || errno == ENOMEM)) { + fprintf (stderr, "Error: reading from standard input: %s\n", +strerror (errno)); + return false; +} + +fflush(fout); return (! interrupted && ! empty); } -- 2.34.1 ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
Re: WIP: filter out envelope headers in notmuch-insert.
David Bremner writes: > David Bremner writes: > >> I thought about a more ambitious version that would replace any >> existing "Return-Path" headers, but it seems like significantly more >> work (the current code is not line based), and not obviously >> better. Or maybe I missed the wording in the RFCs that talks about how >> MDAs should behave here. > > The RFC wording I _did_ find is as follows. > > RFC5598 says that (ยง4.3.3) > > The MDA records the RFC5321.MailFrom address into the > RFC5321.Return-Path field. > > where the first is the (address in) envelope from and the second is the > Return-Path: header. I'm not sure how authoritative RFC5598 really is; > it is "Category: Informational", whatever that means. Starting at RFCs some more (as one does), I think that the "envelope header" added by e.g. postfix's local(8) does not count as a RFC5321.MailFrom (although it should contain the same information). These mbox style envelope headers do not seem to super well defined. RFC 4155, which is _about_ mbox, seems to go out of its way not to define the syntax. The following (from RFC5598) seems to indicate that MDAs speak SMTP. Transfer into the MDA is accomplished by a normal MTA transfer mechanism. If correct, that means that notmuch-insert is not an MDA in the sense meant by RFC5598 As an experiment I converted copy_fd to line oriented form (which I guess I'll post seperately), but I realized that further progress would require parsing the "envelope header" line. ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
Re: Test suite timing issues?
On Sun, Feb 13 2022, Tomi Ollila wrote: > On Sat, Feb 12 2022, David Bremner wrote: > >> Tomi Ollila writes: >> >>> >>> Does such a change hide "buggy" functionality ? >> >> We mostly don't use add_message, call notmuch new via NOTMUCH_NEW in >> T050-new.sh. So I think it would mostly not hide bugs in notmuch >> new. OTOH, I'm surprised the same issues with timestamps don't show up >> there, if that is really the problem. >> >>> Or do we consider notmuch new buggy if it does not notice all new messages >>> arrived every time ? >> >> That's a harder question. Maybe? But I don't know how serious a bug it >> is for actual users (who often get mail delivered in various concurrent >> ways). > > The bug could be that the filesystem mtime resolution is 1 sec (hmm what > is the resolution in database?), and 2 emails are delivered to one > directory during one second -- and at the same time notmuch new is scanning > emails and found only one of those. then no more emails are delivered to > that one directory for a while -- so notmuch new does not find those. > > (now that I think of it that could happen to me :O -- just very improbable > (or not, `stat .` outputs Modify: 2022-02-12 22:47:19.550838056 +0200) -- > have to check that timestamp resolution in database). Looked notmuch-new.c -- time_t (seconds since epoch) is used as timestamp comparisons (which would indicate the subsecond resolution most fs' provide is not used)... ... and if so, I wonder why some of our tests are not failing all the time for everyone...? Tomi > > Tomi ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org
Re: [PATCH] nmbug: write tags out to a temporary file, not 'nmbug.index'
On Sun, Feb 13 2022, Sean Whitton wrote: > If more than nmbug process is running at once, then each will try to > read and write the same file. The particular failure I've seen is > that the process which finishes first deletes nmbug.index, and then > the other process dies with a FileNotFoundError. So use a distinct > temporary file per process. > --- > devel/nmbug/nmbug | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/devel/nmbug/nmbug b/devel/nmbug/nmbug > index 043c1863..3c069701 100755 > --- a/devel/nmbug/nmbug > +++ b/devel/nmbug/nmbug > @@ -603,8 +603,8 @@ def get_status(): > > > def _index_tags(): > -"Write notmuch tags to the nmbug.index." > -path = _os.path.join(NMBGIT, 'nmbug.index') > +"Write notmuch tags to a temporary index." > +(_ , path) = _tempfile.mkstemp() > query = ' '.join('tag:"{tag}"'.format(tag=tag) for tag in get_tags()) > prefix = '+{0}'.format(_ENCODED_TAG_PREFIX) > _git( Looks like a good idea. Looked a bit when the index file is removed; it is but intermediate error could leave stray index file(s) around. The current version in git always uses the same file, but now with this temporary files some of those could be left hanging around... Tomi > -- > 2.30.2 ___ notmuch mailing list -- notmuch@notmuchmail.org To unsubscribe send an email to notmuch-le...@notmuchmail.org