Re: bug: chokes on long directory names (was: Re: out of memory on idle machine)

2021-03-17 Thread David Bremner
Gregor Zattler  writes:

> Hi David, Olly, notmuch and xapian developers,
> * David Bremner  [11. Feb. 2021]:
>> David Bremner  writes:
>> As a kind of desperation move, you could try bisecting your mailstore,
>> to see how small of a set of messages you can duplicate the problem
>> with.
>
> I did this, after a fashion.  I found the culprit: it's a maildir
> with a single mail in it.  The name of the maildir is
> exceptionally long [because it is generated from a List-Id:
> header] and the mail arrived on the very day my notmuch
> database got corrupted.  This maildir alone causes every
> subsequent notmuch new to rescan all (?) files.

Hi Gregor;

I am very impressed with your persistence. I suspect it is a bug in
notmuch. I don't know all the details yet, but in the normal case the
directory name is added to the database prefixed with XDIRECTORY. I
noticed this isn't happening for directory names of 234 characters or
longer. That is roughly the Xapian term limit of 245 characters in
total. I'm not sure why there is a discrepancy of one character, but the
main point is that notmuch is probably improperly ignoring an error from
Xapian when adding these overlong terms.
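
The arithmetic can be sketched roughly as follows. This is only an
illustration: the prefix string and the limit of 245 are the figures
quoted in this message, not values checked against the notmuch or
Xapian sources, and the real term may contain more than prefix plus
directory name.

```bash
#!/usr/bin/env bash
# Assumption: the term is simply the prefix followed by the directory name.
# Both values below are taken from the message text, not from the sources.
prefix="XDIRECTORY"
limit=245

# Build a 234-character name the same way the test suite does.
name=$(printf 'z%.0s' {1..234})
term="${prefix}${name}"

echo "prefix length: ${#prefix}"
echo "name length:   ${#name}"
echo "term length:   ${#term} (limit: ${limit})"
```

Under these assumptions the term comes to 244 characters, one short of
the limit, which matches the one-character discrepancy noted above but
does not explain it.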

Thanks again for the debugging, I suspect I would never have found this
bug on my own.

David
___
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org


[PATCH] test: add known broken test for long directory bug

2021-03-17 Thread David Bremner
In [1] Gregor Zattler explained the results of his hard work
tracking down a bug in notmuch with long directories. This test
duplicates the bug.

[1]: id:20210317194728.GB5561@no.workgroup
---
 test/T050-new.sh | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/test/T050-new.sh b/test/T050-new.sh
index 76bda959..f84dc2b0 100755
--- a/test/T050-new.sh
+++ b/test/T050-new.sh
@@ -339,6 +339,20 @@ test_expect_code 1 "NOTMUCH_NEW --debug 2>&1"
 
 notmuch config set new.tags $OLDCONFIG
 
+test_begin_subtest "Long directory names don't cause rescan"
+test_subtest_known_broken
+name=$(printf 'z%.0s' {1..234})
+generate_message [dir]=$name
+NOTMUCH_NEW  > OUTPUT
+notmuch new  >> OUTPUT
+rm -r ${MAIL_DIR}/${name}
+notmuch new >> OUTPUT
+cat <<EOF > EXPECTED
+Added 1 new message to the database.
+No new mail.
+No new mail. Removed 1 message.
+EOF
+test_expect_equal_file EXPECTED OUTPUT
 
 test_begin_subtest "Xapian exception: read only files"
 chmod u-w ${MAIL_DIR}/.notmuch/xapian/*.*
-- 
2.30.2


bug: chokes on long directory names (was: Re: out of memory on idle machine)

2021-03-17 Thread Gregor Zattler
Hi David, Olly, notmuch and xapian developers,
* David Bremner  [11. Feb. 2021]:
> David Bremner  writes:
> As a kind of desperation move, you could try bisecting your mailstore,
> to see how small of a set of messages you can duplicate the problem
> with.

I did this, after a fashion.  I found the culprit: it's a maildir
with a single mail in it.  The name of the maildir is
exceptionally long [because it is generated from a List-Id:
header] and the mail arrived on the very day my notmuch
database got corrupted.  This maildir alone causes every
subsequent notmuch new to rescan all (?) files.

Then I tried indexing only this maildir.  It showed the same
strange re-indexing, but even when running notmuch new in a
loop for a while (>1000 times), the database showed no
corruption.

When I instead shorten the name of the maildir to three
characters, with the very same email file in it, nothing
unusual happens: it indexes the file once and not again.

Then I lengthened the name of the file instead of the
directory, and even with the longest possible filename (or
path?)

/home/grfz/Mail/nuk/new/1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no1607641473.31514_2.no16076414734160.14_2.no

notmuch has no problem indexing this, and does not reindex it
on the next run.


So notmuch or xapian (I don't know which) chokes on extremely long
directory names.  I consider this to be a bug.



My scripts create these long names from List-Id and the
like.  The one which triggered the problems is from an online
shop:

u+mq6tamjqhe3cm2j5giydembrgiytamrtga2deojogexdsmzygm4egnbuifatcnrsgazdejjugbzgkylmfvxw43djnzsxg2dpoaxgizjgna6ton3bg4zdsobsgmytczlcme3dentehaydmnjxmy4doyrwha4tgobgoi6xizlmmvtxeylqnastimdhnv4c43tfoqthipldovzxi33nmvzhgllxmvwgg33...@real-onlineshop.de/

Since, as I tested, this can be reproduced with the simplest
of emails in a maildir with an extremely long name, I do not
attach the maildir in question.  But if anyone wants it I
can send it.



I then had a look at other long directory names and there is
another one which also triggers the problem.  It also contains
only one email, which arrived on the 12th of January:

u+mq6wcodfgmygcjtjhuzdamrrgaytemjrhe2dqmbqfyys4mbxgazugnbsie3doobsgfcdmobfgqygg5ltorxw2zlsomxgo2lunrqweltdn5wsm2b5mu3tkmddhbrdoyrwgvsgeobymi2dszbtg4zdamztmm4dsmzvgjssm4r5orswyzlhojqxa2bfgqygo3lyfzxgk5bgoq6xa4tjozqw...@customers.gitlab.com
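
For anyone who wants to check their own mail store for such
directories, here is a minimal sketch.  The function name
find_long_dirs is made up for illustration, and the default
threshold of 234 is simply the figure mentioned in this thread.
It measures only the last path component, which is an assumption
about what matters, and it counts characters in the current locale
rather than bytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<length> <path>" for every directory below
# ROOT whose own name (last path component) has at least THRESHOLD
# characters.  The default of 234 is the value quoted in this thread.
find_long_dirs() {
    local root=$1 threshold=${2:-234}
    local dir base
    find "$root" -type d -print0 |
    while IFS= read -r -d '' dir; do
        base=${dir##*/}          # strip everything up to the last slash
        if (( ${#base} >= threshold )); then
            printf '%d %s\n' "${#base}" "$dir"
        fi
    done
}

# Example call (the path is only a placeholder):
# find_long_dirs "$HOME/Mail"
```

A lower threshold can be passed as the second argument to list
names that are merely close to the limit.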


Since I removed both on my laptop, notmuch new works again,
yeah!  Now I will have a look at my .procmailrc.

Thanks for your attention, thanks for notmuch and for xapian,
Gregor

--
 -... --- .-. . -.. ..--.. ...-.-