Re: [notmuch] Duplicate In-reply-to line 326 lib/message.cc

2009-11-28 Thread Carl Worth
On Sat, 28 Nov 2009 05:40:13 -0400, David Bremner wrote:
> Now it seems that any search that is non-empty (i.e. matches
> something) crashes with a duplicate In-Reply-To ID. This is in git
> revision 92c4dcc (although it was the same yesterday).  The oddest
> thing is that the second message-id is a common English word.
...
> Internal error: Message 877htzhn9e.wl%jema...@gnu.org has duplicate 
> In-Reply-To IDs: 1e5bcefd0911081424p12eb6fa9te57ff4cfeb83f...@mail.gmail.com 
> and data
>  (lib/message.cc:326).

Thanks David,

I replicated this without any difficulty. And the fix was to just
correct a stupid mistake on my part. The only reason I hadn't noticed
this myself earlier is that I've been doing debug builds with:

make CFLAGS="-g -DDEBUG"

instead of:

make CFLAGS="-g -DDEBUG" CXXFLAGS="-g -DDEBUG"

If we can, I'd like to see about making the former work, to avoid hiding
things like this in the future.

> At the moment I don't have any real good ideas for how to debug this
> (or any real familiarity with notmuch internals).  I put a test corpus
> of messages (all from public mailing lists) at

Before I realized how easy the bug was to replicate and fix, I was going
to give a couple of debugging ideas here. I guess I'll briefly mention
things anyway.

The core of what we store in the database for each message is a single
list of "terms" (each a string of text). We use different terms for
different purposes by prefixing some with particular sub-strings. See
the large comment at the top of lib/database.cc for some details on
this.
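
To make the prefix scheme concrete, here is a minimal sketch of looking up
a prefixed value in one document's term list. It assumes the terms are kept
in sorted order (Xapian hands back a document's termlist sorted, which is
what lets code jump straight to a prefix with TermIterator::skip_to). A
sorted std::vector<std::string> with std::lower_bound stands in for the
Xapian iterator here, and both the prefix strings and the function name
find_prefixed_value are hypothetical placeholders, not notmuch's real
prefixes (those are described in lib/database.cc).

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns the value of the first term that starts
// with `prefix`, with the prefix stripped, or std::nullopt if no term
// carries that prefix. `terms` must be sorted, like a Xapian termlist.
std::optional<std::string>
find_prefixed_value(const std::vector<std::string>& terms,
                    const std::string& prefix)
{
    assert(std::is_sorted(terms.begin(), terms.end()));

    // Jump to the first term >= prefix (analogous to TermIterator::skip_to).
    auto it = std::lower_bound(terms.begin(), terms.end(), prefix);

    // The term we landed on only belongs to this prefix if it starts with it.
    if (it == terms.end() || it->compare(0, prefix.size(), prefix) != 0)
        return std::nullopt;

    return it->substr(prefix.size());
}
```

For example, with the made-up sorted terms {"Kspam",
"XREPLYTOabc@example.com", "Zdata"}, looking up "XREPLYTO" yields
"abc@example.com", while looking up "XFOLDER" lands on the XREPLYTO term,
sees that it does not start with "XFOLDER", and reports no match.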

So if there *were* an actual case of a duplicate In-Reply-To term here,
the first thing to do would be to inspect the actual terms in the
database for the document of the message of interest. Up until now, what
I've been using for this is a little utility I wrote called
xapian-dump. It exists deep in the code history of notmuch. So one could
use git log to find the commit that removed it and then check out the
commit before that to get the utility.

But xapian-dump is pretty dumb: all it does is dump all terms from
all documents in the database (it also dumps all the data and values
from those documents, but we're not talking about those parts
here). So that's a *lot* of output. More interesting would be a tool to
dump just the terms from the message you're wanting to debug. So that's
why I want to introduce a new "notmuch search --for=terms" or so to have
a much more useful debugging tool.

Anyway, I hope that was informative.

Thanks for reporting the bug!

-Carl

commit 64c8d6227a90ea6c37ea112ee20b14f16b9b46e7
Author: Carl Worth 
Date:   Sat Nov 28 10:01:22 2009 -0800

Avoid bogus internal error reporting duplicate In-Reply-To IDs.

This error was triggered with a debugging build via:

make CXXFLAGS="-DDEBUG"

and reported by David Bremner. The actual error is that I'm an
idiot who doesn't know how to use strcmp's return value. Of
course, the strcmp interface scores a negative 7 on Rusty Russell's
ranking of bad interfaces:

http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

diff --git a/lib/message.cc b/lib/message.cc
index 03b8c81..49519f1 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -318,7 +318,7 @@ _notmuch_message_get_in_reply_to (notmuch_message_t *message
 in_reply_to = *i;
 
 if (i != message->doc.termlist_end () &&
-   strncmp ((*i).c_str (), prefix, prefix_len))
+   strncmp ((*i).c_str (), prefix, prefix_len) == 0)
 {
INTERNAL_ERROR ("Message %s has duplicate In-Reply-To IDs: %s and %s\n",
notmuch_message_get_message_id (message),


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] Duplicate In-reply-to line 326 lib/message.cc

2009-11-28 Thread David Bremner

On the trail of a searching problem, I enabled debugging with 
   make CFLAGS="-g -DDEBUG" CXXFLAGS="-g -DDEBUG"

Now it seems that any search that is non-empty (i.e. matches
something) crashes with a duplicate In-Reply-To ID. This is in git
revision 92c4dcc (although it was the same yesterday).  The oddest
thing is that the second message-id is a common English word.

Here is a trace:

dulcinea:~/tmp % ~/projects/notmuch/notmuch search spam
Query string is:
spam
Final query is:
Xapian::Query((Tmail AND Zspam:(pos=1)))
Query string is:
thread:13c033781712e92541a5591320ac0ff4
Query string is:
thread:13c033781712e92541a5591320ac0ff4 AND (spam)
Final query is:
Xapian::Query((Tmail AND 0 * G13c033781712e92541a5591320ac0ff4))
Final query is:
Xapian::Query((Tmail AND 0 * G13c033781712e92541a5591320ac0ff4 AND Zspam:(pos=1)))
Internal error: Message 877htzhn9e.wl%jema...@gnu.org has duplicate In-Reply-To IDs: 1e5bcefd0911081424p12eb6fa9te57ff4cfeb83f...@mail.gmail.com and data
 (lib/message.cc:326).

At the moment I don't have any real good ideas for how to debug this
(or any real familiarity with notmuch internals).  I put a test corpus
of messages (all from public mailing lists) at

   http://pivot.cs.unb.ca/scratch/mailtest.tgz

The current tarball is about 5M.  The machine has plenty of bandwidth
(not meant as a challenge to DDOS hobbyists :) ).

d
