Re: Correcting message references
On 25/04/2023, David Bremner wrote:
> I would be interested if it finds your problematic ghost message (and
> how long it takes).

Thanks! This is much quicker than a script that I wrote using quest and
xapian-delve (which took minutes!). Your code took 0.03 seconds to find 74
unreferenced ghost messages out of 9335 ghost messages. I can't imagine why
so many unreferenced ghost messages were created; 47 of the 74 have "draft"
in the ID (seemingly created by notmuch).

At first your code didn't find my problematic message (which caused a draft
with the ID in `In-Reply-To` to be grouped with unrelated messages from a
completely separate thread). But then I deleted the draft (including the
file), ran `notmuch new`, re-ran the script, and the problematic ghost
message was correctly reported.

So this approach would work to find unreferenced messages, but not messages
which are being erroneously grouped (without first deleting the offending
message), correct?

--
Al

___
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org
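To make the question above concrete, here is a minimal sketch of the test that ggc applies to each ghost, with a hypothetical in-memory std::map standing in for Xapian's term frequencies. TermFreq, is_unreferenced_ghost and find_unreferenced are illustrative names, not notmuch or Xapian API. In this model a ghost is only reported once no XREFERENCE or XREPLYTO term for its id remains, so while the offending draft still carries an XREPLYTO term for the id, the ghost stays invisible to the check.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the Xapian index: term (prefix + value) ->
// number of documents containing that term.
using TermFreq = std::map<std::string, unsigned>;

// Document frequency of a term, 0 if the term is absent.
static unsigned freq(const TermFreq& index, const std::string& term) {
    auto it = index.find(term);
    return it == index.end() ? 0 : it->second;
}

// Same test ggc makes with Xapian::Database::get_termfreq: a ghost is
// unreferenced when no document has an XREFERENCE or XREPLYTO term for it.
bool is_unreferenced_ghost(const TermFreq& index, const std::string& mid) {
    return freq(index, "XREFERENCE" + mid) + freq(index, "XREPLYTO" + mid) == 0;
}

// Return the ids of all ghosts that nothing refers to any more.
std::vector<std::string> find_unreferenced(const TermFreq& index,
                                           const std::vector<std::string>& ghost_ids) {
    std::vector<std::string> out;
    for (const auto& mid : ghost_ids)
        if (is_unreferenced_ghost(index, mid))
            out.push_back(mid);
    return out;
}
```

For example, with one XREPLYTO term for a@x and two XREFERENCE terms for b@x, only the ghost c@x would be reported; a@x stays hidden for as long as the message holding its XREPLYTO term exists.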
Re: Correcting message references
David Bremner writes:

> Al Haji-Ali writes:
>
>> So it does seem to be a lingering ghost message, but I am sure that there
>> are no messages in the database referring to this ID (except messages in
>> this current thread which have the ID in the message body).
>> I don't know why this particular ID is associated to messages in another
>> seemingly unrelated thread as you can see in the pdf.
>>
>> Is there a way to remove this ghost message record somehow to test it? Or is
>> there a better way of figuring this out?
>
> It turns out notmuch does not remove ghost messages until all the other
> messages in the thread are deleted. I guess if you temporarily move
> the other messages in the thread out of the way and run notmuch new, the
> ghost message should be deleted.
>
> I don't know how often this lazy deletion is a problem. Deleting
> messages is already a bottleneck in notmuch-new so I am a bit hesitant
> to make it more complicated. It is possible to "garbage collect"
> unreferenced ghost messages. I'll have to think about how big a
> performance hit it would be to add this to notmuch new.
>
> d

Here is a prototype standalone program to find lingering unreferenced
ghosts. I find 33 (out of about 60k total ghost messages) in about 0.3s on
this laptop. Currently it does not modify the database, but the next step
would be to delete the documents rather than just printing them out.

If you have libxapian-dev (or equivalent) installed you can build it with

  $ c++ ggc.cc -o ggc -lxapian

and then run it

  $ ./ggc ~/.local/share/notmuch/default/xapian

I would be interested if it finds your problematic ghost message (and how
long it takes).
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <xapian.h>

// ggc: list ghost messages that no other message refers to any more.
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf (stderr, "usage: ggc xapian-database\n");
        exit (1);
    }

    Xapian::Database db(argv[1]);
    Xapian::Enquire enquire(db);

    // Ghost messages are the documents carrying the type term "Tghost".
    enquire.set_query(Xapian::Query("Tghost"));
    auto mset = enquire.get_mset (0, db.get_doccount ());

    for (auto iter = mset.begin (); iter != mset.end(); iter++) {
        auto doc = iter.get_document ();

        // Term lists are sorted, so skip to the first term >= "Q",
        // which should be the Q-prefixed message-id term.
        auto term_iter = doc.termlist_begin ();
        term_iter.skip_to ("Q");
        if (term_iter == doc.termlist_end () || (*term_iter)[0] != 'Q')
            continue;  // no message-id term; nothing to check
        std::string mid = (*term_iter).substr(1);

        // The ghost is still needed while any message lists its id in
        // References or In-Reply-To.
        auto ref_count = db.get_termfreq ("XREFERENCE" + mid);
        auto reply_count = db.get_termfreq ("XREPLYTO" + mid);
        if (ref_count + reply_count == 0) {
            std::cout << "docid=" << *iter;
            std::cout << " mid=" << mid;
            std::cout << std::endl;
        }
    }
}
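The skip_to("Q") step in ggc relies on Xapian returning a document's terms in sorted order. A minimal sketch of the same idea in plain C++, assuming a std::set<std::string> (which is also sorted) as a stand-in for the term list; message_id_from_terms is a hypothetical helper, not part of Xapian.

```cpp
#include <optional>
#include <set>
#include <string>

// A std::set keeps its terms sorted, like a Xapian document term list,
// so lower_bound("Q") plays the role of TermIterator::skip_to("Q").
std::optional<std::string> message_id_from_terms(const std::set<std::string>& terms) {
    auto it = terms.lower_bound("Q");
    // The first term >= "Q" is only the message id if it really starts
    // with 'Q'; for a document without a Q term it could be an unrelated
    // term such as "Tghost".
    if (it == terms.end() || (*it)[0] != 'Q')
        return std::nullopt;
    return it->substr(1);  // drop the one-letter prefix
}
```

The explicit prefix check is a small design choice: without it, a document lacking a message-id term would silently yield the next term in sort order as if it were an id.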
Re: Correcting message references
On 22/04/2023, David Bremner wrote:
> You need to give the appropriate term prefix. Q for message id,
> or XREPLYTO or XREFERENCE as in my last message.

My apologies. I misunderstood the syntax.

>> The first one is the draft. The second hit is the reason I thought the
>> only place left for notmuch to associate these messages is in the
>> xapian database. Note that if I delete the draft and reindex, only the
>> postlist.glass hit stubbornly remains and there seems to be no way to
>> make notmuch forget about this ID.
>
> As long as some message refers to that ID, notmuch will create a "ghost
> message", used for threading.

I've deleted all messages/drafts referring to this message ID, then got
these results:

  $ export MSG_ID=jwvczk7opm8.fsf-monnier+em...@gnu.org
  $ export NM_DB=~/.mail/.notmuch/xapian
  $ xapian-delve -d $NM_DB -t "XREPLYTO${MSG_ID}"
  term 'xreplytojwvczk7opm8.fsf-monnier+em...@gnu.org' not in database
  $ xapian-delve -d $NM_DB -t "XREFERENCE${MSG_ID}"
  term 'xreferencejwvczk7opm8.fsf-monnier+em...@gnu.org' not in database
  $ quest -btype:T -b id:Q -d ~/.mail/.notmuch/xapian "id:${MSG_ID}"
  Parsed Query: Query(0 * (qjwvczk7opm8.fsf-monnier+em...@gnu.org AND Tghost))
  Exactly 1 matches
  MSet:
  75982: [0]
  $ xapian-delve -d $NM_DB -r 75982 -1
  Data for record #75982:

  Term List for record #75982:
  Gcc96
  qjwvczk7opm8.fsf-monnier+em...@gnu.org
  Tghost

So it does seem to be a lingering ghost message, but I am sure that there
are no messages in the database referring to this ID (except messages in
this current thread which have the ID in the message body).
I don't know why this particular ID is associated to messages in another
seemingly unrelated thread as you can see in the pdf.

Is there a way to remove this ghost message record somehow to test it? Or is
there a better way of figuring this out?

Best regards,
--
Al
Re: Correcting message references
> As long as some message refers to that ID, notmuch will create a "ghost
> message", used for threading.

You can look for a specific ghost message with something like

  $ quest -btype:T -b id:Q -d .local/share/notmuch/default/xapian \
      "type:ghost and id:jwvczk7opm8.fsf-monnier+em...@gnu.org"

quest is also part of xapian-tools. Unfortunately I don't think quest
understands the way notmuch uses multi-letter prefixes (without a ':'), so
to find references you still need to use xapian-delve.
Re: Correcting message references
Al Haji-Ali writes:

> is completely unconnected to the other 4 messages in the thread. Note
> that if I change the "In-Reply-To" field in this message to anything
> else, notmuch no longer groups these 5 messages into a single thread.

Yes, that's puzzling. I did not think about "ghost messages" (see below)
when writing that script, so maybe that's the issue.

> I tried searching for "jwvczk7opm8.fsf-monnier+em...@gnu.org" using
> `xapian-delve` but got
>
> ,----
> | term 'jwvczk7opm8.fsf-monnier+em...@gnu.org' not in database
> `----

You need to give the appropriate term prefix: Q for the message id, or
XREPLYTO or XREFERENCE as in my last message.

> The first one is the draft. The second hit is the reason I thought the
> only place left for notmuch to associate these messages is in the
> xapian database. Note that if I delete the draft and reindex, only the
> postlist.glass hit stubbornly remains and there seems to be no way to
> make notmuch forget about this ID.

As long as some message refers to that ID, notmuch will create a "ghost
message", used for threading.
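A minimal sketch of what "the appropriate term prefix" means in practice: the bare message id is not itself a term, so each lookup prepends Q, XREPLYTO or XREFERENCE directly, with no separator. lookup_terms and delve_commands are hypothetical helpers, and the database path is whatever your notmuch setup uses.

```cpp
#include <string>
#include <vector>

// Prefixes named in this thread: Q (message id) and the two threading
// prefixes XREPLYTO and XREFERENCE. The prefix is glued to the id with
// no ':' or other separator.
std::vector<std::string> lookup_terms(const std::string& mid) {
    return {"Q" + mid, "XREPLYTO" + mid, "XREFERENCE" + mid};
}

// Build one xapian-delve invocation per term. The term is wrapped in
// single quotes for a POSIX shell; an id that itself contains a single
// quote would need extra escaping, which this sketch does not do.
std::vector<std::string> delve_commands(const std::string& db, const std::string& mid) {
    std::vector<std::string> cmds;
    for (const auto& term : lookup_terms(mid))
        cmds.push_back("xapian-delve -d " + db + " -t '" + term + "'");
    return cmds;
}
```

Searching for the unprefixed id, as in the quoted attempt, asks for a term that notmuch never indexes, which is why delve reports it as not in the database.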
Re: Correcting message references
Thanks David for the script and the instructions.

I am still not sure where notmuch is getting the association between my 5
messages in the thread. The attached pdf is the output of the script. As you
can see, the draft "draft-m25y9oqhep@gmail.com", which contains the header

,----
| From: Al Haji-Ali
| To: bug-gnu-em...@gnu.org, monn...@iro.umontreal.ca, Eli Zaretskii
| Subject: bug#53632: Function definition history
| In-Reply-To:
| Message-ID:
| Date: Sat, 22 Apr 2023 12:34:54 +0100
| X-Notmuch-Emacs-Draft: True
| MIME-Version: 1.0
| Content-Type: text/plain
`----

is completely unconnected to the other 4 messages in the thread. Note that if
I change the "In-Reply-To" field in this message to anything else, notmuch no
longer groups these 5 messages into a single thread.

I tried searching for "jwvczk7opm8.fsf-monnier+em...@gnu.org" using
`xapian-delve` but got

,----
| term 'jwvczk7opm8.fsf-monnier+em...@gnu.org' not in database
`----

Finally, I tried grepping for the same ID in my notmuch folder (with all
mails and the database) and got two hits (actually three, including this
message which I am currently writing):

,----
| ./Drafts/cur/1682165661.M337064P23717.m2air.local,U=151:2,DS:In-Reply-To:
| Binary file ./.notmuch/xapian/postlist.glass matches
`----

The first one is the draft. The second hit is the reason I thought the only
place left for notmuch to associate these messages is in the xapian
database. Note that if I delete the draft and reindex, only the
postlist.glass hit stubbornly remains and there seems to be no way to make
notmuch forget about this ID.

I am running notmuch 0.37 and xapian 1.4.21 if that's relevant.

--
Al

Attachment: thread2.pdf (Adobe PDF document)
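One way to picture why the In-Reply-To of a single draft can pull otherwise unrelated messages together is a toy union-find model. This is a loudly stated assumption: Threads is a hypothetical, simplified illustration of reference-based threading, not notmuch's actual algorithm. Every id a message references is joined to that message's thread, and an id with no real message behind it (what notmuch keeps as a ghost) still acts as a link.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model of reference-based threading (hypothetical, not notmuch code).
class Threads {
    std::map<std::string, std::string> parent_;  // id -> parent in the union-find forest

    // Representative id of x's thread. Ids never seen before get a
    // singleton entry, so a referenced-but-missing id behaves like a ghost.
    std::string find(const std::string& x) {
        if (parent_.find(x) == parent_.end())
            parent_[x] = x;
        std::string root = x;
        while (parent_[root] != root)
            root = parent_[root];
        // Path compression: point every id on the way directly at the root.
        std::string cur = x;
        while (cur != root) {
            std::string next = parent_[cur];
            parent_[cur] = root;
            cur = next;
        }
        return root;
    }

public:
    // Record a message together with the ids from its In-Reply-To/References.
    void add_message(const std::string& mid, const std::vector<std::string>& refs) {
        std::string root = find(mid);
        for (const auto& ref : refs) {
            std::string other = find(ref);
            if (other != root)
                parent_[other] = root;  // merge the referenced id's thread into ours
        }
    }

    bool same_thread(const std::string& a, const std::string& b) {
        return find(a) == find(b);
    }
};
```

In this model, a draft and a message C that both reference the same ghost id share a thread even though neither mentions the other; point the draft's reference elsewhere and they separate, which matches the behavior with the "In-Reply-To" field described above.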
Re: Correcting message references
Al Haji-Ali writes:

> I changed the message, removed "B" from "References" and deleted the
> files of all old (and intermediate) drafts that have "B" in
> "References". But no matter what I do, I have "B" grouped with "D"
> and any other messages which I create with "In-Reply-To" being "A".

How did you find the files to delete? One trap to watch out for is that if
you are using notmuch, you should use `notmuch search --exclude=false` to
make sure messages are not being hidden because of their tags.

> I suspect that somewhere in the database the IDs of "A" and "B" are
> linked now. Is there a way (short of deleting the database and
> re-indexing) to correct this and remove this connection?

The database does not store relationships explicitly, only via messages
with references to other messages. At a high level you can try the
attached script to get a picture of the corresponding thread.

If you can't run the script, or it doesn't help, you can interrogate the
database directly without going through notmuch. If the message-id of B is
'f...@example.org', you can search for replies with xapian-delve (in
xapian-tools on Debian and derivatives)

  xapian-delve -d .local/share/notmuch/default/xapian \
      -t 'xreplyto...@example.org'

and for references

  xapian-delve -d .local/share/notmuch/default/xapian \
      -t 'xreference...@example.org'

That will give you Xapian record numbers, and you can turn those into
files with something like

  xapian-delve -d .local/share/notmuch/default/xapian -r 801793 -1 | \
      perl -ne 's/XF(D|O).*?:// && print'

For records with multiple files, you will have to figure out which file
goes with which directory (or just find the file names, which are supposed
to be unique).

Attachment: draw-thread (binary data)
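A minimal C++ sketch of what the perl one-liner does, for anyone who would rather not use perl: each line printed by `xapian-delve -r N -1` is treated as one term, and the leftmost match of XF(D|O) through the next ':' is removed; terms without a match are dropped. The sample terms in the usage note assume notmuch's XFDIRENTRY and XFOLDER prefixes, which is what the pattern appears to target; strip_file_prefix and file_terms are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Equivalent of perl's s/XF(D|O).*?:// on one term: find the leftmost
// "XFD" or "XFO" that is followed later by ':', and cut from there up to
// and including that first ':'. Returns nullopt when nothing matches
// (perl's "&& print" would then print nothing).
std::optional<std::string> strip_file_prefix(const std::string& term) {
    for (std::size_t pos = term.find("XF"); pos != std::string::npos;
         pos = term.find("XF", pos + 1)) {
        if (pos + 2 >= term.size())
            break;  // "XF" at the very end, no room for D/O
        char c = term[pos + 2];
        if (c != 'D' && c != 'O')
            continue;
        std::size_t colon = term.find(':', pos + 3);  // lazy .*? stops at first ':'
        if (colon == std::string::npos)
            continue;
        return term.substr(0, pos) + term.substr(colon + 1);
    }
    return std::nullopt;
}

// Apply the substitution to every term and keep only the ones that matched.
std::vector<std::string> file_terms(const std::vector<std::string>& terms) {
    std::vector<std::string> out;
    for (const auto& t : terms)
        if (auto s = strip_file_prefix(t))
            out.push_back(*s);
    return out;
}
```

For example, the hypothetical terms XFDIRENTRY12:1682165661.M337064P23717.m2air.local,U=151:2,DS and XFOLDER:Drafts would yield the maildir file name and the folder name Drafts. As with the perl version, folder terms are printed too, so files still have to be matched up with their directories by hand.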