Re: Correcting message references

2023-04-25 Thread Al Haji-Ali


On 25/04/2023, David Bremner wrote:
> I would be interested if it finds your problematic ghost message (and
> how long it takes).

Thanks! This is much quicker than a script that I wrote using quest and
xapian-delve (which took minutes!)

Your code took 0.03 seconds to find 74 unreferenced ghost messages out of 9335
ghost messages. I can't imagine why so many unreferenced ghost messages were
created. 47 of the 74 messages have "draft" in the ID (seemingly created by
notmuch).

At first your code didn't find my problematic message (which caused a draft 
with the ID  in `In-Reply-To` to be 
grouped with unrelated messages from a completely separate thread).
But then I deleted the draft (including the file), ran `notmuch new` and re-ran 
the script and the problematic ghost message was correctly reported.

So this approach would work to find un-referenced messages, but not messages 
which are being erroneously grouped (without first deleting the offending 
message), correct?

-- Al
___
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org


Re: Correcting message references

2023-04-25 Thread David Bremner
David Bremner writes:

> Al Haji-Ali writes:
>
>> So it does seem to be a lingering ghost message, but I am sure that there 
>> are no messages in the database referring to this ID (except messages in 
>> this current thread which have the ID in the message body).
>> I don't know why this particular ID is associated with messages in another
>> seemingly unrelated thread, as you can see in the pdf.
>>
>> Is there a way to remove this ghost message record somehow to test it? Or is 
>> there a better way of figuring this out.
>
> It turns out notmuch does not remove ghost messages until all the other
> messages in the thread are deleted. I guess if you temporarily move
> the other messages in the thread out of the way and run notmuch new, the
> ghost message should be deleted.
>
> I don't know how often this lazy deletion is a problem. Deleting
> messages is already a bottleneck in notmuch-new so I am a bit hesitant
> to make it more complicated. It is possible to "garbage collect"
> unreferenced ghost messages. I'll have to think about how big a
> performance hit it would be to add this to notmuch new.
>
> d

Here is a prototype standalone program to find lingering unreferenced
ghosts.  I find 33 (out of about 60k total ghost messages) in about 0.3s
on this laptop. Currently it does not modify the database, but the next
step would be to delete the documents rather than just printing them
out.

If you have libxapian-dev (or equivalent) installed you can build it
with

$ c++ ggc.cc -o ggc -lxapian

and then run it

$ ./ggc ~/.local/share/notmuch/default/xapian

I would be interested if it finds your problematic ghost message (and
how long it takes).


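#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <xapian.h>

int main(int argc, char **argv){
  if (argc != 2) {
    fprintf (stderr, "usage: ggc xapian-database\n");
    exit (1);
  }

  Xapian::Database db(argv[1]);
  Xapian::Enquire enquire(db);

  // Ghost message documents carry the term "Tghost".
  enquire.set_query(Xapian::Query("Tghost"));

  auto mset = enquire.get_mset (0,db.get_doccount ());

  for (auto iter=mset.begin (); iter != mset.end(); iter++){
    auto doc = iter.get_document ();
    auto term_iter = doc.termlist_begin ();

    // The message ID is stored as a term with the prefix "Q".
    term_iter.skip_to ("Q");
    if (term_iter == doc.termlist_end ())
      continue;
    const std::string term = *term_iter;
    if (term.empty () || term[0] != 'Q')
      continue;
    std::string mid = term.substr(1);

    // Count the documents that mention this ID in References or In-Reply-To.
    std::string ref_term = "XREFERENCE" + mid;
    auto ref_count = db.get_termfreq (ref_term);

    std::string reply_term = "XREPLYTO" + mid;
    auto reply_count = db.get_termfreq (reply_term);

    // Nothing refers to this ghost any more: report it.
    if (ref_count+reply_count == 0){
      std::cout << "docid=" <<  *iter;
      std::cout << " mid=" << mid;
      std::cout << std::endl;
    }
  }
}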
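
The check that the program above applies to each ghost can also be written
as a small function that does not depend on Xapian, which makes it easy to
try out on its own. This is only a sketch: TermFreq and
is_unreferenced_ghost are hypothetical names, not part of ggc.cc or notmuch,
and TermFreq merely stands in for Xapian::Database::get_termfreq.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Term prefixes as used by the program above.
const std::string REFERENCE_PREFIX = "XREFERENCE";
const std::string REPLYTO_PREFIX = "XREPLYTO";

// Hypothetical stand-in for Xapian::Database::get_termfreq: returns the
// number of documents indexed by the given term.
using TermFreq = std::function<unsigned(const std::string &)>;

// A ghost counts as unreferenced when no document carries a reference or
// in-reply-to term for its message ID.
bool is_unreferenced_ghost(const std::string &message_id,
                           const TermFreq &termfreq) {
    assert(!message_id.empty());
    return termfreq(REFERENCE_PREFIX + message_id) == 0 &&
           termfreq(REPLYTO_PREFIX + message_id) == 0;
}
```

With a real database one could pass a lambda such as
[&db](const std::string &t) { return db.get_termfreq(t); }. A ghost whose ID
still appears in the In-Reply-To or References header of some indexed
message has a nonzero count, so it is not reported.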


Re: Correcting message references

2023-04-23 Thread Al Haji-Ali


On 22/04/2023, David Bremner wrote:
> You need to give the appropriate term prefix. Q for message id,
> or XREPLYTO or XREFERENCE as in my last message.
My apologies. I misunderstood the syntax.

>> The first one is the draft. The second hit is the reason I thought the
>> only place left for notmuch to associate these messages is in the
>> xapian database. Note that if I delete the draft and reindex, only the
>> postlist.glass hit stubbornly remains and there seems to be no way to
>> make notmuch forget about this ID.
>
> As long as some message refers to that ID, notmuch will create a "ghost
> message", used for threading.
I've deleted all messages/draft referring to this message ID, then got these 
results:


$ export MSG_ID=jwvczk7opm8.fsf-monnier+em...@gnu.org
$ export NM_DB=~/.mail/.notmuch/xapian

$ xapian-delve -d $NM_DB -t "XREPLYTO${MSG_ID}"
term 'XREPLYTOjwvczk7opm8.fsf-monnier+em...@gnu.org' not in database

$ xapian-delve -d $NM_DB -t "XREFERENCE${MSG_ID}"
term 'XREFERENCEjwvczk7opm8.fsf-monnier+em...@gnu.org' not in database

$ quest -btype:T -b id:Q -d ~/.mail/.notmuch/xapian "id:${MSG_ID}"
Parsed Query: Query(0 * (Qjwvczk7opm8.fsf-monnier+em...@gnu.org AND Tghost))
Exactly 1 matches
MSet:
75982: [0]

$ xapian-delve -d $NM_DB -r 75982 -1
Data for record #75982:

Term List for record #75982:
Gcc96
Qjwvczk7opm8.fsf-monnier+em...@gnu.org
Tghost


So it does seem to be a lingering ghost message, but I am sure that there are 
no messages in the database referring to this ID (except messages in this 
current thread which have the ID in the message body).
I don't know why this particular ID is associated with messages in another
seemingly unrelated thread, as you can see in the pdf.

Is there a way to remove this ghost message record somehow to test it? Or is 
there a better way of figuring this out.

Best regards,
-- Al


Re: Correcting message references

2023-04-22 Thread David Bremner
>
> As long as some message refers to that ID, notmuch will create a "ghost
> message", used for threading.

You can look for a specific ghost message with something like

$ quest -btype:T -b id:Q -d .local/share/notmuch/default/xapian \
 "type:ghost and id:jwvczk7opm8.fsf-monnier+em...@gnu.org"

quest is also part of xapian-tools. Unfortunately I don't think quest
understands the way notmuch uses multiletter prefixes (without a :), so
to find references you still need to use xapian-delve.


Re: Correcting message references

2023-04-22 Thread David Bremner
Al Haji-Ali writes:


> is completely unconnected to the other 4 messages in the thread. Note
> that if I change the "In-Reply-To" field in this message to anything
> else, notmuch no longer groups these 5 messages into a single thread.
>

Yes, that's puzzling. I did not think about "ghost messages" (see below)
when writing that script, so maybe that's the issue. 

> I tried searching for "jwvczk7opm8.fsf-monnier+em...@gnu.org" using 
> `xapian-delve` but got
>
> ,
> | term 'jwvczk7opm8.fsf-monnier+em...@gnu.org' not in database
> `

You need to give the appropriate term prefix. Q for message id,
or XREPLYTO or XREFERENCE as in my last message.

> The first one is the draft. The second hit is the reason I thought the
> only place left for notmuch to associate these messages is in the
> xapian database. Note that if I delete the draft and reindex, only the
> postlist.glass hit stubbornly remains and there seems to be no way to
> make notmuch forget about this ID.

As long as some message refers to that ID, notmuch will create a "ghost
message", used for threading.


Re: Correcting message references

2023-04-22 Thread Al Haji-Ali
Thanks David for the script and the instructions. I am still not sure where
notmuch is getting the association between my 5 messages in the thread.

The attached pdf is the output of the script. As you can see, the draft
"draft-m25y9oqhep@gmail.com" which contains the header

,
| From: Al Haji-Ali 
| To: bug-gnu-em...@gnu.org, monn...@iro.umontreal.ca, Eli Zaretskii
| Subject: bug#53632: Function definition history
| In-Reply-To: 
| Message-ID: 
| Date: Sat, 22 Apr 2023 12:34:54 +0100
| X-Notmuch-Emacs-Draft: True
| MIME-Version: 1.0
| Content-Type: text/plain
`

is completely unconnected to the other 4 messages in the thread. Note that if I
change the "In-Reply-To" field in this message to anything else, notmuch no 
longer groups these 5 messages into a single thread.

I tried searching for "jwvczk7opm8.fsf-monnier+em...@gnu.org" using 
`xapian-delve` but got

,
| term 'jwvczk7opm8.fsf-monnier+em...@gnu.org' not in database
`

Finally, I tried grepping for the same ID in my notmuch folder (with all mails 
and database) and got two hits (actually three including this message which I 
am currently writing):

,
| ./Drafts/cur/1682165661.M337064P23717.m2air.local,U=151:2,DS:In-Reply-To:
| Binary file ./.notmuch/xapian/postlist.glass matches
`

The first one is the draft. The second hit is the reason I thought the only 
place left for notmuch to associate these messages is in the xapian database. 
Note that if I delete the draft and reindex, only the postlist.glass hit 
stubbornly remains and there seems to be no way to make notmuch forget about 
this ID. 

I am running notmuch 0.37 and xapian 1.4.21 if that's relevant.

-- Al



thread2.pdf
Description: Adobe PDF document


Re: Correcting message references

2023-04-22 Thread David Bremner
Al Haji-Ali writes:

> I changed the message, removed "B" from "References" and deleted the
> files of all old (and intermediate) drafts that have "B" in
> "References".  But no matter what I do, I have "B" grouped with "D"
> and any other messages which I create with "In-Reply-To" being "A".

How did you find the files to delete? One trap to watch out for is that
if using notmuch, you should use notmuch search --exclude=false, to make
sure messages are not being hidden because of their tags.

> I suspect that somewhere in the database the IDs of "A" and "B" are
> linked now. Is there a way (short of deleting the database and
> re-indexing) to correct this and remove this connection?

The database does not store relationships explicitly, only via messages
with references to other messages. At a high level you can try the
attached script to get a picture of the corresponding thread.

If you can't run the script, or it doesn't help, you can interrogate the
database directly without going through notmuch.

If the message-id of B is 'f...@example.org', you can search for
replies with xapian-delve (in xapian-tools on Debian and derivatives).

xapian-delve -d .local/share/notmuch/default/xapian \
 -t 'XREPLYTO...@example.org'

and for references

xapian-delve -d .local/share/notmuch/default/xapian \
 -t 'XREFERENCE...@example.org'

That will give you Xapian record numbers, and you can turn those into
files with something like

xapian-delve -d .local/share/notmuch/default/xapian -r 801793 -1 | \
 perl -ne  's/XF(D|O).*?:// && print'

For records with multiple files, you will have to figure out which file
goes with which directory (or just find the file names, which are supposed
to be unique).
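
For anyone who would rather not use perl, the substitution in the filter
above can be mirrored by a small C++ function. This is a rough sketch, not
something notmuch provides: strip_file_prefix is a hypothetical name, and
the XFDIRENTRY-style term used in the usage note is an assumption about
what the xapian-delve output contains.

```cpp
#include <cstddef>
#include <string>

// Mirrors the perl substitution s/XF(D|O).*?://: find the first "XFD" or
// "XFO" that is followed by a ':' somewhere later on the line, and remove
// everything from that match up to and including the first such ':'.
// Returns false (and leaves `out` untouched) when the line does not match.
bool strip_file_prefix(const std::string &line, std::string &out) {
    for (std::size_t pos = line.find("XF"); pos != std::string::npos;
         pos = line.find("XF", pos + 1)) {
        if (pos + 2 >= line.size())
            break;  // "XF" at the very end, nothing can follow it
        const char c = line[pos + 2];
        if (c != 'D' && c != 'O')
            continue;  // some other XF... term, keep looking
        const std::size_t colon = line.find(':', pos + 3);
        if (colon == std::string::npos)
            return false;  // no ':' after this match, so none after later ones
        out = line;
        out.erase(pos, colon - pos + 1);
        return true;
    }
    return false;
}
```

Applied to a line such as "XFDIRENTRY12:msg.eml" this leaves "msg.eml",
while lines like "Tghost" are rejected, which corresponds to the
"&& print" part of the perl one-liner.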




draw-thread
Description: Binary data