Re: Correcting message references

2023-04-25 Thread Al Haji-Ali


On 25/04/2023, David Bremner wrote:
> I would be interested if it finds your problematic ghost message (and
> how long it takes).

Thanks! This is much quicker than a script that I wrote using quest and
xapian-delve (which took minutes!)

Your code took 0.03 seconds to find 74 unreferenced ghost messages out of
9335 ghost messages. I can't imagine why so many unreferenced ghost messages
were created. 47 of the 74 messages have "draft" in the ID (seemingly created
by notmuch).

At first your code didn't find my problematic message (which caused a draft 
with the ID  in `In-Reply-To` to be 
grouped with unrelated messages from a completely separate thread).
But then I deleted the draft (including the file), ran `notmuch new` and re-ran 
the script and the problematic ghost message was correctly reported.

So this approach would work to find un-referenced messages, but not messages 
which are being erroneously grouped (without first deleting the offending 
message), correct?

-- Al
___
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org


Re: Correcting message references

2023-04-25 Thread David Bremner
David Bremner writes:

> Al Haji-Ali writes:
>
>> So it does seem to be a lingering ghost message, but I am sure that there 
>> are no messages in the database referring to this ID (except messages in 
>> this current thread which have the ID in the message body).
>> I don't know why this particular ID is associated to messages in another 
>> seemingly unrelated thread, as you can see in the pdf.
>>
>> Is there a way to remove this ghost message record somehow to test it? Or is 
>> there a better way of figuring this out.
>
> It turns out notmuch does not remove ghost messages until all the other
> messages in the thread are deleted. I guess if you temporarily move
> the other messages in the thread out of the way and run notmuch new, the
> ghost message should be deleted.
>
> I don't know how often this lazy deletion is a problem. Deleting
> messages is already a bottleneck in notmuch-new so I am a bit hesitant
> to make it more complicated. It is possible to "garbage collect"
> unreferenced ghost messages. I'll have to think about how big a
> performance hit it would be to add this to notmuch new.
>
> d

Here is a prototype standalone program to find lingering unreferenced
ghosts.  I find 33 (out of about 60k total ghost messages) in about 0.3s
on this laptop. Currently it does not modify the database, but the next
step would be to delete the documents rather than just printing them
out.

If you have libxapian-dev (or equivalent) installed you can build it
with

$ c++ ggc.cc -o ggc -lxapian

and then run it

$ ./ggc ~/.local/share/notmuch/default/xapian

I would be interested if it finds your problematic ghost message (and
how long it takes).


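#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <xapian.h>

int main(int argc, char **argv){
  if (argc != 2) {
    fprintf (stderr, "usage: ggc xapian-database\n");
    exit (1);
  }

  Xapian::Database db(argv[1]);
  Xapian::Enquire enquire(db);

  // Ghost messages are the documents carrying the term "Tghost".
  enquire.set_query(Xapian::Query("Tghost"));

  auto mset = enquire.get_mset (0,db.get_doccount ());

  for (auto iter=mset.begin (); iter != mset.end(); iter++){
    auto doc = iter.get_document ();
    auto term_iter = doc.termlist_begin ();

    // The message ID is the term with prefix "Q"; skip documents without one.
    term_iter.skip_to ("Q");
    if (term_iter == doc.termlist_end () || (*term_iter).empty ()
        || (*term_iter)[0] != 'Q')
      continue;
    std::string mid=(*term_iter).substr(1);

    // Count the documents that name this ID in References or In-Reply-To.
    std::string ref_term = "XREFERENCE" + mid;
    auto ref_count = db.get_termfreq (ref_term);

    std::string reply_term = "XREPLYTO" + mid;
    auto reply_count = db.get_termfreq (reply_term);

    // No document refers to this ghost, so report it.
    if (ref_count+reply_count == 0){
      std::cout << "docid=" <<  *iter;
      std::cout << " mid=" << mid;
      std::cout << std::endl;
    }
  }
}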
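
The "next step" mentioned above (deleting the documents instead of printing
them) might look roughly like the sketch below. It is only an illustration
and has not been tried against a real notmuch database. The names `Ghost`,
`is_unreferenced` and `delete_unreferenced` are hypothetical. The templates
assume only that the database type offers `get_termfreq` and
`delete_document`, which Xapian::WritableDatabase does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one ghost document: its Xapian docid and message ID.
struct Ghost {
  unsigned docid;
  std::string mid;
};

// True when no document carries a reference or reply-to term for this ID.
// Db is any type with get_termfreq(const std::string&).
template <typename Db>
bool is_unreferenced(const Db &db, const std::string &mid) {
  return db.get_termfreq("XREFERENCE" + mid) == 0 &&
         db.get_termfreq("XREPLYTO" + mid) == 0;
}

// Deletes every unreferenced ghost and returns how many were deleted.
// Db must also provide delete_document(docid).
template <typename Db>
std::size_t delete_unreferenced(Db &db, const std::vector<Ghost> &ghosts) {
  std::size_t deleted = 0;
  for (const auto &g : ghosts) {
    if (is_unreferenced(db, g.mid)) {
      db.delete_document(g.docid);
      ++deleted;
    }
  }
  return deleted;
}

// Assumed usage with Xapian (path and ghosts are placeholders):
//   Xapian::WritableDatabase wdb(path, Xapian::DB_OPEN);
//   delete_unreferenced(wdb, ghosts);
//   wdb.commit();
```

The list of ghosts would be collected with the same "Tghost" query as in
ggc.cc. Since this writes to the Xapian database behind notmuch's back, it
seems safest to try it on a copy of the database first.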