Re: [MlMt] Improving search performance?

2023-06-25 Thread Robert M . Münch
On 16 Jun 2023, at 17:17, Bill Cole wrote:

> MM uses an indexing mechanism that appears to be custom-designed for the 
> specific purpose of searching email. You can see the artifacts of that in 
> ~/Library/Application Support/MailMate/Database.noindex/.

That looks very much like a file-based approach to storing data, which wouldn't 
scale if the files are read on demand and have to be parsed each time.

> Only Benny could conceivably explain the details,

Sure.

> but it seems to me to be unlikely that he would get much from ripping all 
> that out and replacing it with SQLite or some other off-the-shelf tool.

Using a classical database with indexes, with the mail information MM 
recognizes stored in columns, should give near-instant answers.
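
To make that concrete, here is a minimal sketch using Python's built-in 
`sqlite3` module. It says nothing about how MM stores its data; the table name 
`mail`, its columns, and the helper functions are hypothetical and only 
illustrate the idea of indexed columns.

```python
import sqlite3

def build_store(messages):
    """Create an in-memory table of mail metadata with indexes on two columns.

    `messages` is an iterable of (sender, subject, date) tuples.
    The schema is a made-up example, not MailMate's actual layout.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mail ("
        "id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, date TEXT)"
    )
    # Indexes let lookups by sender or date avoid a full table scan.
    conn.execute("CREATE INDEX idx_mail_sender ON mail (sender)")
    conn.execute("CREATE INDEX idx_mail_date ON mail (date)")
    conn.executemany(
        "INSERT INTO mail (sender, subject, date) VALUES (?, ?, ?)", messages
    )
    conn.commit()
    return conn

def find_by_sender(conn, sender):
    """Return the subjects of all messages from `sender`, oldest first."""
    rows = conn.execute(
        "SELECT subject FROM mail WHERE sender = ? ORDER BY date", (sender,)
    )
    return [row[0] for row in rows]

def uses_index(conn, sender):
    """Ask SQLite whether a lookup by sender is planned via idx_mail_sender."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT subject FROM mail WHERE sender = ?",
        (sender,),
    ).fetchall()
    # The last column of each plan row is a human-readable description.
    return any("idx_mail_sender" in row[3] for row in plan)
```

Calling `find_by_sender(conn, "a@example.com")` then only touches the rows the 
index points at, which is why such lookups tend to stay fast as the table grows.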

> One serious issue with indexing email is that email is highly divergent in 
> data structure, and while you can do a simple index for basic standard mail 
> metadata, "full text" and "all headers" search for mail is a nightmare 
> because real-world mail breaks almost every rule theoretically governing it 
> and it is not a simple matter to determine what is or is not body text. Email 
> typically arrives with multiple alternative parts theoretically representing 
> the same message, possibly QP or B64 encoded and usually including one 
> version with HTML markup. And that markup can be bad, wrong, or even 
> intentionally malicious.

Well, MM already handles all this, otherwise we couldn't use it as we do. Those 
parts are well known to MM.
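
As an illustration of that pre-processing step, here is a small sketch using 
Python's standard `email` package. It is not MM's code; the sample message 
`RAW` and the function `body_text` are made up for this example. It picks one 
of the alternative parts and lets the library undo the QP or Base64 encoding.

```python
from email import message_from_string, policy

# Hypothetical sample: one message with a quoted-printable plain-text part
# and a Base64-encoded HTML alternative.
RAW = """\
From: a@example.com
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

M=C3=BCnch says hi
--b1
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64

PHA+aGk8L3A+
--b1--
"""

def body_text(raw, prefer=("plain", "html")):
    """Return the decoded text of the preferred body part, or "" if none."""
    msg = message_from_string(raw, policy=policy.default)
    # get_body() walks the MIME tree and picks the best matching alternative.
    part = msg.get_body(preferencelist=prefer)
    if part is None:
        return ""
    # get_content() reverses the transfer encoding and the charset.
    return part.get_content().strip()
```

Real-world mail is of course much messier than this sample, but once the text 
has been extracted like this, it can be handed to whatever index is used.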

> Very large mail stores are inherently tough to search.

After pre-processing all the mail mess, I don't think so. Searching in Gmail is 
fast. MM is already much better than other clients.

IMO the use case *search 1+ million emails as fast as possible* is just not in 
scope for most clients.

--

Robert M. Münch


**Note**: The .ASC file contains a digital PGP signature of this email. It can 
be used to check that this email is from me and was not changed since I wrote 
it. **You can’t do anything else with this file.**


signature.asc
Description: OpenPGP digital signature
___
mailmate mailing list
mailmate@lists.freron.com
https://lists.freron.com/listinfo/mailmate


Re: [MlMt] Improving search performance?

2023-06-25 Thread Robert M . Münch
On 16 Jun 2023, at 12:28, aisrael wrote:

> FWIW, I searched a word in 100 000 messages and it took 2 seconds. I use 
> release 5937.

It takes about 6 to 8 seconds for me, and I have about 300,000 emails.

However, such searches, backed by a full-text index plus ordinary indexes, 
should really be instant these days.
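
For illustration, the core of full-text indexing is an inverted index: a map 
from each word to the set of messages containing it. The sketch below is a toy 
version in Python, not what MM or any particular client does; `build_index` 
and `search` are hypothetical names.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each word to the set of message ids whose text contains it.

    `docs` is a dict of {message_id: already-extracted body text}.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of messages containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the first word's postings and intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The expensive part, building the index, happens once per message. A query is 
then a few dictionary lookups and set intersections, regardless of how much 
text the messages contain.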

--

Robert M. Münch

