-- On Tuesday, March 23, 2004 03:31:35 +1200 Sidney Markowitz wrote:
I would like to post about a very rough test I ran to get some comments and in case anybody wants to pursue this more. I may end up not having time to take this all the way to a finished patch.
Just to see what the effects would be, I did some crude patches to make the following changes to Bayes SQL:
[...]
Also, in the bayes_token table I changed the token column from VARCHAR to BIGINT. I patched SQL.pm to convert the token string into the low-order 15 hex digits of the SHA-1 hash of the string. With a "0x" put in front of that in the SELECT, MySQL treats the string as a 64-bit integer, even though Perl itself doesn't support 64-bit integers.
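As a rough illustration of the hashing step described above, here is a minimal sketch in Python rather than the Perl of the actual SQL.pm patch, which is not shown in this thread. The function names token_key and token_sql_literal are made up for the example, and encoding the token as UTF-8 before hashing is an assumption. Note that 15 hex digits are 60 bits, so the value always fits in a signed 64-bit BIGINT.

```python
import hashlib


def token_hex(token: str) -> str:
    """Return the low-order 15 hex digits of the SHA-1 hash of the token."""
    # Assumption: the token is hashed as UTF-8 bytes.
    digest = hashlib.sha1(token.encode("utf-8")).hexdigest()  # 40 hex digits
    return digest[-15:]


def token_sql_literal(token: str) -> str:
    """Hypothetical helper: the hex string with "0x" in front, as it would
    be placed in the SQL statement so the server reads it as an integer."""
    return "0x" + token_hex(token)


def token_key(token: str) -> int:
    """The same value as an integer (Python integers are arbitrary precision)."""
    return int(token_hex(token), 16)
```

The same token always maps to the same key, so lookups by key behave like lookups by token string, apart from the small chance of two tokens colliding in 60 bits.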
[...]
So it looks like if we are willing to sacrifice being able to see the tokens in a readable form when someone dumps the Bayes database, we can make things about twice as fast and the database a lot smaller.
It is not necessary to sacrifice the ability to read the tokens: you could add a dictionary table that maps each SHA-1 hash back to its token. This would make the database larger and sa-learn slightly slower, but there should be no difference in the spamc times.
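A minimal sketch of that idea, using Python with an in-memory SQLite database as a stand-in for MySQL. The table name token_dict, the simplified bayes_token columns, and the helper names are all assumptions made for illustration and are not the real SpamAssassin schema. Only the learning path writes to the dictionary; scoring would touch bayes_token alone.

```python
import hashlib
import sqlite3


def token_key(token: str) -> int:
    # Low-order 15 hex digits of SHA-1, as described earlier in the thread.
    return int(hashlib.sha1(token.encode("utf-8")).hexdigest()[-15:], 16)


conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: counts are keyed by the integer hash.
conn.execute(
    "CREATE TABLE bayes_token ("
    "token INTEGER PRIMARY KEY, spam_count INTEGER, ham_count INTEGER)"
)
# Hypothetical dictionary table mapping each hash back to the readable token.
conn.execute(
    "CREATE TABLE token_dict (token INTEGER PRIMARY KEY, text TEXT NOT NULL)"
)


def learn(token: str, is_spam: bool) -> None:
    """Learning path: update the counts and record the readable token."""
    key = token_key(token)
    conn.execute("INSERT OR IGNORE INTO bayes_token VALUES (?, 0, 0)", (key,))
    column = "spam_count" if is_spam else "ham_count"
    conn.execute(
        f"UPDATE bayes_token SET {column} = {column} + 1 WHERE token = ?",
        (key,),
    )
    # The extra write that makes learning slightly slower.
    conn.execute("INSERT OR IGNORE INTO token_dict VALUES (?, ?)", (key, token))


def dump() -> list:
    """Dump path: join against the dictionary to show readable tokens."""
    return conn.execute(
        "SELECT d.text, t.spam_count, t.ham_count "
        "FROM bayes_token t JOIN token_dict d ON d.token = t.token "
        "ORDER BY d.text"
    ).fetchall()
```

For example, after learn("free", True) twice and learn("hello", False) once, dump() returns the two tokens in readable form together with their counts.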
-- Michael Fischer v. Mollard, network administration Heise Zeitschriften Verlag GmbH & Co KG Helstorfer Straße 7 D-30625 Hannover Tel: +49 511 5352 477; Email: [EMAIL PROTECTED]
