http://bugzilla.spamassassin.org/show_bug.cgi?id=3225





------- Additional Comments From [EMAIL PROTECTED]  2004-04-16 13:03 -------
Subject: Re:  RFE: Bayes optimizations

On Thu, Apr 15, 2004 at 04:17:19PM -0700, [EMAIL PROTECTED] wrote:
> ------- Additional Comments From [EMAIL PROTECTED]  2004-04-15 16:17 -------
> Michael, I don't want you to forget the other optimizations:

I haven't forgotten; I was going for some low-hanging fruit and
correcting some bad design decisions (hindsight is 20/20).

> 1) Use an INT(11) for username, using the UID instead of a variable character 
> string
> 

Pretty much done.  The plan, since not everyone will have a system
UID, is to create an entry in bayes_vars for each username and give it
an id (via a sequence).  This requires some additional work on DBs that
don't easily support AUTO_INCREMENT, but it shouldn't be too bad.
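A minimal sketch of the per-username id idea, assuming a simplified bayes_vars table with only `id` and `username` columns (the real table has more, and these column names are assumptions). SQLite's AUTOINCREMENT stands in for the sequence, and `get_user_id` is a hypothetical helper, not SpamAssassin code.

```python
import sqlite3

# Hypothetical, simplified schema: one row per username, integer id from a
# sequence-like AUTOINCREMENT column. Column names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bayes_vars (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL UNIQUE
)
"""

def get_user_id(conn: sqlite3.Connection, username: str) -> int:
    """Return the integer id for username, creating a bayes_vars row if needed."""
    row = conn.execute(
        "SELECT id FROM bayes_vars WHERE username = ?", (username,)
    ).fetchone()
    if row is not None:
        return row[0]
    # SQLite hands out the next id here; other DBs might use AUTO_INCREMENT
    # or an explicit sequence object instead.
    cur = conn.execute("INSERT INTO bayes_vars (username) VALUES (?)", (username,))
    conn.commit()
    return cur.lastrowid

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    print(get_user_id(conn, "alice"))  # 1
    print(get_user_id(conn, "bob"))    # 2
    print(get_user_id(conn, "alice"))  # 1 again: existing row is reused
```

The token table would then store the small integer returned by `get_user_id` rather than repeating the username string in every row.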

I may also add a separate table and make the mapping query
customizable, to help folks who want to map IDs to an existing
system, or to provide a way to limit who can use bayes (Bug 3215).
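A rough sketch of what a customizable mapping query could look like. The `system_users` table, its columns, the default query, and `map_user` are all hypothetical. The admin supplies a query that takes the username as its only parameter and returns one integer id. If the query returns no row, the user is treated as not allowed to use bayes, which would cover the Bug 3215 case.

```python
import sqlite3
from typing import Optional

# Hypothetical admin-configurable query: one "?" parameter (the username),
# returning a single integer id, or no row if bayes is disabled for the user.
DEFAULT_MAPPING_QUERY = (
    "SELECT uid FROM system_users WHERE login = ? AND bayes_enabled = 1"
)

def map_user(conn: sqlite3.Connection, username: str,
             query: str = DEFAULT_MAPPING_QUERY) -> Optional[int]:
    """Map a username to an existing id; None means no bayes for this user."""
    row = conn.execute(query, (username,)).fetchone()
    return None if row is None else int(row[0])

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE system_users (login TEXT, uid INTEGER, bayes_enabled INTEGER)"
    )
    conn.executemany(
        "INSERT INTO system_users VALUES (?, ?, ?)",
        [("alice", 1001, 1), ("mallory", 1002, 0)],
    )
    print(map_user(conn, "alice"))    # 1001
    print(map_user(conn, "mallory"))  # None: not allowed to use bayes
    print(map_user(conn, "nobody"))   # None: unknown user
```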

> 2) Express atime as a two byte int with granularity of one day instead of a 4
> byte int with granularity of a second. That not only cuts two bytes off of the
> atime in every record, but it means that the record is updated for the time no
> more than once a day.

I'm unsure about this one.  Anyone else have any opinions?
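To make the quoted atime proposal concrete, here is a sketch that stores atime as whole days since the Unix epoch in an unsigned 16-bit field. The little-endian packing, the helper names, and the "only update when the day changes" check are illustrative assumptions, not existing SpamAssassin code.

```python
import struct

SECONDS_PER_DAY = 86400

def atime_days(epoch_seconds: int) -> int:
    """Convert a Unix timestamp to whole days since 1970-01-01 (UTC)."""
    days = epoch_seconds // SECONDS_PER_DAY
    if not 0 <= days <= 0xFFFF:
        raise ValueError("day count does not fit in an unsigned 16-bit int")
    return days

def pack_atime(days: int) -> bytes:
    """Pack the day count into 2 bytes (little-endian unsigned short)."""
    return struct.pack("<H", days)

def needs_atime_update(stored_days: int, now_seconds: int) -> bool:
    """True only when the day has changed, so a record is rewritten at most once a day."""
    return atime_days(now_seconds) > stored_days

if __name__ == "__main__":
    now = 1082073600                 # 2004-04-16 00:00:00 UTC
    day = atime_days(now)
    print(day, pack_atime(day).hex())                 # 12524 ec30
    print(needs_atime_update(day, now + 3600))        # False: same day
    print(needs_atime_update(day, now + 86400))       # True: next day
```

With this scheme, 2004-04-16 is day 12524, well within 16 bits; an unsigned 16-bit day count (max 65535) lasts until roughly the year 2149.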


> 3) The more radical change, use CHAR(5) from SHA1 hash for the tokens.
> 

Close to done.  Gonna have to encode this binary data on dump and
backup and then decode it on restore, but I think I figured out a nice
way to handle that.  It will be a bummer to lose the ability to see the
raw token data, but I suppose if it makes things smaller/faster it
will be worth it.
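A sketch of the hashed-token idea together with one possible dump/restore encoding. It keeps five bytes of the SHA1 digest. Taking the last five bytes is an assumption, since the proposal only says CHAR(5) from SHA1. The sketch also uses base64 so the binary keys survive a text dump. Base64 is just one option and is not necessarily the approach mentioned above. The helper names are hypothetical.

```python
import base64
import hashlib

def token_hash(token: str) -> bytes:
    """Reduce a token to 5 bytes of its SHA1 digest (which 5 bytes is an assumption)."""
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]

def encode_for_dump(raw: bytes) -> str:
    """Turn a binary key into plain ASCII so it is safe in a text dump/backup."""
    return base64.b64encode(raw).decode("ascii")

def decode_on_restore(text: str) -> bytes:
    """Invert encode_for_dump when restoring from a dump."""
    return base64.b64decode(text)

if __name__ == "__main__":
    key = token_hash("free")
    line = encode_for_dump(key)
    assert decode_on_restore(line) == key   # round trip is lossless
    print(len(key), len(line))              # 5 8
```

The trade-off is the one mentioned above: the dump holds opaque base64 strings, so the original token text can no longer be read back out of it.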

Anyone know how well Berkeley DB handles binary keys?

I think we'll need to do a 10-fold cross-validation test after we put
this change in, to make sure we haven't degraded our results.
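A generic sketch of a 10-fold split, assuming the corpus has already been shuffled. Each message lands in exactly one test fold, and each fold is tested against a model trained on the other nine. `train_and_score` is a hypothetical callback that stands in for "train bayes on this split and measure accuracy". Running it once with raw tokens and once with hashed tokens on the same folds would give a like-for-like comparison.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def k_fold_splits(items: Sequence[T], k: int = 10) -> List[Tuple[List[T], List[T]]]:
    """Deal items round-robin into k folds; return one (train, test) pair per fold."""
    folds = [list(items[i::k]) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        # Train on every fold except the held-out one.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits

def cross_validate(items: Sequence[T],
                   train_and_score: Callable[[List[T], List[T]], float],
                   k: int = 10) -> float:
    """Average the score from train_and_score over all k folds."""
    scores = [train_and_score(train, test) for train, test in k_fold_splits(items, k)]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    splits = k_fold_splits(list(range(100)))
    print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
```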

Michael





