http://bugzilla.spamassassin.org/show_bug.cgi?id=3225
------- Additional Comments From [EMAIL PROTECTED] 2004-04-16 13:03 -------
Subject: Re: RFE: Bayes optimizations

On Thu, Apr 15, 2004 at 04:17:19PM -0700, [EMAIL PROTECTED] wrote:
> ------- Additional Comments From [EMAIL PROTECTED] 2004-04-15 16:17 -------
> Michael, I don't want you to forget the other optimizations:

I haven't forgotten. I was going for some low-hanging fruit and correcting some bad design decisions (hindsight is 20/20).

> 1) Use an INT(11) for username, using the UID instead of a variable character
> string

Pretty much done. Since not everyone will have a system UID, the plan is to create an entry in bayes_vars for each username and give it an id (via a sequence). This requires some additional work on DBs that don't easily support AUTO_INCREMENT, but it shouldn't be too bad. I may also add a separate table and make the mapping query customizable, to help folks who want to map IDs to an existing system or limit who can use bayes (Bug 3215).

> 2) Express atime as a two byte int with granularity of one day instead of a 4
> byte int with granularity of a second. That not only cuts two bytes off of the
> atime in every record, but it means that the record is updated for the time no
> more than once a day.

I'm unsure about this one. Does anyone else have an opinion?

> 3) The more radical change, use CHAR(5) from SHA1 hash for the tokens.

Close to done. I'll have to encode this binary data on dump and backup and then decode it again on restore, but I think I've figured out a nice way to handle that. It will be a bummer to lose the ability to see the raw token data, but if it makes things smaller and faster it will be worth it. Does anyone know how well Berkeley DB handles binary keys?

I think we'll need to run a 10-fold cross-validation test after this change goes in, to make sure we haven't degraded our results.

Michael
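To make the username-to-id idea concrete, here is a minimal sketch of the "one bayes_vars row per username, with a generated id" mapping. It assumes SQLite (via Python's `sqlite3`) as a stand-in for MySQL, and a cut-down `bayes_vars` table with only `id` and `username` columns; the helper names `open_db` and `get_userid` are hypothetical, not SpamAssassin's actual code. On databases without AUTO_INCREMENT, the id would come from a sequence instead.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical, cut-down bayes_vars table keyed by a numeric id."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS bayes_vars (
               id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- numeric id stored in token rows
               username TEXT NOT NULL UNIQUE                 -- looked up once per session
           )"""
    )
    return conn


def get_userid(conn: sqlite3.Connection, username: str) -> int:
    """Return the numeric id for username, creating a bayes_vars row on first use."""
    # The UNIQUE constraint makes INSERT OR IGNORE a no-op for known usernames.
    conn.execute("INSERT OR IGNORE INTO bayes_vars (username) VALUES (?)", (username,))
    conn.commit()
    row = conn.execute(
        "SELECT id FROM bayes_vars WHERE username = ?", (username,)
    ).fetchone()
    return row[0]
```

Token rows would then carry the small integer id rather than repeating the username string in every record.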
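The day-granularity atime proposal comes down to simple arithmetic: store whole days instead of seconds, and a 16-bit unsigned field (max 65535 days) lasts until about 2149 if counted from the Unix epoch. The sketch below assumes the Unix epoch as day zero (the proposal doesn't name one), and the function names are illustrative only. It also shows the second benefit mentioned: a token's atime only needs rewriting when the day number changes.

```python
import struct

SECONDS_PER_DAY = 86400
MAX_DAY = 0xFFFF  # largest value an unsigned 16-bit field can hold


def to_day(unix_seconds: int) -> int:
    """Convert a Unix timestamp to whole days since 1970-01-01 (UTC)."""
    day = unix_seconds // SECONDS_PER_DAY
    if not 0 <= day <= MAX_DAY:
        raise ValueError("timestamp outside the 16-bit day range")
    return day


def pack_atime(unix_seconds: int) -> bytes:
    """Pack the day number into 2 bytes (big-endian unsigned short)."""
    return struct.pack(">H", to_day(unix_seconds))


def needs_touch(stored_day: int, now_seconds: int) -> bool:
    """Rewrite a token's atime only when the day has actually changed."""
    return to_day(now_seconds) != stored_day
```

For example, 2004-04-16 is day 12524, which packs into 2 bytes, whereas the same timestamp in seconds needs a 4-byte field.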
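For the CHAR(5) token proposal, the sketch below reduces each token to 5 bytes of its SHA-1 digest and round-trips the binary key through hex for a text dump and restore. Which 5 bytes to keep (here, the last five) and the choice of hex for the dump format are assumptions for illustration; the helper names are hypothetical.

```python
import hashlib

TOKEN_BYTES = 5  # CHAR(5): 40 bits of the 20-byte SHA-1 digest


def hash_token(token: str) -> bytes:
    """Reduce a token to a fixed 5-byte binary key taken from its SHA-1 digest."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return digest[-TOKEN_BYTES:]  # assumption: keep the last 5 bytes


def dump_key(key: bytes) -> str:
    """Encode a binary key as printable hex for a text dump/backup."""
    return key.hex()


def restore_key(text: str) -> bytes:
    """Decode a hex string from a dump back into the raw 5-byte key."""
    key = bytes.fromhex(text)
    if len(key) != TOKEN_BYTES:
        raise ValueError("expected a 5-byte token key")
    return key
```

The main cost besides losing the readable token text is collisions: with n distinct tokens, the expected number of colliding pairs is roughly n²/2⁴¹, about 0.5 at a million tokens, and a collision merely merges the counts of two tokens.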