[Bug 3225] RFE: Bayes optimizations

bugzilla-daemon 24 Apr 2004 02:58:36 -0000

http://bugzilla.spamassassin.org/show_bug.cgi?id=3225

------- Additional Comments From [EMAIL PROTECTED]  2004-04-23 19:58 -------
Subject: Re:  RFE: Bayes optimizations

On Fri, Apr 23, 2004 at 06:54:20PM -0700, [EMAIL PROTECTED] wrote:
> So I think this means we *can* use a 2-byte atime format safely, since the
> problems we had with that before were *definitely* caused by using
> message-scan and message-learn time as the atime, instead of
> message-received time.

IIRC, my opinion on this was: yeah, we could save a byte or two per
token, but I'd rather have the full 32-bit time value 1) in case we
want to change the expire algorithm again, 2) in case we want to allow
people to change the granularity of expiry, 3) it requires less coding,
less complexity, less testing, less run-time resources to execute said
code, etc.  Oh, and with a packed 2-byte format, if we wanted to change
any of those, we'd pretty much have to upgrade the database data.

Keeping it as the full raw 32-bit value gives us the most flexibility
while not requiring a lot of resources.  Doing something like hashing
the tokens would save more space on average than doing something to the
atime value anyway.
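
To make the space trade-off concrete, here is a rough sketch in Python
(not the SpamAssassin code). The 12-hour granularity, the epoch argument,
the little-endian layout, and the 5-byte truncated SHA-1 are all
assumptions picked for illustration; none of them come from this bug.

```python
import hashlib
import struct

# Assumed for illustration: one 2-byte tick covers 12 hours.
GRANULARITY = 12 * 60 * 60


def pack_full(atime: int) -> bytes:
    """Store the raw 32-bit atime: 4 bytes, no precision lost."""
    return struct.pack("<I", atime)


def pack_short(atime: int, epoch: int) -> bytes:
    """Store atime as 2 bytes of ticks since a hypothetical epoch.

    Raises struct.error once the tick count no longer fits in 16 bits.
    """
    ticks = (atime - epoch) // GRANULARITY
    return struct.pack("<H", ticks)


def unpack_short(data: bytes, epoch: int) -> int:
    """Recover an atime rounded down to the tick granularity."""
    (ticks,) = struct.unpack("<H", data)
    return epoch + ticks * GRANULARITY


def hash_token(token: str, length: int = 5) -> bytes:
    """Replace a variable-length token with a fixed-length digest prefix."""
    return hashlib.sha1(token.encode("utf-8")).digest()[:length]
```

The short form saves 2 bytes per token but bakes the granularity and epoch
into the stored data, so changing either means rewriting every record.
Hashing saves space on any token longer than the chosen digest prefix.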

<sort of a rant>
I don't think shaving off a byte or two would really save any I/O time
anyway.  IIRC from testing, the majority of the I/O usage was reading the
data from disk.  Since our I/O usage is relatively random, we don't
benefit from caching/prefetching, so the time goes to reading whole disk
blocks to get our data.  Disk blocks are typically 512 bytes in size.
Since our data is smaller than that, we'll always have to read the
whole block anyway, so there's no savings in having (for example)
30 bytes of data instead of 32 bytes of data.
</rant>
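
The block arithmetic behind that can be sketched as below. This is only
the arithmetic, assuming the 512-byte block size mentioned above and a
hypothetical record offset; a real database backend has its own page
layout on top of this.

```python
def blocks_touched(offset: int, size: int, block_size: int = 512) -> int:
    """Count the disk blocks that a record at `offset` of `size` bytes spans."""
    first = offset // block_size
    last = (offset + size - 1) // block_size
    return last - first + 1


def bytes_read(offset: int, size: int, block_size: int = 512) -> int:
    """Bytes actually fetched from disk, since reads happen in whole blocks."""
    return blocks_touched(offset, size, block_size) * block_size


# A 30-byte record and a 32-byte record at the same offset cost the same read.
same_cost = bytes_read(0, 30) == bytes_read(0, 32)
```

In both cases a full block comes off the disk, so the 2 bytes saved only
matter if they change how many records fit in a block.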

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
