http://bugzilla.spamassassin.org/show_bug.cgi?id=3225
------- Additional Comments From [EMAIL PROTECTED] 2004-04-23 19:58 -------
Subject: Re: RFE: Bayes optimizations

On Fri, Apr 23, 2004 at 06:54:20PM -0700, [EMAIL PROTECTED] wrote:
> So I think this means we *can* use a 2-byte atime format safely, since the
> problems we had with that before were *definitely* caused by using
> message-scan and message-learn time as the atime, instead of
> message-received time.

IIRC, my opinion on this was: yeah, we could save a byte or two per token,
but I'd rather have the full 32-bit time value

1) in case we want to change the expire algorithm again,
2) in case we want to allow people to change the granularity of expiry,
3) it requires less coding, less complexity, less testing, less run-time
   resources to execute said code, etc.

Oh, and if we wanted to change any of those, we'd pretty much have to
upgrade the database data. Keeping it as the full raw 32-bit value gives us
the most flexibility while not requiring a lot of resources. Doing something
like hashing the tokens would save more space on average than doing something
to the atime value anyway.

<sort of a rant>
I don't think shaving off a byte or two would really buy any I/O time anyway.
IIRC from testing, the majority of the I/O usage was reading the data from
disk. Since our I/O usage is relatively random, we're not able to use
caching/prefetching, so the time is from having to read whole disk blocks to
get our data. Disk blocks are typically 512 bytes in size. So since our data
is smaller than that, we'll always have to read the whole block anyway, so
there's no savings for us to have (for example) 30 bytes of data instead of
32 bytes of data.
</rant>

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
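
For anyone who wants to check the disk-block argument in the comment above, here is a rough sketch in Python. It is not SpamAssassin code: the helper names (`blocks_touched`, `bytes_read`) are made up for illustration, and the 512-byte block size and the 30- versus 32-byte record sizes are simply the figures from the comment. It assumes reads are done in whole blocks, so a record that fits inside one block costs one block read whatever its exact size.

```python
import struct

BLOCK_SIZE = 512  # bytes; the "typical" disk block size cited in the comment


def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Number of whole blocks covering `length` bytes starting at `offset`."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


def bytes_read(offset, length, block_size=BLOCK_SIZE):
    """Bytes actually fetched from disk when reads happen in whole blocks."""
    return blocks_touched(offset, length, block_size) * block_size


# A full 32-bit atime is 4 bytes packed; a 2-byte atime would save 2 bytes.
FULL_ATIME_SIZE = struct.calcsize("<I")
SHORT_ATIME_SIZE = struct.calcsize("<H")

# Hypothetical record sizes taken from the example in the comment.
for record_size in (30, 32):
    print(record_size, "byte record ->", bytes_read(0, record_size), "bytes read")
```

Under these assumptions both record sizes cost the same single block read, so the 2 bytes saved per token do not reduce I/O for a random lookup. The only case where size matters here is a record that straddles a block boundary, and that depends on the offset much more than on a 2-byte difference.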