http://bugzilla.spamassassin.org/show_bug.cgi?id=3331
------- Additional Comments From [EMAIL PROTECTED] 2004-04-29 10:27 -------
Subject: Re: Bayes option to keep original token as db data (not key).

On Thu, Apr 29, 2004 at 10:09:55AM -0700, [EMAIL PROTECTED] wrote:
> After thinking on this a little I have a possible proposal that involves a
> separate db_file/table, but I haven't worked everything out yet. I'll think
> on it some more and reply back to see if it interests folks.

Just to put in my two cents... I too will miss being able to tell which
actual tokens are which, but if it gives a performance gain, fine. That
information is largely, well, informational. What matters for scoring is that
we see "block of data", not how we choose to represent said block.

In general, we do _NOT_ want to add unnecessary complexity to this system.
It's bad for performance, and it's horrid for maintenance.

Adding the original token to the same DB is pointless; you'll eliminate all
the performance benefits, at least in DBM.

Adding another DB is doable, but fairly complex. Expiry is going to be even
more complex than it is now, and take more time. Backup/Restore would just
need to dump/restore the extra DB. The code would always have to consult the
new DB so that the original tokens are available for dumps and header
display. Oh, and internally, we'd need to track both the hashed token and
the original. Oh, and the journal will have to be modified to have a
"hash-to-text" line type. Not to mention the issues with doing multi-word
tokens. There's very likely more, but this scares me enough.

In short, it adds much more complexity than I'm comfortable with. So unless
there's a compelling reason, I'm -1 on the idea of keeping hash<->text data
around.
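To make the trade-off being debated more concrete, here is a minimal sketch
(in Python, not SpamAssassin's actual Perl code) of a token store keyed by a
truncated hash, with an optional second hash-to-text mapping standing in for
the separate DB proposed above. The names (BayesStore, token_key), the 5-byte
truncation, and the dict-backed stores are illustrative assumptions, not the
real implementation.

    import hashlib

    def token_key(token):
        """Hash a token down to a short fixed-width key (width is illustrative)."""
        return hashlib.sha1(token.encode("utf-8")).digest()[:5]

    class BayesStore:
        """Toy token store: counts are keyed by hash; the original text is only
        recoverable if we pay for a second hash-to-text mapping (the extra DB)."""

        def __init__(self, keep_text=False):
            self.counts = {}        # hash key -> (spam_count, ham_count)
            self.keep_text = keep_text
            self.hash_to_text = {}  # the proposed extra DB; empty unless keep_text

        def learn(self, token, is_spam):
            key = token_key(token)
            spam, ham = self.counts.get(key, (0, 0))
            self.counts[key] = (spam + 1, ham) if is_spam else (spam, ham + 1)
            if self.keep_text:
                # Every write now touches a second store; expiry, backup/restore
                # and the journal would all need matching "hash-to-text" handling.
                self.hash_to_text[key] = token

        def dump(self):
            # Without the extra mapping, a dump can only show opaque hash keys.
            for key, (spam, ham) in self.counts.items():
                text = self.hash_to_text.get(key, key.hex())
                yield text, spam, ham

    store = BayesStore(keep_text=True)
    store.learn("replica", is_spam=True)
    print(list(store.dump()))   # [('replica', 1, 0)]

The sketch mirrors the complexity argument above: with keep_text enabled,
every learn, expire, dump, and journal replay has a second store to keep
consistent.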
