http://bugzilla.spamassassin.org/show_bug.cgi?id=3331
------- Additional Comments From [EMAIL PROTECTED] 2004-04-29 10:27 -------
Subject: Re: Bayes option to keep original token as db data (not key).

On Thu, Apr 29, 2004 at 10:09:55AM -0700, [EMAIL PROTECTED] wrote:
> After thinking on this a little I have a possible proposal that involves a
> separate db_file/table, but I haven't worked everything out yet. I'll think
> on it some more and reply back to see if it interests folks.

Just to put in my two cents... I too will miss being able to tell which
actual tokens are which, but if it gives a performance gain, fine. That
information is largely, well, informational. What matters for scoring is that
we see "block of data", not how we choose to represent said block.

In general, we do _NOT_ want to add unnecessary complexity to this system.
It's bad for performance, and it's horrid for maintenance.

Adding the original token to the same DB is pointless; you'll eliminate all
the performance benefits, at least in DBM.

Adding another DB is doable, but fairly complex. Expiry is going to be even
more complex than it is now, and take more time. Backup/Restore would just
need to dump/restore the extra DB. The code would always have to consult the
new DB so that the original tokens are available for dumps and header
display. Oh, and internally, we'd need to track both the hashed token and
the original. Oh, and the journal will have to be modified to have a
"hash-to-text" line type. Not to mention the issues with doing multi-word
tokens. There's very likely more, but this scares me enough.

In short, it adds much more complexity than I'm comfortable with. So unless
there's a compelling reason, I'm -1 on the idea of keeping hash<->text data
around.
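To make the trade-off being debated more concrete, here is a minimal sketch
(in Python, not SpamAssassin's actual Perl code) of a token store keyed by a
truncated hash, with an optional second hash-to-text mapping standing in for
the separate DB proposed above. The names (BayesStore, token_key), the 5-byte
truncation, and the dict-backed stores are illustrative assumptions, not the
real implementation.

    import hashlib

    def token_key(token):
        """Hash a token down to a short fixed-width key (width is illustrative)."""
        return hashlib.sha1(token.encode("utf-8")).digest()[:5]

    class BayesStore:
        """Toy token store: counts are keyed by hash; the original text is only
        recoverable if we pay for a second hash-to-text mapping (the extra DB)."""

        def __init__(self, keep_text=False):
            self.counts = {}        # hash key -> (spam_count, ham_count)
            self.keep_text = keep_text
            self.hash_to_text = {}  # the proposed extra DB; empty unless keep_text

        def learn(self, token, is_spam):
            key = token_key(token)
            spam, ham = self.counts.get(key, (0, 0))
            self.counts[key] = (spam + 1, ham) if is_spam else (spam, ham + 1)
            if self.keep_text:
                # Every write now touches a second store; expiry, backup/restore
                # and the journal would all need matching "hash-to-text" handling.
                self.hash_to_text[key] = token

        def dump(self):
            # Without the extra mapping, a dump can only show opaque hash keys.
            for key, (spam, ham) in self.counts.items():
                text = self.hash_to_text.get(key, key.hex())
                yield text, spam, ham

    store = BayesStore(keep_text=True)
    store.learn("replica", is_spam=True)
    print(list(store.dump()))   # [('replica', 1, 0)]

The sketch mirrors the complexity argument above: with keep_text enabled,
every learn, expire, dump, and journal replay has a second store to keep
consistent.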
