Re: Limit Bayes Token Length

Daniel Quinlan 3 May 2004 04:34:29 -0000

Theo Van Dinter <[EMAIL PROTECTED]> writes:

> What's the issue exactly?  If we're hashing down to 5 bytes anyway,
> who cares what size the input is?  The large length tokens aren't a
> big deal unless huge mails start going around (who cares if we have a
> handful of large tokens?)


1. We should probably not truncate tokens (at least not so much) now
   that we're hashing.  Some amount of truncation may still be helpful,
   though, so a 10-fold cross-validation (10fcv) run would be a good
   idea.

   Um, I don't recall anyone posting a 10fcv for the hashing.  Someone
   did do that, right?

2. The thing you may be missing is Herk's idea to optionally include
   the original token as part of the value -- not the key.  In SQL, it
   would be a separate column.  In DBM, it would optionally appear at
   the end of the hashed token's value.
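
A rough sketch of what point 2 could look like for the DBM case, in
Python for brevity.  Everything here is an assumption for illustration
rather than the actual storage code: the key is taken as the last 5
bytes of a SHA-1 digest, the value layout (two 32-bit counts followed
by the optional original token) is made up, and a plain dict stands in
for the DBM file.  The function names are hypothetical.

```python
import hashlib
import struct
from typing import Optional, Tuple


def token_key(token: str) -> bytes:
    """Hash a token of any length down to a fixed 5-byte key.

    Assumption: last 5 bytes of the SHA-1 digest.
    """
    return hashlib.sha1(token.encode("utf-8")).digest()[-5:]


def pack_value(spam: int, ham: int, token: Optional[str] = None) -> bytes:
    """Pack the counts; optionally append the original token at the end."""
    value = struct.pack(">II", spam, ham)  # hypothetical layout: two uint32
    if token is not None:
        value += token.encode("utf-8")
    return value


def unpack_value(value: bytes) -> Tuple[int, int, Optional[str]]:
    """Split a stored value back into counts and the optional token."""
    spam, ham = struct.unpack(">II", value[:8])
    original = value[8:].decode("utf-8") or None
    return spam, ham, original


# A dict stands in for the DBM file in this sketch.
db = {}
long_token = "x" * 200
db[token_key(long_token)] = pack_value(3, 1, long_token)  # token kept
db[token_key("viagra")] = pack_value(7, 0)                # token omitted
```

The key stays 5 bytes no matter how long the input token is, so
whether to truncate only affects the optional copy stored in the
value, not the index.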

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
