On Sun, May 02, 2004 at 05:39:14PM -0500, Michael Parker wrote:
> I'm contemplating limiting bayes tokens to 128 chars, in the tokenize
> method.  Anyone see a problem with that?

Am I missing something?

use constant MAX_TOKEN_LENGTH => 15;

... although I don't see a substr() anywhere that actually enforces it ...  :(
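
Something like this would do it (just a sketch, untested; I'm guessing
at where in tokenize() the loop would go):

  # cap each token before it goes any further; MAX_TOKEN_LENGTH is the
  # existing constant above
  foreach my $token (@tokens) {
    $token = substr($token, 0, MAX_TOKEN_LENGTH)
      if length($token) > MAX_TOKEN_LENGTH;
  }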

> Maybe 128 is too large in a theoretical worst-case attack (of someone
> turning on storage of original tokens).  32 or 64 might be better.

What's the issue exactly?  If we're hashing down to 5 bytes anyway,
who cares what size the input is?  Long tokens aren't a big deal
unless huge mails start going around (who cares if we have a handful
of large tokens?).
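
For reference, I believe the hash step looks roughly like this (from
memory, so treat the details as assumptions):

  use Digest::SHA1 qw(sha1);

  # any input length collapses to a fixed 5 bytes here, so a long
  # token only costs memory transiently during tokenization
  my $hashed = substr(sha1($token), -5);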

Limiting the size would also cause us problems if we wanted to do
multi-word tokens.

-- 
Randomly Generated Tagline:
"I couldn't NT my way out of a wet paper bag." - Unknown at LISA '99
