http://bugzilla.spamassassin.org/show_bug.cgi?id=2266
------- Additional Comments From [EMAIL PROTECTED]  2004-04-28 07:27 -------
Token hashing doesn't look like it will be too much of a problem if it is
implemented using something like
http://bugzilla.spamassassin.org/attachment.cgi?id=1917&action=view
from bug 3225. In that patch hashing is done in Bayes::tokenize:

     # Go ahead and uniq the array, skip null tokens (can happen sometimes)
-    my %tokens = map { $_ => 1 } grep(length, @tokens);
+    # generate an SHA1 hash and take the lower 40 bits as our token
+    my %tokens = map { substr(sha1($_), -5) => 1 } grep(length, @tokens);

     # return the keys == tokens ...
     keys %tokens;

And so instead of

    my %tokens = map { substr(sha1($_), -5) => 1 } grep(length, @tokens);

do

    my %tokens = map { substr(sha1($_), -5) => $_ } grep(length, @tokens);

and then either return the hash table (requiring changes to callers of
tokenize) or else store it somewhere. The code in scan can then use the hash
table to retrieve the original text for each token to be displayed.
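For illustration, here is a rough standalone sketch of the "return the hash table" option. It is not the actual Bayes::tokenize: the name hash_tokens is made up for this example, and it assumes Digest::SHA for sha1() (the patch itself may well load a different SHA1 module). It only shows how a caller could get from the 40-bit hashed token back to the original text.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1);   # assumption: any module exporting a binary sha1() would do

# Hypothetical helper (not the real Bayes::tokenize): maps each hashed
# token to the original text it was derived from, skipping null tokens.
sub hash_tokens {
    my @tokens = @_;
    my %tokens;
    for my $tok (grep { length } @tokens) {
        # sha1() returns a 20-byte binary digest; keep the last 5 bytes (40 bits)
        $tokens{ substr(sha1($tok), -5) } = $tok;
    }
    return \%tokens;   # hashed token => original token text
}

# Example caller, standing in for the lookup that scan would do for display.
my $map = hash_tokens('viagra', '', 'free', 'viagra');
for my $hashed (sort keys %$map) {
    printf "%s => %s\n", unpack('H*', $hashed), $map->{$hashed};
}
```

Duplicate tokens still collapse to one entry because identical text hashes to the same key. Two different tokens that happened to collide in the lower 40 bits would overwrite each other, so the displayed text would be whichever one was seen last.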
