http://bugzilla.spamassassin.org/show_bug.cgi?id=2266
------- Additional Comments From [EMAIL PROTECTED] 2004-04-28 07:50 -------
Subject: Re: New features: Tokens in report, status of Bayesian classification.
On Wed, Apr 28, 2004 at 07:27:14AM -0700, [EMAIL PROTECTED] wrote:
>
> And so instead of
> my %tokens = map { substr(sha1($_), -5) => 1 } grep(length, @tokens);
> do
> my %tokens = map { substr(sha1($_), -5) => $_ } grep(length, @tokens);
>
> and then either return the hash table (requiring changes to callers of
> tokenize) or else store it somewhere. The code in scan can then use
> the hash table to retrieve the original text for each token to be
> displayed.
>
This is pretty much my thinking, except something like this:
my %tokens = map { substr(sha1($_), -5) => {'orig' => $_} } grep(length, @tokens);
Then in scan I can add the counts, atime and prob to each token's entry,
and use that later when building up the hammy/spammy arrays.
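A rough sketch of how that could fit together, in case it helps. The token list, the %fake_db lookup, the field names other than 'orig', and the 0.5 cutoff are all made up for illustration; the real scan code would get its data from the Bayes store. Digest::SHA is also an assumption here; any module exporting sha1() would do.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1);   # assumption: any module exporting sha1() works

# Hypothetical tokenizer output; the empty string is dropped by grep.
my @tokens = ('viagra', '', 'meeting', 'free');

# Key: last 5 bytes of the SHA1 digest. Value: hashref with the original text.
# The extra parentheses make sure Perl parses the braces as a block.
my %tokens = map { ( substr(sha1($_), -5) => { orig => $_ } ) }
             grep { length } @tokens;

# Hypothetical stand-in for the per-token data that scan would look up.
my %fake_db = (
    viagra  => { spam_count => 40, ham_count => 1,  atime => 1000, prob => 0.98 },
    meeting => { spam_count => 2,  ham_count => 35, atime => 1000, prob => 0.05 },
    free    => { spam_count => 20, ham_count => 10, atime => 1000, prob => 0.67 },
);

# In scan: add counts, atime and prob onto each existing entry.
for my $info (values %tokens) {
    my $row = $fake_db{ $info->{orig} } or next;
    @{$info}{ keys %$row } = values %$row;
}

# Later: split the entries into hammy/spammy arrays.
# The 0.5 cutoff is only an assumption for this example.
my (@hammy, @spammy);
for my $info (values %tokens) {
    next unless defined $info->{prob};
    if ($info->{prob} > 0.5) { push @spammy, $info }
    else                     { push @hammy,  $info }
}

# The original text is available for display without reversing the hash.
my @spammy_words = sort { $a cmp $b } map { $_->{orig} } @spammy;
my @hammy_words  = sort { $a cmp $b } map { $_->{orig} } @hammy;
print "spammy: @spammy_words\n";
print "hammy: @hammy_words\n";
```

Since each value is a hashref, scan can keep adding fields without changing what tokenize returns.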
Michael