On 09/09/2014 03:50 PM, Alex Regan wrote:
Hi,

Did you understand that all
tokens are learned, regardless of whether they have been seen before?

That doesn't really matter from a user perspective, though, right? I
mean, if tokens that have already been learned are learned again, the
net result is zero.

Very much not zero. Each token has several values associated with it:
  # ham
  # spam
  time-stamp

So each time a token is learned, its respective ham/spam counter is
incremented, which indicates how spammy or hammy that token is, and its
time-stamp is updated, indicating how "fresh" it is. The bayes expiry
process removes "stale" tokens when it runs, pruning the database down
to size.

Ah, yes, of course. I knew about that, but somehow didn't put it
together with this.

I would like to know why, after training similar messages a number of
times, it still shows the same bayes score on new similar messages.

I'd also like to figure out how many more times a message needs to be
re-trained before the score reflects the desired classification.
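How quickly repeated training moves an individual token can be illustrated with a Robinson-style smoothed probability (a sketch of the general technique; SpamAssassin's exact formula and constants may differ):

```python
def token_spam_prob(spam_hits, ham_hits, s=1.0, x=0.5):
    """Smoothed per-token spam probability, Robinson-style.
    s is the strength of the neutral prior, x the prior itself (0.5).
    This is illustrative, not SpamAssassin's exact implementation."""
    n = spam_hits + ham_hits
    p = spam_hits / n if n else x
    return (s * x + n * p) / (s + n)

# token_spam_prob(1, 0)  -> 0.75   (one spam training)
# token_spam_prob(2, 0)  -> ~0.83  (two spam trainings)
# token_spam_prob(10, 0) -> ~0.95  (ten spam trainings)
```

Each additional spam training nudges a token closer to 1.0, but with diminishing returns, and tokens that also appear in trained ham get pulled back toward neutral.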

I have a particular FN that frequently scores bayes50, sometimes lower.
A few dozen similar messages every day are properly tagged as spam, yet
still hit bayes50. I pull them out of the quarantine and keep training
them as spam, but a few still get through every day.

Is there any particular analysis I can do on one of the FNs that can
tell me how far off the bayes50 is from becoming bayes99 in a similar
message?

Hopefully that's clear. I understand there's a large number of variables
involved here, and I would think the fewer tokens a message has, the
harder it should be to persuade the classifier, but it's frustrating to
see bayes50 so repeatedly...
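That dilution effect can be illustrated with chi-square (Fisher/Robinson) combining, the general technique SpamAssassin's Bayes uses to merge per-token probabilities into one score (a sketch, not its exact implementation): a few strong spam tokens among many neutral ones get pulled back toward 0.5.

```python
import math

def fisher_combine(probs):
    """Combine per-token spam probabilities (each strictly between 0
    and 1) via chi-square, Robinson-style. Returns ~1 for spammy,
    ~0 for hammy, ~0.5 for indeterminate."""

    def chi2_sf(x2, df):
        # survival function of chi-square for even df, via the
        # closed-form Poisson series
        m = x2 / 2.0
        term = math.exp(-m)
        s = term
        for i in range(1, df // 2):
            term *= m / i
            s += term
        return min(s, 1.0)

    n = len(probs)
    h = chi2_sf(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    s = chi2_sf(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + h - s) / 2.0
```

With this kind of combining, one 0.99 token among a couple of neutral 0.5 tokens still yields a strongly spammy score, but the same 0.99 token buried in forty neutral tokens lands close to 0.5, which matches the "few distinctive tokens, stuck at bayes50" behaviour described above.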

You could add

report BAYES_HT _HAMMYTOKENS(50)_
report BAYES_ST _SPAMMYTOKENS(50)_

to your local.cf to append the hammy/spammy token lists to the report
and see which tokens are being seen.

