On 16.03.21 13:16, Steve Dondley wrote:
I have been accumulating spam/ham samples and sorting them out into different directories on my server. As new spam/ham comes in, I throw it into the existing pile and then run "sa-learn --spam|--ham" on the whole pile.

It dawned on me that this will get very slow as I eventually collect tens of thousands of emails. So I'm wondering if it's better to:

1) Place all new, incoming spam/ham into empty directories
2) Run sa-learn only on these directories with small samples
3) Once done, move these new emails to an archive of spam/ham samples
4) Repeat

Is this typically how it's done?

I mostly care about training on false positives, false negatives, near
false negatives that don't hit BAYES_999, and phish.

That means that once your Bayes database is well trained, occasional
retraining is still necessary, but in many cases learning one false negative
is enough to move multiple similar mails from BAYES_50 to BAYES_999.
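
The four steps from the question (learn only the new samples, then archive
them) could be sketched roughly as below. This is a minimal sketch, not a
recommended tool: the function name learn_and_archive, its parameters, and
the directory layout are hypothetical, and it assumes that sa-learn is on
the PATH and that message file names are unique (as with Maildir names).

```python
import shutil
import subprocess
from pathlib import Path


def learn_and_archive(incoming, archive, kind, run=subprocess.run):
    """Train on the new mail in `incoming`, then move it to `archive`.

    `kind` is "spam" or "ham". `run` is injectable (hypothetical design
    choice) so the function can be tried without SpamAssassin installed.
    Returns the number of messages moved to the archive.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    incoming, archive = Path(incoming), Path(archive)

    # Snapshot the new samples first, so only these files are archived.
    messages = sorted(p for p in incoming.iterdir() if p.is_file())
    if not messages:
        return 0  # nothing new to learn

    # Step 2: run sa-learn only on the small directory of new samples.
    # check=True raises if sa-learn fails, so nothing is archived then.
    run(["sa-learn", f"--{kind}", str(incoming)], check=True)

    # Step 3: move the learned samples into the archive.
    archive.mkdir(parents=True, exist_ok=True)
    for msg in messages:
        shutil.move(str(msg), str(archive / msg.name))
    return len(messages)
```

It could be called once per class, for example
learn_and_archive("new/spam", "archive/spam", "spam"), where both paths are
placeholders. A message that arrives while sa-learn is running stays in the
incoming directory and is simply picked up by the next run.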

--
Matus UHLAR - fantomas, [email protected] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
- Holmes, what kind of school did you study to be a detective?
- Elementary, Watkins.  -- Daffy Duck & Porky Pig
