Re: Workflow for adding new ham/spam to existing site-wide database?

Matus UHLAR - fantomas Wed, 17 Mar 2021 01:56:58 -0700

On 16.03.21 13:16, Steve Dondley wrote:

I have been accumulating spam/ham samples and sorting them out into different directories on my server. As new spam/ham comes in, I throw it into the existing pile and then run "sa-learn --spam|--ham" on the whole pile.
It dawned on me that this will get very slow as I eventually collect tens of thousands of emails. So I'm wondering if it's better to:
1) Place all new, incoming spam/ham into empty directories
2) Run sa-learn only on these directories with small samples
3) Once done, move these new emails to an archive of spam/ham samples
4) Repeat

Is this typically how it's done?
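
The four steps above can be sketched roughly as follows. This is only an illustration, not anyone's production setup: the function name learn_and_archive, the directory layout, and the injectable run parameter are all made up for the example. The only real interface used is "sa-learn --spam|--ham" pointed at a directory, as in the message above. It also assumes file names are unique (Maildir-style), since a name clash in the archive is not handled.

```python
import shutil
import subprocess
from pathlib import Path


def learn_and_archive(incoming: Path, archive: Path, kind: str,
                      run=subprocess.run) -> int:
    """Train on the mail in `incoming`, then move it into `archive`.

    `kind` is "spam" or "ham". `run` defaults to subprocess.run and is a
    parameter only so the sa-learn call can be stubbed out when testing.
    Returns the number of files moved to the archive.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")

    # Steps 1-2: only the new, small directory is handed to sa-learn.
    files = sorted(p for p in incoming.iterdir() if p.is_file())
    if not files:
        return 0  # nothing new, so skip the sa-learn call entirely

    run(["sa-learn", f"--{kind}", str(incoming)], check=True)

    # Step 3: move what was just listed into the archive.
    # Files that arrived after the listing stay put for the next round.
    archive.mkdir(parents=True, exist_ok=True)
    for p in files:
        shutil.move(str(p), str(archive / p.name))
    return len(files)
```

Step 4 ("repeat") would then be a cron job or similar that calls this once for the spam directory and once for the ham directory. Because check=True makes a failed sa-learn run raise, nothing is archived unless training succeeded.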


I mostly take care of false positives, false negatives, near false
negatives (spam that doesn't hit BAYES_999), and phish.

That means once your Bayes database is well trained, occasional retraining
is still necessary, but in many cases learning one false negative is enough
to move multiple similar mails from BAYES_50 to BAYES_999.
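
One way to pick out spam that did not hit BAYES_999 is to look at the rule names recorded in the message headers. The sketch below assumes the stock X-Spam-Status header with its "tests=..." list is present. Local configurations may rename or drop that header, so treat this as an assumption. The helper names bayes_rules and needs_spam_training are invented for the example, and the messages passed in are assumed to have already been confirmed as spam by a human.

```python
import re
from email import message_from_string

# Matches rule names such as BAYES_50, BAYES_99, BAYES_999.
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")


def bayes_rules(raw_message: str) -> set:
    """Return the set of BAYES_* rule names listed in X-Spam-Status."""
    msg = message_from_string(raw_message)
    status = str(msg.get("X-Spam-Status", ""))
    return set(BAYES_RULE.findall(status))


def needs_spam_training(raw_message: str) -> bool:
    """True for a known-spam message that did not hit BAYES_999.

    All rule names are collected rather than only the first match,
    because BAYES_99 can be listed next to BAYES_999.
    """
    return "BAYES_999" not in bayes_rules(raw_message)
```

A message with no X-Spam-Status header at all yields an empty set and is therefore also selected for training.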

--
Matus UHLAR - fantomas, [email protected] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
- Holmes, what kind of school did you study to be a detective?
- Elementary, Watkins.  -- Daffy Duck & Porky Pig
