Reindl Harald [mailto:h.rei...@thelounge.net] wrote:
> > However, that doesn't happen.
> > 0.000 0 338770 0 non-token data: nspam
> > 0.000 0 1460807 0 non-token data: nham
> what do you expect when you train 4 times more ham than spam?
> frankly you "flooded" your bayes with 1.4 million ham samples, and I thought
> our 140k total corpus was large - don't forget that ham messages are
> typically larger than junk, which usually just tries to point you at a URL
> with a few words
> 108897 SPAM
> 31492 HAM
This is a production mail gateway that has been in service since 2015. I noticed
that some messages (both ham and spam) are learned automatically by
amavisd/spamassassin, and I think I have no control over what is learned
automatically.
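For what it's worth, auto-learning can be disabled or tightened in SpamAssassin's configuration; a hedged sketch (the file path varies by distro, and the threshold values below are illustrative, not recommendations):

```
# e.g. /etc/mail/spamassassin/local.cf
# Disable Bayes auto-learning entirely:
bayes_auto_learn 0

# ...or keep it but make it more conservative (values illustrative):
# bayes_auto_learn 1
# bayes_auto_learn_threshold_nonspam -0.5
# bayes_auto_learn_threshold_spam    15.0
```

With amavisd in front, SpamAssassin still reads these options from its own config, so this should apply regardless of the glue layer.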
Let's just assume for a moment that the 1.4M ham samples are valid.
Is there a ham:spam ratio I should stick to? I presume that with a 1:1 ratio,
future messages wouldn't be considered spam either.
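On the ratio question: SpamAssassin's Bayes normalizes per-token counts by the corpus totals (the nspam/nham figures quoted above), so a skewed ham:spam ratio matters less than the raw numbers suggest. A minimal sketch of a Robinson-style token probability in that spirit (the smoothing constants `s` and `x` are assumptions, not SpamAssassin's exact internals):

```python
def token_spam_prob(spam_hits, ham_hits, nspam, nham, s=1.0, x=0.5):
    """Per-token spam probability, normalized by corpus sizes,
    with Robinson-style smoothing (s = strength, x = prior)."""
    # Dividing by nspam/nham compensates for an unbalanced corpus:
    # a token seen in 1% of spam and 1% of ham scores ~0.5 regardless
    # of the raw spam:ham training ratio.
    ps = spam_hits / nspam
    ph = ham_hits / nham
    p = ps / (ps + ph) if (ps + ph) > 0 else x
    n = spam_hits + ham_hits
    return (s * x + n * p) / (s + n)

# Corpus totals from the non-token data above.
nspam, nham = 338_770, 1_460_807

# A token seen in 1% of both corpora stays neutral (~0.5) despite
# the roughly 1:4.3 spam:ham imbalance.
neutral = token_spam_prob(int(nspam * 0.01), int(nham * 0.01), nspam, nham)

# A token seen only in spam still scores strongly spammy.
spammy = token_spam_prob(500, 0, nspam, nham)
print(neutral, spammy)
```

So a 1:1 ratio isn't required per se; what hurts more is *what* got auto-learned, since mislearned ham drags spammy tokens toward neutral no matter the ratio.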