On Wed, 14 Feb 2018 16:20:30 +0100
Matus UHLAR - fantomas wrote:
> >On Tue, 13 Feb 2018 21:02:46 +0000
> >Horváth Szabolcs wrote:
> >> One more question: is there a recommended ham to spam ratio? 1:1?
> On 14.02.18 15:09, RW wrote:
> >No, this is a myth. Bayes computes token probabilities from a
> >token's frequencies in spam and ham, so it all scales through. If
> >you have 2000 ham and 200 spam the problem is too few spams, not a
> >bad ratio.
> In my experience you need more ham than spam, because you want to
> avoid false positives (ham marked as spam) much more than false
> negatives.
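My point is that an imbalance doesn't in itself create a bias: a token's
frequency in spam and its frequency in ham are each taken relative to
the size of their own corpus, so the ratio between the corpora drops out.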
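
To make "it all scales through" concrete, here is a minimal sketch of a
Graham-style per-token estimate. It is only an illustration, not any
particular filter's actual code; the function name and all the counts
are made up, and real implementations add smoothing for rare tokens.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Rough per-token spam probability from per-corpus frequencies.

    spam_hits / ham_hits: number of spam / ham messages containing the token
    n_spam / n_ham:       total number of spam / ham messages learned
    (hypothetical helper, for illustration only)
    """
    spam_freq = spam_hits / n_spam   # fraction of spam containing the token
    ham_freq = ham_hits / n_ham      # fraction of ham containing the token
    return spam_freq / (spam_freq + ham_freq)


# Same token, seen in 20% of spam and 5% of ham in both cases.
# 1:1 corpus: 200 spam, 200 ham
balanced = token_spam_prob(spam_hits=40, ham_hits=10, n_spam=200, n_ham=200)
# 1:10 corpus: 200 spam, 2000 ham
skewed = token_spam_prob(spam_hits=40, ham_hits=100, n_spam=200, n_ham=2000)

print(balanced, skewed)
```

Both calls give the same probability, because only the per-corpus
frequencies enter the formula. What does hurt with 200 spams is that the
spam-side frequencies are estimated from few messages, which is a
too-few-spams problem rather than a ratio problem.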