Re: Quick question about training...

Reindl Harald Sun, 22 Feb 2015 15:23:45 -0800


On 23.02.2015 at 00:11, RW wrote:

On Fri, 20 Feb 2015 21:36:38 +0100
Reindl Harald wrote:

And I'd suggest the same for non-spam, train duplicative ham even
if it happens to be similarly addressed to different users. More
data is (nearly) always better for Bayesian learning systems.


of course


With the caveat that you keep an eye on retention.

of course, or you disable autoexpire and autolearning if the Bayes database is maintained by hand
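
A minimal sketch of what that looks like in local.cf, assuming a stock SpamAssassin setup; the two option names are the ones documented in Mail::SpamAssassin::Conf, the comment is only illustrative:

```
# hand-maintained Bayes: nothing is learned or expired automatically
bayes_auto_learn 0
bayes_auto_expire 0
```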

I for one don't trust any automation in that case, because it easily ends up training false positives as well as false negatives, or shifts the ham/spam balance in one direction or the other


been there, done that; the results can be both:

* spam detection becomes unreliable over time
* most mail, especially newsletters, drifts toward being classified as spam

when in doubt, the amounts of trained ham and spam should be close to 50/50,


This is a myth. What's important is to have enough of each, the actual
ratio is not important.

true, but there isn't much you can use to measure the "enough of each", so trying to keep it at 50/50 is a good starting point; hence I said "in doubt"
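
One rough way to at least see the current ratio is to read the nspam and nham counters from `sa-learn --dump magic`. A minimal Python sketch, assuming the usual column layout of that output (the third column holds the value); the function names parse_magic and spam_share are made up for illustration:

```python
def parse_magic(text):
    """Extract the nspam and nham counters from `sa-learn --dump magic` output."""
    counts = {}
    for line in text.splitlines():
        # Assumed line shape: "0.000  0  <value>  0  non-token data: nspam"
        if "non-token data:" not in line:
            continue
        fields, _, label = line.partition("non-token data:")
        label = label.strip()
        if label in ("nspam", "nham"):
            counts[label] = int(fields.split()[2])
    return counts


def spam_share(counts):
    """Return the fraction of learned messages that are spam, or None if nothing is learned."""
    nspam = counts.get("nspam", 0)
    nham = counts.get("nham", 0)
    total = nspam + nham
    return nspam / total if total else None
```

Feed it the text of `sa-learn --dump magic`; a share close to 0 or close to 1 is the kind of lopsided database described further down. It says nothing about whether there is "enough of each", only how far the counts have drifted apart.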


in the end you have a problem, at the very least, in both of these cases:

* 1% ham samples, 99% spam samples
* 1% spam samples, 99% ham samples

in either case the Bayes database picks up a bias toward the overrepresented class

