30.12.2012 22:19, Ned Slider wrote:
> On 30/12/12 19:27, Jari Fredriksson wrote:
>> 30.12.2012 21:09, RW wrote:
>>> On Sun, 30 Dec 2012 19:13:01 +0200
>>> Jari Fredriksson wrote:
>>>
>>>> Finally they are getting some Bayes too, and external URIBL databases
>>>> are recognizing URIs in the payload. So I have now lowered the points
>>>> on my rule to 5.5. Also created a local anti-DNSWL_MED for mail
>>>> coming from redhat having this RCVD_IN_DNSWL_MED on.
>>> The list appears to be available at gmane.comp.java.jboss.user
>>>
>>> IIWY I'd look at how well Bayes is doing in the list, it may be that
>>> you can safely add a meta rule to boost the score for the higher
>>> scoring bayes rules in the list, and then add a few low-scoring
>>> subject rules for "Vs" and some of the other common words.
>>>
>> I have not received ham from that list today. Bayes was very slow to
>> adapt, and now that it finally gets usually between 60-90% it begins to
>> work. But I am very afraid the ham from that list will get the same
>> points!
>>
>> The email is pure HTML, looks like a "page" from their discussion site,
>> and it has very much in common between ham & spam. Remains to be seen
>> how well bayes copes with it.
>>
>
> Had you in the past trained bayes with a large amount of ham from that
> list? I would imagine that would explain why you would then need to
> train many spam from the same source before you see any change in the
> behaviour of bayes. Most of the tokens are going to look the same to
> bayes as it's all from the same source and the content differs only
> slightly.
>

Yes, that is the case. I train SA on all my mail, and I really try to
look after the corpus so that spam is not trained as ham.

--
Q: What do you call a half-dozen Indians with Asian flu?
A: Six sick Sikhs (sic).
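
For reference, RW's suggestion quoted above (a meta rule that boosts the higher-scoring Bayes rules for mail from the list, plus a few low-scoring subject rules) might look roughly like this in local.cf. It is only a sketch: the rule names, the List-Id pattern and the scores are assumptions for illustration and are not taken from anyone's actual configuration.

```
# Hypothetical sub-rule: mail that arrived via the JBoss user list.
# The List-Id pattern is an assumption; check the real header first.
header   __LOCAL_JBOSS_LIST    List-Id =~ /jboss/i

# Boost list mail that already hits one of the higher Bayes rules.
meta     LOCAL_JBOSS_BAYES_HI  __LOCAL_JBOSS_LIST && (BAYES_80 || BAYES_95 || BAYES_99)
describe LOCAL_JBOSS_BAYES_HI  JBoss list mail with a high Bayes probability
score    LOCAL_JBOSS_BAYES_HI  1.5

# Low-scoring subject rule for "Vs", limited to the same list.
header   __LOCAL_SUBJ_VS       Subject =~ /\bvs\b/i
meta     LOCAL_JBOSS_SUBJ_VS   __LOCAL_JBOSS_LIST && __LOCAL_SUBJ_VS
describe LOCAL_JBOSS_SUBJ_VS   JBoss list mail with "Vs" in the subject
score    LOCAL_JBOSS_SUBJ_VS   0.5
```

The scores are deliberately small so that ham from the same list, which shares most of its tokens with the spam, is not pushed over the threshold by these rules alone. Running spamassassin --lint after editing should catch syntax errors.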