30.12.2012 22:19, Ned Slider wrote:
> On 30/12/12 19:27, Jari Fredriksson wrote:
>> 30.12.2012 21:09, RW wrote:
>>> On Sun, 30 Dec 2012 19:13:01 +0200
>>> Jari Fredriksson wrote:
>>>
>>>
>>>> Finally they are getting some Bayes too, and external URIBL databases
>>>> are recognizing URIs in the payload. So I have now lowered the points
>>>> on my rule to 5.5. I also created a local anti-DNSWL_MED rule for mail
>>>> coming from redhat that hits RCVD_IN_DNSWL_MED.
>>> The list appears to be available at gmane.comp.java.jboss.user
>>>
>>> IIWY I'd look at how well Bayes is doing in the list, it may be that
>>> you can safely add a meta rule to boost the score for the higher
>>> scoring bayes rules in the list, and then add a few low-scoring
>>> subject rules for "Vs" and some of the other common words.
>>>
>> I have not received ham from that list today. Bayes was very slow to
>> adapt, and now that it finally scores this spam usually between 60% and
>> 90%, it is beginning to work. But I am very afraid that ham from that
>> list will get the same points!
>>
>> The email is pure HTML, looks like a "page" from their discussion site,
>> and the ham and spam have very much in common. It remains to be seen
>> how well Bayes copes with it.
>>
>>
>>
>
> Had you in the past trained bayes with a large amount of ham from that
> list? I would imagine that would explain why you would then need to
> train many spam from the same source before you see any change in the
> behaviour of bayes. Most of the tokens are going to look the same to
> bayes as it's all from the same source and the content differs only
> slightly.
>
>
>
>

Yes, that is the case. I train all my mail with SA, and I really try to
look after the corpus so that spam is not trained as ham.
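
For reference, the meta-rule idea RW suggested above, together with the
local anti-DNSWL_MED rule I mentioned, could look roughly like the sketch
below. This is only an illustration: the rule names, the List-Id and
Received patterns and all the scores are made up for the example and would
need checking against real headers from the list before use.

```
# Hypothetical sub-rule: mail that arrived via the JBoss list.
# The List-Id pattern is an assumption; check a real message first.
header   __LOCAL_JBOSS_LIST      List-Id =~ /jboss/i

# Boost the higher Bayes buckets, but only for mail from that list.
meta     LOCAL_JBOSS_BAYES_HIGH  (__LOCAL_JBOSS_LIST && (BAYES_80 || BAYES_95 || BAYES_99))
describe LOCAL_JBOSS_BAYES_HIGH  High Bayes result on JBoss list mail
score    LOCAL_JBOSS_BAYES_HIGH  1.5

# Low-scoring subject rule for "Vs", again limited to the list.
header   __LOCAL_SUBJ_VS         Subject =~ /\bvs\b/i
meta     LOCAL_JBOSS_SUBJ_VS     (__LOCAL_JBOSS_LIST && __LOCAL_SUBJ_VS)
describe LOCAL_JBOSS_SUBJ_VS     Subject contains "Vs" on JBoss list mail
score    LOCAL_JBOSS_SUBJ_VS     0.5

# Local "anti-DNSWL_MED": offset the whitelist bonus for mail relayed
# through redhat. The Received pattern and the score are placeholders;
# choose a score that offsets RCVD_IN_DNSWL_MED on your own install.
header   __LOCAL_RCVD_REDHAT     Received =~ /redhat\.com/i
meta     LOCAL_ANTI_DNSWL_MED    (__LOCAL_RCVD_REDHAT && RCVD_IN_DNSWL_MED)
describe LOCAL_ANTI_DNSWL_MED    Cancels DNSWL_MED for mail via redhat
score    LOCAL_ANTI_DNSWL_MED    2.0
```

Keeping every boost behind the list sub-rule means mail from other
sources is scored as before, and the small scores leave room to back off
if ham from the list starts hitting the same rules.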



-- 

Q:      What do you call a half-dozen Indians with Asian flu?
A:      Six sick Sikhs (sic).

