Hi,
On Sun, Sep 25, 2016 at 6:18 PM, John Hardin <[email protected]> wrote:
> On Sun, 25 Sep 2016, Alex wrote:
>
>> On Sun, Sep 25, 2016 at 4:54 PM, Sean Greenslade
>> <[email protected]> wrote:
>>>
>>> On Sun, Sep 25, 2016 at 04:46:28PM -0400, Alex wrote:
>>>>
>>>>
>>>> I have another rule with a questionable score that's hitting too much
>>>> ham.
>>>>
>>>> From: "Customer Support" <[email protected]>
>>>> dbg: rules: ran header rule __FROM_WORDY ======> got hit:
>>>> "Customer.Support@"
>
> Is it causing those hams to be incorrectly classified as spam?
Yes.
X-Spam-Status: Yes, score=6.008 tag=-200 tag2=5 kill=5 tests=[BAYES_50=0.8,
DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
FROM_WORDY=2.699, HTML_FONT_LOW_CONTRAST=0.001, HTML_MESSAGE=0.001,
LOTS_OF_MONEY=0.001, MIME_HTML_ONLY=0.723, NUMERIC_HTTP_ADDR=1.242,
RCVD_IN_DNSWL_NONE=-0.0001, RELAYCOUNTRY_US=0.01,
RP_MATCHES_RCVD=-0.5, SPF_PASS=-0.001, T_DMARC_TESTS_PASS=0.01,
URI_HEX=1.122] autolearn=disabled
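For reference, the reported score is just the sum of the rule scores in the tests=[...] list. Below is a minimal Python sketch that re-adds them and shows what the total would be without FROM_WORDY, compared against the tag2=5 level in the same header. The helper names are hypothetical and not part of SpamAssassin. The parser assumes the header layout exactly as pasted here.

```python
import re

# Copied verbatim from the X-Spam-Status header above.
STATUS = (
    "X-Spam-Status: Yes, score=6.008 tag=-200 tag2=5 kill=5 tests=[BAYES_50=0.8, "
    "DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, "
    "FROM_WORDY=2.699, HTML_FONT_LOW_CONTRAST=0.001, HTML_MESSAGE=0.001, "
    "LOTS_OF_MONEY=0.001, MIME_HTML_ONLY=0.723, NUMERIC_HTTP_ADDR=1.242, "
    "RCVD_IN_DNSWL_NONE=-0.0001, RELAYCOUNTRY_US=0.01, "
    "RP_MATCHES_RCVD=-0.5, SPF_PASS=-0.001, T_DMARC_TESTS_PASS=0.01, "
    "URI_HEX=1.122] autolearn=disabled"
)


def rule_scores(status: str) -> dict[str, float]:
    """Hypothetical helper: map each RULE=score inside tests=[...] to a float."""
    tests = re.search(r"tests=\[(.*?)\]", status).group(1)
    scores = {}
    for item in tests.split(","):
        name, value = item.strip().split("=")
        scores[name] = float(value)
    return scores


def tag2_level(status: str) -> float:
    """Hypothetical helper: read the tag2 level from the header."""
    return float(re.search(r"tag2=(-?[\d.]+)", status).group(1))


scores = rule_scores(STATUS)
total = sum(scores.values())
without = total - scores["FROM_WORDY"]
print(f"total={total:.3f} without FROM_WORDY={without:.3f} tag2={tag2_level(STATUS)}")
```

With these numbers FROM_WORDY alone is enough to push the message from roughly 3.3 to just over the level of 5.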
> BAYES_50? Are you training ham? :)
Yes :-) Does this message hit BAYES_00 for you?
# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 35152 0 non-token data: nspam
0.000 0 21542 0 non-token data: nham
0.000 0 4600265 0 non-token data: ntokens
0.000 0 1324316802 0 non-token data: oldest atime
0.000 0 1474845999 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 1474783813 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
I recently deleted the old 11M-token database, disabled autolearn, and
have been retraining it manually for quite a while now.
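To make the dump easier to read, here is a small Python sketch that parses `sa-learn --dump magic` output as it appears in this message (the third column is the value, and the label follows "non-token data:") and converts the atime fields from Unix epoch seconds to UTC. The parser and its helper names are hypothetical; it assumes exactly this layout and is not an official tool.

```python
from datetime import datetime, timezone

# Output of `sa-learn --dump magic` as pasted above.
DUMP = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 35152 0 non-token data: nspam
0.000 0 21542 0 non-token data: nham
0.000 0 4600265 0 non-token data: ntokens
0.000 0 1324316802 0 non-token data: oldest atime
0.000 0 1474845999 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 1474783813 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count
"""


def parse_magic(text: str) -> dict[str, int]:
    """Hypothetical helper: map each 'non-token data' label to its third-column value."""
    result = {}
    for line in text.splitlines():
        fields, _, label = line.partition("non-token data: ")
        if not label:
            continue  # skip lines that are not magic entries
        result[label.strip()] = int(fields.split()[2])
    return result


def as_utc(epoch: int) -> str:
    """Render an atime value (Unix epoch seconds) as a UTC timestamp."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


magic = parse_magic(DUMP)
print("nspam:", magic["nspam"], "nham:", magic["nham"])
print("oldest atime:", as_utc(magic["oldest atime"]))
print("newest atime:", as_utc(magic["newest atime"]))
```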