Re: after months of training still most messages treated as SPAM
On 26 Jan 2015 17:25:06, John Hardin jhar...@impsec.org wrote:
> I don't quite understand what you're saying; can you unpack that a bit?
I have forgotten now what the question was, and I believe you know what happens when skip_rbl_checks is 1.
Re: after months of training still most messages treated as SPAM
On 01/26/2015 04:56 PM, John Hardin wrote:
> OK, but: why does Bayes saying it looks as hammy as it looks spammy score so much when network tests are disabled?
Highly unscientific explanation: probably because history/experience/gut feeling/etc. decided that, in the absence of network tests, it could/should/will/maybe add that little extra to help detect spam.
Re: after months of training still most messages treated as SPAM
On Mon, 26 Jan 2015, Matus UHLAR - fantomas wrote:
>> On Jan 23, 2015, at 6:55 AM, Wolf Drechsel drech...@verkehrsplanung.com wrote:
>>> 2.0 BAYES_50 BODY: Spam probability according to Bayes test: 40-60%
> On 25.01.15 11:13, LuKreme wrote:
>> This is incorrect. Bayes_50 should be scored at about 0.5, or lower.
> score BAYES_50 0 0 2.0 0.8
> that would indicate network rules are not used there (too bad)...
OK, but: why does Bayes saying it looks as hammy as it looks spammy score so much when network tests are disabled?
--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
The real opiate of the masses isn't religion; it's the belief that somewhere there is a benefit that can be delivered without a corresponding cost. -- Tom of Radio Free NJ
---
Tomorrow: the 48th anniversary of the loss of Apollo 1
Re: after months of training still most messages treated as SPAM
On 26 Jan 2015 16:57:09, John Hardin jhar...@impsec.org wrote:
> OK, but: why does Bayes saying it looks as hammy as it looks spammy score so much when network tests are disabled?
DNSWL is disabled, or ham training is missing; skip_rbl_checks does not only disable blacklists.
Re: after months of training still most messages treated as SPAM
On Mon, 26 Jan 2015, Benny Pedersen wrote:
> On 26 Jan 2015 16:57:09, John Hardin jhar...@impsec.org wrote:
>> OK, but: why does Bayes saying it looks as hammy as it looks spammy score so much when network tests are disabled?
> DNSWL is disabled, or ham training is missing; skip_rbl_checks does not only disable blacklists.
I don't quite understand what you're saying; can you unpack that a bit?
Re: after months of training still most messages treated as SPAM
On 26.01.2015 at 17:17, Benny Pedersen wrote:
> On 26 Jan 2015 16:57:09, John Hardin jhar...@impsec.org wrote:
>> OK, but: why does Bayes saying it looks as hammy as it looks spammy score so much when network tests are disabled?
> DNSWL is disabled, or ham training is missing; skip_rbl_checks does not only disable blacklists.
It only disables DNSBL/DNSWL (neither SA nor Postscreen distinguishes between the two; the only difference is a positive or negative score). In fact, it *does not* even disable URIBL tests. Proof: a production submission server using SA, where RBL checks make no sense, but URIBL is running and hitting:
skip_rbl_checks 1
skip_uribl_checks 0
Re: after months of training still most messages treated as SPAM
On Mon, 26 Jan 2015, Benny Pedersen wrote:
> On 26 Jan 2015 17:25:06, John Hardin jhar...@impsec.org wrote:
>> I don't quite understand what you're saying; can you unpack that a bit?
> I have forgotten now what the question was, and I believe you know what happens when skip_rbl_checks is 1.
I know why that scoreset is being chosen. What I'm questioning is why so many points are being assigned to a neutral result.
Axb may have it right. I'm just surprised by a bias like that.
Re: after months of training still most messages treated as SPAM
Matus UHLAR - fantomas wrote on 2015-01-26 09:41:
> score BAYES_50 0 0 2.0 0.8
> that would indicate network rules are not used there (too bad)...
Why is it bad? Isn't it rather missing ham training? :-)
Re: after months of training still most messages treated as SPAM
On 26.01.2015 at 10:55, Benny Pedersen wrote:
> Matus UHLAR - fantomas wrote on 2015-01-26 09:41:
>> score BAYES_50 0 0 2.0 0.8
>> that would indicate network rules are not used there (too bad)...
> Why is it bad? Isn't it rather missing ham training? :-)
WTF, it's bad if network tests are disabled, in general. The whole topic has nothing to do with training and Bayes at all, because the problem is somewhere else, and no training can fix a broken setup.
Re: after months of training still most messages treated as SPAM
> On Jan 23, 2015, at 6:55 AM, Wolf Drechsel drech...@verkehrsplanung.com wrote:
>> 2.0 BAYES_50 BODY: Spam probability according to Bayes test: 40-60%
On 25.01.15 11:13, LuKreme wrote:
> This is incorrect. Bayes_50 should be scored at about 0.5, or lower.
score BAYES_50 0 0 2.0 0.8
that would indicate network rules are not used there (too bad)...
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
(R)etry, (A)bort, (C)ancer
Re: after months of training still most messages treated as SPAM
On Monday, 26 January 2015, 11:23:59, Reindl Harald wrote:
> On 26.01.2015 at 10:55, Benny Pedersen wrote:
>> Matus UHLAR - fantomas wrote on 2015-01-26 09:41:
>>> score BAYES_50 0 0 2.0 0.8
>>> that would indicate network rules are not used there (too bad)...
>> Why is it bad? Isn't it rather missing ham training? :-)
> WTF, it's bad if network tests are disabled, in general
Would someone be kind enough to explain to me how I can activate network tests?
Thanks, Wolf
Re: after months of training still most messages treated as SPAM
On 01/26/2015 12:05 PM, Wolf Drechsel wrote:
> On Monday, 26 January 2015, 11:23:59, Reindl Harald wrote:
>> WTF, it's bad if network tests are disabled, in general
> Would someone be kind enough to explain to me how I can activate network tests?
In local.cf you probably have:
skip_rbl_checks 1
Change it to:
skip_rbl_checks 0
HTH
Re: after months of training still most messages treated as SPAM
On 01/26/2015 12:11 PM, Axb wrote:
> On 01/26/2015 12:05 PM, Wolf Drechsel wrote:
>> Would someone be kind enough to explain to me how I can activate network tests?
> In local.cf you probably have:
> skip_rbl_checks 1
> Change it to:
> skip_rbl_checks 0
Or you're running spamd with -L, which, as per http://spamassassin.apache.org/full/3.4.x/doc/spamd.txt, means:
-L, --local    Use local tests only (no DNS)
And there's the FM: http://spamassassin.apache.org/full/3.4.x/doc/
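The thread names three knobs that keep DNS-based tests from running: skip_rbl_checks 1 in local.cf, spamd started with -L/--local, and, per Reindl Harald, the separate skip_uribl_checks setting that skip_rbl_checks does not cover. A minimal sketch of checking a config file and a spamd command line for them; `network_test_status` is a hypothetical helper, and it assumes both settings default to 0, that a later line overrides an earlier one, and that only a spelled-out -L or --local flag matters. It ignores every other way tests can be turned off.

```python
import shlex


def network_test_status(local_cf_text, spamd_cmdline=""):
    """Report which DNS-based test groups the given config/command line skip.

    Hypothetical helper: only looks at skip_rbl_checks, skip_uribl_checks
    and spamd's -L/--local flag, assuming both settings default to "0"
    and that the last occurrence of a setting wins.
    """
    settings = {"skip_rbl_checks": "0", "skip_uribl_checks": "0"}
    for raw in local_cf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # rough comment stripping
        parts = line.split()
        if len(parts) >= 2 and parts[0] in settings:
            settings[parts[0]] = parts[1]

    args = shlex.split(spamd_cmdline)
    # spamd -L / --local: "Use local tests only (no DNS)"
    local_only = "-L" in args or "--local" in args

    return {
        "dnsbl_dnswl_skipped": settings["skip_rbl_checks"] == "1" or local_only,
        "uribl_skipped": settings["skip_uribl_checks"] == "1" or local_only,
        "spamd_local_only": local_only,
    }
```

For the submission-server config Reindl Harald quotes (`skip_rbl_checks 1`, `skip_uribl_checks 0`, no -L), this reports DNSBL/DNSWL skipped but URIBL still active, which matches his observation.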
Re: after months of training still most messages treated as SPAM
On 25.01.2015 at 19:13, LuKreme wrote:
> On Jan 23, 2015, at 6:55 AM, Wolf Drechsel drech...@verkehrsplanung.com wrote:
>> 2.0 BAYES_50 BODY: Spam probability according to Bayes test: 40-60%
> This is incorrect. Bayes_50 should be scored at about 0.5, or lower.
That depends on the environment and the quality of the Bayes data, but yes, in the context of the subject it's too high. On the other hand, after months of training, if it is done right, there should not be many BAYES_50 hits, nor should the 2.0 points *alone* matter that much.
/etc/mail/spamassassin/local-*.cf:
score BAYES_00   -3.5
score BAYES_05   -1.5
score BAYES_20   -0.5
score BAYES_40   -0.2
score BAYES_50    2.5
score BAYES_60    3.0
score BAYES_80    5.0
score BAYES_95    6.5
score BAYES_99    7.5
score BAYES_999   0.4
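Each BAYES_* rule covers a band of the Bayes probability, so the [score: 0.4760] in the original report lands in BAYES_50, the "I can't tell" band. A small sketch mapping a probability to rule names and to Reindl Harald's local scores above. The band limits are as I understand the stock SpamAssassin 3.4 Bayes rules, and treating each band as low <= p < high (with 1.0 in the top bands) is an assumption; the function names are hypothetical.

```python
# Probability bands of the stock BAYES_* rules (assumed from SpamAssassin 3.4);
# boundary handling low <= p < high is an assumption.
BAYES_BANDS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
    ("BAYES_999", 0.999, 1.00),  # lies inside BAYES_99, so both fire together
]

# Reindl Harald's local overrides quoted above, not the stock scores.
HARALD_SCORES = {
    "BAYES_00": -3.5, "BAYES_05": -1.5, "BAYES_20": -0.5, "BAYES_40": -0.2,
    "BAYES_50": 2.5, "BAYES_60": 3.0, "BAYES_80": 5.0, "BAYES_95": 6.5,
    "BAYES_99": 7.5, "BAYES_999": 0.4,
}


def bayes_rules(prob):
    """Names of the BAYES_* rules whose band contains prob (0.0 to 1.0)."""
    return [name for name, low, high in BAYES_BANDS
            if low <= prob < high or prob == high == 1.0]


def bayes_points(prob, scores=HARALD_SCORES):
    """Total Bayes contribution to the message score under `scores`."""
    return round(sum(scores[name] for name in bayes_rules(prob)), 2)
```

Because BAYES_999 always fires alongside BAYES_99, the small 0.4 score is additive: a near-certain spam collects 7.5 + 0.4 = 7.9 Bayes points under these overrides.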
Re: after months of training still most messages treated as SPAM
On Jan 23, 2015, at 6:55 AM, Wolf Drechsel drech...@verkehrsplanung.com wrote:
> 2.0 BAYES_50 BODY: Spam probability according to Bayes test: 40-60%
This is incorrect. Bayes_50 should be scored at about 0.5, or lower.
--
Your stepmom is cute
Shut up, Ted
Remember when she was a senior and we were freshmen?
Shut up Ted!
Re: after months of training still most messages treated as SPAM
On 25.01.2015 at 19:30, Reindl Harald wrote:
> On 25.01.2015 at 19:13, LuKreme wrote:
>> This is incorrect. Bayes_50 should be scored at about 0.5, or lower.
> That depends on the environment and the quality of the Bayes data, but yes, in the context of the subject it's too high. On the other hand, after months of training, if it is done right, there should not be many BAYES_50 hits, nor should the 2.0 points *alone* matter that much.
> [...]
To back that with data: the 6581 BAYES_50 hits are just 14% of all messages that made it to the content scanner. BAYES_50 alone is not enough to get most messages treated as SPAM; a large share of the BAYES_50 messages are indeed junk and are correctly rejected in combination with other tags. So the OP's *real problem* is which *other tags* besides Bayes hit the affected messages, not a wrong BAYES_50 with only 2.0 points.
grep -c BAYES_00 maillog
33788
grep -c BAYES_05 maillog
655
grep -c BAYES_20 maillog
868
grep -c BAYES_40 maillog
983
grep -c BAYES_50 maillog
6581
grep -c BAYES_60 maillog
702
grep -c BAYES_80 maillog
532
grep -c BAYES_95 maillog
449
grep -c BAYES_99 maillog
2448
grep -c BAYES_999 maillog
2140
grep -c BAYES_ maillog
47006
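A quick check on these numbers: the nine counts from BAYES_00 through BAYES_99 add up to 33788 + 655 + 868 + 983 + 6581 + 702 + 532 + 449 + 2448 = 47006, exactly the `grep -c BAYES_` total, and 6581 / 47006 is about 14%. The 2140 BAYES_999 lines are already inside the 2448, because `grep -c BAYES_99` also matches the substring in BAYES_999. A minimal sketch of the same tally with whole-tag matching; it assumes only that each maillog line carries its rule tags as plain words, and the function names are hypothetical.

```python
import re
from collections import Counter

# A whole Bayes tag such as BAYES_50 or BAYES_999. The trailing \b stops
# BAYES_99 from matching inside BAYES_999, which a plain substring grep does.
BAYES_TAG = re.compile(r"\bBAYES_\d+\b")


def count_bayes_tags(lines):
    """Count matching lines per tag (like grep -c) plus lines with any tag."""
    per_tag = Counter()
    with_any = 0
    for line in lines:
        tags = set(BAYES_TAG.findall(line))
        if tags:
            with_any += 1
            per_tag.update(tags)
    return per_tag, with_any


def share_of(per_tag, with_any, tag):
    """Fraction of Bayes-scored lines that carry `tag`."""
    return per_tag[tag] / with_any if with_any else 0.0
```

Usage: `with open("maillog") as f: per_tag, total = count_bayes_tags(f)`, then `share_of(per_tag, total, "BAYES_50")`.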
Re: after months of training still most messages treated as SPAM
Hello,
thanks a lot for all of these answers! I have to confess that I found a very stupid misconfiguration in kdepim's rule set; changing it resolved most of the issue. Sorry for causing all that effort, but I finally found a solution to my problem...
On Friday, 23 January 2015, 10:39:32, Kris Deugau wrote:
> Looks like the OP doesn't have network tests enabled; those scores match the current stock ones for set 2 (Bayes enabled, DNS tests disabled). Enabling DNS tests would bring that back to the 0.8 default (and RDNS_NONE to 0.8, and FORGED_YAHOO_RCVD to 1.6).
Activating network tests looks like a good idea anyway. How can I do that?
>> Have you tried using -D bayes to see what tokens are being learned incorrectly? Your score for BAYES_50 seems high for a message that gets a neutral result from Bayes.
Sorry, that hint was beyond my knowledge. How and where would I try -D bayes?
Thanks a lot for everything!
Wolf
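Wolf's question of how and where to try -D bayes is not answered in the thread. A minimal sketch, assuming the `spamassassin` command-line script is on the PATH and is run as the same user whose Bayes database KMail trains: pipe one raw message into `spamassassin -t -D bayes` (test mode plus Bayes debug output, which goes to stderr). Filtering on a `dbg: bayes:` prefix is an assumption about the debug line format, and both function names are hypothetical.

```python
import subprocess

# Assumed invocation: -t (test mode) appends the report to the message on
# stdout; -D bayes enables debug output for the Bayes area on stderr.
SA_BAYES_DEBUG = ["spamassassin", "-t", "-D", "bayes"]


def run_bayes_debug(raw_message, cmd=SA_BAYES_DEBUG):
    """Feed one raw message (bytes) to spamassassin and return its stderr."""
    result = subprocess.run(cmd, input=raw_message, capture_output=True, check=False)
    return result.stderr.decode("utf-8", errors="replace")


def bayes_debug_lines(stderr_text):
    """Keep lines that look like Bayes debug output (prefix format assumed)."""
    return [line for line in stderr_text.splitlines() if "dbg: bayes:" in line]
```

From a shell, the rough equivalent is `spamassassin -t -D bayes < message.eml 2> bayes-debug.txt`, where message.eml is a hypothetical saved copy of one misclassified mail.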
Re: after months of training still most messages treated as SPAM
On Fri, 23 Jan 2015, John Hardin wrote:
> On Fri, 23 Jan 2015, Wolf Drechsel wrote:
>> Hi everybody,
>> I googled and read a lot, but couldn't find any trick... After months of training, still around 90% of all messages are treated as SPAM, although I'm marking all of them as HAM.
>> 2.0 BAYES_50 BODY: Spam probability according to Bayes test: 40-60% [score: 0.4760]
> BAYES_50 means insufficient data to classify.
Or, rather, "I can't tell." Insufficient data would be no Bayes score at all.
> Why does that earn 2.0 points in scoreset 3??
Re: after months of training still most messages treated as SPAM
On 23.01.2015 at 18:59, John Hardin wrote:
> On Fri, 23 Jan 2015, Wolf Drechsel wrote:
>> I googled and read a lot, but couldn't find any trick... After months of training, still around 90% of all messages are treated as SPAM, although I'm marking all of them as HAM.
>> 2.0 BAYES_50 BODY: Spam probability according to Bayes test: 40-60% [score: 0.4760]
> BAYES_50 means insufficient data to classify.
> Why does that earn 2.0 points in scoreset 3??
It also means "no way to classify it at all", and a good Bayes database doesn't do that for many ham messages.
We score it at 2.5 points, with a Bayes database currently holding 8000 ham and 8000 spam samples, because only a few messages hit BAYES_50; most are BAYES_00 with a score of -3.5 or BAYES_99 with a score of 7.5. At a milter reject level of 8.0 I don't recall a single FP caused by that, and without it most spam would not get rejected.
95% of our messages with BAYES_50 reach the 8.0 points needed to be rejected by the milter, and, looking at the tags, for a good reason.
Re: after months of training still most messages treated as SPAM
On Fri, 23 Jan 2015, Wolf Drechsel wrote:
> Hi everybody,
> I googled and read a lot, but couldn't find any trick... After months of training, still around 90% of all messages are treated as SPAM, although I'm marking all of them as HAM.
> 2.0 BAYES_50 BODY: Spam probability according to Bayes test: 40-60% [score: 0.4760]
BAYES_50 means insufficient data to classify.
Why does that earn 2.0 points in scoreset 3??
after months of training still most messages treated as SPAM
Hi everybody,
I googled and read a lot, but couldn't find any trick... After months of training, still around 90% of all messages are treated as SPAM, although I'm marking all of them as HAM.
My environment:
Ubuntu 14.04
KMail 4.14.2 in the Kontact (kdepim) suite
SpamAssassin version 3.4.0 running on Perl version 5.18.2
I tried this installation/config procedure: http://www.spamtips.org/p/install-procedure.html but nothing changed.
Here is one example:
 2.6 FORGED_YAHOO_RCVD           Forged Received header from yahoo.com found
 0.2 FREEMAIL_ENVFROM_END_DIGIT  Envelope-from freemail username ends in digit (sender_address[at]yahoo.com)
 0.2 FREEMAIL_REPLYTO_END_DIGIT  Reply-To freemail username ends in digit (sender_address)
 0.0 FREEMAIL_FROM               Sender email is commonly abused enduser mail provider (sender_address)
 2.0 BAYES_50                    BODY: Spam probability according to Bayes test: 40-60% [score: 0.4760]
 0.0 HTML_MESSAGE                BODY: Message contains HTML
 0.0 T_DKIM_INVALID              DKIM-Signature header exists but is not valid
 1.2 RDNS_NONE                   Delivered to internal network by a host with no rDNS
 0.0 T_REMOTE_IMAGE              Message contains an external image
But not all of the messages have that detailed report; some are just put into the SPAM folder.
Any hints will be appreciated!
Wolf
Re: after months of training still most messages treated as SPAM
To start, there are several very real things wrong with your example message. In my opinion, that message was correctly classified.
Do you have any more representative samples that you can paste in full? (http://pastebin.com/)
Have you tried using -D bayes to see what tokens are being learned incorrectly? Your score for BAYES_50 seems high for a message that gets a neutral result from Bayes.
On 1/23/2015 8:55 AM, Wolf Drechsel wrote:
> Hi everybody,
> I googled and read a lot, but couldn't find any trick... After months of training, still around 90% of all messages are treated as SPAM, although I'm marking all of them as HAM.
> [...]
Re: after months of training still most messages treated as SPAM
Joe Quinn wrote:
> To start, there are several very real things wrong with your example message. In my opinion, that message was correctly classified.
Maybe, maybe not; without the actual message there's no more information. I've seen all too much legitimate mail hit some very strange combinations of rules...
If the OP's mail server doesn't add rDNS for the connecting IP, or doesn't add it in a way that SA recognizes, that would trigger RDNS_NONE and cause FORGED_YAHOO_RCVD.
> Do you have any more representative samples that you can paste in full? (http://pastebin.com/)
> Have you tried using -D bayes to see what tokens are being learned incorrectly? Your score for BAYES_50 seems high for a message that gets a neutral result from Bayes.
Looks like the OP doesn't have network tests enabled; those scores match the current stock ones for set 2 (Bayes enabled, DNS tests disabled). Enabling DNS tests would bring that back to the 0.8 default (and RDNS_NONE to 0.8, and FORGED_YAHOO_RCVD to 1.6).
-kgd
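To put numbers on Kris's point: the example report in the original post totals 6.2 points, over SpamAssassin's default required_score of 5.0. Substituting only the three network-enabled scores he quotes (BAYES_50 0.8, RDNS_NONE 0.8, FORGED_YAHOO_RCVD 1.6) gives 3.6. This is a rough sketch, assuming every other rule keeps its current score and ignoring any extra network hits (DNSWL, URIBL, ...) that would also change the total; `total_score` is a hypothetical helper, and KMail's actual threshold may differ from the default.

```python
# Rule scores from the example report in the original post.
EXAMPLE_HITS = {
    "FORGED_YAHOO_RCVD": 2.6,
    "FREEMAIL_ENVFROM_END_DIGIT": 0.2,
    "FREEMAIL_REPLYTO_END_DIGIT": 0.2,
    "FREEMAIL_FROM": 0.0,
    "BAYES_50": 2.0,
    "HTML_MESSAGE": 0.0,
    "T_DKIM_INVALID": 0.0,
    "RDNS_NONE": 1.2,
    "T_REMOTE_IMAGE": 0.0,
}

# Scores Kris Deugau quotes for the network-enabled scoreset. All other rules
# are assumed unchanged and extra network hits are ignored (rough comparison).
NETWORK_SCORES = {"BAYES_50": 0.8, "RDNS_NONE": 0.8, "FORGED_YAHOO_RCVD": 1.6}

REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def total_score(hits, overrides=None):
    """Sum the hit scores, replacing any rule that appears in `overrides`."""
    overrides = overrides or {}
    return round(sum(overrides.get(rule, score) for rule, score in hits.items()), 1)
```

`total_score(EXAMPLE_HITS)` gives 6.2 and `total_score(EXAMPLE_HITS, NETWORK_SCORES)` gives 3.6, so under these assumptions the same message would fall below the default threshold once network tests are enabled.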