Re: after months of training still most messages treated as SPAM

2015-01-26 Thread Benny Pedersen

On 26. jan. 2015 17.25.06 John Hardin jhar...@impsec.org wrote:


I don't quite understand what you're saying, can you unpack that a bit?


I have forgotten now what the question was, and I believe you know what happens 
if skip_rbl_checks is set to 1


Re: after months of training still most messages treated as SPAM

2015-01-26 Thread Axb

On 01/26/2015 04:56 PM, John Hardin wrote:

OK, but: why does Bayes saying it looks as hammy as it looks spammy
score so much when network tests are disabled?


Highly un-scientific explanation:

Probably because history/experience/gut feeling/etc decided, in absence 
of network tests, that it could/should/will/maybe/etc add the extra 
little to help detect spam.





Re: after months of training still most messages treated as SPAM

2015-01-26 Thread John Hardin

On Mon, 26 Jan 2015, Matus UHLAR - fantomas wrote:

On Jan 23, 2015, at 6:55 AM, Wolf Drechsel drech...@verkehrsplanung.com wrote:
  2.0 BAYES_50   BODY: Spam probability according to Bayes test: 40-60%


On 25.01.15 11:13, LuKreme wrote:

This is incorrect.

Bayes_50 should be scored at about 0.5, or lower.


score BAYES_50  0  0  2.0  0.8

that would indicate network rules are not used there (too bad)...


OK, but: why does Bayes saying it looks as hammy as it looks spammy 
score so much when network tests are disabled?


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The real opiate of the masses isn't religion; it's the belief that
  somewhere there is a benefit that can be delivered without a
  corresponding cost.   -- Tom of Radio Free NJ
---
 Tomorrow: the 48th anniversary of the loss of Apollo 1


Re: after months of training still most messages treated as SPAM

2015-01-26 Thread Benny Pedersen

On 26. jan. 2015 16.57.09 John Hardin jhar...@impsec.org wrote:


OK, but: why does Bayes saying it looks as hammy as it looks spammy
score so much when network tests are disabled?


dnswl is disabled, or missing training of ham, skip rbl check does not only 
disable blacklists


Re: after months of training still most messages treated as SPAM

2015-01-26 Thread John Hardin

On Mon, 26 Jan 2015, Benny Pedersen wrote:


On 26. jan. 2015 16.57.09 John Hardin jhar...@impsec.org wrote:


 OK, but: why does Bayes saying it looks as hammy as it looks spammy
 score so much when network tests are disabled?


dnswl is disabled, or missing training of ham, skip rbl check does not only 
disable blacklists


I don't quite understand what you're saying, can you unpack that a bit?

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 Tomorrow: Wolfgang Amadeus Mozart's 259th Birthday


Re: after months of training still most messages treated as SPAM

2015-01-26 Thread Reindl Harald


Am 26.01.2015 um 17:17 schrieb Benny Pedersen:

On 26. jan. 2015 16.57.09 John Hardin jhar...@impsec.org wrote:


OK, but: why does Bayes saying it looks as hammy as it looks spammy
score so much when network tests are disabled?


dnswl is disabled, or missing training of ham, skip rbl check does not
only disable blacklists


it only disables DNSBL/DNSWL lookups (neither SA nor Postscreen makes 
any distinction between the two; the only difference is a positive or 
negative score)


in fact it *does not* even disable URIBL tests, as proven by a production 
submission server using SA where RBL checks make no sense but 
URIBL is running and hitting


skip_rbl_checks 1
skip_uribl_checks 0





Re: after months of training still most messages treated as SPAM

2015-01-26 Thread John Hardin

On Mon, 26 Jan 2015, Benny Pedersen wrote:


On 26. jan. 2015 17.25.06 John Hardin jhar...@impsec.org wrote:


 I don't quite understand what you're saying, can you unpack that a bit?


I have forgotten now what the question was, and I believe you know what happens 
if skip_rbl_checks is set to 1


I know why that scoreset is being chosen.

What I'm questioning is why so many points are being assigned to a neutral 
result.


Axb may have it right. I'm just surprised by a bias like that.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 Tomorrow: Wolfgang Amadeus Mozart's 259th Birthday


Re: after months of training still most messages treated as SPAM

2015-01-26 Thread Benny Pedersen

Matus UHLAR - fantomas skrev den 2015-01-26 09:41:


score BAYES_50  0  0  2.0  0.8
that would indicate network rules are not used there (too bad)...


why is it bad of missing train of ham ? :-)


Re: after months of training still most messages treated as SPAM

2015-01-26 Thread Reindl Harald


Am 26.01.2015 um 10:55 schrieb Benny Pedersen:

Matus UHLAR - fantomas skrev den 2015-01-26 09:41:


score BAYES_50  0  0  2.0  0.8
that would indicate network rules are not used there (too bad)...


why is it bad of missing train of ham ? :-)


WTF - it's bad if network tests are disabled - in general

the whole topic has nothing to do with training and Bayes at all, because 
the problem is somewhere else and no training can fix a broken setup






Re: after months of training still most messages treated as SPAM

2015-01-26 Thread Matus UHLAR - fantomas

On Jan 23, 2015, at 6:55 AM, Wolf Drechsel drech...@verkehrsplanung.com wrote:

2.0 BAYES_50   BODY: Spam probability according to Bayes test: 40-60%


On 25.01.15 11:13, LuKreme wrote:

This is incorrect.

Bayes_50 should be scored at about 0.5, or lower.


score BAYES_50  0  0  2.0  0.8

that would indicate network rules are not used there (too bad)...
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
(R)etry, (A)bort, (C)ancer
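
For readers unfamiliar with four-value score lines: SpamAssassin picks one of the four values depending on whether Bayes and network tests are in use (set 0: neither, set 1: network only, set 2: Bayes only, set 3: both), and a line with a single value applies to every set. The selection can be sketched roughly as follows. This is only an illustration in Python; the function name `effective_score` is made up here and is not part of SpamAssassin.

```python
def effective_score(score_line: str, bayes: bool, net: bool) -> float:
    """Return the value SpamAssassin would use from a 'score' config line."""
    parts = score_line.split()
    if len(parts) < 3 or parts[0] != "score":
        raise ValueError(f"not a score line: {score_line!r}")
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single value applies to all four score sets.
        return values[0]
    if len(values) != 4:
        raise ValueError("expected one or four score values")
    # Score set index: bit 1 = Bayes enabled, bit 0 = network tests enabled.
    index = (2 if bayes else 0) + (1 if net else 0)
    return values[index]


line = "score BAYES_50  0  0  2.0  0.8"
print(effective_score(line, bayes=True, net=False))  # 2.0 (set 2, as in the OP's report)
print(effective_score(line, bayes=True, net=True))   # 0.8 (set 3)
```

So a 2.0 for BAYES_50 in a report points to score set 2, i.e. Bayes on and network tests off.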


Re: after months of training still most messages treated as SPAM

2015-01-26 Thread Wolf Drechsel
Am Montag, 26. Januar 2015, 11:23:59 schrieb Reindl Harald:
 Am 26.01.2015 um 10:55 schrieb Benny Pedersen:
  Matus UHLAR - fantomas skrev den 2015-01-26 09:41:
  score BAYES_50  0  0  2.0  0.8
  that would indicate network rules are not used there (too bad)...
  
  why is it bad of missing train of ham ? :-)
 
 WTF - it's bad if network tests are disabled - in general

Will someone be friendly enough and explain to me how I can 
activate network tests?

Tks,
Wolf



Re: after months of training still most messages treated as SPAM

2015-01-26 Thread Axb

On 01/26/2015 12:05 PM, Wolf Drechsel wrote:

Am Montag, 26. Januar 2015, 11:23:59 schrieb Reindl Harald:

Am 26.01.2015 um 10:55 schrieb Benny Pedersen:

Matus UHLAR - fantomas skrev den 2015-01-26 09:41:

score BAYES_50  0  0  2.0  0.8
that would indicate network rules are not used there (too bad)...


why is it bad of missing train of ham ? :-)


WTF - it's bad if network tests are disabled - in general


Will someone be friendly enough and explain to me how I can
activate network tests?



in local.cf you probably have:

skip_rbl_checks 1

change to

skip_rbl_checks 0


h2h



Re: after months of training still most messages treated as SPAM

2015-01-26 Thread Axb

On 01/26/2015 12:11 PM, Axb wrote:

On 01/26/2015 12:05 PM, Wolf Drechsel wrote:

Am Montag, 26. Januar 2015, 11:23:59 schrieb Reindl Harald:

Am 26.01.2015 um 10:55 schrieb Benny Pedersen:

Matus UHLAR - fantomas skrev den 2015-01-26 09:41:

score BAYES_50  0  0  2.0  0.8
that would indicate network rules are not used there (too bad)...


why is it bad of missing train of ham ? :-)


WTF - it's bad if network tests are disabled - in general


Will someone be friendly enough and explain to me how I can
activate network tests?



in local.cf you probably have:

skip_rbl_checks 1

change to

skip_rbl_checks 0


or you're running spamd with

 -L

which, as per http://spamassassin.apache.org/full/3.4.x/doc/spamd.txt, means:

 -L, --local   Use local tests only (no DNS)

and there's the FM
http://spamassassin.apache.org/full/3.4.x/doc/


Re: after months of training still most messages treated as SPAM

2015-01-25 Thread Reindl Harald


Am 25.01.2015 um 19:13 schrieb LuKreme:

On Jan 23, 2015, at 6:55 AM, Wolf Drechsel drech...@verkehrsplanung.com wrote:

2.0 BAYES_50   BODY: Spam probability according to Bayes test: 40-60%


This is incorrect.

Bayes_50 should be scored at about 0.5, or lower


depends on the environment and quality of the Bayes data

but yes, in the context of the subject it's too high; on the other hand, 
after months of training, if it is done right, there should not be too 
many BAYES_50 hits, nor should the 2.0 points *alone* matter that much


/etc/mail/spamassassin/local-*.cf
 score BAYES_00 -3.5
 score BAYES_05 -1.5
 score BAYES_20 -0.5
 score BAYES_40 -0.2
 score BAYES_50 2.5
 score BAYES_60 3.0
 score BAYES_80 5.0
 score BAYES_95 6.5
 score BAYES_99 7.5
 score BAYES_999 0.4





Re: after months of training still most messages treated as SPAM

2015-01-25 Thread LuKreme
On Jan 23, 2015, at 6:55 AM, Wolf Drechsel drech...@verkehrsplanung.com wrote:
 2.0 BAYES_50   BODY: Spam probability according to Bayes test: 40-60%

This is incorrect.

Bayes_50 should be scored at about 0.5, or lower.

-- 
Your stepmom is cute
Shut up, Ted
Remember when she was a senior and we were freshmen?
Shut up Ted!



Re: after months of training still most messages treated as SPAM

2015-01-25 Thread Reindl Harald


Am 25.01.2015 um 19:30 schrieb Reindl Harald:

Am 25.01.2015 um 19:13 schrieb LuKreme:

On Jan 23, 2015, at 6:55 AM, Wolf Drechsel
drech...@verkehrsplanung.com wrote:

2.0 BAYES_50   BODY: Spam probability according to Bayes test: 40-60%


This is incorrect.

Bayes_50 should be scored at about 0.5, or lower


depends on the environment and quality of the Bayes data

but yes, in the context of the subject it's too high; on the other hand,
after months of training, if it is done right, there should not be too
many BAYES_50 hits, nor should the 2.0 points *alone* matter that much

/etc/mail/spamassassin/local-*.cf
  score BAYES_00 -3.5
  score BAYES_05 -1.5
  score BAYES_20 -0.5
  score BAYES_40 -0.2
  score BAYES_50 2.5
  score BAYES_60 3.0
  score BAYES_80 5.0
  score BAYES_95 6.5
  score BAYES_99 7.5
  score BAYES_999 0.4


to back that with data: the 6581 BAYES_50 hits are just 14% of all messages 
that made it to the content scanner. BAYES_50 alone is not enough for most 
messages to be treated as SPAM; a large share of the BAYES_50 messages are 
indeed junk and are correctly rejected in combination with other tags


so the OP's *real problem* is which *other tags* besides Bayes hit the 
affected messages, not a wrong BAYES_50 with only 2.0 points



grep -c BAYES_00 maillog
33788

grep -c BAYES_05 maillog
655

grep -c BAYES_20 maillog
868

grep -c BAYES_40 maillog
983

grep -c BAYES_50 maillog
6581

grep -c BAYES_60 maillog
702

grep -c BAYES_80 maillog
532

grep -c BAYES_95 maillog
449

grep -c BAYES_99 maillog
2448

grep -c BAYES_999 maillog
2140


grep -c BAYES_ maillog
47006
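
A note on reading these numbers: `grep -c BAYES_99` matches by substring, so it also counts lines that contain BAYES_999. That fits the figures above, where the nine counts without BAYES_999 already add up to the 47006 total. Below is a rough Python sketch that counts each bucket by its exact tag name instead. The log lines in `sample` are invented for illustration and do not claim to match any real maillog format.

```python
import re
from collections import Counter

# \b after the digits keeps BAYES_99 from matching inside BAYES_999.
BAYES_TAG = re.compile(r"\bBAYES_(\d+)\b")


def bayes_counts(lines):
    """Count log lines per Bayes bucket (one count per line, like grep -c)."""
    counts = Counter()
    for line in lines:
        for bucket in set(BAYES_TAG.findall(line)):
            counts[bucket] += 1
    return counts


def share(lines, bucket):
    """Fraction of Bayes-tagged lines that carry the given bucket."""
    tagged = [l for l in lines if BAYES_TAG.search(l)]
    hits = [l for l in tagged if bucket in BAYES_TAG.findall(l)]
    return len(hits) / len(tagged) if tagged else 0.0


# Hypothetical log lines, for illustration only.
sample = [
    "result: . -3 - BAYES_00,SPF_PASS",
    "result: Y 12 - BAYES_99,BAYES_999,RDNS_NONE",
    "result: . 2 - BAYES_50,HTML_MESSAGE",
]
counts = bayes_counts(sample)
print(counts["99"], counts["999"], counts["50"])

# The 14% quoted above, recomputed from the posted counts.
print(round(6581 / 47006 * 100, 1))  # 14.0
```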





Re: after months of training still most messages treated as SPAM

2015-01-24 Thread Wolf Drechsel
Hello,

thanks a lot for all of these answers!
I have to confess that I found a very stupid misconfiguration within kdepim's 
rule set. Changing that resolved most of the issue.

Sorry for causing all that effort, but I finally found a solution for my problem...

Am Freitag, 23. Januar 2015, 10:39:32 schrieb Kris Deugau:
 Looks like the OP doesn't have network tests enabled;  those scores
 match the current stock ones for set 2 (Bayes enabled, DNS tests
 disabled).  Enabling DNS tests would bring that back to 0.8 default (and
 RDNS_NONE to 0.8, and FORGED_YAHOO_RCVD to 1.6).

Looks like a good idea to activate network tests anyway. How can I do that?

 Have you tried using -D bayes to see what tokens are being learned
 incorrectly? Your score for BAYES_50 seems high for a message that gets a
 neutral result from Bayes.

Sorry - that hint was beyond my knowledge. How and where would I try -D bayes?

Thanks a lot for everything!
Wolf


Re: after months of training still most messages treated as SPAM

2015-01-23 Thread John Hardin

On Fri, 23 Jan 2015, John Hardin wrote:


On Fri, 23 Jan 2015, Wolf Drechsel wrote:


 Hi everybody,

 I googled and read a lot, but couldn't find any trick...
 After months of training, still around 90% of all messages are treated as
 SPAM, although I'm marking all of them as HAM.



 2.0 BAYES_50   BODY: Spam probability according to Bayes test: 40-60%
                [score: 0.4760]


BAYES_50 means insufficient data to classify.


Or, rather, I can't tell. Insufficient data would be no Bayes score at 
all.



Why does that earn 2.0 points in scoreset 3??


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Vista is at best mildly annoying and at worst makes you want to
  rush to Redmond, Wash. and rip somebody's liver out.  -- Forbes
---
 Today: John Moses Browning's 160th Birthday


Re: after months of training still most messages treated as SPAM

2015-01-23 Thread Reindl Harald


Am 23.01.2015 um 18:59 schrieb John Hardin:

On Fri, 23 Jan 2015, Wolf Drechsel wrote:

I googled and read a lot, but couldn't find any trick...
After months of training, still around 90% of all messages are treated
as SPAM, although I'm marking all of them as HAM.



2.0 BAYES_50   BODY: Spam probability according to Bayes test: 40-60%
               [score: 0.4760]


BAYES_50 means insufficient data to classify
Why does that earn 2.0 points in scoreset 3??


it also means no way to classify, ever

a good Bayes database doesn't do that for many ham messages

we score it at 2.5 points with a Bayes database of currently 8000 ham and 
8000 spam samples, because only a few messages hit BAYES_50; most are 
BAYES_00 with a score of -3.5 or BAYES_99 with a score of 7.5, at a 
milter-reject level of 8.0. I don't recall a single FP based on that


otherwise most spam would not get rejected

95% of our messages with BAYES_50 reach the 8.0 points needed to get rejected 
by the milter and, looking at the tags, for good reason
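
To make the arithmetic behind that explicit: with a reject level of 8.0, a BAYES_50 hit at 2.5 points still needs 5.5 points from other rules before a message is rejected. A small Python sketch follows, using the site-specific scores posted elsewhere in this thread. The helper name `points_still_needed` is made up for illustration, and the last line assumes BAYES_99 and BAYES_999 fire on the same message.

```python
REJECT_LEVEL = 8.0  # milter reject threshold quoted in this message

# Subset of the site-specific Bayes scores posted in this thread.
bayes_scores = {"BAYES_00": -3.5, "BAYES_50": 2.5, "BAYES_99": 7.5, "BAYES_999": 0.4}


def points_still_needed(bayes_points: float, reject_level: float = REJECT_LEVEL) -> float:
    """Points that other rules must contribute before the message is rejected."""
    return round(max(0.0, reject_level - bayes_points), 1)


print(points_still_needed(bayes_scores["BAYES_50"]))  # 5.5
print(points_still_needed(bayes_scores["BAYES_00"]))  # 11.5
# Assumption: BAYES_99 and BAYES_999 both hit, so their scores add up.
print(points_still_needed(bayes_scores["BAYES_99"] + bayes_scores["BAYES_999"]))  # 0.1
```

With these numbers no Bayes result reaches the reject level by itself; at least one other rule always has to agree.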






Re: after months of training still most messages treated as SPAM

2015-01-23 Thread John Hardin

On Fri, 23 Jan 2015, Wolf Drechsel wrote:


Hi everybody,

I googled and read a lot, but couldn't find any trick...
After months of training, still around 90% of all messages are treated as SPAM,
although I'm marking all of them as HAM.



2.0 BAYES_50   BODY: Spam probability according to Bayes test: 40-60%
               [score: 0.4760]


BAYES_50 means insufficient data to classify.

Why does that earn 2.0 points in scoreset 3??

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Vista is at best mildly annoying and at worst makes you want to
  rush to Redmond, Wash. and rip somebody's liver out.  -- Forbes
---
 Today: John Moses Browning's 160th Birthday


after months of training still most messages treated as SPAM

2015-01-23 Thread Wolf Drechsel
Hi everybody,

I googled and read a lot, but couldn't find any trick...
After months of training, still around 90% of all messages are treated as SPAM, 
although I'm marking all of them as HAM. 

My environment:
Ubuntu 14.04
kmail 4.14.2 in the kontact (kdepim) suite
SpamAssassin version 3.4.0
running on Perl version 5.18.2

I tried this installation/config procedure:
http://www.spamtips.org/p/install-procedure.html

but nothing changed. 

Here is one example:

 2.6 FORGED_YAHOO_RCVD           Forged Received header from yahoo.com found
 0.2 FREEMAIL_ENVFROM_END_DIGIT  Envelope-from freemail username ends in digit
                                 (sender_address[at]yahoo.com)
 0.2 FREEMAIL_REPLYTO_END_DIGIT  Reply-To freemail username ends in digit
                                 (sender_address)
 0.0 FREEMAIL_FROM               Sender email is commonly abused enduser mail provider
                                 (sender_address)
 2.0 BAYES_50                    BODY: Spam probability according to Bayes test: 40-60%
                                 [score: 0.4760]
 0.0 HTML_MESSAGE                BODY: Message contains HTML
 0.0 T_DKIM_INVALID              DKIM-Signature header exists but is not valid
 1.2 RDNS_NONE                   Delivered to internal network by a host with no rDNS
 0.0 T_REMOTE_IMAGE              Message contains an external image

But not all of the messages have that detailed report; some are just put 
into the SPAM folder.

Any hints will be appreciated!
Wolf


Re: after months of training still most messages treated as SPAM

2015-01-23 Thread Joe Quinn
To start, there are several very real things wrong with your example 
message. In my opinion, that message was correctly classified.


Do you have any better-representative samples that you can paste in 
full? (http://pastebin.com/)


Have you tried using -D bayes to see what tokens are being learned 
incorrectly? Your score for BAYES_50 seems high for a message that gets 
a neutral result from Bayes.


On 1/23/2015 8:55 AM, Wolf Drechsel wrote:


Hi everybody,

I googled and read a lot, but couldn't find any trick...
After months of training, still around 90% of all messages are treated 
as SPAM, although I'm marking all of them as HAM.

My environment:
Ubuntu 14.04
kmail 4.14.2 in the kontact (kdepim) suite
SpamAssassin version 3.4.0
running on Perl version 5.18.2

I tried this installation/config procedure:
http://www.spamtips.org/p/install-procedure.html

but nothing changed.

Here is one example:

 2.6 FORGED_YAHOO_RCVD           Forged Received header from yahoo.com found
 0.2 FREEMAIL_ENVFROM_END_DIGIT  Envelope-from freemail username ends in digit
                                 (sender_address[at]yahoo.com)
 0.2 FREEMAIL_REPLYTO_END_DIGIT  Reply-To freemail username ends in digit
                                 (sender_address)
 0.0 FREEMAIL_FROM               Sender email is commonly abused enduser mail provider
                                 (sender_address)
 2.0 BAYES_50                    BODY: Spam probability according to Bayes test: 40-60%
                                 [score: 0.4760]
 0.0 HTML_MESSAGE                BODY: Message contains HTML
 0.0 T_DKIM_INVALID              DKIM-Signature header exists but is not valid
 1.2 RDNS_NONE                   Delivered to internal network by a host with no rDNS
 0.0 T_REMOTE_IMAGE              Message contains an external image

But not all of the messages have that detailed report; some are 
just put into the SPAM folder.

Any hints will be appreciated!
Wolf





Re: after months of training still most messages treated as SPAM

2015-01-23 Thread Kris Deugau
Joe Quinn wrote:
 To start, there are several very real things wrong with your example
 message. In my opinion, that message was correctly classified.

Maybe, maybe not - without the actual message there's no more
information.  I've seen all too much legitimate mail hit some very
strange combinations of rules...

If the OP's mail server doesn't add rDNS for the connecting IP, or
doesn't add it in a way that SA recognizes, that would trigger
RDNS_NONE, and cause FORGED_YAHOO_RCVD.

 Do you have any better-representative samples that you can paste in
 full? (http://pastebin.com/)
 
 Have you tried using -D bayes to see what tokens are being learned
 incorrectly? Your score for BAYES_50 seems high for a message that gets
 a neutral result from Bayes.

Looks like the OP doesn't have network tests enabled;  those scores
match the current stock ones for set 2 (Bayes enabled, DNS tests
disabled).  Enabling DNS tests would bring that back to 0.8 default (and
RDNS_NONE to 0.8, and FORGED_YAHOO_RCVD to 1.6).

-kgd
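
As a worked example of what that difference means for the OP's sample message, the reported scores can be totalled and compared with the same rules at the set 3 values named above. This is a sketch under two assumptions: the rules not mentioned keep their reported scores, and no additional network rule (DNSBL, URIBL, ...) hits once network tests are on. The threshold of 5.0 is SpamAssassin's stock `required_score` and is assumed unchanged here.

```python
# Scores as shown in the OP's report (score set 2: Bayes on, network tests off).
reported = {
    "FORGED_YAHOO_RCVD": 2.6,
    "FREEMAIL_ENVFROM_END_DIGIT": 0.2,
    "FREEMAIL_REPLYTO_END_DIGIT": 0.2,
    "FREEMAIL_FROM": 0.0,
    "BAYES_50": 2.0,
    "HTML_MESSAGE": 0.0,
    "T_DKIM_INVALID": 0.0,
    "RDNS_NONE": 1.2,
    "T_REMOTE_IMAGE": 0.0,
}

# Same rules with the three set 3 values named in this message; everything
# else is assumed unchanged.
with_network = {**reported, "FORGED_YAHOO_RCVD": 1.6, "BAYES_50": 0.8, "RDNS_NONE": 0.8}

REQUIRED_SCORE = 5.0  # stock required_score; assumed not overridden


def total(scores):
    """Sum of rule scores, rounded the way the report displays them."""
    return round(sum(scores.values()), 1)


print(total(reported), total(reported) >= REQUIRED_SCORE)          # 6.2 True
print(total(with_network), total(with_network) >= REQUIRED_SCORE)  # 3.6 False
```

Under those assumptions the same message drops from 6.2 to 3.6 points, i.e. below the spam threshold, although real network tests could of course add points of their own.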