Re: Bayes filter marking everything as ham

2016-06-02 Thread John Hardin


On Thu, 2 Jun 2016, John Hardin wrote:

> On Thu, 2 Jun 2016, Antony Stone wrote:
>
> > On Thursday 02 June 2016 at 13:16:57, Martin Gregorie wrote:
> >
> > > On Thu, 2016-06-02 at 12:28 +0200, Matus UHLAR - fantomas wrote:
> > > > > Therefore I agree that there could be better way of noticing admins
> > > > > of a [URIBL_BLOCKED] issue.
> > >
> > > create and install a logwatch service that scans /var/log/maillog
> > > for lines containing "URIBL_BLOCKED" - this involves a two line config
> > > file and a scanner (a few lines of Perl).
> >
> > The problem I see with this, though, is that you have to know that
> > URIBL_BLOCKED is something sinister, and needs to be flagged as a problem,
> > to bother doing this.
>
> You get that if URIBL_BLOCKED hits on a ham and you look at the rule
> descriptions on that message.


Dammit...  hits on a **SPAM** and you look at the rule descriptions.

(well, if you're including the descriptions for ham as well my original 
comment would be correct too...)


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  People think they're trading chaos for order [by ceding more and
  more power to the Government], but they're just trading normal
  human evil for the really dangerous organized kind of evil, the
  kind that simply does not give a shit. Only bureaucrats can give
  you true evil. -- Larry Correia
---
 4 days until the 72nd anniversary of D-Day

Re: Bayes filter marking everything as ham

2016-06-02 Thread Reindl Harald




On 02.06.2016 at 17:32, John Hardin wrote:
> On Thu, 2 Jun 2016, Antony Stone wrote:
>> On Thursday 02 June 2016 at 13:16:57, Martin Gregorie wrote:
>>> On Thu, 2016-06-02 at 12:28 +0200, Matus UHLAR - fantomas wrote:
>>>>> Therefore I agree that there could be a better way of notifying admins
>>>>> of a [URIBL_BLOCKED] issue.
>>>
>>> create and install a logwatch service that scans /var/log/maillog
>>> for lines containing "URIBL_BLOCKED" - this involves a two line config
>>> file and a scanner (a few lines of Perl).
>>
>> The problem I see with this, though, is that you have to know that
>> URIBL_BLOCKED is something sinister, and needs to be flagged as a
>> problem, to bother doing this.
>
> You get that if URIBL_BLOCKED hits on a ham and you look at the rule
> descriptions on that message

well, if people looked and did clean work, they would configure their
machines properly long before connecting them to the internet :-)

sadly, a large number of servers are managed by people who don't look or
care, and a large amount of inbound spam is caused by that fact

mail admin is a full-time job with responsibility, but try explaining that
to somebody who can type "yum install spamassassin postfix" and, from the
moment it somehow accepts mail, starts calling himself a mail admin





Re: Bayes filter marking everything as ham

2016-06-02 Thread John Hardin


On Thu, 2 Jun 2016, Antony Stone wrote:

> On Thursday 02 June 2016 at 13:16:57, Martin Gregorie wrote:
>
> > On Thu, 2016-06-02 at 12:28 +0200, Matus UHLAR - fantomas wrote:
> > > > Therefore I agree that there could be better way of noticing admins
> > > > of a [URIBL_BLOCKED] issue.
> >
> > create and install a logwatch service that scans /var/log/maillog
> > for lines containing "URIBL_BLOCKED" - this involves a two line config
> > file and a scanner (a few lines of Perl).
>
> The problem I see with this, though, is that you have to know that
> URIBL_BLOCKED is something sinister, and needs to be flagged as a problem, to
> bother doing this.


You get that if URIBL_BLOCKED hits on a ham and you look at the rule 
descriptions on that message.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  People think they're trading chaos for order [by ceding more and
  more power to the Government], but they're just trading normal
  human evil for the really dangerous organized kind of evil, the
  kind that simply does not give a shit. Only bureaucrats can give
  you true evil. -- Larry Correia
---
 4 days until the 72nd anniversary of D-Day

Re: Bayes filter marking everything as ham

2016-06-02 Thread Kris Deugau

Antony Stone wrote:
> On Thursday 02 June 2016 at 15:12:58, Reindl Harald wrote:

>> it's highly unlike in a proper setup that SA faces enough email to hit
>> the URIBL limit
> 
> Once again, as you said yourself, "highly unlikely".  That does not mean 
> "impossible".

Indeed.  We received notice from Spamhaus, uribl.com, and surbl.com as
our customer base grew enough for us to start making enough queries to
cross the "free usage" threshold.  We have our own cache servers;  we do
not use someone else's upstream cache.

-kgd

Re: Bayes filter marking everything as ham

2016-06-02 Thread Antony Stone

On Thursday 02 June 2016 at 15:12:58, Reindl Harald wrote:

> Am 02.06.2016 um 15:07 schrieb Matus UHLAR - fantomas:
> > On 02.06.16 14:48, Reindl Harald wrote:
> >> that typically happens only when one is using a forwarding resolver
> >> get it finally

As you said yourself, "typically".  That's not "exclusively".

> > you did not get it:
> > 
> > there are cases where it's not caused by forwarding DNS but by getting
> > much mail.
> > 
> > just for sure:
> > 
> > there are cases where it's not caused by forwarding DNS but by getting
> > much mail.
> > 
> > got it finally?
> 
> i got more than you can imagine

Please could you two take this unproductive posturing offlist?

I'm sure everyone following this thread has now understood that URIBL_BLOCKED 
means you should check your DNS setup, and there have been several good 
suggestions about pointing people at the good documentation on how to avoid 
this, when the problem occurs.

I don't think we need any further "I know what I'm talking about and you 
don't" emails, thank you both.

> it's highly unlike in a proper setup that SA faces enough email to hit
> the URIBL limit

Once again, as you said yourself, "highly unlikely".  That does not mean 
"impossible".

Thanks,

Antony (not a list moderator, just a subscriber who prefers to see useful and 
productive emails on lists and in the archives afterwards).

-- 
All matter in the Universe can be placed into one of two categories:

1. Things which need to be fixed.
2. Things which need to be fixed once you've had a few minutes to play with 
them.

Re: Bayes filter marking everything as ham

2016-06-02 Thread Reindl Harald




On 02.06.2016 at 15:07, Matus UHLAR - fantomas wrote:
> On 02.06.16 14:48, Reindl Harald wrote:
>> that typically happens only when one is using a forwarding resolver
>> get it finally
>
> you did not get it:
>
> there are cases where it's not caused by forwarding DNS but by getting much
> mail.
>
> just for sure:
>
> there are cases where it's not caused by forwarding DNS but by getting much
> mail.
>
> got it finally?

i got more than you can imagine

it's highly unlikely in a proper setup that SA faces enough email to hit
the URIBL limit with a recursive nameserver, since you very unlikely can
process that much mail in the content filter, and if you have that many
spam attempts you for sure reject a large amount long before, in the smtpd

28.05-31.05: 90 spam attempts, SA faced (including ham) 5 in
that timeframe which makes around 17000 mails per day - how would you
get that many URIBL requests?





Re: Bayes filter marking everything as ham

2016-06-02 Thread Matus UHLAR - fantomas


>> On 02.06.16 12:35, Reindl Harald wrote:
>>> the setup IS CRIPPLED in its function as long as URIBL/DNSBL/DNSWL are
>>> not working - period

> On 02.06.2016 at 14:42, Matus UHLAR - fantomas wrote:
>> hitting the others' limits does not mean that the setup is crippled.
>> get it finally. period

On 02.06.16 14:48, Reindl Harald wrote:
> that typically happens only when one is using a forwarding resolver
> get it finally

you did not get it:

there are cases where it's not caused by forwarding DNS but by getting much
mail.

just for sure:

there are cases where it's not caused by forwarding DNS but by getting much
mail.

got it finally?

> and even if not: without URIBL/DNSBL working properly SA doesn't work
> well at all, and so it's crippled - get it finally

URIBL is just one of the blacklists; there are many others.

having trouble with one of the rules is no reason to block mail, call the
setup crippled, etc.


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
10 GOTO 10 : REM (C) Bill Gates 1998, All Rights Reserved!

Re: Bayes filter marking everything as ham

2016-06-02 Thread Reindl Harald




On 02.06.2016 at 14:42, Matus UHLAR - fantomas wrote:
> On 02.06.16 12:35, Reindl Harald wrote:
>> the setup IS CRIPPLED in its function as long as URIBL/DNSBL/DNSWL are
>> not working - period
>
> hitting the others' limits does not mean that the setup is crippled.
> get it finally. period

that typically happens only when one is using a forwarding resolver
get it finally

and even if not: without URIBL/DNSBL working properly SA doesn't work well
at all, and so it's crippled - get it finally





Re: Bayes filter marking everything as ham

2016-06-02 Thread Matus UHLAR - fantomas


On 02.06.16 12:35, Reindl Harald wrote:
> the setup IS CRIPPLED in its function as long as URIBL/DNSBL/DNSWL are
> not working - period


hitting the others' limits does not mean that the setup is crippled.
get it finally. period.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"To Boot or not to Boot, that's the question." [WD1270 Caviar]

Re: Bayes filter marking everything as ham

2016-06-02 Thread Martin Gregorie

On Thu, 2016-06-02 at 13:22 +0200, Antony Stone wrote:
> On Thursday 02 June 2016 at 13:16:57, Martin Gregorie wrote:
> 
> > On Thu, 2016-06-02 at 12:28 +0200, Matus UHLAR - fantomas wrote:
> > > > Therefore I agree that there could be better way of noticing admins
> > > > of a [URIBL_BLOCKED] issue.
> > 
> > create and install a logwatch service that scans /var/log/maillog
> > for lines containing "URIBL_BLOCKED" - this involves a two line config
> > file and a scanner (a few lines of Perl).
> 
> The problem I see with this, though, is that you have to know that 
> URIBL_BLOCKED is something sinister, and needs to be flagged as a
> problem, to bother doing this.
> 
Au contraire: the reason I suggested implementing this approach
and including the logwatch service in the standard SA package is that
it gives an 'in your face' way of telling the sysadmin he has a
problem: the logwatch display might say something like this:

******************************************************
*** Spamassassin reported URIBL_BLOCKED 256 times ***
******************************************************
This happened because URIBL received enough queries
from the DNS you are using to exceed their free query
limit. If you own the DNS, you need to subscribe to
the URIBL non-free service. If you don't own it, you
must set up your own non-forwarding DNS and make sure
Spamassassin uses it. See *URL* for more details.
******************************************************

> It's probably less effort to actually set up a recursive local name
> server, so anyone who knows about URIBL_BLOCKED will simply do this
> instead.
> 
Indeed, but by including such a logwatch service as part of the SA
package you've provided a clue directly to SA newbies without having to
explain the problem yet again on this mailing list.

I'll even volunteer to write the logwatch service if none of the SA
devs want to do it and we can agree the wording for its banner display.


Martin
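
[Editor's sketch] The scanner half of this proposal could look roughly like the following. It is written as a plain shell function rather than the Perl script and two-line logwatch config Martin describes, so it only illustrates the counting and the banner. The function name `report_uribl_blocked` and the banner wording are made up for illustration; the default log path and the URL are the ones mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: count URIBL_BLOCKED hits in a mail log and print a banner.
# report_uribl_blocked is a hypothetical helper, not part of SpamAssassin
# or logwatch; the default path is the one mentioned in the thread.

report_uribl_blocked() {
    logfile="${1:-/var/log/maillog}"

    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches (or the file is unreadable), hence the "|| true".
    hits=$(grep -c 'URIBL_BLOCKED' "$logfile" 2>/dev/null || true)
    hits="${hits:-0}"

    # Stay silent when nothing was blocked, so the nightly report only
    # grows when there is something to look at.
    [ "$hits" -gt 0 ] || return 0

    echo "*** SpamAssassin reported URIBL_BLOCKED $hits times ***"
    echo "Check which DNS resolver SpamAssassin is using; see"
    echo "http://uribl.com/refused.shtml for details."
}
```

Calling `report_uribl_blocked` with no argument reads the default log; passing a path makes it easy to try against a copy of the log first.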

Re: Bayes filter marking everything as ham

2016-06-02 Thread Merijn van den Kroonenberg

>> On Thu, 2016-06-02 at 12:28 +0200, Matus UHLAR - fantomas wrote:
>> > > Therefore I agree that there could be better way of noticing admins
>> > > of a [URIBL_BLOCKED] issue.
>>
>> create and install a logwatch service that scans /var/log/maillog
>> for lines containing "URIBL_BLOCKED" - this involves a two line config
>> file and a scanner (a few lines of Perl).
>
> The problem I see with this, though, is that you have to know that
> URIBL_BLOCKED is something sinister, and needs to be flagged as a problem,
> to bother doing this.
>
> It's probably less effort to actually set up a recursive local name
> server, so anyone who knows about URIBL_BLOCKED will simply do this instead.

I agree, if you have not seen this problem before, then URIBL_BLOCKED just
looks like some disabled URIBL hitting the message. At some point I would
google it, but probably not as the first thing, because it looks like a
normal rule hit, and with low points (so disarmed). So only if I would see
it again and again I might get suspicious.

Re: Bayes filter marking everything as ham

2016-06-02 Thread Reindl Harald




On 02.06.2016 at 13:16, Martin Gregorie wrote:
> On Thu, 2016-06-02 at 12:28 +0200, Matus UHLAR - fantomas wrote:
>>> Therefore I agree that there could be a better way of notifying admins
>>> of a [URIBL_BLOCKED] issue.
>
> There's one obvious way of doing this for very little cost & effort:
> use logwatch.
>
> If you're a pukka sysadmin you'll be reading the nightly logwatch
> report as you drink your first coffee at work, so all that's needed is
> to create and install a logwatch service that scans /var/log/maillog
> for lines containing "URIBL_BLOCKED" - this involves a two line config
> file and a scanner (a few lines of Perl).
>
> Then install and forget - add it to the SA package: if URIBL blocks the
> sysadmin will know about it the next day.

would be a good addition

but i doubt it would help people who have it in front of their eyes and
post it with other questions, or even with "here is my SA report header,
why did this message pass"

those people have hardly configured their machines in a way that they ever
get the logwatch mails, or know what logwatch is

the ones who do and know are not the problem




Re: Bayes filter marking everything as ham

2016-06-02 Thread Antony Stone

On Thursday 02 June 2016 at 13:16:57, Martin Gregorie wrote:

> On Thu, 2016-06-02 at 12:28 +0200, Matus UHLAR - fantomas wrote:
> > > Therefore I agree that there could be better way of noticing admins
> > > of a [URIBL_BLOCKED] issue.
> 
> create and install a logwatch service that scans /var/log/maillog
> for lines containing "URIBL_BLOCKED" - this involves a two line config
> file and a scanner (a few lines of Perl).

The problem I see with this, though, is that you have to know that 
URIBL_BLOCKED is something sinister, and needs to be flagged as a problem, to 
bother doing this.

It's probably less effort to actually set up a recursive local name server, so 
anyone who knows about URIBL_BLOCKED will simply do this instead.

Antony.

-- 
The next sentence is untrue.
The previous sentence is also not true.

   Please reply to the list;
 please *don't* CC me.

Re: Bayes filter marking everything as ham

2016-06-02 Thread Martin Gregorie

On Thu, 2016-06-02 at 12:28 +0200, Matus UHLAR - fantomas wrote:
> > Therefore I agree that there could be better way of noticing admins
> > of a [URIBL_BLOCKED] issue.
>
There's one obvious way of doing this for very little cost & effort:
use logwatch. 

If you're a pukka sysadmin you'll be reading the nightly logwatch
report as you drink your first coffee at work, so all that's needed is
to create and install a logwatch service that scans /var/log/maillog
for lines containing "URIBL_BLOCKED" - this involves a two line config
file and a scanner (a few lines of Perl). 

Then install and forget - add it to the SA package: if URIBL blocks the
sysadmin will know about it the next day.


Martin

Re: Bayes filter marking everything as ham

2016-06-02 Thread Reindl Harald




On 02.06.2016 at 12:28, Matus UHLAR - fantomas wrote:
>>>>> On 6/1/2016 3:06 AM, Reindl Harald wrote:
>>>>>> write 1000 times " YOUR SETUP IS CRIPPLED
>>>>>> http://uribl.com/refused.shtml " in the report header and
>>>>>> every 5 seconds into the maillog so that the biggest fool can't
>>>>>> ignore it
>
>> On 01.06.2016 at 15:24, Matus UHLAR - fantomas wrote:
>>> the setup doesn't have to be crippled to get URIBL_BLOCKED
>>> some people just need to buy access...
>
> On 01.06.16 15:33, Reindl Harald wrote:
>> in theory
>
> not in theory. in real life.
>
>> in reality, in 99.9% of the cases where this happens, buying access would
>> not change anything when someone is not capable of running a
>> non-forwarding resolver
>
> 99.9% is not 100% and those 0.1% should NOT be bugged with CRIPPLED setup
> messages, they should get a proper message
>
>> and *it is* crippled - it likely also has non-working RBL scoring
>> because of exceeding limits, and the same for dnswl
>
> receiving much mail is NOT a CRIPPLED setup

the setup IS CRIPPLED in its function as long as URIBL/DNSBL/DNSWL are
not working - period

i don't care *how* all those fools get an obvious hint they can't ignore
that they are doing the basics wrong, but i am tired of "buh my setup
doesn't work well" style posts where it's pretty clear that nobody spent a
second to ensure it works properly, and others should solve it and google
for them

yes, that thread was primarily about bayes - but in the end it's an example
of a careless setup from the very start





Re: Bayes filter marking everything as ham

2016-06-02 Thread Matus UHLAR - fantomas


>>>> On 6/1/2016 3:06 AM, Reindl Harald wrote:
>>>>> write 1000 times " YOUR SETUP IS CRIPPLED
>>>>> http://uribl.com/refused.shtml " in the report header and
>>>>> every 5 seconds into the maillog so that the biggest fool can't
>>>>> ignore it

> On 01.06.2016 at 15:24, Matus UHLAR - fantomas wrote:
>> the setup doesn't have to be crippled to get URIBL_BLOCKED
>> some people just need to buy access...

On 01.06.16 15:33, Reindl Harald wrote:
> in theory

not in theory. in real life.

> in reality, in 99.9% of the cases where this happens, buying access would
> not change anything when someone is not capable of running a
> non-forwarding resolver

99.9% is not 100% and those 0.1% should NOT be bugged with CRIPPLED setup
messages, they should get a proper message

> and *it is* crippled - it likely also has non-working RBL scoring
> because of exceeding limits, and the same for dnswl

receiving much mail is NOT a CRIPPLED setup.

>>> On 01.06.2016 at 14:14, Joe Quinn wrote:
>>>> Perhaps, score URIBL_BLOCKED -1000?

>> On 01.06.16 14:19, Reindl Harald wrote:
>>> no, score it +1000, because when your mailserver starts to classify
>>> everything as spam, especially in a setup where high-scored mail is
>>> rejected, one would look at which rule is giving 1000 points

>> no way. non-working uribl does not cause any problem to mail flow

> but it causes a ton of threads where people are just too lazy to google
> the basics, and it's not only about uribl, it's also about dnsbl/dnswl
> and usage limits

Therefore I agree that there could be a better way of notifying admins of
an issue.

Getting all mail tagged as SPAM or HAM only because of URIBL unavailability
is not a way to notify them properly.

We would get even more threads if we scored URIBL_BLOCKED insanely high (or
low).

URIBL_BLOCKED was NOT the main reason why this thread started, the main
reason was broken BAYES.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Microsoft dick is soft to do no harm

Re: Bayes filter marking everything as ham

2016-06-02 Thread Reindl Harald




On 02.06.2016 at 06:48, Peter Carlson wrote:
> In fact, now that I am confident the script is correct and that my email
> chain is in fact processing as I would like, I have moved the script
> into cron as user amavis.  With amavis having read permissions to the
> appropriate folders ($user/{SPAM|HAM}).

i doubt that this will work out over the long run, but not my problem

> su -c... I'll keep in mind if I ever want to run the script manually

well, i run such scripts as root with cron so that i can make sure
permissions are 100% correct, and then anything else gets started with "su"

> Although if I run it manually as root is there really a risk?  Are there
> any known attacks?  I guess there could be some form of buffer overflow,
> or malformed content that causes SA to crash, but it's hard for me to
> imagine anything that could possibly allow execution of some form of
> injected code.  Or is this really just a case of "general best
> practices", "run as little as possible as root"?  (Please don't read
> anything into my questions, I am truly curious)

it's simply "run as little as possible as root"
i am curious why one would like to run something with full permissions

> On 06/01/2016 09:11 PM, Reindl Harald wrote:
>> On 02.06.2016 at 05:06, Peter Carlson wrote:
>>> ok, after over 50 hours of trying to get this to work, I finally have a
>>> solution.
>>> The first (certainly not the only) response that was helpful to the
>>> specific problem I posted was:
>>>
>>>> If that actually *did* get hits on BAYES_00 in this scenario then you
>>>> likely are not training the bayes database that SA is actually using.
>>>> What user are you training Bayes as, and what user is SA running under?
>>>
>>> Both my sa-learn commands (manual and scripted) as well as SA pointed to
>>> the correct db, however it turns out the training I did re-wrote the
>>> ownership of the db files to root.  A little bit of user permission
>>> adminning and that problem was solved.  sigh, way too many hours lost on
>>> a permissions issue
>>
>> in other words you are running sa-learn as root while it faces, by
>> definition, untrusted content from the web in the case of spam mails
>>
>> su -c "command" - username





Re: Bayes filter marking everything as ham

2016-06-01 Thread John Hardin

On Wed, 1 Jun 2016, Peter Carlson wrote:

> su -c... I'll keep in mind if I ever want to run the script manually.
> Although if I run it manually as root is there really a risk?  Are there any
> known attacks?

If they were known, we'd fix them... :)

> I guess there could be some form of buffer overflow, or malformed
> content that causes SA to crash, but it's hard for me to imagine
> anything that could possibly allow execution of some form of injected
> code.

You never know.

> Or is this really just a case of "general best practices", "run as little as
> possible as root"?  (Please don't read anything into my questions, I am truly
> curious)

Yes. It's generally a bad idea to take the risk of processing data (or 
running programs) received from unknown sources as root. Best practice is 
to avoid doing so.

> Peter
>
> On 06/01/2016 09:11 PM, Reindl Harald wrote:
>
> > On 02.06.2016 at 05:06, Peter Carlson wrote:
> > > ok, after over 50 hours of trying to get this to work, I finally have a
> > > solution.
> > > The first (certainly not the only) response that was helpful to the
> > > specific problem I posted was:
> > >
> > > > If that actually *did* get hits on BAYES_00 in this scenario then you
> > > > likely are not training the bayes database that SA is actually using.
> > > > What user are you training Bayes as, and what user is SA running
> > > > under?
> > >
> > > Both my sa-learn commands (manual and scripted) as well as SA pointed to
> > > the correct db, however it turns out the training I did re-wrote the
> > > ownership of the db files to root.  A little bit of user permission
> > > adminning and that problem was solved.  sigh, way too many hours lost on
> > > a permissions issue
> >
> > in other words you are running sa-learn as root while it faces, by
> > definition, untrusted content from the web in the case of spam mails
> >
> > su -c "command" - username

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  You know things are bad when Pravda says we [the USA] have gone
  too far to the left. -- Joe Huffman
---
 5 days until the 72nd anniversary of D-Day

Re: Bayes filter marking everything as ham

2016-06-01 Thread Peter Carlson

In fact, now that I am confident the script is correct and that my email 
chain is in fact processing as I would like, I have moved the script 
into cron as user amavis.  With amavis having read permissions to the 
appropriate folders ($user/{SPAM|HAM}).

su -c... I'll keep in mind if I ever want to run the script manually.  
Although if I run it manually as root is there really a risk?  Are there 
any known attacks?  I guess there could be some form of buffer overflow, 
or malformed content that causes SA to crash, but it's hard for me to 
imagine anything that could possibly allow execution of some form of 
injected code.  Or is this really just a case of "general best 
practices", "run as little as possible as root"?  (Please don't read 
anything into my questions, I am truly curious)

Peter

On 06/01/2016 09:11 PM, Reindl Harald wrote:
> On 02.06.2016 at 05:06, Peter Carlson wrote:
>> ok, after over 50 hours of trying to get this to work, I finally have a
>> solution.
>> The first (certainly not the only) response that was helpful to the
>> specific problem I posted was:
>>
>>> If that actually *did* get hits on BAYES_00 in this scenario then you
>>> likely are not training the bayes database that SA is actually using.
>>> What user are you training Bayes as, and what user is SA running under?
>>
>> Both my sa-learn commands (manual and scripted) as well as SA pointed to
>> the correct db, however it turns out the training I did re-wrote the
>> ownership of the db files to root.  A little bit of user permission
>> adminning and that problem was solved.  sigh, way too many hours lost on
>> a permissions issue
>
> in other words you are running sa-learn as root while it faces, by
> definition, untrusted content from the web in the case of spam mails
>
> su -c "command" - username

Re: Bayes filter marking everything as ham

2016-06-01 Thread Reindl Harald



On 02.06.2016 at 05:06, Peter Carlson wrote:
> ok, after over 50 hours of trying to get this to work, I finally have a
> solution.
> The first (certainly not the only) response that was helpful to the
> specific problem I posted was:
>
>> If that actually *did* get hits on BAYES_00 in this scenario then you
>> likely are not training the bayes database that SA is actually using.
>> What user are you training Bayes as, and what user is SA running under?
>
> Both my sa-learn commands (manual and scripted) as well as SA pointed to
> the correct db, however it turns out the training I did re-wrote the
> ownership of the db files to root.  A little bit of user permission
> adminning and that problem was solved.  sigh, way too many hours lost on
> a permissions issue

in other words you are running sa-learn as root while it faces, by
definition, untrusted content from the web in the case of spam mails

su -c "command" - username
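
[Editor's sketch] The failure described here (training as root left the Bayes files owned by root, so the scanning user's database was effectively a different one) can be checked for before and after a training run. The helper below is hypothetical, and the `bayes_*` file pattern assumes SpamAssassin's default file-based Bayes store; adjust both to the actual setup.

```shell
# Sketch only: list Bayes database files that are not owned by the
# expected user. wrong_owner_files is a hypothetical helper; the bayes_*
# pattern assumes the default file-based Bayes store.

wrong_owner_files() {
    dir="$1"
    want="$2"
    for f in "$dir"/bayes_*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        # The third field of `ls -ld` output is the owning user name.
        owner=$(ls -ld -- "$f" | awk '{print $3}')
        [ "$owner" = "$want" ] || echo "$f (owned by $owner)"
    done
}
```

Empty output means every file belongs to the user given as the second argument, for example the `amavis` user Peter mentions elsewhere in the thread. Any file it lists is a hint that a training run happened under the wrong account, which is what running the command through `su -c "command" - username` is meant to prevent.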




Re: Bayes filter marking everything as ham

2016-06-01 Thread Peter Carlson

ok, after over 50 hours of trying to get this to work, I finally have a 
solution.
The first (certainly not the only) response that was helpful to the 
specific problem I posted was:

> If that actually *did* get hits on BAYES_00 in this scenario then you
> likely are not training the bayes database that SA is actually using.
> What user are you training Bayes as, and what user is SA running under?

Both my sa-learn commands (manual and scripted) as well as SA pointed to 
the correct db, however it turns out the training I did re-wrote the 
ownership of the db files to root.  A little bit of user permission 
adminning and that problem was solved.  sigh, way too many hours lost on 
a permissions issue.

Next issue I will tackle is: URIBL_BLOCKED over which I was severely 
flogged even though it was not my question.  I will attack this tomorrow.

After that this comment was made as well:

> I've never used Cyrus, but my understanding is that it has one directory
> per folder that holds both emails and metadata files. You appear to be
> training on both.

This is an excellent catch and I will try and work some bash magic so 
that it only trains on mail messages.

BTW, many people commented on training ham and spam, training ham from 
the inbox, etc.  Most of what I was doing was testing various scenarios 
to try and find /something/anything/ that would produce a sane BAYES 
header.  During the testing I had complete control over the content of 
the inbox and ham and spam folders.  The final configuration is scripted 
training (note not auto_learn) done on ham and spam folders.


Peter
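
[Editor's sketch] One possible shape for the "bash magic" mentioned above: list only the files in a mailbox directory that look like messages, and leave the metadata files out. This rests on an assumption that is not confirmed anywhere in the thread, namely that Cyrus names message files `<number>.` and keeps its metadata in `cyrus.*` files. The function name is hypothetical, and `-maxdepth` is a GNU/BSD `find` extension rather than POSIX.

```shell
# Sketch only: print the files in a Cyrus mailbox directory that look
# like messages. Assumption (not confirmed in the thread): message files
# are named "<number>." and metadata lives in cyrus.* files.

list_cyrus_messages() {
    dir="$1"
    # -maxdepth 1 keeps sub-folders such as SPAM out of the parent's list,
    # so an inbox and its SPAM folder are never trained as the same class.
    find "$dir" -maxdepth 1 -type f -name '[0-9]*.' | sort
}
```

Each listed path could then be handed to `sa-learn --ham` or `sa-learn --spam` as an ordinary file argument, instead of pointing `sa-learn` at the whole directory.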

Re: Bayes filter marking everything as ham

2016-06-01 Thread shanew


On Wed, 1 Jun 2016, Reindl Harald wrote:

> On 01.06.2016 at 02:32, sha...@shanew.net wrote:
>>  Kind of a shot in the dark, but are you sure everyone is promptly
>>  moving their spam out of the inboxes?  I worry about automated
>>  learning like this
>
> autolearning has nothing to do with inboxes
>
> http://www.maiamailguard.com/maia/wiki/sa-autolearn
>
> "autolearn=ham, autolearnscore=-0.001"
> "autolearnscore=-0.001" must be a bad joke in the config
>
> hence it's dangerous, unpredictable and will sooner or later ruin your bayes
> without having a corpus where you could kill bad samples, move them from ham
> to spam or the other direction and just rebuild the bayes-db from scratch
> based on the fixed corpus, so you will end up wiping it and starting from
> scratch (and need to take care of the minimum number of training messages
> until bayes gets enabled at all again)

I wasn't referring to SA's autolearning feature, which I agree can
suffer from feedback loops if your thresholds are set wrong (I set
my ham threshold to -2 for this reason).

That's why I used the phrase "automated learning" to distinguish OP's
"automated" cron job that calls sa-learn.  In retrospect, I should
have used words that more clearly distinguished it from the
autolearning feature.

--
Public key #7BBC68D9 at            |  Shane Williams
http://pgp.mit.edu/                |  System Admin - UT CompSci
=----------------------------------+-------------------------------
All syllogisms contain three lines |  sha...@shanew.net
Therefore this is not a syllogism  |  www.ischool.utexas.edu/~shanew

Re: Bayes filter marking everything as ham

2016-06-01 Thread Benny Pedersen


On 2016-06-01 14:14, Joe Quinn wrote:
>> write 1000 times " YOUR SETUP IS CRIPPLED
>> http://uribl.com/refused.shtml " in the report
>> header and every 5 seconds into the maillog so that the biggest fool
>> can't ignore it
>
> Perhaps, score URIBL_BLOCKED -1000?


disconnect internet, get a life/wife :=)

Re: Bayes filter marking everything as ham

2016-06-01 Thread Reindl Harald




On 01.06.2016 at 15:39, RW wrote:
> On Tue, 31 May 2016 14:58:05 -0700
> Peter Carlson wrote:
>
>> # grab all the user folders
>> users=`find /var/spool/cyrus/mail -name SPAM -print`
> ...
>>     sa-learn --nosync --spam --progress --dir $inbox/SPAM
>>     sa-learn --nosync --ham --progress --dir $inbox
>
> I've never used Cyrus, but my understanding is that it has one directory
> per folder that holds both emails and metadata files. You appear to be
> training on both.

and even if not

blindly training every inbox as ham is a road straight to hell for bayes,
and the same goes for spam in reality - how does one imagine a sane result
with such a setup?

you train every false positive as spam, so any future mail is again a
false positive and more and more similar mails become spammy

you train every uncaught spam as ham, leading to more and more mails not
being caught and all being trained as ham

you play the lottery on whether the user has at this moment looked at his
inbox and moved spam to the spam folder; if he is on vacation you train all
his uncaught spam as ham

congratulations on building such a setup, combine it with autolearning and
then complain "Bayes filter marking everything as ham"

bayes training needs to be done *carefully* and then you get a nearly 100%
hit rate; if you train it wrong, well, you get a lottery game at best





Re: Bayes filter marking everything as ham

2016-06-01 Thread RW

On Tue, 31 May 2016 14:58:05 -0700
Peter Carlson wrote:

> # grab all the user folders
> users=`find /var/spool/cyrus/mail -name SPAM -print`
...
>     sa-learn --nosync --spam --progress --dir $inbox/SPAM
>     sa-learn --nosync --ham --progress --dir $inbox

I've never used Cyrus, but my understanding is that it has one directory
per folder that holds both emails and metadata files. You appear to be
training on both.
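
If that is the case, one option is to hand sa-learn only the message files. A minimal sketch, assuming the usual Cyrus spool layout where each message is a file named with its UID plus a trailing dot ("1.", "2.", ...) and everything else in the directory (cyrus.index, cyrus.cache, ...) is metadata. The function name and the list-file path are made up for illustration:

```shell
#!/bin/bash
# List only the message files in one Cyrus folder directory.
# Assumption: messages are named "<uid>." (digits plus a trailing dot);
# anything else in the directory is treated as metadata and skipped.
list_cyrus_messages() {
    local dir=$1
    # -maxdepth 1 keeps find out of subfolders such as SPAM
    find "$dir" -maxdepth 1 -type f | grep -E '/[0-9]+\.$' | sort
}

# Hypothetical usage: write the list to a file and let sa-learn read it
#   list_cyrus_messages "$inbox/SPAM" > /tmp/spam.list
#   sa-learn --no-sync --spam --folders=/tmp/spam.list
```

sa-learn's --folders option (short form -f) reads the files to learn from a list, so the sketch does not have to pass one argument per message.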

Re: Bayes filter marking everything as ham

2016-06-01 Thread Reindl Harald




On 01.06.2016 at 15:24, Matus UHLAR - fantomas wrote:

On 6/1/2016 3:06 AM, Reindl Harald wrote:

write 1000 times " YOUR SETUP IS CRIPPLED
http://uribl.com/refused.shtml " in the report header and
every 5 seconds into the maillog so that the biggest fool can't
ignore it


the setup doesn't have to be crippled to get URIBL_BLOCKED
some people just need to buy access...


in theory

in reality, in 99.9% of the cases where this happens buying access would not
change anything when someone is not capable of running a non-forwarding resolver


and *it is* crippled - it likely also has non-working RBL scoring
because of exceeded limits, and the same for dnswl



On 01.06.2016 at 14:14, Joe Quinn wrote:

Perhaps, score URIBL_BLOCKED -1000?


On 01.06.16 14:19, Reindl Harald wrote:

no, score it +1000, because when your mailserver starts to classify
everything as spam, especially in a setup where high-scored mail is
rejected, one would look up what that rule giving 1000 points is


no way. non-working uribl does not cause any problem to mail flow


but it causes a ton of threads where people are just too lazy to google
the basics, and it's not only about uribl, it's also about dnsbl/dnswl and
usage limits


HERE: http://spamassassin.apache.org/ put a yellow box with red borders
and red letters linking to a page explaining the absolute basics, and name
non-forwarding resolvers directly on the homepage


countless people complain all the time that SA is not working as expected
because they don't bother to do their homework before setting up a
server and connecting it to the internet









Re: Bayes filter marking everything as ham

2016-06-01 Thread Matus UHLAR - fantomas


On 6/1/2016 3:06 AM, Reindl Harald wrote:

write 1000 times " YOUR SETUP IS CRIPPLED
http://uribl.com/refused.shtml " in the report header and
every 5 seconds into the maillog so that the biggest fool can't ignore it


the setup doesn't have to be crippled to get URIBL_BLOCKED
some people just need to buy access...


On 01.06.2016 at 14:14, Joe Quinn wrote:

Perhaps, score URIBL_BLOCKED -1000?


On 01.06.16 14:19, Reindl Harald wrote:
no, score it +1000, because when your mailserver starts to classify
everything as spam, especially in a setup where high-scored mail is
rejected, one would look up what that rule giving 1000 points is


no way. non-working uribl does not cause any problem to mail flow, it just
lowers hit rate. Therefore it should not cause mail to be handled
differently, it just has to provide information about this issue.

in fact a score of 0 would be just fine, but in that case the rule would
never be shown.


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux IS user friendly, it's just selective who its friends are...

Re: Bayes filter marking everything as ham

2016-06-01 Thread Bowie Bailey


On 5/31/2016 8:32 PM, sha...@shanew.net wrote:

Kind of a shot in the dark, but are you sure everyone is promptly
moving their spam out of the inboxes?  I worry about automated
learning like this.  Even then, it seems unlikely that every mail
would get tagged by bayes as likely ham.

Someone just today suggested in another thread to add the following
line in local.cf in order to get more detail on what bayes is doing
under the hood.  It would provide more information for you (and us) to
go on.

add_header all Bayes bayes=_BAYES_,N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_),ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)


You might have to run SA directly against messages, as amavis may
throw out custom headers like that (I _know_ spamass-milter would).


Amavis only gets the score from SA.  Any SA header settings are ignored.

You can add that header config to the local.cf file and then run SA on 
one of the messages that got BAYES_00 to see what it's doing. You just 
have to be VERY sure that you run the command as the same user that 
Amavis is running as.  Otherwise, your results will be totally meaningless.
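
A rough sketch of that check, assuming the Amavis account is called amavis (the name differs between distributions) and that the add_header line is already in local.cf, which makes SA emit an X-Spam-Bayes header. The helper name is hypothetical; it only pulls that header, including folded continuation lines, out of the scanned message:

```shell
#!/bin/bash
# Print the X-Spam-Bayes header (and its folded continuation lines)
# from a scanned message read on stdin. Stops at the end of the headers.
extract_bayes_header() {
    awk '
        /^$/              { exit }
        /^X-Spam-Bayes:/  { show = 1; print; next }
        show && /^[ \t]/  { print; next }
                          { show = 0 }
    '
}

# Hypothetical usage, msg.eml being one of the messages that hit BAYES_00:
#   sudo -u amavis -H spamassassin -t < msg.eml | extract_bayes_header
```

The -H flag makes sudo use the target user's home directory, so SA looks for that user's configuration and Bayes files rather than the caller's.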


--
Bowie

Re: Bayes filter marking everything as ham

2016-06-01 Thread Reindl Harald




On 01.06.2016 at 14:14, Joe Quinn wrote:

On 6/1/2016 3:06 AM, Reindl Harald wrote:


On 01.06.2016 at 02:38, David Jones wrote:

From: Reindl Harald <h.rei...@thelounge.net>
Sent: Tuesday, May 31, 2016 6:27 PM
To: users@spamassassin.apache.org
Subject: Re: Bayes filter marking everything as ham



On 31.05.2016 at 23:58, Peter Carlson wrote:

May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN
{RelayedInbound},  Tests:
[BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],

autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms


https://wiki.apache.org/spamassassin/ImproveAccuracy


the next one with amavis and URIBL_BLOCKED
(http://uribl.com/refused.shtml) - i get tired of people asking for help while
not doing basic homework



Amavis != pure SA
URIBL_BLOCKED == read some basics


Too bad we couldn't make SA do something very annoying and
more obvious when the URIBL_BLOCKED rule was hit.  Any ideas?


write 1000 times " YOUR SETUP IS CRIPPLED
http://uribl.com/refused.shtml " in the report header and
every 5 seconds into the maillog so that the biggest fool can't ignore it


Perhaps, score URIBL_BLOCKED -1000?


no, score it +1000, because when your mailserver starts to classify
everything as spam, especially in a setup where high-scored mail is
rejected, one would look up what that rule giving 1000 points is





Re: Bayes filter marking everything as ham

2016-06-01 Thread Joe Quinn


On 6/1/2016 3:06 AM, Reindl Harald wrote:



On 01.06.2016 at 02:38, David Jones wrote:

From: Reindl Harald <h.rei...@thelounge.net>
Sent: Tuesday, May 31, 2016 6:27 PM
To: users@spamassassin.apache.org
Subject: Re: Bayes filter marking everything as ham



On 31.05.2016 at 23:58, Peter Carlson wrote:

May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN
{RelayedInbound},  Tests:
[BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],
autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms


https://wiki.apache.org/spamassassin/ImproveAccuracy


the next one with amavis and URIBL_BLOCKED
(http://uribl.com/refused.shtml) - i get tired of people asking for help while
not doing basic homework



Amavis != pure SA
URIBL_BLOCKED == read some basics


Too bad we couldn't make SA do something very annoying and
more obvious when the URIBL_BLOCKED rule was hit.  Any ideas?


write 1000 times " YOUR SETUP IS CRIPPLED 
http://uribl.com/refused.shtml " in the report header and
every 5 seconds into the maillog so that the biggest fool can't ignore it



Perhaps, score URIBL_BLOCKED -1000?

Re: Bayes filter marking everything as ham

2016-06-01 Thread Matus UHLAR - fantomas


On 31.05.16 14:58, Peter Carlson wrote:

  (sorry if this is a repost, I dont see my messages coming through...the
  irony of spamassassin.apache.org trapping my request for help as spam.  I
  have snipped the logfile entries which I think were causing it to be
  tagged as spam)


please, avoid HTML mail to mailing lists, whenever possible.


  All of my messages are being tagged with BAYES_00=-1.9
  I have cleared the bayes db (sa-learn --clear), then I manually trained. 
  Here are the results:

sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0    642  0  non-token data: nspam
0.000  0   9415  0  non-token data: nham


there's clearly some spam learned.


May 30 08:34:13 www amavis[16252]: (16252-01) Passed SPAMMY
{RelayedTaggedInbound},  Tests:

[BAYES_00=-1.9,HTML_MESSAGE=0.001,HTML_TAG_BALANCE_BODY=1.157,MIME_HTML_MOSTLY=0.428,MPART_ALT_DIFF=0.79,RAZOR2_CHECK=0.922,SPF_FAIL=0.001,SPF_HELO_FAIL=0.001,THIS_AD=1.675,T_HTML_TAG_BALANCE_CENTER=0.01,URIBL_BLOCKED=0.001,URIBL_DBL_SPAM=2.5],
autolearn=no autolearn_force=no, autolearnscore=8.272, 4054 ms


and some even caught, even with BAYES_00



  The spam is learned by a simple bash script.  The users (my family) move
  spam into a SPAM folder.  This script then runs every night ( I have
  removed some of the logging lines and comments for brevity):



#!/bin/bash
# delete messages this old
cleanafter=14
# grab all the user folders
users=`find /var/spool/cyrus/mail -name SPAM -print`
for u in ${users[@]}; do
    inbox=${u%/*}
    folder=${u##*/}
    user=${inbox##*/}
    sa-learn --nosync --spam --progress --dir $inbox/SPAM
    sa-learn --nosync --ham --progress --dir $inbox


1. you use amavis, so you must run sa-learn under the amavis user.

2. if the same mail appears in $inbox and $inbox/SPAM both, it will be
learned as ham. retraining the same mail will overwrite the 


simply swapping those two lines (first train ham, then spam) _could_ help you
a lot.

and of course read what the others advised you

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I'm not interested in your website anymore.
If you need cookies, bake them yourself.

Re: Bayes filter marking everything as ham

2016-06-01 Thread Reindl Harald



On 01.06.2016 at 02:32, sha...@shanew.net wrote:

Kind of a shot in the dark, but are you sure everyone is promptly
moving their spam out of the inboxes?  I worry about automated
learning like this


autolearning has nothing to do with inboxes

http://www.maiamailguard.com/maia/wiki/sa-autolearn

"autolearn=ham, autolearnscore=-0.001"
"autolearnscore=-0.001" must be a bad joke in the config

hence it's dangerous and unpredictable, and sooner or later it will ruin
your bayes. without a corpus you cannot kill bad samples, move them from
ham to spam or in the other direction, and just rebuild the bayes-db from
scratch based on the fixed corpus, so you will end up wiping it and
starting from scratch (and then need to reach the minimum number of
training messages before bayes gets enabled at all again)
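
A minimal sketch of such a rebuild, assuming a hand-checked corpus kept in two directories (the ham/ and spam/ layout and the function name are hypothetical) and that it is run as the same user SA scans as:

```shell
#!/bin/bash
# Rebuild the Bayes database from a hand-sorted corpus.
# Assumptions: <corpus>/ham and <corpus>/spam are a hypothetical layout.
# SA_LEARN can be overridden, e.g. SA_LEARN="echo sa-learn" for a dry run.
SA_LEARN=${SA_LEARN:-sa-learn}

rebuild_bayes() {
    local corpus=$1
    [ -d "$corpus/ham" ] && [ -d "$corpus/spam" ] || {
        echo "need $corpus/ham and $corpus/spam" >&2
        return 1
    }
    # $SA_LEARN is left unquoted on purpose so it may carry arguments
    $SA_LEARN --clear &&
    $SA_LEARN --no-sync --ham  "$corpus/ham" &&
    $SA_LEARN --no-sync --spam "$corpus/spam" &&
    $SA_LEARN --sync
}

# Hypothetical usage:
#   rebuild_bayes /srv/bayes-corpus
```

Because the corpus stays on disk, a wrongly sorted sample can be moved to the other directory and the whole database rebuilt, instead of trying to untrain it afterwards.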








Re: Bayes filter marking everything as ham

2016-06-01 Thread Reindl Harald




On 01.06.2016 at 02:38, David Jones wrote:

From: Reindl Harald <h.rei...@thelounge.net>
Sent: Tuesday, May 31, 2016 6:27 PM
To: users@spamassassin.apache.org
Subject: Re: Bayes filter marking everything as ham



On 31.05.2016 at 23:58, Peter Carlson wrote:

May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN
{RelayedInbound},  Tests:

[BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],
autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms


https://wiki.apache.org/spamassassin/ImproveAccuracy


the next one with amavis and URIBL_BLOCKED
(http://uribl.com/refused.shtml) - i get tired of people asking for help while
not doing basic homework



Amavis != pure SA
URIBL_BLOCKED == read some basics


Too bad we couldn't make SA do something very annoying and
more obvious when the URIBL_BLOCKED rule was hit.  Any ideas?


write 1000 times " YOUR SETUP IS CRIPPLED 
http://uribl.com/refused.shtml " in the report header and
every 5 seconds into the maillog so that the biggest fool can't ignore it





Re: Bayes filter marking everything as ham

2016-06-01 Thread Reindl Harald




On 01.06.2016 at 02:04, Peter Carlson wrote:

On 05/31/2016 04:27 PM, Reindl Harald wrote:



On 31.05.2016 at 23:58, Peter Carlson wrote:

May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN
{RelayedInbound},  Tests:
[BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],

autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms


the next one with amavis and URIBL_BLOCKED
(http://uribl.com/refused.shtml) - i get tired of people asking for help while
not doing basic homework

Wow...I could say the same...I get tired of doing homework and not
getting any help


sorry, but when there are just 4 rules in the report header i expect
anybody running a mailserver to look up what they mean - if only because
it takes the same time as whining on a mailing list



Amavis != pure SA

I never claimed it was.  Are you insinuating that somehow amavis is
causing  a BAYES_00 false negative?


WTF - in your original post it was not clear that bayes is your problem,
because even with a BAYES_00, if you followed basic best practices like a
recursing dns-cache on your inbound MX, a ton of RBL/URIBL rules would
likely catch the message and overrule BAYES_00


in the end that would likely have prevented ruining your bayes from the start

however, your major mistake (besides URIBL_BLOCKED) is running
autolearning before you have a solid base (the setup itself and
hand-classified messages)



URIBL_BLOCKED == read some basics

your reply == useless.  You have no idea what I may or may not have
read.  You are under no obligation to provide any help to me or anyone
else.  I suggest that if for whatever reason you find my question
offensive that instead of hitting reply, you simply hit delete.

My initial question still remains, why is BAYES_00 always at -1.9. Why
is it marking all messages as ham?


because some fool trains spam as ham (autolearning and wrong manual 
learning combined)


  0.000  0    642  0  non-token data: nspam
  0.000  0   9415  0  non-token data: nham

what do you expect when you train 15 times more ham than spam and then
have "autolearn=ham" enabled?
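
For the record, the ratio follows directly from the two counts in the dump. A throwaway helper (the name is made up) that does the division:

```shell
#!/bin/bash
# Print how many ham messages were trained per spam message.
ham_spam_ratio() {
    # LC_ALL=C keeps the decimal point independent of the locale
    LC_ALL=C awk -v ham="$1" -v spam="$2" 'BEGIN {
        if (spam == 0) { print "no spam trained"; exit 1 }
        printf "%.1f\n", ham / spam
    }'
}

# With nham=9415 and nspam=642 from the dump:
#   ham_spam_ratio 9415 642    # prints 14.7
```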





Re: Bayes filter marking everything as ham

2016-05-31 Thread David Jones

>https://wiki.apache.org/spamassassin/ImproveAccuracy

>I have gone through this wiki (and ones like it) at least a dozen times.
>My server is blocking about 50% of the spam, thanks to some of the
>other layers of spam protection.  It's just bayes that I can't seem to get 
>right

Are you getting any BAYES_* rule hits above 70?  If you have trained enough
spam then you should be seeing some like BAYES_99.

Make sure you know which user your SA is running as, then run this command
as that user:

sa-learn --dump magic

Admins often accidentally train as one user, such as root, while SA runs as
amavis or some other account, so the trained Bayes DB is never the one SA reads.
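
A small sketch for reading that output, assuming the dump format shown elsewhere in this thread. The function name is hypothetical; 200 is the stock default for both bayes_min_ham_num and bayes_min_spam_num, below which Bayes does not score at all:

```shell
#!/bin/bash
# Read "sa-learn --dump magic" output on stdin and report the message counts.
# Returns nonzero when either count is below the default minimum of 200.
bayes_counts() {
    awk -v min=200 '
        $NF == "nspam" { nspam = $3 }
        $NF == "nham"  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            if (nspam < min || nham < min) {
                print "below minimum: Bayes will not score yet"
                exit 1
            }
        }
    '
}

# Hypothetical usage, assuming SA runs as the amavis account:
#   sudo -u amavis -H sa-learn --dump magic | bayes_counts
```

Running it once as root and once as the SA user and comparing the two results shows quickly whether training went into a different database than the one being used for scanning.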

You can force the location of the Bayes DB by setting bayes_path, as
described at this link:

https://wiki.apache.org/spamassassin/SiteWideBayesSetup

I don't have much experience with Amavis but I hear it's pretty good.
Maybe this will help:
http://linux.kieser.net/salearn.html

I have had success with MailScanner and Postfix for almost 14 years.
Postfix blocks the majority of junk using Postscreen and weighted
RBLs.  This has been covered on this mailing list, so search the archives.
Then I enable SHORTCIRCUIT rules for many whitelist_auth senders to
prevent my Bayes from being semi-poisoned by legit senders that
could look spammy but follow proper practices for valid unsubscribing
by end users. In the end, SA only has to scan a very small percentage
of my mail, which helps the Bayesian scoring alongside some custom
meta rules that I don't have to babysit a lot.

Dave

Re: Bayes filter marking everything as ham

2016-05-31 Thread Martin Gregorie

On Wed, 2016-06-01 at 00:38 +, David Jones wrote:

> 
> Too bad we couldn't make SA do something very annoying and
> more obvious when the URIBL_BLOCKED rule was hit.
> 
I notice, rather to my surprise, that the SA Wiki doesn't seem to have
an entry for the URIBL_BLOCKED rule. However, since people don't seem
to have even run a web search or scanned the list archive before coming
here for answers, I suppose they wouldn't look there either.


Martin

Re: Bayes filter marking everything as ham

2016-05-31 Thread Martin Gregorie

On Tue, 2016-05-31 at 17:04 -0700, Peter Carlson wrote:
> 
> URIBL_BLOCKED == read some basics
> your reply == useless.  You have no idea what I may or may not have 
> read.  You are under no obligation to provide any help to me or
> anyone 
> else.  I suggest that if for whatever reason you find my question 
> offensive that instead of hitting reply, you simply hit delete.
> 
GIYF. Since this comes up regularly in this mailing list and the first
two answers to a query on 'URIBL_BLOCKED' tell you exactly what it
means when this rule fires and how to stop it firing, it looks as
though you haven't searched for answers in either place before coming
here. 
   
> 
> My initial question still remains, why is BAYES_00 always at -1.9.
> Why is it marking all messages as ham?
> 
Because you haven't trained Bayes correctly. Training it is well
explained in the SA documentation, so RTFM and do what it says.


Martin

Re: Bayes filter marking everything as ham

2016-05-31 Thread John Hardin


On Tue, 31 May 2016, Peter Carlson wrote:

I will investigate this (URIBL_BLOCKED) further tomorrow 
(https://wiki.apache.org/spamassassin/CachingNameserver),


Note: caching != recursing. You can have a caching forwarding local 
nameserver, which won't fix URIBL_BLOCKED.
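
A quick first check is whether the box queries a resolver on loopback at all. A small sketch (the function name is hypothetical) that lists every nameserver in resolv.conf that is not a loopback address. An empty result is necessary but not sufficient, because a local forwarder also listens on loopback:

```shell
#!/bin/bash
# Print nameservers from a resolv.conf-style file that are not loopback.
# No output means every configured resolver is local; it does not prove
# that the local resolver recurses instead of forwarding.
non_local_resolvers() {
    local conf=${1:-/etc/resolv.conf}
    awk '$1 == "nameserver" && $2 !~ /^(127\.|::1$)/ { print $2 }' "$conf"
}

# Hypothetical usage:
#   non_local_resolvers            # checks /etc/resolv.conf
#   non_local_resolvers /tmp/test  # checks another file
```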


however I doubt that it is causing the bayesian classifier to report as 
ham.


It's not.


https://wiki.apache.org/spamassassin/ImproveAccuracy

I have gone through this wiki (and ones like it) at least a dozen times.   My 
server is blocking about 50% of the spam, thanks
to some of the other layers of spam protection.  It's just bayes that I can't 
seem to get right

Why is it marking all messages as ham?

  Probably because you're overtraining as ham.

I have tried the following 3 scenarios:
  1.  Training SPAM only  (it's in its own folder)


Won't work, bayes needs both spam and ham to be able to tell the
difference. If that actually *did* get hits on BAYES_00 in this scenario
then you likely are not training the bayes database that SA is actually
using.


What user are you training Bayes as, and what user is SA running under?


  2.  Training a folder with known HAM


This is the only reliable method.


  3.  Training with inbox as HAM


See earlier comments.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Justice is justice, whereas "social justice" is code for one set
  of rules for the rich, another for the poor; one set for whites,
  another set for minorities; one set for straight men, another for
  women and gays. In short, it's the opposite of actual justice.
-- Burt Prelutsky
---
 6 days until the 72nd anniversary of D-Day

Re: Bayes filter marking everything as ham

2016-05-31 Thread Peter Carlson


  
  

> not everyone is an email
> expert that understands how RBLs work and that it's bad
> to share a recursive DNS server on an SA server.

I will investigate this (URIBL_BLOCKED) further tomorrow
(https://wiki.apache.org/spamassassin/CachingNameserver), however I
doubt that it is causing the bayesian classifier to report as ham.

> https://wiki.apache.org/spamassassin/ImproveAccuracy

I have gone through this wiki (and ones like it) at least a dozen
times.   My server is blocking about 50% of the spam, thanks to some
of the other layers of spam protection.  It's just bayes that I
can't seem to get right

>> Why is it marking
>> all messages as ham?
>
> Probably because you're overtraining as ham.

I have tried the following 3 scenarios:
1.  Training SPAM only  (it's in its own folder)
  2.  Training a folder with known HAM
  3.  Training with inbox as HAM

In each scenario I carefully ensured that only SPAM was in SPAM and
no SPAM was with HAM.  In all 3 scenarios it always marks every
message as BAYES_00=-1.9.  Before each scenario I ran sa-learn
--clear and sa-learn --sync

> add_header all Bayes bayes=_BAYES_,N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_),ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)
  

I will try this as well to see what happens.

Thanks for the responses, i do appreciate the help.

Peter

Re: Bayes filter marking everything as ham

2016-05-31 Thread John Hardin

On Tue, 31 May 2016, Peter Carlson wrote:

On 05/31/2016 04:27 PM, Reindl Harald wrote:

 On 31.05.2016 at 23:58, Peter Carlson wrote:
>  May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN
>  {RelayedInbound},  Tests:
>  [BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],
>  autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms

 URIBL_BLOCKED == read some basics
your reply == useless.  You have no idea what I may or may not have read. 
You are under no obligation to provide any help to me or anyone else.  I 
suggest that if for whatever reason you find my question offensive that 
instead of hitting reply, you simply hit delete.

There's a lot of history on the mailing list that can be found using 
"URIBL_BLOCKED" as a search term, and in the detailed rule description for 
it there's a URI for an explanation.

Basically: Do not use forwarded DNS for SA. Always use a local recursing 
resolver.

My initial question still remains, why is BAYES_00 always at -1.9.

Unhelpful but obvious answer: That's its assigned score.

Why is it marking all messages as ham?

Probably because you're overtraining as ham.

Training the user's inbox as ham is a bad idea. That will train any spam
they haven't yet moved to another folder as ham.

Set your users up with an explicit train-as-ham folder and tell them to 
*copy* hams to that folder. Ideally you'd review that yourself and move 
valid hams to another folder that sa-learn actually trains from, but I 
don't know whether you have privacy concerns with family members.
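
A minimal sketch of a nightly job along those lines, assuming each user gets a folder named HAM next to the existing SPAM folder (the HAM name and the function name are hypothetical; the spool layout follows the original script). Filtering out Cyrus metadata files is left out here:

```shell
#!/bin/bash
# Train each user's explicit HAM and SPAM folders, never the inbox itself.
# Set SA_LEARN="echo sa-learn" for a dry run that only prints the commands.
SA_LEARN=${SA_LEARN:-sa-learn}

train_explicit_folders() {
    local spool=$1 spamdir inbox
    find "$spool" -type d -name SPAM -print | sort |
    while IFS= read -r spamdir; do
        inbox=${spamdir%/*}
        # ham first, then spam, so a message present in both ends up as spam
        if [ -d "$inbox/HAM" ]; then
            $SA_LEARN --no-sync --ham "$inbox/HAM"
        fi
        $SA_LEARN --no-sync --spam "$spamdir"
    done
    # merge the journal once at the end instead of after every folder
    $SA_LEARN --sync
}

# Hypothetical usage:
#   train_explicit_folders /var/spool/cyrus/mail
```

Users without a HAM folder simply contribute no ham, which is safer than treating whatever is left in their inbox as ham.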

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  ...we no longer live in a nation of laws. If we did,
  [Hillary Clinton] wouldn't be running for office,
  she'd be running for Mexico.-- Bill Whittle
  (or somewhere that does not have an extradition treaty with the US)
---
 6 days until the 72nd anniversary of D-Day

Re: Bayes filter marking everything as ham

2016-05-31 Thread David Jones

>From: Reindl Harald <h.rei...@thelounge.net>
>Sent: Tuesday, May 31, 2016 6:27 PM
>To: users@spamassassin.apache.org
>Subject: Re: Bayes filter marking everything as ham

>On 31.05.2016 at 23:58, Peter Carlson wrote:
>> May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN
>> {RelayedInbound},  Tests:
>> 
>> [BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],
>> autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms

https://wiki.apache.org/spamassassin/ImproveAccuracy

>the next one with amavis and URIBL_BLOCKED
>(http://uribl.com/refused.shtml) - i get tired of people asking for help while
>not doing basic homework

>Amavis != pure SA
>URIBL_BLOCKED == read some basics

Too bad we couldn't make SA do something very annoying and
more obvious when the URIBL_BLOCKED rule was hit.  Any ideas?
There should be some more obvious way to tell sysadmins to
Google URIBL_BLOCKED or point them to a wiki page.  This
comes up way too many times and not everyone is an email
expert that understands how RBLs work and that it's bad
to share a recursive DNS server on an SA server.
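
Until something like that exists, a cron job can at least count the hits. A small sketch, assuming amavis logs to /var/log/mail.log as on a stock Ubuntu install (the path and the function name are assumptions):

```shell
#!/bin/bash
# Count log lines that mention URIBL_BLOCKED.
# The exit status follows grep: nonzero when there are no hits.
count_uribl_blocked() {
    local log=${1:-/var/log/mail.log}
    grep -c 'URIBL_BLOCKED' "$log"
}

# Hypothetical usage from cron:
#   n=$(count_uribl_blocked) && echo "URIBL_BLOCKED seen $n times"
```

A nonzero count is the cue to read http://uribl.com/refused.shtml.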

Re: Bayes filter marking everything as ham

2016-05-31 Thread shanew


Kind of a shot in the dark, but are you sure everyone is promptly
moving their spam out of the inboxes?  I worry about automated
learning like this.  Even then, it seems unlikely that every mail
would get tagged by bayes as likely ham.

Someone just today suggested in another thread to add the following
line in local.cf in order to get more detail on what bayes is doing
under the hood.  It would provide more information for you (and us) to
go on.

add_header all Bayes bayes=_BAYES_,N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_),ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)

You might have to run SA directly against messages, as amavis may
throw out custom headers like that (I _know_ spamass-milter would).


On Tue, 31 May 2016, Peter Carlson wrote:


(sorry if this is a repost, I dont see my messages coming through...the
irony of spamassassin.apache.org trapping my request for help as spam.  I
have snipped the logfile entries which I think were causing it to be tagged
as spam)

All of my messages are being tagged with BAYES_00=-1.9
I have cleared the bayes db (sa-learn --clear), then I manually trained. 
Here are the results:
  sa-learn --dump magic
  0.000  0          3  0  non-token data: bayes db version
  0.000  0        642  0  non-token data: nspam
  0.000  0       9415  0  non-token data: nham
  0.000  0     119685  0  non-token data: ntokens
  0.000  0 1461963062  0  non-token data: oldest atime
  0.000  0 1464701914  0  non-token data: newest atime
  0.000  0          0  0  non-token data: last journal sync atime
  0.000  0 1464701937  0  non-token data: last expiry atime
  0.000  0    2764800  0  non-token data: last expire atime delta
  0.000  0     455262  0  non-token data: last expire reduction count

Here are two examples showing that the bayes filter is very confident these
emails are ham:
  May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN
  {RelayedInbound},  Tests:
  [BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],
  autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms

  May 30 08:34:13 www amavis[16252]: (16252-01) Passed SPAMMY
  {RelayedTaggedInbound},  Tests:
  [BAYES_00=-1.9,HTML_MESSAGE=0.001,HTML_TAG_BALANCE_BODY=1.157,MIME_HTML_MOSTLY=0.428,MPART_ALT_DIFF=0.79,RAZOR2_CHECK=0.922,SPF_FAIL=0.001,SPF_HELO_FAIL=0.001,THIS_AD=1.675,T_HTML_TAG_BALANCE_CENTER=0.01,URIBL_BLOCKED=0.001,URIBL_DBL_SPAM=2.5],
  autolearn=no autolearn_force=no, autolearnscore=8.272, 4054 ms

The spam is learned by a simple bash script.  The users (my family) move
spam into a SPAM folder.  This script then runs every night ( I have removed
some of the logging lines and comments for brevity):
  #!/bin/bash
  # delete messages this old
  cleanafter=14

  # grab all the user folders
  users=`find /var/spool/cyrus/mail -name SPAM -print`

  for u in ${users[@]}; do
      inbox=${u%/*}
      folder=${u##*/}
      user=${inbox##*/}
      sa-learn --nosync --spam --progress --dir $inbox/SPAM
      sa-learn --nosync --ham --progress --dir $inbox

  done

  # sync the sa db
  sa-learn --sync


Setup:
ubuntu server 14.04
postfix:2.11.0
amavis:2.7.1
spamassassin:3.4.0

postfix config (main.cf):
  content_filter = smtp-amavis:[127.0.0.1]:10024
  smtpd_recipient_restrictions =
      permit_sasl_authenticated,
      permit_mynetworks,
      reject_unauth_destination,
      reject_rbl_client zen.spamhaus.org,
      reject_rbl_client bl.spamcop.net
      reject_rbl_client ix.dnsbl.manitu.net,
      reject_rbl_client cbl.abuseat.org,
      reject_rbl_client b.barracudacentral.org,
      reject_rbl_client new.spam.dnsbl.sorbs.net

  smtpd_client_restrictions =
      permit_sasl_authenticated,
     permit_mynetworks,
      reject_rbl_client zen.spamhaus.org,
      reject_rbl_client bl.spamcop.net
      reject_rbl_client ix.dnsbl.manitu.net,
      reject_rbl_client cbl.abuseat.org,
      reject_rbl_client b.barracudacentral.org,
      reject_rbl_client new.spam.dnsbl.sorbs.net

spamassassin config:
   rewrite_header Subject *PC SPAM*
   trusted_networks 192.168.
   required_score 5.0
   use_bayes 1
   bayes_auto_learn 0
  # bayes_ignore_header X-Bogosity
  # bayes_ignore_header X-Spam-Flag
  # bayes_ignore_header X-Spam-Status
  ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
  # shortcircuit USER_IN_WHITELIST   on
  # shortcircuit USER_IN_DEF_WHITELIST   on
  # shortcircuit USER_IN_ALL_SPAM_TO on
  #

Re: Bayes filter marking everything as ham

2016-05-31 Thread Peter Carlson


On 05/31/2016 04:27 PM, Reindl Harald wrote:



On 31.05.2016 at 23:58, Peter Carlson wrote:

May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN
{RelayedInbound},  Tests:
[BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],
autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms


the next one with amavis and URIBL_BLOCKED 
(http://uribl.com/refused.shtml) - i get tired of people asking for help while
not doing basic homework
Wow...I could say the same...I get tired of doing homework and not 
getting any help


Amavis != pure SA
I never claimed it was.  Are you insinuating that somehow amavis is 
causing  a BAYES_00 false negative?

URIBL_BLOCKED == read some basics
your reply == useless.  You have no idea what I may or may not have 
read.  You are under no obligation to provide any help to me or anyone 
else.  I suggest that if for whatever reason you find my question 
offensive that instead of hitting reply, you simply hit delete.


My initial question still remains, why is BAYES_00 always at -1.9. Why 
is it marking all messages as ham?


Peter

Re: Bayes filter marking everything as ham

2016-05-31 Thread Reindl Harald




On 31.05.2016 at 23:58, Peter Carlson wrote:

May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN
{RelayedInbound},  Tests:

[BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],
autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms


the next one with amavis and URIBL_BLOCKED 
(http://uribl.com/refused.shtml) - i get tired of people asking for help while
not doing basic homework


Amavis != pure SA
URIBL_BLOCKED == read some basics




Bayes filter marking everything as ham

2016-05-31 Thread Peter Carlson


  
  
 (sorry if this is a
  repost, I dont see my messages coming through...the irony of
  spamassassin.apache.org trapping my request for help as spam.  I
  have snipped the logfile entries which I think were causing it to
  be tagged as spam)
  
  All of my messages are being tagged with BAYES_00=-1.9
  I have cleared the bayes db (sa-learn --clear), then I manually
  trained.  Here are the results:
  sa-learn --dump magic
  0.000  0          3  0  non-token data: bayes db version
  0.000  0        642  0  non-token data: nspam
  0.000  0       9415  0  non-token data: nham
  0.000  0     119685  0  non-token data: ntokens
  0.000  0 1461963062  0  non-token data: oldest atime
  0.000  0 1464701914  0  non-token data: newest atime
  0.000  0          0  0  non-token data: last journal sync atime
  0.000  0 1464701937  0  non-token data: last expiry atime
  0.000  0    2764800  0  non-token data: last expire atime delta
  0.000  0     455262  0  non-token data: last expire reduction count

  
  Here are two examples showing that the Bayes filter is very
  confident these emails are ham:

  May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN {RelayedInbound},  Tests:
  [BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],
  autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms

  May 30 08:34:13 www amavis[16252]: (16252-01) Passed SPAMMY {RelayedTaggedInbound},  Tests:
  [BAYES_00=-1.9,HTML_MESSAGE=0.001,HTML_TAG_BALANCE_BODY=1.157,MIME_HTML_MOSTLY=0.428,MPART_ALT_DIFF=0.79,RAZOR2_CHECK=0.922,SPF_FAIL=0.001,SPF_HELO_FAIL=0.001,THIS_AD=1.675,T_HTML_TAG_BALANCE_CENTER=0.01,URIBL_BLOCKED=0.001,URIBL_DBL_SPAM=2.5],
  autolearn=no autolearn_force=no, autolearnscore=8.272, 4054 ms
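
  Log entries like the two above are easier to compare once the bracketed
  Tests list is split into one rule per line. This is only a sketch: it
  assumes each entry sits on a single line in the real log file (the
  wrapping here comes from the mail client), and amavis_tests is a
  hypothetical helper name, not an amavis tool.

```shell
# Sketch: print one "RULE score" pair per line for amavis log lines on stdin.
# Assumes the "Tests: [...]" list is on the same line as the rest of the entry.
amavis_tests() {
    # Keep only the text between "Tests: [" and the first closing "]".
    sed -n 's/.*Tests:[[:space:]]*\[\([^]]*\)].*/\1/p' |
        tr ',' '\n' |
        awk -F= 'NF == 2 { print $1, $2 }'   # split NAME=score pairs
}
```

  For example, `grep 'Passed' /var/log/mail.log | amavis_tests | grep URIBL_BLOCKED`
  would show whether URIBL_BLOCKED keeps appearing; the log path is an
  assumption and varies by system.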
  

  The spam is learned by a simple bash script.  The users (my
  family) move spam into a SPAM folder.  This script then runs every
  night (I have removed some of the logging lines and comments for
  brevity):
  #!/bin/bash
# delete messages this old
cleanafter=14

# grab all the user folders
users=`find /var/spool/cyrus/mail -name SPAM -print`

for u in ${users[@]}; do
    inbox=${u%/*}
    folder=${u##*/}
    user=${inbox##*/}
    sa-learn --nosync --spam --progress --dir $inbox/SPAM
    sa-learn --nosync --ham --progress --dir $inbox

done

# sync the sa db
sa-learn --sync
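
  To make the parameter expansions in the script easier to follow, here is
  a small sketch that applies the same three expansions to one path and
  prints the result. The function name split_spam_path and the example
  path are hypothetical; the real Cyrus spool layout on this server may
  differ.

```shell
# Sketch: show what the script's expansions yield for one SPAM folder path.
split_spam_path() {
    local u="$1"
    local inbox="${u%/*}"       # drop the last path component ("/SPAM")
    local folder="${u##*/}"     # keep only the last path component
    local user="${inbox##*/}"   # last component of the mailbox directory
    printf 'inbox=%s folder=%s user=%s\n' "$inbox" "$folder" "$user"
}
```

  Calling `split_spam_path /var/spool/cyrus/mail/p/user/peter/SPAM` prints
  `inbox=/var/spool/cyrus/mail/p/user/peter folder=SPAM user=peter`. Since
  $inbox is the parent directory of SPAM, it may be worth confirming that
  the --ham run over $inbox does not also pick up messages in the SPAM
  folder.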
  
  
  Setup:
  ubuntu server 14.04
  postfix:2.11.0
  amavis:2.7.1
  spamassassin:3.4.0
  
  postfix config (main.cf):
  content_filter = smtp-amavis:[127.0.0.1]:10024
smtpd_recipient_restrictions =
    permit_sasl_authenticated,
    permit_mynetworks,
    reject_unauth_destination,
    reject_rbl_client zen.spamhaus.org,
    reject_rbl_client bl.spamcop.net
    reject_rbl_client ix.dnsbl.manitu.net,
    reject_rbl_client cbl.abuseat.org,
    reject_rbl_client b.barracudacentral.org,
    reject_rbl_client new.spam.dnsbl.sorbs.net

smtpd_client_restrictions =
    permit_sasl_authenticated,
   permit_mynetworks,
    reject_rbl_client zen.spamhaus.org,
    reject_rbl_client bl.spamcop.net
    reject_rbl_client ix.dnsbl.manitu.net,
    reject_rbl_client cbl.abuseat.org,
    reject_rbl_client b.barracudacentral.org,
    reject_rbl_client new.spam.dnsbl.sorbs.net
  
  spamassassin config:
   rewrite_header Subject *PC SPAM*
 trusted_networks 192.168.
 required_score 5.0
 use_bayes 1
 bayes_auto_learn 0
# bayes_ignore_header X-Bogosity
# bayes_ignore_header X-Spam-Flag
# bayes_ignore_header X-Spam-Status
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
# shortcircuit USER_IN_WHITELIST   on
# shortcircuit USER_IN_DEF_WHITELIST   on
# shortcircuit USER_IN_ALL_SPAM_TO on
# shortcircuit SUBJECT_IN_WHITELIST    on
# shortcircuit USER_IN_BLACKLIST   on
# shortcircuit USER_IN_BLACKLIST_TO    on
# shortcircuit SUBJECT_IN_BLACKLIST    on
# shortcircuit ALL_TRUSTED on
# shortcircuit BAYES_99    spam
# shortcircuit BAYES_00
