Re: Question about BAYES_999

2018-01-04 Thread Alex
>> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc
>> 6508  247134 16925929
>> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep
>> BAYES_999|grep BAYES_99\"|wc
>> 6508  247134 16925929
>>
>
> You need that last grep for BAYES_99 to be a "grep -v" and it needs some
> delimiter after the "99" to disinguish it from "999" like an equals sign
> since that is how amavis outputs it's rule hits and score.
>
> Jan  4 06:41:59 mail02 amavis[15124]: (15124-14) Passed SPAM
> {RelayedTaggedInbound}, [203.246.167.14]:63669 [203.246.167.14]
>  -> , Queue-ID: C193E4A5F78C,
> Message-ID: <9d14f53b-e8f9-186d-339d-aece00029...@zeilcar.net>, mail_id:
> pDEMud2MEZKg, Hits: 55.731, size: 9691, queued_as: 9C5CD4A5F795, 1328 ms,
> Tests: [BAYES_999=0.2,...
>
> Note the "BAYES_999=0.2" above would make your grep look like this:
>
> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep -v
> BAYES_99=|wc

Ugh, yes, sorry. This was the result of pasting the wrong line while
experimenting. My separator is a quote. This is actually more precise
now, as the logging separates the rules into tests, tests_ham and
tests_spam:

# cat /var/log/maillog|grep timestamp|grep BAYES_99|perl -p -e
's|.*tests\":\[(.*)\],\"tests_ham.*|$1|'|grep BAYES_999\"|grep -v
BAYES_99\"

results with nothing printed.

...
ing in your area","subject_rot13":"EBPXL, n ybfg QBT, vf zvffvat va
lbhe 
nern","tests":["BAYES_99","BAYES_999","DCC_CHECK","DKIM_SIGNED","DKIM_VALID","DKIM_VALID_AU","HTML_IMAGE_RATIO_04","HTML_MESSAGE","MIME_HTML_ONLY","RCVD_IN_DNSWL_NONE","RCVD_IN_HOSTKARMA_W","RCVD_IN_SENDERSCORE_90_100","RELAYCOUNTRY_US","SPF_HELO_PASS","SPF_PASS","TXREP","T_DMARC_TESTS_PASS","T_REMOTE_IMAGE","T_RP_MATCHES_RCVD"],"tests_ham":["RCVD_IN_HOSTKARMA_W","RCVD_IN_SENDERSCORE_90_100","DKIM_VALID_AU","DKIM_VALID","T_RP_MATCHES_RCVD","TXREP","SPF_HELO_PASS","SPF_PASS","RCVD_IN_DNSWL_NONE"],"tests_spam":["BAYES_99","MIME_HTML_ONLY","HTML_IMAGE_RATIO_04","DCC_CHECK","BAYES_999","DKIM_SIGNED","T_DMARC_TESTS_PASS","RELAYCOUNTRY_US","T_REMOTE_IMAGE","HTML_MESSAGE"],"time_iso_week_date":"2018-W01-4","time_unix":1515042197.974,"to_addr":["mor...@example.com"],"type":"amavis"}


Re: Question about BAYES_999

2018-01-04 Thread David Jones

On 01/04/2018 11:20 AM, RW wrote:

On Thu, 4 Jan 2018 10:40:49 -0600
David Jones wrote:


On 01/04/2018 10:04 AM, RW wrote:



Are you sure that's right? It's a radically different frequency from
0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you
think they are and not something like
   


My production MailScanner instance has a highly tuned MTA in front of
it so SA doesn't see as much spam.  The amavis instance is
intentionally open to more spam to collect for the nightly masscheck
processing.


That's not obviously relevant since I was referring to the
frequency of mails missing BAYES_99 within emails hitting BAYES_999.



... rules: meta test FOO has dependency 'BAYES_999' with a zero
score


If I had BAYES_99 set to a zero score, it would never show up in my
logs.


  
As I said, they are bogus warnings. I think it's a known issue.


I have BAYES_999 scored and  I get 3 such matches per spamd restart
using your grep patterns. Your 4 seem highly suspicious.

Until you manually check those 4, or retry with better grep patterns,
you don't really know what's happening.



I understand what you are saying now.  I will track down those 4 and 
report back.


--
David Jones


Re: Question about BAYES_999

2018-01-04 Thread RW
On Thu, 4 Jan 2018 10:40:49 -0600
David Jones wrote:

> On 01/04/2018 10:04 AM, RW wrote:

> > Are you sure that's right? It's a radically different frequency from
> > 0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you
> > think they are and not something like
> >   
> 
> My production MailScanner instance has a highly tuned MTA in front of
> it so SA doesn't see as much spam.  The amavis instance is
> intentionally open to more spam to collect for the nightly masscheck
> processing.

That's not obviously relevant since I was referring to the
frequency of mails missing BAYES_99 within emails hitting BAYES_999.


> > ... rules: meta test FOO has dependency 'BAYES_999' with a zero
> > score 
> 
> If I had BAYES_99 set to a zero score, it would never show up in my
> logs.

 
As I said, they are bogus warnings. I think it's a known issue.

I have BAYES_999 scored and  I get 3 such matches per spamd restart
using your grep patterns. Your 4 seem highly suspicious.

Until you manually check those 4, or retry with better grep patterns,
you don't really know what's happening.







Re: Question about BAYES_999

2018-01-04 Thread David Jones

On 01/04/2018 10:58 AM, Alex wrote:

Hi,


I am seeing this problem on my MailScanner filters as well:

# grep BAYES_999 maillog-20171231 | wc -l
9172
# grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
4


# rpm -q amavisd-new
amavisd-new-2.11.0-4.fc25.noarch
# rpm -q perl
perl-5.24.3-389.fc25.x86_64

This is with the JSON logging enabled so my grep is a bit different.
This is also with an SVN spamassassin snapshot from about two weeks
ago. This is also with bayes stored in mysql.

# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc
6508  247134 16925929
# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep
BAYES_999|grep BAYES_99\"|wc
6508  247134 16925929



You need that last grep for BAYES_99 to be a "grep -v" and it needs some 
delimiter after the "99" to disinguish it from "999" like an equals sign 
since that is how amavis outputs it's rule hits and score.


Jan  4 06:41:59 mail02 amavis[15124]: (15124-14) Passed SPAM 
{RelayedTaggedInbound}, [203.246.167.14]:63669 [203.246.167.14] 
 -> , Queue-ID: 
C193E4A5F78C, Message-ID: 
<9d14f53b-e8f9-186d-339d-aece00029...@zeilcar.net>, mail_id: 
pDEMud2MEZKg, Hits: 55.731, size: 9691, queued_as: 9C5CD4A5F795, 1328 
ms, Tests: [BAYES_999=0.2,...


Note the "BAYES_999=0.2" above would make your grep look like this:

# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep 
-v BAYES_99=|wc



Please let me know if there's anything further I can do to help.



--
David Jones


Re: Question about BAYES_999

2018-01-04 Thread Alex
Hi,

>>> I am seeing this problem on my MailScanner filters as well:
>>>
>>> # grep BAYES_999 maillog-20171231 | wc -l
>>> 9172
>>> # grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
>>> 4

# rpm -q amavisd-new
amavisd-new-2.11.0-4.fc25.noarch
# rpm -q perl
perl-5.24.3-389.fc25.x86_64

This is with the JSON logging enabled so my grep is a bit different.
This is also with an SVN spamassassin snapshot from about two weeks
ago. This is also with bayes stored in mysql.

# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc
   6508  247134 16925929
# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep
BAYES_999|grep BAYES_99\"|wc
   6508  247134 16925929

Please let me know if there's anything further I can do to help.


Re: Question about BAYES_999

2018-01-04 Thread David Jones

On 01/04/2018 10:04 AM, RW wrote:

On Thu, 4 Jan 2018 08:02:48 -0600
David Jones wrote:


On 01/04/2018 04:46 AM, Matus UHLAR - fantomas wrote:

On 2 Jan 2018, at 07:17, David Jones djo...@ena.com> wrote:

I haven't redefined these rules from what I can tell by searching
my local rules.  I would think that if I had done this, then
there would be consistent non-hits of BAYES_99 with BAYES_999 all
of the time.  This is only happening a small percentage of the
time.


On 02.01.18 15:39, @lbutlr wrote:

Checking my mail I see an incidence rate of this of about 0.5%,
which matches the rate you posted earlier.


amavis?
   


I am seeing this problem on my MailScanner filters as well:

# grep BAYES_999 maillog-20171231 | wc -l
9172
# grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
4


Are you sure that's right? It's a radically different frequency from
0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you think
they are and not something like



My production MailScanner instance has a highly tuned MTA in front of it 
so SA doesn't see as much spam.  The amavis instance is intentionally 
open to more spam to collect for the nightly masscheck processing.



... rules: meta test FOO has dependency 'BAYES_999' with a zero score



If I had BAYES_99 set to a zero score, it would never show up in my logs.


I get some bogus warnings like this.

You need something to make sure it's a result line and some  boundary
checks like \bBAYES_99\b might help too.



MailScanner log output looks like this:

Dec 31 07:30:46 smtp2i.ena.net MailScanner[26137]: Message 
8902A148068E.ACC23 from 106.10.241.143 (novak5...@att.net) to k12tn.net 
is spam, SpamAssassin (not cached, score=35.679, required 4, 
autolearn=spam, BAYES_99 5.20, BAYES_999 0.20, DCC_CHECK 2.20, 
DIGEST_MULTIPLE 0.29, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU 
-0.10, DMARC_NONE 0.01, ...


Pretty sure my grep'ing above is good.



I can't reproduce the problem using spamc/spamd  from 3.4.1 with perl
5.24.3 on FreeBSD 11.1 with  Berkeley DB.

I don't have any missing BAYES_99 hits in my corpus headers and to
check it's not a recent bug I rescanned ~5k spams and checked the logs.



I appreciate you checking to help us determine how widespread this issue 
is and possibly narrow it down.


Here is my SA setup that is common to both my amavis and MailScanner:

SA 3.4.1
Shared Bayes DB in Redis
Share the same local configs for custom rules

Since we have at least one other person on this list reporting the same 
problem, I don't think my local custom rules are the problem.


--
David Jones


Re: Question about BAYES_999

2018-01-04 Thread David Jones

On 01/04/2018 04:46 AM, Matus UHLAR - fantomas wrote:

On 2 Jan 2018, at 07:17, David Jones djo...@ena.com> wrote:

I haven't redefined these rules from what I can tell by searching my
local rules.  I would think that if I had done this, then there would be
consistent non-hits of BAYES_99 with BAYES_999 all of the time.  This is
only happening a small percentage of the time.


On 02.01.18 15:39, @lbutlr wrote:
Checking my mail I see an incidence rate of this of about 0.5%, which 
matches the rate you posted earlier.


amavis?



I am seeing this problem on my MailScanner filters as well:

# grep BAYES_999 maillog-20171231 | wc -l
9172
# grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
4

I have a temporary fix in place with a meta rule:

metaMISSING_BAYES_99BAYES_999 && !BAYES_99
describeMISSING_BAYES_99BAYES_99 should always hit with 
BAYES_999 but sometimes it doesn't.

score   MISSING_BAYES_994.2

--
David Jones


Re: Question about BAYES_999

2018-01-04 Thread Matus UHLAR - fantomas

On 2 Jan 2018, at 07:17, David Jones djo...@ena.com> wrote:

I haven't redefined these rules from what I can tell by searching my
local rules.  I would think that if I had done this, then there would be
consistent non-hits of BAYES_99 with BAYES_999 all of the time.  This is
only happening a small percentage of the time.


On 02.01.18 15:39, @lbutlr wrote:

Checking my mail I see an incidence rate of this of about 0.5%, which matches 
the rate you posted earlier.


amavis?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The early bird may get the worm, but the second mouse gets the cheese. 


Re: Question about BAYES_999

2018-01-02 Thread @lbutlr
On 2 Jan 2018, at 07:17, David Jones djo...@ena.com> wrote:
> I haven't redefined these rules from what I can tell by searching my local 
> rules.  I would think that if I had done this, then there would be consistent 
> non-hits of BAYES_99 with BAYES_999 all of the time.  This is only happening 
> a small percentage of the time.

Checking my mail I see an incidence rate of this of about 0.5%, which matches 
the rate you posted earlier.

-- 
Hey, how come Andrew gets to get up? If he gets up, we'll all get up!
It'll be anarchy!



Re: Question about BAYES_999

2018-01-02 Thread David Jones

On 01/02/2018 07:57 AM, RW wrote:

On Mon, 1 Jan 2018 18:52:45 -0600
David Jones wrote:


On 01/01/2018 06:47 PM, Reindl Harald wrote:



Am 02.01.2018 um 01:18 schrieb David Jones:

I just had a spam message hit BAYES_999 but not BAYES_99.  Based
on BAYES_999 default score of 0.2, I thought that it was always
supposed to complement the BAYES_99 rule and both would trigger
when BAYES_999 hit.





I checked my logs and I am seeing both together when BAYES_999 hits
except for a few times.  Is this a bug?


It's certainly not the intended behaviour.

Firstly check that you haven't redefined either of the
BAYES_999/BAYES_99 rules and forgotten about it.



I haven't redefined these rules from what I can tell by searching my 
local rules.  I would think that if I had done this, then there would be 
consistent non-hits of BAYES_99 with BAYES_999 all of the time.  This is 
only happening a small percentage of the time.



Secondly are you using shortcircuiting at all? I'm wondering if this
might be a buggy or misconfigured shortcircuit.



I am using Shortcircuiting but these messages aren't hitting the 
SHORTCIRCUIT rule.


This amavis-new box that has this problem is my spam/ham collector for 
the SA nightly maschecking.  It shares the Bayes DB via redis with my 8 
main SA MailScanner filters.  Let me do some more digging in my primary 
filter logs to see if this is happening there and I will reply back. 
Those logs are very large so it will take some time to grep.


--
David Jones


Re: Question about BAYES_999

2018-01-02 Thread RW
On Mon, 1 Jan 2018 18:52:45 -0600
David Jones wrote:

> On 01/01/2018 06:47 PM, Reindl Harald wrote:
> > 
> > 
> > Am 02.01.2018 um 01:18 schrieb David Jones:  
> >> I just had a spam message hit BAYES_999 but not BAYES_99.  Based
> >> on BAYES_999 default score of 0.2, I thought that it was always
> >> supposed to complement the BAYES_99 rule and both would trigger
> >> when BAYES_999 hit.
> >>

> 
> I checked my logs and I am seeing both together when BAYES_999 hits 
> except for a few times.  Is this a bug?  

It's certainly not the intended behaviour.

Firstly check that you haven't redefined either of the
BAYES_999/BAYES_99 rules and forgotten about it.

Secondly are you using shortcircuiting at all? I'm wondering if this
might be a buggy or misconfigured shortcircuit.



Re: Question about BAYES_999

2018-01-01 Thread David Jones

On 01/01/2018 07:08 PM, Reindl Harald wrote:



Am 02.01.2018 um 01:59 schrieb David Jones:

On 01/01/2018 06:52 PM, David Jones wrote:

On 01/01/2018 06:47 PM, Reindl Harald wrote:


Am 02.01.2018 um 01:18 schrieb David Jones:
I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
BAYES_999 default score of 0.2, I thought that it was always 
supposed to complement the BAYES_99 rule and both would trigger 
when BAYES_999 hit.


https://pastebin.com/QsVgXwdC

If they are independent, then it would seem logical to bump up the 
default score higher than BAYES_99


never ever seen that and since bayes is based on a number between 0 
and 1 this should be technically impossible at all


with BAYES_00 that message has [score: 0.0003]



I checked my logs and I am seeing both together when BAYES_999 hits 
except for a few times.  Is this a bug?  Should I open a bug issue?  
I am not sure how to reproduce the problem unless others also see the 
same thing with that message.




Sorry.  Not thinking clearly.  Others would have to have the same 
Bayes DB to get that message to do the same thing.  I was able to 
reproduce the same results on another SA platform running MailScanner 
using the same Bayes DB in redis.


If others could check their mail logs to see if they are hitting 
BAYES_999 without BAYES_99 on the same message, please let me know


[sa-milt@mail-gw:/var/log]$ xzcat maillog-2017*.xz | grep "BAYES_999," | 
wc -l

9125

[sa-milt@mail-gw:/var/log]$ xzcat maillog-2017*.xz | grep "BAYES_999," | 
grep "BAYES_99," | wc -l

9125

[sa-milt@mail-gw:/var/log]$ xzcat maillog-2017*.xz | grep "BAYES_999," | 
grep -v "BAYES_99," | wc -l

0



Since yesterday morning:

# grep "BAYES_999=" /var/log/maillog | grep "BAYES_99=" | wc -l
8006
# grep "BAYES_999=" /var/log/maillog | wc -l
8092
# grep "BAYES_999=" /var/log/maillog | grep -v "BAYES_99=" | wc -l
86

Last week:

# grep "BAYES_999=" /var/log/maillog-20171231 | grep "BAYES_99=" | wc -l
43753
# grep "BAYES_999=" /var/log/maillog-20171231 | wc -l
44108
# grep "BAYES_999=" /var/log/maillog-20171231 | grep -v "BAYES_99=" | wc -l
355

--
David Jones


Re: Question about BAYES_999

2018-01-01 Thread David Jones

On 01/01/2018 06:52 PM, David Jones wrote:

On 01/01/2018 06:47 PM, Reindl Harald wrote:



Am 02.01.2018 um 01:18 schrieb David Jones:
I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
BAYES_999 default score of 0.2, I thought that it was always supposed 
to complement the BAYES_99 rule and both would trigger when BAYES_999 
hit.


https://pastebin.com/QsVgXwdC

If they are independent, then it would seem logical to bump up the 
default score higher than BAYES_99


never ever seen that and since bayes is based on a number between 0 
and 1 this should be technically impossible at all


with BAYES_00 that message has [score: 0.0003]



I checked my logs and I am seeing both together when BAYES_999 hits 
except for a few times.  Is this a bug?  Should I open a bug issue?  I 
am not sure how to reproduce the problem unless others also see the same 
thing with that message.




Sorry.  Not thinking clearly.  Others would have to have the same Bayes 
DB to get that message to do the same thing.  I was able to reproduce 
the same results on another SA platform running MailScanner using the 
same Bayes DB in redis.


If others could check their mail logs to see if they are hitting 
BAYES_999 without BAYES_99 on the same message, please let me know.


--
David Jones


Re: Question about BAYES_999

2018-01-01 Thread David Jones

On 01/01/2018 06:47 PM, Reindl Harald wrote:



Am 02.01.2018 um 01:18 schrieb David Jones:
I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
BAYES_999 default score of 0.2, I thought that it was always supposed 
to complement the BAYES_99 rule and both would trigger when BAYES_999 
hit.


https://pastebin.com/QsVgXwdC

If they are independent, then it would seem logical to bump up the 
default score higher than BAYES_99


never ever seen that and since bayes is based on a number between 0 and 
1 this should be technically impossible at all


with BAYES_00 that message has [score: 0.0003]



I checked my logs and I am seeing both together when BAYES_999 hits 
except for a few times.  Is this a bug?  Should I open a bug issue?  I 
am not sure how to reproduce the problem unless others also see the same 
thing with that message.


--
David Jones


Question about BAYES_999

2018-01-01 Thread David Jones
I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
BAYES_999 default score of 0.2, I thought that it was always supposed to 
complement the BAYES_99 rule and both would trigger when BAYES_999 hit.


https://pastebin.com/QsVgXwdC

If they are independent, then it would seem logical to bump up the 
default score higher than BAYES_99.


--
David Jones