Re: Malformed spam email gets through.

2018-01-04 Thread Bill Cole

On 4 Jan 2018, at 21:13 (-0500), @lbutlr wrote:

On 4 Jan 2018, at 11:47, Bill Cole 
 wrote:

On 3 Jan 2018, at 15:42, @lbutlr wrote:
There is no requirement that the right side be globally unique, just 
that the entire message ID is globally unique.


Right. And any software that can use localhost (or any other 
unqualified name whose meaning is contextually variable) as the right 
hand side is likely to be doing so on multiple machines that don't 
know about each other and so generally cannot know that they are not 
generating duplicate MIDs.


Sure, but depending on how the MID is generated it can certainly be 
statistically unique. As I said earlier, it only takes 256 bits to get 
an ID within spitting distance of the number of atoms in the universe. 
Should be unique enough.


Not even that. A standard UUID has 122 bits of entropy. To *probably* 
have *one* collision in that space, you'd need to generate 1 billion 
UUIDs per second for about 85 years. That should be good enough for 
naming unique things including email messages for as long as any one 
person cares, but if you want it more solid you could put a UUID of one 
of the node-specific versions on one side of the @ and a random UUID on 
the other: 244 bits, won't collide in any space-time region visible to 
one observer.
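
For anyone who wants to check that arithmetic, here is a minimal sketch of the usual birthday-bound estimate: with 2^bits equally likely IDs, about sqrt(2 * ln 2 * 2^bits) random draws give a roughly 50% chance of at least one collision. The function name is made up for illustration and the estimate assumes uniformly random IDs. For 122 bits at a billion IDs per second it comes out at roughly 86 years, in line with the figure above.

```shell
#!/bin/sh
# Birthday-bound estimate: with 2^bits equally likely IDs, about
# sqrt(2 * ln(2) * 2^bits) random draws give a ~50% chance of a collision.
# birthday_years BITS RATE prints how many years that takes at RATE IDs/second.
birthday_years() {
    awk -v bits="$1" -v rate="$2" 'BEGIN {
        n = sqrt(2 * log(2)) * 2 ^ (bits / 2)      # draws for ~50% collision odds
        printf "%.0f\n", n / rate / (365.25 * 24 * 3600)
    }'
}

# Example: 122 random bits (a version 4 UUID) at one billion IDs per second.
birthday_years 122 1000000000
```

Passing 244 instead of 122 shows how quickly the time scale outgrows anything worth worrying about.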


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-04 Thread @lbutlr
On 4 Jan 2018, at 11:47, Bill Cole  
wrote:
> On 3 Jan 2018, at 15:42, @lbutlr wrote:
>> There is no requirement that the right side be globally unique, just that 
>> the entire message ID is globally unique.
> 
> Right. And any software that can use localhost (or any other unqualified name 
> whose meaning is contextually variable) as the right hand side is likely to 
> be doing so on multiple machines that don't know about each other and so 
> generally cannot know that they are not generating duplicate MIDs.

Sure, but depending on how the MID is generated it can certainly be 
statistically unique. As I said earlier, it only takes 256 bits to get an ID 
within spitting distance of the number of atoms in the universe. Should be 
unique enough.

> The reason for the RHS=FQDN tradition is to establish a namespace for each 
> domain whereby global uniqueness can be guaranteed deterministically.

Oh, I absolutely agree that using the domain for the RHS is a great idea, and 
there's really no reason not to. But there are other ways.
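
As a rough illustration of combining the two ideas in this exchange (a random left side for statistical uniqueness, plus a domain on the right as a namespace), something like the following would do. The function name and the example domain are hypothetical, and it assumes /dev/urandom and od are available. It is a sketch, not a description of how any particular mailer builds its Message-IDs.

```shell
#!/bin/sh
# make_mid DOMAIN: print a Message-ID whose left side is 128 random bits
# (32 hex digits) and whose right side is the given domain.
make_mid() {
    left=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    printf '<%s@%s>\n' "$left" "$1"
}

# Example (mail.example.com is a placeholder, not a real host):
make_mid mail.example.com
```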

>>> An additional ~1% has a MID header with either no dots or no '@'.
>> 
>> Dots are irrelevant, but the way I read the RFC, '@' is required.
> 
> See the message I was responding to, which asked about the feasibility of 
> enforcing a "valid domain" rule. For that, dots are absolutely relevant. My 
> point, in short, is that doing so may result in 2 orders of magnitude more 
> rejection of wanted mail than most sites would deem tolerable.

Yep. Requiring MIDs to conform to out-of-spec requirements is sure to cause 
trouble.


-- 
You and me Sunday driving Not arriving



Re: SURBL false positives ratio

2018-01-04 Thread David Jones

On 01/04/2018 02:12 PM, Pedro David Marco wrote:


Out of curiosity... how is SURBL in terms of false positives? Is it a 
worthwhile IOC database?



Thanks.

---
PedroD


My mail filtering volume is high enough that I would have to pay for a 
feed subscription.  I tried out a trial feed about a year ago for a few 
days and it was horrible for my US-based mail flow.  I saw way too many 
FPs against solid RBLs.  I don't think I would want to use it for my 
mail flow even if it were free.


Disclaimer: Each mail flow will see very different spam so it may work 
for others.


I get good value out of the Invaluement RBL.  Combine it with Spamhaus 
ZEN and that will block the majority of junk.


--
David Jones


SURBL false positives ratio

2018-01-04 Thread Pedro David Marco

Out of curiosity... how is SURBL in terms of false positives? Is it a 
worthwhile IOC database?

Thanks.
---PedroD

Re: Malformed spam email gets through.

2018-01-04 Thread Bill Cole

On 3 Jan 2018, at 15:42, @lbutlr wrote:
[...]


On 03 Jan 2018, at 12:36, Bill Cole 
 wrote:
About 1.5% of my personal non-spam email over the past 20 years has 
had "localhost" as the right hand side of the MID. This implies a de 
facto RFC violation because it poses a real risk of duplication.


There is no requirement that the right side be globally unique, just 
that the entire message ID is globally unique.


Right. And any software that can use localhost (or any other unqualified 
name whose meaning is contextually variable) as the right hand side is 
likely to be doing so on multiple machines that don't know about each 
other and so generally cannot know that they are not generating 
duplicate MIDs. The reason for the RHS=FQDN tradition is to establish a 
namespace for each domain whereby global uniqueness can be guaranteed 
deterministically.



An additional ~1% has a MID header with either no dots or no '@'.


Dots are irrelevant, but the way I read the RFC, '@' is required.


See the message I was responding to, which asked about the feasibility 
of enforcing a "valid domain" rule. For that, dots are absolutely 
relevant. My point, in short, is that doing so may result in 2 orders of 
magnitude more rejection of wanted mail than most sites would deem 
tolerable.
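
To make the distinction concrete, a check along these lines separates the cases discussed in this thread: no '@' at all, a "localhost" right-hand side, an undotted right-hand side, and a dotted one. The function name and the category labels are invented for this sketch. It only looks at the shape of the ID and makes no attempt to decide whether the domain is actually valid.

```shell
#!/bin/sh
# mid_class MESSAGE_ID: classify a Message-ID by the shape of its right side.
mid_class() {
    mid=${1#"<"}          # strip the surrounding angle brackets, if present
    mid=${mid%">"}
    case $mid in
        *@*) rhs=${mid##*@} ;;             # text after the last '@'
        *)   echo "no-at"; return 0 ;;
    esac
    case $rhs in
        localhost) echo "localhost" ;;
        *.*)       echo "dotted" ;;
        *)         echo "no-dot" ;;
    esac
}

mid_class '<20180103.1234@mail.example.com>'
mid_class '<abc123@localhost>'
```

A "valid domain" rule would reject everything except the "dotted" class, which is where the extra rejections of wanted mail would come from.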


Re: Question about BAYES_999

2018-01-04 Thread Alex
>> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc
>> 6508  247134 16925929
>> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep
>> BAYES_999|grep BAYES_99\"|wc
>> 6508  247134 16925929
>>
>
> You need that last grep for BAYES_99 to be a "grep -v" and it needs some
> delimiter after the "99" to distinguish it from "999" like an equals sign
> since that is how amavis outputs its rule hits and score.
>
> Jan  4 06:41:59 mail02 amavis[15124]: (15124-14) Passed SPAM
> {RelayedTaggedInbound}, [203.246.167.14]:63669 [203.246.167.14]
>  -> , Queue-ID: C193E4A5F78C,
> Message-ID: <9d14f53b-e8f9-186d-339d-aece00029...@zeilcar.net>, mail_id:
> pDEMud2MEZKg, Hits: 55.731, size: 9691, queued_as: 9C5CD4A5F795, 1328 ms,
> Tests: [BAYES_999=0.2,...
>
> Note the "BAYES_999=0.2" above would make your grep look like this:
>
> # bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep -v
> BAYES_99=|wc

Ugh, yes, sorry. This was the result of pasting the wrong line while
experimenting. My separator is a quote. This is actually more precise
now, as the logging separates the rules into tests, tests_ham and
tests_spam:

# cat /var/log/maillog | grep timestamp | grep BAYES_99 \
    | perl -p -e 's|.*tests\":\[(.*)\],\"tests_ham.*|$1|' \
    | grep BAYES_999\" | grep -v BAYES_99\"

results with nothing printed.

...
ing in your area","subject_rot13":"EBPXL, n ybfg QBT, vf zvffvat va
lbhe 
nern","tests":["BAYES_99","BAYES_999","DCC_CHECK","DKIM_SIGNED","DKIM_VALID","DKIM_VALID_AU","HTML_IMAGE_RATIO_04","HTML_MESSAGE","MIME_HTML_ONLY","RCVD_IN_DNSWL_NONE","RCVD_IN_HOSTKARMA_W","RCVD_IN_SENDERSCORE_90_100","RELAYCOUNTRY_US","SPF_HELO_PASS","SPF_PASS","TXREP","T_DMARC_TESTS_PASS","T_REMOTE_IMAGE","T_RP_MATCHES_RCVD"],"tests_ham":["RCVD_IN_HOSTKARMA_W","RCVD_IN_SENDERSCORE_90_100","DKIM_VALID_AU","DKIM_VALID","T_RP_MATCHES_RCVD","TXREP","SPF_HELO_PASS","SPF_PASS","RCVD_IN_DNSWL_NONE"],"tests_spam":["BAYES_99","MIME_HTML_ONLY","HTML_IMAGE_RATIO_04","DCC_CHECK","BAYES_999","DKIM_SIGNED","T_DMARC_TESTS_PASS","RELAYCOUNTRY_US","T_REMOTE_IMAGE","HTML_MESSAGE"],"time_iso_week_date":"2018-W01-4","time_unix":1515042197.974,"to_addr":["mor...@example.com"],"type":"amavis"}
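
A slightly stricter variant of the pipeline above, as a sketch: pull out only the "tests" array (so that tests_ham and tests_spam cannot satisfy the match), keep lines where it contains "BAYES_999", and count those lacking "BAYES_99". The function name is hypothetical, and it assumes one JSON record per log line with the array layout shown in the sample.

```shell
#!/bin/sh
# Reads amavis JSON log lines on stdin and prints how many of them list
# "BAYES_999" in the "tests" array without also listing "BAYES_99" there.
count_missing_bayes_99() {
    sed -n 's/.*"tests":\[\([^]]*\)\].*/\1/p' |   # keep only the "tests" array
        grep '"BAYES_999"' |
        grep -vc '"BAYES_99"' || true             # grep -c exits 1 on a zero count
}

# Example: count_missing_bayes_99 < /var/log/maillog
```

Matching on the closing quote plays the same role here as the equals sign does in the non-JSON amavis format.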


Re: Question about BAYES_999

2018-01-04 Thread David Jones

On 01/04/2018 11:20 AM, RW wrote:

On Thu, 4 Jan 2018 10:40:49 -0600
David Jones wrote:


On 01/04/2018 10:04 AM, RW wrote:



Are you sure that's right? It's a radically different frequency from
0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you
think they are and not something like
   


My production MailScanner instance has a highly tuned MTA in front of
it so SA doesn't see as much spam.  The amavis instance is
intentionally open to more spam to collect for the nightly masscheck
processing.


That's not obviously relevant since I was referring to the
frequency of mails missing BAYES_99 within emails hitting BAYES_999.



... rules: meta test FOO has dependency 'BAYES_999' with a zero
score


If I had BAYES_99 set to a zero score, it would never show up in my
logs.


  
As I said, they are bogus warnings. I think it's a known issue.


I have BAYES_999 scored and I get 3 such matches per spamd restart
using your grep patterns. Your 4 seem highly suspicious.

Until you manually check those 4, or retry with better grep patterns,
you don't really know what's happening.



I understand what you are saying now.  I will track down those 4 and 
report back.


--
David Jones


Re: Question about BAYES_999

2018-01-04 Thread RW
On Thu, 4 Jan 2018 10:40:49 -0600
David Jones wrote:

> On 01/04/2018 10:04 AM, RW wrote:

> > Are you sure that's right? It's a radically different frequency from
> > 0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you
> > think they are and not something like
> >   
> 
> My production MailScanner instance has a highly tuned MTA in front of
> it so SA doesn't see as much spam.  The amavis instance is
> intentionally open to more spam to collect for the nightly masscheck
> processing.

That's not obviously relevant since I was referring to the
frequency of mails missing BAYES_99 within emails hitting BAYES_999.


> > ... rules: meta test FOO has dependency 'BAYES_999' with a zero
> > score 
> 
> If I had BAYES_99 set to a zero score, it would never show up in my
> logs.

 
As I said, they are bogus warnings. I think it's a known issue.

I have BAYES_999 scored and I get 3 such matches per spamd restart
using your grep patterns. Your 4 seem highly suspicious.

Until you manually check those 4, or retry with better grep patterns,
you don't really know what's happening.



Re: Question about BAYES_999

2018-01-04 Thread David Jones

On 01/04/2018 10:58 AM, Alex wrote:

Hi,


I am seeing this problem on my MailScanner filters as well:

# grep BAYES_999 maillog-20171231 | wc -l
9172
# grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
4


# rpm -q amavisd-new
amavisd-new-2.11.0-4.fc25.noarch
# rpm -q perl
perl-5.24.3-389.fc25.x86_64

This is with the JSON logging enabled so my grep is a bit different.
This is also with an SVN spamassassin snapshot from about two weeks
ago. This is also with bayes stored in mysql.

# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc
6508  247134 16925929
# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep
BAYES_999|grep BAYES_99\"|wc
6508  247134 16925929



You need that last grep for BAYES_99 to be a "grep -v" and it needs some 
delimiter after the "99" to distinguish it from "999" like an equals sign 
since that is how amavis outputs its rule hits and score.


Jan  4 06:41:59 mail02 amavis[15124]: (15124-14) Passed SPAM 
{RelayedTaggedInbound}, [203.246.167.14]:63669 [203.246.167.14] 
 -> , Queue-ID: 
C193E4A5F78C, Message-ID: 
<9d14f53b-e8f9-186d-339d-aece00029...@zeilcar.net>, mail_id: 
pDEMud2MEZKg, Hits: 55.731, size: 9691, queued_as: 9C5CD4A5F795, 1328 
ms, Tests: [BAYES_999=0.2,...


Note the "BAYES_999=0.2" above would make your grep look like this:

# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep -v BAYES_99=|wc



Please let me know if there's anything further I can do to help.



--
David Jones


Re: Question about BAYES_999

2018-01-04 Thread Alex
Hi,

>>> I am seeing this problem on my MailScanner filters as well:
>>>
>>> # grep BAYES_999 maillog-20171231 | wc -l
>>> 9172
>>> # grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
>>> 4

# rpm -q amavisd-new
amavisd-new-2.11.0-4.fc25.noarch
# rpm -q perl
perl-5.24.3-389.fc25.x86_64

This is with the JSON logging enabled so my grep is a bit different.
This is also with an SVN spamassassin snapshot from about two weeks
ago. This is also with bayes stored in mysql.

# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|wc
   6508  247134 16925929
# bzcat /var/log/maillog-201801??.bz2|grep timestamp|grep BAYES_999|grep BAYES_99\"|wc
   6508  247134 16925929

Please let me know if there's anything further I can do to help.


Re: Question about BAYES_999

2018-01-04 Thread David Jones

On 01/04/2018 10:04 AM, RW wrote:

On Thu, 4 Jan 2018 08:02:48 -0600
David Jones wrote:


On 01/04/2018 04:46 AM, Matus UHLAR - fantomas wrote:

On 2 Jan 2018, at 07:17, David Jones <djo...@ena.com> wrote:

I haven't redefined these rules from what I can tell by searching
my local rules.  I would think that if I had done this, then
there would be consistent non-hits of BAYES_99 with BAYES_999 all
of the time.  This is only happening a small percentage of the
time.


On 02.01.18 15:39, @lbutlr wrote:

Checking my mail I see an incidence rate of this of about 0.5%,
which matches the rate you posted earlier.


amavis?
   


I am seeing this problem on my MailScanner filters as well:

# grep BAYES_999 maillog-20171231 | wc -l
9172
# grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
4


Are you sure that's right? It's a radically different frequency from
0.5% and 0.8%. IIWY I'd look at the 4 and check they are what you think
they are and not something like



My production MailScanner instance has a highly tuned MTA in front of it 
so SA doesn't see as much spam.  The amavis instance is intentionally 
open to more spam to collect for the nightly masscheck processing.



... rules: meta test FOO has dependency 'BAYES_999' with a zero score



If I had BAYES_99 set to a zero score, it would never show up in my logs.


I get some bogus warnings like this.

You need something to make sure it's a result line and some boundary
checks like \bBAYES_99\b might help too.
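
A sketch of that suggestion: first keep only result lines, then match the rule names as whole words so that BAYES_99 cannot match inside BAYES_999, whatever separator follows it. The function name is hypothetical, the marker string passed in is an assumption about what a result line contains in a given log format, and grep -w is a common extension (GNU, BSD) rather than strict POSIX.

```shell
#!/bin/sh
# count_999_without_99 MARKER: read log lines on stdin, keep those containing
# the fixed string MARKER (to select scan-result lines only), and print how
# many of them hit BAYES_999 as a whole word without also hitting BAYES_99.
count_999_without_99() {
    grep -F -e "$1" |
        grep -w 'BAYES_999' |
        grep -vwc 'BAYES_99' || true    # grep -c exits 1 on a zero count
}

# Example for the MailScanner format:
#   count_999_without_99 'is spam, SpamAssassin' < maillog-20171231
```

Without the marker step, a warning line that mentions 'BAYES_999' on its own would be counted as a message missing BAYES_99.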



MailScanner log output looks like this:

Dec 31 07:30:46 smtp2i.ena.net MailScanner[26137]: Message 
8902A148068E.ACC23 from 106.10.241.143 (novak5...@att.net) to k12tn.net 
is spam, SpamAssassin (not cached, score=35.679, required 4, 
autolearn=spam, BAYES_99 5.20, BAYES_999 0.20, DCC_CHECK 2.20, 
DIGEST_MULTIPLE 0.29, DKIM_SIGNED 0.10, DKIM_VALID -0.10, DKIM_VALID_AU 
-0.10, DMARC_NONE 0.01, ...


Pretty sure my grep'ing above is good.



I can't reproduce the problem using spamc/spamd from 3.4.1 with perl
5.24.3 on FreeBSD 11.1 with Berkeley DB.

I don't have any missing BAYES_99 hits in my corpus headers and to
check it's not a recent bug I rescanned ~5k spams and checked the logs.



I appreciate you checking to help us determine how widespread this issue 
is and possibly narrow it down.


Here is my SA setup that is common to both my amavis and MailScanner:

SA 3.4.1
Shared Bayes DB in Redis
Share the same local configs for custom rules

Since we have at least one other person on this list reporting the same 
problem, I don't think my local custom rules are the problem.


--
David Jones


Re: Question about BAYES_999

2018-01-04 Thread David Jones

On 01/04/2018 04:46 AM, Matus UHLAR - fantomas wrote:

On 2 Jan 2018, at 07:17, David Jones <djo...@ena.com> wrote:

I haven't redefined these rules from what I can tell by searching my
local rules.  I would think that if I had done this, then there would be
consistent non-hits of BAYES_99 with BAYES_999 all of the time.  This is
only happening a small percentage of the time.


On 02.01.18 15:39, @lbutlr wrote:
Checking my mail I see an incidence rate of this of about 0.5%, which 
matches the rate you posted earlier.


amavis?



I am seeing this problem on my MailScanner filters as well:

# grep BAYES_999 maillog-20171231 | wc -l
9172
# grep BAYES_999 maillog-20171231 | grep -v "BAYES_99 " | wc -l
4

I have a temporary fix in place with a meta rule:

meta     MISSING_BAYES_99  BAYES_999 && !BAYES_99
describe MISSING_BAYES_99  BAYES_99 should always hit with BAYES_999 but sometimes it doesn't.
score    MISSING_BAYES_99  4.2

--
David Jones


Re: Question about BAYES_999

2018-01-04 Thread Matus UHLAR - fantomas

On 2 Jan 2018, at 07:17, David Jones <djo...@ena.com> wrote:

I haven't redefined these rules from what I can tell by searching my
local rules.  I would think that if I had done this, then there would be
consistent non-hits of BAYES_99 with BAYES_999 all of the time.  This is
only happening a small percentage of the time.


On 02.01.18 15:39, @lbutlr wrote:

Checking my mail I see an incidence rate of this of about 0.5%, which matches 
the rate you posted earlier.


amavis?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The early bird may get the worm, but the second mouse gets the cheese.