Re: Malformed spam email gets through.

2018-01-01 Thread Pedro David Marco
 
> Also, can anyone suggest a nicely written rule, that triggers when an html 
> tag's text contains both upper and lower case letters?  Thanks. - Mark


Hi Mark and happy new year!

For short tags, a simple rule, ugly but very cheap, may work:
/Src|sRc|srC|.../ and so on. The number of variants grows as 2 to the power
of the number of letters, so it is not useful for long tags, but it is cheap
in terms of regex steps.
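
The cheap alternation can be generated instead of typed by hand. A minimal sketch in Python, where the function name mixed_case_variants is made up for illustration: it enumerates every upper/lower spelling of a tag and drops the two uniform ones.

```python
from itertools import product


def mixed_case_variants(tag: str) -> list[str]:
    """Return every spelling of `tag` that mixes upper and lower case."""
    choices = [(ch.lower(), ch.upper()) for ch in tag]
    spellings = {"".join(combo) for combo in product(*choices)}
    # Drop the all-lowercase and all-uppercase forms, which are legitimate.
    spellings -= {tag.lower(), tag.upper()}
    return sorted(spellings)


# Alternation that could be pasted into a rule's regex, here for "src".
alternation = "|".join(mixed_case_variants("src"))
```

A three-letter tag gives 2**3 - 2 = 6 variants and a four-letter tag gives 14, which is why this only stays practical for short names.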

A more elaborate regex...

The next rules are far from perfect but can detect  "something that looks like" 
mixed upper and lower case HTML tags in the pristine body. 

full       __MIXED_UPLOCASE_SRC   /(?=(?i:src))(?!src|SRC)...\s*=/
tflags     __MIXED_UPLOCASE_SRC   multiple maxhits=2
full       __MIXED_UPLOCASE_HREF  /(?=(?i:href))(?!href|HREF)....\s*=/
tflags     __MIXED_UPLOCASE_HREF  multiple maxhits=2
meta       MIX_UPLOCASE_HTAGS     __MIXED_UPLOCASE_SRC > 1 && __MIXED_UPLOCASE_HREF > 1
describe   MIX_UPLOCASE_HTAGS     MIX OF UPPER AND lower LETTERS in HTML TAGS
score      MIX_UPLOCASE_HTAGS     1
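
The lookahead trick in these rules can be tried outside SpamAssassin. A rough sketch in Python, whose re module accepts the same (?i:...) scoped flag; the function names are invented for illustration, and the href pattern assumes four dots to consume the attribute name before \s*=, mirroring the three dots in the src rule.

```python
import re

# The text must spell the attribute name case-insensitively (first lookahead)
# but must not be the all-lowercase or all-uppercase form (second lookahead).
MIXED_SRC = re.compile(r"(?=(?i:src))(?!src|SRC)...\s*=")
# Assumption: four dots consume "href" before \s*=, as "..." does for "src".
MIXED_HREF = re.compile(r"(?=(?i:href))(?!href|HREF)....\s*=")


def count_mixed_case_attrs(raw_message: str) -> dict[str, int]:
    """Count mixed-case src= and href= occurrences in the raw message text."""
    return {
        "src": len(MIXED_SRC.findall(raw_message)),
        "href": len(MIXED_HREF.findall(raw_message)),
    }


def looks_obfuscated(raw_message: str) -> bool:
    """Mirror the meta rule: each kind must occur more than once."""
    counts = count_mixed_case_attrs(raw_message)
    return counts["src"] > 1 and counts["href"] > 1
```

Unlike the rules, this sketch does not cap the count at two hits, which makes no difference to the "more than once" test.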

You can also check for invalid Base64 characters and invalid Base64 line 
length...  if all of them match... "Hasta luego Lucas"  or, as Rupert 
Gallagher says: easter eggs... :-)

hope they help you... 

-PedroD

  

Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


I have not seen such behavior in the past 20 years...

Intentionally re-using another site's MIDs is so wrong that I'd happily 
make it break hard.


HOWEVER, the idea of enforcing any standard on MIDs beyond gross format 
(e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the sole 
user is ludicrous.
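
For illustration, the gross-format check can be sketched in Python. This is an approximation under one assumption: Python's re module has no POSIX [[:ascii:]] class, so the ASCII range is spelled out, and the function name is hypothetical.

```python
import re

# Angle brackets around 3 to 996 ASCII characters, nothing more specific.
GROSS_MID = re.compile(r"<[\x00-\x7f]{3,996}>")


def mid_passes_gross_check(header_value: str) -> bool:
    """True if the Message-ID value has the gross <...> shape."""
    return GROSS_MID.fullmatch(header_value.strip()) is not None
```

Note that a value such as <no-at-sign-here> passes: the check deliberately says nothing about what sits between the brackets.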


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 10:33 (-0500), David Jones wrote:


On 01/01/2018 09:29 AM, Bill Cole wrote:

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


I have not seen such behavior in the past 20 years...



Ok.  I stand corrected then.  What about bounces?  Don't they 
intentionally keep all of the same headers with an empty 
envelope-from?


Nope. A modern standard 'bounce' message is a MIME entity with a special 
type, denoted by a header somewhat like this:


Content-Type: multipart/report; report-type=delivery-status;
  boundary="blah.foo.bar-baz/example.com"

It should have a unique MID, a Date header reflecting the time of the 
bounce, a Subject header like "Undelivered Mail Returned to Sender", a 
To header with the original message's envelope sender, a From header 
clearly identifying the last MTA to hold the message and its non-human 
nature such as 'mailer-dae...@example.com (Mail Delivery System)', and 
Received headers only reflecting the transit from that MTA to the target 
of the bounce.


One PART of a bounce is a message/rfc822 entity which has at least the 
headers of the original message and usually some or all of the body.
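
A small sketch of how that structure looks to a parser, using Python's standard email package. The helper names are made up for illustration, and it assumes a well-formed report; real bounces vary a lot.

```python
from email.message import Message
from typing import Optional


def is_delivery_status_report(msg: Message) -> bool:
    """True for a multipart/report with report-type=delivery-status."""
    report_type = msg.get_param("report-type")
    return (
        msg.get_content_type() == "multipart/report"
        and isinstance(report_type, str)
        and report_type.lower() == "delivery-status"
    )


def original_message_id(msg: Message) -> Optional[str]:
    """Message-ID of the original mail carried in the message/rfc822 part."""
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # The parser stores the embedded message as a one-element list.
            if isinstance(payload, list) and payload:
                return payload[0].get("Message-ID")
    return None
```

The bounce's own Message-ID header (msg["Message-ID"]) and the value returned by original_message_id() are expected to differ, since the original MID only survives inside the attached part.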


--
Bill Cole


Question about BAYES_999

2018-01-01 Thread David Jones
I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
BAYES_999 default score of 0.2, I thought that it was always supposed to 
complement the BAYES_99 rule and both would trigger when BAYES_999 hit.


https://pastebin.com/QsVgXwdC

If they are independent, then it would seem logical to bump up the 
default score higher than BAYES_99.


--
David Jones



Re: Malformed spam email gets through.

2018-01-01 Thread Matus UHLAR - fantomas

On 01/01/2018 01:30 PM, Alan Hodgson wrote:
I've had good success junking anything with one of my domains in 
the message-id, where I know the mail isn't actually from someone 
in that domain. That's a pretty solid spam signature.


are you sure it's not your mailservers adding Message-Id to the
incoming mail?


On 01.01.18 14:01, David Jones wrote:
I too have seen spam with my own domain in the Message-ID but I 
combined it with a meta rule of !ALL_TRUSTED to be safe.  You are 
correct.  This is a good indicator of spam but each person is going 
to have to create this local rule unless someone wants to write a 
plugin that can detect this dynamically.


I've had problems with a similar rule when I send mail directly from one of
my mailservers. I had to replace it with !ALL_TRUSTED && !NO_RELAYS,
just FYI.
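
The decision logic of such a local rule, sketched in Python rather than as SpamAssassin config. Everything here is illustrative: the function names and example.com are placeholders, and all_trusted / no_relays stand in for whether the ALL_TRUSTED and NO_RELAYS rules hit.

```python
def mid_claims_local_domain(message_id: str, local_domains: set[str]) -> bool:
    """True if the right-hand side of the Message-ID is one of our domains."""
    left, sep, right = message_id.strip().strip("<>").rpartition("@")
    if not sep:
        return False  # no "@" at all, so no domain is claimed
    right = right.lower()
    return any(right == d or right.endswith("." + d) for d in local_domains)


def looks_forged(message_id: str, local_domains: set[str],
                 all_trusted: bool, no_relays: bool) -> bool:
    """Own domain in the MID, but the mail did not originate locally."""
    return (
        mid_claims_local_domain(message_id, local_domains)
        and not all_trusted
        and not no_relays
    )
```

The no_relays guard covers mail generated on the scanning host itself, which is the case described above.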

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
A day without sunshine is like, night.


Re: Question about BAYES_999

2018-01-01 Thread David Jones

On 01/01/2018 06:52 PM, David Jones wrote:

On 01/01/2018 06:47 PM, Reindl Harald wrote:



On 02.01.2018 at 01:18, David Jones wrote:
I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
BAYES_999 default score of 0.2, I thought that it was always supposed 
to complement the BAYES_99 rule and both would trigger when BAYES_999 
hit.


https://pastebin.com/QsVgXwdC

If they are independent, then it would seem logical to bump up the 
default score higher than BAYES_99


never ever seen that and since bayes is based on a number between 0 
and 1 this should be technically impossible at all


with BAYES_00 that message has [score: 0.0003]



I checked my logs and I am seeing both together when BAYES_999 hits 
except for a few times.  Is this a bug?  Should I open a bug issue?  I 
am not sure how to reproduce the problem unless others also see the same 
thing with that message.




Sorry.  Not thinking clearly.  Others would have to have the same Bayes 
DB to get that message to do the same thing.  I was able to reproduce 
the same results on another SA platform running MailScanner using the 
same Bayes DB in redis.


If others could check their mail logs to see if they are hitting 
BAYES_999 without BAYES_99 on the same message, please let me know.


--
David Jones


Re: Question about BAYES_999

2018-01-01 Thread David Jones

On 01/01/2018 06:47 PM, Reindl Harald wrote:



On 02.01.2018 at 01:18, David Jones wrote:
I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
BAYES_999 default score of 0.2, I thought that it was always supposed 
to complement the BAYES_99 rule and both would trigger when BAYES_999 
hit.


https://pastebin.com/QsVgXwdC

If they are independent, then it would seem logical to bump up the 
default score higher than BAYES_99


never ever seen that and since bayes is based on a number between 0 and 
1 this should be technically impossible at all


with BAYES_00 that message has [score: 0.0003]



I checked my logs and I am seeing both together when BAYES_999 hits 
except for a few times.  Is this a bug?  Should I open a bug issue?  I 
am not sure how to reproduce the problem unless others also see the same 
thing with that message.


--
David Jones


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 12:47 (-0500), Matus UHLAR - fantomas wrote:


On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote:
the gross format in RFCs 822, 2822 and 5322 describes message-id consisting
of local and domain part, thus it must contain "@".


On 01.01.18 12:17, Bill Cole wrote:
No, it does not. Re-read the cited sections. From RFC5322, the ABNF 
definition:


  msg-id  =   [CFWS] "<" id-left "@" id-right ">" [CFWS]


this is the part that says message-id must consist of local and domain
parts. It just says it implicitly, not explicitly, but:

It's not possible to construct Message-Id without the "@" while conforming
to any of mentioned RFCs.


True, but one could just as easily split up a UUID with '@' instead of 
'-' and comply while being as sure of uniqueness as could ever matter. 
Or put full UUIDs on both sides of the '@'. If a V1 UUID is on the 
right, it is even a host-unique identifier after a fashion.
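
Both variants are easy to sketch with Python's standard uuid module. The function names are invented for illustration; nothing here is taken from an actual MTA.

```python
import uuid


def uuid_split_mid() -> str:
    """One random UUID with its first '-' replaced by '@'."""
    return "<" + str(uuid.uuid4()).replace("-", "@", 1) + ">"


def double_uuid_mid() -> str:
    """Random UUID on the left, version 1 (time and node based) on the right."""
    return f"<{uuid.uuid4()}@{uuid.uuid1()}>"
```

Hex digits and hyphens are all legal in a dot-atom-text, so both results fit the msg-id grammar while naming no mail domain at all.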


Also note that if you demand that MIDs contain '@' with conforming 
strings on both sides, you risk losing mail that users want. This is 
a mistake I have made.


what exactly was the problem? Message-Id without the "@" or the
non-conforming parts there?


Missing '@'

Some messages lacking it were generated by antique systems that had 
proven themselves resistant to evolutionary pressures.


--
Bill Cole


Re: Malformed spam email gets through.

2018-01-01 Thread Alan Hodgson
On Mon, 2018-01-01 at 10:29 -0500, Bill Cole wrote:
> On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:
> 
> > I think some mail systems will keep the same message-ID per email 
> > thread so your system must reject some replies.
> 
> I have not seen such behavior in the past 20 years...
> 
> Intentionally re-using another site's MIDs is so wrong that I'd
> happily 
> make it break hard.
> 
> HOWEVER, the idea of enforcing any standard on MIDs beyond gross
> format 
> (e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the
> sole 
> user is ludicrous.

I've had good success junking anything with one of my domains in the
message-id, where I know the mail isn't actually from someone in that
domain. That's a pretty solid spam signature.

Lack of any message-id is also significant, but sadly there are still
some real senders sending mail with no message-id.

Re: Malformed spam email gets through.

2018-01-01 Thread David Jones

On 01/01/2018 01:30 PM, Alan Hodgson wrote:

On Mon, 2018-01-01 at 10:29 -0500, Bill Cole wrote:

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies. 



I have not seen such behavior in the past 20 years...

Intentionally re-using another site's MIDs is so wrong that I'd happily
make it break hard.

HOWEVER, the idea of enforcing any standard on MIDs beyond gross format
(e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the sole
user is ludicrous.


I've had good success junking anything with one of my domains in the 
message-id, where I know the mail isn't actually from someone in that 
domain. That's a pretty solid spam signature.




I too have seen spam with my own domain in the Message-ID but I combined 
it with a meta rule of !ALL_TRUSTED to be safe.  You are correct.  This 
is a good indicator of spam but each person is going to have to create 
this local rule unless someone wants to write a plugin that can detect 
this dynamically.


Lack of any message-id is also significant, but sadly there are still 
some real senders sending mail with no message-id.


--
David Jones


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 14:30 (-0500), Alan Hodgson wrote:


On Mon, 2018-01-01 at 10:29 -0500, Bill Cole wrote:

[...]

HOWEVER, the idea of enforcing any standard on MIDs beyond gross
format 
(e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the
sole 
user is ludicrous.


I've had good success junking anything with one of my domains in the
message-id, where I know the mail isn't actually from someone in that
domain. That's a pretty solid spam signature.


Yes, I was a bit imprecise. Very specific idiosyncratic MID patterns can 
be extremely accurate spam indicators. Enforcement of RFC or common 
practice "standards" is riskier than it is worth.



Lack of any message-id is also significant, but sadly there are still
some real senders sending mail with no message-id.


Yes. It's one of the most annoying persistent sorts of mail sloppiness.

--
Bill Cole


Re: Question about BAYES_999

2018-01-01 Thread David Jones

On 01/01/2018 07:08 PM, Reindl Harald wrote:



On 02.01.2018 at 01:59, David Jones wrote:

On 01/01/2018 06:52 PM, David Jones wrote:

On 01/01/2018 06:47 PM, Reindl Harald wrote:


On 02.01.2018 at 01:18, David Jones wrote:
I just had a spam message hit BAYES_999 but not BAYES_99.  Based on 
BAYES_999 default score of 0.2, I thought that it was always 
supposed to complement the BAYES_99 rule and both would trigger 
when BAYES_999 hit.


https://pastebin.com/QsVgXwdC

If they are independent, then it would seem logical to bump up the 
default score higher than BAYES_99


never ever seen that and since bayes is based on a number between 0 
and 1 this should be technically impossible at all


with BAYES_00 that message has [score: 0.0003]



I checked my logs and I am seeing both together when BAYES_999 hits 
except for a few times.  Is this a bug?  Should I open a bug issue?  
I am not sure how to reproduce the problem unless others also see the 
same thing with that message.




Sorry.  Not thinking clearly.  Others would have to have the same 
Bayes DB to get that message to do the same thing.  I was able to 
reproduce the same results on another SA platform running MailScanner 
using the same Bayes DB in redis.


If others could check their mail logs to see if they are hitting 
BAYES_999 without BAYES_99 on the same message, please let me know


[sa-milt@mail-gw:/var/log]$ xzcat maillog-2017*.xz | grep "BAYES_999," | wc -l
9125
[sa-milt@mail-gw:/var/log]$ xzcat maillog-2017*.xz | grep "BAYES_999," | grep "BAYES_99," | wc -l
9125
[sa-milt@mail-gw:/var/log]$ xzcat maillog-2017*.xz | grep "BAYES_999," | grep -v "BAYES_99," | wc -l
0



Since yesterday morning:

# grep "BAYES_999=" /var/log/maillog | grep "BAYES_99=" | wc -l
8006
# grep "BAYES_999=" /var/log/maillog | wc -l
8092
# grep "BAYES_999=" /var/log/maillog | grep -v "BAYES_99=" | wc -l
86

Last week:

# grep "BAYES_999=" /var/log/maillog-20171231 | grep "BAYES_99=" | wc -l
43753
# grep "BAYES_999=" /var/log/maillog-20171231 | wc -l
44108
# grep "BAYES_999=" /var/log/maillog-20171231 | grep -v "BAYES_99=" | wc -l
355
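
For anyone who wants to run the same comparison without depending on whether their log separates rule names with "=" or ",", here is a rough Python equivalent of the grep pipelines. The function name is hypothetical, and it assumes each scanned message is reported on a single log line.

```python
import re
from typing import Iterable

# Whole rule names only, so "BAYES_99" never matches inside "BAYES_999".
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")


def bayes_overlap(lines: Iterable[str]) -> dict[str, int]:
    """Count BAYES_999 hits with and without an accompanying BAYES_99."""
    both = only_999 = 0
    for line in lines:
        rules = set(BAYES_RULE.findall(line))
        if "BAYES_999" in rules:
            if "BAYES_99" in rules:
                both += 1
            else:
                only_999 += 1
    return {"both": both, "only_999": only_999}
```

It can be fed an open log file directly, for example bayes_overlap(open("/var/log/maillog")). On the figures above, 86 of 8092 and 355 of 44108 BAYES_999 hits came without BAYES_99, roughly 1% in both periods.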

--
David Jones


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote:

the gross format in RFCs 822, 2822 and 5322 describes message-id consisting
of local and domain part, thus it must contain "@".


No, it does not. Re-read the cited sections. From RFC5322, the ABNF 
definition:


   msg-id  =   [CFWS] "<" id-left "@" id-right ">" [CFWS]

   id-left =   dot-atom-text / obs-id-left

   id-right=   dot-atom-text / no-fold-literal / obs-id-right

   no-fold-literal =   "[" *dtext "]"

Note the lack of specification of "local" and "domain" parts.
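
To make the grammar concrete, here is an approximate translation of the non-obsolete part of that ABNF into a Python regex. It is a sketch only: CFWS and the obs-id-left / obs-id-right alternatives are deliberately left out, and the constant names are my own.

```python
import re

# atext from RFC 5322: letters, digits and a fixed set of punctuation.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# dtext: printable US-ASCII except "[", "]" and "\".
DTEXT = r"[\x21-\x5a\x5e-\x7e]"
NO_FOLD_LITERAL = rf"\[{DTEXT}*\]"

MSG_ID = re.compile(rf"<{DOT_ATOM_TEXT}@(?:{DOT_ATOM_TEXT}|{NO_FOLD_LITERAL})>")


def is_rfc5322_msg_id(value: str) -> bool:
    """True if value matches the current (non-obsolete) msg-id syntax."""
    return MSG_ID.fullmatch(value) is not None
```

A value like <123e4567@e89b12d3a456> matches, which illustrates the point: the grammar constrains characters and the position of "@", not whether either side names a mailbox or a host.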

Also note that if you demand that MIDs contain '@' with conforming 
strings on both sides, you risk losing mail that users want. This is a 
mistake I have made.


--
Bill Cole


Re: Malformed spam email gets through.

2018-01-01 Thread David Jones

On 01/01/2018 09:29 AM, Bill Cole wrote:

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


I have not seen such behavior in the past 20 years...



Ok.  I stand corrected then.  What about bounces?  Don't they 
intentionally keep all of the same headers with an empty envelope-from?



Intentionally re-using another site's MIDs is so wrong that I'd happily 
make it break hard.


HOWEVER, the idea of enforcing any standard on MIDs beyond gross format 
(e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the sole 
user is ludicrous.




--
David Jones


Re: Malformed spam email gets through.

2018-01-01 Thread Benny Pedersen

David Jones wrote on 2018-01-01 15:59:


There is no way that most of us on this mailing list can be as strict
or our customers would complain constantly about missing email.


Postfix adds an RFC message-id to mails that don't follow the RFCs, so the 
first MTA (Postfix here) hides the MUA's fault of not following the RFCs. I 
don't know how other MTAs help spammers like this.


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 3:54 (-0500), Rupert Gallagher wrote:

We reject anything whose mid does not include the fqdn or address 
literal of their sending server. We do this because the RFC says 
explicitly that the mid *MUST* have those features.


This is a blatant falsehood. Relevant RFCs:

https://tools.ietf.org/html/rfc5322#section-3.6.4
https://tools.ietf.org/html/rfc2822#section-3.6.4
https://tools.ietf.org/html/rfc822#section-4.6

The only "MUST" in regard to MID content in any of those is uniqueness. 
Use of a domain identifier is merely RECOMMENDED.


Beyond that, it is *IMPOSSIBLE* for a receiving system to reliably 
determine whether the right-hand part of a MID is a valid host or domain 
identifier for the generator of the MID.


--
Bill Cole


Re: Malformed spam email gets through.

2018-01-01 Thread David Jones

On 01/01/2018 09:33 AM, David Jones wrote:

On 01/01/2018 09:29 AM, Bill Cole wrote:

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


I have not seen such behavior in the past 20 years...



Ok.  I stand corrected then.  What about bounces?  Don't they 
intentionally keep all of the same headers with an empty envelope-from?




Answering myself.  No.  I checked a few and the Message-ID is generated 
new on bounces too.  NM  Ignore me ...  :)  I was thinking of something 
else related to email archiving that dedupes based on the Message-ID.


Happy New Year!



Intentionally re-using another site's MIDs is so wrong that I'd 
happily make it break hard.


HOWEVER, the idea of enforcing any standard on MIDs beyond gross 
format (e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't 
the sole user is ludicrous.






--
David Jones



Re: Malformed spam email gets through.

2018-01-01 Thread David Jones

On 01/01/2018 02:54 AM, Rupert Gallagher wrote:
We reject anything whose mid does not include the fqdn or address 
literal of their sending server. We do this because the RFC says 
explicitly that the mid *MUST* have those features. We write exceptions 
for those few senders who are legitimate but have lazy and 
incompetent sysadmins.


On Mon, Jan 1, 2018 at 00:15, Mark London wrote:


Message-ID: 


Wow!  You must not have any spam problems because you don't accept much 
email -- ham or spam.  :)  I think some mail systems will keep the same 
message-ID per email thread so your system must reject some replies.


There is no way that most of us on this mailing list can be as strict or 
our customers would complain constantly about missing email.


--
David Jones


Re: Malformed spam email gets through.

2018-01-01 Thread Matus UHLAR - fantomas

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


On 01.01.18 10:29, Bill Cole wrote:

I have not seen such behavior in the past 20 years...

Intentionally re-using another site's MIDs is so wrong that I'd 
happily make it break hard.


HOWEVER, the idea of enforcing any standard on MIDs beyond gross 
format (e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't 
the sole user is ludicrous.


the gross format in RFCs 822, 2822 and 5322 describes message-id consisting
of local and domain part, thus it must contain "@".

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Due to unexpected conditions Windows 2000 will be released
in first quarter of year 1901


Re: Malformed spam email gets through.

2018-01-01 Thread @lbutlr
On 1 Jan 2018, at 09:41, Matus UHLAR - fantomas  wrote:
> the gross format in RFCs 822,2822 and 5322 describes message-id consisting
> of local and domain part,

You are misreading the RFC.

The Message-ID itself is a *should* and there is no MUST in any of the 
description of the construction of the Message-ID, only that it MUST be 
globally unique.

5322 specifically states: "Though other algorithms will work, it is RECOMMENDED 
that the right-hand side contain some domain identifier (either of the host 
itself or otherwise) such that the generator of the message identifier can 
guarantee the uniqueness of the left-hand side within the scope of that domain."

There is no requirement to include a local and domain part in any part of a 
Message-ID.

A 256-bit value would be unique to some significant fraction of the atoms in the 
universe. I'd posit that meets any reasonable definition of "must be globally 
unique."

But, in practice, the simplest way to guarantee uniqueness is to generate a 
timestamp and add it to a domain/IP/local ID.
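
As one concrete instance of that approach, Python's standard library ships a generator that combines a timestamp, the process ID and random bits with a domain. The domain below is a placeholder.

```python
from email.utils import make_msgid

# mail.example.com is a placeholder; without the domain argument the
# function falls back to the local host name.
new_mid = make_msgid(domain="mail.example.com")
```

The result has the usual <left@domain> shape, with the uniqueness carried entirely by the left-hand side.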

-- 
"We take off our Republican hats and put on our American hats" -- Many 
Republicans in Sep 2008



Re: Malformed spam email gets through.

2018-01-01 Thread Rupert Gallagher
We reject anything whose mid does not include the fqdn or address literal of 
their sending server. We do this because the RFC says explicitly that the mid 
*MUST* have those features. We write exceptions for those few senders who are 
legitimate but have lazy and incompetent sysadmins.

On Mon, Jan 1, 2018 at 00:15, Mark London  wrote:

Message-ID: 

Re: Malformed spam email gets through.

2018-01-01 Thread Matus UHLAR - fantomas

On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote:
the gross format in RFCs 822, 2822 and 5322 describes message-id consisting
of local and domain part, thus it must contain "@".


On 01.01.18 12:17, Bill Cole wrote:
No, it does not. Re-read the cited sections. From RFC5322, the ABNF 
definition:


  msg-id  =   [CFWS] "<" id-left "@" id-right ">" [CFWS]


this is the part that says message-id must consist of local and domain
parts. It just says it implicitly, not explicitly, but:

It's not possible to construct Message-Id without the "@" while conforming
to any of mentioned RFCs.

Also note that if you demand that MIDs contain '@' with conforming 
strings on both sides, you risk losing mail that users want. This is 
a mistake I have made.


what exactly was the problem? Message-Id without the "@" or the
non-conforming parts there?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Your mouse has moved. Windows NT will now restart for changes
to take effect. [OK]