Re: Malformed spam email gets through.

2018-01-04 Thread Bill Cole

On 4 Jan 2018, at 21:13 (-0500), @lbutlr wrote:

On 4 Jan 2018, at 11:47, Bill Cole 
 wrote:

On 3 Jan 2018, at 15:42, @lbutlr wrote:
There is no requirement that the right side be globally unique, just 
that the entire message ID is globally unique.


Right. And any software that can use localhost (or any other 
unqualified name whose meaning is contextually variable) as the right 
hand side is likely to be doing so on multiple machines that don't 
know about each other and so generally cannot know that they are not 
generating duplicate MIDs.


Sure, but depending on how the MID is generated it can certainly be 
statistically unique. As I said earlier, it only takes 256 bits to get 
an ID within spitting distance of the number of atoms in the universe. 
Should be unique enough.


Not even that. A standard UUID has 122 bits of entropy. To *probably* 
have *one* collision in that space, you'd need to generate 1 billion 
UUIDs per second for about 85 years. That should be good enough for 
naming unique things including email messages for as long as any one 
person cares, but if you want it more solid you could put a UUID of one 
of the node-specific versions on one side of the @ and a random UUID on 
the other: 244 bits, won't collide in any space-time region visible to 
one observer.
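
The 85-year figure can be checked with the usual birthday approximation. A minimal Python sketch, assuming the 122 random bits are uniformly distributed and that "probably" means a 50% chance of at least one collision (both are assumptions, not statements from the message):

```python
import math

def ids_for_collision_probability(bits: int, p: float = 0.5) -> float:
    """Approximate number of random IDs needed for collision probability p."""
    # Birthday approximation: n ~ sqrt(2 * N * ln(1 / (1 - p))), with N = 2**bits.
    return math.sqrt(2.0 * (2.0 ** bits) * math.log(1.0 / (1.0 - p)))

SECONDS_PER_YEAR = 365.25 * 24 * 3600
RATE_PER_SECOND = 1e9  # one billion UUIDs per second, as in the message above

n = ids_for_collision_probability(122)
years = n / RATE_PER_SECOND / SECONDS_PER_YEAR
print(f"{n:.2e} UUIDs, roughly {years:.0f} years")
```

Under those assumptions the estimate lands in the mid-80s of years, in line with the figure quoted above.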


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-04 Thread @lbutlr
On 4 Jan 2018, at 11:47, Bill Cole  
wrote:
> On 3 Jan 2018, at 15:42, @lbutlr wrote:
>> There is no requirement that the right side be globally unique, just that 
>> the entire message ID is globally unique.
> 
> Right. And any software that can use localhost (or any other unqualified name 
> whose meaning is contextually variable) as the right hand side is likely to 
> be doing so on multiple machines that don't know about each other and so 
> generally cannot know that they are not generating duplicate MIDs.

Sure, but depending on how the MID is generated it can certainly be 
statistically unique. As I said earlier, it only takes 256 bits to get an ID 
within spitting distance of the number of atoms in the universe. Should be 
unique enough.

> The reason for the RHS=FQDN tradition is to establish a namespace for each 
> domain whereby global uniqueness can be guaranteed deterministically.

OH, I absolutely agree that using the domain for the RHS is a great idea, and 
there's really no reason not to. But there are other ways.

>>> An additional ~1% has a MID header with either no dots or no '@'.
>> 
>> Dots are irrelevant, but the way I read the RFC, ‘@‘ is required.
> 
> See the message I was responding to, which asked about the feasibility of 
> enforcing a "valid domain" rule. For that, dots are absolutely relevant. My 
> point, in short, is that doing so may result in 2 orders of magnitude more 
> rejection of wanted mail than most sites would deem tolerable.

Yep. Requiring MIDs to conform to out-of-spec requirements is sure to cause 
trouble.


-- 
You and me Sunday driving Not arriving



Re: Malformed spam email gets through.

2018-01-04 Thread Bill Cole

On 3 Jan 2018, at 15:42, @lbutlr wrote:
[...]


On 03 Jan 2018, at 12:36, Bill Cole 
 wrote:
About 1.5% of my personal non-spam email over the past 20 years has 
had "localhost" as the right hand side of the MID. This implies a de 
facto RFC violation because it poses a real risk of duplication.


There is no requirement that the right side be globally unique, just 
that the entire message ID is globally unique.


Right. And any software that can use localhost (or any other unqualified 
name whose meaning is contextually variable) as the right hand side is 
likely to be doing so on multiple machines that don't know about each 
other and so generally cannot know that they are not generating 
duplicate MIDs. The reason for the RHS=FQDN tradition is to establish a 
namespace for each domain whereby global uniqueness can be guaranteed 
deterministically.



An additional ~1% has a MID header with either no dots or no '@'.


Dots are irrelevant, but the way I read the RFC, ‘@‘ is required.


See the message I was responding to, which asked about the feasibility 
of enforcing a "valid domain" rule. For that, dots are absolutely 
relevant. My point, in short, is that doing so may result in 2 orders of 
magnitude more rejection of wanted mail than most sites would deem 
tolerable.


Re: Malformed spam email gets through.

2018-01-03 Thread @lbutlr
On 03 Jan 2018, at 04:57, Matus UHLAR - fantomas  wrote:
> while it's "only" recommended that the right part is a domain name,
> there must be a right part.

Yes, there must be a left and a right and an ‘@‘ in-between.

On 03 Jan 2018, at 12:36, Bill Cole  
wrote:
> About 1.5% of my personal non-spam email over the past 20 years has had 
> "localhost" as the right hand side of the MID. This implies a de facto RFC 
> violation because it poses a real risk of duplication.

There is no requirement that the right side be globally unique, just that the 
entire message ID is globally unique.

> An additional ~1% has a MID header with either no dots or no '@'.

Dots are irrelevant, but the way I read the RFC, ‘@‘ is required.

-- 
No Sigs. Blame Apple.



Re: Malformed spam email gets through.

2018-01-03 Thread Ian Zimmerman
On 2018-01-03 14:36, Bill Cole wrote:

> I have run an environment where each MTA node in the external gateway
> layer would add a MID with its own FQDN to any message passing through
> missing a MID. Those names could not be resolved in the world at
> large, but they were absolutely valid and guaranteed unique.

This is what I do with my personal outgoing messages.  Free 3rd level
DNs are available at freedns.org and I use a bogus (from the DNS POV)
4th level name under one of those, distinct for each host, as the RHS in
my Message-ID.  There's no good reason to use "localhost" or
"localdomain".

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet, fetch the TXT record for the domain.


Re: Malformed spam email gets through.

2018-01-03 Thread Bill Cole

On 2 Jan 2018, at 20:39, Alex wrote:

Is it possible to at least enforce that the message-ID has a valid 
domain?


Not reliably.

About 1.5% of my personal non-spam email over the past 20 years has had 
"localhost" as the right hand side of the MID. This implies a de facto 
RFC violation because it poses a real risk of duplication.


An additional ~1% has a MID header with either no dots or no '@'. This 
includes mail from Facebook, Seagate, Apple, one of my credit unions, a 
medical supply house that we buy from for my son's care, GMX (German 
freemail provider), multiple regulars on a private mailing list of 
old-timer anti-spam nutcases, the postmaster of LinkedIn sending 
personal mail with his linkedin.com address via GMail, iFixit, Verizon's 
SMS->Email gateway, and multiple ESPs including Eloqua and Digital 
River. At least one recent version of CommuniGate Pro (6.1.2) generated 
event invitations with a bare UUID as the MID.


In other words: a significant number of messages, largely legitimate 
transactional messages, lack a FQDN in the MID.
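
Anyone wanting to measure this on their own mail could bucket Message-IDs by the shape of the right-hand side. A rough Python sketch; the bucket names and the `classify_mid` and `tally` helpers are made up for illustration, and a "dotted" result only means the right side contains a dot, not that it is a resolvable domain:

```python
from collections import Counter

def classify_mid(mid: str) -> str:
    """Bucket a Message-ID by what its right-hand side looks like."""
    value = mid.strip().lstrip("<").rstrip(">")
    if "@" not in value:
        return "no-at"          # e.g. a bare UUID
    rhs = value.rsplit("@", 1)[1].lower()
    if rhs.startswith("[") and rhs.endswith("]"):
        return "literal"        # bracketed literal such as [192.0.2.1]
    if rhs == "localhost":
        return "localhost"
    if "." not in rhs:
        return "no-dots"        # unqualified name
    return "dotted"             # has dots; says nothing about DNS validity

def tally(mids):
    """Count how many Message-IDs fall into each bucket."""
    return Counter(classify_mid(m) for m in mids)

print(tally([
    "<5b974eb73ed9c2d1b630f4b600191771@zfimvuyb.gwbba>",
    "<abc123@localhost>",
    "<123e4567-e89b-12d3-a456-426614174000>",
]))
```

Note that the spam sample from this thread lands in "dotted", which is one reason a dot check alone says little about whether the domain is real.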


I have run an environment where each MTA node in the external gateway 
layer would add a MID with its own FQDN to any message passing through 
missing a MID. Those names could not be resolved in the world at large, 
but they were absolutely valid and guaranteed unique.
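
For a generator written in Python, the standard library's `email.utils.make_msgid` can produce IDs of this kind. A minimal sketch, assuming a hypothetical per-host name (`gw1.mail.internal.example`) that need not resolve publicly; it is not the configuration of the environment described above:

```python
from email.utils import make_msgid

# Hypothetical per-host name; uniqueness within the operator's own
# namespace is what matters, not public DNS resolution.
HOST_ID_DOMAIN = "gw1.mail.internal.example"

def new_message_id(domain: str = HOST_ID_DOMAIN) -> str:
    """Return a Message-ID of the form <unique-left@domain>."""
    return make_msgid(domain=domain)

mid = new_message_id()
print(mid)
```

The left side combines a timestamp, the process ID, and random bits, so uniqueness then rests on each host using a distinct name on the right.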


Re: Malformed spam email gets through.

2018-01-03 Thread Matus UHLAR - fantomas

On 1 Jan 2018, at 10:47, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:



On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote:

the gross format in RFCs 822, 2822 and 5322 describes message-id consisting
of local and domain part, thus it must contain "@".


On 01.01.18 12:17, Bill Cole wrote:

No, it does not. Re-read the cited sections. From RFC5322, the ABNF definition:

 msg-id  =   [CFWS] "<" id-left "@" id-right ">" [CFWS]


this is the part that says message-id must consist of local and domain
parts.


On 02.01.18 13:44, @lbutlr wrote:

No, it doesn't say anything like that.


ok, let's rephrase that: it says that the message-id consists of two parts
and the "@" between them.


As I already posted:


5322 specifically states: "Though other algorithms will work, it is
RECOMMENDED that the right-hand side contain some domain identifier
(either of the host itself or otherwise) such that the generator of the
message identifier can guarantee the uniqueness of the left-hand side
within the scope of that domain."

There is no requirement to include a local and domain part in any part of a 
Message-ID.


while it's "only" recommended that the right part is a domain name,
there must be a right part.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
You have the right to remain silent. Anything you say will be misquoted,
then used against you. 


Re: Malformed spam email gets through.

2018-01-03 Thread Antony Stone
On Wednesday 03 January 2018 at 02:39:54, Alex wrote:

> Hi,
> 
> Is it possible to at least enforce that the message-ID has a valid domain?

If by "enforce" you mean "require" (in other words, you look at whatever 
message-ID the incoming email has, and you decide that if it doesn't contain a 
valid domain, then it is suspicious), then yes, you can.

However, this requirement is not stipulated by current RFCs, therefore you may 
well be falsely marking legitimate email.

Only a check of the incoming mail you receive, to see whether "message ID 
contains no valid domain" is a reliable indicator of spam, can tell you 
whether it's a good idea to do this on your mail filtering.

The example quoted below is entirely RFC-conformant.


Antony.

> Received: from thomas-krueger.local
> (221.208.196.104.bc.googleusercontent.com. [104.196.208.221])
> by smtp-relay.gmail.com with ESMTPS id
> r16sm1186220uai.7.2017.12.28.18.04.13
> for 
> (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
> Thu, 28 Dec 2017 18:04:14 -0800 (PST)
> X-Relaying-Domain: janda02.com
> Message-ID: <5b974eb73ed9c2d1b630f4b600191771@zfimvuyb.gwbba>
> From: "Apple Store" 
> To: 
> 
> On Tue, Jan 2, 2018 at 5:41 PM, @lbutlr  wrote:
> > On 2 Jan 2018, at 04:26, Rupert Gallagher <r...@protonmail.com> wrote:
> >> Note taken. We still abide by the duties and recommendations, and expect
> >> well-behaved servers to do the same, by identifying themselves. We
> >> cross-check, and if they lie, we block them.
> > 
> > rejecting because they spoof a domain in the MID is one thing. Rejecting
> > an email because you misunderstood the RFC and don't see a valid domain
> > name is an entirely different thing.

-- 
"I estimate there's a world market for about five computers."

 - Thomas J Watson, Chairman of IBM

   Please reply to the list;
 please *don't* CC me.


Re: Malformed spam email gets through.

2018-01-02 Thread Alex
Hi,

Is it possible to at least enforce that the message-ID has a valid domain?

Received: from thomas-krueger.local
(221.208.196.104.bc.googleusercontent.com. [104.196.208.221])
by smtp-relay.gmail.com with ESMTPS id
r16sm1186220uai.7.2017.12.28.18.04.13
for 
(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
Thu, 28 Dec 2017 18:04:14 -0800 (PST)
X-Relaying-Domain: janda02.com
Message-ID: <5b974eb73ed9c2d1b630f4b600191771@zfimvuyb.gwbba>
From: "Apple Store" 
To: 



On Tue, Jan 2, 2018 at 5:41 PM, @lbutlr  wrote:
> On 2 Jan 2018, at 04:26, Rupert Gallagher <r...@protonmail.com> wrote:
>> Note taken. We still abide by the duties and recommendations, and expect 
>> well-behaved servers to do the same, by identifying themselves. We cross-check, 
>> and if they lie, we block them.
>
> rejecting because they spoof a domain in the MID is one thing. Rejecting an 
> email because you misunderstood the RFC and don't see a valid domain name is 
> an entirely different thing.
>
>
> --
> And, while it was regarded as pretty good evidence of criminality to be
> living in a slum, for some reason owning a whole street of them merely
> got you invited to the very best social occasions.
>


Re: Malformed spam email gets through.

2018-01-02 Thread @lbutlr
On 2 Jan 2018, at 04:26, Rupert Gallagher <r...@protonmail.com> wrote:
> Note taken. We still abide by the duties and recommendations, and expect 
> well-behaved servers to do the same, by identifying themselves. We cross-check, 
> and if they lie, we block them. 

rejecting because they spoof a domain in the MID is one thing. Rejecting an 
email because you misunderstood the RFC and don't see a valid domain name is an 
entirely different thing.


-- 
And, while it was regarded as pretty good evidence of criminality to be
living in a slum, for some reason owning a whole street of them merely
got you invited to the very best social occasions.



Re: Malformed spam email gets through.

2018-01-02 Thread @lbutlr
On 2 Jan 2018, at 03:12, Rupert Gallagher  wrote:
> RFC 822, pg. 30, section 6.2.3

Which is "Obsoleted by: 2822" which is "Obsoleted by: 5322"

So, please find the description in RFC 5322. Helpfully, I've posted it twice in 
this thread.

-- 
You know, Calculus is sort of like measles. Once you've had it, you
probably won't get it again, and you're glad of it. -- W. Carr



Re: Malformed spam email gets through.

2018-01-02 Thread @lbutlr
On 1 Jan 2018, at 10:47, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
> 
>> On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote:
>>> the gross format in RFCs 822, 2822 and 5322 describes message-id consisting
>>> of local and domain part, thus it must contain "@".
> 
> On 01.01.18 12:17, Bill Cole wrote:
>> No, it does not. Re-read the cited sections. From RFC5322, the ABNF 
>> definition:
>> 
>>  msg-id  =   [CFWS] "<" id-left "@" id-right ">" [CFWS]
> 
> this is the part that says message-id must consist of local and domain
> parts.

No, it doesn't say anything like that.

As I already posted:

> 5322 specifically states: "Though other algorithms will work, it is 
> RECOMMENDED that the right-hand side contain some domain identifier (either 
> of the host itself or otherwise) such that the generator of the message 
> identifier can guarantee the uniqueness of the left-hand side within the 
> scope of that domain."
> 
> There is no requirement to include a local and domain part in any part of a 
> Message-ID.

-- 
'How come you know all that stuff?' 'I ain't just a pretty face.' 'You
aren't even a pretty face, Gaspode.'



Re: Malformed spam email gets through.

2018-01-02 Thread Bill Cole

On 2 Jan 2018, at 5:12 (-0500), Rupert Gallagher wrote:


This is the normative reference.


This is the OBSOLETED normative reference.


RFC 822, pg. 30, section 6.2.3
--
msg-id = "<" addr-spec ">";
addr-spec = local-part "@" domain;
domain = sub-domain *("." sub-domain);
sub-domain = domain-ref / domain-literal;

>


Note that the "@" must also be present as part of the 
well-formed-formula.

When absent, the string is not well formed, and a syntax error occurs.


The change of formal syntax in RFC2822 to remove the reference to domain 
entities was not inadvertent or surreptitious. RFC5322 didn't reverse 
that change.




RFC 5322, pg. 27, section 3.6.4
---

<<  The message identifier (msg-id) itself MUST be a globally unique
   identifier for a message.  The generator of the message identifier
   MUST guarantee that the msg-id is unique.  There are several
   algorithms that can be used to accomplish this.  Since the msg-id has
   a similar syntax to addr-spec (identical except that quoted strings,
   comments, and folding white space are not allowed), a good method is
   to put the domain name (or a domain literal IP address) of the host
   on which the message identifier was created on the right-hand side of
   the "@" (since domain names and IP addresses are normally unique),
   and put a combination of the current absolute date and time along
   with some other currently unique (perhaps sequential) identifier
   available on the system (for example, a process id number) on the
   left-hand side.  Though other algorithms will work, it is RECOMMENDED
   that the right-hand side contain some domain identifier (either of
   the host itself or otherwise) such that the generator of the message
   identifier can guarantee the uniqueness of the left-hand side within
   the scope of that domain. >>


Note the use of RFC2119 terms. MUST and RECOMMENDED mean different 
things.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-02 Thread Rupert Gallagher
Note taken. We still abide by the duties and recommendations, and expect 
well-behaved servers to do the same, by identifying themselves. We cross-check, 
and if they lie, we block them.

Spammers and criminals play hide and seek, and we have both legal and contract 
obligations to reject them by all means possible.

Sent from ProtonMail Mobile

On Tue, Jan 2, 2018 at 12:12, Antony Stone 
 wrote:

> On Tuesday 02 January 2018 at 11:12:57, Rupert Gallagher wrote:
>
> > This is the normative reference.
>
> I've picked out the significant parts from your email...
>
> > RFC 5322, pg. 27, section 3.6.4
> > ---
> >
> > <<  The message identifier (msg-id) itself MUST be a globally unique
> > identifier for a message.
>
> > a good method
>
> Note: not the required method, not the only method, just "a good method".
>
> > is to put the domain name (or a domain literal IP address) of the host
> > on which the message identifier was created on the right-hand side of
> > the "@" (since domain names and IP addresses are normally unique)
>
> > Though other algorithms will work, it is RECOMMENDED
>
> Note, recommended, not required.
>
> > that the right-hand side contain some domain identifier (either of
> > the host itself or otherwise)
>
> Antony.
>
> --
> Most people are aware that the Universe is big.
>  - Paul Davies, Professor of Theoretical Physics
>
> Please reply to the list; please *don't* CC me.

Re: Malformed spam email gets through.

2018-01-02 Thread Antony Stone
On Tuesday 02 January 2018 at 11:12:57, Rupert Gallagher wrote:

> This is the normative reference.

I've picked out the significant parts from your email...

> RFC 5322, pg. 27, section 3.6.4
> ---
> 
> <<  The message identifier (msg-id) itself MUST be a globally unique
>identifier for a message.

>a good method

Note: not the required method, not the only method, just "a good method".

>is to put the domain name (or a domain literal IP address) of the host
>on which the message identifier was created on the right-hand side of
>the "@" (since domain names and IP addresses are normally unique)

>Though other algorithms will work, it is RECOMMENDED

Note, recommended, not required.

>that the right-hand side contain some domain identifier (either of
>the host itself or otherwise)

Antony.

-- 
Most people are aware that the Universe is big.

 - Paul Davies, Professor of Theoretical Physics

   Please reply to the list;
 please *don't* CC me.


Re: Malformed spam email gets through.

2018-01-02 Thread Rupert Gallagher
I said "sending server", not "domain of the sender".

If an e-mail from y...@rhsoft.net is sent by 95.129.202.170, your mid is 
expected to include either @blah.sunshine.at or @[95.129.202.170].

If the same mid includes @yahoo.com, for example, then the message is rejected 
as spam, because the IP is denied by yahoo's SPF RR.

Sent from ProtonMail Mobile

Re: Malformed spam email gets through.

2018-01-02 Thread Rupert Gallagher
This is the normative reference.

RFC 822, pg. 30, section 6.2.3
--
msg-id = "<" addr-spec ">";
addr-spec = local-part "@" domain;
domain = sub-domain *("." sub-domain);
sub-domain = domain-ref / domain-literal;

<>

Note that the "@" must also be present as part of the well-formed-formula.
When absent, the string is not well formed, and a syntax error occurs.

RFC 5322, pg. 27, section 3.6.4
---

<<  The message identifier (msg-id) itself MUST be a globally unique
   identifier for a message.  The generator of the message identifier
   MUST guarantee that the msg-id is unique.  There are several
   algorithms that can be used to accomplish this.  Since the msg-id has
   a similar syntax to addr-spec (identical except that quoted strings,
   comments, and folding white space are not allowed), a good method is
   to put the domain name (or a domain literal IP address) of the host
   on which the message identifier was created on the right-hand side of
   the "@" (since domain names and IP addresses are normally unique),
   and put a combination of the current absolute date and time along
   with some other currently unique (perhaps sequential) identifier
   available on the system (for example, a process id number) on the
   left-hand side.  Though other algorithms will work, it is RECOMMENDED
   that the right-hand side contain some domain identifier (either of
   the host itself or otherwise) such that the generator of the message
   identifier can guarantee the uniqueness of the left-hand side within
   the scope of that domain. >>

Happy new year.

Sent with [ProtonMail](https://protonmail.com) Secure Email.

>  Original Message 
> Subject: Re: Malformed spam email gets through.
> Local Time: 2 January 2018 9:54 AM
> UTC Time: 2 January 2018 08:54
> From: r...@protonmail.com
> To: users@spamassassin.apache.org
>
> You are wrong. I will quote from the standard when I get back to my desk.
>
> Sent from ProtonMail Mobile
>
> On Mon, Jan 1, 2018 at 17:17, Bill Cole 
> <sausers-20150...@billmail.scconsult.com> wrote:
>
>> On 1 Jan 2018, at 3:54 (-0500), Rupert Gallagher wrote:
>>
>> > We reject anything whose mid does not include the fqdn or address
>> > literal of their sending server. We do this because the RFC says
>> > explicitly that the mid *MUST* have those features.
>>
>> This is a blatant falsehood. Relevant RFCs:
>>
>> https://tools.ietf.org/html/rfc5322#section-3.6.4
>> https://tools.ietf.org/html/rfc2822#section-3.6.4
>> https://tools.ietf.org/html/rfc822#section-4.6
>>
>> The only "MUST" in regard to MID content in any of those is uniqueness.
>> Use of a domain identifier is merely RECOMMENDED.
>>
>> Beyond that, it is *IMPOSSIBLE* for a receiving system to reliably
>> determine whether the right-hand part of a MID is a valid host or
>> domain identifier for the generator of the MID.
>>
>> --
>> Bill Cole
>> b...@scconsult.com or billc...@apache.org
>> (AKA @grumpybozo and many *@billmail.scconsult.com addresses)
>> Currently Seeking Steady Work: https://linkedin.com/in/billcole

Re: Malformed spam email gets through.

2018-01-02 Thread Rupert Gallagher
You are wrong. I will quote from the standard when I get back to my desk.

Sent from ProtonMail Mobile

On Mon, Jan 1, 2018 at 17:17, Bill Cole 
 wrote:

> On 1 Jan 2018, at 3:54 (-0500), Rupert Gallagher wrote:
>
> > We reject anything whose mid does not include the fqdn or address
> > literal of their sending server. We do this because the RFC says
> > explicitly that the mid *MUST* have those features.
>
> This is a blatant falsehood. Relevant RFCs:
>
> https://tools.ietf.org/html/rfc5322#section-3.6.4
> https://tools.ietf.org/html/rfc2822#section-3.6.4
> https://tools.ietf.org/html/rfc822#section-4.6
>
> The only "MUST" in regard to MID content in any of those is uniqueness.
> Use of a domain identifier is merely RECOMMENDED.
>
> Beyond that, it is *IMPOSSIBLE* for a receiving system to reliably
> determine whether the right-hand part of a MID is a valid host or
> domain identifier for the generator of the MID.
>
> --
> Bill Cole
> b...@scconsult.com or billc...@apache.org
> (AKA @grumpybozo and many *@billmail.scconsult.com addresses)
> Currently Seeking Steady Work: https://linkedin.com/in/billcole

Re: Malformed spam email gets through.

2018-01-02 Thread Rupert Gallagher
We serve clients who must conform to certain legal and industrial standards. 
The general principle is to reject anything that cannot be traced back to their 
sender or falls outside their legal range (mail from nation X without bilateral 
agreement of cooperation against internet crime).

According to our monthly statistics, the rejection rate is >75%, not including 
the firewall's work. It is rare, but some spam still gets through.

On mid, the rfc speaks clearly. We just conform to it, and expect everybody 
else to do the same.

Happy new year!

Sent from ProtonMail Mobile

On Mon, Jan 1, 2018 at 15:59, David Jones  wrote:

> Wow! You must not have any spam problems because you don't accept much email
> -- ham or spam. :)
>
> I think some mail systems will keep the same message-ID per email thread so
> your system must reject some replies.
>
> There is no way that most of us on this mailing list can be as strict or our
> customers would complain constantly about missing email.
>
> --
> David Jones

Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 14:30 (-0500), Alan Hodgson wrote:


On Mon, 2018-01-01 at 10:29 -0500, Bill Cole wrote:

[...]

HOWEVER, the idea of enforcing any standard on MIDs beyond gross
format 
(e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the
sole 
user is ludicrous.
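
For reference, the gross-format pattern above uses a POSIX character class that Python's `re` module does not support. A rough equivalent, assuming the pattern is meant to match the entire header value and that surrounding whitespace can be stripped first (both are assumptions):

```python
import re

# Angle brackets around 3 to 996 ASCII characters; [\x00-\x7f] stands in
# for the POSIX class [[:ascii:]].
GROSS_MID = re.compile(r"<[\x00-\x7f]{3,996}>")

def passes_gross_format(mid: str) -> bool:
    """True when the whole value matches the gross Message-ID shape."""
    return GROSS_MID.fullmatch(mid.strip()) is not None

print(passes_gross_format("<a@b>"))
print(passes_gross_format("<ab>"))
```

This deliberately checks nothing about '@', dots, or domains, which is the point being made.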


I've had good success junking anything with one of my domains in the
message-id, where I know the mail isn't actually from someone in that
domain. That's a pretty solid spam signature.


Yes, I was a bit imprecise. Very specific idiosyncratic MID patterns can 
be extremely accurate spam indicators. Enforcement of RFC or common 
practice "standards" is riskier than it is worth.



Lack of any message-id is also significant, but sadly there are still
some real senders sending mail with no message-id.


Yes. It's one of the most annoying persistent sorts of mail sloppiness.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 12:47 (-0500), Matus UHLAR - fantomas wrote:


On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote:
the gross format in RFCs 822, 2822 and 5322 describes message-id consisting
of local and domain part, thus it must contain "@".


On 01.01.18 12:17, Bill Cole wrote:
No, it does not. Re-read the cited sections. From RFC5322, the ABNF 
definition:


  msg-id  =   [CFWS] "<" id-left "@" id-right ">" [CFWS]


this is the part that says message-id must consist of local and domain
parts. It just says it implicitly, not explicitly, but:

It's not possible to construct Message-Id without the "@" while conforming
to any of the mentioned RFCs.


True, but one could just as easily split up a UUID with '@' instead of 
'-' and comply while being as sure of uniqueness as could ever matter. 
Or put full UUIDs on both sides of the '@'. If a V1 UUID is on the 
right, it is even a host-unique identifier after a fashion.
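
Both ideas are easy to sketch with Python's `uuid` module. The function names are hypothetical, and note one caveat: `uuid.uuid1()` embeds a node ID that is normally the hardware address, but Python may substitute a random node value when none can be found.

```python
import uuid

def mid_split_uuid() -> str:
    """One random UUID with one of its '-' separators replaced by '@'."""
    parts = str(uuid.uuid4()).split("-")  # five groups: 8-4-4-4-12 hex digits
    return "<" + "-".join(parts[:2]) + "@" + "-".join(parts[2:]) + ">"

def mid_double_uuid() -> str:
    """Random UUID on the left, time/node-based (version 1) UUID on the right."""
    return f"<{uuid.uuid4()}@{uuid.uuid1()}>"

print(mid_split_uuid())
print(mid_double_uuid())
```

Hex digits and '-' are all permitted in dot-atom-text, so both forms fit the RFC 5322 msg-id syntax.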


Also note that if you demand that MIDs contain '@' with conforming 
strings on both sides, you risk losing mail that users want. This is 
a mistake I have made.


what exactly was the problem? Message-Id without the "@" or the
non-conforming parts there?


Missing '@'

Some messages lacking it were generated by antique systems that had 
proven themselves resistant to evolutionary pressures.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread Matus UHLAR - fantomas

On 01/01/2018 01:30 PM, Alan Hodgson wrote:
I've had good success junking anything with one of my domains in 
the message-id, where I know the mail isn't actually from someone 
in that domain. That's a pretty solid spam signature.


are you sure it's not your mailservers adding Message-Id to the
incoming mail?


On 01.01.18 14:01, David Jones wrote:
I too have seen spam with my own domain in the Message-ID but I 
combined it with a meta rule of !ALL_TRUSTED to be safe.  You are 
correct.  This is a good indicator of spam but each person is going 
to have to create this local rule unless someone wants to write a 
plugin that can detect this dynamically.


I've had problems with a similar rule when I send mail directly from one of
my mailservers. I've had to replace it by !ALL_TRUSTED && !NO_RELAYS
just FYI
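
The logic of that combined rule, written out in Python purely as an illustration (this is not SpamAssassin rule syntax; `own_domain_mid_hit` and its parameters are hypothetical stand-ins for the rule results):

```python
def own_domain_mid_hit(mid, own_domains, all_trusted, no_relays):
    """True when the Message-ID claims one of our domains on mail that
    actually arrived through untrusted relays."""
    value = mid.strip().strip("<>").lower()
    if "@" not in value:
        return False
    rhs = value.rsplit("@", 1)[1]
    # Match the domain itself or any host beneath it (domains given lowercase).
    in_own = any(rhs == d or rhs.endswith("." + d) for d in own_domains)
    # Mirrors: own-domain match && !ALL_TRUSTED && !NO_RELAYS
    return in_own and not all_trusted and not no_relays

print(own_domain_mid_hit("<x@mail.example.org>", ["example.org"], False, False))
```

The two exclusions keep mail that never left trusted hosts, or that was submitted with no relays at all, from being flagged.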

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
A day without sunshine is like, night.


Re: Malformed spam email gets through.

2018-01-01 Thread David Jones

On 01/01/2018 01:30 PM, Alan Hodgson wrote:

On Mon, 2018-01-01 at 10:29 -0500, Bill Cole wrote:

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies. 



I have not seen such behavior in the past 20 years...

Intentionally re-using another site's MIDs is so wrong that I'd happily
make it break hard.

HOWEVER, the idea of enforcing any standard on MIDs beyond gross format
(e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the sole
user is ludicrous.


I've had good success junking anything with one of my domains in the 
message-id, where I know the mail isn't actually from someone in that 
domain. That's a pretty solid spam signature.




I too have seen spam with my own domain in the Message-ID but I combined 
it with a meta rule of !ALL_TRUSTED to be safe.  You are correct.  This 
is a good indicator of spam but each person is going to have to create 
this local rule unless someone wants to write a plugin that can detect 
this dynamically.


Lack of any message-id is also significant, but sadly there are still 
some real senders sending mail with no message-id.


--
David Jones


Re: Malformed spam email gets through.

2018-01-01 Thread Alan Hodgson
On Mon, 2018-01-01 at 10:29 -0500, Bill Cole wrote:
> On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:
> 
> > I think some mail systems will keep the same message-ID per email 
> > thread so your system must reject some replies.
> 
> I have not seen such behavior in the past 20 years...
> 
> Intentionally re-using another site's MIDs is so wrong that I'd
> happily 
> make it break hard.
> 
> HOWEVER, the idea of enforcing any standard on MIDs beyond gross
> format 
> (e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the
> sole 
> user is ludicrous.

I've had good success junking anything with one of my domains in the
message-id, where I know the mail isn't actually from someone in that
domain. That's a pretty solid spam signature.

Lack of any message-id is also significant, but sadly there are still
some real senders sending mail with no message-id.

Re: Malformed spam email gets through.

2018-01-01 Thread Matus UHLAR - fantomas

On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote:
the gross format in RFCs 822, 2822 and 5322 describes message-id consisting
of local and domain part, thus it must contain "@".


On 01.01.18 12:17, Bill Cole wrote:
No, it does not. Re-read the cited sections. From RFC5322, the ABNF 
definition:


  msg-id  =   [CFWS] "<" id-left "@" id-right ">" [CFWS]


this is the part that says message-id must consist of local and domain
parts. It just says it implicitly, not explicitly, but:

It's not possible to construct a Message-Id without the "@" while conforming
to any of the mentioned RFCs.

Also note that if you demand that MIDs contain '@' with conforming 
strings on both sides, you risk losing mail that users want. This is 
a mistake I have made.


what exactly was the problem? Message-Id without the "@" or the
non-conforming parts there?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Your mouse has moved. Windows NT will now restart for changes to take
effect. [OK]


Re: Malformed spam email gets through.

2018-01-01 Thread Benny Pedersen

David Jones skrev den 2018-01-01 15:59:


There is no way that most of us on this mailing list can be as strict
or our customers would complain constantly about missing email.


Postfix adds an RFC-style Message-ID to mail that does not follow the RFCs, 
so the first MTA (Postfix here) hides the MUA's fault of not following the 
RFCs. I don't know how other MTAs help spammers in this way.


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 11:41 (-0500), Matus UHLAR - fantomas wrote:

the gross format in RFCs 822, 2822 and 5322 describes message-id consisting
of local and domain part, thus it must contain "@".


No, it does not. Re-read the cited sections. From RFC5322, the ABNF 
definition:


   msg-id  =   [CFWS] "<" id-left "@" id-right ">" [CFWS]

   id-left =   dot-atom-text / obs-id-left

   id-right=   dot-atom-text / no-fold-literal / obs-id-right

   no-fold-literal =   "[" *dtext "]"

Note the lack of specification of "local" and "domain" parts.

Also note that if you demand that MIDs contain '@' with conforming 
strings on both sides, you risk losing mail that users want. This is a 
mistake I have made.
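
To make the grammar concrete, here is a rough Python sketch of the non-obsolete part of that ABNF as a regex. It is only an approximation under stated assumptions: it ignores CFWS and the `obs-id-left` / `obs-id-right` alternatives, and the names `MSG_ID` and `parse_msg_id` are invented for this example.

```python
import re
from typing import Optional, Tuple

# atext and dtext per RFC 5322 (obsolete forms deliberately left out).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
DTEXT = r"[\x21-\x5a\x5e-\x7e]"
NO_FOLD_LITERAL = rf"\[{DTEXT}*\]"

MSG_ID = re.compile(
    rf"<(?P<left>{DOT_ATOM_TEXT})@(?P<right>{DOT_ATOM_TEXT}|{NO_FOLD_LITERAL})>"
)


def parse_msg_id(value: str) -> Optional[Tuple[str, str]]:
    """Return (id-left, id-right) if value fits the simplified grammar, else None."""
    match = MSG_ID.fullmatch(value.strip())
    if match is None:
        return None
    return match.group("left"), match.group("right")
```

Note that `<20180101.abc@localhost>` parses fine here: the grammar constrains the characters on each side of the "@", not whether the right-hand side is a real domain.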


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread @lbutlr
On 1 Jan 2018, at 09:41, Matus UHLAR - fantomas  wrote:
> the gross format in RFCs 822,2822 and 5322 describes message-id consisting
> of local and domain part,

You are misreading the RFC.

The Message-ID itself is a *should* and there is no MUST in any of the 
description of the construction of the Message-ID, only that it MUST be 
globally unique.

5322 specifically states: "Though other algorithms will work, it is RECOMMENDED 
that the right-hand side contain some domain identifier (either of the host 
itself or otherwise) such that the generator of the message identifier can 
guarantee the uniqueness of the left-hand side within the scope of that domain."

There is no requirement to include a local and domain part in any part of a 
Message-ID.

A 256-bit ID would be unique to some significant fraction of the atoms in the 
universe. I'd posit that meets any reasonable definition of "must be globally 
unique."

But, in practice, the simplest way to guarantee uniqueness is to generate a 
timestamp and add it to a domain/IP/local ID.
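
As a small illustration of that approach, here is a Python sketch. The helper `timestamp_msgid` and the domain `mail.example.com` are assumptions for the example; `email.utils.make_msgid` is the standard-library function that builds an ID along the same lines.

```python
import time
import uuid
from email.utils import make_msgid


def timestamp_msgid(domain: str) -> str:
    """Build a Message-ID from a nanosecond timestamp plus random bits and a domain."""
    left = f"{time.time_ns()}.{uuid.uuid4().hex}"
    return f"<{left}@{domain}>"


# Hand-rolled version, and the standard-library equivalent.
custom_id = timestamp_msgid("mail.example.com")
stdlib_id = make_msgid(domain="mail.example.com")
```

The random component is there so that two processes generating an ID in the same instant on the same host still do not collide.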

-- 
"We take off our Republican hats and put on our American hats" -- Many 
Republicans in Sep 2008



Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 10:33 (-0500), David Jones wrote:


On 01/01/2018 09:29 AM, Bill Cole wrote:

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


I have not seen such behavior in the past 20 years...



Ok.  I stand corrected then.  What about bounces?  Don't they 
intentionally keep all of the same headers with an empty 
envelope-from?


Nope. A modern standard 'bounce' message is a MIME entity with a special 
type, denoted by a header somewhat like this:


Content-Type: multipart/report; report-type=delivery-status;
  boundary="blah.foo.bar-baz/example.com"

It should have a unique MID, a Date header reflecting the time of the 
bounce, a Subject header like "Undelivered Mail Returned to Sender", a 
To header with the original message's envelope sender, a From header 
clearly identifying the last MTA to hold the message and its non-human 
nature such as 'mailer-dae...@example.com (Mail Delivery System)', and 
Received headers only reflecting the transit from that MTA to the target 
of the bounce.


One PART of a bounce is a message/rfc822 entity which has at least the 
headers of the original message and usually some or all of the body.
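
To show the difference between the bounce's own MID and the original message's MID, here is a hedged Python sketch using the standard `email` package. The sample bounce, its addresses and the helper name `original_message_id` are all made up for illustration; real bounces vary in which parts they include.

```python
from email import message_from_string

# Invented sample bounce; only its structure follows the description above.
SAMPLE_BOUNCE = """\
From: postmaster@example.com (Mail Delivery System)
To: sender@example.org
Subject: Undelivered Mail Returned to Sender
Message-ID: <bounce-1@example.com>
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status;
  boundary="blah.foo.bar-baz/example.com"

--blah.foo.bar-baz/example.com
Content-Type: text/plain

Delivery failed.

--blah.foo.bar-baz/example.com
Content-Type: message/delivery-status

Reporting-MTA: dns; example.com

Final-Recipient: rfc822; nobody@example.net
Action: failed
Status: 5.1.1

--blah.foo.bar-baz/example.com
Content-Type: message/rfc822

From: sender@example.org
To: nobody@example.net
Subject: hello
Message-ID: <original-1@example.org>

original body
--blah.foo.bar-baz/example.com--
"""


def original_message_id(raw: str):
    """Return the Message-ID of the message embedded in a delivery-status report."""
    msg = message_from_string(raw)
    if msg.get_content_type() != "multipart/report":
        return None
    if msg.get_param("report-type") != "delivery-status":
        return None
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list.
            return part.get_payload(0)["Message-ID"]
    return None
```

The outer headers carry the bounce's fresh MID, while the original MID only shows up inside the embedded message/rfc822 part.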


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread Matus UHLAR - fantomas

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


On 01.01.18 10:29, Bill Cole wrote:

I have not seen such behavior in the past 20 years...

Intentionally re-using another site's MIDs is so wrong that I'd 
happily make it break hard.


HOWEVER, the idea of enforcing any standard on MIDs beyond gross 
format (e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't 
the sole user is ludicrous.


the gross format in RFCs 822,2822 and 5322 describes message-id consisting
of local and domain part, thus it must contain "@".

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Due to unexpected conditions Windows 2000 will be released
in first quarter of year 1901


Re: Malformed spam email gets through.

2018-01-01 Thread David Jones

On 01/01/2018 09:33 AM, David Jones wrote:

On 01/01/2018 09:29 AM, Bill Cole wrote:

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


I have not seen such behavior in the past 20 years...



Ok.  I stand corrected then.  What about bounces?  Don't they 
intentionally keep all of the same headers with an empty envelope-from?




Answering myself.  No.  I checked a few and the Message-ID is generated 
new on bounces too.  NM  Ignore me ...  :)  I was thinking of something 
else related to email archiving that dedupes based on the Message-ID.


Happy New Year!



Intentionally re-using another site's MIDs is so wrong that I'd 
happily make it break hard.


HOWEVER, the idea of enforcing any standard on MIDs beyond gross 
format (e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't 
the sole user is ludicrous.






--
David Jones



Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 3:54 (-0500), Rupert Gallagher wrote:

We reject anything whose mid does not include the fqdn or address 
literal of their sending server. We do this because the RFC says 
explicitly that the mid *MUST* have those features.


This is a blatant falsehood. Relevant RFCs:

https://tools.ietf.org/html/rfc5322#section-3.6.4
https://tools.ietf.org/html/rfc2822#section-3.6.4
https://tools.ietf.org/html/rfc822#section-4.6

The only "MUST" in regard to MID content in any of those is uniqueness. 
Use of a domain identifier is merely RECOMMENDED.


Beyond that, it is *IMPOSSIBLE* for a receiving system to reliably 
determine whether the right-hand part of a MID is a valid host or domain 
identifier for the generator of the MID.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread David Jones

On 01/01/2018 09:29 AM, Bill Cole wrote:

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


I have not seen such behavior in the past 20 years...



Ok.  I stand corrected then.  What about bounces?  Don't they 
intentionally keep all of the same headers with an empty envelope-from?



Intentionally re-using another site's MIDs is so wrong that I'd happily 
make it break hard.


HOWEVER, the idea of enforcing any standard on MIDs beyond gross format 
(e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the sole 
user is ludicrous.




--
David Jones


Re: Malformed spam email gets through.

2018-01-01 Thread Bill Cole

On 1 Jan 2018, at 9:59 (-0500), David Jones wrote:

I think some mail systems will keep the same message-ID per email 
thread so your system must reject some replies.


I have not seen such behavior in the past 20 years...

Intentionally re-using another site's MIDs is so wrong that I'd happily 
make it break hard.


HOWEVER, the idea of enforcing any standard on MIDs beyond gross format 
(e.g.: <[[:ascii:]]{3,996}>) on a system where the admin isn't the sole 
user is ludicrous.
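
For reference, that gross-format check could look roughly like this in Python. This is a sketch only: Python's `re` has no POSIX `[[:ascii:]]` class, so the range `\x00-\x7f` stands in for it, and the names `GROSS_MID` and `passes_gross_format` are invented here.

```python
import re

# Angle brackets around 3 to 996 ASCII characters, nothing more.
GROSS_MID = re.compile(r"<[\x00-\x7f]{3,996}>")


def passes_gross_format(mid: str) -> bool:
    """True if the whole value is <...> with 3 to 996 ASCII characters inside."""
    return GROSS_MID.fullmatch(mid) is not None
```

Note that a value with no "@" at all still passes, which is the point: the check only weeds out grossly broken values.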


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: Malformed spam email gets through.

2018-01-01 Thread David Jones

On 01/01/2018 02:54 AM, Rupert Gallagher wrote:
We reject anything whose mid does not include the fqdn or address 
literal of their sending server. We do this because the RFC says 
explicitly that the mid *MUST* have those features. We write exceptions 
for those few senders who are legitimate but have lazy and 
incompetent sysadmins.


On Mon, Jan 1, 2018 at 00:15, Mark London wrote:


Message-ID: 


Wow!  You must not have any spam problems because you don't accept much 
email -- ham or spam.  :)  I think some mail systems will keep the same 
message-ID per email thread so your system must reject some replies.


There is no way that most of us on this mailing list can be as strict or 
our customers would complain constantly about missing email.


--
David Jones


Re: Malformed spam email gets through.

2018-01-01 Thread Rupert Gallagher
We reject anything whose mid does not include the fqdn or address literal of 
their sending server. We do this because the RFC says explicitly that the mid 
*MUST* have those features. We write exceptions for those few senders who are 
legitimate but have lazy and incompetent sysadmins.

On Mon, Jan 1, 2018 at 00:15, Mark London  wrote:

Message-ID: 

Re: Malformed spam email gets through.

2018-01-01 Thread Pedro David Marco
 
> Also, can anyone suggest a nicely written rule, that triggers when an html 
> tag's text contains both upper and lower case letters?  Thanks. - Mark


Hi Mark and happy new year!

For small tags a simple rule, ugly but very cheap, may work:  
/Src|sRc|srC|.../ and so on. The number of alternatives grows as 2 to the 
power of the number of letters, so it is not useful for long tags, but it 
is cheap in terms of regex steps.

A more elaborate regex...

The next rules are far from perfect but can detect  "something that looks like" 
mixed upper and lower case HTML tags in the pristine body. 

full       __MIXED_UPLOCASE_SRC   /(?=(?i:src))(?!src|SRC)...\s*=/
tflags     __MIXED_UPLOCASE_SRC   multiple maxhits=2
full       __MIXED_UPLOCASE_HREF  /(?=(?i:href))(?!href|HREF)....\s*=/
tflags     __MIXED_UPLOCASE_HREF  multiple maxhits=2
meta       MIX_UPLOCASE_HTAGS     __MIXED_UPLOCASE_SRC >1 && __MIXED_UPLOCASE_HREF >1
describe   MIX_UPLOCASE_HTAGS     MIX OF UPPER AND lower LETTERS in HTML TAGS
score      MIX_UPLOCASE_HTAGS     1
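
If you want to try the lookahead trick from these rules outside SpamAssassin, here is a small Python sketch of the same idea. The helper names `mixed_case_attr` and `count_mixed_case` are invented for the example, and the sample string in the usage note is made up, not taken from the spam.

```python
import re


def mixed_case_attr(name: str):
    """Compile a regex matching `name=` only when it is written in mixed case."""
    lower = re.escape(name.lower())
    upper = re.escape(name.upper())
    # Case-insensitive lookahead finds the name, the negative lookahead rejects
    # the all-lower and all-upper spellings, then the name itself is consumed.
    return re.compile(
        rf"(?=(?i:{lower}))(?!{lower}|{upper}).{{{len(name)}}}\s*="
    )


def count_mixed_case(body: str, name: str) -> int:
    """Count mixed-case occurrences of an attribute name in a raw body."""
    return len(mixed_case_attr(name).findall(body))
```

For example, `count_mixed_case("<a HrEf=x><a href=y>", "href")` counts only the first tag, since plain `href` and `HREF` are excluded by the negative lookahead.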

You can also check for invalid Base64 characters and invalid Base64 line 
length...  if all of them match... "Hasta luego Lucas"  or as Rupert 
Gallagher says: easter eggs... :-)
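
A rough Python sketch of that Base64 sanity check, for anyone who wants to experiment: it flags a part whose Content-Transfer-Encoding says base64 while the body does not look like Base64. The function names are invented, and the 76-character limit is the encoded line length from RFC 2045; how strict to be is a local policy decision.

```python
import base64
import binascii
import re

B64_LINE = re.compile(r"[A-Za-z0-9+/]*={0,2}")
MAX_B64_LINE = 76  # RFC 2045 limit for encoded base64 lines


def looks_like_base64(body: str) -> bool:
    """True if every non-empty line is short, uses the Base64 alphabet, and decodes."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    if not lines:
        return False
    for ln in lines:
        if len(ln) > MAX_B64_LINE or not B64_LINE.fullmatch(ln):
            return False
    try:
        base64.b64decode("".join(lines), validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def mislabeled_base64(cte: str, body: str) -> bool:
    """True when the header claims base64 but the body is not valid Base64."""
    return cte.strip().lower() == "base64" and not looks_like_base64(body)
```

A raw HTML body like the one in Mark's sample would fail on the very first line, since characters such as "<" and ":" are outside the Base64 alphabet.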

hope they help you... 

-PedroD

  

Re: Malformed spam email gets through.

2017-12-31 Thread David Jones

On 12/31/2017 05:15 PM, Mark London wrote:
Hi - I previously mentioned that I was getting emails with hand created 
html tags, that had both uppercase and lowercase letters.


I created a crude rawbody rule to test for them. It worked, until the 
spammer accidentally added the line "Content-Transfer-Encoding: base64", 
even though the body of the message is not encoded with base64.


Because of this, my rawbody rules failed to trigger.  See below.  Is 
there a way to detect a malformed email like this?


Also, can anyone suggest a nicely written rule, that triggers when an 
html tag's text contains both upper and lower case letters?  Thanks. - Mark


MIME-Version: 1.0
From: c...@nmlc.com
To: markrlon...@gmail.com
Date: Sun, 31 Dec 2017 18:42:25 CET
Subject: Never Pay For Covered Home Repairs Again-Best deal of the year, 
Iimited-Time*Njvt

Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64
Message-ID: 
X-OriginalArrivalTime: 22 Mar 2017 15:52:46.0402 (UTC) FILETIME=
X-SG-EID: 
Ir4EYmZz10i7MgunveLJlw0xcvqQbeauQMDQs3EPe27heIGiqko5Ui6DR17zgRAkuOys70ubB2uU06 
2rXoYm1NiUd72Cmr8IRCp81sAgopwU26YxZSasTrSlTtZfLgs+yn3P85pGOBbZrAEV2KAPssmDkJ77 
YTcMSxfLqx2qEBkTLe9yUFrjCwDKa+CySPgoWXhA3BKLnvIvUPwEgt0uMQ==
X-Feedback-ID: 
561562:WZ3ZRcIWAujB4xGDqDKA1Ud8w67Bpa8gtW18sDbAXo0=:WZ3ZRcIWAujB4xGDqDKA1Ud8w67Bpa8gtW18sDbAXo0=:SG 



HrEf=http://www.sitedesk.net/redirect.php?url=http%3A%2F/%2f/ec2-52-52-247-130.us-west-1.compute.amazonaws.com/qs=r-aeideaebigkjffgafifgifajjibbeaeekabababadjadaccaebbacdckacckcacb>srC=https://www.imagevita.org/uploads/46174adfa726bcdadfc2914890c02ee9.jpg>HrEf=http://www.sitedesk.net/redirect.php?url=http%3A%2F/%2f/ec2-52-52-247-130.us-west-1.compute.amazonaws.com/qs=ua-aeideaebigkjffgafifgifajjibbeaeekabababadjadaccaebbacdckacckcacb>srC=https://www.imagevita.org/uploads/8d36198d9d812471230cd3a1362eb169.jpg>HrEf=http://www.sitedesk.net/redirect.php?url=http%3A%2F/%2f/ec2-52-52-247-130.us-west-1.compute.amazonaws.com/qs=u-aeideaebigkjffgafifgifajjibbeaeekabababadjadaccaebbacdckacckcacb>srC=https://www.imagevita.org/uploads/529ec935ba2f0b52917be25826b3a23b.jpg>style="height:5500PX">The New York Times

Thank you for registering.



That email looks like it came from Sendgrid but I can't tell for sure 
without seeing all of the Received headers.  If it did come through 
Sendgrid, then this should be reported to their abuse team to help all of us.


https://sendgrid.com/report-spam/

--
David Jones


Malformed spam email gets through.

2017-12-31 Thread Mark London

Hi - I previously mentioned that I was getting emails with hand created html 
tags, that had both uppercase and lowercase letters.

I created a crude rawbody rule to test for them. It worked, until the spammer 
accidentally added the line "Content-Transfer-Encoding: base64", even though 
the body of the message is not encoded with base64.

Because of this, my rawbody rules failed to trigger.  See below.  Is there a 
way to detect a malformed email like this?

Also, can anyone suggest a nicely written rule, that triggers when an html 
tag's text contains both upper and lower case letters?  Thanks. - Mark

MIME-Version: 1.0
From: c...@nmlc.com
To: markrlon...@gmail.com
Date: Sun, 31 Dec 2017 18:42:25 CET
Subject: Never Pay For Covered Home Repairs Again-Best deal of the year, 
Iimited-Time*Njvt
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64
Message-ID: 
X-OriginalArrivalTime: 22 Mar 2017 15:52:46.0402 (UTC) FILETIME=
X-SG-EID: 
Ir4EYmZz10i7MgunveLJlw0xcvqQbeauQMDQs3EPe27heIGiqko5Ui6DR17zgRAkuOys70ubB2uU06 
2rXoYm1NiUd72Cmr8IRCp81sAgopwU26YxZSasTrSlTtZfLgs+yn3P85pGOBbZrAEV2KAPssmDkJ77 
YTcMSxfLqx2qEBkTLe9yUFrjCwDKa+CySPgoWXhA3BKLnvIvUPwEgt0uMQ==
X-Feedback-ID: 
561562:WZ3ZRcIWAujB4xGDqDKA1Ud8w67Bpa8gtW18sDbAXo0=:WZ3ZRcIWAujB4xGDqDKA1Ud8w67Bpa8gtW18sDbAXo0=:SG

http://www.sitedesk.net/redirect.php?url=http%3A%2F/%2f/ec2-52-52-247-130.us-west-1.compute.amazonaws.com/qs=r-aeideaebigkjffgafifgifajjibbeaeekabababadjadaccaebbacdckacckcacb>https://www.imagevita.org/uploads/46174adfa726bcdadfc2914890c02ee9.jpg>http://www.sitedesk.net/redirect.php?url=http%3A%2F/%2f/ec2-52-52-247-130.us-west-1.compute.amazonaws.com/qs=ua-aeideaebigkjffgafifgifajjibbeaeekabababadjadaccaebbacdckacckcacb>https://www.imagevita.org/uploads/8d36198d9d812471230cd3a1362eb169.jpg>http://www.sitedesk.net/redirect.php?url=http%3A%2F/%2f/ec2-52-52-247-130.us-west-1.compute.amazonaws.com/qs=u-aeideaebigkjffgafifgifajjibbeaeekabababadjadaccaebbacdckacckcacb>https://www.imagevita.org/uploads/529ec935ba2f0b52917be25826b3a23b.jpg>The New York Times
Thank you for registering.