Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-23 Thread Henrik K
On Mon, May 23, 2022 at 10:48:51PM -0600, Philip Prindeville wrote:
> 
> 
> > On May 11, 2022, at 1:53 AM, Henrik K  wrote:
> > 
> > On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
> >> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
> >>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> >>>> See my original message.
> >>>> 
> >>>> I can't think of a single way to match each header, and then test for 
> >>>> any of them not matching the pattern...
> >>> 
> >>> Simply use regex negative lookahead.
> >>> 
> >>> ALL =~ /^(?!Foo|Bar):/m
> >>> 
> >>> It will hit any line _not_ starting with Foo: or Bar:
> >> 
> >> Oops I think it was buggy.. more like:
> >> 
> >> ALL =~ /^(?!(?:Foo|Bar):)/m
> > 
> > And for debug logging to log the missing header (to easily inspect what was
> > matched) you need some additional string matching, lookahead itself doesn't
> > save any string
> > 
> > ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
> > 
> 
> 
> Ended up using .*$ instead of [^:]+, and that worked too.
> 
> Is it possible to count the headers that didn't match and then apply a 
> threshold, like 3 or more unknown headers?

tflags multiple should work

header UNKNOWN_HDR ALL ...
tflags UNKNOWN_HDR multiple maxhits=3
meta UNKNOWN_HDR_TOOMANY UNKNOWN_HDR >= 3
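
For anyone who wants to check the counting logic outside SpamAssassin, here is a rough Python sketch of the same idea. The allow-list, the sample headers and the function names are made up for illustration, and re.IGNORECASE is my own addition (header field names are case-insensitive); it is not a substitute for testing the real rule.

```python
import re

# Hypothetical allow-list for illustration; a real one would be far longer.
KNOWN_HEADERS = [
    "Return-Path", "Received", "To", "From", "Message-ID", "Date",
    "Subject", "List-Unsubscribe", "MIME-Version", "Content-Type",
]

# Same shape as the rule in this thread: match at a line start that is NOT
# followed by a known name plus colon, then consume the unknown header name.
UNKNOWN_HDR = re.compile(
    r"^(?!(?:%s):)[^:]+" % "|".join(re.escape(h) for h in KNOWN_HEADERS),
    re.MULTILINE | re.IGNORECASE,  # header field names are case-insensitive
)


def unknown_headers(header_block):
    """Return the names of headers that are not on the allow-list."""
    return UNKNOWN_HDR.findall(header_block)


def too_many_unknown(header_block, threshold=3):
    """Rough equivalent of the meta rule: fire at `threshold` or more hits."""
    return len(unknown_headers(header_block)) >= threshold


# Made-up, already unfolded header block using names from this thread.
SAMPLE = "\n".join([
    "To: someone@example.com",
    "From: sender@example.com",
    "Minicomputers-Exhume: sides",
    "Subject: Nabil, 1 searches this week",
    "Malthus-Films: 88976dea",
    "Parasitic-Homogeneity: db5da28ba3e69a",
    "MIME-Version: 1.0",
    "Capitalizations-Grievously: oilers",
    "Content-type: multipart/mixed",
])

print(unknown_headers(SAMPLE))
print(too_many_unknown(SAMPLE))
```

The sketch counts every hit, whereas maxhits=3 stops counting at three; for a ">= 3" threshold the outcome is the same.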



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-23 Thread Philip Prindeville



> On May 11, 2022, at 1:53 AM, Henrik K  wrote:
> 
> On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
>> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
>>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>>>> See my original message.
>>>> 
>>>> I can't think of a single way to match each header, and then test for any 
>>>> of them not matching the pattern...
>>> 
>>> Simply use regex negative lookahead.
>>> 
>>> ALL =~ /^(?!Foo|Bar):/m
>>> 
>>> It will hit any line _not_ starting with Foo: or Bar:
>> 
>> Oops I think it was buggy.. more like:
>> 
>> ALL =~ /^(?!(?:Foo|Bar):)/m
> 
> And for debug logging to log the missing header (to easily inspect what was
> matched) you need some additional string matching, lookahead itself doesn't
> save any string
> 
> ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
> 


Ended up using .*$ instead of [^:]+, and that worked too.

Is it possible to count the headers that didn't match and then apply a 
threshold, like 3 or more unknown headers?

Thanks,

-Philip



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-13 Thread Philip Prindeville



> On May 11, 2022, at 9:24 AM, John Hardin  wrote:
> 
> On Tue, 10 May 2022, Philip Prindeville wrote:
> 
>> Anyone have a rule to detect the following nonsense headers seen in this 
>> message I got?
>> 
>> Return-Path: 
>> Received: from cp24.deluxehosting.com (cp24.deluxehosting.com 
>> [207.55.244.13])
>>  by mail (envelope-sender ) (MIMEDefang) with ESMTP 
>> id 23C2ch8H717309
>>  for ; Mon, 11 Apr 2022 20:38:50 -0600
>> To: "xy...@redfish-solutions.com" 
>> From: "Nabil, Home Depot" 
>> Message-ID: <35ee7c.8b8cf6.a...@uakron.edu>
>> Date: Mon, 11 Apr 2022 22:38:48 + (UTC)
>> Minicomputers-Exhume: sides
>> Subject: Nabil, 1 searches this week
>> Malthus-Films: 88976dea
>> List-Unsubscribe: 
>> 
>> Parasitic-Homogeneity: db5da28ba3e69a
>> MIME-Version: 1.0
>> Capitalizations-Grievously: oilers
>> Content-type: multipart/mixed; boundary="--=_1649731129-716331-86"
>> 
>> Obviously, the following bogus header names are present:
>> 
>> Minicomputers-Exhume
>> Malthus-Films
>> Parasitic-Homogeneity
>> Capitalizations-Grievously
> 
> Take a look at __RAND_HEADER and RAND_HEADER_MANY
> 
> 

For my test messages, __RAND_HEADER_MANY isn't firing.

Also, Return-Path: is listed in RFC 2822, and many delivering (terminal) MTAs 
add it, including Sendmail.

-Philip




Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-13 Thread Henrik K
On Fri, May 13, 2022 at 12:22:48PM -0600, Philip Prindeville wrote:
>
> How do you look at what a rule is matching?  I've never figured that out...

Debug output:
spamassassin -t -D rules < message.eml 2>&1 | grep 'got hit'



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-13 Thread Philip Prindeville



> On May 11, 2022, at 1:53 AM, Henrik K  wrote:
> 
> On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
>> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
>>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>>>> See my original message.
>>>> 
>>>> I can't think of a single way to match each header, and then test for any 
>>>> of them not matching the pattern...
>>> 
>>> Simply use regex negative lookahead.
>>> 
>>> ALL =~ /^(?!Foo|Bar):/m
>>> 
>>> It will hit any line _not_ starting with Foo: or Bar:
>> 
>> Oops I think it was buggy.. more like:
>> 
>> ALL =~ /^(?!(?:Foo|Bar):)/m
> 
> And for debug logging to log the missing header (to easily inspect what was
> matched) you need some additional string matching, lookahead itself doesn't
> save any string
> 
> ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
> 


How do you look at what a rule is matching?  I've never figured that out...

-Philip




Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-13 Thread Philip Prindeville



> On May 11, 2022, at 1:44 AM, Henrik K  wrote:
> 
> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>> See my original message.
>> 
>> I can't think of a single way to match each header, and then test for any of 
>> them not matching the pattern...
> 
> Simply use regex negative lookahead.
> 
> ALL =~ /^(?!Foo|Bar):/m
> 
> It will hit any line _not_ starting with Foo: or Bar:
> 


Ah, that did it.

Of course, if I get false positives, I'll have to search for the header names I 
forgot to include manually...




Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-11 Thread John Hardin

On Tue, 10 May 2022, Philip Prindeville wrote:


> Anyone have a rule to detect the following nonsense headers seen in this 
> message I got?
> 
> Return-Path: 
> Received: from cp24.deluxehosting.com (cp24.deluxehosting.com [207.55.244.13])
>  by mail (envelope-sender ) (MIMEDefang) with ESMTP 
> id 23C2ch8H717309
>  for ; Mon, 11 Apr 2022 20:38:50 -0600
> To: "xy...@redfish-solutions.com" 
> From: "Nabil, Home Depot" 
> Message-ID: <35ee7c.8b8cf6.a...@uakron.edu>
> Date: Mon, 11 Apr 2022 22:38:48 + (UTC)
> Minicomputers-Exhume: sides
> Subject: Nabil, 1 searches this week
> Malthus-Films: 88976dea
> List-Unsubscribe: 
> 
> Parasitic-Homogeneity: db5da28ba3e69a
> MIME-Version: 1.0
> Capitalizations-Grievously: oilers
> Content-type: multipart/mixed; boundary="--=_1649731129-716331-86"
> 
> Obviously, the following bogus header names are present:
> 
> Minicomputers-Exhume
> Malthus-Films
> Parasitic-Homogeneity
> Capitalizations-Grievously


Take a look at __RAND_HEADER and RAND_HEADER_MANY


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Of the twenty-two civilizations that have appeared in history,
  nineteen of them collapsed when they reached the moral state the
  United States is in now.  -- Arnold Toynbee
---
 3 days until the 74th anniversary of Israel's independence


Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-11 Thread Henrik K
On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
> > On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> > > See my original message.
> > > 
> > > I can't think of a single way to match each header, and then test for any 
> > > of them not matching the pattern...
> > 
> > Simply use regex negative lookahead.
> > 
> > ALL =~ /^(?!Foo|Bar):/m
> > 
> > It will hit any line _not_ starting with Foo: or Bar:
> 
> Oops I think it was buggy.. more like:
> 
> ALL =~ /^(?!(?:Foo|Bar):)/m

And for debug logging to log the missing header (to easily inspect what was
matched) you need some additional string matching, lookahead itself doesn't
save any string

ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
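
To illustrate why the extra [^:]+ matters: a lookahead is zero-width, so on its own the match text is empty and the debug log has nothing to show. A small Python sketch (the sample header text is made up; Python's re treats these constructs the same way for this case):

```python
import re

# Made-up sample: Foo and Bar are "known", Baz and Qux are not.
HEADERS = "Foo: 1\nBaz: 3\nBar: 2\nQux: 4"

# Lookahead only: the match succeeds but consumes no characters.
zero_width = re.compile(r"^(?!(?:Foo|Bar):)", re.MULTILINE)

# Lookahead plus [^:]+ : the match text is the unknown header's name.
with_name = re.compile(r"^(?!(?:Foo|Bar):)[^:]+", re.MULTILINE)

empty_hits = [m.group(0) for m in zero_width.finditer(HEADERS)]
named_hits = [m.group(0) for m in with_name.finditer(HEADERS)]

print(empty_hits)  # two empty strings, one per unknown header line
print(named_hits)  # the two unknown header names
```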



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-11 Thread Henrik K
On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> > See my original message.
> > 
> > I can't think of a single way to match each header, and then test for any 
> > of them not matching the pattern...
> 
> Simply use regex negative lookahead.
> 
> ALL =~ /^(?!Foo|Bar):/m
> 
> It will hit any line _not_ starting with Foo: or Bar:

Oops I think it was buggy.. more like:

ALL =~ /^(?!(?:Foo|Bar):)/m

Unless you want to write colon to all alternations

ALL =~ /^(?!Foo:|Bar:)/m
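
A quick way to convince yourself of the difference, sketched in Python with a made-up header block (the helper name matching_lines is mine): in the first attempt the colon sits outside the lookahead, so the pattern needs a line that literally begins with ":", and ordinary headers never hit.

```python
import re

# Made-up sample; "Foobar" shows that only an exact name plus colon is excluded.
HEADERS = "Foo: 1\nBar: 2\nBaz: 3\nFoobar: 4"


def matching_lines(pattern, text):
    """Return the full line at which each (possibly zero-width) match starts."""
    regex = re.compile(pattern, re.MULTILINE)
    return [text[m.start():].split("\n", 1)[0] for m in regex.finditer(text)]


# First attempt: ":" must match at the very start of the line.
first_try = matching_lines(r"^(?!Foo|Bar):", HEADERS)

# Corrected forms: the colon is part of what the lookahead rejects.
grouped = matching_lines(r"^(?!(?:Foo|Bar):)", HEADERS)
spelled_out = matching_lines(r"^(?!Foo:|Bar:)", HEADERS)

print(first_try)
print(grouped)
print(spelled_out)
```

Both corrected forms behave the same on this sample; the grouped one just avoids repeating the colon in every alternative.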



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-11 Thread Henrik K
On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
> See my original message.
> 
> I can't think of a single way to match each header, and then test for any of 
> them not matching the pattern...

Simply use regex negative lookahead.

ALL =~ /^(?!Foo|Bar):/m

It will hit any line _not_ starting with Foo: or Bar:



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-11 Thread Martin Gregorie
On Tue, 2022-05-10 at 18:19 -0600, Philip Prindeville wrote:
> I can't think of a single way to match each header, and then test for
> any of them not matching the pattern...
> 
> 
I had in mind a subrule that triggers on valid header names, combined
with a meta rule that inverts the subrule result. At least, that's what
I'd try as a starting point.

Martin




Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Bill Cole

On 2022-05-10 at 20:20:14 UTC-0400 (Tue, 10 May 2022 18:20:14 -0600)
Philip Prindeville 
is rumored to have said:

> On May 10, 2022, at 5:57 PM, Martin Gregorie  wrote:
>> 
>> On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
>>> 
>>> You're correct that they're different in every message received.
>>> 
>> So write a rule that fires on any header name that *doesn't* match
>> anything in the list of legit headers as defined in the relevant RFCs.
> 
> 
> See my original message.
> 
> I can't think of a single way to match each header, and then test for 
> any of them not matching the pattern...


As documented in the POD in Mail::SpamAssassin::Conf, a header rule 
checking "ALL:raw" actually matches against the pristine header section, 
in which you could check for lines that do not begin with the 'standard' 
headers.


Unfortunately, as noted elsewhere in the thread, this spam uses 
one-time header names, and there is nothing wrong with using random 
words as header names without a leading 'X-', so it's likely a low-yield 
approach.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Bill Cole

On 2022-05-10 at 18:10:23 UTC-0400 (Tue, 10 May 2022 16:10:23 -0600)
Philip Prindeville 
is rumored to have said:

> Anyone have a rule to detect the following nonsense headers seen in 
> this message I got?


No, and complicating your circumstance: RFC 6648

Here's the title & abstract:


   Deprecating the "X-" Prefix and Similar Constructs
   in Application Protocols

Abstract

   Historically, designers and implementers of application protocols
   have often distinguished between standardized and unstandardized
   parameters by prefixing the names of unstandardized parameters with
   the string "X-" or similar constructs.  In practice, that convention
   causes more problems than it solves.  Therefore, this document
   deprecates the convention for newly defined parameters with textual
   (as opposed to numerical) names in application protocols.



--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Loren Wilton

> Minicomputers-Exhume: sides
> Malthus-Films: 88976dea
> Parasitic-Homogeneity: db5da28ba3e69a
> Capitalizations-Grievously: oilers


It looks like the pattern is
   /[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}\:\s{1,10}[\w\d]{3,20}/
or something close to that.
Obviously it can mutate, but generally these are made by a tool, and until a 
new version of the tool comes along, they will be stable.


Try something like
   header  LW_BOGUS_HEADERS ALL =~ /[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}\:\s{1,10}[\w\d]{3,20}\n/is
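
As a sanity check of that shape, here is a Python sketch run against a made-up header block. It differs from the rule above in two ways that are my own assumptions: the pattern is anchored to whole lines and it is case-sensitive, so that names such as "Content-type" or "MIME-Version" are not caught. The addresses are placeholders.

```python
import re

# Word-Word header name, short whitespace, one short alphanumeric token.
# Anchoring (^...$) and case sensitivity are assumptions of this sketch.
BOGUS = re.compile(
    r"^[A-Z][a-z]{1,20}-[A-Z][a-z]{1,20}:\s{1,10}[\w\d]{3,20}$",
    re.MULTILINE,
)

# Made-up, already unfolded header block using names from this thread.
SAMPLE = "\n".join([
    "Return-Path: <sender@example.com>",
    "Message-ID: <id@example.com>",
    "Minicomputers-Exhume: sides",
    "Subject: Nabil, 1 searches this week",
    "Malthus-Films: 88976dea",
    "List-Unsubscribe: <mailto:unsub@example.com>",
    "Parasitic-Homogeneity: db5da28ba3e69a",
    "MIME-Version: 1.0",
    "Capitalizations-Grievously: oilers",
    "Content-type: multipart/mixed",
])


def bogus_headers(header_block):
    """Return the header names on lines that fit the generated-header shape."""
    return [m.group(0).split(":", 1)[0] for m in BOGUS.finditer(header_block)]


print(bogus_headers(SAMPLE))
```

A legitimate header whose value is a single short word would also fit this shape, so treat it as a starting point and check it against ham first.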



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Philip Prindeville



> On May 10, 2022, at 5:57 PM, Martin Gregorie  wrote:
> 
> On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
>> 
>> You're correct that they're different in every message received.
>> 
> So write a rule that fires on any header name that *doesn't* match
> anything in the list of legit headers as defined in the relevant RFCs.


See my original message.

I can't think of a single way to match each header, and then test for any of 
them not matching the pattern...


> 
> Of course you may need to extend that list to include some extras, such
> as headers injected by SA itself, as well as DMARC, DKIM, SPF etc.


That's the easy part.


> 
> Martin
> 
> 



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Martin Gregorie
On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
> 
> You're correct that they're different in every message received.
> 
So write a rule that fires on any header name that *doesn't* match
anything in the list of legit headers as defined in the relevant RFCs.

Of course you may need to extend that list to include some extras, such
as headers injected by SA itself, as well as DMARC, DKIM, SPF etc.

Martin




Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Philip Prindeville



> On May 10, 2022, at 4:58 PM, Kevin A. McGrail  wrote:
> 
> On 5/10/2022 6:10 PM, Philip Prindeville wrote:
>> Anyone have a rule to detect the following nonsense headers seen in this 
>> message I got?
> 
> Interesting. Those look more like something that Bayesian learning would be 
> best to handle.
> 
> But, have you built a corpora of spam and ham?  Do a list of headers that 
> appear in ham and spam corpora and xor out the spam ones.  Then write a rule 
> if any of those exist.  They look like they might change a lot and they are 
> randomized to avoid these type of issues so I see your dilemma and a plugin 
> might be needed.
> 
> Regards,
> KAM


You're correct that they're different in every message received.




Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Kevin A. McGrail

On 5/10/2022 6:10 PM, Philip Prindeville wrote:

> Anyone have a rule to detect the following nonsense headers seen in this 
> message I got?


Interesting. Those look more like something that Bayesian learning would 
be best to handle.


But have you built corpora of spam and ham?  Make a list of the headers 
that appear in the ham and spam corpora and xor out the spam ones.  Then 
write a rule that fires if any of those exist.  They look like they might 
change a lot, and they are randomized to avoid this type of detection, so I 
see your dilemma and a plugin might be needed.
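
A rough sketch of the corpus comparison in Python, using the standard email package. The corpora here are tiny made-up strings and the function names are hypothetical. Note that it uses a plain set difference (names seen in spam and never in ham), which is one reading of "xor out"; a true symmetric difference would also return the ham-only names.

```python
from email import message_from_string


def header_names(raw_message):
    """Lower-cased header field names of one raw message."""
    return {name.lower() for name in message_from_string(raw_message).keys()}


def spam_only_headers(spam_corpus, ham_corpus):
    """Header names seen in at least one spam message and in no ham message."""
    spam_names = set().union(*(header_names(m) for m in spam_corpus))
    ham_names = set().union(*(header_names(m) for m in ham_corpus))
    return spam_names - ham_names


# Made-up miniature corpora; real ones would be read from mbox or maildir.
SPAM = [
    "From: a@example.com\nSubject: offer\nMalthus-Films: 88976dea\n\nbody\n",
    "From: b@example.com\nSubject: deal\n"
    "Parasitic-Homogeneity: db5da28ba3e69a\n\nbody\n",
]
HAM = [
    "From: c@example.com\nSubject: lunch\nMIME-Version: 1.0\n\nbody\n",
]

print(sorted(spam_only_headers(SPAM, HAM)))
```

With names that change in every message the resulting list will not generalise, which is the dilemma mentioned above.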


Regards,
KAM

--
Kevin A. McGrail
kmcgr...@apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171