Re: Bayes underperforming, HTML entities?

2018-11-29 Thread Amir Caspi
On Nov 29, 2018, at 10:11 PM, Bill Cole wrote:
> 
> I have no issue with adding a new rule type to act on the output of a partial 
> well-defined HTML parsing, something in between 'rawbody' and 'body' types, 
> but overloading normalize_charset with that and so affecting every existing 
> rule of all body-oriented rule types would be a bad design.

The problem as I see it is that spammers are using HTML encoding as effectively 
another charset, and as a way of obfuscating like they did/do with Unicode 
lookalikes... but unless those HTML characters are translated there is no way 
to catch this obfuscation.

In other words — the encoded entities DISPLAY as something different than the 
content over which rules run... and because encoding is cumbersome and not 
human-readable, it also makes writing rules to catch these MUCH harder. Worse 
yet, they evade Bayes almost completely because the encoded words don’t 
tokenize well.

Maybe normalize_charset isn’t the right place to do it, but it seems like there 
should be some way of converting HTML-encoded entities into their 
single-character ASCII or Unicode equivalents before body rules and especially 
before Bayes tokenization, so that we can tokenize and run our rules on the 
-displayed- text and not the encoded text...
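
To illustrate the gap between encoded and displayed text, here is a minimal Python sketch. It is not SpamAssassin code: the sample string and the `tokenize` helper are made up for illustration, and the real Bayes tokenizer is more elaborate. It decodes entities with the standard library's `html.unescape` and compares the tokens before and after:

```python
import html
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a crude stand-in for a Bayes tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Hypothetical obfuscated text: some letters are written as numeric HTML entities.
raw = "Cl&#105;ck h&#101;re for ch&#101;ap m&#101;ds"

# What a mail client would actually display.
decoded = html.unescape(raw)

raw_tokens = tokenize(raw)          # fragments such as 'cl', '105', 'ck'
decoded_tokens = tokenize(decoded)  # whole words such as 'click', 'here'

print(raw_tokens)
print(decoded_tokens)
```

On the raw text the tokenizer only sees fragments (`cl`, `105`, `ck`, ...), while on the decoded text it sees the words a reader sees (`click`, `here`, `cheap`, `meds`).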

How best to achieve this?

--- Amir


Re: Bayes underperforming, HTML entities?

2018-11-29 Thread Bill Cole

On 29 Nov 2018, at 17:32, Amir Caspi wrote:

> B) Do you think that normalize_charset could evolve to handle HTML
> entities?


That would be a mess. The normalize_charset option acts on the decoded 
text of text/* MIME parts before that text is parsed into meaningful 
tokens.


I have no issue with adding a new rule type to act on the output of a 
partial well-defined HTML parsing, something in between 'rawbody' and 
'body' types, but overloading normalize_charset with that and so 
affecting every existing rule of all body-oriented rule types would be a 
bad design.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole


Re: Bayes underperforming, HTML entities?

2018-11-29 Thread John Hardin

On Thu, 29 Nov 2018, Amir Caspi wrote:


> On Nov 29, 2018, at 3:27 PM, John Hardin wrote:
>>
>> I'll see whether those can be incorporated into the existing UNICODE_OBFU_ZW
>> rule (which of course will no longer actually be UNICODE :) )
>
> Great. Maybe rename the rule. ;-)
>
> What are your thoughts on item #2?  Specifically:
>
> A) Could you sandbox the proposed rule change (AC_HTML_ENTITY_BONANZA_NEW) and
> see how it performs, including possible FPs?

Sure.

> B) Do you think that normalize_charset could evolve to handle HTML entities?

Potentially. I'm not familiar with that part of the code.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Gun Control laws aren't enacted to control guns, they are enacted
  to control people: catholics (1500s), japanese peasants (1600s),
  blacks (1860s), italian immigrants (1911), armenians (1911),
  the irish (1920s), jews (1930s), blacks (1960s), the poor (always)
---
 609 days since the first commercial re-flight of an orbital booster (SpaceX)


Re: Bayes underperforming, HTML entities?

2018-11-29 Thread Amir Caspi
On Nov 29, 2018, at 3:27 PM, John Hardin  wrote:
> 
> I'll see whether those can be incorporated into the existing UNICODE_OBFU_ZW 
> rule (which of course will no longer actually be UNICODE :) )

Great. Maybe rename the rule. ;-)

What are your thoughts on item #2?  Specifically:

A) Could you sandbox the proposed rule change (AC_HTML_ENTITY_BONANZA_NEW) and 
see how it performs, including possible FPs?

B) Do you think that normalize_charset could evolve to handle HTML entities?

Cheers.

--- Amir



Re: Bayes underperforming, HTML entities?

2018-11-29 Thread John Hardin

On Thu, 29 Nov 2018, Amir Caspi wrote:


> 1) A new variant is showing up lately, with liberal use of zero-width
> spaces/joiners. See spample:
> https://pastebin.com/zBVWaiew
>
> This uses the zero-width joiner HTML entity, interspersed within words. I
> don't see any legitimate reason that these should be present for Roman
> charsets and other non-complex scripts that don't require it.  Later in the
> spample there is similar usage of the zero-width space entity. I've seen a
> few other examples with other zero-width entities, as well.


I'll see whether those can be incorporated into the existing 
UNICODE_OBFU_ZW rule (which of course will no longer actually be UNICODE :) )


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Windows Genuine Advantage (WGA) means that now you use your
  computer at the sufferance of Microsoft Corporation. They can
  kill it remotely without your consent at any time for any reason;
  it also shuts down in sympathy when the servers at Microsoft crash.
---
 609 days since the first commercial re-flight of an orbital booster (SpaceX)


Re: Bayes underperforming, HTML entities?

2018-11-29 Thread Amir Caspi
On Nov 10, 2018, at 11:30 AM, John Hardin  wrote:
> 
> Initial results (again, all corpora aren't in yet)...
> 
> The rawbody rules perform much better (unsurprising), and the ASCII-only one 
> has a better raw S/O:
> 
> https://ruleqa.spamassassin.org/20181110-r1846283-n/__RW_HTML_ENTITY_ASCII_RAW/detail
>  
> 
> https://ruleqa.spamassassin.org/20181110-r1846283-n/__AC_HTML_ENTITY_BONANZA_SHRT_RAW/detail
>  
> 
> 
> The body one is still getting hits:
> 
> https://ruleqa.spamassassin.org/20181110-r1846283-n/__AC_HTML_ENTITY_BONANZA_SHRT_BODY/detail
>  
> 
> 
> ...but it's 99-100% overlap with the RAW version so it looks like it's purely 
> due to misformatting of the message.

Two new complications on this -- one could be solved by a new rule, but both 
could be solved by an evolution of the main rule.  The problem is that evolving 
the rule could pose an FP risk.  See spamples and discussion below.  Ultimately 
I think we need to consider HTML entities to be a form of character set, and 
have normalize_charset convert them...

1) A new variant is showing up lately, with liberal use of zero-width 
spaces/joiners. See spample:
https://pastebin.com/zBVWaiew 

This uses the zero-width joiner HTML entity, interspersed within words. 
I don't see any legitimate reason that these should be present for Roman 
charsets and other non-complex scripts that don't require it.  Later in the 
spample there is similar usage of the zero-width space entity. I've 
seen a few other examples with other zero-width entities, as well.
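
As a rough sketch of the pattern described above (in Python rather than as a SpamAssassin rule, and not the proposed AC_HTML_ZEROWIDTH_BONANZA rule itself), the following counts zero-width entities sandwiched between ASCII letters in raw HTML. The entity list and the sample strings are assumptions chosen for illustration:

```python
import re

# Assumed spellings of three zero-width characters:
# U+200B zero-width space, U+200C zero-width non-joiner, U+200D zero-width joiner.
ZERO_WIDTH_ENTITY = r"&(?:zwj|zwnj|ZeroWidthSpace|#820[345]|#x200[bcdBCD]);"

# A zero-width entity with an ASCII letter directly before and after it.
OBFUSCATED = re.compile(r"[A-Za-z]" + ZERO_WIDTH_ENTITY + r"(?=[A-Za-z])")


def count_zero_width_obfuscation(raw_html):
    """Return how many zero-width entities sit inside Latin-script words."""
    return len(OBFUSCATED.findall(raw_html))


# Hypothetical samples, not taken from the pastebin spample.
spammy = "Vi&zwj;ag&#8203;ra on&zwnj;line"
legit = "Terms &amp; conditions apply"

print(count_zero_width_obfuscation(spammy))
print(count_zero_width_obfuscation(legit))
```

A real rule would need a threshold on the count to limit false positives, since a single zero-width entity can appear in legitimate mail.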

A proposed rule to catch these zero-width entities (and variants) within Roman 
script:

rawbody AC_HTML_ZEROWIDTH_BONANZA   

Re: spoofing mail

2018-11-29 Thread Rick Gutierrez
On Wed, Nov 28, 2018 at 19:08, Reindl Harald wrote:
>
> >
> > these are the files that increase the score of the rule , If I'm
> > missing someone, please someone guide me or update me if I'm doing it
> > wrong.
> >
> > /var/lib/spamassassin/3.004001/updates_spamassassin_org/72_scores.cf
> > /usr/share/spamassassin/72_scores.cf
>
> just don't touch the files
> they will be overwritten
>
> please learn basics how to and where write local overrides
>
> https://support.configserver.com/en/knowledgebase/article/how-do-i-change-the-score-for-a-specific-spamassassin-test
>
OK, understood.

Thanks.


-- 
rickygm

http://gnuforever.homelinux.com


Re: spoofing mail

2018-11-29 Thread Rick Gutierrez
On Thu, Nov 29, 2018 at 10:18, David Jones wrote:
>
> On 11/29/18 9:44 AM, Paul Stead wrote:
> > I can't find MSGID_BELONGS_RECIPIENT in the standard distribution - I think 
> > this might be because my Plugin is installed.
> >
> > Another to get into branch?
> >
>
> I think this one is worthy of consideration to be included in the core
> SA ruleset.
>
> https://github.com/fmbla
>
> [root@server spamassassin]# pwd
> /etc/mail/spamassassin
> [root@server spamassassin]# cat 99_recipient_msgid.cf
> ifplugin Mail::SpamAssassin::Plugin::RecipientMsgID
>
>meta __PDS_MAILING_SOFTWARE (__VIA_ML || __DOS_HAS_MAILING_LIST ||
> __DOS_HAS_LIST_UNSUB || __HAS_LIST_ID || __DOS_HAS_LIST_ID ||
> __HAS_X_MAILING_LIST)
>
>meta MSGID_BELONGS_RECIPIENT __MSGID_BELONGS_RECIPIENT &&
> !__PDS_MAILING_SOFTWARE && !ENA_TRUSTED_LIST
>describe MSGID_BELONGS_RECIPIENT Message-ID domain belongs to recipient
>score MSGID_BELONGS_RECIPIENT 2.2
>
>meta MSGID_FAKE_FROM_2_EMAILS (__PLUGIN_FROMNAME_SPOOF &&
> __MSGID_BELONGS_RECIPIENT)
>describe MSGID_FAKE_FROM_2_EMAILS MSGID belongs to recipient and
> faked froms
>score MSGID_FAKE_FROM_2_EMAILS 4.2
>
>full __FROM_NAME_LAST_THING
> /From:\W*([\w+.-]+\@[\w.-]+\.\w\w++).*\1(?:\s*|<\/\w+>|--[\w_\-\.\=]{2,}--)+$/s
>
>meta SPOOF_NAME_LAST_THING (__PLUGIN_FROMNAME_SPOOF &&
> __FROM_NAME_LAST_THING)
>describe SPOOF_NAME_LAST_THING From 2 emails and fake from name as
> last thing
>score SPOOF_NAME_LAST_THING 2.2
>
> endif
>
> --
> David Jones

Thanks, David. That rule is not in the GitHub repository, so it must have
been removed. Could you upload it to GitHub? Gmail mangles the formatting.



-- 
rickygm

http://gnuforever.homelinux.com


Re: spoofing mail

2018-11-29 Thread David Jones
On 11/29/18 9:44 AM, Paul Stead wrote:
> I can't find MSGID_BELONGS_RECIPIENT in the standard distribution - I think 
> this might be because my Plugin is installed.
> 
> Another to get into branch?
> 

I think this one is worthy of consideration to be included in the core 
SA ruleset.

https://github.com/fmbla

[root@server spamassassin]# pwd
/etc/mail/spamassassin
[root@server spamassassin]# cat 99_recipient_msgid.cf
ifplugin Mail::SpamAssassin::Plugin::RecipientMsgID

   meta __PDS_MAILING_SOFTWARE (__VIA_ML || __DOS_HAS_MAILING_LIST || __DOS_HAS_LIST_UNSUB || __HAS_LIST_ID || __DOS_HAS_LIST_ID || __HAS_X_MAILING_LIST)

   meta MSGID_BELONGS_RECIPIENT __MSGID_BELONGS_RECIPIENT && !__PDS_MAILING_SOFTWARE && !ENA_TRUSTED_LIST
   describe MSGID_BELONGS_RECIPIENT Message-ID domain belongs to recipient
   score MSGID_BELONGS_RECIPIENT 2.2

   meta MSGID_FAKE_FROM_2_EMAILS (__PLUGIN_FROMNAME_SPOOF && __MSGID_BELONGS_RECIPIENT)
   describe MSGID_FAKE_FROM_2_EMAILS MSGID belongs to recipient and faked froms
   score MSGID_FAKE_FROM_2_EMAILS 4.2

   full __FROM_NAME_LAST_THING /From:\W*([\w+.-]+\@[\w.-]+\.\w\w++).*\1(?:\s*|<\/\w+>|--[\w_\-\.\=]{2,}--)+$/s

   meta SPOOF_NAME_LAST_THING (__PLUGIN_FROMNAME_SPOOF && __FROM_NAME_LAST_THING)
   describe SPOOF_NAME_LAST_THING From 2 emails and fake from name as last thing
   score SPOOF_NAME_LAST_THING 2.2

endif

-- 
David Jones


Re: spoofing mail

2018-11-29 Thread Rick Gutierrez
On Thu, Nov 29, 2018 at 7:47, David Jones wrote:
>

> Here's what my mail filters say.  You can ignore the DKIM_INVALID
> because the body was intentionally modified (redacted) to post to pastebin.
>
> X-Spam-Status: Yes, score=11.0 required=5.0 tests=BAYES_99,DKIM_INVALID,
> DKIM_SIGNED,ENA_BAD_SPAM,ENA_RELAY_NOT_US,MSGID_BELONGS_RECIPIENT,
> RCVD_IN_IVMBL,UNPARSEABLE_RELAY shortcircuit=no autolearn=no
> autolearn_force=no version=3.4.1
> X-Spam-Report:
> *  5.2 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> *  [score: 0.9980]
> *  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
> *  valid
> *  1.2 RCVD_IN_IVMBL No description available.
> *  0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay 
> lines
> *  0.1 DKIM_INVALID DKIM or DK signature exists, but is not valid
> *  2.2 ENA_RELAY_NOT_US Relayed from outside the US and not on 
> whitelists
> *  2.2 MSGID_BELONGS_RECIPIENT Message-ID domain belongs to recipient
> *  0.0 ENA_BAD_SPAM Spam hitting really bad rules.
>
> A well-trained Bayes helps a lot.

Yes, the problem is that this server is only a gateway;
everything is forwarded to my mail server.

>
> You could/should increase the score on MSGID_BELONGS_RECIPIENT in your
> /etc/mail/spamassassin local scores file.

I cannot find that rule. Would adding it to my local.cf work?

>
> Local overrides of scores and settings are typically done in
> /etc/mail/spamassassin/local.cf but feel free to make your own *.cf
> files in /etc/mail/spamassassin.  Amavis can create its own files to
> customize settings in /etc/mail/spamassassin so compare a vanilla SA
> installation to what you have to find the best place to put your local
> settings.
>
> --
> David Jones

regards!


-- 
rickygm

http://gnuforever.homelinux.com


Re: spoofing mail

2018-11-29 Thread Paul Stead
I can't find MSGID_BELONGS_RECIPIENT in the standard distribution - I think 
this might be because my Plugin is installed.

Another to get into branch?

--

On 29/11/2018, 13:47, "David Jones"  wrote:

On 11/29/18 3:30 AM, Rupert Gallagher wrote:
> Message-ID and To have the same domain, but From does not. You should
> have never received that mail.
>

Here's what my mail filters say.  You can ignore the DKIM_INVALID
because the body was intentionally modified (redacted) to post to pastebin.

X-Spam-Status: Yes, score=11.0 required=5.0 tests=BAYES_99,DKIM_INVALID,
DKIM_SIGNED,ENA_BAD_SPAM,ENA_RELAY_NOT_US,MSGID_BELONGS_RECIPIENT,
RCVD_IN_IVMBL,UNPARSEABLE_RELAY shortcircuit=no autolearn=no
autolearn_force=no version=3.4.1
X-Spam-Report:
*  5.2 BAYES_99 BODY: Bayes spam probability is 99 to 100%
*  [score: 0.9980]
*  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
*  valid
*  1.2 RCVD_IN_IVMBL No description available.
*  0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay 
lines
*  0.1 DKIM_INVALID DKIM or DK signature exists, but is not valid
*  2.2 ENA_RELAY_NOT_US Relayed from outside the US and not on 
whitelists
*  2.2 MSGID_BELONGS_RECIPIENT Message-ID domain belongs to recipient
*  0.0 ENA_BAD_SPAM Spam hitting really bad rules.

A well-trained Bayes helps a lot.

You could/should increase the score on MSGID_BELONGS_RECIPIENT in your
/etc/mail/spamassassin local scores file.

Local overrides of scores and settings are typically done in
/etc/mail/spamassassin/local.cf but feel free to make your own *.cf
files in /etc/mail/spamassassin.  Amavis can create its own files to
customize settings in /etc/mail/spamassassin so compare a vanilla SA
installation to what you have to find the best place to put your local
settings.

--
David Jones



Paul Stead
Senior Engineer (Tools & Technology)
Zen Internet


Re: spoofing mail

2018-11-29 Thread David Jones
On 11/29/18 3:30 AM, Rupert Gallagher wrote:
> Message-ID and To have the same domain, but From does not. You should 
> have never received that mail.
> 

Here's what my mail filters say.  You can ignore the DKIM_INVALID 
because the body was intentionally modified (redacted) to post to pastebin.

X-Spam-Status: Yes, score=11.0 required=5.0 tests=BAYES_99,DKIM_INVALID,
DKIM_SIGNED,ENA_BAD_SPAM,ENA_RELAY_NOT_US,MSGID_BELONGS_RECIPIENT,
RCVD_IN_IVMBL,UNPARSEABLE_RELAY shortcircuit=no autolearn=no
autolearn_force=no version=3.4.1
X-Spam-Report:
*  5.2 BAYES_99 BODY: Bayes spam probability is 99 to 100%
*  [score: 0.9980]
*  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
*  valid
*  1.2 RCVD_IN_IVMBL No description available.
*  0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay 
lines
*  0.1 DKIM_INVALID DKIM or DK signature exists, but is not valid
*  2.2 ENA_RELAY_NOT_US Relayed from outside the US and not on 
whitelists
*  2.2 MSGID_BELONGS_RECIPIENT Message-ID domain belongs to recipient
*  0.0 ENA_BAD_SPAM Spam hitting really bad rules.

A well-trained Bayes helps a lot.
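
To make the arithmetic in the report above explicit, here is a small Python sketch (the `rule_scores` helper is illustrative only, not part of SpamAssassin). It parses the per-rule scores from the X-Spam-Report lines, sums them, and shows how much of the total comes from BAYES_99:

```python
import re

# The report lines from above, with wrapped descriptions rejoined.
REPORT = """\
*  5.2 BAYES_99 BODY: Bayes spam probability is 99 to 100%
*  [score: 0.9980]
*  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
*  valid
*  1.2 RCVD_IN_IVMBL No description available.
*  0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay lines
*  0.1 DKIM_INVALID DKIM or DK signature exists, but is not valid
*  2.2 ENA_RELAY_NOT_US Relayed from outside the US and not on whitelists
*  2.2 MSGID_BELONGS_RECIPIENT Message-ID domain belongs to recipient
*  0.0 ENA_BAD_SPAM Spam hitting really bad rules.
"""

# A rule line starts with "*", then a decimal score, then the rule name.
RULE_LINE = re.compile(r"^\*\s+(-?\d+\.\d+)\s+([A-Z0-9_]+)\b", re.MULTILINE)


def rule_scores(report):
    """Map each rule name in the report to its score; continuation lines are skipped."""
    return {name: float(score) for score, name in RULE_LINE.findall(report)}


scores = rule_scores(REPORT)
total = round(sum(scores.values()), 1)
without_bayes = round(total - scores["BAYES_99"], 1)

print(total)          # 11.0, matching score=11.0 in X-Spam-Status
print(without_bayes)  # 5.8
```

With required=5.0, the same message would score only 5.8 without the Bayes hit, so BAYES_99 accounts for almost half of the total.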

You could/should increase the score on MSGID_BELONGS_RECIPIENT in your 
/etc/mail/spamassassin local scores file.

Local overrides of scores and settings are typically done in 
/etc/mail/spamassassin/local.cf but feel free to make your own *.cf 
files in /etc/mail/spamassassin.  Amavis can create its own files to 
customize settings in /etc/mail/spamassassin so compare a vanilla SA 
installation to what you have to find the best place to put your local 
settings.

-- 
David Jones


Re: --virtual-config-dir=pattern is not substituted

2018-11-29 Thread Eggert Ehmke
Strange, that configuration is missing from my /etc/postfix/master.cf. I will
add it.

On Thursday, 29 November 2018, 01:15:39 CET, Bill Cole wrote:
> On 28 Nov 2018, at 17:53, Eggert Ehmke wrote:
> > Do you mean the --username option in /etc/default/spamassassin?
> 
> No. Postfix is running the 'spamc' program in some fashion, usually via
> a pipe transport configured in master.cf. That transport (typically an
> intermediary script) needs to be passed the recipient address by Postfix
> and may need to transform it in some fashion (e.g. strip the domain
> maybe) to use it as the argument to the '-u' option in an invocation of
> spamc.
> 
> > It is set to the generic user --username=debian-spamd
> > 
> > 
> > Thank you
> > 
> > On Wednesday, 28 November 2018, 22:41:38 CET, RW wrote:
> >> On Tue, 27 Nov 2018 18:01:04 +0100
> >> 
> >> Eggert Ehmke wrote:
> >>> I have Spamassassin running on Debian with Postfix, Dovecot etc. It
> >>> seems to work, Spam is filtered to my Quarantine. I have some
> >>> virtual
> >>> mailboxes in /var/mail/vhosts and have set up the Option
> >>> 
> >>> -x --virtual-config-dir=/var/mail/vhosts/%d/%l/spamassassin
> >>> This does not work, in the log file  /var/log/spamassassin/spamd.log
> >>> 
> >>> I find these lines:
> >>> 
> >>> warn: plugin: eval failed: bayes: (in learn) locker: safe_lock:
> >>> cannot create tmp lockfile /var/
> >>> mail/vhosts///spamassassin/bayes.lock.domain.de.3653
> >>> for /var/mail/vhosts///spa
> >>> 
> >>> 
> >>> So the user name and the domain are  not replaced in the pattern.
> >>> What may be wrong??
> >> 
> >> Are you sure the recipient address is being passed to spamc via the
> >> -u
> >> option?
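
The empty path components in the log above are what one would expect if spamd received a username without an "@" in it. The following Python sketch mimics the documented meaning of the %u, %l and %d placeholders. It is an illustration only, not spamd's actual code, and the address eggert@domain.de is a made-up example:

```python
PATTERN = "/var/mail/vhosts/%d/%l/spamassassin"


def expand_virtual_config_dir(pattern, username):
    """Expand %u (full username), %l (local part) and %d (domain) in pattern."""
    local, sep, domain = username.rpartition("@")
    if not sep:
        # No "@" in the username: there is no local part or domain to substitute.
        local, domain = "", ""
    return (pattern.replace("%u", username)
                   .replace("%l", local)
                   .replace("%d", domain))


# Username without a domain, e.g. a generic system account:
print(expand_virtual_config_dir(PATTERN, "debian-spamd"))
# /var/mail/vhosts///spamassassin

# Full recipient address, as would be passed via spamc -u:
print(expand_virtual_config_dir(PATTERN, "eggert@domain.de"))
# /var/mail/vhosts/domain.de/eggert/spamassassin
```

This is why the recipient address has to reach spamc through the -u option: only a username of the form local@domain gives %l and %d something to expand to.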




Re: spoofing mail

2018-11-29 Thread Rupert Gallagher
Message-ID and To have the same domain, but From does not. You should have 
never received that mail.
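
The check described here can be sketched in a few lines of Python using the standard `email` package. This is only an illustration of the idea, not the logic of the RecipientMsgID plugin, and the function names are hypothetical:

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(address):
    """Return the lowercased domain of an address or Message-ID, or '' if there is none."""
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].strip("<> ").lower()


def msgid_matches_recipient_only(raw_message):
    """True if the Message-ID domain equals the To domain but differs from the From domain."""
    msg = message_from_string(raw_message)
    msgid_domain = domain_of(msg.get("Message-ID", ""))
    to_domain = domain_of(parseaddr(msg.get("To", ""))[1])
    from_domain = domain_of(parseaddr(msg.get("From", ""))[1])
    return bool(msgid_domain) and msgid_domain == to_domain and msgid_domain != from_domain


# Hypothetical headers: the Message-ID claims the recipient's domain.
suspicious = (
    "From: Alice <alice@example.net>\n"
    "To: bob@example.org\n"
    "Message-ID: <1234@example.org>\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
print(msgid_matches_recipient_only(suspicious))
```

Mailing lists and some forwarding setups can legitimately produce this pattern, so a check like this is better used to add score than to reject outright.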

On Wed, Nov 28, 2018 at 19:15, Rick Gutierrez  wrote:

> On Wed, Nov 28, 2018 at 6:03, Christian Grunfeld wrote:
>>
>> Hi,
>>
>> this is a log; could you paste the email headers?
>>
>> cheers
>>
> I do not know if it is useful, the amavisd + spamassassin I have it in
> front of the mail server.
>
> https://pastebin.com/ktMUDLps
>
> I appreciate any comments or help.
>
> --
> rickygm
>
> http://gnuforever.homelinux.com