Re: recent increase in spam getting through

2016-12-17 Thread frederik
Hi Martin,

Thanks for the reply.

> Please keep your messages on the SA Users list. 

Here's my Cc line on the message you replied to:

Cc: RW , "users@spamassassin.apache.org" 


I don't know why it wouldn't go through to the list; perhaps I
shouldn't include spammy terms in the message body (I notice other
posters use zip attachments).

I don't have a "production" and "test" setup, just my laptop. I'm
sorry I missed your earlier suggestion to diff the outputs of
different "spamd" runs. I attach the output of the following commands:

$ sudo spamd -u spamd -g spamd -x -D > spamd-u-g-x.out 2>&1
$ sudo spamd -D > spamd.out 2>&1
   
$ diff -u <(cut -f 5- -d ' ' spamd-u-g-x.out) <(cut -f 5- -d ' ' spamd.out) > spamd.diff

It looks like the second command is able to use my ancient Bayes token
database from my home directory, which I'd forgotten about, and gets
BAYES_999, while the first command uses the global database I just
trained from scratch yesterday (which I now see is in
/var/lib/spamassassin/.spamassassin/bayes_toks) with 3e4 ham and 3e4
spam, and only gets BAYES_60. It would be nice to be able to explain
that.
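
One way to check which Bayes database each run actually opened is to pull the token-database paths out of the two debug logs. This is a minimal sketch; it assumes only that the database path appears verbatim somewhere in the "-D" output, and the function name is made up for illustration.

```bash
#!/usr/bin/env bash
# Print each distinct Bayes token-database path mentioned in a spamd
# debug log. Assumes the path appears verbatim in the -D output.
bayes_db_paths() {
  local log=$1
  # -o prints only the matching text: a run of non-space characters
  # ending in "bayes_toks". sort -u drops repeats.
  grep -o '[^[:space:]]*bayes_toks' "$log" | sort -u
}

# Hypothetical usage with the two logs from the commands above:
#   bayes_db_paths spamd-u-g-x.out
#   bayes_db_paths spamd.out
```

If the two logs name different files, that would be consistent with one run scoring BAYES_999 and the other BAYES_60.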

I could have sworn that there were differences in the other rules as
well, for side-by-side runs like this, but now I can't reproduce that.

Thanks for your help,

Frederick

On Sun, Dec 18, 2016 at 01:00:32AM +, Martin Gregorie wrote:
> On Sat, 2016-12-17 at 15:37 -0800, frede...@ofb.net wrote:
> > Thank you John, that does help clarify things a bit. Also thanks to
> > Martin - I was typing this message when I received yours, but maybe
> > this will answer some of your questions.
> > 
> Please keep your messages on the SA Users list. Apart from anything
> else, by sending off-list messages, you're losing the chance for other
> eyes to see something the rest have missed.
>
> On the two examples you've quoted, it looks as if the score difference
> is due to a lack of URIBL responses, but I can't tell why from the
> evidence I've looked at except to point out that the absence of
> URIBL_BLOCKED in the low scored example is odd unless this test was done
> after you switched to using your own recursive, non-forwarding DNS
> server. Have you done that?
> 
> I still don't know whether you're using the same configuration for
> production and testing, but the presence of Bayes results in only one
> set of results rather suggests that either they are not the same or
> that they are the same but you've configured per-user Bayes and one of
> the user-specific Bayes databases is untrained and/or hasn't yet seen
> 200 spams and 200 hams.
> 
> BTW, the reason I suggested you do the parallel tests and diff their
> output was because that will highlight differences, which will make
> configuration differences much more obvious. You need to do this on a
> bigger set of messages and think about what any differences it reports
> are telling you about why your testing SA setup isn't getting the same
> results as your production SA.
> 
> If you're absolutely certain that your production SA and SA test setups
> both have:
> - the configuration location defaulted
> - the same version of the OS
> - the glue[*] you're using to patch SA into your main chain is
>   duplicated in your test setup 
> 
> Then I suggest you check that the SA configurations are identical:
> - is the list of files the same on both configs?
> - are all the files in the config identical? Use 'diff' to make sure.
> - are both SA systems running the same SA version?
> 
> [*] 'glue' means the scripts or tools such as amavisd-new, MIMEDefang,
>     etc
> 
> 
> Martin
> 


Re: recent increase in spam getting through

2016-12-17 Thread Martin Gregorie
On Sat, 2016-12-17 at 15:37 -0800, frede...@ofb.net wrote:
> Thank you John, that does help clarify things a bit. Also thanks to
> Martin - I was typing this message when I received yours, but maybe
> this will answer some of your questions.
> 
Please keep your messages on the SA Users list. Apart from anything
else, by sending off-list messages, you're losing the chance for other
eyes to see something the rest have missed.
 
On the two examples you've quoted, it looks as if the score difference
is due to a lack of URIBL responses, but I can't tell why from the
evidence I've looked at except to point out that the absence of
URIBL_BLOCKED in the low scored example is odd unless this test was done
after you switched to using your own recursive, non-forwarding DNS
server. Have you done that?

I still don't know whether you're using the same configuration for
production and testing, but the presence of Bayes results in only one
set of results rather suggests that either they are not the same or
that they are the same but you've configured per-user Bayes and one of
the user-specific Bayes databases is untrained and/or hasn't yet seen
200 spams and 200 hams.
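
For reference, SpamAssassin does not apply Bayes scores until the database has learned at least "bayes_min_spam_num" spam and "bayes_min_ham_num" ham messages, both 200 by default. A minimal sketch of checking that from "sa-learn --dump magic" output, run as the user whose database spamd opens; it assumes the count sits in the third column of the "nspam" and "nham" lines, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Read `sa-learn --dump magic` output on stdin, print the spam/ham
# counts, and succeed only if both reach the minimums (default 200,
# matching SpamAssassin's bayes_min_spam_num / bayes_min_ham_num).
# Assumes the count is the third whitespace-separated field.
bayes_ready() {
  local min_spam=${1:-200} min_ham=${2:-200}
  awk -v ms="$min_spam" -v mh="$min_ham" '
    /non-token data: nspam/ { nspam = $3 }
    /non-token data: nham/  { nham  = $3 }
    END {
      printf "nspam=%d nham=%d\n", nspam, nham
      exit !(nspam >= ms && nham >= mh)   # 0 only when both are met
    }'
}

# Hypothetical usage:
#   sudo -u spamd sa-learn --dump magic | bayes_ready && echo "Bayes active"
```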

BTW, the reason I suggested you do the parallel tests and diff their
output was because that will highlight differences, which will make
configuration differences much more obvious. You need to do this on a
bigger set of messages and think about what any differences it reports
are telling you about why your testing SA setup isn't getting the same
results as your production SA.

If you're absolutely certain that your production SA and SA test setups
both have:
- the configuration location defaulted
- the same version of the OS
- the glue[*] you're using to patch SA into your main chain is
  duplicated in your test setup 

Then I suggest you check that the SA configurations are identical:
- is the list of files the same on both configs?
- are all the files in the config identical? Use 'diff' to make sure.
- are both SA systems running the same SA version?

[*] 'glue' means the scripts or tools such as amavisd-new, MIMEDefang,
    etc
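
The file-level checks above can be partly scripted. A minimal sketch, assuming both configuration trees are reachable from one machine (for instance with the production tree copied over); the function name and paths are hypothetical. "diff -rq" reports files that exist on only one side as well as files whose contents differ, and running "spamassassin -V" on each host covers the version question.

```bash
#!/usr/bin/env bash
# Compare two SpamAssassin configuration trees. Prints nothing and
# returns 0 when they are identical; otherwise lists the differences
# and returns non-zero.
compare_sa_configs() {
  local test_conf=$1 prod_conf=$2
  # -r: recurse into subdirectories; -q: say which files differ, not how.
  # Files present on one side only are reported as "Only in ...".
  diff -rq -- "$test_conf" "$prod_conf"
}

# Hypothetical usage, with the production tree copied to /tmp/prod-sa:
#   compare_sa_configs /etc/mail/spamassassin /tmp/prod-sa
```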


Martin



Re: recent increase in spam getting through

2016-12-17 Thread frederik
be true.
> 
> FWIW my SA rule development and test setup runs on a different machine
> from the one running my live SA installation, BUT:
> - both SA instances keep their configurations in the default location
>   for my OS (Fedora Linux)
> - both run SA as the spamd daemon and use spamc to pass messages to it
> - I use a set of bash scripts to maintain the production SA
>   configuration by copying the complete configuration from my SA test
>   and development environment to the production environment and
>   immediately restart the production spamd instance.
> 
> Thus, although the production system runs SA as part of my getmail MDA
> script while my development system has a manually triggered bash script
> that I use to feed selected test messages to spamc, as far as SA is
> concerned, the two environments and SA configurations are effectively
> identical.
> 
> 
> Martin

On Sat, Dec 17, 2016 at 01:03:48PM -0800, frede...@ofb.net wrote:
> Thanks again for the replies.
> 
> I'm still investigating the problem, but I just noticed that
> "spamassassin" gives the message a score of 12.0, while
> "spamc"/"spamd" (which my mail setup is configured to use) still give
> it a 4.0. So it seems that something more mundane is going on,
> although I'm not sure what. I hope it's not that I've just done
> something stupid again.
> 
> Also, it seems that I should have set up a "caching nameserver". I've
> attached the report from "spamassassin -t" (with a "URIBL_BLOCKED"
> rule).
> 
> Thank you,
> 
> Frederick
> 
> On Sat, Dec 17, 2016 at 07:16:43PM +, David Jones wrote:
> > 
> > >From: RW <rwmailli...@googlemail.com>
> > >Sent: Saturday, December 17, 2016 8:02 AM
> > >To: users@spamassassin.apache.org
> > >Subject: Re: recent increase in spam getting through
> >     
> > >On Sat, 17 Dec 2016 13:35:16 +
> > >David Jones wrote:
> > 
> > 
> > >> That mail server IP above is on a very high number of RBLs:
> > >> http://multirbl.valli.org/lookup/173.230.94.183.html
> > 
> > 
> > >MultiRBL.valli.org - Results of the query 173.230.94.183
> > >multirbl.valli.org
> > >DNSBL and FCrDNS test results of the query '173.230.94.183'.
> > 
> > >> 
> > >> The edge MX server 104.197.242.163 must not be doing any
> > >> MTA checks of RBLs. 
> > 
> > 
> > >As I already mentioned it's normal to get huge scores when retesting
> > >spam because most net rules are reactive. It doesn't imply anything
> > >about RBL results at the time it was received.
> > 
> > When I looked at that RBL link above a few hours ago, it was listed on
> > 30 RBLs and now it says 42 so I agree with you that this is not a direct
> > indicator of receive time results.  I use that link above after the receive
> > time just to get a quick idea how bad it is.  When I see a mail server IP
> > with more than 10 to 12 hits, then it has been sending spam recently.
> > 
> > My point was that a mail server doesn't get listed on 30 or 42 RBLs in
> > a few hours.  It would have to have been sending a lot of spam for at
> > least a few days so this email would have been blocked by postscreen
> > on my servers for weeks.  Looking at the senderscore.org report for
> > that IP, it has been sending spam for about 3 weeks and has a score
> > of 0 out of 100.  Trustworthy mail servers should have a score in the
> > 90s.
> > 
> > SA comes with a few major RBL rules that should have blocked this
> > message recently.  With Postfix postscreen configured with major
> > RBLs weighted high and less reliable RBLs weighted lower, you can
> > get much better blocking at the MTA level using dozens of RBLs'
> > combined scoring.  Each mail admin has to assess which RBLs
> > are considered reliable for their location and users.
> > 
> > If the edge MX server just had a single zen.spamhaus.org RBL
> > configured and assuming it would be querying under the free
> > limit, then that email most likely would have been rejected before
> > SA and the OP would have never started this thread.

> Content analysis details:   (12.6 points, 5.0 required)
> 
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  1.2 URIBL_ABUSE_SURBL      Contains an URL listed in the ABUSE SURBL blocklist
>                             [URIs: 6url.ru]
>  0.0 URIBL_BLOCKED          ADMINISTRATOR NOTICE: The query to URIBL was blocked.
> 

Re: recent increase in spam getting through

2016-12-17 Thread Martin Gregorie
On Sat, 2016-12-17 at 13:03 -0800, frede...@ofb.net wrote:
> I'm still investigating the problem, but I just noticed that
> "spamassassin" gives the message a score of 12.0, while
> "spamc"/"spamd" (which my mail setup is configured to use) still give
> it a 4.0. So it seems that something more mundane is going on,
> although I'm not sure what. I hope it's not that I've just done
> something stupid again.
> 
Two possibilities:

1) If running the message through spamassassin hits a lot of URIBL and
   DNSBL rules that the spamc/spamd run didn't AND this was done some
   time after spamc/spamd scanned the message, then that's normal because
   the blacklists now know about the new spam source.

2) Are you sure, i.e. can you prove that, both spamassassin and
   spamc/spamd are using the same configuration? 

   A way to check that would be to run the message through spamassassin
   and immediately run it through spamc/spamd, i.e:
       $ spamassassin spam1.out; spamc spam2.out
       $ diff spam1.out spam2.out

   If the lists of rule hits in the two outputs differ, then it's likely
   that spamassassin and spamc are not using the same configuration.
   IOW your comparisons between spamassassin and spamc results are
   meaningless and will remain so until you've arranged things so that
   both are using the same configuration.

   In general, if the configuration is in the default location, 
   /etc/mail/spamassassin on my systems, and both spamassassin and
   spamc are using the default configuration, they should both be
   using the same one, but if you're using glue like amavisd-new to
   run spamassassin then this may not be true.
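
The comparison in 2) can be narrowed down to the rule names alone. A minimal sketch, assuming both outputs carry an X-Spam-Status header with a folded "tests=" list like the sample spam quoted in this thread; the function names and the file "message.eml" are hypothetical.

```bash
#!/usr/bin/env bash
# Print the rule names listed in the X-Spam-Status header of a message
# file, one per line. Folded header lines are joined before parsing.
rule_hits() {
  awk '
    /^X-Spam-Status:/ { grab = 1; hdr = $0; next }
    grab && /^[ \t]/  { hdr = hdr " " $0; next }   # continuation line
    grab              { exit }                     # next header: stop
    END {
      gsub(/,[ \t]+/, ",", hdr)                    # undo folding in the list
      gsub(/tests=[ \t]+/, "tests=", hdr)
      i = index(hdr, "tests=")
      if (i == 0) exit
      split(substr(hdr, i + 6), words, "[ \t]")    # list ends at whitespace
      n = split(words[1], rules, ",")
      for (k = 1; k <= n; k++) if (rules[k] != "") print rules[k]
    }' "$1"
}

# Rules hit in only one output: flush-left lines are unique to the
# first file, tab-indented lines are unique to the second.
compare_rule_hits() {
  comm -3 <(rule_hits "$1" | sort) <(rule_hits "$2" | sort)
}

# Hypothetical usage:
#   spamassassin < message.eml > spam1.out
#   spamc < message.eml > spam2.out
#   compare_rule_hits spam1.out spam2.out
```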

FWIW my SA rule development and test setup runs on a different machine
from the one running my live SA installation, BUT:
- both SA instances keep their configurations in the default location
  for my OS (Fedora Linux)
- both run SA as the spamd daemon and use spamc to pass messages to it
- I use a set of bash scripts to maintain the production SA
  configuration by copying the complete configuration from my SA test
  and development environment to the production environment and
  immediately restart the production spamd instance.

Thus, although the production system runs SA as part of my getmail MDA
script while my development system has a manually triggered bash script
that I use to feed selected test messages to spamc, as far as SA is
concerned, the two environments and SA configurations are effectively
identical.


Martin

Re: recent increase in spam getting through

2016-12-17 Thread John Hardin

On Sat, 17 Dec 2016, frede...@ofb.net wrote:


Also, it seems that I should have set up a "caching nameserver". I've
attached the report from "spamassassin -t" (with a "URIBL_BLOCKED"
rule).


The important part is that your MTA/SA not use your ISP or hosting 
provider's DNS server, and the local MTA/SA DNS server not forward queries 
to an upstream DNS server. Caching results is not related to that.
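
A rough first check of the "don't use your ISP's DNS server" part on a Linux host is to look at the nameservers the system resolver is configured with. This sketch only reads a resolv.conf-style file and prints the entries that are not loopback addresses; it cannot tell whether a local server forwards upstream (a local stub resolver that forwards would pass), and SpamAssassin can also be pointed elsewhere with its own "dns_server" setting, so treat the result as a hint. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print nameserver entries from a resolv.conf-style file that are not
# loopback addresses (127.0.0.0/8 or ::1). An empty result only means
# queries go to a local server, not that the server avoids forwarding.
non_local_nameservers() {
  local conf=${1:-/etc/resolv.conf}
  awk '$1 == "nameserver" && $2 !~ /^127\./ && $2 != "::1" { print $2 }' "$conf"
}
```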


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
  does quite what I want. I wish Christopher Robin was here."
   -- Peter da Silva in a.s.r
---
 8 days until Christmas


Re: recent increase in spam getting through

2016-12-17 Thread frederik
Thanks again for the replies.

I'm still investigating the problem, but I just noticed that
"spamassassin" gives the message a score of 12.0, while
"spamc"/"spamd" (which my mail setup is configured to use) still give
it a 4.0. So it seems that something more mundane is going on,
although I'm not sure what. I hope it's not that I've just done
something stupid again.

Also, it seems that I should have set up a "caching nameserver". I've
attached the report from "spamassassin -t" (with a "URIBL_BLOCKED"
rule).

Thank you,

Frederick

On Sat, Dec 17, 2016 at 07:16:43PM +, David Jones wrote:
> 
> >From: RW <rwmailli...@googlemail.com>
> >Sent: Saturday, December 17, 2016 8:02 AM
> >To: users@spamassassin.apache.org
> >Subject: Re: recent increase in spam getting through
>     
> >On Sat, 17 Dec 2016 13:35:16 +
> >David Jones wrote:
> 
> 
> >> That mail server IP above is on a very high number of RBLs:
> >> http://multirbl.valli.org/lookup/173.230.94.183.html
> 
> 
> >MultiRBL.valli.org - Results of the query 173.230.94.183
> >multirbl.valli.org
> >DNSBL and FCrDNS test results of the query '173.230.94.183'.
> 
> >> 
> >> The edge MX server 104.197.242.163 must not be doing any
> >> MTA checks of RBLs. 
> 
> 
> >As I already mentioned it's normal to get huge scores when retesting
> >spam because most net rules are reactive. It doesn't imply anything
> >about RBL results at the time it was received.
> 
> When I looked at that RBL link above a few hours ago, it was listed on
> 30 RBLs and now it says 42 so I agree with you that this is not a direct
> indicator of receive time results.  I use that link above after the receive
> time just to get a quick idea how bad it is.  When I see a mail server IP
> with more than 10 to 12 hits, then it has been sending spam recently.
> 
> My point was that a mail server doesn't get listed on 30 or 42 RBLs in
> a few hours.  It would have to have been sending a lot of spam for at
> least a few days so this email would have been blocked by postscreen
> on my servers for weeks.  Looking at the senderscore.org report for
> that IP, it has been sending spam for about 3 weeks and has a score
> of 0 out of 100.  Trustworthy mail servers should have a score in the
> 90s.
> 
> SA comes with a few major RBL rules that should have blocked this
> message recently.  With Postfix postscreen configured with major
> RBLs weighted high and less reliable RBLs weighted lower, you can
> get much better blocking at the MTA level using dozens of RBLs'
> combined scoring.  Each mail admin has to assess which RBLs
> are considered reliable for their location and users.
> 
> If the edge MX server just had a single zen.spamhaus.org RBL
> configured and assuming it would be querying under the free
> limit, then that email most likely would have been rejected before
> SA and the OP would have never started this thread.
Content analysis details:   (12.6 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.2 URIBL_ABUSE_SURBL      Contains an URL listed in the ABUSE SURBL blocklist
                            [URIs: 6url.ru]
 0.0 URIBL_BLOCKED          ADMINISTRATOR NOTICE: The query to URIBL was blocked.
                            See
                            http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
                            for more information.
                            [URIs: 6url.ru]
 0.5 RCVD_IN_SORBS_SPAM     RBL: SORBS: sender is a spam source
                            [173.230.94.183 listed in dnsbl.sorbs.net]
 0.0 URIBL_DBL_ABUSE_REDIR  Contains an abused redirector URL listed in the
                            DBL blocklist
                            [URIs: 6url.ru]
 1.3 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                [Blocked - see <http://www.spamcop.net/bl.shtml?173.230.94.183>]
 0.4 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [173.230.94.183 listed in zen.spamhaus.org]
 1.4 RCVD_IN_BRBL_LASTEXT   RBL: No description available.
                            [173.230.94.183 listed in bb.barracudacentral.org]
 2.7 RCVD_IN_PSBL           RBL: Received via a relay in PSBL
                            [173.230.94.183 listed in psbl.surriel.com]
 0.0 RCVD_IN_MSPIKE_L4      RBL: Bad reputation (-4)
                            [173.230.94.183 listed in bl.mailspike.net]
 0.0 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level mail
                            domains are different
 0.0 HTML_MESSAGE           BODY: HTML included in message
 2.0 BAYES_

Re: recent increase in spam getting through

2016-12-17 Thread David Jones

>From: RW <rwmailli...@googlemail.com>
>Sent: Saturday, December 17, 2016 8:02 AM
>To: users@spamassassin.apache.org
>Subject: Re: recent increase in spam getting through
    
>On Sat, 17 Dec 2016 13:35:16 +
>David Jones wrote:


>> That mail server IP above is on a very high number of RBLs:
>> http://multirbl.valli.org/lookup/173.230.94.183.html


>MultiRBL.valli.org - Results of the query 173.230.94.183
>multirbl.valli.org
>DNSBL and FCrDNS test results of the query '173.230.94.183'.

>> 
>> The edge MX server 104.197.242.163 must not be doing any
>> MTA checks of RBLs. 


>As I already mentioned it's normal to get huge scores when retesting
>spam because most net rules are reactive. It doesn't imply anything
>about RBL results at the time it was received.

When I looked at that RBL link above a few hours ago, it was listed on
30 RBLs and now it says 42 so I agree with you that this is not a direct
indicator of receive time results.  I use that link above after the receive
time just to get a quick idea how bad it is.  When I see a mail server IP
with more than 10 to 12 hits, then it has been sending spam recently.

My point was that a mail server doesn't get listed on 30 or 42 RBLs in
a few hours.  It would have to have been sending a lot of spam for at
least a few days so this email would have been blocked by postscreen
on my servers for weeks.  Looking at the senderscore.org report for
that IP, it has been sending spam for about 3 weeks and has a score
of 0 out of 100.  Trustworthy mail servers should have a score in the
90s.

SA comes with a few major RBL rules that should have blocked this
message recently.  With Postfix postscreen configured with major
RBLs weighted high and less reliable RBLs weighted lower, you can
get much better blocking at the MTA level using dozens of RBLs'
combined scoring.  Each mail admin has to assess which RBLs
are considered reliable for their location and users.
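
To make the weighting concrete: postscreen reads "zone*weight" entries from "postscreen_dnsbl_sites", adds up the weights of the lists that return a listing for the connecting client, and compares the sum with "postscreen_dnsbl_threshold". The sketch below only mimics that arithmetic; the zones and weights are illustrative, not recommendations, and postscreen's reply-filter form ("zone=filter*weight") is not handled. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sum postscreen-style DNSBL weights for one client.
#   $1: space-separated "zone*weight" entries (weight defaults to 1)
#   remaining args: the zones that returned a listing for the client IP
# Prints the total, to be compared against a threshold.
weighted_dnsbl_score() {
  local -a sites
  read -r -a sites <<< "$1"; shift      # split without glob expansion
  local total=0 entry zone weight listed
  for entry in "${sites[@]}"; do
    zone=${entry%%\**}                  # text before the first "*"
    if [[ $entry == *\** ]]; then
      weight=${entry#*\*}               # text after the first "*"
    else
      weight=1
    fi
    for listed in "$@"; do
      if [[ $listed == "$zone" ]]; then total=$(( total + weight )); fi
    done
  done
  echo "$total"
}

# Hypothetical usage: with a threshold of 3, Spamhaus alone rejects,
# and so do SpamCop plus PSBL together.
#   score=$(weighted_dnsbl_score 'zen.spamhaus.org*3 bl.spamcop.net*2 psbl.surriel.com' \
#                                zen.spamhaus.org psbl.surriel.com)
#   (( score >= 3 )) && echo reject
```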

If the edge MX server just had a single zen.spamhaus.org RBL
configured and assuming it would be querying under the free
limit, then that email most likely would have been rejected before
SA and the OP would have never started this thread.

Re: recent increase in spam getting through

2016-12-17 Thread RW
On Sat, 17 Dec 2016 13:35:16 +
David Jones wrote:


> That mail server IP above is on a very high number of RBLs:
> http://multirbl.valli.org/lookup/173.230.94.183.html
> 
> The edge MX server 104.197.242.163 must not be doing any
> MTA checks of RBLs. 


As I already mentioned it's normal to get huge scores when retesting
spam because most net rules are reactive. It doesn't imply anything
about RBL results at the time it was received. 


Re: recent increase in spam getting through

2016-12-17 Thread David Jones
>From: frede...@ofb.net <frede...@ofb.net>
>Sent: Saturday, December 17, 2016 1:35 AM
>To: users@spamassassin.apache.org
>Cc: John Hardin
>Subject: Re: recent increase in spam getting through
    
>Here's the sample spam:

>    From tfioxmns...@mariupol.us  Fri Dec 16 20:30:08 2016
>    Return-Path: <tfioxmns...@mariupol.us>
>    X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on thutmose
>    X-Spam-Level: ***
>    X-Spam-Status: No, score=4.0 required=5.0 tests=BAYES_50,
>    HEADER_FROM_DIFFERENT_DOMAINS,HELO_DYNAMIC_IPADDR,HTML_MESSAGE,
>    MIME_QP_LONG_LINE,RDNS_DYNAMIC,T_REMOTE_IMAGE,T_SPF_HELO_TEMPERROR,
>    T_SPF_TEMPERROR autolearn=no autolearn_force=no version=3.4.1  
>    X-Original-To: frede...@ofb.net
>    Delivered-To: frede...@ofb.net
>    Received: from host-173-230-94-183.fltapsf.clients.pavlovmedia.com
>    (host-173-230-94-183.fltapsf.clients.pavlovmedia.com 
>[173.230.94.183])
>    by ofb.net (Postfix) with SMTP id 1CF1D3FFB7
>    for <frede...@ofb.net>; Fri, 16 Dec 2016 20:30:07 -0800 (PST)

That mail server IP above is on a very high number of RBLs:
http://multirbl.valli.org/lookup/173.230.94.183.html

The edge MX server 104.197.242.163 must not be doing any
MTA checks of RBLs.  In my opinion, this is critical to a
successful SA setup. RBLs should block 85 to 95 percent of
spam and let SA score the last few percent.
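
For anyone new to RBL checks: a DNSBL lookup is an ordinary DNS A query for the client's IPv4 octets in reverse order under the list's zone, and any answer means the address is listed. A minimal sketch of building that query name (IPv4 only; the function name is made up):

```bash
#!/usr/bin/env bash
# Build the DNS name queried when checking an IPv4 address against a
# DNSBL zone: the octets are reversed and the zone is appended.
dnsbl_query_name() {
  local ip=$1 zone=$2 a b c d
  IFS=. read -r a b c d <<< "$ip"
  printf '%s.%s.%s.%s.%s\n' "$d" "$c" "$b" "$a" "$zone"
}

# Hypothetical usage; an answer (usually in 127.0.0.0/8) means "listed",
# NXDOMAIN means "not listed":
#   host "$(dnsbl_query_name 173.230.94.183 zen.spamhaus.org)"
```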

Looks like your setup is having to deal with all of the spam,
so the target is too large.  In my experience, it will take
too much time to babysit SA, and it will look to you as if
SA is scoring randomly in ways that don't make sense.

Consider setting up a small EFA server as the edge MTA:
https://efa-project.org/

Dave

Re: recent increase in spam getting through

2016-12-16 Thread frederik
Dear all,

Thanks for all the replies to my question, I think all of them were
useful to read. Thank you all for your time.

I wasn't sure whom to reply to, but I've been tinkering with my setup
and I think that many spam messages are getting through which should
be caught by the so-called "Bayesian" text-based classifier. For
instance, there are 499 "spam" messages containing "Instacheat" in the
subject, and no such "ham" messages... The most recent such message in
my "spam" folder has BAYES_999, but there are two in my inbox, one
with BAYES_95 and one with BAYES_50. I've pasted the second one below.
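
A count like this can be reproduced for the one-file-per-message folders that "sa-learn --dir" reads. A rough sketch with a hypothetical function name; it matches any line starting with "Subject:", so a forwarded message quoted in a body can inflate the number.

```bash
#!/usr/bin/env bash
# Count message files under a folder whose Subject line contains a word
# (case-insensitive). Approximate: any line starting "Subject:" counts.
count_subject_matches() {
  local word=$1 folder=$2
  # -r: walk the folder; -l: list each matching file once; -i: ignore case
  grep -rli -- "^Subject:.*${word}" "$folder" | wc -l
}

# Hypothetical usage:
#   count_subject_matches Instacheat ~/mail/folders/spam
#   count_subject_matches Instacheat ~/mail/folders/inbox
```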

There are a bunch of spam messages with similar properties, with e.g.
obvious erectile dysfunction words in the subject, getting through. My
spam folder has about 40,000 messages and my inbox maybe 30,000. I
tried changing my "train-spamassassin" script to clear the Bayesian
database first, and I also remembered the "sa-update" command and put
it in there for good measure. I disabled the mail fetch line in my
crontab while I ran it, so I'm sure it's not misclassifying things (to
answer Mr. Hardin's implied question).

#!/bin/zsh
sudo sa-update --channel updates.spamassassin.org --verbose
sudo -u spamd sa-learn --clear
sudo -u spamd sa-learn --showdots -D 1 --spam --dir ~/mail/folders/spam
sudo -u spamd sa-learn --showdots -D 1 --ham --dir ~/mail/folders/inbox

After running it, I saw a message with a penny stock subject go from
BAYES_60 to BAYES_95, now being classified correctly; but all the ones
I described were misclassified by the latest training run (which took
hours).

It seems a bit unfortunate, at least from my perspective, that it's
not so easy to train the weights for various rules on a per-user
basis, not just automatic textual features but things like
HTML_MESSAGE or T_REMOTE_IMAGE... There are algorithms to do this
reweighting very quickly - e.g. using a logistic GLM which should take
seconds - while also incorporating some prior beliefs, corresponding
to default weights. But nobody in the machine learning
community seems to be really interested in the problem of spam... It
would also be nice to see what the Bayesian classifier is doing, but
the database is all hashes so one is left guessing when it goes wrong.
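
A minimal sketch of the reweighting described above, under several assumptions: each message is one line of "label RULE RULE ..." (1 for spam, 0 for ham), the default scores come from a file of "RULE score" lines, P(spam) is modelled as sigmoid(offset + sum of hit scores) with the offset starting at -5 to mirror the default 5.0 threshold, and an L2 penalty pulls every score back toward its default, which plays the role of the prior. It uses plain batch gradient descent in awk rather than a real GLM solver; the function name, file formats and tuning values are hypothetical, and this is not how SpamAssassin's distributed scores are generated.

```bash
#!/usr/bin/env bash
# Re-fit per-rule scores by L2-regularised logistic regression.
#   $1: defaults file, lines "RULE default_score"
#   $2: messages file, lines "label RULE RULE ..." (1 = spam, 0 = ham)
# Minimises mean log-loss + (lambda / 2n) * sum_r (w_r - default_r)^2
# with batch gradient descent, then prints "RULE fitted_score" lines
# followed by the fitted offset (started at -5, i.e. a 5.0 threshold).
reweight_rules() {
  awk -v lambda="${LAMBDA:-1}" -v rate="${RATE:-0.5}" -v iters="${ITERS:-2000}" '
    BEGIN { n = 0 }                                 # numeric message index
    FNR == NR { prior[$1] = $2; w[$1] = $2; next }  # defaults file
    {                                               # messages file
      y[n] = $1; k[n] = NF - 1
      for (j = 2; j <= NF; j++) {
        x[n, j - 1] = $j
        if (!($j in w)) w[$j] = 0                   # unknown rule: default 0
      }
      n++
    }
    END {
      if (n == 0) exit 1
      b = -5
      for (t = 0; t < iters; t++) {
        for (r in w) g[r] = lambda * (w[r] - prior[r])   # pull toward default
        gb = 0
        for (i = 0; i < n; i++) {
          s = b
          for (j = 1; j <= k[i]; j++) s += w[x[i, j]]
          err = 1 / (1 + exp(-s)) - y[i]            # predicted minus actual
          gb += err
          for (j = 1; j <= k[i]; j++) g[x[i, j]] += err
        }
        b -= rate * gb / n
        for (r in w) w[r] -= rate * g[r] / n
      }
      for (r in w) printf "%s %.3f\n", r, w[r]
      printf "offset %.3f\n", b
    }' "$1" "$2"
}
```

The LAMBDA, RATE and ITERS environment variables override the arbitrary defaults; a larger LAMBDA keeps the fitted scores closer to the stock ones.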

Of course, even Gmail's spam classifier is pretty bad in my
experience. I'm still waiting for a "semi-supervised" or "active
learning" solution, which can take some large corpora and query me
only about labels of the boundary cases. Maybe Google has already
tried this and it doesn't work for some reason that escapes my
imagination.

Given that SpamAssassin has gone through the effort of coding up so
many useful rules, it should be easy for a machine learning researcher
to take these and use them as features in a more modern algorithm. Maybe
I'm being totally unhelpful by pointing this out, in which case I
apologise for not knowing better. I think I tried to make the same
point here some years ago and it didn't go anywhere.

Best regards,

Frederick

Here's the sample spam:

From tfioxmns...@mariupol.us  Fri Dec 16 20:30:08 2016
Return-Path: 
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on thutmose
X-Spam-Level: ***
X-Spam-Status: No, score=4.0 required=5.0 tests=BAYES_50,
HEADER_FROM_DIFFERENT_DOMAINS,HELO_DYNAMIC_IPADDR,HTML_MESSAGE,
MIME_QP_LONG_LINE,RDNS_DYNAMIC,T_REMOTE_IMAGE,T_SPF_HELO_TEMPERROR,
T_SPF_TEMPERROR autolearn=no autolearn_force=no version=3.4.1  
X-Original-To: frede...@ofb.net
Delivered-To: frede...@ofb.net
Received: from host-173-230-94-183.fltapsf.clients.pavlovmedia.com
(host-173-230-94-183.fltapsf.clients.pavlovmedia.com 
[173.230.94.183])
by ofb.net (Postfix) with SMTP id 1CF1D3FFB7
for ; Fri, 16 Dec 2016 20:30:07 -0800 (PST)
Message-ID: <756871361203-qgaxslpamnpdlkenbja...@pyzgb78.ezmicro.com>
From: Alexandra Smith 
Subject: Re: 1 Instacheat Request is Pending
To: frede...@ofb.net
Date: Sat, 17 Dec 2016 10:25:25 +0600
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="_av-gJPw4bNeVCqYrAQlhC5agA"
X-My-Tags: inbox

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

*1 Instacheat Request is Pending*
  ❤ ❤ ❤

--_av-Hmri4xobxH07rQj8ufhPIg
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable


On Thu, Dec 15, 2016 at 08:42:36AM -0800, John Hardin wrote:
> On Thu, 15 Dec 2016, frede...@ofb.net wrote:
> 
> > sudo -u spamd sa-learn --showdots -D 1 --ham --dir ~/mail/folders/inbox
> 
> Bad idea. That learns as ham any FNs you haven't yet noticed and removed
> from your inbox.
> 
> You should only learn as ham messages that you have explicitly reviewed and
> judged as ham.
> 
> -- 
>  John Hardin KA7OHZ   

Re: recent increase in spam getting through

2016-12-16 Thread Kevin A. McGrail
Hi Marc, I would say offhand that amavis and MailScanner aren't the same thing 
as MIMEDefang.

Sure, they can strap in clamd and spamd, but they are more products than 
frameworks.

MIMEDefang would likely frustrate non-programmers because it doesn't strap 
things in by default, and to use it you need to be able to write code pretty well.

Amavisd-new is like an amazingly good paint-by-numbers set, while MD is a set of 
professional paints, brushes and canvas.  Both can make amazing paintings (or 
screw them up royally)...

So MD is likely a poor fit if you just want to inspect zips, scan for spam and rely 
on those items.  But wow, can I do a lot of things with it that are way outside the box.

Anyway, this is likely treading into religious territory, so before I trigger a war, let 
me say I like both MD and amavisd-new.  I haven't used MailScanner much but have 
always respected Julian's work. 

They all very much have a place in the world of fighting basted spammers!
Regards,
KAM

On December 16, 2016 5:54:16 AM EST, "Marc Stürmer"  
wrote:
>On 2016-12-15 19:56, Ian Zimmerman wrote:
>
>> By now I have heard of MIMEDefang many times, and each time I wanted
>to
>> try it.  But it seems to require the milter interface in the MTA
>> (i.e. sendmail or _maybe_ postfix), and I'm married to Exim. :-(
>
>Well, MIMEDefang is not the only kid on the block doing this.
>
>More popular are:
>
>* amavisd-new, which works with Postfix as well as Exim 
>(https://www.amavis.org) and
>* MailScanner, which works with both MTAs as well 
>(https://www.mailscanner.info).
>
>Both are able to integrate ClamAV or another virus scanning engine and to 
>inspect and handle archive files.
>
>-- 
>Kind regards,
>
>MARC STÜRMER
>Ahornstraße 5 | 97506 Grafenrheinfeld
>Tel: 09723 9313969 | Mobile: 0174 3393862 | Fax: 09723 9314810 |
>m...@marc-stuermer.de


Re: recent increase in spam getting through

2016-12-16 Thread Marc Stürmer

On 2016-12-15 19:56, Ian Zimmerman wrote:


By now I have heard of MIMEDefang many times, and each time I wanted to
try it.  But it seems to require the milter interface in the MTA
(i.e. sendmail or _maybe_ postfix), and I'm married to Exim. :-(


Well, MIMEDefang is not the only kid on the block doing this.

More popular are:

* amavisd-new, which works with Postfix as well as Exim 
(https://www.amavis.org) and
* MailScanner, which works with both MTAs as well 
(https://www.mailscanner.info).


Both are able to integrate ClamAV or another virus scanning engine and to 
inspect and handle archive files.


--
Kind regards,

MARC STÜRMER
Ahornstraße 5 | 97506 Grafenrheinfeld
Tel: 09723 9313969 | Mobile: 0174 3393862 | Fax: 09723 9314810 |
m...@marc-stuermer.de


Re: recent increase in spam getting through

2016-12-15 Thread RW
On Thu, 15 Dec 2016 20:20:02 +
David Jones wrote:

> >From: Martin Gregorie <mar...@gregorie.org>
> >Sent: Thursday, December 15, 2016 1:39 PM
> >To: users@spamassassin.apache.org
> >Subject: Re: recent increase in spam getting through  
>     
> >On Thu, 2016-12-15 at 18:23 +, David Jones wrote:  
> >> There are many valuable SMTP realtime checks that must be done at
> >> the edge MTA.  Since you don't have control of this, then you have
> >> to resort to tuning SA constantly which is a never-ending game of
> >> cat-n-mouse since spam changes characteristics all of the time.
> >>   
> >It doesn't *have* to be done at the edge MTA provided you are happy to
> >accept and then bin the junk rather than rejecting it. My system has
> >been working this way for years.
> 
> True but one would have to know to put your ISP's mail server range
> into the trusted_networks/internal_networks in SA.  

If you are using getmail/fetchmail it commonly just works. SA has
explicit support for fetchmail, and getmail headers are unparseable.
Either way there is typically a chain of private and localhost IP
addresses up to the MX server.


> If you pull email later from an ISP mailbox, then RBLs
> could have changed during that time.  

Actually RBLs and other network rules are much more effective with a
delay. That's why problem FNs that are posted here usually get huge
scores when retested. I find that about half the spam that I download
with getmail hits RCVD_IN_XBL even though it's already been through an
MTA XBL check (including a variable greylisting delay).

A secondary advantage of the higher scores is that very little spam
ends up with a score close to 5, so if you have a separate folder for
high-scoring spam, any FPs stand out much more clearly.
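The separate-folder idea can be sketched as a small shell filter that reads the score SpamAssassin wrote into X-Spam-Status and picks a destination. This is a minimal sketch, assuming the message has already been through spamc; the function names, the 15.0 "high" cut-off and the folder names are hypothetical, not anything described in the thread (5.0 is SpamAssassin's default required_score).

```shell
#!/bin/sh
# Print the score= value from the first X-Spam-Status header.
# Only the header block is examined: sed quits at the first blank line.
spam_score() {
    sed -n '/^$/q; s/^X-Spam-Status:.*score=\(-\{0,1\}[0-9.]*\).*/\1/p' | head -n 1
}

# Map a score to a folder name (thresholds and names are hypothetical).
pick_folder() {
    awk -v s="$1" 'BEGIN {
        if (s == "")          print "inbox"      # no score found: leave alone
        else if (s + 0 >= 15) print "spam-high"  # almost certainly spam
        else if (s + 0 >= 5)  print "spam"       # the folder to eyeball for FPs
        else                  print "inbox"
    }'
}
```

Typical use would be `folder=$(pick_folder "$(spam_score < message.eml)")`. Keeping the near-threshold spam apart from the obvious junk is what makes the occasional FP easy to spot.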

> Also the DNS server used by the
> client running SA post-MTA could cause the dreaded URIBL_BLOCKED
> hit.  In my opinion, running such complex software post-MTA makes
> it twice as complex.

Avoiding URIBL_BLOCKED is something you need to do when you run
SpamAssassin irrespective of how your mail arrives. Setting up
resolver+SA is not twice as hard as setting up resolver+SA+MTA.
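A quick way to see which resolver SpamAssassin is likely to be using, as a minimal sketch: it assumes a typical Linux host where SpamAssassin sends DNS queries to the nameservers listed in /etc/resolv.conf (a `dns_server` line in local.cf overrides that, and the sketch does not check for one). The function names are hypothetical.

```shell
#!/bin/sh
# Print the first "nameserver" entry from a resolv.conf-style file.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Exit status 0 if that first nameserver is on loopback,
# i.e. a resolver running on this machine.
uses_local_resolver() {
    case $(first_nameserver "$1") in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}
```

For example, `uses_local_resolver || echo "DNSBL/URIBL queries go to a remote resolver"`. A loopback address only proves the first hop is local: a stub such as systemd-resolved on 127.0.0.53 may still forward everything to a shared upstream resolver.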



Re: recent increase in spam getting through

2016-12-15 Thread David Jones
>From: Martin Gregorie <mar...@gregorie.org>
>Sent: Thursday, December 15, 2016 1:39 PM
>To: users@spamassassin.apache.org
>Subject: Re: recent increase in spam getting through
    
>On Thu, 2016-12-15 at 18:23 +, David Jones wrote:
>> There are many valuable SMTP realtime checks that must be done at
>> the edge MTA.  Since you don't have control of this, then you have to
>> resort to tuning SA constantly which is a never-ending game of
>> cat-n-mouse since spam changes characteristics all of the time.
>> 
>It doesn't *have* to be done at the edge MTA provided you are happy to
>accept and then bin the junk rather than rejecting it. My system has
>been working this way for years.

True but one would have to know to put your ISP's mail server range into
the trusted_networks/internal_networks in SA.  This is advanced SA
knowledge that takes a while to learn through experience or from this mailing list.
The typical person that is using SA to block spam on a single personal
mailbox is not going to know all of the tweaks that have to be done to
SA configs when retrieving email like this.
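For fetched mail, the trusted_networks/internal_networks settings David mentions go in SpamAssassin's local.cf. A minimal sketch, assuming the relays sit in a single range: 192.0.2.0/24 below is a documentation placeholder, not anyone's real ISP range, and the helper name is hypothetical.

```shell
#!/bin/sh
# Emit local.cf lines declaring the given relay ranges both trusted and
# internal. SpamAssassin requires internal_networks to be a subset of
# trusted_networks, so each range is listed in both.
sa_relay_conf() {
    for net in "$@"; do
        printf 'trusted_networks %s\n' "$net"
        printf 'internal_networks %s\n' "$net"
    done
}
```

On many installs (including Arch) the output can be appended with `sa_relay_conf 192.0.2.0/24 | sudo tee -a /etc/mail/spamassassin/local.cf`, followed by `spamassassin --lint` to catch syntax errors before restarting spamd.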

What makes SA so powerful is its flexibility.  This flexibility also makes SA
so hard for most to wrap their heads around.  It's also hard to document
due to this flexibility.  There are so many ways to "glue" SA into the mail
flow, and each one changes SA's perspective for things like "lastexternal"
checks that need to be set up properly to take full advantage of SA's
built-in RBL rules.

It's still best to do RBL and DNS checks at the MTA in realtime and reject the
email so legit senders get feedback that the email was not delivered.  If you
pull email later from an ISP mailbox, then RBLs could have changed during
that time.  Also the DNS server used by the client running SA post-MTA could
cause the dreaded URIBL_BLOCKED hit.  In my opinion, running such complex
software post-MTA makes it twice as complex.

Dave


Re: recent increase in spam getting through

2016-12-15 Thread Martin Gregorie
On Thu, 2016-12-15 at 18:23 +, David Jones wrote:
> There are many valuable SMTP realtime checks that must be done at
> the edge MTA.  Since you don't have control of this, then you have to
> resort to tuning SA constantly which is a never-ending game of
> cat-n-mouse since spam changes characteristics all of the time.
> 
It doesn't *have* to be done at the edge MTA provided you are happy to
accept and then bin the junk rather than rejecting it. My system has
been working this way for years:

- I use getmail to retrieve mail from my mailbox at my ISP and use a
  locally written script as getmail's MDA.

- My MDA script calls spamc to run each message through spamd and then 
  passes it to my 'spamkiller' program. This throws spam into a
  quarantine directory and passes ham to Postfix via Postfix's sendmail
  command for delivery within my local LAN.

- I have a cron job that summarises what's in quarantine and deletes
  any spam that's over 7 days old.

- I have a logwatch service that analyses spamd and spamkiller log
  entries on a daily basis.
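The MDA step (scan, then quarantine spam or hand ham to the local MTA) and the 7-day clean-up can be sketched in a few lines of shell. This is not Martin's spamkiller code; it is a minimal sketch assuming spamc's default behaviour of adding an "X-Spam-Flag: YES" header to messages judged spam. SPAMC, SENDMAIL, QUARANTINE and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical settings; override them in the environment.
SPAMC=${SPAMC:-spamc}
SENDMAIL=${SENDMAIL:-/usr/sbin/sendmail}
QUARANTINE=${QUARANTINE:-$HOME/quarantine}

# Read one message on stdin, scan it, then quarantine or deliver it.
# Returns 75 (EX_TEMPFAIL) on local failures so the fetcher can retry.
deliver() {
    recipient=$1
    tmp=$(mktemp) || return 75
    if ! "$SPAMC" > "$tmp"; then
        rm -f "$tmp"
        return 75
    fi
    # Look only at the header block (everything before the first blank line).
    if sed '/^$/q' "$tmp" | grep -q '^X-Spam-Flag: YES'; then
        mkdir -p "$QUARANTINE" || { rm -f "$tmp"; return 75; }
        dest=$(mktemp "$QUARANTINE/spam.XXXXXX") || { rm -f "$tmp"; return 75; }
        mv "$tmp" "$dest"
    else
        "$SENDMAIL" -i -- "$recipient" < "$tmp"
        status=$?
        rm -f "$tmp"
        return "$status"
    fi
}

# Cron-style clean-up: remove quarantined messages older than 7 days.
purge_quarantine() {
    find "$QUARANTINE" -type f -mtime +7 -exec rm -f {} +
}
```

Quarantining rather than deleting outright leaves a window to rescue false positives, which is the point of the daily summary in the list above.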
 
If this approach looks like it might be useful for you, visit 
http://www.libelle-systems.com/free/ and take a look at the three
entries under Spamassassin (portmanteau, spamkiller and spamscan).
All are downloadable source tarballs.


Martin



Re: recent increase in spam getting through

2016-12-15 Thread Benny Pedersen

Ian Zimmerman wrote on 2016-12-15 19:56:

> On 2016-12-15 11:32, Kevin A. McGrail wrote:
>
>> I'm a fan of MIMEDefang but I am not very familiar with Arch Linux so
>> I don't know what MTA you are using nor its capabilities.
>
> By now I have heard of MIMEDefang many times, and each time I wanted to
> try it.  But it seems to require the milter interface in the MTA
> (i.e. sendmail or _maybe_ postfix), and I'm married to Exim. :-(


exim > smtp-proxy > sendmail listening only on a loopback IP and port >
sendmail-milter > mimedefang-milter

would make it possible to use any milter with Exim via the SMTP proxy.

IMHO there is work in progress to make some sort of milter support work in
Exim.

I admit the above is complicated, but it is indeed possible.


Re: recent increase in spam getting through

2016-12-15 Thread Larry Rosenman

On 2016-12-15 12:56, Ian Zimmerman wrote:

> On 2016-12-15 11:32, Kevin A. McGrail wrote:
>
>> I'm a fan of MIMEDefang but I am not very familiar with Arch Linux so
>> I don't know what MTA you are using nor its capabilities.
>
> By now I have heard of MIMEDefang many times, and each time I wanted to
> try it.  But it seems to require the milter interface in the MTA
> (i.e. sendmail or _maybe_ postfix), and I'm married to Exim. :-(

I have RBLs, ClamAV and SpamAssassin working quite well with Exim on my
FreeBSD mail server, FWIW.

I'm willing to share config if anyone's interested.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281


Re: recent increase in spam getting through

2016-12-15 Thread Ian Zimmerman
On 2016-12-15 11:32, Kevin A. McGrail wrote:

> I'm a fan of MIMEDefang but I am not very familiar with Arch Linux so
> I don't know what MTA you are using nor its capabilities.

By now I have heard of MIMEDefang many times, and each time I wanted to
try it.  But it seems to require the milter interface in the MTA
(i.e. sendmail or _maybe_ postfix), and I'm married to Exim. :-(

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html


Re: recent increase in spam getting through

2016-12-15 Thread Kevin A. McGrail

> There are many valuable SMTP realtime checks that must be done at
> the edge MTA.  Since you don't have control of this, then you have to
> resort to tuning SA constantly which is a never-ending game of
> cat-n-mouse since spam changes characteristics all of the time.

That was unfortunately my takeaway as well.  A lot of spam is rejected 
before it gets to SA.




Re: recent increase in spam getting through

2016-12-15 Thread David Jones

>From: frede...@ofb.net <frede...@ofb.net>
>Sent: Thursday, December 15, 2016 11:26 AM
>To: David Jones
>Cc: users@spamassassin.apache.org
>Subject: Re: recent increase in spam getting through
    
>I'm using a friend's MTA, which is perhaps the source of the recent
>change - I'll have to check what they are doing. All my mail goes to a
>spool directory in my home on "ofb.net" and then I have a script which
>transfers the files and puts them into a maildir on my laptop. That
>way I don't have to have an internet connection to search through old
>email, mailing lists, and so on.

>    Received: from [171.247.127.4] (unknown [171.247.127.4])
>  by ofb.net (Postfix) with ESMTP id 7BEB441DB1   
>    for <frede...@ofb.net>; Thu, 15 Dec 2016 06:01:58 -0800 (PST)
>    Date: Thu, 15 Dec 2016 06:02:07 -0700
>    To: frede...@ofb.net

Based on that Received header IP, this should easily have been blocked
by RBL and FCrDNS checks (the RDNS_NONE rule in SA):

http://multirbl.valli.org/lookup/171.247.127.4.html
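The lookup behind a site like multirbl is a plain DNS query: the IPv4 octets are reversed, the DNSBL zone is appended, and any A record in the answer (normally in 127.0.0.0/8) means the address is listed. A minimal sketch of building that name; the function name is hypothetical, and zen.spamhaus.org is simply the list David recommends elsewhere in this thread.

```shell
#!/bin/sh
# Build the DNSBL query name for an IPv4 address:
# a.b.c.d + zone -> d.c.b.a.zone
dnsbl_query_name() {
    ip=$1
    zone=$2
    # Split the dotted quad on "." into positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    printf '%s.%s.%s.%s.%s\n' "$4" "$3" "$2" "$1" "$zone"
}
```

With the header above, `host "$(dnsbl_query_name 171.247.127.4 zen.spamhaus.org)"` returning NXDOMAIN means "not listed"; an answer means "listed". The same free-usage limits discussed in this thread apply to hand-made queries.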

I am not able to require a perfect FCrDNS lookup on my MTA but I do
require a PTR record to exist.  This message would have been rejected
by Postfix.
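In Postfix, "a PTR record must exist, but full FCrDNS is not required" corresponds to the `reject_unknown_reverse_client_hostname` restriction; `reject_unknown_client_hostname` is the stricter, forward-confirmed variant. A minimal config sketch, assuming you administer the Postfix host and run it as root. The POSTCONF/POSTFIX overrides and the function name are hypothetical (they just allow a dry run), and the command replaces any existing smtpd_client_restrictions value, so merge it with what you already have.

```shell
#!/bin/sh
# Require a PTR record (but not full FCrDNS) for connecting SMTP clients.
# WARNING: overwrites the current smtpd_client_restrictions setting.
require_client_ptr() {
    "${POSTCONF:-postconf}" -e \
        'smtpd_client_restrictions = permit_mynetworks, reject_unknown_reverse_client_hostname' &&
        "${POSTFIX:-postfix}" reload
}
```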

There are many valuable SMTP realtime checks that must be done at
the edge MTA.  Since you don't have control of this, then you have to
resort to tuning SA constantly which is a never-ending game of
cat-n-mouse since spam changes characteristics all of the time.

The best thing I have ever done to help with this cat-n-mouse game is to go
heavy on the IP reputation of the sending mail server, which involves RBLs,
FCrDNS, SPF, etc., all of which have to be done on the edge MTA.
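FCrDNS itself is a two-step check: look up the PTR name for the connecting IP, then look up that name's A records and require the original IP among them. A minimal sketch of the decision, with the DNS answers passed in as arguments so the logic stands on its own; the function name and result strings are hypothetical.

```shell
#!/bin/sh
# Decide a forward-confirmed reverse DNS (FCrDNS) result.
#   $1     connecting client IP
#   $2     PTR name found for that IP ("" if there is none)
#   $3...  A records found for the PTR name
fcrdns_status() {
    ip=$1
    ptr=$2
    shift 2
    if [ -z "$ptr" ]; then
        echo "no-ptr"        # no reverse DNS at all
        return
    fi
    for addr in "$@"; do
        if [ "$addr" = "$ip" ]; then
            echo "pass"      # the name points back at the same IP
            return
        fi
    done
    echo "mismatch"          # PTR exists but is not forward-confirmed
}

# Example with live lookups (needs dig; not run here):
#   ip=171.247.127.4
#   ptr=$(dig +short -x "$ip" | head -n 1)
#   fcrdns_status "$ip" "$ptr" $(if [ -n "$ptr" ]; then dig +short A "$ptr"; fi)
```

Splitting the lookups from the decision also shows why "PTR required" is the gentler policy: it only rejects the `no-ptr` case, while a full FCrDNS requirement also rejects `mismatch`.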

Dave

Re: recent increase in spam getting through

2016-12-15 Thread frederik
Thank you, David.

Sorry, I should have known to give you a more verbose listing of the
headers, I put one at the end for the "voicemail" spam.

I'm using a friend's MTA, which is perhaps the source of the recent
change - I'll have to check what they are doing. All my mail goes to a
spool directory in my home on "ofb.net" and then I have a script which
transfers the files and puts them into a maildir on my laptop. That
way I don't have to have an internet connection to search through old
email, mailing lists, and so on.

It looks like I have a lot of reading to do (or my admins do). I had
thought that running SpamAssassin locally after I download my emails
would be sufficient, even preferable, since locally there is the
"Bayesian" database.

Thanks again,

Frederick

Return-Path: <voicemailand...@southcentralmachine.arcoxmail.com>
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on thutmose
X-Spam-Level:
X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_05,HTML_MESSAGE,
    RDNS_NONE,T_SPF_TEMPERROR autolearn=no autolearn_force=no
    version=3.4.1
X-Original-To: frede...@ofb.net
Delivered-To: frede...@ofb.net
Received: from [171.247.127.4] (unknown [171.247.127.4])
    by ofb.net (Postfix) with ESMTP id 7BEB441DB1
    for <frede...@ofb.net>; Thu, 15 Dec 2016 06:01:58 -0800 (PST)
Date: Thu, 15 Dec 2016 06:02:07 -0700
To: frede...@ofb.net
From: SureVoIP <voicemailand...@southcentralmachine.arcoxmail.com>
Subject: Voicemail from 08449381540 <08449381540> 00:03:15
Message-ID: <27d1c28da751b7dd3a731e04d7a620d4@localhost.localdomain>
X-Priority: 3
X-Mailer: PHPMailer 5.2.2 (http://code.google.com/a/apache-extras.org/p/phpmailer/)
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="b1_27d1c28da751b7dd3a731e04d7a620d4"


On Thu, Dec 15, 2016 at 04:42:16PM +, David Jones wrote:
> >From: frede...@ofb.net <frede...@ofb.net>
> >Sent: Thursday, December 15, 2016 9:33 AM
> >To: users@spamassassin.apache.org
> >Subject: recent increase in spam getting through
>  
> >    X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_05,HTML_MESSAGE,
> >        RDNS_NONE,T_SPF_TEMPERROR autolearn=no autolearn_force=no
> >        version=3.4.1
> 
> >    Date: Thu, 15 Dec 2016 02:09:18 -0700
> >    From: %GIRL_NAME Lyon <lyon_%girl_n...@feuz.com>
> >    To: frede...@ofb.net
> >    Subject: Re: Healthy soul in healthy body. Order Celexa now.
> >    X-Spam-Status: No, score=3.3 required=5.0 tests=BAYES_50,BODY_URI_ONLY,
> >        HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,
> >        MIME_HTML_ONLY,RDNS_NONE,T_SPF_HELO_TEMPERROR,T_SPF_TEMPERROR
> >        autolearn=no autolearn_force=no version=3.4.1
> 
> Need to see the received headers to check RBLs.  Make sure you are doing
> RBL checks at the MTA.  If you are using Postfix, then enable Postscreen and
> use its postscreen_dnsbl_sites for weighting reliable RBLs high and
> unreliable RBLs low.  There is a long thread on this in the archives.
> 
> http://marc.info/?l=spamassassin-users&m=146590518212907&w=2
> 
> Start with a short list like zen.spamhaus.org and mailspike then add new ones
> slowly over time until the email that hits SpamAssassin is mostly clean.  RBLs
> block 95% of the spam at the MTA level so my SpamAssassin only has to block
> a very small percentage of spam based on content (Subject, body, AV, etc.)
> and Bayes.
> 
> I offset some of the RBLs with postwhite for major mail providers that are
> often listed on RBLs but can't be blocked due to their size, like comcast.net.
> In this case, I have to let them on to SpamAssassin for scoring.  As long as
> they update their SPF record, then these will be let through but spoofers
> could be blocked by RBLs:
> 
> https://github.com/stevejenkins/postwhite
> 
> Remember that it is very important to use your own recursive DNS server and
> not point to other DNS servers that will combine your DNS queries with others,
> which can exceed the free usage limits set by the RBLs and cause URIBL_BLOCKED
> hits.
> 
> http://marc.info/?l=spamassassin-users&m=147498536120314&w=2
> 
> Hope this helps,
> Dave


Re: recent increase in spam getting through

2016-12-15 Thread David Jones
>From: frede...@ofb.net <frede...@ofb.net>
>Sent: Thursday, December 15, 2016 9:33 AM
>To: users@spamassassin.apache.org
>Subject: recent increase in spam getting through
 
>    X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_05,HTML_MESSAGE,
>        RDNS_NONE,T_SPF_TEMPERROR autolearn=no autolearn_force=no
>        version=3.4.1

>    Date: Thu, 15 Dec 2016 02:09:18 -0700
>    From: %GIRL_NAME Lyon <lyon_%girl_n...@feuz.com>
>    To: frede...@ofb.net
>    Subject: Re: Healthy soul in healthy body. Order Celexa now.
>    X-Spam-Status: No, score=3.3 required=5.0 tests=BAYES_50,BODY_URI_ONLY,
>        HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,
>        MIME_HTML_ONLY,RDNS_NONE,T_SPF_HELO_TEMPERROR,T_SPF_TEMPERROR
>        autolearn=no autolearn_force=no version=3.4.1

Need to see the received headers to check RBLs.  Make sure you are doing
RBL checks at the MTA.  If you are using Postfix, then enable Postscreen and
use its postscreen_dnsbl_sites for weighting reliable RBLs high and
unreliable RBLs low.  There is a long thread on this in the archives.

http://marc.info/?l=spamassassin-users&m=146590518212907&w=2

Start with a short list like zen.spamhaus.org and mailspike then add new ones
slowly over time until the email that hits SpamAssassin is mostly clean.  RBLs
block 95% of the spam at the MTA level so my SpamAssassin only has to block
a very small percentage of spam based on content (Subject, body, AV, etc.)
and Bayes.
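Postscreen's weighting adds up the weights of every DNSBL that lists the client and compares the total with postscreen_dnsbl_threshold. A minimal sketch of that arithmetic, assuming a hypothetical policy in the spirit of the advice above (Spamhaus ZEN trusted most, Mailspike's bl.mailspike.net less, anything else at weight 1); the weights, threshold and function names are illustrative, not recommendations.

```shell
#!/bin/sh
# Mirrors a main.cf policy such as (illustrative weights):
#   postscreen_dnsbl_sites = zen.spamhaus.org*3 bl.mailspike.net*2
#   postscreen_dnsbl_threshold = 3
dnsbl_weight() {
    case $1 in
        zen.spamhaus.org) echo 3 ;;
        bl.mailspike.net) echo 2 ;;
        *)                echo 1 ;;   # hypothetical low-trust default
    esac
}

# $1 = threshold, remaining args = DNSBLs that listed the client IP.
dnsbl_verdict() {
    threshold=$1
    shift
    total=0
    for site in "$@"; do
        total=$((total + $(dnsbl_weight "$site")))
    done
    if [ "$total" -ge "$threshold" ]; then
        echo "block $total"
    else
        echo "pass $total"
    fi
}
```

With these numbers a ZEN listing alone blocks, while a Mailspike listing alone needs a second list to agree before the client is rejected, which is the "reliable high, unreliable low" idea.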

I offset some of the RBLs with postwhite for major mail providers that are
often listed on RBLs but can't be blocked due to their size, like comcast.net.
In this case, I have to let them on to SpamAssassin for scoring.  As long as
they update their SPF record, then these will be let through but spoofers
could be blocked by RBLs:

https://github.com/stevejenkins/postwhite

Remember that it is very important to use your own recursive DNS server and
not point to other DNS servers that will combine your DNS queries with others,
which can exceed the free usage limits set by the RBLs and cause URIBL_BLOCKED
hits.
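To check whether your resolver is being refused, you can query multi.uribl.com directly. As far as I know, SpamAssassin's stock URIBL rules report URIBL_BLOCKED when that zone answers 127.0.0.1 (query refused, e.g. over the free-use limit), while other 127.0.0.x answers are real listings. A minimal sketch of interpreting the answer; the function name and result words are hypothetical.

```shell
#!/bin/sh
# Classify the A-record answer for <domain>.multi.uribl.com.
#   127.0.0.1        -> query refused; SpamAssassin reports URIBL_BLOCKED
#   other 127.0.0.x  -> domain is on one or more URIBL lists
#   empty (NXDOMAIN) -> not listed
uribl_answer_status() {
    case $1 in
        127.0.0.1) echo "blocked" ;;
        127.0.0.*) echo "listed" ;;
        "")        echo "not-listed" ;;
        *)         echo "unexpected" ;;
    esac
}

# Example live check (needs dig; not run here):
#   uribl_answer_status "$(dig +short A example.com.multi.uribl.com | head -n 1)"
```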

http://marc.info/?l=spamassassin-users&m=147498536120314&w=2

Hope this helps,
Dave

Re: recent increase in spam getting through

2016-12-15 Thread John Hardin

On Thu, 15 Dec 2016, frede...@ofb.net wrote:


> sudo -u spamd sa-learn --showdots -D 1 --ham --dir ~/mail/folders/inbox

Bad idea. That learns as ham any FNs you haven't yet noticed and removed
from your inbox.

You should only learn as ham messages that you have explicitly reviewed
and judged as ham.
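One way to follow that advice with a maildir and Mutt is to mark each message once you have checked it (for instance with the maildir "F"/flagged flag) and feed only those to sa-learn. A minimal sketch, assuming maildir filenames of the usual `unique:2,FLAGS` form; the marking convention and the function name are hypothetical, not something John suggested.

```shell
#!/bin/sh
# List messages in a maildir's cur/ whose flags include the given letter
# (default F, "flagged"). Flags follow ":2," at the end of the filename.
list_marked_messages() {
    maildir=$1
    flag=${2:-F}
    for f in "$maildir"/cur/*:2,*; do
        [ -f "$f" ] || continue          # glob matched nothing
        case ${f##*:2,} in
            *"$flag"*) printf '%s\n' "$f" ;;
        esac
    done
}
```

Training would then look like `list_marked_messages ~/mail/folders/inbox | tr '\n' '\0' | xargs -0 sudo -u spamd sa-learn --ham`, so unreviewed (and possibly FN) messages never reach the ham side of the database.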


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  It is not the place of government to make right every tragedy and
  woe that befalls every resident of the nation.
---
 Today: Bill of Rights day


Re: recent increase in spam getting through

2016-12-15 Thread Kevin A. McGrail

On 12/15/2016 11:24 AM, frede...@ofb.net wrote:

> No, I only run Spamassassin. I take it that 'clamav' would improve
> things.

I don't have numbers in front of me, but these malicious payloads with
zips are quite common but we don't

> What do you mean "if you are using an engine that can do it"?

SpamAssassin's power is that it is an API and a program.  There are
numerous programs that integrate SpamAssassin, ClamAV, and other
technologies to improve your e-mail.

I'm a fan of MIMEDefang but I am not very familiar with Arch Linux so I
don't know what MTA you are using nor its capabilities.


Regards,
KAM


Re: recent increase in spam getting through

2016-12-15 Thread frederik
Hi Kevin,

Thanks for your reply.

On Thu, Dec 15, 2016 at 11:07:33AM -0500, Kevin A. McGrail wrote:
> On 12/15/2016 10:33 AM, frede...@ofb.net wrote:
> > Dear Spamassassin,
> > 
> > I've seen a recent increase in spam getting through Spamassassin...
> > I've been getting groups of spam messages which have the same subject,
> > often with zip attachments. Here's a screenshot from Mutt:
> It's an interesting intersection of email security, malware scanning and
> antispam that I work with continuously.
> 
> Are you doing any AV scanning on your email?
> 
> Such as clamav before you run SA?

No, I only run Spamassassin. I take it that 'clamav' would improve
things.

> Or if you are using an engine that can do it, you can look at doing zip
> parsing, blocking executables and dangerous payloads.

What do you mean "if you are using an engine that can do it"?

Thanks,

Frederick


Re: recent increase in spam getting through

2016-12-15 Thread Kevin A. McGrail

On 12/15/2016 10:33 AM, frede...@ofb.net wrote:

> Dear Spamassassin,
>
> I've seen a recent increase in spam getting through Spamassassin...
> I've been getting groups of spam messages which have the same subject,
> often with zip attachments. Here's a screenshot from Mutt:

It's an interesting intersection of email security, malware scanning and
antispam that I work with continuously.


Are you doing any AV scanning on your email?

Such as clamav before you run SA?

Or if you are using an engine that can do it, you can look at doing zip 
parsing, blocking executables and dangerous payloads.
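As a rough illustration of "blocking executables and dangerous payloads" in zip attachments: list the archive's member names and reject the message if any has an extension commonly used by malware droppers. A minimal sketch; the extension list is an illustrative assumption rather than a complete policy, the function name is hypothetical, and it does not replace an AV scan.

```shell
#!/bin/sh
# Read archive member names (one per line) on stdin and print the risky ones.
# Exit status is 0 if at least one risky name was found, 1 otherwise.
risky_members() {
    grep -iE '\.(exe|scr|com|pif|bat|cmd|js|jse|vbs|vbe|wsf|hta|jar|lnk)$'
}

# Example (not run here): names from a saved attachment via zipinfo.
#   zipinfo -1 attachment.zip | risky_members && echo "reject: executable content"
```

Nested or password-protected archives are not handled here. For the AV side, `clamscan` exits with status 1 when it finds a match, so it can gate delivery the same way.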


Regards,
KAM


recent increase in spam getting through

2016-12-15 Thread frederik
Dear Spamassassin,

I've seen a recent increase in spam getting through Spamassassin...
I've been getting groups of spam messages which have the same subject,
often with zip attachments. Here's a screenshot from Mutt:

36604 N * Dec 15 %GIRL_NAME Lyon (0.2K) Re: Healthy soul in healthy body. Order Celexa now.
36605 N * Dec 15 Beta Consulting ( 49K) Opleiding Excel basis en/of gevorderd
36606 N * Dec 15 kneuper@grwsj.e ( 60K) Envío de factura PDF al CLIENTE
36607 N * Dec 15 Mona Dominguez  (4.9K) Order Receipt
36608 N * Dec 15 Hyman Walsh     (4.9K) Order Receipt
36609 N * Dec 15 Ugg Boots       (9.0K) frede...@ofb.net,Free Shipping + Discounted Gift Ca
36610 N * Dec 15 SureVoIP        ( 12K) Voicemail from 08440635679 <08440635679> 00:02:17
36611 N * Dec 15 Alberto         (0.8K) Triple your gaming pleasure
36612 N * Dec 15 Harp-Approval A (1.4K) Can HARP help you save on your monthly home payment
36613 N * Dec 15 SureVoIP        ( 13K) Voicemail from 08445596415 <08445596415> 00:02:13
36614 N * Dec 15 SureVoIP        ( 13K) Voicemail from 08437168032 <08437168032> 00:02:44
36615 N * Dec 15 Medical Marijua (6.3K) CNN: Epileptic Seizures Dramatically Improved with
36616 N * Dec 15 SureVoIP        ( 13K) Voicemail from 08449381540 <08449381540> 00:03:15
36617 N * Dec 15 SureVoIP        ( 13K) Voicemail from 08459518695 <08459518695> 00:02:33
36618 N * Dec 15 SureVoIP        ( 13K) Voicemail from 08448469191 <08448469191> 00:01:08
36619 N * Dec 15 SureVoIP        ( 13K) Voicemail from 08453192741 <08453192741> 00:02:33
36620 N * Dec 15 SureVoIP        ( 13K) Voicemail from 08433847988 <08433847988> 00:02:19
36621   * Dec 15 SureVoIP        ( 12K) Voicemail from 08428271866 <08428271866> 00:02:48
36622 N * Dec 15 SureVoIP        ( 13K) Voicemail from 08482974918 <08482974918> 00:03:45
36623 N * Dec 15 SureVoIP        ( 13K) Voicemail from 08401864200 <08401864200> 00:01:51
36624 N * Dec 15 SureVoIP        ( 13K) Voicemail from 08457292679 <08457292679> 00:02:41

Here are a couple of headers:

Date: Thu, 15 Dec 2016 20:25:45 +0530
From: SureVoIP <voicemailand...@dubrovniktravel.hr>
To: frede...@ofb.net
Subject: Voicemail from 08457292679 <08457292679> 00:02:41
X-Mailer: PHPMailer 5.2.2 (http://code.google.com/a/apache-extras.org/p/phpmailer/)
X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_05,HTML_MESSAGE,
    RDNS_NONE,T_SPF_TEMPERROR autolearn=no autolearn_force=no
    version=3.4.1

Date: Thu, 15 Dec 2016 02:09:18 -0700
From: %GIRL_NAME Lyon <lyon_%girl_n...@feuz.com>
To: frede...@ofb.net
Subject: Re: Healthy soul in healthy body. Order Celexa now.
X-Spam-Status: No, score=3.3 required=5.0 tests=BAYES_50,BODY_URI_ONLY,
    HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,
    MIME_HTML_ONLY,RDNS_NONE,T_SPF_HELO_TEMPERROR,T_SPF_TEMPERROR
    autolearn=no autolearn_force=no version=3.4.1

As you can see, I have trained the "Bayesian" filter but it isn't
recognizing the messages as spam. I run Arch Linux and I use the
following commands to do the training:

sudo -u spamd sa-learn --showdots -D 1 --spam --dir ~/mail/folders/spam
sudo -u spamd sa-learn --showdots -D 1 --ham --dir ~/mail/folders/inbox

I use spamc and spamd to do the filtering.
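Before tuning anything, it may be worth confirming which Bayes database each user actually sees and how much it holds. `sa-learn --dump magic` reports the nspam and nham counters, and SpamAssassin only starts applying Bayes once both reach bayes_min_spam_num and bayes_min_ham_num (200 each by default). A minimal sketch that extracts the two counters; the function name is hypothetical and it assumes the usual --dump magic layout, where the value sits in the third column.

```shell
#!/bin/sh
# Extract nspam / nham from `sa-learn --dump magic` output on stdin.
# Missing counters are reported as 0.
bayes_counts() {
    awk '/non-token data: nspam$/ { s = $3 }
         /non-token data: nham$/  { h = $3 }
         END { printf "nspam=%s nham=%s\n", s + 0, h + 0 }'
}
```

Comparing `sudo -u spamd sa-learn --dump magic | bayes_counts` with `sa-learn --dump magic | bayes_counts` run as yourself shows quickly whether spamd and your own sa-learn runs are reading the same database.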

Any ideas? I don't have many legitimate emails with 'zip' attachments,
but I'm intimidated by the thought of going into the Spamassassin
config and tweaking the various parameters by hand.

Thanks,

Frederick