Re: unexpected FN, how to improve/tune to catch

2018-11-17 Thread Matus UHLAR - fantomas

On 15.11.18 09:42, Ian Zimmerman wrote:
>>>  # This one disables Bayes.  ... tiny detail.
>>>  use_learner 0

On Fri, 16 Nov 2018 09:52:05 +0100 Matus UHLAR - fantomas wrote:
>> 1. this description is invalid. use_bayes disables bayes.

On 16.11.18 14:13, RW wrote:
> use_learner 0, in theory, disables all machine learning plug-ins.

I would prefer a more thorough explanation, can you please provide it?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
LSD will make your ECS screen display 16.7 million colors


Re: unexpected FN, how to improve/tune to catch

2018-11-16 Thread RW
On Fri, 16 Nov 2018 08:48:56 -0800
Ian Zimmerman wrote:


> 1. Am I correct in assuming that SA decodes base64 MIME parts so it
> does act on these links?  Reading the -D output surely indicates so.

I think you've already answered that.
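
For readers who want to see what that decoding amounts to, here is a minimal Python sketch (not SpamAssassin's own code). It builds a hypothetical message with the same MIME headers as the sample, decodes the base64 part with the standard `email` module, and pulls out web and mailto links. The body text, the addresses and the `extract_uris` helper are made up for illustration.

```python
import base64
import re
from email import message_from_string

# Hypothetical body; the real spample is only available via the pastebin link.
body = "Visit https://bit.ly/example or write to mailto:sales@example.com\n"
encoded = base64.b64encode(body.encode("utf-8")).decode("ascii")

# Same MIME headers as quoted in the original post.
raw = (
    "Content-Transfer-Encoding: base64\n"
    'Content-Type: text/plain; charset="utf-8"; Format="flowed"\n'
    "\n" + encoded + "\n"
)


def extract_uris(raw_message):
    """Decode a single-part message and return the http(s) and mailto URIs in it."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload(decode=True)  # undoes the base64 transfer encoding
    charset = msg.get_content_charset() or "utf-8"
    text = payload.decode(charset, errors="replace")
    return re.findall(r"(?:https?://|mailto:)[^\s<>\"']+", text)


uris = extract_uris(raw)
print(uris)
```

The links are invisible in the raw message but show up once the transfer encoding is undone, which is what the -D output suggests SA does before running URI rules.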
 
> 2. I remember some discussion here about following shortener links
> like bitly.  What is the resolution of that?  Does SA currently (as
> of 3.4.2) follow such links, to see (for example) that the link in my
> spample led to Facebook?
 
See

https://github.com/smfreegard/DecodeShortURLs

but Facebook isn't going to be listed anywhere. There is a test for
bitly links that have already been blocked.
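
As a rough illustration of what such link expansion involves, and not a description of how DecodeShortURLs is implemented, the Python sketch below follows redirects only while the current host is on a small shortener list. `SHORTENERS`, `head_location` and `expand` are hypothetical names, and the URLs in the usage lines are invented.

```python
import http.client
from urllib.parse import urlsplit

# Illustrative list only; a real deployment would maintain its own.
SHORTENERS = {"bit.ly", "bitly.com"}


def head_location(url, timeout=5.0):
    """Send one HEAD request and return the Location header of a 3xx reply, else None."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if 300 <= resp.status < 400:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


def expand(url, lookup=head_location, max_hops=5):
    """Follow redirects while the current host is a known shortener."""
    for _ in range(max_hops):
        host = (urlsplit(url).hostname or "").lower()
        if host not in SHORTENERS:
            break
        target = lookup(url)
        if not target:
            break
        url = target
    return url


# Offline usage with a fake lookup table instead of real network traffic.
fake = {"https://bit.ly/abc": "https://www.facebook.com/somepage"}
print(expand("https://bit.ly/abc", lookup=fake.get))
```

Knowing the final target only helps if that target is listed somewhere, which is the point about Facebook above.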

> 3. The documentation for the HashBL plugin shows how to set it up to
> check addresses from headers.  Is there a way to also check addresses
> from mailto links in the body?  

It's already supposed to do that. The documentation doesn't actually
say that it only checks headers.

If you are thinking of:

header   HASHBL_EMAIL   eval:check_hashbl_emails('ebl.msbl.org')

AFAIK that 'header' is just a matter of syntax and taxonomy, and the
only practical difference it makes is that HASHBL_EMAIL's score counts
towards the 3 header points needed for auto-learning.
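
To make the idea of a hashed e-mail lookup concrete, here is a small Python sketch of how addresses from mailto links in a body could be turned into DNS query names. It is not the plugin's code. It assumes the list is keyed by the hex SHA-1 of the lowercased address, which should be checked against the list operator's documentation. `MAILTO_RE` and `hashbl_query_names` are hypothetical names, the sample address is invented, and no DNS query is actually sent.

```python
import hashlib
import re

# Deliberately simple address pattern; real-world address parsing is messier.
MAILTO_RE = re.compile(
    r"mailto:([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})",
    re.IGNORECASE,
)


def hashbl_query_names(body_text, zone="ebl.msbl.org"):
    """Return one DNS name per mailto address: <sha1 hex of address>.<zone>."""
    names = []
    for addr in MAILTO_RE.findall(body_text):
        # Assumption: lowercase the address, then SHA-1 it.
        digest = hashlib.sha1(addr.lower().encode("utf-8")).hexdigest()
        names.append(f"{digest}.{zone}")
    return names


sample = "Questions? mailto:Sales@Example.com?subject=hi"
names = hashbl_query_names(sample)
print(names)
```

A listed address would then be indicated by the DNS answer for that name; how the answer is interpreted depends on the list.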


Re: unexpected FN, how to improve/tune to catch

2018-11-16 Thread Ian Zimmerman
On 2018-11-16 09:52, Matus UHLAR - fantomas wrote:

> such spam should be filtered at mailing list level before this happens.

And it almost always is.  Not in this case.

> what can help you

> - BAYES

understood, I am trying to do without Bayes for now, because I want to
avoid the maintenance (training and, especially, expiring).

> - network rules

those are on

> - URI blacklists

those are on

> did you enable/install razor, pyzor, dcc, spf and dkim libraries?

not dcc, but it would be useless in this case (mailing list is bulk by
definition).  The others are on.

> apparently it does not contain any URI.

It does.  Two web (bitly, masking a redirection to Facebook; plus
wecareusa) and one mailto.

Three followup questions about this last point:

1. Am I correct in assuming that SA decodes base64 MIME parts so it does
act on these links?  Reading the -D output surely indicates so.

2. I remember some discussion here about following shortener links like
bitly.  What is the resolution of that?  Does SA currently (as of 3.4.2)
follow such links, to see (for example) that the link in my spample led
to Facebook?

3. The documentation for the HashBL plugin shows how to set it up to
check addresses from headers.  Is there a way to also check addresses
from mailto links in the body?  If not now, is anything like that
planned for upcoming releases?

Thanks

-- 
Please don't Cc: me privately on mailing lists and Usenet,
if you also post the followup to the list or newsgroup.
To reply privately _only_ on Usenet and on broken lists
which rewrite From, fetch the TXT record for no-use.mooo.com.


Re: unexpected FN, how to improve/tune to catch

2018-11-16 Thread RW
On Fri, 16 Nov 2018 09:52:05 +0100
Matus UHLAR - fantomas wrote:

> On 15.11.18 09:42, Ian Zimmerman wrote:

> >  # This one disables Bayes.  ... tiny detail.
> >  use_learner 0
> 
> 1. this description is invalid. use_bayes disables bayes.


use_learner 0, in theory, disables all machine learning plug-ins.  


> 2. Bayes is the best tool to help you detect spam. Don't complain when
> you have disabled it.
> 
> >Where are all the other scores?  I would have expected at least
> >something for bit.ly and for the misspelled closing line, which is a
> >dead spam give-away to a human ...  

It has a missing letter; I'm a poor typist and I miss letters often.
Spelling mistakes are most useful for Bayes, which you turned off.


Re: unexpected FN, how to improve/tune to catch

2018-11-16 Thread Matus UHLAR - fantomas

On 15.11.18 09:42, Ian Zimmerman wrote:
> This little pearl got through upstream filter on a mailing list.


such spam is very hard to detect, because mailing lists tend to clear
negative-scoring rules and add some positive-scoring.

such spam should be filtered at mailing list level before this happens.


> My scores for it were:
>
>  RCVD_IN_DNSWL_MED=-2.3,SPF_HELO_PASS=-0.0,MAILING_LIST_MULTI=-1.0,TOTAL=-3.3


these are standard rules, and since the mail came from a mailing list, it's
expected to score negatively.

what can help you
- BAYES
- network rules
- URI blacklists

Do you have those enabled?


> Here is my user_prefs file:
>
>  # This one disables Bayes.  If you want to use Bayes remove or comment
>  # out this line.  You'll need to manage your Bayes database with a
>  # cronjob or something.  I can help but I won't do the last tiny detail.
>  use_learner 0


1. this description is invalid. use_bayes disables bayes.

2. Bayes is the best tool to help you detect spam. Don't complain when you
have disabled it.


> Where are all the other scores?  I would have expected at least
> something for bit.ly and for the misspelled closing line, which is a
> dead spam give-away to a human ...


did you enable/install razor, pyzor, dcc, spf and dkim libraries?


> I have run spamassassin -D on it and everything seems to work as
> designed i.e. the tests including URIBL run fine, they just don't catch
> anything.  It's disappointing.


apparently it does not contain any URI.


> Maybe the KAM rules would have got this one?


No. They can help, but they will hardly push mail that scores -3.3 and was
received via a mailing list over the spam threshold.
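
To put that -3.3 in perspective, the arithmetic can be sketched in Python. The score string is the one quoted above; the threshold of 5.0 is SpamAssassin's default required_score and is an assumption here, since the posted user_prefs does not set it. `parse_scores` is a hypothetical helper.

```python
def parse_scores(report):
    """Turn 'RULE=score,RULE=score,...' into a dict of floats."""
    scores = {}
    for item in report.split(","):
        name, _, value = item.partition("=")
        scores[name.strip()] = float(value)
    return scores


report = "RCVD_IN_DNSWL_MED=-2.3,SPF_HELO_PASS=-0.0,MAILING_LIST_MULTI=-1.0,TOTAL=-3.3"
scores = parse_scores(report)
total = scores.pop("TOTAL")

# The individual rule scores should add up to the reported total.
rule_sum = round(sum(scores.values()), 1)

# Assumption: default required_score, not overridden anywhere.
REQUIRED_SCORE = 5.0
gap = round(REQUIRED_SCORE - total, 1)

print(f"sum of rules: {rule_sum}, reported total: {total}")
print(f"additional points needed to reach {REQUIRED_SCORE}: {gap}")
```

Under that assumption any extra rule set would have to contribute more than 8 points on its own, which is why a few additional hits are unlikely to flip the verdict.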


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Chernobyl was a Windows 95 beta test site.


unexpected FN, how to improve/tune to catch

2018-11-15 Thread Ian Zimmerman
This little pearl got through upstream filter on a mailing list.

  https://pastebin.com/JhDGvAAA

I show the body only, but the MIME headers were:

  Content-Transfer-Encoding: base64
  Content-Type: text/plain; charset="utf-8"; Format="flowed"

Also:

  From: yourfrugalstore 
  Message-ID: <88ca9f91-131f-e584-3331-074c5139c...@yourfrugalstore.club>

My scores for it were:

  RCVD_IN_DNSWL_MED=-2.3,SPF_HELO_PASS=-0.0,MAILING_LIST_MULTI=-1.0,TOTAL=-3.3

Here is my user_prefs file:

  # This one disables Bayes.  If you want to use Bayes remove or comment
  # out this line.  You'll need to manage your Bayes database with a
  # cronjob or something.  I can help but I won't do the last tiny detail.
  use_learner 0

  # This means spamassassin will just add headers to the message, and not
  # wrap it as an attachment in a new message.
  report_safe 0

  # Tells spamassassin which Received headers it can trust not to be
  # forged.  In our case, it is a single address, the public address of
  # the server.
  clear_trusted_networks
  trusted_networks 12.34.56.78/32

  # This is not really needed but I included it to be explicit
  clear_dns_servers
  dns_server 127.0.0.1

  # Set the backend library for geoip functionality
  country_db_type GeoIP

Where are all the other scores?  I would have expected at least
something for bit.ly and for the misspelled closing line, which is a
dead spam give-away to a human ...

I have run spamassassin -D on it and everything seems to work as
designed i.e. the tests including URIBL run fine, they just don't catch
anything.  It's disappointing.

Maybe the KAM rules would have got this one?
