Re: New SA install, configuring for retraining on false positives

2015-11-05 Thread Axb

On 11/05/2015 12:52 PM, David Mehler wrote:

> It's looking like I have several options, MailScanner which hooks in
> to SA, Amavisd-new ditto, or SA as a milter called directly from my
> MTA. Comments on these or other methods?


MailScanner with Postfix is a hack. It works, BUT...
Amavisd-new is good and very well supported.
Both are written in Perl.

If you want something lightweight without the Perl dependency party:

http://fuglu.org

Axb




New SA install, configuring for retraining on false positives

2015-11-05 Thread David Mehler
Hello,

I've got a Postfix email server going with a MySQL database backend on
FreeBSD 10.2. I now want to add SpamAssassin to the picture and am
wondering what the current best practices are. It's been a number of
years since I last did this, and effectiveness wasn't very good then.
I'm not sure whether that was because I was following old information
or because I hadn't configured things correctly.

It's looking like I have several options, MailScanner which hooks in
to SA, Amavisd-new ditto, or SA as a milter called directly from my
MTA. Comments on these or other methods?

I also want to get the latest antispam rules. Do those come from SA,
or are there third-party rules I should look into?

Finally, one of the things I'm going to implement in addition to SA is
Sieve, done with my MDA Dovecot, so that mail flagged with a spam
header is automatically moved into a dedicated spam folder. I then
want to set up a system to tell SA when it has misclassified a
message (a false positive). What are people using in that environment?

Any other user feedback appreciated.

Thanks.
Dave.


Re: New SA install, configuring for retraining on false positives

2015-11-05 Thread Reindl Harald



On 05.11.2015 at 12:52, David Mehler wrote:

> Finally, one of the things I'm going to implement in addition to SA is
> Sieve, done with my MDA Dovecot, in which mail flagged with a spam
> header is automatically moved in to a dedicated spam folder. I am then
> wanting to set up a system to tell SA when it has misclassified a
> false positive, what are people using in that environment?

You train ham exactly the same way as spam. In fact, *you must* train
enough ham to get Bayes working properly.
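
A minimal sketch of what "the same way" can look like, assuming the
training is done with sa-learn, SpamAssassin's Bayes training tool, run
over folders of messages. The helper names and the folder paths are
hypothetical; the only things relied on are sa-learn's --spam and --ham
switches.

```python
import subprocess
from typing import List


def sa_learn_command(kind: str, folder: str) -> List[str]:
    """Build the sa-learn command line for one folder of messages."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # Spam and ham are trained identically; only the switch differs.
    return ["sa-learn", f"--{kind}", folder]


def train(kind: str, folder: str) -> None:
    """Run sa-learn on a folder (requires SpamAssassin to be installed)."""
    subprocess.run(sa_learn_command(kind, folder), check=True)


# Hypothetical folders: a Junk folder for missed or confirmed spam and a
# folder into which users move false positives.
# train("spam", "/path/to/Maildir/.Junk/cur")
# train("ham", "/path/to/Maildir/.NotJunk/cur")
```

Building the command separately from running it keeps the sketch easy
to check on a machine without SpamAssassin installed.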






Re: New SA install, configuring for retraining on false positives

2015-11-05 Thread Bill Cole

On 5 Nov 2015, at 6:52, David Mehler wrote:


> Hello,
>
> I've got a Postfix email server going with a Mysql database backend on
> FreeBSD 10.2. I'm now wanting to add Spamassassin to the picture and
> am wondering current best practices? It's been a number of years since
> I did it and last time effectiveness wasn't so good. I'm not sure if
> it was because I was following old information or didn't have things
> done right configuration wise?
>
> It's looking like I have several options, MailScanner which hooks in
> to SA,


But sadly it uses an explicitly unsupported interface to Postfix that
carries a real risk of breakage. This has resulted in open hostility
between the MailScanner and Postfix developers, which is obviously not
what you want when selecting tools to integrate.



> Amavisd-new ditto,


The canonical content filtering tool for Postfix. Works, has lots of 
users, never a really wrong choice. It may give you enough rope to hang 
yourself with but you'd need to be a skilled hangman.



> or SA as a milter called directly from my
> MTA.


There is no such thing: SA is not a milter.

But of course there *are* multiple milters that can support SA. If
you're comfortable writing Perl and like the idea of being able to
implement your own custom filtering tweaks that don't fit well as SA
rules, consider MIMEDefang, a milter that also acts as a hub for other
tools (such as AV software), similarly to amavisd-new, and is configured
by a set of Perl subroutine implementations that you can modify
arbitrarily. This is a nice fit for logic that can't be done in Postfix
itself because of the limitations of its internal content filters &
policy controls but that doesn't make sense to implement as custom SA
rules. MD definitely gives you more than enough rope to hang yourself
with. To overextend that metaphor, it provides you with a working rope
factory designed for versatile retooling.


Other milters that do nothing but provide simple SA-MTA plumbing may 
make sense for small systems with limited memory or if you have a hatred 
for Perl (which makes SA problematic too...) but otherwise it is hard to 
see an advantage to them over MD or Amavis.






Re: New SA install, configuring for retraining on false positives

2015-11-05 Thread Joe Quinn

On 11/5/2015 1:44 PM, Reindl Harald wrote:



> On 05.11.2015 at 19:24, Bill Cole wrote:
>> On 5 Nov 2015, at 6:52, David Mehler wrote:
>>> or SA as a milter called directly from my
>>> MTA.
>>
>> There is no such thing: SA is not a milter
>
> Tell that to our spamass-milter setup, which has been running for more
> than a year now, rejecting 99% of the junk at MTA level (the part that
> makes it through postscreen, SPF, PTR/HELO checks, greylisting and
> whatnot) with nearly zero false positives.


Just to make sure it's clear, because the authorship here is easy to
get confused about:
Bill is correct in that spamass-milter isn't SpamAssassin. Its manpage
credits it to Georg C. F. Greve and Dan Nelson, and it is GPLv2
licensed.


It falls under the snipped bit of the previous email, milters that can 
support SA. There's an overlap in concerns that makes it often on-topic 
here (much like with MD / Postfix / other elements of the mail stack), 
but we aren't the developers.


Re: New SA install, configuring for retraining on false positives

2015-11-05 Thread Reindl Harald



On 05.11.2015 at 20:26, Joe Quinn wrote:

> On 11/5/2015 1:44 PM, Reindl Harald wrote:
>> On 05.11.2015 at 19:24, Bill Cole wrote:
>>> On 5 Nov 2015, at 6:52, David Mehler wrote:
>>>> or SA as a milter called directly from my
>>>> MTA.
>>>
>>> There is no such thing: SA is not a milter
>>
>> Tell that to our spamass-milter setup, which has been running for
>> more than a year now, rejecting 99% of the junk at MTA level (the
>> part that makes it through postscreen, SPF, PTR/HELO checks,
>> greylisting and whatnot) with nearly zero false positives.
>
> Just to make sure it's clear, because the authorship here is easy to
> get confused about:
> Bill is correct in that spamass-milter isn't SpamAssassin. Its manpage
> credits it to Georg C. F. Greve and Dan Nelson, and it is GPLv2
> licensed.
>
> It falls under the snipped bit of the previous email, milters that can
> support SA. There's an overlap in concerns that makes it often on-topic
> here (much like with MD / Postfix / other elements of the mail stack),
> but we aren't the developers.


I know all that, but "there is no such thing" is wrong.





Re: New SA install, configuring for retraining on false positives

2015-11-05 Thread Reindl Harald



On 05.11.2015 at 19:24, Bill Cole wrote:

> On 5 Nov 2015, at 6:52, David Mehler wrote:
>> or SA as a milter called directly from my
>> MTA.
>
> There is no such thing: SA is not a milter

Tell that to our spamass-milter setup, which has been running for more
than a year now, rejecting 99% of the junk at MTA level (the part that
makes it through postscreen, SPF, PTR/HELO checks, greylisting and
whatnot) with nearly zero false positives.






Re: why: auto-learn? no: scored as spam but autolearn wanted ham

2015-11-05 Thread Benny Pedersen

On November 5, 2015 3:54:25 PM Matthias Apitz wrote:

> ...
> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
> X-Spam-Flag: YES
> X-Spam-Level: **
> X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
>  NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
> X-Spam-Report: ++
>  * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
>  * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
>  * -0.0 NO_RECEIVED Informational: message has no Received
>  * headers
> ...
>
> Why auto-learn wants the mail as HAM?

Where did you see this? GTUBE disables autolearn.


Re: why: auto-learn? no: scored as spam but autolearn wanted ham

2015-11-05 Thread Matthias Apitz
On Thursday, November 05, 2015 at 04:24:04PM +0100, John Wilcock wrote:

> On 05/11/2015 15:54, Matthias Apitz wrote:
> > X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
> > X-Spam-Flag: YES
> > X-Spam-Level: **
> > X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
> >  NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
> > X-Spam-Report: ++
> >  * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
> >  * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
> >  * -0.0 NO_RECEIVED Informational: message has no Received
> >  * headers
> > ...
> >
> > Why auto-learn wants the mail as HAM?
> 
> Because autolearning ignores rules with the noautolearn, userconf or 
> learn tflags set (and uses the scores from scoreset 0 or 1).
> 
> ...

Thanks for all the explanations. I now have a better understanding of
the autolearning process. Could someone please forward me off-list
(gzipped, with complete header lines) a spam message that resulted in
autolearn=spam?

Thanks in advance.

matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de,  http://www.unixarea.de/  ☎ 
+49-176-38902045


Re: why: auto-learn? no: scored as spam but autolearn wanted ham

2015-11-05 Thread Reindl Harald

* 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
https://en.wikipedia.org/wiki/GTUBE

On 05.11.2015 at 15:54, Matthias Apitz wrote:

> This is with version 3.4.0 on FreeBSD 11-CURRENT. If I run with the
> sample file:
>
> $ spamassassin -tD < Mail-SpamAssassin-3.4.0/sample-spam.txt
>
> it says on STDERR:
> ...
> nov  5 15:47:54.521 [3855] dbg: learn: auto-learn: currently using scoreset 1
> nov  5 15:47:54.521 [3855] dbg: learn: auto-learn: message score: 999.998, computed score for autolearn: 0
> nov  5 15:47:54.521 [3855] dbg: learn: auto-learn? ham=0.1, spam=12, body-points=0, head-points=0, learned-points=0
> nov  5 15:47:54.521 [3855] dbg: learn: auto-learn? no: scored as spam but autolearn wanted ham
> nov  5 15:47:54.521 [3855] dbg: check: is spam? score=999.998 required=3
> ...
>
> and returns the mail with this header:
>
> ...
> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
> X-Spam-Flag: YES
> X-Spam-Level: **
> X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
>  NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
> X-Spam-Report: ++
>  * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
>  * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
>  * -0.0 NO_RECEIVED Informational: message has no Received
>  * headers
> ...
>
> Why auto-learn wants the mail as HAM?






why: auto-learn? no: scored as spam but autolearn wanted ham

2015-11-05 Thread Matthias Apitz

Hello,

This is with version 3.4.0 on FreeBSD 11-CURRENT. If I run with the
sample file:

$ spamassassin -tD < Mail-SpamAssassin-3.4.0/sample-spam.txt

it says on STDERR:
...
nov  5 15:47:54.521 [3855] dbg: learn: auto-learn: currently using scoreset 1
nov  5 15:47:54.521 [3855] dbg: learn: auto-learn: message score: 999.998, computed score for autolearn: 0
nov  5 15:47:54.521 [3855] dbg: learn: auto-learn? ham=0.1, spam=12, body-points=0, head-points=0, learned-points=0
nov  5 15:47:54.521 [3855] dbg: learn: auto-learn? no: scored as spam but autolearn wanted ham
nov  5 15:47:54.521 [3855] dbg: check: is spam? score=999.998 required=3
...

and returns the mail with this header:

...
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
X-Spam-Flag: YES
X-Spam-Level: **
X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
 NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
X-Spam-Report: ++
 * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
 * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
 * -0.0 NO_RECEIVED Informational: message has no Received
 * headers
...

Why auto-learn wants the mail as HAM?

matthias



-- 
Matthias Apitz, ✉ g...@unixarea.de,  http://www.unixarea.de/  ☎ 
+49-176-38902045


Re: why: auto-learn? no: scored as spam but autolearn wanted ham

2015-11-05 Thread Matthias Apitz
On Thursday, November 05, 2015 at 03:57:01PM +0100, Reindl Harald wrote:

> * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
> https://en.wikipedia.org/wiki/GTUBE

Maybe because you are top-posting you have not read my question; at
least you have not answered it.

> nov  5 15:47:54.521 [3855] dbg: learn: auto-learn? no: scored as spam but 
> autolearn wanted ham
> nov  5 15:47:54.521 [3855] dbg: check: is spam? score=999.998 required=3  
>  

> > X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
> >  NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
> > X-Spam-Report: ++
> >  * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
> >  * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
> >  * -0.0 NO_RECEIVED Informational: message has no Received
> >  * headers
> > ...
> >
> > Why auto-learn wants the mail as HAM?

Again, why does it want to declare the SPAM message as autolearn=ham?

matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de,  http://www.unixarea.de/  ☎ 
+49-176-38902045


Re: why: auto-learn? no: scored as spam but autolearn wanted ham

2015-11-05 Thread Kris Deugau
Matthias Apitz wrote:
> This is with version 3.4.0 on FreeBSD 11-CURRENT. If I run with the
> sample file:
> 
> $ spamassassin -tD < Mail-SpamAssassin-3.4.0/sample-spam.txt

> Why auto-learn wants the mail as HAM?

> it says on STDERR:
> ...
> nov  5 15:47:54.521 [3855] dbg: learn: auto-learn: currently using scoreset 1
> nov  5 15:47:54.521 [3855] dbg: learn: auto-learn: message score: 999.998, 
> computed score for autolearn: 0

This line reports the score used to decide which direction to autolearn.
There are a number of conditions under which the "normal" score on the
message is not the one used to decide on autolearn.

> nov  5 15:47:54.521 [3855] dbg: learn: auto-learn? ham=0.1, spam=12, 
> body-points=0, head-points=0, learned-points=0

This line reports the current thresholds for autolearn.

0 < 0.1, so if the message is to be autolearned, it should be learned as
ham.

> nov  5 15:47:54.521 [3855] dbg: learn: auto-learn? no: scored as spam but 
> autolearn wanted ham

This line reports that the live score (note, not the score used to
decide how to autolearn) scored as spam, so the message will not be
autolearned at all.

See the man page for Mail::SpamAssassin::Plugin::AutoLearnThreshold for
the full set of details.
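
The three steps described above can be sketched roughly as follows.
This is a simplified illustration built only from the debug lines in
this thread, not the plugin's actual code: the function name is made
up, the thresholds are the ham=0.1 and spam=12 values shown in the log,
and the body-points, head-points and learned-points that the log also
prints are ignored here.

```python
from typing import Optional


def autolearn_decision(
    autolearn_score: float,
    live_is_spam: bool,
    ham_threshold: float = 0.1,
    spam_threshold: float = 12.0,
) -> Optional[str]:
    """Return "ham", "spam", or None when nothing should be learned."""
    # Step 1: the autolearn score picks the direction.
    if autolearn_score < ham_threshold:
        wanted = "ham"
    elif autolearn_score >= spam_threshold:
        wanted = "spam"
    else:
        return None  # between the thresholds: not confident either way

    # Step 2: safety check against the live classification.
    actual = "spam" if live_is_spam else "ham"
    if wanted != actual:
        return None  # e.g. "scored as spam but autolearn wanted ham"
    return wanted


# The GTUBE sample: autolearn score 0, live score 999.998 (spam).
print(autolearn_decision(0.0, live_is_spam=True))
```

For the sample message, the direction comes out as ham while the live
result is spam, so the sketch declines to learn, matching the
"auto-learn? no" line in the log.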

-kgd


Re: why: auto-learn? no: scored as spam but autolearn wanted ham

2015-11-05 Thread John Wilcock

On 05/11/2015 15:54, Matthias Apitz wrote:

> X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on c720-r276659
> X-Spam-Flag: YES
> X-Spam-Level: **
> X-Spam-Status: Yes, score=1000.0 required=3.0 tests=GTUBE,NO_RECEIVED,
>  NO_RELAYS autolearn=no autolearn_force=no version=3.4.0
> X-Spam-Report: ++
>  * 1000 GTUBE BODY: Generic Test for Unsolicited Bulk Email
>  * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
>  * -0.0 NO_RECEIVED Informational: message has no Received
>  * headers
> ...
>
> Why auto-learn wants the mail as HAM?


Because autolearning ignores rules with the noautolearn, userconf or 
learn tflags set (and uses the scores from scoreset 0 or 1).


Without GTUBE, this message would have had a score below the default 
autolearn ham threshold of 0.1 and would thus have been learnt as ham. 
For safety, however, SA checks the autolearn score against the actual 
classification before it goes ahead with the learning process.
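
To make the difference between the two scores concrete, here is a rough
sketch of how a separate autolearn score could be computed by skipping
rules that carry one of the tflags named above. The data layout and
function name are invented for illustration, the rule scores are the
rounded values from the report, and the tflag attached to GTUBE is an
assumption consistent with the explanation above rather than something
quoted from the rule files. The scoreset switch is not modelled.

```python
from typing import Iterable, Set, Tuple

# tflags that exclude a rule from the autolearn score, per the text above.
IGNORED_TFLAGS = {"noautolearn", "userconf", "learn"}

# (rule name, score, tflags)
RuleHit = Tuple[str, float, Set[str]]


def autolearn_score(hits: Iterable[RuleHit]) -> float:
    """Sum the scores of all hits that are not excluded by their tflags."""
    return sum(
        (score for _name, score, tflags in hits
         if not (tflags & IGNORED_TFLAGS)),
        0.0,
    )


hits = [
    ("GTUBE", 1000.0, {"noautolearn"}),  # assumed tflag, see lead-in
    ("NO_RELAYS", -0.0, set()),
    ("NO_RECEIVED", -0.0, set()),
]

# The live score counts every hit; the autolearn score skips GTUBE.
live_score = sum(score for _name, score, _tflags in hits)
print(live_score, autolearn_score(hits))
```

With GTUBE skipped, the autolearn score falls below the 0.1 ham
threshold even though the live score is far above the spam threshold,
which is the mismatch the debug output reports.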


--
John


Re: New SA install, configuring for retraining on false positives

2015-11-05 Thread David Jones
>From: David Mehler 
>Sent: Thursday, November 5, 2015 5:52 AM
>To: users@spamassassin.apache.org
>Subject: New SA install, configuring for retraining on false positives

>Hello,

>I've got a Postfix email server going with a Mysql database backend on
>FreeBSD 10.2. I'm now wanting to add Spamassassin to the picture and
>am wondering current best practices? It's been a number of years since
>I did it and last time effectiveness wasn't so good. I'm not sure if
>it was because I was following old information or didn't have things
>done right configuration wise?

>It's looking like I have several options, MailScanner which hooks in
>to SA, Amavisd-new ditto, or SA as a milter called directly from my
>MTA. Comments on these or other methods?

>I'm also wanting to get the latest antispam rules, are those from SA
>or are there third party rules I should look into?

>Finally, one of the things I'm going to implement in addition to SA is
>Sieve, done with my MDA Dovecot, in which mail flagged with a spam
>header is automatically moved in to a dedicated spam folder. I am then
>wanting to set up a system to tell SA when it has misclassified a
>false positive, what are people using in that environment?

>Any other user feedback appreciated.

If you don't want to set this up yourself from scratch, I recommend
putting an EFA box in front and having it deliver to your Dovecot box.
http://efa-project.org/
You would smarthost all outbound mail from the Dovecot box back through
the EFA box, so that both directions show up in the web interface,
which has very nice reporting capabilities.
You can train SA on spam/ham easily from within the web interface.
EFA is a prebuilt MailScanner VM that has everything set up nicely
and can keep itself updated/patched very easily.

Dave J.

>Thanks.
>Dave.