Re: learn ham

2017-01-05 Thread Shawn Bakhtiar

> On Jan 5, 2017, at 8:54 AM, Dave Funk  wrote:
> 
> On Thu, 5 Jan 2017, Nicola Piazzi wrote:
> 
>> Each minute it learns the messages of the last minute, so it reads and 
>> learns each message only once.
>> The messages are the ones sent from internal senders, so it learns that 
>> those words are not spam.
>> 
>> Internal messages are not spam
> 
> Until one of your users gets their account hacked/phished and spammers then 
> use it to abuse your server to send out megabytes of spam.
> (or they may have had an account on Yahoo that used the same password).
> 
> Careless users happen to the best of us. ;(
> 
> John's point is still valid; blind un-vetted automated Bayes learning is 
> asking for trouble.

I would have to agree and reinforce the message here... automated learning of 
SPAM/HAM is not a good idea. I have users dropping emails THEY HAVE SUBSCRIBED 
TO, and forgotten they did so, in their SPAM folder, and I would argue those are 
NOT SPAM. They actually contain a LOT of industry-standard nomenclature that, if 
trained as SPAM, would not necessarily be valid spam tokens.

Think about it: the best machine to tell whether something is SPAM or not is 
the human machine. Learning in this regard is telling SA that emails like this 
one, which I have specifically identified as SPAM, are the ones it should look 
out for. It (in and of itself) does not make a judgement call on what is or is 
not SPAM. You need to do that. 

Keep teaching and pretty soon everything is in every pool (there is such a 
thing as knowing too much, so much so that you are left indecisive and 
perplexed at even the simplest problem). I think it's far better to have a 
smaller pool of tokens keyed with precision than a lot of tokens that, well, 
frankly could go either way.



> 
> -- 
> Dave Funk  University of Iowa
> College of Engineering
> 319/335-5751   FAX: 319/384-0549   1256 Seamans Center
> Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
> #include 
> Better is not better, 'standard' is better. B{



Re: R: learn ham

2017-01-05 Thread Dave Funk

On Thu, 5 Jan 2017, Nicola Piazzi wrote:


> Each minute it learns the messages of the last minute, so it reads and learns
> each message only once.
> The messages are the ones sent from internal senders, so it learns that those
> words are not spam.
>
> Internal messages are not spam


Until one of your users gets their account hacked/phished and spammers 
then use it to abuse your server to send out megabytes of spam.

(or they may have had an account on Yahoo that used the same password).

Careless users happen to the best of us. ;(

John's point is still valid; blind un-vetted automated Bayes learning is 
asking for trouble.


--
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: R: learn ham

2017-01-05 Thread John Hardin

On Thu, 5 Jan 2017, Nicola Piazzi wrote:


> Each minute it learns the messages of the last minute, so it reads and learns
> each message only once.


There is a certain amount of overhead involved in reading the mailbox and 
processing messages even if they have already been learned...



> The messages are the ones sent from internal senders, so it learns that those
> words are not spam.
>
> Internal messages are not spam


...until you get infected by a spambot.

Bayes training should be manually reviewed. Blind training is fragile and 
invites the system to go badly off the rails when for some reason it makes 
a poor decision that is self-reinforcing.
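
To illustrate the self-reinforcing failure, here is a deliberately simplified
sketch of a per-token spam probability. It is not SpamAssassin's actual Bayes
implementation; the formula, the function name token_spam_probability, and all
counts are made up for illustration. It only shows the direction of the effect:
once mail from a compromised internal account is blindly learned as ham, a
token that used to be a strong spam indicator drifts toward neutral.

```python
def token_spam_probability(spam_hits, ham_hits, total_spam, total_ham):
    """Simplified estimate of P(spam | token) from training counts.

    Illustration only: SpamAssassin's real Bayes code is more involved.
    """
    spam_rate = spam_hits / total_spam if total_spam else 0.0
    ham_rate = ham_hits / total_ham if total_ham else 0.0
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical counts: the token appears in 200 of 1000 learned spam
# messages and in none of the 1000 learned ham messages.
before = token_spam_probability(200, 0, 1000, 1000)

# A hacked account then sends 300 spam messages containing the token,
# and the blind "internal mail is ham" job learns all of them as ham.
after = token_spam_probability(200, 300, 1000, 1300)

print(f"before blind learning: {before:.2f}")
print(f"after blind learning:  {after:.2f}")
```

With these made-up numbers the token ends up below 0.5, so it no longer
counts as evidence of spam at all.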





--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Individual liberties are always "loopholes" to absolute authority.
---
 381 days since the first successful real return to launch site (SpaceX)

R: learn ham

2017-01-05 Thread Nicola Piazzi
Each minute it learns the messages of the last minute, so it reads and learns 
each message only once.
The messages are the ones sent from internal senders, so it learns that those 
words are not spam.

Internal messages are not spam



Nicola Piazzi
CED - Sistemi
COMET s.p.a.
Via Michelino, 105 - 40127 Bologna - Italia
Tel.  +39 051.6079.293
Cell. +39 328.21.73.470
Web: www.gruppocomet.it



-----Original Message-----
From: John Hardin [mailto:jhar...@impsec.org] 
Sent: Thursday, January 5, 2017 17:35
To: users@spamassassin.apache.org
Subject: Re: learn ham

On Thu, 5 Jan 2017, Marc Stürmer wrote:

> On 2017-01-04 10:58, Nicola Piazzi wrote:
>
>>  I found it useful to put a little script like this in cron.
>>
>>  Each minute cron launches this script, which takes the messages of the 
>> last minute by reading from the maillog database
>
> What's the purpose of this script, what's the reasoning behind running 
> this thingie every minute?
>
> What you do is training the Bayes filter every minute. Training a 
> filter is something which should never be done unattended, but always 
> supervised, because if not you will get bad results over time.

The execution of the training program can safely be automated, though I'd agree 
once per minute is a bit excessive. The classification of messages into the 
folders that are trained from is what needs manual supervision.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
   Individual liberties are always "loopholes" to absolute authority.
---
  381 days since the first successful real return to launch site (SpaceX)


Re: learn ham

2017-01-05 Thread John Hardin

On Thu, 5 Jan 2017, Marc Stürmer wrote:


> On 2017-01-04 10:58, Nicola Piazzi wrote:
>
>>  I found it useful to put a little script like this in cron.
>>
>>  Each minute cron launches this script, which takes the messages of the
>>  last minute by reading from the maillog database
>
> What's the purpose of this script, what's the reasoning behind running this 
> thingie every minute?
>
> What you do is training the Bayes filter every minute. Training a filter is 
> something which should never be done unattended, but always supervised, 
> because if not you will get bad results over time.


The execution of the training program can safely be automated, though I'd 
agree once per minute is a bit excessive. The classification of messages 
into the folders that are trained from is what needs manual supervision.
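
A minimal sketch of that split, assuming two folders that a person has already
sorted by hand. The paths, the function names and the dry_run switch are
hypothetical; --ham, --spam, --no-sync and --sync are standard sa-learn
options. The script only automates the execution; it makes no decision about
which folder a message belongs in.

```python
import subprocess

# Hypothetical locations of the hand-sorted training folders.
HAM_DIR = "/var/spool/sa-train/ham"
SPAM_DIR = "/var/spool/sa-train/spam"


def build_sa_learn_commands(ham_dir, spam_dir):
    """Return the sa-learn invocations for two manually reviewed folders."""
    return [
        # --no-sync defers the journal sync so it happens once at the end.
        ["sa-learn", "--no-sync", "--ham", str(ham_dir)],
        ["sa-learn", "--no-sync", "--spam", str(spam_dir)],
        ["sa-learn", "--sync"],
    ]


def train(ham_dir, spam_dir, dry_run=True):
    """Print the commands (dry run) or execute them; return the command list."""
    commands = build_sa_learn_commands(ham_dir, spam_dir)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the run if sa-learn exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False once the folders are vetted.
    train(HAM_DIR, SPAM_DIR, dry_run=True)
```

Scheduled from cron, something like this would more sensibly run hourly or
daily than once per minute.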


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Individual liberties are always "loopholes" to absolute authority.
---
 381 days since the first successful real return to launch site (SpaceX)

Re: learn ham

2017-01-05 Thread Marc Stürmer

On 2017-01-04 10:58, Nicola Piazzi wrote:

> I found it useful to put a little script like this in cron.
>
> Each minute cron launches this script, which takes the messages of the 
> last minute by reading from the maillog database


What's the purpose of this script, what's the reasoning behind running 
this thingie every minute?


What you do is training the Bayes filter every minute. Training a filter 
is something which should never be done unattended, but always 
supervised, because if not you will get bad results over time.


There's autolearn in SpamAssassin: if it's confident enough that a 
message is ham, it will learn it as ham automatically. Normally this is 
enough once you have trained your initial ham corpus properly.
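
Roughly, autolearn only trains on messages whose score falls clearly on one
side. The sketch below is a simplification: the function name is hypothetical,
the defaults mirror the documented bayes_auto_learn_threshold_nonspam (0.1)
and bayes_auto_learn_threshold_spam (12.0) settings, and the real plugin
applies additional conditions that are left out here.

```python
def autolearn_decision(score, nonspam_threshold=0.1, spam_threshold=12.0):
    """Simplified autolearn rule: learn only from clear-cut scores.

    Returns "ham", "spam", or None when the message should not be learned.
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None  # ambiguous score: leave the Bayes database untouched


for score in (-1.5, 4.0, 15.2):
    print(score, autolearn_decision(score))
```

Everything between the two thresholds is skipped, which is what keeps
borderline mail out of the token database.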


BTW: saving your root password for MySQL in a cronjob script is a very 
bad idea, too.
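
One common alternative, sketched here under assumptions: keep the credentials
in a separate file readable only by the job's owner, and have the script
refuse to run if the file is group- or world-accessible. The file layout (a
[client] section with user and password keys, as in a MySQL option file), the
function name and the idea of a dedicated low-privilege account instead of
root are illustrative and not taken from the original script.

```python
import configparser
import os
import stat


def load_db_credentials(path):
    """Read user/password from an INI-style file with owner-only permissions."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        # Any group/other permission bit means other accounts could read it.
        raise PermissionError(
            f"{path} must not be accessible by group or others (mode {mode:o})"
        )
    # interpolation=None so a '%' in the password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    client = parser["client"]
    return client["user"], client["password"]
```

The cron script would then call load_db_credentials() at startup instead of
embedding the password in its own source.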

Re: Does spamc unwrap spam reports?

2017-01-05 Thread @lbutlr
On Dec 28, 2016, at 3:01 AM, Lukas Erlacher  wrote:
> I'm calling "spamc --learntype=spam/ham" from a script, passing in emails 
> fetched from imap (I'm using ISBG with --learnspambox / --learnhambox and 
> --spamc actually).

Why are you calling spamc instead of sa-learn?

-- 
Apple broke AppleScripting signatures in Mail.app, so no random signatures.