Re: learn ham
> On Jan 5, 2017, at 8:54 AM, Dave Funk wrote:
>
> On Thu, 5 Jan 2017, Nicola Piazzi wrote:
>
>> Each minute it learns the messages from the last minute, so it reads and learns each message only once.
>> The messages are the ones sent by internal users, so it learns that their words are not spam.
>>
>> Internal messages are not spam.
>
> Until one of your users gets their account hacked/phished and spammers then
> use it to abuse your server to send out megabytes of spam.
> (Or they may have had an account on Yahoo that used the same password.)
>
> Careless users happen to the best of us. ;(
>
> John's point is still valid; blind un-vetted automated Bayes learning is
> asking for trouble.

I would have to agree and reinforce the message here: automated learning of
SPAM/HAM is not a good idea.

I have users dropping emails THEY HAVE SUBSCRIBED TO (and forgotten they did)
into their SPAM folder, and I would argue those are NOT SPAM. They actually
contain a LOT of industry-standard nomenclature that, if trained as SPAM,
would not necessarily be valid tokens.

Think about it: the best machine for telling whether something is SPAM or not
is the human one. Learning, in this regard, is telling SA "emails like this
one, which I have specifically identified as SPAM, are ones you should look
out for." It does not (in and of itself) make a judgement call on what is or
is not SPAM. You need to do that. Keep teaching indiscriminately and pretty
soon everything is in every pool. There is such a thing as knowing too much,
so much so that you are left indecisive and perplexed by even the simplest
problem.

I think it's far better to have a smaller pool of tokens keyed with precision
than a lot of tokens that, frankly, could go either way.

> --
> Dave Funk                                  University of Iowa
> College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include
> Better is not better, 'standard' is better. B{
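To make the point about tokens that "could go either way" concrete, here is a toy sketch. It uses a simple Graham-style ratio of per-corpus token frequencies, which is an assumption chosen for illustration; SpamAssassin's own Bayes code uses a different, more elaborate calculation. The function names and corpus counts are hypothetical.

```python
def token_spam_prob(spam_hits: int, ham_hits: int, n_spam: int, n_ham: int) -> float:
    """Toy estimate of P(spam | token) from how often the token appears in
    each training corpus (Graham-style ratio, not SpamAssassin's formula)."""
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_freq / (spam_freq + ham_freq)


def strength(prob: float) -> float:
    """Distance from the neutral 0.5; near zero means the token tells us nothing."""
    return abs(prob - 0.5)


if __name__ == "__main__":
    n_spam = n_ham = 1000
    # A precise token: seen almost only in mail a human confirmed as spam.
    precise = token_spam_prob(200, 2, n_spam, n_ham)
    # Industry jargon from subscribed newsletters that users dumped into the
    # spam folder, so it is now equally common in both corpora.
    ambiguous = token_spam_prob(300, 300, n_spam, n_ham)
    print(f"precise:   p={precise:.3f} strength={strength(precise):.3f}")
    print(f"ambiguous: p={ambiguous:.3f} strength={strength(ambiguous):.3f}")
```

In this toy model a token sitting at 0.5 carries no weight toward either verdict, so training that pushes legitimate vocabulary into both pools dilutes the filter rather than sharpening it.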
Re: R: learn ham
On Thu, 5 Jan 2017, Nicola Piazzi wrote:

> Each minute it learns the messages from the last minute, so it reads and learns each message only once.
> The messages are the ones sent by internal users, so it learns that their words are not spam.
>
> Internal messages are not spam.

Until one of your users gets their account hacked/phished and spammers then
use it to abuse your server to send out megabytes of spam.
(Or they may have had an account on Yahoo that used the same password.)

Careless users happen to the best of us. ;(

John's point is still valid; blind un-vetted automated Bayes learning is
asking for trouble.

--
Dave Funk                                  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
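The hijacked-account scenario can be illustrated with a toy model. The sketch below is a heavily simplified, hypothetical token store (not SpamAssassin's database or scoring) that shows what happens when a rule like "everything sent from inside is ham" learns a burst of relayed spam: a token that used to be a strong spam indicator drifts toward neutral. All names and counts are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyBayesDB:
    """Hypothetical, simplified token store: for each token, how many
    learned messages contained it."""
    n_spam: int = 0
    n_ham: int = 0
    spam_tokens: Counter = field(default_factory=Counter)
    ham_tokens: Counter = field(default_factory=Counter)

    def learn(self, tokens: set[str], is_spam: bool) -> None:
        if is_spam:
            self.n_spam += 1
            self.spam_tokens.update(tokens)
        else:
            self.n_ham += 1
            self.ham_tokens.update(tokens)

    def spam_prob(self, token: str) -> float:
        """Graham-style ratio of per-corpus token frequencies (toy only)."""
        if self.n_spam == 0 or self.n_ham == 0:
            return 0.5
        s = self.spam_tokens[token] / self.n_spam
        h = self.ham_tokens[token] / self.n_ham
        return 0.5 if s + h == 0 else s / (s + h)


def simulate_hijacked_account() -> tuple[float, float]:
    db = ToyBayesDB()
    for i in range(1000):  # reviewed spam: 40% mention "pharmacy"
        db.learn({"pharmacy", "offer"} if i < 400 else {"offer"}, is_spam=True)
    for i in range(1000):  # reviewed ham: almost never mentions it
        db.learn({"pharmacy", "meeting"} if i == 0 else {"meeting"}, is_spam=False)
    before = db.spam_prob("pharmacy")

    # A phished internal account relays 500 spams, and blind training
    # learns every one of them as ham because it was sent from inside.
    for _ in range(500):
        db.learn({"pharmacy", "offer"}, is_spam=False)
    after = db.spam_prob("pharmacy")
    return before, after


if __name__ == "__main__":
    before, after = simulate_hijacked_account()
    print(f"'pharmacy' spam probability: before={before:.3f} after={after:.3f}")
```

Nothing in the training step can notice the mistake on its own; only a human reviewing what goes into the ham corpus would catch it.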
Re: R: learn ham
On Thu, 5 Jan 2017, Nicola Piazzi wrote:

> Each minute it learns the messages from the last minute, so it reads and learns each message only once.

There is a certain amount of overhead involved in reading the mailbox and
processing messages, even if they have already been learned...

> The messages are the ones sent by internal users, so it learns that their words are not spam.
>
> Internal messages are not spam.

...until you get infected by a spambot.

Bayes training should be manually reviewed. Blind training is fragile and
invites the system to go badly off the rails when, for some reason, it makes
a poor decision that is self-reinforcing.

> Nicola Piazzi
> CED - Sistemi
> COMET s.p.a.
> Via Michelino, 105 - 40127 Bologna - Italia
> Tel. +39 051.6079.293
> Cell. +39 328.21.73.470
> Web: www.gruppocomet.it
>
> -----Original Message-----
> From: John Hardin [mailto:jhar...@impsec.org]
> Sent: Thursday, 5 January 2017 17:35
> To: users@spamassassin.apache.org
> Subject: Re: learn ham
>
> On Thu, 5 Jan 2017, Marc Stürmer wrote:
>
>> On 2017-01-04 10:58, Nicola Piazzi wrote:
>>
>>> I found it useful to put a little script like this in cron
>>>
>>> Each minute cron launches this script, which takes the messages of the
>>> last minute by reading them from the maillog database
>>
>> What's the purpose of this script, what's the reasoning behind running
>> this thingie every minute?
>>
>> What you do is training the Bayes filter every minute. Training a
>> filter is something which should never be done unattended, but always
>> supervised, because if not you will get bad results over time.
>
> The execution of the training program can safely be automated, though I'd
> agree once per minute is a bit excessive.
>
> The classification of messages into the folders that are trained from is
> what needs manual supervision.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org                    FALaholic #11174
 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
--- Individual liberties are always "loopholes" to absolute authority.
--- 381 days since the first successful real return to launch site (SpaceX)
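The split between automating the *execution* of training and supervising the *classification* might look roughly like the following Python sketch. It assumes a hypothetical layout in which an administrator moves messages into "reviewed-spam" and "reviewed-ham" Maildir folders only after inspecting them; the paths and function names are made up. The sa-learn options used (`--spam`, `--ham`, `--no-sync`, `--sync`) are standard, but check `sa-learn --help` on your installation.

```python
import subprocess
from pathlib import Path

# Hypothetical folders: a human moves a message here only after reviewing it.
REVIEWED_SPAM = Path("/var/spool/sa-training/reviewed-spam/cur")
REVIEWED_HAM = Path("/var/spool/sa-training/reviewed-ham/cur")


def build_training_commands(spam_dir: Path, ham_dir: Path) -> list[list[str]]:
    """Return the sa-learn invocations for one training pass.

    Only the reviewed folders are passed in; nothing is inferred from where
    a message came from (e.g. "sent from inside, therefore ham")."""
    return [
        ["sa-learn", "--no-sync", "--spam", str(spam_dir)],
        ["sa-learn", "--no-sync", "--ham", str(ham_dir)],
        # Sync once at the end (relevant for backends that use a journal).
        ["sa-learn", "--sync"],
    ]


def run_training(dry_run: bool = True) -> list[list[str]]:
    commands = build_training_commands(REVIEWED_SPAM, REVIEWED_HAM)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    run_training(dry_run=True)
```

Scheduled from cron at a modest interval (hourly or nightly rather than every minute), this keeps the mechanics automated while the judgment about what is spam stays with a person.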
R: learn ham
Each minute it learns the messages from the last minute, so it reads and learns each message only once.
The messages are the ones sent by internal users, so it learns that their words are not spam.

Internal messages are not spam.

Nicola Piazzi
CED - Sistemi
COMET s.p.a.
Via Michelino, 105 - 40127 Bologna - Italia
Tel. +39 051.6079.293
Cell. +39 328.21.73.470
Web: www.gruppocomet.it

-----Original Message-----
From: John Hardin [mailto:jhar...@impsec.org]
Sent: Thursday, 5 January 2017 17:35
To: users@spamassassin.apache.org
Subject: Re: learn ham

On Thu, 5 Jan 2017, Marc Stürmer wrote:

> On 2017-01-04 10:58, Nicola Piazzi wrote:
>
>> I found it useful to put a little script like this in cron
>>
>> Each minute cron launches this script, which takes the messages of the
>> last minute by reading them from the maillog database
>
> What's the purpose of this script, what's the reasoning behind running
> this thingie every minute?
>
> What you do is training the Bayes filter every minute. Training a
> filter is something which should never be done unattended, but always
> supervised, because if not you will get bad results over time.

The execution of the training program can safely be automated, though I'd
agree once per minute is a bit excessive.

The classification of messages into the folders that are trained from is
what needs manual supervision.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org                    FALaholic #11174
 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
--- Individual liberties are always "loopholes" to absolute authority.
--- 381 days since the first successful real return to launch site (SpaceX)
Re: learn ham
On Thu, 5 Jan 2017, Marc Stürmer wrote:

> On 2017-01-04 10:58, Nicola Piazzi wrote:
>
>> I found it useful to put a little script like this in cron
>>
>> Each minute cron launches this script, which takes the messages of the
>> last minute by reading them from the maillog database
>
> What's the purpose of this script, what's the reasoning behind running
> this thingie every minute?
>
> What you do is training the Bayes filter every minute. Training a
> filter is something which should never be done unattended, but always
> supervised, because if not you will get bad results over time.

The execution of the training program can safely be automated, though I'd
agree once per minute is a bit excessive.

The classification of messages into the folders that are trained from is
what needs manual supervision.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org                    FALaholic #11174
 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
--- Individual liberties are always "loopholes" to absolute authority.
--- 381 days since the first successful real return to launch site (SpaceX)
Re: learn ham
On 2017-01-04 10:58, Nicola Piazzi wrote:

> I found it useful to put a little script like this in cron
>
> Each minute cron launches this script, which takes the messages of the
> last minute by reading them from the maillog database

What's the purpose of this script, what's the reasoning behind running this
thingie every minute?

What you do is training the Bayes filter every minute. Training a filter is
something which should never be done unattended, but always supervised,
because if not you will get bad results over time.

There's autolearn in SpamAssassin: if it's confident enough that a message is
ham, it will learn it as ham. Normally this is enough once you have trained
your initial ham corpus properly.

BTW: saving your root password for MySQL in a cronjob script is a very bad
idea, too.
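In SpamAssassin, autolearn is switched on with `bayes_auto_learn` and governed by `bayes_auto_learn_threshold_nonspam` (default 0.1) and `bayes_auto_learn_threshold_spam` (default 12.0). The Python function below is only a simplified sketch of that threshold idea: the real AutoLearnThreshold plugin applies further checks that are omitted here, and the exact boundary comparisons used below are an assumption.

```python
from typing import Optional

# Defaults of SpamAssassin's bayes_auto_learn_threshold_nonspam and
# bayes_auto_learn_threshold_spam options.
NONSPAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(score: float,
                       nonspam_threshold: float = NONSPAM_THRESHOLD,
                       spam_threshold: float = SPAM_THRESHOLD) -> Optional[str]:
    """Simplified sketch: learn only messages that are clearly on one side.

    Everything between the two thresholds is left alone, which keeps
    autolearn from training on the ambiguous middle ground."""
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:  # boundary handling here is an assumption
        return "spam"
    return None  # not confident enough: do not learn


if __name__ == "__main__":
    for s in (-2.0, 0.05, 4.9, 12.0, 25.3):
        print(s, autolearn_decision(s))
```

The wide gap between the two defaults is the design choice that matters: most mail scores somewhere in between and is never auto-learned at all.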
Re: Does spamc unwrap spam reports?
On Dec 28, 2016, at 3:01 AM, Lukas Erlacher wrote:

> I'm calling "spamc --learntype=spam/ham" from a script, passing in emails
> fetched from IMAP (I'm using ISBG with --learnspambox / --learnhambox and
> --spamc actually).

Why are you calling spamc instead of sa-learn?

--
Apple broke AppleScripting signatures in Mail.app, so no random signatures.
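For reference, the two invocations being compared differ in who touches the Bayes database. `spamc --learntype=...` sends one message on standard input to spamd, which must be started with `--allow-tell` to accept learning requests, while `sa-learn` opens the Bayes database itself and takes message files or directories as arguments. A minimal Python sketch that only builds the argument lists (the function names are hypothetical):

```python
from pathlib import Path

LEARN_TYPES = ("spam", "ham", "forget")


def spamc_learn_argv(learn_type: str) -> list[str]:
    """spamc reads one message on stdin and asks spamd to learn it;
    spamd has to run with --allow-tell for this to be accepted."""
    if learn_type not in LEARN_TYPES:
        raise ValueError(f"unknown learn type: {learn_type!r}")
    return ["spamc", f"--learntype={learn_type}"]


def sa_learn_argv(learn_type: str, paths: list[Path]) -> list[str]:
    """sa-learn opens the Bayes database directly and can take many
    message files or directories in a single run."""
    if learn_type not in LEARN_TYPES:
        raise ValueError(f"unknown learn type: {learn_type!r}")
    return ["sa-learn", f"--{learn_type}", *(str(p) for p in paths)]
```

One practical difference: sa-learn has to run with access to the right Bayes database (as the right user, or with an explicit `--dbpath`), whereas spamc delegates that to spamd.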