Re: No rule updates since 1/1/17
On Sat, 21 Jan 2017, Kevin Golding wrote:

> On Sat, 21 Jan 2017 19:08:39 -, Jari Fredriksson wrote:
>
>> John Hardin wrote on 20.1.2017 22:38:
>>
>>> Collecting spam after RBL filtering is much less helpful to masscheck.
>>> Ideally your spam corpus is from a totally unfiltered feed.
>>>
>>> However, even if it is filtered and small, it helps, *especially* if
>>> the ham is not in English - masscheck is perennially starved for
>>> non-English ham and rule scoring is thus biased against non-English
>>> languages to a degree.
>>
>> This is NOT what I have learned from SA lists. I used to do this, but
>> learned in SA discussions that it is *harmful* to pass such spam to
>> masscheck. That it harms the SA users doing proper pre SA filtering.
>>
>> We do *need* an official policy! What are we going to do with mixed
>> messages like this??
>
> It was written down once. I saw the unfiltered thing again when I
> looked earlier today, but I can't spot it just now. I believe I was
> also told by someone who knows this stuff that it wasn't a
> requirement, more an ideal.

I apologize if there's empirical evidence that including spam that would be blocked by RBLs causes poorer masscheck results. That seems strongly counterintuitive to me, especially for sites where such filtering is *not* done at the MTA level - there are such.

> However looking for that comment again just now I registered another
> discrepancy on the wiki:
>
> https://wiki.apache.org/spamassassin/CorpusCleaning - no spam older
> than 2 months
>
> https://wiki.apache.org/spamassassin/HandClassifiedCorpora - no spam
> older than 6 months
>
> I don't think either are actually strict rules.

There is age filtering in the masscheck code, but I don't remember off the top of my head what the cutoff actually is.

I agree that the discrepancies in the wiki should be corrected...
--
 John Hardin KA7OHZ http://www.impsec.org/~jhardin/
 jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 An operating system design that requires a system reboot in order to
 install a document viewing utility does not earn my respect.
---
 2 days until John Moses Browning's 162nd Birthday
Re: No rule updates since 1/1/17
On Sat, 21 Jan 2017, Kevin Golding wrote:

> On Sat, 21 Jan 2017 16:35:12 -, David Jones wrote:
>
>> I think the "barrier to entry" is too difficult for most. I would
>> have to set up a new MX on a domain without MTA checks (DNS and RBL)
>> then create a honeypot email address to attract spam if I didn't
>> have established recipient addresses/mailboxes.
>
> I may be wrong but I don't believe the majority of the current
> masscheckers have honeypots in place. I also believe that at least
> some have some form of filtering in place - in fact the most common
> filtering in place is the manual classification since I bet most of us
> come across the odd message that we second guess and just put to one
> side.

Likely true. What I contribute is what gets through Zen to my personal mailbox, for example.

I did liberally sprinkle "ideally" through my description... :)

>> Then I would have to set up an SA development environment with
>> scripts to keep it up-to-date from SVN and compiled regularly.
>
> I forget the exact steps involved for running the checks because
> basically I set it up and largely forget about it, but essentially it
> was grab an svn copy of SpamAssassin, pick one of the various helper
> scripts, create a config and let cron deal with the daily workload of
> updating/checking/submitting - it's all done in the helper scripts.

Right, that's pretty much all there is: install and schedule the local masscheck script. It's not quite totally black box, but pretty close.

You do, however, have to get all the bits working initially. I like the idea of a VM image.
Re: Low spam score: -1.9
Reindl Harald wrote on 21.1.2017 22:33:

> On 21.01.2017 at 21:21, Jari Fredriksson wrote:
>> Emin Akbulut wrote on 10.1.2017 9:48:
>>
>>> Hi all,
>>>
>>> Recently we receive spam messages and SA cannot block them.
>>> I've also checked the raw message at http://spamcheck.postmarkapp.com/
>>> and the score was very low there too.
>>>
>>> I've trained the SA and it worked for a while but now it's useless.
>>>
>>> How can I prevent those spams? They look like poems
>>>
>>> * * * * * * * * * * * * * * *
>>
>> Please do NOT post spam to list. Put it to a pastebin.ca or similar and
>> post a link. The spam you spew might poison our SA...
>
> then fix your SA not to train manually every crap and do it proper at your own

Oh well. Well, my SA does not train on the SA lists anyhow, but this was just a common thing. Sorry about that.

--
ja...@iki.fi
Re: Low spam score: -1.9
Emin Akbulut wrote on 10.1.2017 9:48:

> Hi all,
>
> Recently we receive spam messages and SA cannot block them.
> I've also checked the raw message at http://spamcheck.postmarkapp.com/
> and the score was very low there too.
>
> I've trained the SA and it worked for a while but now it's useless.
>
> How can I prevent those spams? They look like poems
>
> * * * * * * * * * * * * * * *

Please do NOT post spam to list. Put it to a pastebin.ca or similar and post a link. The spam you spew might poison our SA...

--
ja...@iki.fi
Re: No rule updates since 1/1/17
Kevin Golding wrote on 21.1.2017 21:22:

> On Sat, 21 Jan 2017 19:08:39 -, Jari Fredriksson wrote:
>
>> John Hardin wrote on 20.1.2017 22:38:
>>
>>> Collecting spam after RBL filtering is much less helpful to masscheck.
>>> Ideally your spam corpus is from a totally unfiltered feed.
>>>
>>> However, even if it is filtered and small, it helps, *especially* if
>>> the ham is not in English - masscheck is perennially starved for
>>> non-English ham and rule scoring is thus biased against non-English
>>> languages to a degree.
>>
>> This is NOT what I have learned from SA lists. I used to do this, but
>> learned in SA discussions that it is *harmful* to pass such spam to
>> masscheck. That it harms the SA users doing proper pre SA filtering.
>>
>> We do *need* an official policy! What are we going to do with mixed
>> messages like this??
>
> It was written down once. I saw the unfiltered thing again when I
> looked earlier today, but I can't spot it just now. I believe I was
> also told by someone who knows this stuff that it wasn't a
> requirement, more an ideal.
>
> However looking for that comment again just now I registered another
> discrepancy on the wiki:
>
> https://wiki.apache.org/spamassassin/CorpusCleaning - no spam older
> than 2 months
>
> https://wiki.apache.org/spamassassin/HandClassifiedCorpora - no spam
> older than 6 months
>
> I don't think either are actually strict rules. It will help lower the
> barrier to entry if we can make this stuff more uniform. It could
> also be argued that having two such similar pages is somewhat
> redundant actually.

What has CorpusCleaning (cleaning out garbage) to do with this? Really confused now.

--
ja...@iki.fi
Re: No rule updates since 1/1/17
On Sat, 21 Jan 2017 19:08:39 -, Jari Fredriksson wrote:

> John Hardin wrote on 20.1.2017 22:38:
>
>> Collecting spam after RBL filtering is much less helpful to masscheck.
>> Ideally your spam corpus is from a totally unfiltered feed.
>>
>> However, even if it is filtered and small, it helps, *especially* if
>> the ham is not in English - masscheck is perennially starved for
>> non-English ham and rule scoring is thus biased against non-English
>> languages to a degree.
>
> This is NOT what I have learned from SA lists. I used to do this, but
> learned in SA discussions that it is *harmful* to pass such spam to
> masscheck. That it harms the SA users doing proper pre SA filtering.
>
> We do *need* an official policy! What are we going to do with mixed
> messages like this??

It was written down once. I saw the unfiltered thing again when I looked earlier today, but I can't spot it just now. I believe I was also told by someone who knows this stuff that it wasn't a requirement, more an ideal.

However looking for that comment again just now I registered another discrepancy on the wiki:

https://wiki.apache.org/spamassassin/CorpusCleaning - no spam older than 2 months

https://wiki.apache.org/spamassassin/HandClassifiedCorpora - no spam older than 6 months

I don't think either are actually strict rules. It will help lower the barrier to entry if we can make this stuff more uniform. It could also be argued that having two such similar pages is somewhat redundant actually.
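Since the two wiki pages disagree on the age limit (2 months versus 6 months), a contributor who wants to pre-filter a corpus by age could keep the cutoff as a parameter. The following is only a minimal sketch in Python, not the masscheck code itself: the function name `is_recent`, the day-based cutoff, and the choice to treat undated messages as too old are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def is_recent(raw_message: str, max_age_days: int, now: datetime) -> bool:
    """Return True if the Date header is within max_age_days of `now`.

    Hypothetical helper: the real masscheck cutoff is not known here,
    so the limit is passed in by the caller.
    """
    msg = message_from_string(raw_message)
    date_header = msg.get("Date")
    if date_header is None:
        # Assumption: a message without a Date header is treated as too old.
        return False
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable date: also treated as too old (assumption).
        return False
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; interpret it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= timedelta(days=max_age_days)


if __name__ == "__main__":
    now = datetime(2017, 1, 21, tzinfo=timezone.utc)
    sample = "Date: Sat, 01 Oct 2016 12:00:00 +0000\nSubject: test\n\nbody\n"
    # Roughly 2 months versus roughly 6 months, as on the two wiki pages.
    print(is_recent(sample, 60, now), is_recent(sample, 180, now))
```

The same message can pass one limit and fail the other, which is why it matters that the two pages agree.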
Re: No rule updates since 1/1/17
John Hardin wrote on 20.1.2017 22:38:

> Collecting spam after RBL filtering is much less helpful to masscheck.
> Ideally your spam corpus is from a totally unfiltered feed.
>
> However, even if it is filtered and small, it helps, *especially* if
> the ham is not in English - masscheck is perennially starved for
> non-English ham and rule scoring is thus biased against non-English
> languages to a degree.

This is NOT what I have learned from SA lists. I used to do this, but learned in SA discussions that it is *harmful* to pass such spam to masscheck. That it harms the SA users doing proper pre SA filtering.

We do *need* an official policy! What are we going to do with mixed messages like this??

--
ja...@iki.fi
Re: No rule updates since 1/1/17
> On Sat, 21 Jan 2017 16:35:12 +
> David Jones wrote:
>
>> I think the "barrier to entry" is too difficult for most. I would
>> have to set up a new MX on a domain without MTA checks (DNS and RBL)
>
> I hope it doesn't actually say that anywhere. IMO the corpora should be
> dominated by the spam that gets through to SA in actual production
> environments.

It was implied by a response from John Hardin yesterday and makes sense. I have tuned my production mail filters to block > 90% of spam via DNS checks and RBLs in Postfix so SA only has to block a few percent of the total potential mail. That means my production SA is not going to see the majority of spam. I only have to deal with the occasional compromised account sending spam for a short period before it is either detected and locked or becomes listed on enough RBLs.

I am currently setting up a new MX and getting mail flowing to a newly built iRedMail server. Then I will look at the SVN scripts to get that part set up.

http://svn.apache.org/viewvc/spamassassin/trunk/masses/contrib/automasscheck-minimal/

I am not familiar with amavisd-new since I have been using MailScanner so I will research how to set up the SA development environment with iRedMail's amavisd-new.

I have disabled most of the Postfix settings to block spam (DNS and RBLs) that iRedMail sets up so SA should see almost everything sent to a catchall mailbox. Then I plan to log in to that account regularly and categorize ham and spam.
Re: No rule updates since 1/1/17
On Sat, 21 Jan 2017 16:35:12 +
David Jones wrote:

> I think the "barrier to entry" is too difficult for most. I would
> have to set up a new MX on a domain without MTA checks (DNS and RBL)

I hope it doesn't actually say that anywhere. IMO the corpora should be dominated by the spam that gets through to SA in actual production environments.
Re: No rule updates since 1/1/17
On Sat, 21 Jan 2017 16:35:12 -, David Jones wrote:

> I think the "barrier to entry" is too difficult for most. I would
> have to set up a new MX on a domain without MTA checks (DNS and RBL)
> then create a honeypot email address to attract spam if I didn't
> have established recipient addresses/mailboxes.

I may be wrong but I don't believe the majority of the current masscheckers have honeypots in place. I also believe that at least some have some form of filtering in place - in fact the most common filtering in place is the manual classification since I bet most of us come across the odd message that we second guess and just put to one side.

> Then I would have to set up an SA development environment with
> scripts to keep it up-to-date from SVN and compiled regularly.

I forget the exact steps involved for running the checks because basically I set it up and largely forget about it, but essentially it was grab an svn copy of SpamAssassin, pick one of the various helper scripts, create a config and let cron deal with the daily workload of updating/checking/submitting - it's all done in the helper scripts. You can write your own if you like too, but in the various options out there one stands a good chance of meeting your needs. I only really have to think about it when I move my masschecks to a new machine.

> Finally I would need to manually categorize the ham and spam.

Okay, I agree this part involves doing stuff regularly. The amount will vary depending on how active you are. Personally? If I am confident a mail is ham or spam then I am confident that mail could be used for both bayes training and masschecking. I was going to do that classification anyway so it's not really any extra for me. Sure, I could spend more time working to get a few extra samples etc. but I have found my personal happy balance in terms of input vs output. You're not expected to neglect your pets to make it perfect. Just, if anyone has the ability to help out (even a little bit) it might be handy.
What could be just as helpful as actually running masschecks might be looking at the current documentation and poking at it with a stick. Maybe it does need tweaking to sound less complicated (I think it's improved over the years but maybe not enough). It sounds as if there are a couple of things that could be looked at, perhaps there are more.
Re: How does sa know when pyzor sees a spam msg?
On 21.01.17 11:20, Harry Putnam wrote:

> I'm looking to understand the actual mechanism whereby pyzor tells sa
> it thinks a message is spam. What does sa look for.

That's the job of the Mail::SpamAssassin::Plugin::Pyzor plugin.

> My sa setup allows sa to insert X-Spam headers and then procmail looks
> for certain of those
>
>     :0fw
>     | /usr/bin/spamc
>
>     :0:
>     * ^X-Spam-Status: Yes
>     spama_spam_tr.in
>
> So does pyzor insert something I can tell procmail to look for or does
> pyzor somehow tell SA a msg is spam and then sa inserts something I
> can tell procmail to look for?
>
> What is the actual mechanism between pyzor and sa. How do they
> converse.

Well, SA does use the pyzor output. It's quite irrelevant how. The point is, when pyzor reports a message as spam, SA increases its score, so the probability of the mail being detected as spam is higher.

> How is it all found by procmail?

Exactly as you use above. It also does not matter if you use pyzor; all checks SA uses are evaluated that way.

Simply install pyzor (razor, dcc, ...), activate the SA plugins and let SA do the job.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"Two words: Windows survives." - Craig Mundie, Microsoft senior strategist
"So does syphillis. Good thing we have penicillin." - Matthew Alton
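To make the point above concrete: pyzor adds nothing to the message itself. Its verdict only shows up as one more rule hit in the headers SA writes, and procmail keeps matching `X-Spam-Status` as before. A minimal sketch in Python of reading that header follows. It assumes the usual default header layout (`Yes, score=... required=... tests=RULE1,RULE2 ...`) and that the Pyzor rule is named `PYZOR_CHECK`, as in the stock rules as far as I recall; the function name `spam_status` and the sample message are invented for illustration.

```python
import re
from email import message_from_string


def spam_status(raw_message: str):
    """Return (is_spam, tests) parsed from the X-Spam-Status header.

    Assumes the common default header layout; a site with a customised
    header template would need a different pattern.
    """
    msg = message_from_string(raw_message)
    header = msg.get("X-Spam-Status", "")
    # Folded headers contain newlines and tabs; collapse them first.
    flat = " ".join(header.split())
    is_spam = flat.startswith("Yes")
    # The rule list is comma separated and may be folded after a comma.
    match = re.search(r"tests=((?:\w+,\s*)*\w+)", flat)
    tests = []
    if match:
        tests = [t.strip() for t in match.group(1).split(",")]
        tests = [t for t in tests if t and t != "none"]
    return is_spam, tests


if __name__ == "__main__":
    # Invented sample message, not taken from the thread.
    sample = (
        "X-Spam-Status: Yes, score=7.3 required=5.0 "
        "tests=BAYES_99,PYZOR_CHECK,\n\tRAZOR2_CHECK autolearn=no\n"
        "Subject: hello\n\nbody\n"
    )
    is_spam, tests = spam_status(sample)
    print(is_spam, "PYZOR_CHECK" in tests)
```

Under those assumptions, a rule name in the `tests=` list is the only visible trace of pyzor; the Yes/No decision still comes from the total score.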
Re: No rule updates since 1/1/17
On 01/21/2017 05:35 PM, David Jones wrote:

>> On Fri, 20 Jan 2017 19:02:09 -, Tom Hendrikx wrote:
>>
>> As John has said, diversity makes the rules more accurate for more
>> people. Also many hands make light work. With more people involved
>> there's not such a requirement to contribute thousands of messages
>> per person.
>
> I think the "barrier to entry" is too difficult for most. I would
> have to set up a new MX on a domain without MTA checks (DNS and RBL)
> then create a honeypot email address to attract spam if I didn't
> have established recipient addresses/mailboxes. Then I would have
> to set up an SA development environment with scripts to keep it
> up-to-date from SVN and compiled regularly. Finally I would need
> to manually categorize the ham and spam.
>
> I am capable of doing everything above and really want to help the
> masschecking but it would be better if the "barrier to entry" is
> lower for everyone. If there were setup scripts (maybe there already
> are) or a VM that could be downloaded ready to use, that would help
> get more masscheckers going easily.
>
> For example, would it be possible to set up an iRedMail server VM
> (only takes a few minutes) that could be turned into this SA
> development environment for masschecking? This would quickly provide
> a separate mail server with an IMAP server and webmail interface for
> easy categorization of spam and ham.
>
> If so, then could someone point me to some documentation on setting
> up an SA development environment for masschecking? I have a domain
> that is about to be retired with a lot of addresses on spam lists
> that would be very good to attract spam and help the masscheck corpus.

/trunk/masses/contrib/automasscheck-minimal

The Wiki also has a lot of detailed info.
Re: No rule updates since 1/1/17
> On Fri, 20 Jan 2017 19:02:09 -, Tom Hendrikx wrote:
>
> As John has said, diversity makes the rules more accurate for more people.
> Also many hands make light work. With more people involved there's not
> such a requirement to contribute thousands of messages per person.

I think the "barrier to entry" is too difficult for most. I would have to set up a new MX on a domain without MTA checks (DNS and RBL) then create a honeypot email address to attract spam if I didn't have established recipient addresses/mailboxes. Then I would have to set up an SA development environment with scripts to keep it up-to-date from SVN and compiled regularly. Finally I would need to manually categorize the ham and spam.

I am capable of doing everything above and really want to help the masschecking but it would be better if the "barrier to entry" is lower for everyone. If there were setup scripts (maybe there already are) or a VM that could be downloaded ready to use, that would help get more masscheckers going easily.

For example, would it be possible to set up an iRedMail server VM (only takes a few minutes) that could be turned into this SA development environment for masschecking? This would quickly provide a separate mail server with an IMAP server and webmail interface for easy categorization of spam and ham.

If so, then could someone point me to some documentation on setting up an SA development environment for masschecking? I have a domain that is about to be retired with a lot of addresses on spam lists that would be very good to attract spam and help the masscheck corpus.
How does sa know when pyzor sees a spam msg?
Having a heck of a time googling for this answer.

I'm looking to understand the actual mechanism whereby pyzor tells sa it thinks a message is spam. What does sa look for.

My sa setup allows sa to insert X-Spam headers and then procmail looks for certain of those

    :0fw
    | /usr/bin/spamc

    :0:
    * ^X-Spam-Status: Yes
    spama_spam_tr.in

So does pyzor insert something I can tell procmail to look for or does pyzor somehow tell SA a msg is spam and then sa inserts something I can tell procmail to look for?

What is the actual mechanism between pyzor and sa. How do they converse. How is it all found by procmail?

The actual details are hard to find. Plenty about pyzor checking its servers but not much about what happens then.
Re: No rule updates since 1/1/17
On Fri, 20 Jan 2017 19:02:09 -, Tom Hendrikx wrote:

> I think I can say the same about my platform, but since this issue
> keeps popping up I just applied for an account just to find out if my
> contribution could help. I can't speculate so I'm just gonna try if
> it helps :)

Top move, it's definitely worth looking into. The list sees a lot of questions about either the scores that are generated or why they haven't been generated, and the answer tends to come down to one factor - the masscheck team is pretty small.

As John has said, diversity makes the rules more accurate for more people. Also many hands make light work. With more people involved there's not such a requirement to contribute thousands of messages per person.