Re: No rule updates since 1/1/17

2017-01-21 Thread John Hardin

On Sat, 21 Jan 2017, Kevin Golding wrote:


> On Sat, 21 Jan 2017 19:08:39 -, Jari Fredriksson  wrote:
>
>> John Hardin wrote on 20.1.2017 22:38:
>>
>>> Collecting spam after RBL filtering is much less helpful to masscheck.
>>> Ideally your spam corpus is from a totally unfiltered feed.
>>>
>>> However, even if it is filtered and small, it helps, *especially* if
>>> the ham is not in English - masscheck is perennially starved for
>>> non-English ham and rule scoring is thus biased against non-English
>>> languages to a degree.
>>
>> This is NOT what I have learned from the SA lists. I used to do this, but
>> learned in SA discussions that it is *harmful* to pass such spam to
>> masscheck, because it harms SA users who do proper pre-SA filtering.
>>
>> We do *need* an official policy! What are we going to do with mixed
>> messages like this??
>
> It was written down once. I saw the unfiltered thing again when I looked
> earlier today, but I can't spot it just now. I believe I was also told by
> someone who knows this stuff that it wasn't a requirement, more an ideal.

I apologize if there's empirical evidence that including spam that would
be blocked by RBLs causes poorer masscheck results. That seems strongly
counterintuitive to me, especially for sites where such filtering is *not*
done at the MTA level, and there are such sites.

> However looking for that comment again just now I registered another
> discrepancy on the wiki:
>
> https://wiki.apache.org/spamassassin/CorpusCleaning - no spam older than 2
> months
>
> https://wiki.apache.org/spamassassin/HandClassifiedCorpora - no spam older
> than 6 months
>
> I don't think either is actually a strict rule.


There is age filtering in the masscheck code, but I don't remember off the 
top of my head what the cutoff actually is. I agree that the discrepancies 
in the wiki should be corrected...
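An age cutoff like the one discussed could be applied to a local corpus along these lines. This is a minimal Python sketch, not the masscheck code: the 60-day `MAX_AGE` is an assumption (the two wiki pages disagree, 2 vs 6 months, and the real cutoff isn't stated here), `is_recent_enough` is a hypothetical helper, and it only looks at each message's `Date:` header.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed cutoff: the two wiki pages disagree (2 vs 6 months), so it is a parameter.
MAX_AGE = timedelta(days=60)


def is_recent_enough(date_header: str, now: datetime,
                     max_age: timedelta = MAX_AGE) -> bool:
    """True if the message's Date: header falls within max_age before `now`."""
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= max_age


now = datetime(2017, 1, 22, tzinfo=timezone.utc)
print(is_recent_enough("Sat, 21 Jan 2017 19:08:39 +0000", now))  # True
print(is_recent_enough("Mon, 01 Aug 2016 10:00:00 +0000", now))  # False
```

Keeping the cutoff as a parameter means the same filter can follow whichever figure the wiki eventually settles on.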



--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 An operating system design that requires a system reboot in order to
 install a document viewing utility does not earn my respect.
---
 2 days until John Moses Browning's 162nd Birthday


Re: No rule updates since 1/1/17

2017-01-21 Thread John Hardin

On Sat, 21 Jan 2017, Kevin Golding wrote:


> On Sat, 21 Jan 2017 16:35:12 -, David Jones  wrote:
>
>> I think the "barrier to entry" is too difficult for most. I would have
>> to set up a new MX on a domain without MTA checks (DNS and RBL) then
>> create a honeypot email address to attract spam if I didn't have
>> established recipient addresses/mailboxes.
>
> I may be wrong but I don't believe the majority of the current masscheckers
> have honeypots in place. I also believe that at least some have some form of
> filtering in place - in fact the most common filtering in place is the manual
> classification since I bet most of us come across the odd message that we
> second-guess and just put to one side.

Likely true. What I contribute is what gets through Zen to my personal
mailbox, for example. I did liberally sprinkle "ideally" through my
description... :)

>> Then I would have to set up an SA development
>> environment with scripts to keep it up-to-date from SVN and compiled
>> regularly.
>
> I forget the exact steps involved for running the checks because basically I
> set it up and largely forget about it, but essentially it was: grab an svn
> copy of SpamAssassin, pick one of the various helper scripts, create a config
> and let cron deal with the daily workload of updating/checking/submitting -
> it's all done in the helper scripts.

Right, that's pretty much all there is: install and schedule the local
masscheck script. It's not quite totally black box, but pretty close. You
do, however, have to get all the bits working initially.


I like the idea of a VM image.



Re: Low spam score: -1.9

2017-01-21 Thread Jari Fredriksson

Reindl Harald wrote on 21.1.2017 22:33:
> On 21.01.2017 at 21:21, Jari Fredriksson wrote:
>> Emin Akbulut wrote on 10.1.2017 9:48:
>>
>>> Hi all,
>>>
>>> Recently we have been receiving spam messages that SA cannot block.
>>> I've also checked the raw message at http://spamcheck.postmarkapp.com/
>>> and the score was very low there, too.
>>>
>>> I've trained SA and it worked for a while, but now it's useless.
>>>
>>>
>>> How can I block this spam? The messages look like poems.
>>>
>>> * * * * * * * * * * * * * * *
>>>
>>
>> Please do NOT post spam to the list. Put it on pastebin.ca or similar and
>> post a link. The spam you spew might poison our SA...
>
> then fix your SA so it doesn't train on every piece of crap, and do the
> training properly and manually on your own

Oh well. My SA does not train on SA list mail anyhow; this was just a
general point. Sorry about that.


--
ja...@iki.fi


Re: Low spam score: -1.9

2017-01-21 Thread Jari Fredriksson
Emin Akbulut wrote on 10.1.2017 9:48:

> Hi all,
>
> Recently we have been receiving spam messages that SA cannot block.
> I've also checked the raw message at http://spamcheck.postmarkapp.com/
> and the score was very low there, too.
>
> I've trained SA and it worked for a while, but now it's useless.
>
> How can I block this spam? The messages look like poems.
>
> * * * * * * * * * * * * * * *

Please do NOT post spam to the list. Put it on pastebin.ca or similar and
post a link. The spam you spew might poison our SA...

-- 
ja...@iki.fi



Re: No rule updates since 1/1/17

2017-01-21 Thread Jari Fredriksson

Kevin Golding wrote on 21.1.2017 21:22:
> On Sat, 21 Jan 2017 19:08:39 -, Jari Fredriksson  wrote:
>
>> John Hardin wrote on 20.1.2017 22:38:
>>
>>> Collecting spam after RBL filtering is much less helpful to masscheck.
>>> Ideally your spam corpus is from a totally unfiltered feed.
>>>
>>> However, even if it is filtered and small, it helps, *especially* if
>>> the ham is not in English - masscheck is perennially starved for
>>> non-English ham and rule scoring is thus biased against non-English
>>> languages to a degree.
>>
>> This is NOT what I have learned from the SA lists. I used to do this, but
>> learned in SA discussions that it is *harmful* to pass such spam to
>> masscheck, because it harms SA users who do proper pre-SA filtering.
>>
>> We do *need* an official policy! What are we going to do with mixed
>> messages like this??
>
> It was written down once. I saw the unfiltered thing again when I
> looked earlier today, but I can't spot it just now. I believe I was
> also told by someone who knows this stuff that it wasn't a
> requirement, more an ideal.
>
> However looking for that comment again just now I registered another
> discrepancy on the wiki:
>
> https://wiki.apache.org/spamassassin/CorpusCleaning - no spam older
> than 2 months
>
> https://wiki.apache.org/spamassassin/HandClassifiedCorpora - no spam
> older than 6 months
>
> I don't think either is actually a strict rule. It will help lower the
> barrier to entry if we can make this stuff more uniform. It could
> also be argued that having two such similar pages is somewhat
> redundant actually.

What does CorpusCleaning, i.e. removing garbage from a corpus, have to do
with this? I'm really confused now.

--
ja...@iki.fi


Re: No rule updates since 1/1/17

2017-01-21 Thread Kevin Golding

On Sat, 21 Jan 2017 19:08:39 -, Jari Fredriksson  wrote:


> John Hardin wrote on 20.1.2017 22:38:
>
>> Collecting spam after RBL filtering is much less helpful to masscheck.
>> Ideally your spam corpus is from a totally unfiltered feed.
>>
>> However, even if it is filtered and small, it helps, *especially* if
>> the ham is not in English - masscheck is perennially starved for
>> non-English ham and rule scoring is thus biased against non-English
>> languages to a degree.
>
> This is NOT what I have learned from the SA lists. I used to do this, but
> learned in SA discussions that it is *harmful* to pass such spam to
> masscheck, because it harms SA users who do proper pre-SA filtering.
>
> We do *need* an official policy! What are we going to do with mixed
> messages like this??

It was written down once. I saw the unfiltered thing again when I looked
earlier today, but I can't spot it just now. I believe I was also told by
someone who knows this stuff that it wasn't a requirement, more an ideal.

However looking for that comment again just now I registered another
discrepancy on the wiki:

https://wiki.apache.org/spamassassin/CorpusCleaning - no spam older than 2
months

https://wiki.apache.org/spamassassin/HandClassifiedCorpora - no spam older
than 6 months

I don't think either is actually a strict rule. It will help lower the
barrier to entry if we can make this stuff more uniform. It could also be
argued that having two such similar pages is somewhat redundant actually.


Re: No rule updates since 1/1/17

2017-01-21 Thread Jari Fredriksson

John Hardin wrote on 20.1.2017 22:38:

> Collecting spam after RBL filtering is much less helpful to masscheck.
> Ideally your spam corpus is from a totally unfiltered feed.
>
> However, even if it is filtered and small, it helps, *especially* if
> the ham is not in English - masscheck is perennially starved for
> non-English ham and rule scoring is thus biased against non-English
> languages to a degree.

This is NOT what I have learned from the SA lists. I used to do this, but
learned in SA discussions that it is *harmful* to pass such spam to
masscheck, because it harms SA users who do proper pre-SA filtering.

We do *need* an official policy! What are we going to do with mixed
messages like this??


--
ja...@iki.fi


Re: No rule updates since 1/1/17

2017-01-21 Thread David Jones
> On Sat, 21 Jan 2017 16:35:12 +
> David Jones wrote:

>> I think the "barrier to entry" is too difficult for most.  I would
>> have to set up a new MX on a domain without MTA checks (DNS and RBL)

> I hope it doesn't actually say that anywhere. IMO the corpora should be
> dominated by the spam that gets through to SA in actual production
> environments.

It was implied by a response from John Hardin yesterday, and it makes
sense.  I have tuned my production mail filters to block > 90% of spam
via DNS checks and RBLs in Postfix, so SA only has to block a few
percent of the total potential mail.  That means my production SA is
not going to see the majority of spam.  I only have to deal with the
occasional compromised account sending spam for a short period
before it is either detected and locked or becomes listed on enough
RBLs.
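For readers unfamiliar with the MTA-level RBL checks mentioned above: a DNSBL lookup reverses the octets of the connecting IPv4 address, appends the list's zone, and queries that name for an A record. An answer in 127.0.0.0/8 means the address is listed; NXDOMAIN means it is not. The following is a minimal Python sketch, assuming IPv4 only. The helper names are hypothetical, and it does not distinguish the special error return codes some lists use.

```python
import ipaddress
import socket

LISTED_RANGE = ipaddress.IPv4Network("127.0.0.0/8")


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the octets of an IPv4 address and append the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # ValueError if malformed
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Look the name up; an A record in 127.0.0.0/8 counts as listed, NXDOMAIN as not."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return ipaddress.IPv4Address(answer) in LISTED_RANGE


print(dnsbl_query_name("192.0.2.1", "zen.spamhaus.org"))  # 1.2.0.192.zen.spamhaus.org
```

Mail rejected this way never reaches SA, which is why a production SA behind such checks sees only a small share of the total spam.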

I am currently setting up a new MX and getting mail flowing to a newly
built iRedMail server.  Then I will look at the SVN scripts to get that
part set up.
http://svn.apache.org/viewvc/spamassassin/trunk/masses/contrib/automasscheck-minimal/
I am not familiar with amavisd-new since I have been using MailScanner,
so I will research how to set up the SA development environment with
iRedMail's amavisd-new.  I have disabled most of the anti-spam Postfix
settings (DNS and RBL checks) that iRedMail sets up, so SA should see
almost everything sent to a catchall mailbox.  Then I plan to log in to
that account regularly and categorize ham and spam.

Re: No rule updates since 1/1/17

2017-01-21 Thread RW
On Sat, 21 Jan 2017 16:35:12 +
David Jones wrote:



> I think the "barrier to entry" is too difficult for most.  I would
> have to set up a new MX on a domain without MTA checks (DNS and RBL)

I hope it doesn't actually say that anywhere. IMO the corpora should be
dominated by the spam that gets through to SA in actual production
environments.


Re: No rule updates since 1/1/17

2017-01-21 Thread Kevin Golding

On Sat, 21 Jan 2017 16:35:12 -, David Jones  wrote:

> I think the "barrier to entry" is too difficult for most. I would have to
> set up a new MX on a domain without MTA checks (DNS and RBL) then
> create a honeypot email address to attract spam if I didn't have established
> recipient addresses/mailboxes.

I may be wrong but I don't believe the majority of the current
masscheckers have honeypots in place. I also believe that at least some
have some form of filtering in place - in fact the most common filtering
in place is the manual classification since I bet most of us come across
the odd message that we second-guess and just put to one side.

> Then I would have to set up an SA development
> environment with scripts to keep it up-to-date from SVN and compiled
> regularly.

I forget the exact steps involved for running the checks because basically
I set it up and largely forget about it, but essentially it was: grab an
svn copy of SpamAssassin, pick one of the various helper scripts, create a
config and let cron deal with the daily workload of
updating/checking/submitting - it's all done in the helper scripts. You
can write your own if you like too, but among the various options out there
one stands a good chance of meeting your needs.

I only really have to think about it when I move my masschecks to a new
machine.

> Finally I would need to manually categorize the ham and spam.

Okay, I agree this part involves doing stuff regularly. The amount will
vary depending on how active you are. Personally? If I am confident a mail
is ham or spam then I am confident that mail could be used for both Bayes
training and masschecking. I was going to do that classification anyway so
it's not really any extra work for me.

Sure, I could spend more time working to get a few extra samples etc. but
I have found my personal happy balance in terms of input vs output. You're
not expected to neglect your pets to make it perfect. But if anyone has
the ability to help out (even a little bit) it might be handy.

What could be just as helpful as actually running masschecks might be
looking at the current documentation and poking at it with a stick. Maybe
it does need tweaking to sound less complicated (I think it's improved
over the years but maybe not enough). It sounds as if there are a couple
of things that could be looked at, and perhaps there are more.


Re: How does sa know when pyzor sees a spam msg?

2017-01-21 Thread Matus UHLAR - fantomas

On 21.01.17 11:20, Harry Putnam wrote:

> I'm looking to understand the actual mechanism whereby pyzor tells sa
> it thinks a message is spam.
>
> What does sa look for?


That's the job of the Mail::SpamAssassin::Plugin::Pyzor plugin.


> My sa setup allows sa to insert X-Spam headers and then procmail looks
> for certain of those
>
>  :0fw
>  | /usr/bin/spamc
>
>  :0:
>  * ^X-Spam-Status: Yes
>   spama_spam_tr.in
>
> So does pyzor insert something I can tell procmail to look for or does
> pyzor somehow tell SA a msg is spam and then sa inserts something I
> can tell procmail to look for?
>
> What is the actual mechanism between pyzor and sa?  How do they
> converse?


Well, SA does use the pyzor output; exactly how is quite irrelevant here.
The point is, when pyzor reports a message as spam, SA increases its score, so
the probability of the mail being detected as spam is higher.
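A simplified Python model of the point above: Pyzor does not mark a message as spam by itself. It contributes one rule hit, and SA compares the sum of all hit scores against a threshold. Assumptions: `PYZOR_MAX = 5` mirrors the `pyzor_max` default documented for the SpamAssassin 3.4 Pyzor plugin, and `REQUIRED_SCORE = 5.0` mirrors the default `required_score`. The per-rule scores and the `OTHER_RULE_*` names are hypothetical, and the real plugin's handling of Pyzor's whitelist count and of timeouts is omitted.

```python
from dataclasses import dataclass

# Assumed defaults: pyzor_max (SA 3.4 Pyzor plugin) and required_score.
PYZOR_MAX = 5
REQUIRED_SCORE = 5.0


@dataclass
class RuleHit:
    name: str
    score: float


def pyzor_rule_fires(report_count: int, threshold: int = PYZOR_MAX) -> bool:
    """Simplified: the Pyzor rule hits once the body digest has been reported often enough."""
    return report_count >= threshold


def evaluate(hits: list[RuleHit], required: float = REQUIRED_SCORE) -> tuple[bool, float]:
    """Sum the scores of every rule that hit and compare the total with the threshold."""
    total = sum(hit.score for hit in hits)
    return total >= required, total


# Hypothetical scores: without Pyzor the message stays below the threshold.
hits = [RuleHit("OTHER_RULE_A", 1.5), RuleHit("OTHER_RULE_B", 2.0)]
print(evaluate(hits))  # (False, 3.5)

if pyzor_rule_fires(report_count=12):
    hits.append(RuleHit("PYZOR_CHECK", 2.0))
print(evaluate(hits))  # (True, 5.5)
```

In this model the Pyzor hit only pushes the total over the line. The same message with fewer other hits could still end up below the threshold.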


> How is it all found by procmail?


Exactly as you use above. It also does not matter whether you use pyzor: all
checks SA uses are evaluated that way.

Simply install pyzor (razor, dcc, ...), activate the SA plugins and let SA do
the job.
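If you ever want Pyzor-specific handling rather than the overall verdict, note that with SA's default report headers the names of the rules that hit are listed in the `tests=` field of `X-Spam-Status`. A Pyzor hit therefore shows up there as `PYZOR_CHECK`. The following is a minimal Python sketch of reading that field. It assumes the usual `Yes, score=... required=... tests=A,B,C autolearn=...` layout, which SA may fold across several lines; `spam_status_tests` is a hypothetical helper.

```python
import re


def spam_status_tests(header_value: str) -> set[str]:
    """Return the rule names listed in the tests= field of an X-Spam-Status value."""
    # Unfold continuation lines (CRLF/LF followed by whitespace) into single spaces.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", header_value)
    # tests= runs until the next "key=" token or the end of the value.
    match = re.search(r"\btests=(.*?)(?=\s+\w+=|$)", unfolded)
    if match is None:
        return set()
    names = {name.strip() for name in match.group(1).split(",")}
    names.discard("")
    names.discard("none")  # SA writes tests=none when no rule hit
    return names


header = ("Yes, score=7.2 required=5.0 tests=BAYES_99,PYZOR_CHECK,\n"
          "\tRDNS_NONE autolearn=no autolearn_force=no version=3.4.1")
print("PYZOR_CHECK" in spam_status_tests(header))  # True
```

For the plain spam/ham decision this is unnecessary: the `X-Spam-Status: Yes` condition in the recipe quoted above already reflects every check, Pyzor included.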

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"Two words: Windows survives." - Craig Mundie, Microsoft senior strategist
"So does syphillis. Good thing we have penicillin." - Matthew Alton


Re: No rule updates since 1/1/17

2017-01-21 Thread Axb

On 01/21/2017 05:35 PM, David Jones wrote:

>> On Fri, 20 Jan 2017 19:02:09 -, Tom Hendrikx  wrote:
>>
>> As John has said, diversity makes the rules more accurate for more people.
>> Also many hands make light work. With more people involved there's not
>> such a requirement to contribute thousands of messages per person.
>
> I think the "barrier to entry" is too difficult for most.  I would have to
> set up a new MX on a domain without MTA checks (DNS and RBL) then
> create a honeypot email address to attract spam if I didn't have established
> recipient addresses/mailboxes.  Then I would have to set up an SA development
> environment with scripts to keep it up-to-date from SVN and compiled regularly.
> Finally I would need to manually categorize the ham and spam.
>
> I am capable of doing everything above and really want to help with
> masschecking, but it would be better if the "barrier to entry" were lower for everyone.
> If there were setup scripts (maybe there already are) or a VM that could be
> downloaded ready to use, that would help get more masscheckers going easily.
>
> For example, would it be possible to set up an iRedMail server VM (only takes
> a few minutes) that could be turned into this SA development environment for
> masschecking?  This would quickly provide a separate mail server with an IMAP
> server and webmail interface for easy categorization of spam and ham.  If so,
> then could someone point me to some documentation on setting up an SA
> development environment for masschecking?
>
> I have a domain that is about to be retired with a lot of addresses on spam
> lists that would be very good to attract spam and help the masscheck corpus.



/trunk/masses/contrib/automasscheck-minimal

The Wiki also has a lot of detailed info.




Re: No rule updates since 1/1/17

2017-01-21 Thread David Jones
>On Fri, 20 Jan 2017 19:02:09 -, Tom Hendrikx  wrote:

>As John has said, diversity makes the rules more accurate for more people.
>Also many hands make light work. With more people involved there's not
>such a requirement to contribute thousands of messages per person.

I think the "barrier to entry" is too difficult for most.  I would have to
set up a new MX on a domain without MTA checks (DNS and RBL) then
create a honeypot email address to attract spam if I didn't have established
recipient addresses/mailboxes.  Then I would have to set up an SA development
environment with scripts to keep it up-to-date from SVN and compiled regularly.
Finally I would need to manually categorize the ham and spam.

I am capable of doing everything above and really want to help with
masschecking, but it would be better if the "barrier to entry" were lower for everyone.
If there were setup scripts (maybe there already are) or a VM that could be
downloaded ready to use, that would help get more masscheckers going easily.

For example, would it be possible to set up an iRedMail server VM (only takes
a few minutes) that could be turned into this SA development environment for
masschecking?  This would quickly provide a separate mail server with an IMAP
server and webmail interface for easy categorization of spam and ham.  If so,
then could someone point me to some documentation on setting up an SA
development environment for masschecking?

I have a domain that is about to be retired with a lot of addresses on spam
lists that would be very good to attract spam and help the masscheck corpus.


How does sa know when pyzor sees a spam msg?

2017-01-21 Thread Harry Putnam
Having a heck of a time googling for this answer.

I'm looking to understand the actual mechanism whereby pyzor tells sa
it thinks a message is spam.

What does sa look for?

My sa setup allows sa to insert X-Spam headers and then procmail looks
for certain of those

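  # Pipe each message through spamc; SA adds its X-Spam-* headers
  :0fw
  | /usr/bin/spamc

  # File anything SA flagged as spam (":0:" uses a local lockfile)
  :0:
  * ^X-Spam-Status: Yes
  spama_spam_tr.in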

So does pyzor insert something I can tell procmail to look for or does
pyzor somehow tell SA a msg is spam and then sa inserts something I
can tell procmail to look for?

What is the actual mechanism between pyzor and sa?  How do they
converse?

How is it all found by procmail?

The actual details are hard to find.

Plenty about pyzor checking its servers but not much about what
happens then.



Re: No rule updates since 1/1/17

2017-01-21 Thread Kevin Golding

On Fri, 20 Jan 2017 19:02:09 -, Tom Hendrikx  wrote:


> I think I can say the same about my platform, but since this issue keeps
> popping up I just applied for an account to find out if my
> contribution could help. I can't speculate so I'm just going to try and see
> if it helps :)

Top move, it's definitely worth looking into. The list sees a lot of
questions about either the scores that are generated or why they haven't
been generated, and the answer tends to come down to one factor - the
masscheck team is pretty small.

As John has said, diversity makes the rules more accurate for more people.
Also many hands make light work. With more people involved there's not
such a requirement to contribute thousands of messages per person.