Re: Learning only on read emails?

2015-10-20 Thread Matus UHLAR - fantomas

On 19.10.15 17:30, Ryan Coleman wrote:

I actually get THOUSANDS of emails a day. Most of it is spam. And not
caught by SA.  And when it is put into the spam folder it is not learned.

But, hey, you know… you obviously know me better than me so why don’t you
have this back and forth publicly with yourself and keep me out of it.


I would still be careful about training all read mail as ham... there are
cases where you don't notice a message is spam (some are hard to
distinguish), forget to move it to the spam folder, or don't have time to do it...


P.S.: no need for reply-all on a mailing list


Habit. Besides, there's no Reply-To header rewrite on this mailing list. If I 
hit reply it goes only to you.


that's why there are list headers and why some MUAs support them.


On Oct 19, 2015, at 5:25 PM, Reindl Harald  wrote:
nonsense - there are list headers and if you use a broken client just remove 
anything but the list-address



Wow, you really are an asshole, huh?

I looked at the headers before I said anything. Broken Client? No… Apple Mail. 
There are lists where it works because it EXISTS IN THE HEADERS.


Not that I like him, but he's right that a mail client that is not capable of
handling mailing lists is somewhat broken...

Lists should not break mail by inserting a Reply-To header, because that
header is supposed to be set by the client, not by the mailing list.

And the fact that it's made by Apple doesn't make it a good client.

Microsoft and Apple tend to do things their own way just because they are
huge companies, and don't care about compatibility (even backwards
compatibility) or correctness.


Speaking of learning spam… your email address will be joining the blacklist 
very soon.


just be careful when blacklisting and spam-training...
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Nothing is fool-proof to a talented fool. 


Re: Learning only on read emails?

2015-10-20 Thread Jari Fredriksson

On 10/20/2015 12:41 AM, Ryan Coleman wrote:

Actually it makes absolute sense since I dump my spam into a folder to be 
scanned as spam and anything that is still in my inbox, and read, is indeed ham.

I just have to re-investigate the ./new and ./cur folders to make sure they 
will operate how I want. But if the email was delivered to my phone and it 
moves (but not read) then it’s not an option.


The cur and new folders work as expected when the IMAP server is Courier, 
but NOT when you use Dovecot.


That is how I have been learning from these two.

br. jarif





On Oct 19, 2015, at 4:35 PM, Reindl Harald  wrote:



Am 19.10.2015 um 23:21 schrieb Ryan Coleman:

Ok so it was established I don’t have a ham scan (correct). So how do I do it 
so that it only scans the read emails in a MAILDIR?

that makes no sense

Train a specific ham folder and a specific spam folder where you move 
messages you are sure how to classify, not a generic inbox just because you 
have read a message.
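As a concrete sketch of that workflow: two sa-learn passes over dedicated training folders, one for ham and one for spam. The Maildir++ folder names and the path below are illustrative assumptions, not something from this thread.

```python
from pathlib import Path

def sa_learn_commands(maildir):
    """Build the two sa-learn invocations for dedicated training folders.

    '.Ham' and '.Spam' are hypothetical Maildir++ subfolder names;
    substitute whatever your IMAP server actually creates.
    """
    maildir = Path(maildir)
    return [
        ["sa-learn", "--ham", str(maildir / ".Ham" / "cur")],
        ["sa-learn", "--spam", str(maildir / ".Spam" / "cur")],
    ]

# Print the commands rather than running them, so the sketch stays side-effect free.
for cmd in sa_learn_commands("/home/user/Maildir"):
    print(" ".join(cmd))
```

In practice these would run from cron, after the user has had a chance to sort misclassified mail into the right folder.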






Re: Learning only on read emails?

2015-10-20 Thread RW
On Tue, 20 Oct 2015 08:29:27 -0500
Ryan Coleman wrote:

> 
> > On Oct 20, 2015, at 8:21 AM, RW  wrote:
> > 
> > On Tue, 20 Oct 2015 15:14:42 +0300
> > Jari Fredriksson wrote:
> > 
> >> On 10/20/2015 12:41 AM, Ryan Coleman wrote:
> >>> Actually it makes absolute sense since I dump my spam into a
> >>> folder to be scanned as spam and anything that is still in my
> >>> inbox, and read, is indeed ham.
> >>> 
> >>> I just have to re-investigate the ./new and ./cur folders to make
> >>> sure they will operate how I want. But if the email was delivered
> >>> to my phone and it moves (but not read) then it's not an option.
> >> 
> >> cur and new folders work as supposed when the IMAP server is
> >> Courier, but NOT when you use Dovecot.
> > 
> > How does it not work as expected? 
> 
> I haven't seen anything appear in the "new" folder, to be honest.

Bear in mind that the "new" directory is there for mail that's been
delivered into the maildir folder without going through a mail client.
If the mail is delivered there by a pop/imap client, or copied/moved
between maildir folders, "new" shouldn't be used. Even when it is used,
an IMAP server should move mail from "new" to "cur" immediately
after its existence has been reported to a client, and that can be
instantaneous if the IMAP client supports IDLE.

In my experience Dovecot's MDA does the right thing. There is a
complication, though: when Sieve is used to set a flag, the MDA has no
choice but to put the message in "cur".
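For anyone who still wants to train only read mail despite the caveats above, the maildir naming convention exposes the Seen flag in the filename itself: flags follow a ':2,' suffix, and 'S' marks a read message. A rough sketch (the directory layout here is faked for demonstration):

```python
import tempfile
from pathlib import Path

def seen_messages(cur_dir):
    """Yield files in a maildir cur/ whose info section carries the Seen flag.

    Maildir filenames look like 'unique:2,<flags>'; an 'S' among the
    flags means the client has marked the message as read.
    """
    for f in Path(cur_dir).iterdir():
        name = f.name
        if ":2," in name and "S" in name.rsplit(":2,", 1)[1]:
            yield f

# Demonstrate with fake maildir entries in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    for name in ("msg1:2,S", "msg2:2,", "msg3:2,FR"):
        (Path(d) / name).touch()
    print(sorted(f.name for f in seen_messages(d)))  # ['msg1:2,S']
```

The selected files could then be fed to sa-learn --ham; a message a phone has merely downloaded but never read carries no 'S' flag and is skipped.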


Re: Misbehaving HEADER_HOST_IN_BLACKLIST? And no SPF on SA list host?

2015-10-20 Thread John Hardin

On Tue, 20 Oct 2015, Amir Caspi wrote:


On Oct 19, 2015, at 1:16 PM, RW  wrote:


body   URI_HOST_IN_BLACKLIST   eval:check_uri_host_in_blacklist()
header HEADER_HOST_IN_BLACKLIST eval:check_uri_host_listed('BLACK')

These appear to be the same thing. The first call is just a shorthand
form for the second. I don't see where headers come into it. I think the
second rule is probably just a mistake.


So, following up on this... do any of the main devs see the second rule 
as a problem?  It seems to me that a header rule shouldn't be checking 
URI hosts, but even if so, it absolutely shouldn't be hitting when those 
hosts aren't even in the headers (per the two spamples I posted).


My default assumption for the behavior of a header eval() rule would be 
that it only checks message headers. If that's not the case (as you 
describe) then I'd agree the rule is a problem, especially if it leads to 
duplicate hits.


Whether that's a bug in the documentation, or a bug in the rules, or a bug 
in eval(), or a bug in the implementation of check_uri_host_*, I can't 
really say at this point.


Speculation: If the check_uri_host_* eval()s are looking only at the URI 
list regardless of the rule type (i.e. it always behaves as if it was a 
uri rule) then I'd say that needs to be documented clearly (if it isn't 
documented by more than just an example uri rule) and the rules fixed to 
remove the duplicate hits. If the intent of the eval()s was to respect the 
rule type, it's apparently not doing that.


I don't have time at the moment to dig around in the code to see what it's 
doing and whether it's a documentation/rule issue or an eval() code issue.



Kevin, John, others?

Obviously this is only causing a few rare FPs, and presumably it would 
most likely affect this or some other spam-discussion list... but it 
appears to be a bug, no?


Thanks!

--- Amir



--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  You cannot bring about prosperity by discouraging thrift. You
  cannot help small men by tearing down big men. You cannot
  strengthen the weak by weakening the strong. You cannot lift the
  wage-earner by pulling down the wage-payer. You cannot help the
  poor man by destroying the rich. You cannot keep out of trouble by
  spending more than your income. You cannot further the brotherhood
  of man by inciting class hatred. You cannot establish security on
  borrowed money. You cannot build character and courage by taking
  away men's initiative and independence. You cannot help men
  permanently by doing for them what they could and should do for
  themselves.   -- William J. H. Boetcker
---


Re: Misbehaving HEADER_HOST_IN_BLACKLIST? And no SPF on SA list host?

2015-10-20 Thread Amir Caspi
On Oct 19, 2015, at 1:16 PM, RW  wrote:

> body   URI_HOST_IN_BLACKLIST   eval:check_uri_host_in_blacklist()
> header HEADER_HOST_IN_BLACKLIST eval:check_uri_host_listed('BLACK')
> 
> These appear to be the same thing. The first call is just a shorthand
> form for the second. I don't see where headers come into it. I think the
> second rule is probably just a mistake.

So, following up on this... do any of the main devs see the second rule as a 
problem?  It seems to me that a header rule shouldn't be checking URI hosts, 
but even if so, it absolutely shouldn't be hitting when those hosts aren't even 
in the headers (per the two spamples I posted).

Kevin, John, others?

Obviously this is only causing a few rare FPs, and presumably it would most 
likely affect this or some other spam-discussion list... but it appears to be a 
bug, no?

Thanks!

--- Amir



Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains

2015-10-20 Thread shanew

On Tue, 20 Oct 2015, Rob McEwen wrote:


On 10/20/2015 12:13 PM, sha...@shanew.net wrote:

 Unlike Larry (and others) I DO want to block the vast majority of the
 new tlds, because we see nothing but spam from them (and my users tend
 toward the more false-positives than false-negatives side of the
 spectrum).  Rather than maintain a list of all the problematic tlds,
 I'd rather have a blanket block rule with the ability to whitelist the
 handful that might be legit. 


Be careful about doing this for the long term. I think that spammers exploit 
new TLDs because they know that many anti-spam systems don't account for them 
correctly at first (and/or maybe they are cheaper at first?). But in the 
longer term (years down the road) they tend to move on to other ones, while 
the legit TLDs slowly increase. So this strategy can backfire in the long 
term. (But, of course, YMMV... and some smaller hosters don't have to be as 
concerned about a few extra FPs.)


I totally agree.  In fact, I assume anything I'm doing right now to
successfully block spam could change tomorrow, much less months or
years from now.  For now, though, I'm seeing almost no legitimate
traffic from most of the new ones (I'm thinking of the longer ones
especially; .work, .ninja, .site, .science, etc.).

I already have rules that score for these tlds in received or envelope
from, but I'm getting tired of making the regular expression longer
and longer (in two different places), and I know there's a smarter
way.  Whether I'm smart enough to implement that smarter way is
another matter entirely.

Is there an existing (relatively simple) plugin that behaves similarly
that I could crib from?


--
Public key #7BBC68D9 at| Shane Williams
http://pgp.mit.edu/|  System Admin - UT CompSci
=--+---
All syllogisms contain three lines |  sha...@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew


Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains

2015-10-20 Thread Rob McEwen

On 10/20/2015 12:13 PM, sha...@shanew.net wrote:

Unlike Larry (and others) I DO want to block the vast majority of the
new tlds, because we see nothing but spam from them (and my users tend
toward the more false-positives than false-negatives side of the
spectrum).  Rather than maintain a list of all the problematic tlds,
I'd rather have a blanket block rule with the ability to whitelist the
handful that might be legit. 


Be careful about doing this for the long term. I think that spammers 
exploit new TLDs because they know that many anti-spam systems don't 
account for them correctly at first (and/or maybe they are cheaper at 
first?). But in the longer term (years down the road) they tend to 
move on to other ones, while the legit TLDs slowly increase. So this 
strategy can backfire in the long term. (But, of course, YMMV... and some 
smaller hosters don't have to be as concerned about a few extra FPs.)


--
Rob McEwen
+1 478-475-9032



Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains

2015-10-20 Thread Kevin A. McGrail
If you have 3.4.1 and use sa-update, then we add new TLDs to a rule file that is 
then parsed.

This does not block those TLDs.  It lets the engine recognize the URLs for 
further rules.

If you have a tld that is missed and you are using 3.4.1 with sa-update, let us 
know.
Regards,
KAM

On October 14, 2015 3:37:58 PM PDT, sha...@shanew.net wrote:
>On Tue, 13 Oct 2015, Kevin A. McGrail wrote:
>
>> At the end of the day, if you are having problems with new TLDs, ONE
>solution
>> is to use something that uses SA 3.4.1 and has sa-update configured
>so you 
>> get updates with said new TLDs.
>
>I think maybe people are confused about how exactly this change helps
>them get rid of all the spam that's coming from the "new" TLDs.
>
>So, in other words, having just updated to 3.4.1, how does one go from
>having a list of all the new TLDs that can now be nicely maintained
>with sa-update to getting rules which actually score against the vast
>majority of the new TLDs (since most of them seem to be 99.99% spam)?
>
>I had created a local rule before moving to 3.4.1 that looks for new
>TLDs in the Received, From and EnvelopeFrom headers, but it was
>obvious that this wasn't going to scale well.  Did the new system in
>3.4.1 make this easier for me to do, or did it just make it possible
>for new TLDs to be handed off to RBLs and the like (not that that's
>not a major win)?
>
>Any elaboration (or a pointer to documentation (not the man page))
>would be greatly appreciated.
>
>-- 
>Public key #7BBC68D9 at| Shane Williams
>http://pgp.mit.edu/|  System Admin - UT CompSci
>=--+---
>All syllogisms contain three lines |  sha...@shanew.net
>Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew
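For reference, the rule file KAM mentions is the registrar-boundaries list that sa-update ships: it consists of util_rb_tld lines, which only teach the URI parser where registry boundaries lie. Scoring is a separate, locally written step; a hedged sketch (the local rule name and score are purely illustrative):

```
# Shipped via sa-update: recognition only, no score attached
util_rb_tld ninja

# Blocking/scoring would be an additional local rule (name is hypothetical)
header   LOCAL_NEW_TLD_FROM  From =~ /\.ninja>?$/i
score    LOCAL_NEW_TLD_FROM  2.0
```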


Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains

2015-10-20 Thread shanew

I've got 3.4.1 installed and sa-update runs regularly.

Unlike Larry (and others) I DO want to block the vast majority of the
new tlds, because we see nothing but spam from them (and my users tend
toward the more false-positives than false-negatives side of the
spectrum).  Rather than maintain a list of all the problematic tlds,
I'd rather have a blanket block rule with the ability to whitelist the
handful that might be legit.

Is anyone doing anything like this (perhaps as a plugin)?


On Tue, 20 Oct 2015, Kevin A. McGrail wrote:


If you have 3.4.1 and use sa-update then we add new tlds to a rule file that
is then parsed.

This does not block those TLDs. It lets the engine recognize the URLs for
further rules.

If you have a tld that is missed and you are using 3.4.1 with sa-update, let
us know.
Regards,
KAM

On October 14, 2015 3:37:58 PM PDT, sha...@shanew.net wrote:

On Tue, 13 Oct 2015, Kevin A. McGrail wrote:
 At the end of the day, if you are having problems with new TLDs, ONE solution
 is to use something that uses SA 3.4.1 and has sa-update configured so you
 get updates with said new TLDs.
I think maybe people are confused about how exactly this change helps
them get rid of all the spam that's coming from the "new" TLDs.
So, in other words, having just updated to 3.4.1, how does one go from
having a list of all the new TLDs that can now be nicely maintained
with sa-update to getting rules which actually score against the vast
majority of the new TLDs (since most of them seem to be 99.99% spam)?
I had created a local rule before moving to 3.4.1 that looks for new
TLDs in the Received, From and EnvelopeFrom headers, but it was
obvious that this wasn't going to scale well.  Did the new system in
3.4.1 make this easier for me to do, or did it just make it possible
for new TLDs to be handed off to RBLs and the like (not that that's
not a major win)?
Any elaboration (or a pointer to documentation (not the man page))
would be greatly appreciated.





--
Public key #7BBC68D9 at| Shane Williams
http://pgp.mit.edu/|  System Admin - UT CompSci
=--+---
All syllogisms contain three lines |  sha...@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew


Re: Misbehaving HEADER_HOST_IN_BLACKLIST? And no SPF on SA list host?

2015-10-20 Thread RW
On Tue, 20 Oct 2015 11:58:11 -0700 (PDT)
John Hardin wrote:

> On Tue, 20 Oct 2015, Amir Caspi wrote:
> 
> > On Oct 19, 2015, at 1:16 PM, RW  wrote:
> >
> >> body   URI_HOST_IN_BLACKLIST   eval:check_uri_host_in_blacklist()
> >> header HEADER_HOST_IN_BLACKLIST eval:check_uri_host_listed('BLACK')
> >>
> >> These appear to be the same thing. The first call is just a
> >> shorthand form for the second. I don't see where headers come into
> >> it. I think the second rule is probably just a mistake.
> >
> > So, following up on this... do any of the main devs see the second
> > rule as a problem?  It seems to me that a header rule shouldn't be
> > checking URI hosts, but even if so, it absolutely shouldn't be
> > hitting when those hosts aren't even in the headers (per the two
> > spamples I posted).
> 
> My default assumption for the behavior of a header eval() rule would
> be that it only checks message headers. If that's not the case (as
> you describe) then I'd agree the rule is a problem, especially if it
> leads to duplicate hits.
> 
> Whether that's a bug in the documentation, or a bug in the rules, or
> a bug in eval(), or a bug in the implementation of check_uri_host_*,
> I can't really say at this point.

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7256


Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains

2015-10-20 Thread Axb

On 10/20/2015 10:04 PM, RW wrote:

On Tue, 20 Oct 2015 13:29:45 -0500 (CDT)
sha...@shanew.net wrote:



I already have rules that score for these tlds in received or envelope
from, but I'm getting tired of making the regular expression longer
and longer (in two different places), and I know there's a smarter
way.  Whether I'm smart enough to implement that smarter way is
another matter entirely.

Is there an existing (relatively simple) plugin that behaves similarly
that I could crib from?


You don't need a plugin, just autogenerate your rules from this:

http://data.iana.org/TLD/tlds-alpha-by-domain.txt


or put a selection of wildcarded TLDs in an rbldnsd zone and use a header 
check_rbl_envfrom rule for senders, plus URIBL.pm plugin lookups for URIs
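The SpamAssassin half of that rbldnsd idea could look roughly like this, via the stock DNSEval plugin; the zone name 'tldbl.local' and the rule name are placeholders for whatever the local rbldnsd instance actually serves:

```
# local.cf -- zone name and rule name are hypothetical
header   TLD_ENVFROM_LOCAL_BL  eval:check_rbl_envfrom('tldbl', 'tldbl.local.')
describe TLD_ENVFROM_LOCAL_BL  Envelope sender TLD listed in local rbldnsd zone
score    TLD_ENVFROM_LOCAL_BL  2.5
```

The URI side would then be a matching urirhssub rule against the same zone.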







Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains

2015-10-20 Thread RW
On Tue, 20 Oct 2015 13:29:45 -0500 (CDT)
sha...@shanew.net wrote:


> I already have rules that score for these tlds in received or envelope
> from, but I'm getting tired of making the regular expression longer
> and longer (in two different places), and I know there's a smarter
> way.  Whether I'm smart enough to implement that smarter way is
> another matter entirely.
> 
> Is there an existing (relatively simple) plugin that behaves similarly
> that I could crib from?

You don't need a plugin, just autogenerate your rules from this:

http://data.iana.org/TLD/tlds-alpha-by-domain.txt
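A sketch of what that autogeneration might look like. The allow-set and the rule name NEW_TLD_ENVFROM are illustrative assumptions; in real use you would feed it the downloaded tlds-alpha-by-domain.txt and tune the exemptions to your own traffic.

```python
def tld_rule(tld_lines, allow=frozenset({"com", "net", "org", "edu", "gov"})):
    """Turn an IANA TLD dump into one SpamAssassin EnvelopeFrom rule.

    tld_lines -- lines of tlds-alpha-by-domain.txt ('#' lines are comments)
    allow     -- TLDs to exempt; this set is purely an example
    """
    blocked = sorted(
        t.strip().lower()
        for t in tld_lines
        if t.strip()
        and not t.lstrip().startswith("#")
        and t.strip().lower() not in allow
    )
    # NEW_TLD_ENVFROM is a hypothetical rule name; give it a score in local.cf.
    return r"header NEW_TLD_ENVFROM EnvelopeFrom =~ /\.(?:%s)$/i" % "|".join(blocked)

sample = ["# Version 2015102000, Last Updated ...", "COM", "NINJA", "WORK"]
print(tld_rule(sample))
```

The same blocked list could be reused to emit a Received-header variant, so the ever-growing regex lives in one generated place instead of two hand-edited ones.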


Re: Learning only on read emails?

2015-10-20 Thread RW
On Tue, 20 Oct 2015 15:14:42 +0300
Jari Fredriksson wrote:

> On 10/20/2015 12:41 AM, Ryan Coleman wrote:
> > Actually it makes absolute sense since I dump my spam into a folder
> > to be scanned as spam and anything that is still in my inbox, and
> > read, is indeed ham.
> >
> > I just have to re-investigate the ./new and ./cur folders to make
> > sure they will operate how I want. But if the email was delivered
> > > to my phone and it moves (but not read) then it's not an option.
> 
> cur and new folders work as supposed when the IMAP server is Courier, 
> but NOT when you use Dovecot.

How does it not work as expected? 


Re: Learning only on read emails?

2015-10-20 Thread Ryan Coleman

> On Oct 20, 2015, at 8:21 AM, RW  wrote:
> 
> On Tue, 20 Oct 2015 15:14:42 +0300
> Jari Fredriksson wrote:
> 
>> On 10/20/2015 12:41 AM, Ryan Coleman wrote:
>>> Actually it makes absolute sense since I dump my spam into a folder
>>> to be scanned as spam and anything that is still in my inbox, and
>>> read, is indeed ham.
>>> 
>>> I just have to re-investigate the ./new and ./cur folders to make
>>> sure they will operate how I want. But if the email was delivered
> >>> to my phone and it moves (but not read) then it's not an option.
>> 
>> cur and new folders work as supposed when the IMAP server is Courier, 
>> but NOT when you use Dovecot.
> 
> How does it not work as expected? 

I haven’t seen anything appear in the “new” folder, to be honest.