spamassassin possibly extended to support yara just like clamav 0.99 now does?

2015-12-18 Thread Benny Pedersen
I think ClamAV is a very good virus scanner, but I see more and more focus
on using ClamAV for spam scanning, which ends up rejecting valid mails from
mailing lists. I think it's time to think about this again.


The same mistake was made in amavisd-new :(


Re: spamassassin possibly extended to support yara just like clamav 0.99 now does?

2015-12-18 Thread Axb

On 12/18/2015 10:58 AM, Benny Pedersen wrote:

I think ClamAV is a very good virus scanner, but I see more and more focus
on using ClamAV for spam scanning, which ends up rejecting valid mails from
mailing lists. I think it's time to think about this again.

The same mistake was made in amavisd-new :(


Some third party signatures *may* focus on spam patterns - Default 
ClamAV sigs don't


SA could easily support YARA rules via a third party plugin. 
Contributions welcome
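
A minimal sketch of what such a third-party plugin could look like (untested;
the rules-file path, rule name and score are assumptions, and it simply shells
out to the yara command-line scanner):

  package Mail::SpamAssassin::Plugin::YaraCheck;

  use strict;
  use warnings;
  use File::Temp qw(tempfile);
  use Mail::SpamAssassin::Plugin;

  our @ISA = qw(Mail::SpamAssassin::Plugin);

  sub new {
    my ($class, $mailsa) = @_;
    my $self = $class->SUPER::new($mailsa);
    bless($self, $class);
    # wire it in with something like (names/paths are assumptions):
    #   loadplugin Mail::SpamAssassin::Plugin::YaraCheck YaraCheck.pm   (in a .pre file)
    #   full  YARA_HIT  eval:check_yara()
    #   score YARA_HIT  0.1
    $self->register_eval_rule("check_yara");
    return $self;
  }

  sub check_yara {
    my ($self, $pms) = @_;
    # write the pristine message to a temp file and let the yara CLI scan it
    my ($fh, $fname) = tempfile(UNLINK => 1);
    print $fh $pms->{msg}->get_pristine();
    close $fh;
    my $out = `yara /etc/mail/spamassassin/local.yar $fname 2>/dev/null`;
    return ($out =~ /\S/) ? 1 : 0;   # yara prints matching rule names; any output = hit
  }

  1;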


If you want to use Yara pattern rules to scan mail then you might as 
well use SA's meta rules.
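
For illustration, a couple of plain pattern subrules combined in a meta already
cover much of what a yara-style content signature would do (rule names and
patterns below are purely illustrative, not stock rules):

  body      __LOC_INVOICE_PHRASE  /\byour invoice is attached\b/i
  body      __LOC_URGENT_PAY      /\burgent payment required\b/i
  meta      LOC_INVOICE_SCAM      (__LOC_INVOICE_PHRASE && __LOC_URGENT_PAY)
  describe  LOC_INVOICE_SCAM      Invoice phrase combined with urgent-payment phrase
  score     LOC_INVOICE_SCAM      2.5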


Just because it's possible to "use" ClamAV as a spam detection engine, 
doesn't mean SA will mutate into being an AV engine.


redirector_pattern question

2015-12-18 Thread Paul Stead

After the messages last night I've been looking into the
redirector_pattern config option - I'm seeing weird results...

Given the redirector_pattern of:

redirector_pattern m'https?://www.google.com/url?q=([^&]+).*'i

I've noticed that spamassassin can sometimes miss - I don't think it's
to do with the regex above as I've tried multiple variations - I've made
it reproducible but I'm not sure why this is happening:

p1: http://pastebin.com/w93ZZp9h
p2: http://pastebin.com/p2pciGC3

# spamassassin -D -t < p1 2>&1 | grep baddomain
Dec 18 10:16:28.386 [30263] dbg: check: tagrun - tag URIHOSTS is now
ready, value: baddomain.com
Dec 18 10:16:28.386 [30263] dbg: check: tagrun - tag URIDOMAINS is now
ready, value: baddomain.com
Dec 18 10:16:28.386 [30263] dbg: uridnsbl: considering
host=baddomain.com, domain=baddomain.com

etc

# spamassassin -D -t < p2 2>&1 | grep baddomain

p2 doesn't pick up on baddomain.com

Any thoughts or have I stumbled upon a problem?

Paul
--
Paul Stead
Systems Engineer
Zen Internet


Re: Google redirects

2015-12-18 Thread Axb

On 12/18/2015 04:17 PM, Mark Martinec wrote:

On 2015-12-17 22:41, Axb wrote:

could you make a version using redirector_pattern so the redirected
target can be looked up via URIBL plugin?


Isn't this already the case? Redirect targets are added
to a list of URIs and are subject to same rules as
directly collected URIs.



I suggested converting the rawbody rule John was working on into a 
redirector_pattern




Re: Google redirects

2015-12-18 Thread Mark Martinec

On 2015-12-18 16:29, Axb wrote:

On 12/18/2015 04:17 PM, Mark Martinec wrote:

On 2015-12-17 22:41, Axb wrote:

could you make a version using redirector_pattern so the redirected
target can be looked up via URIBL plugin?


Isn't this already the case? Redirect targets are added
to a list of URIs and are subject to same rules as
directly collected URIs.



I suggested converting the rawbody rule John was working on into a
redirector_pattern


Note that the following rule as posted by John:

  uri __GOOG_MALWARE_DNLD 
m;^https?://[^/]*\.google\.com/[^?]*url\?.*[\?&]download=1;i


would not currently work as a redirector_pattern due to the problem
I posted in my reply earlier today (Re: redirector_pattern question);
i.e. where the redirector target contains "http:", followed
by other URI arguments (like "=1" here).

  Mark


Re: Google redirects

2015-12-18 Thread Mark Martinec

On 2015-12-17 22:41, Axb wrote:

could you make a version using redirector_pattern so the redirected
target can be looked up via URIBL plugin?


Isn't this already the case? Redirect targets are added
to a list of URIs and are subject to same rules as
directly collected URIs.

  Mark



Re: redirector_pattern question

2015-12-18 Thread Mark Martinec

On 2015-12-18 11:19, Paul Stead wrote:

After the messages last night I've been looking into the
redirector_pattern config option - I'm seeing weird results...

Given the redirector_pattern of:

redirector_pattern m'https?://www.google.com/url?q=([^&]+).*'i

I've noticed that spamassassin can sometimes miss - I don't think it's
to do with the regex above as I've tried multiple variations - I've made
it reproducible but I'm not sure why this is happening:

p1: http://pastebin.com/w93ZZp9h
p2: http://pastebin.com/p2pciGC3

[...]

# spamassassin -D -t < p2 2>&1 | grep baddomain

p2 doesn't pick up on baddomain.com

Any thoughts or have I stumbled upon a problem?



Two problems there: one is in your regexp, the other is in
the SpamAssassin logic for dealing with redirects.

The parameter of redirector_pattern is a regular expression;
dots and a question mark have a special meaning in a regexp.
If that special meaning is not wanted, they need to be escaped.
Also, SpamAssassin does not add anchoring to a redirector_pattern
regexp, so you need to add it yourself where needed, and the
trailing .* is redundant for the same reason.

So it should be something like:

  redirector_pattern m{^https?://www\.google\.com/url\?q=([^&]+)}i


The other problem is in Mail::SpamAssassin::Util::uri_list_canonicalize(),
near line 1346:

  # deal with http redirectors.  strip off one level of redirector
  # and add back to the array.  the foreach loop will go over those
  # and deal appropriately.
  # bug 3308: redirectors like yahoo only need one '/' ...
  if ($rest =~ m{(https?:/{0,2}.+)$}i) {
    push(@uris, $1);
  }

  # resort to redirector pattern matching if the generic https? check
  # doesn't result in a match -- bug 4176
  else {
    foreach (@{$redirector_patterns}) {

Note that it tries the hard-coded check first, and skips
evaluating redirector patterns when the hard-coded match
was successful.

So your redirector pattern was not tried at all, and at the same time
the hard-coded check obtained an invalid URL: from the URL
  "/url?q=http://baddomain.com=1"
it collected "http://baddomain.com=1" instead of
"http://baddomain.com".

The URI syntax after a '?' should treat "=1" as a second argument
and not glue it onto the first argument "http://baddomain.com".

So the resulting invalid URL "http://baddomain.com=1"
does not match any URI rules for baddomain.com.
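
To see the over-capture in isolation (a quick standalone Perl sketch, not SA
code; the query string below is purely illustrative, not the exact one from
the pastebin):

  #!/usr/bin/perl
  use strict;
  use warnings;

  # same hard-coded pattern as in uri_list_canonicalize()
  my $rest = '/url?q=http://baddomain.com&d=1';   # illustrative redirector query
  if ($rest =~ m{(https?:/{0,2}.+)$}i) {
    # the greedy .+ runs to the end of the string, so the trailing
    # argument is glued onto the redirect target:
    print "$1\n";   # prints "http://baddomain.com&d=1"
  }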


Please try the attached patch (against 3.4.1 or trunk); it reverses
the order of checks: it tries redirector_patterns first, and only
falls back to the sloppy hard-coded check as a last resort.
(And that hard-coded check would better be fixed too.)
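
For what it's worth, one way that hard-coded fallback could be tightened
(just a sketch, not committed code) is to stop the captured target at the
next query-argument separator instead of running to the end of the string:

  # sketch only: stop the embedded URL at '&' or '#' so trailing
  # query arguments are not glued onto the redirect target
  if ($rest =~ m{(https?:/{0,2}[^&#]+)}i) {
    push(@uris, $1);
  }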

... and please open a bug report in bugzilla.


  Mark


Index: lib/Mail/SpamAssassin/Util.pm
===================================================================
--- lib/Mail/SpamAssassin/Util.pm	(revision 1720791)
+++ lib/Mail/SpamAssassin/Util.pm	(working copy)
@@ -1342,24 +1342,28 @@
   # deal with http redirectors.  strip off one level of redirector
   # and add back to the array.  the foreach loop will go over those
   # and deal appropriately.
-  # bug 3308: redirectors like yahoo only need one '/' ...
-  if ($rest =~ m{(https?:/{0,2}.+)$}i) {
-    push(@uris, $1);
-  }
 
-  # resort to redirector pattern matching if the generic https? check
-  # doesn't result in a match -- bug 4176
-  else {
-    foreach (@{$redirector_patterns}) {
-      if ("$proto$host$rest" =~ $_) {
-        next unless defined $1;
-        dbg("uri: parsed uri pattern: $_");
-        dbg("uri: parsed uri found: $1 in redirector: $proto$host$rest");
-        push (@uris, $1);
-        last;
-      }
-    }
+  # try redirector pattern matching first
+  # (but see also bug 4176)
+  my $found_redirector_match;
+  foreach my $re (@{$redirector_patterns}) {
+    if ("$proto$host$rest" =~ $re) {
+      next unless defined $1;
+      dbg("uri: parsed uri pattern: $re");
+      dbg("uri: parsed uri found: $1 in redirector: $proto$host$rest");
+      push (@uris, $1);
+      $found_redirector_match = 1;
+      last;
+    }
   }
+  if (!$found_redirector_match) {
+    # try generic https? check if redirector pattern matching failed
+    # bug 3308: redirectors like yahoo only need one '/' ...
+    if ($rest =~ m{(https?:/{0,2}.+)$}i) {
+      push(@uris, $1);
+      dbg("uri: parsed uri found: $1 in hard-coded redirector");
+    }
+  }
 
 
   ## TVD: known issue, if host has multiple combinations of the following,


Re: Google redirects

2015-12-18 Thread John Hardin

On Fri, 18 Dec 2015, Mark Martinec wrote:


On 2015-12-18 16:29, Axb wrote:

 On 12/18/2015 04:17 PM, Mark Martinec wrote:
>  On 2015-12-17 22:41, Axb wrote:
> >  could you make a version using redirector_pattern so the redirected
> >  target can be looked up via URIBL plugin?
>
>  Isn't this already the case? Redirect targets are added
>  to a list of URIs and are subject to same rules as
>  directly collected URIs.

 I suggested converting the rawbody rule John was working on into a
 redirector_pattern


Note that the following rule as posted by John:

 uri __GOOG_MALWARE_DNLD 
m;^https?://[^/]*\.google\.com/[^?]*url\?.*[\?&]download=1;i


would not currently work as a redirector_pattern due to the problem
I posted in my reply earlier today (Re: redirector_pattern question);
i.e. where the redirector target contains "http:", followed
by other URI arguments (like "=1" here).


Right, and I would take that into account when composing the 
redirector_pattern. That extra bit is there to avoid treating *all* google 
redirects as malware downloads.


Question: has anyone ever seen a *legit* (non-spam, 
non-phishing, non-malware) google redirect like that in an email? Maybe 
this rule is too restrictive and we should be suspicious of *all* google 
redirects?




Re: Google redirects

2015-12-18 Thread Joe Quinn

On 12/18/2015 11:32 AM, John Hardin wrote:

On Fri, 18 Dec 2015, Mark Martinec wrote:


On 2015-12-18 16:29, Axb wrote:

 On 12/18/2015 04:17 PM, Mark Martinec wrote:
>  On 2015-12-17 22:41, Axb wrote:
> >  could you make a version using redirector_pattern so the redirected
> >  target can be looked up via URIBL plugin?
>
>  Isn't this already the case? Redirect targets are added
>  to a list of URIs and are subject to same rules as
>  directly collected URIs.

 I suggested converting the rawbody rule John was working on into a
 redirector_pattern


Note that the following rule as posted by John:

 uri __GOOG_MALWARE_DNLD 
m;^https?://[^/]*\.google\.com/[^?]*url\?.*[\?&]download=1;i


would not currently work as a redirector_pattern due to the problem
I posted in my reply earlier today (Re: redirector_pattern question);
i.e. where the redirector target contains "http:", followed
by other URI arguments (like "=1" here).


Right, and I would take that into account when composing the 
redirector_pattern. That extra bit is there to avoid treating *all* 
google redirects as malware downloads.


Question: has anyone ever seen a *legit* (non-spam, non-phishing, 
non-malware) google redirect like that in an email? Maybe this rule is 
too restrictive and we should be suspicious of *all* google redirects?


I do it occasionally, if I am sending a link to someone and I 
right-click -> "copy link location" on the search results. I'd be 
suspicious of those sorts of links, but not too suspicious.


Re: Google redirects

2015-12-18 Thread John Hardin

On Fri, 18 Dec 2015, Joe Quinn wrote:


On 12/18/2015 11:32 AM, John Hardin wrote:
> 
>  uri __GOOG_MALWARE_DNLD 
>  m;^https?://[^/]*\.google\.com/[^?]*url\?.*[\?&]download=1;i


 Question: has anyone ever seen a *legit* (non-spam, non-phishing,
 non-malware) google redirect like that in an email? Maybe this rule is too
 restrictive and we should be suspicious of *all* google redirects?

I do it occasionally, if I am sending a link to someone and I right-click -> 
"copy link location" on the search results. I'd be suspicious of those sorts 
of links, but not too suspicious.


It's already there as a subrule for masscheck eval and use in metas:

http://ruleqa.spamassassin.org/20151218-r1720729-n/__GOOG_REDIR/detail

SPAM%   HAM%    S/O
0.3357  0.0288  0.921

~12% of spam hits are at <5 points.
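
(For reference, S/O here is the spam share of all hits, assuming the usual
masscheck definition, which matches the numbers above:)

  # spam% / (spam% + ham%)  =>  S/O
  perl -e 'printf "S/O = %.3f\n", 0.3357 / (0.3357 + 0.0288)'   # prints S/O = 0.921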

It's meta'd for score in a couple of rules:

http://ruleqa.spamassassin.org/20151218-r1720729-n/GOOG_REDIR_SHORT/detail

http://ruleqa.spamassassin.org/20151218-r1720729-n/GOOG_REDIR_NORDNS/detail

...and those are hitting the bulk of the spams but they are not hitting 
the low-scoring spams.




Re: Google redirects

2015-12-18 Thread John Hardin

On Fri, 18 Dec 2015, Axb wrote:


On 12/18/2015 04:17 PM, Mark Martinec wrote:

 On 2015-12-17 22:41, Axb wrote:
>  could you make a version using redirector_pattern so the redirected
>  target can be looked up via URIBL plugin?

 Isn't this already the case? Redirect targets are added
 to a list of URIs and are subject to same rules as
 directly collected URIs.


I suggested converting the rawbody rule John was working on into a 
redirector_pattern


It doesn't appear to be needed. With your changes to the existing google 
redirector patterns I now get this in my debug log for Alex's original 
message:


ran uri rule __ALL_URI ==> got hit:
"http://www.mediafire.com/download/izdq{snip}"



Re: Google redirects

2015-12-18 Thread John Hardin

On Thu, 17 Dec 2015, Axb wrote:


On 12/17/2015 10:38 PM, John Hardin wrote:

 On Thu, 17 Dec 2015, Axb wrote:

>  On 12/17/2015 09:15 PM, John Hardin wrote:
> >  On Thu, 17 Dec 2015, Alex wrote:
> >
> > >  Hi,
> > >
> > >  Can someone explain why spamassassin is allowing apparent google
> > >  redirects? Cryptolocker :-( This one's blocked now.
> > >
> > >  href="https://www.google.com/url?q=http://www.mediafire.com/download/{snip}"
> > >  style="color: rgb(89, 143, 222);
> > >  outline: 0px;" target="_blank">1Z4566W50378875...
> > >  #
> > >  href="https://www.google.com/url?q=http://www.mediafire.com/download/izdqjzml6
> > >
> > >  rawbody  GOOG_VIEW1  m;https?://www\.google\.com/url\?(q=http(s)?|sa=t\\;url=http);
> > >  describe GOOG_VIEW1  Using google url
> > >  score    GOOG_VIEW1  6.0
> > >
> > >  Ideas for improving the rule or making it more flexible would be
> > >  appreciated.
> >
> >  There are google rules. I'll take a look at why this wasn't scored when
> >  I get a chance later today or tomorrow.
>
>  there's a bunch of Henry Stern's google redirector_pattern rules but
>  they're all made for http only.
>  Adding and committing s? now

 And this in my sandbox, with a different pattern:

 uri __GOOG_MALWARE_DNLD
 m;^https?://[^/]*\.google\.com/[^?]*url\?.*[\?&]download=1;i

 I will broaden that a bit.


could you make a version using redirector_pattern so the redirected target 
can be looked up via URIBL plugin?


Sadly, there's nothing in the corpus that matches that rule. I think it'll 
be published, but with a low score.


http://ruleqa.spamassassin.org/20151218-r1720729-n/GOOG_MALWARE_DNLD/detail

Alex, if you still have some of those around, send them to me as RFC822 
attachments and I'll add them to my masscheck spam corpora.




Re: Google redirects

2015-12-18 Thread Alex
Hi,

>>>  I suggested converting the rawbody rule John was working on into a
>>>  redirector_pattern
>>
>>
>> Note that the following rule as posted by John:
>>
>>  uri __GOOG_MALWARE_DNLD
>> m;^https?://[^/]*\.google\.com/[^?]*url\?.*[\?&]download=1;i
>>
>> would not currently work as a redirector_pattern due to the problem
>> I posted in my reply earlier today (Re: redirector_pattern question);
>> i.e. where the redirector target contains "http:", followed
>> by other URI arguments (like "=1" here).
>
> Right, and I would take that into account when composing the
> redirector_pattern. That extra bit is there to avoid treating *all* google
> redirects as malware downloads.
>
> Question: has anyone ever seen a *legit* (non-spam, non-phishing,
> non-malware) google redirect like that in an email? Maybe this rule is too
> restrictive and we should be suspicious of *all* google redirects?

I've forwarded you a few samples.

I'm not entirely sure I've kept up with the pieces of this. Has a rule
yet been developed? Are both a rule and Mark's patch required? After
the patch was posted, there was a comment about the redirector_pattern
not being necessary...

Thanks,
Alex


Re: redirector_pattern question

2015-12-18 Thread Paul Stead



On 18/12/15 14:23, Mark Martinec wrote:


The parameter of redirector_pattern is a regular expression,
dots and a question mark have a special meaning in a regexp.


Oops - I had escaped characters and anchors originally; in my numerous
testing iterations I guess they got lost.


... and please open a bug report in bugzilla.


Mark

Patch looks to work. Done - thanks!
--
Paul Stead
Systems Engineer
Zen Internet


Re: Customized header (add_header) doesn't work

2015-12-18 Thread listsb-spamassassin
On Dec 17, 2015, at 13.16, Alfredo Saldanha  wrote:
> 
> My second SA is a Zimbra server.
> I use Zimbra SA only to drop the message in junk folder.
> I don't want to clean at the Zimbra server, it is default behavior.

for what it's worth, if you were to use amavis rather than a milter, you could 
just deliver mail directly from amavis to the zimbra mailbox server [bypassing 
the zimbra mta, which you don't need for this] via lmtp [typically port 7025], 
and it would just work.  no more trying to trick one instance of spamassassin 
with another instance of spamassassin.  additionally, depending on how you're 
using your existing postfix server, you likely don't even need the zimbra mta 
at all.
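
for example, the relevant amavisd.conf bits would be something like this (the
hostname is a placeholder and this is an untested sketch):

  # hand scanned mail straight to the zimbra mailbox server over lmtp
  $forward_method     = 'lmtp:[zimbra-store.example.com]:7025';
  $final_spam_destiny = D_PASS;   # pass tagged spam through so zimbra can file it into Junk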

-ben

Re: Google redirects

2015-12-18 Thread John Hardin

On Fri, 18 Dec 2015, Alex wrote:


 I suggested converting the rawbody rule John was working on into a
 redirector_pattern


Note that the following rule as posted by John:

 uri __GOOG_MALWARE_DNLD
m;^https?://[^/]*\.google\.com/[^?]*url\?.*[\?&]download=1;i

would not currently work as a redirector_pattern due to the problem
I posted in my reply earlier today (Re: redirector_pattern question);
i.e. where the redirector target contains "http:", followed
by other URI arguments (like "=1" here).


Right, and I would take that into account when composing the
redirector_pattern. That extra bit is there to avoid treating *all* google
redirects as malware downloads.

Question: has anyone ever seen a *legit* (non-spam, non-phishing,
non-malware) google redirect like that in an email? Maybe this rule is too
restrictive and we should be suspicious of *all* google redirects?


I've forwarded you a few samples.


Thanks.


I'm not entirely sure I've kept up with the pieces of this. Has a rule
yet been developed?


I've relaxed my google malware redirect rule (above) to match your sample. 
It will go out the next time rules pass masscheck. The corpus looks 
well-fed today so that *should* occur overnight.



Are both a rule and Mark's patch required?


I re-ran a test against your original sample after the other Alex edited 
the existing google redirect patterns to also match https but before the 
pattern order patch was committed and it did pull out the malware download 
URL, so that should allow URIBL to see the download hostname (again, 
pending rules being published from masscheck) and I don't think the patch 
matters in this case.


However, that only helps if the download is being hosted by a site that 
hits URIBL et al. (or some other rule) and I don't think 
www.mediafire.com will be listed, so yes, a scored rule that matches that 
pattern is necessary in addition to the patch.


As soon as masscheck publishes an update, that redirect will get at least 
one point; possibly more after your spamples are in the corpus and that 
rule starts getting some fresh spam hits.


After the patch was posted, there was a comment about the 
redirector_pattern not being necessary...


Yeah, the existing google redirect pattern for "url=" did work when it was 
broadened to include https, so my rule doesn't need to be used as the 
basis for another new redirect pattern.
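
For reference, an https-inclusive google redirector pattern along those lines
would look roughly like this (a sketch in the spirit of the discussion, not
the exact committed stock pattern), with the captured group being the embedded
target URI that then gets fed to the URIBL lookups:

  redirector_pattern  m{^https?://www\.google\.com/url\?.*?[?&](?:q|url)=([^&#]+)}i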

