Re: Why single periods in regex in spamassassin rules?

2021-04-25 Thread Joe Quinn

On 4/23/21 2:52 PM, David B Funk wrote:

On Fri, 23 Apr 2021, Steve Dondley wrote:


I'm looking at KAM.cf. There is this rule:

body    __KAM_WEB2  /INDIA based IT|indian.based.website|certified.it.company/i


I'm wondering if there is a good reason why a single period is used 
instead of something like \s+, which would catch multiple spaces 
whereas a single period doesn't.


Because /indian.based.website/ will match 'indian-based_website' but 
\s will not.



This is the real reason (or at least, it was for all of my contributions 
to KAM.cf). I was also concerned about tricks like characters that are 
visibly a space but have all the technical characteristics of 
non-whitespace. Using "." was easier than knowing everything about 
Unicode codepoints.
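
For anyone curious about the practical difference, here is a minimal Perl
sketch (the sample strings are made up for illustration) showing why "."
is the more forgiving separator:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical sample strings, for illustration only.
    my @samples = ('indian based website', 'indian-based_website');

    for my $s (@samples) {
        # "." matches any single character except newline, so spaces,
        # dashes and underscores all count as separators.
        my $dot = $s =~ /indian.based.website/i ? 'hit' : 'miss';

        # \s+ only matches whitespace, so 'indian-based_website' slips past.
        my $ws = $s =~ /indian\s+based\s+website/i ? 'hit' : 'miss';

        printf "%-25s  '.': %-4s  '\\s+': %s\n", $s, $dot, $ws;
    }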




Re: EX_IOERR

2017-05-29 Thread Joe Quinn

On 5/28/2017 10:59 AM, Cecil Westerhof wrote:

On Sunday 28 May 2017 14:50 CEST, Joe Quinn wrote:


On 5/28/2017 2:11 AM, Cecil Westerhof wrote:

When executing:
spamc -L spam

It looks like EX_IOERR simply refers to the fact that some process
exited with status 74. Restart spamd with the -D option so you get
debugging output, and it should be easier to narrow it down to a
specific cause.

That gave me:
 spamd: service unavailable: TELL commands are not enabled, set the 
--allow-tell switch.

I added --allow-tell in spamassassin.service and it works.

Thanks.


Yay!



Re: EX_IOERR

2017-05-28 Thread Joe Quinn

On 5/28/2017 2:11 AM, Cecil Westerhof wrote:

When executing:
 spamc -L spam

It looks like EX_IOERR simply refers to the fact that some process 
exited with status 74. Restart spamd with the -D option so you get 
debugging output, and it should be easier to narrow it down to a 
specific cause.
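
For reference, a debugging session along those lines might look something
like this (the service name and file name are placeholders and will vary
by system):

    # Stop the running daemon first (systemd example).
    systemctl stop spamassassin

    # Run spamd in the foreground with debug output; --allow-tell is only
    # needed if clients will use spamc -L / -C (learn/report) as above.
    spamd -D --allow-tell

    # In another terminal, repeat the failing command and watch the output:
    spamc -L spam < message.eml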




Re: Strange audio spam

2017-05-08 Thread Joe Quinn

On 5/5/2017 8:53 PM, do...@mail.com wrote:

I received this very unusual email a few days ago. It (or another
email), timed out my spamassassin check (which is a first).

I'm including the full text of the spam below along with all of the
headers.

I'm interested if this mail is legit, or if it's just a new trap.
I have skipped through parts of the audio (play as user nobody :)  and
there is no voice, or discernible instrument; just a bunch of tones and
really bad synthetic sounding drums.

I don't even have an idea why someone would listen to this...

I can send you the whole mp3, but I've opted to just send the md5sum for
now since the file is 10MiB. The md5 sum is
3fec277311e73175c6f49b70d8a063e8 .

The email also contains an html part (identical to the text part in
content), and 8 images; 1 jpeg and 7 png. These include a facebook and
twitter buttons.

Thanks,
David



Return-Path: 
Received: from racolage.xxx ([216.51.232.227]) by mx.mail.com
(mxgmxus005 [74.208.5.20]) with ESMTP (Nemesis) id
0MBmC1-1dGJ253K3r-00AlEr for ; Tue, 02 May 2017
15:42:19 +0200 Received: from [127.0.0.1] (localhost.localdomain
[127.0.0.1]) by racolage.xxx (Postfix) with ESMTP id CEC563060E55
  for ; Tue,  2 May 2017 09:42:16 -0400 (EDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=racolage.xxx;
s=mail; t=1493732537; bh=mjg3vHGJXalwbtWTwqzRztpRTwhvBrVGp+58Vhw6DJM=;
  h=List-Unsubscribe:From:To:Subject:Date:From;
  b=l6O3++WGARbyASNz/FZWqZJB3Ghdyx0pzy7CtiM9O4viBfiayWejyZEi1dXy3lT6t
   FjOmZGb7hzymCJ4TcIcUCBPEkEVUqcb1YRn0YyqQ0Zn/9YYoVqvXZIrFHIlAj5fZWN
   PzyyhGyAeRJaJ18acQAVhtNz79xeH3CPYyyGGjIA=
Content-Type: multipart/mixed;
  boundary="sinikael-?=_1-14937325368410.12218541851445819"
List-Unsubscribe: http://racolage.xxx/unsubscribe.html
Precedence: bulk
Feedback-ID: release1:racolage.xxx
From: racolage.xxx ⛅ ⚡ 
To: do...@mail.com
Subject: AUDIO TRACK #1 | Contact Person - Your Email Address Was
Selected Message-ID: 
X-Mailer: nodemailer (2.7.2; +https://nodemailer.com/;
  SMTP/2.7.2[client:2.12.0])
Date: 05/02/2017(Tue) 09:42
MIME-Version: 1.0
Envelope-To: 
X-GMX-Antispam: 0 (Mail was not recognized as spam); Detail=V3;
X-GMX-Antivirus: 0 (no virus found)
X-UI-Filterresults:





YOU HAVE RECEIVED A TRACK <<
  CHECK THE ATTACHMENT!!!  <<

Contact Person - Your Email Address Was Selected

Underprocecessed ultrasonic glitch bossanova (low bitrate mix specially
for racolage.xxx). CREDIT: written  produced in moscow 2014-2017


YOU HAVE RECEIVED A TRACK <<
  CHECK THE ATTACHMENT!!!  <<

Released by : http://racolage.xxx
facebook : https://www.facebook.com/racolage/
twitter : https://twitter.com/racolagexxx
contact : cont...@racolage.xxx
unsubscribe : http://racolage.xxx/unsubscribe.html


The .xxx TLD was made to separate porn from the general internet, so 
it's unlikely that this is legit.




Re: SpamAssassin score

2017-03-20 Thread Joe Quinn

On 3/20/2017 6:37 AM, Bernard wrote:


Thanks for that information.

After ~1750 messages having been digested, still no improvement:
0.000  0  3  0  non-token data: bayes db version
0.000  0 23  0  non-token data: nspam
0.000  0   1729  0  non-token data: nham
0.000  0 123471  0  non-token data: ntokens
0.000  0 1358530476  0  non-token data: oldest atime
0.000  0 1489938564  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal 
sync atime

0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire 
atime delta
0.000  0  0  0  non-token data: last expire 
reduction count


Have you got an idea of the required order of magnitude of the input 
volume for the bayesian filter to kick in?

---
Bernard

On 20/03/2017 11:15, Reindl Harald wrote:



Am 20.03.2017 um 11:12 schrieb Bernard:

 1. How come the same message being classified either as spam/ham
returns the same score? I would expect a message learnt as 
'spam' to

get a score at least equal to the spam score threshold
 2. Even though the message was correctly learnt as spam before and
after the test, receiving this email message is still not tagged as
spam:

X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on ***
X-Spam-Level: **
X-Spam-Status: No, score=2.1 required=5.0 
tests=MISSING_HEADERS,SPF_FAIL,

SPF_HELO_FAIL autolearn=no autolearn_force=no version=3.4.0

Am I missing something?


yes, train your bayes properly with enough spam *and* ham samples 
and train the bayes which is really in use - do you see any BAYES_ 
tag above? no! so bayes was not used at all


You need to train more than 23 messages as ham first. Read the 
documentation in the SA man pages and on the wiki to make sure you meet 
all the criteria for running Bayes.
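
For reference, a typical manual training pass looks something like this
(the folder paths are placeholders; 200 is the stock bayes_min_spam_num /
bayes_min_ham_num default):

    # Train from manually sorted mailboxes.
    sa-learn --spam ~/Maildir/.Junk/cur
    sa-learn --ham  ~/Maildir/cur

    # Check progress: BAYES_* rules will not fire until nspam and nham
    # each reach the configured minimums (200 by default).
    sa-learn --dump magic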




Re: List of legit mass mailers

2017-03-08 Thread Joe Quinn

On 3/8/2017 9:39 AM, @lbutlr wrote:

On 2017-03-08 (07:23 MST), Ruga  wrote:

This is spamassassin...
We are against mass mailers.

That’s absurd. No one with any sense at all is against mass mailers.


If you measure "mass mailer" by volume of distribution, apache.org 
easily qualifies.




Re: Custom rule not applied when running Postfix + SA

2017-02-20 Thread Joe Quinn

On 2/20/2017 6:54 AM, aquilinux wrote:
Hi all, I noticed that a custom rule I created (in 
/etc/spamassassin/local.cf) is not applied in the 
regular postfix + spamassassin flow, but it is when I pipe the mail to 
spamc or spamassassin.


1) normal flow with postfix

spamassassin unix -   n   n   -   30   pipe
  flags=Rq user=spamd argv=/usr/bin/spamc -u ${recipient} -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}


X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on myserver
X-Spam-Level:
X-Spam-Status: No, score=-1.0 required=5.0 tests=RCVD_IN_DNSWL_LOW
autolearn=ham autolearn_force=no version=3.4.0
X-Spam-Report:
* -1.0 RCVD_IN_DNSWL_LOW RBL: Sender listed at 
http://www.dnswl.org/, low

*  trust
*  [108.59.11.79 listed in list.dnswl.org 
]



2) cat 1487115381.M993470P12484.ne254\,S\=4827\,W\=4936\:2\,S | spamc

X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on myserver
X-Spam-Flag: YES
X-Spam-Level: ***
X-Spam-Status: Yes, score=7.0 required=5.0 tests=MDSPAM,RCVD_IN_DNSWL_LOW
autolearn=no autolearn_force=no version=3.4.0
X-Spam-Report:
* -1.0 RCVD_IN_DNSWL_LOW RBL: Sender listed at 
http://www.dnswl.org/, low

*  trust
*  [108.59.11.79 listed in list.dnswl.org 
]

*  8.0 MDSPAM No description available.

3) spamassassin -t < 
1487115381.M993470P12484.ne254\,S\=4827\,W\=4936\:2\,S


Content analysis details:   (7.0 points, 5.0 required)

 pts rule name  description
 -- 
--
-1.0 RCVD_IN_DNSWL_LOW  RBL: Sender listed at 
http://www.dnswl.org/, low

trust
[108.59.11.79 listed in list.dnswl.org 
]

 8.0 MDSPAM No description available.


What is happening here?

Thanks for helping.
Regards,

--
"Madness, like small fish, runs in hosts, in vast numbers of instances."

Nessuno mi pettina bene come il vento.R


Did you restart spamd after changing the rule? It only reads its 
configuration at startup.
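
A quick way to check that, with the usual caveat that service names vary
by distribution (test-message is a placeholder for the message used above):

    # Confirm local.cf still parses cleanly after the edit.
    spamassassin --lint

    # spamd only reads its configuration at startup.
    systemctl restart spamassassin

    # Then re-feed the same message through spamc and look for the rule.
    spamc < test-message | grep MDSPAM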



Re: Uninitialized values in URIDNSBL

2017-02-08 Thread Joe Quinn

On 2/8/2017 2:58 PM, Kevin A. McGrail wrote:

On February 8, 2017 2:27:56 PM EST, Alex  wrote:

Hi,

On Wed, Feb 8, 2017 at 2:08 PM, Kevin A. McGrail  wrote:

On 2/8/2017 1:22 PM, Philip Prindeville wrote:

While we’re waiting for that, can I just grab Util.pm
 and Plugin/URIDNSBL.pm
 out of trunk, or are there more
dependencies than that to splice the fix back into 3.4.1? 


I wouldn't be able to say. Either a custom patch or running trunk
would be my recommendation. 



You recommend trunk (4.0.0) at this time, not the latest 3.4.2 branch?


I was not aware trunk had been diverted. So 3.4.2 branch.
Regards,
KAM 


To my memory, the two branches are currently largely the same. 3.4.2 is 
technically closer as it's going to be in RC before 4.0.0, but you could 
get away with either.




Re: RFC compliance pedantry (was Re: New type of monstrosity)

2017-02-08 Thread Joe Quinn

On 2/8/2017 1:36 PM, Philip Prindeville wrote:

Having been through the process of authoring 2 RFC’s, perhaps I can shed some 
light on the process for you.

All proposed standards started life as draft RFC’s (this was before the days of 
IDEA’s but after the days of IEN’s).

If it were validated by the working group and passed up to the IAB and they 
concurred (they usually deferred to the WG except on editorial matters), then 
the proposed draft was issued officially as an RFC and given a number.

Later, after it achieved wide enough adoption in the Internet community, an 
existing RFC might be promoted to “standard” from “experimental”, etc.

Occasionally, if a WG (working group) did enough reference implementations and 
proved them at one or more interoperability meetings (the so-called 
“bake-offs”), then the WG could petition for immediate labeling as a “standard” 
when the RFC was approved by the IAB.

It’s even possible for a standard (like RFC-1035) to have both “standard” parts 
(like A RR’s) and “experimental” parts (like MB RR’s).



On Feb 8, 2017, at 7:04 AM, Ruga  wrote:

Read the headers of RFCs; some of them are explicitly labeled as standard. Most 
of them are requests for comments.


On Wed, Feb 8, 2017 at 2:58 PM, Kevin A. McGrail <'kmcgr...@pccc.com'> wrote:

On 2/8/2017 8:52 AM, Ruga wrote:

Not all RFCs are standards.
Educate yourself.

The personal attacks aren't necessary. These RFCs are the basis for
effectively 100% of the email on the planet for decades. If that's not
a standard, what is?


This bears some emphasis, actually. Going from experimental to standard 
comes /after/ the implementations are used in practice and proven to be 
useful. Beyond that, SA is not a standards checker or an RFC checker or 
an IEEE checker. All it does is classify email as wanted or not wanted. 
A large class of wanted email comes with the "undisclosed recipients" 
header. A large class of wanted email comes from domains that lack SPF. 
A smaller class of wanted email comes from the actual manufacturer of 
Viagra. Some mail servers disregard some standards entirely. You just 
have to deal with it.


As Dianne points out, the "undisclosed recipients" To header is valid 
under RFC 822, which has itself been expanded on in multiple subsequent 
RFCs. As multiple other people here have mentioned, the "undisclosed 
recipients" To header is used in wanted email. I am right now two clicks 
away from adding it to this email with my mail client. It is an 
implementation detail of BCC, and unambiguously is not a spam indicator on 
its own.




Re: Custom rule problem

2017-01-31 Thread Joe Quinn

On 1/31/2017 3:22 PM, Zinski, Steve wrote:

Sorry for the trouble, everyone… I had been forwarding the spam through my 
personal IMAP account (to test my rule) which was apparently blocking it. I 
forwarded it using my gmail account and my new rule fired. I feel like an idiot.

Steve

I suggest you work on setting things up so you can break down each part 
individually. Mail flow is not always a simple thing to keep track of, 
even when you have good tools.


Re: List of trusted senders

2017-01-25 Thread Joe Quinn

On 1/25/2017 11:03 AM, Benny Pedersen wrote:

Kevin A. McGrail skrev den 2017-01-25 16:46:

On 1/25/2017 9:10 AM, David Jones wrote:

Could we build a tool like masscheck to help extend these
entries for trusted senders that are known to maintain
proper SPF, DKIM, DMARC with valid opt-out processing?

Off the cuff, this sounds like the concept of more than a few 
whitelist RBLs.


dkim is domain based, spf and dmarc are ip based, so it's not really easy 
to use an ip based rbl :=)

one day, when spamassassin supports dmarc, that would change; I hope it 
will do arc testing, where it imho is simpler than it is today

i have personally not made local rbls that are ip based; for me it's all 
just domains. it's not useful to block an ip and then ask for help when a 
different spamming domain relays from a good ip

dwl / swl on spamhaus is currently empty, and there are still lots of 
dynamic ips missing in their pbl listings, hmm


As a side note, SpamAssassin will not be able to fully implement DMARC. 
Part of a valid implementation involves notifying the authentic sender 
when they are being forged, which involves sending a new email.




Re: Ignore third-party SA headers

2017-01-25 Thread Joe Quinn

On 1/25/2017 10:48 AM, Ruga wrote:

SA runs as follows.

master.cf, last line of section smtp:
>   -o content_filter=spamcheck

spamcheck unix - n n - 10 pipe
   flags=Rq
   user=spamd
   argv=/usr/sbin/spamc 
   --dest=127.0.0.1 --port=783 --filter-retries=3 --filter-retry-sleep=2
   --headers
   --pipe-to /usr/sbin/sendmail 
 -G -i -f ${sender} -- 
${recipient}







spam that already includes SA headers is getting through without 
local SA filtering. Is it possible to tell the local SA to always add 
its own headers, possibly taking note of the existence of former 
SA headers while rewriting them out of the way?


The spam contains the following header, generated by a third-party relay:

X-Spam-Flag: YES
X-Spam-Score: 15.015
X-Spam-Level: ***
X-Spam-Status: Yes, score=15.015 tagged_above=- required=7
tests=[DKIM_SIGNED=-0.1, DKIM_VALID=-0.01, DKIM_VERIFIED=-0.01,
INVALUEMENT_SIP=4, RCVD_IN_BL=0.01, RCVD_IN_MANY_BL=2,
RCVD_IN_SORBS_SPAM=0.5, RCVD_IN_TWO_BL=1, RCVD_IN_UCEPROTECT1=1,
RCVD_IN_UCEPROTECT2=1, RCVD_IN_UCEPROTECT3=1, RCVD_IN_UNSUBSCORE=2,
SUBJ_ALL_CAPS=1.625, TO_NO_BRKTS_NOTLIST=1] autolearn=disabled

Why does SA accept the third-party X-Spam header instead of producing its own?


What is spamcheck?



Re: Ignore third-party SA headers

2017-01-23 Thread Joe Quinn

On 1/23/2017 5:43 PM, Ruga wrote:
spam that already includes SA headers is getting through without local 
SA filtering. Is it possible to tell the local SA to always add its own 
headers, possibly taking note of the existence of former SA headers 
while rewriting them out of the way?


SA never short-circuits from pre-existing headers. Look at where your 
mailflow calls SA (postfix, amavis, mimedefang, etc).
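
If the goal is just to make the local markup unmistakable no matter what an
upstream scanner added, one option is to rename the locally generated headers
in local.cf. This is only a sketch of that idea (the header names here are
arbitrary; the template tags are the standard ones from
Mail::SpamAssassin::Conf), not a substitute for finding out why the message
bypassed the local scan:

    clear_headers
    add_header all  Checker-Version-Local SpamAssassin _VERSION_ (_SUBVERSION_) on _HOSTNAME_
    add_header spam Flag-Local _YESNOCAPS_
    add_header all  Status-Local _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_
    add_header all  Level-Local _STARS(*)_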




Re: Asynchronous plugin skeleton needed

2017-01-19 Thread Joe Quinn

On 1/19/2017 1:48 AM, Pedro David Marco wrote:



>You should be able to use the other asynchronous plugins as a reference
>
>as well.

Thanks... but I cannot find documentation about things like 
"register_async_rule_start()", for example... can anyone point me to 
where it is documented, please?


Thanks!

Pedro.



Look in ./lib/Mail/SpamAssassin/Dns.pm. I would also recommend getting 
comfortable navigating the code in general, because you're going to be 
looking at parts of SA that only developers touch.
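
For a rough idea of the shape, here is a minimal, untested skeleton (the
package and rule names are invented; the real asynchronous machinery lives
in Dns.pm and AsyncLoop.pm, so treat this as a starting point only):

    package MyAsyncCheck;
    use strict;
    use warnings;
    use Mail::SpamAssassin::Plugin;
    our @ISA = qw(Mail::SpamAssassin::Plugin);

    sub new {
        my ($class, $mailsa) = @_;
        my $self = $class->SUPER::new($mailsa);
        bless $self, $class;

        # Lets a .cf file declare:  header MY_ASYNC_CHECK eval:check_local_db()
        $self->register_eval_rule('check_local_db');
        return $self;
    }

    sub check_local_db {
        my ($self, $pms) = @_;
        # Start the slow lookup here and return 0 immediately; record the
        # pending work on $pms and report the hit later from a callback such
        # as check_tick() or check_post_dnsbl(), the way the stock DNS rules
        # do it in Dns.pm / AsyncLoop.pm.
        return 0;
    }

    1;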




Re: Asynchronous plugin skeleton needed

2017-01-18 Thread Joe Quinn

On 1/18/2017 7:08 AM, Kiwi User wrote:

On Wed, 2017-01-18 at 11:36 +, Pedro David Marco wrote:

I would like to write a simple plugin to check some local Databases
(cannot use rbldnsd) that takes long so making it asynchronous seems
the best idea..
If possible, can anyone provide any skeleton, please?


Local databases SQL, i.e. MariaDB, PostgreSQL or Derby should be quite
fast enough without going to the trouble of writing an asynchronous
plugin. If you'd like a copy of the plugin I developed and use with
PostgreSQL, just ask.

My solution uses my PostgreSQL based mail archive, working on the
principle that anybody I've previously sent mail to can be safely
whitelisted, but of course it would work equally well if the database
is a single table containing a list of addresses to be whitelisted, and
it could be easily extended to handle blacklisting as well.

Martin


You should be able to use the other asynchronous plugins as a reference 
as well.




Re: how to enable autolearn?

2017-01-09 Thread Joe Quinn

On 1/9/2017 6:01 PM, Linda Walsh wrote:

John Hardin wrote:

On Mon, 9 Jan 2017, L A Walsh wrote:

I have:
   bayes_auto_learn_threshold_nonspam -5.0
   bayes_auto_learn_threshold_spam 10.0
in my user_prefs. When I get a message though, I see autolearn being 
set to 'no':
  X-Spam-Status: Yes, score=18.7 req=4.8..autolearn=no 
autolearn_force=no

Shouldn't a score of 18.7 trigger an autolearn?


Not all rules contribute to the score used for the autolearn 
decision. Particularly, the BAYES rules don't contribute to the 
autolearning decision in order to avoid positive feedback loops.


   That's why my "bayes_auto_learn" thresholds were fairly high.

So why is it called bayes_auto_learn_threshold if it isn't used for
auto-learning?  Isn't that a bit confusing?


It is used for auto-learning. That threshold just doesn't sum up every 
rule to prevent several classes of "my auto-learning is giving this a 
ridiculous score" problems.
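
For reference, the knobs involved look like this in local.cf or user_prefs
(the values are just the ones quoted above, not recommendations):

    bayes_auto_learn 1
    bayes_auto_learn_threshold_nonspam -5.0
    bayes_auto_learn_threshold_spam 10.0

    # Note: the score compared against these thresholds excludes BAYES_*
    # and certain other rules, so it will usually differ from the score
    # shown in X-Spam-Status.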




Re: Bayes scoring and role accounts

2016-11-21 Thread Joe Quinn

On 11/21/2016 11:27 AM, Karl Denninger wrote:


On 11/21/2016 10:12, Karl Denninger wrote:
I'm using SpamAssassin on a system that uses Postfix for MTA and 
Dovecot for handling final delivery.  Spamassassin is being called 
via Postfix through spamd with:


#
# Spam Assassin bayesian filter updaters
#
sa-spam unix-   n   n   -   -   pipe 
user=spamd:spamd argv=/usr/local/bin/sa-wrapper.pl spam ${sender}
sa-ham  unix-   n   n   -   -   pipe 
user=spamd:spamd argv=/usr/local/bin/sa-wrapper.pl ham ${sender}


I have a material number of role accounts on the box that are all 
aliased to the various places they need to go.  Most of these do not 
have entries in /etc/passwd, that is, they're not real login accounts.


The issue is that if I am reading the code correctly my particular 
Bayes database (for "karl") is not being consulted, and can't be, for 
anything that comes into a role account since the user side of the 
email address is (obviously) not altered in the message.  As a result 
I have the rulesets, but none of the "training" that individual Bayes 
recognition would provide, nor is there any way for that training to 
take place since none of these accounts are "real".


sa-learn --dump magic -u karl shows the expected (large) number of 
tokens in the database, but the same command targeting any of the 
role account names shows nearly nothing (which isn't surprising since 
they're role accounts and not real user logins.)


How have people dealt with this -- or do they?


To add to this the way the bayes database gets built (other than via 
auto-add) is from anything that a user sticks in the "Junk" folder.  
There is a cron job that runs every hour that runs sa-learn against 
that and then moves anything it finds in there to a "Junk-Saved" 
folder, expiring anything older than 14 days from that folder (so spam 
emails are held for 2 weeks.)  Dovecot is configured to deliver 
confirmed spam to the "Junk" folder as well.


Is the best way to handle role accounts to (1) create a "dummy" user 
account for them and (2) have the script that runs sa-learn add spam 
to not only the target's account but also, if the target is a role 
account, to each of the role account's database entries as well?  
That's a somewhat-messy maintenance job if/when role accounts are 
added/removed/changed, but it appears to be the only way to accomplish 
the goal.


--
Karl Denninger
k...@denninger.net 
/The Market Ticker/
/[S/MIME encrypted email preferred]/


I can't speak for specifically making it work with Postfix, but you 
usually want a site-wide Bayes database. No matter what (real or fake) 
user is receiving the message, it would get trained as the spamd user, 
or whatever ends up running SA. That same user runs SA and reads the 
same database, which gets training from everyone and classifies 
based on a much more statistically useful volume of data.
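
As a sketch of what that usually means in local.cf (the path and mode here
are assumptions; point it at a directory writable by whatever user actually
runs spamd):

    # One shared Bayes database for the whole site instead of one per user.
    bayes_path      /var/lib/spamassassin/bayes/bayes
    bayes_file_mode 0775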




Re: uceprotect issue

2016-11-05 Thread Joe Quinn

On 11/4/2016 11:03 AM, Dianne Skoll wrote:

On Fri, 4 Nov 2016 12:23:16 +0100
Holger Schramm  wrote:


If you don't like them, don't use their services. It is really that
easy.

It's not that easy.  If you provide email services to a large number
of people and someone they are trying to correspond with uses UCEPROTECT,
you are basically at the mercy of UCEPROTECT.  There's no accountability,
and your customers are not going to be interested in any sort of
discussion; they'll just want their damned emails to go through NOW.

Shady blocklists can cause all sorts of headaches as people with inadequate
spam filtering desperately use any and all blocklists available, regardless
of the collateral damage.


I trust _none_ of them. Do you know the people of any other blacklist?
Who assures you that there is not a crazy monkey in the background
doing some strange stuff with the listings? Nobody.

You are right.  I don't trust any blocklist.  But some of the bigger ones
such as SpamHaus seem to operate on a more professional and responsible
level than some of the crazier ones.

Regards,

Dianne.
I always look at the process, and part of it involves removing barriers 
to keeping the list accurate. Charging to be delisted faster means that 
if most people pay for delisting, it's a money-making scheme, and if most 
people don't pay, it means they are not prompt to delist false positives. 
Either way, the poor process makes their list suspect. As long as other 
lists like SpamHaus follow a process that operates for the good of the 
list, I'll happily use them.


https://en.wikipedia.org/wiki/The_Spamhaus_Project#Conflicts
And this is the benefit you get as an organization. You can get Google 
to budge, you can survive lawsuits and recover legal costs, and people 
who attack your network get arrested.


Re: uceprotect issue

2016-11-02 Thread Joe Quinn

On 11/2/2016 2:46 PM, Marc Stürmer wrote:

 Zitat von Marco :

Sorry, I know this is not uceprotect list, but I don't know how to 
contact uceprotect, their contact form is unavailable.


It seems the problem starts on 30 october. Did you have noticed too 
something about?


UCE Protect has a very questionable reputation, foremost reason is 
that they do charge money for delisting entries.


And no one knows who's behind them, since they do not publish this 
kind of information. They want to stay anonymous; that's why there is 
no easy way to contact them via their home page.


So you should really ask yourself: why do you trust them?

I have to agree. Their inscrutability extends deep into the 
public-facing parts of their infrastructure. Their MX doesn't have any 
registrant information in their whois, and their DNS provider doesn't 
even have a website. Their own domain uses a whois privacy service, and 
that service's website is a single page for submitting email 
non-delivery reports /to UCE Protect/. It's ridiculous.




Re: How to create a URIBL

2016-10-18 Thread Joe Quinn

On 10/18/2016 6:21 PM, Alex wrote:

Hi,

I've collected a bunch of URIs that I'd like to incorporate into my
rulebase. I know how to create a DNSBL, but I don't specifically know
how to create a URIBL. Can I use rbldnsd for this? Or would I have to
extract the IP or hostname from the URL, then also use a bunch of uri
rules? If so, is there a way of automating this, given a list of URIs?

For example, I have URIs like:

http://109.73.134.241/dgq01px
http://51steel1.org/s4b5ztgcx
http://amessofblues1.com/m0dqfx

I'm also then not sure which of uri* rule definition should be used.
I've used urirhsbl before for a local host blocklist, but now after
reading the man page again for the first time in a while, I'm not even
sure that's correct.

I'm also unclear about rbldnsd config for dnset, where hostnames would
be used. Here is my current command-line:

/usr/sbin/rbldnsd -n -srbldnsd.stats -r/var/lib/rbldnsd -f -n -b
66.123.123.106/53 uri.example.com:dnset:urilist

My urilist file looks like this:

:127.0.0.2:Blocked System: http://example.com/bl?$
$NS 1w uri.example.com
$SOA 1w uri.example.com admin.uri.example.com 0 2h 2h 1w 1h
@ A 66.123.123.106
@ MX 10 uri.example.com
@ TXT "example hostname blocklist"
25z5g623wpqpdwis.onion1.to:127.0.0.2:Blocked System, Last-Attack: 1476825181
27lelchgcvs2wpm7.3lhjyx1.top:127.0.0.2:Blocked System, Last-Attack: 1476825181
27lelchgcvs2wpm7.7jiff71.top:127.0.0.2:Blocked System, Last-Attack: 1476825181

Using the following (and variations, including dig +short) fail with NXDOMAIN
# host 25z5g623wpqpdwis.onion1.to.uri.example.com 66.123.123.106

Can someone show me an example zone file using the dnset option?

I'm guessing my first attempt at this message being received by the
list was due to the domain samples I've included, so they've been
modified.

Any ideas greatly appreciated.
Thanks,
Alex


rbldnsd is still suitable for this, as the DNS lookups are fundamentally 
just mapping strings to IPs. Getting too deep into it is outside SA's 
scope, but the only real difference between an IP rbl and a domain rbl 
is that IP rbls tend to reverse the IP so the most significant octet is 
the most significant subdomain.


On the rules side of things, there are multiple ways to write uri 
rules that match against a DNS lookup. Some of them look for 
NXDOMAIN vs. anything else, some of them can look for particular IPs, 
etc. Just look for the existing RBL that's most similar to what you are 
looking to create.
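
As a concrete sketch, a local domain-based list served by rbldnsd can be
wired into SA roughly like this (the zone and rule names are placeholders,
and 127.0.0.2 assumes the return code used in the zone file above):

    urirhssub  URIBL_LOCAL  uri.example.com.  A  127.0.0.2
    body       URIBL_LOCAL  eval:check_uridnsbl('URIBL_LOCAL')
    describe   URIBL_LOCAL  Contains a domain on the local URI blocklist
    tflags     URIBL_LOCAL  net
    score      URIBL_LOCAL  2.0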




Re: Persistent phishing attacks with word/pdf macros

2016-10-04 Thread Joe Quinn

On 10/4/2016 12:37 PM, Alex wrote:

Hi Joe, do you recall more specifically the subject or location of
this conversation regarding using perl and mimedefang to deal with
word macros?

I recall something from Feb 2015, but I don't know how to parlay that
into something usable with amavis and perl...



Keep replies on list.

Having trouble finding it again, but I recognize the code in this:
http://lists.roaringpenguin.com/pipermail/mimedefang/2016-February/037750.html

It only detects macros in the old format, but .docm and similar are 
renamed zip files and all the macros are confined to one file that you 
can look for in the index.


Re: Persistent phishing attacks with word/pdf macros

2016-10-04 Thread Joe Quinn

On 10/3/2016 4:30 PM, John Hardin wrote:

On Mon, 3 Oct 2016, Axb wrote:


On 10/03/2016 09:03 PM, John Hardin wrote:

 On Mon, 3 Oct 2016, Axb wrote:

> On 10/03/2016 07:46 PM, Alex wrote:
> > Hi,
> >
> > These are a real concern. If you receive any kind of real mail volume,
> > you're receiving these too, and they're not always being caught by
> > RBLs or virus scanners. Or even our well-trained bayes.
> >
> > http://pastebin.com/YhLBqpKm
> >
> > I used to have some rules that would reliably block them, but they're
> > not performing well now at all.
> >
> > I'm posting this in hopes someone has some other ideas, as well as to
> > raise awareness about their existence.
> >
> > Ideas greatly appreciated.
>
> SA isn't the right tool to detect virus infected attachments

 Agreed, but *phishing* PDFs are appropriate to detect, as are 419 scam
 PDFs (which I am starting to see).


John,

That sample has an attached bulk_inquiry_317141.doc
not a PDF.


Yeah. I was (too) quickly responding to "phishing" and "PDF" in the 
subject line, and bayes not catching them.


ClamAV is probably the correct approach to macro-based malware, unless 
we want to do a MS Office document plugin with something like an eval 
for has_macros().


I haven't looked at the spample doc in detail, but I will (again) plug 
my email sanitizer, which does document macro scanning and might be 
able to catch these:


   http://www.impsec.org/email-tools/procmail-security.html

Some of the approaches there could probably be usefully extracted to 
SA plugins.



There's been discussion on the MIMEDefang list about dealing with word 
macros, and some people have posted good perl snippets as well that you 
can add to your filters if you use it. If you just want to detect the 
presence of macros in any form, writing that in ClamAV's signature 
system would probably be doable, but far more annoying than just a bit 
of code.


Re: Greymail and marketing junk

2016-09-30 Thread Joe Quinn

On 9/30/2016 5:35 AM, Robert Schetterer wrote:

Am 30.09.2016 um 02:28 schrieb Alex:

Hi all,

Has anyone given any thought to special rules or methods designed to
catch greymail? That is, mail that perhaps may be opt-in, but abusive,
like marketing mailing lists or newsletters?

This might include mail with List-Unsubscribe headers, but that's not
necessarily enough to use to block an email.

I've written a handful of rules based on Received headers for mail
servers like 'businesswatchnetwork.com' or 'list-manage.net' etc, but
there's obviously just too many of them and it's time-consuming.

Any ideas for improving this process?

Any thoughts on how the typical marketing email should be scored with bayes?

Perhaps there's a DNSBL or other RBL out there whose purpose is to
identify marketing domains?

Is anyone interested in sharing resources to start such a thing?


from the technical side there is not really a difference
between marketing and other mails; dealing with their domains
might never end, but you can always use your own reject list at the SMTP
level, because why run expensive content filtering for domains you
already know you don't want.

In the end, on a server with many different users you will see that some
marketing is really wanted by some users while others do not want to see
it, so the best approach is users' own white/blacklists after you have
filtered the worst things globally.


Best Regards
MfG Robert Schetterer

I think Alex is envisioning something more like DCC in severity, where 
it's not actually a blacklist but uses the RBL functionality to apply a 
light score penalty.


Re: HTTPS_HTTP_MISMATCH and explanation

2016-09-26 Thread Joe Quinn

On 9/26/2016 8:54 AM, RW wrote:

Informational rules do that, but IIRC __RULES are simply a special
case.

Hmm, you're probably right on that point. I can't find anything in the 
source that behaves that way, but the documentation claims that's how it 
works and I also don't see anything to support being scored 0.001 either.


Re: HTTPS_HTTP_MISMATCH and explanation

2016-09-26 Thread Joe Quinn

On 9/25/2016 9:25 PM, Sean Greenslade wrote:

On Sun, Sep 25, 2016 at 07:57:37PM -0400, Alex wrote:

I think the rule still has a use, perhaps in a meta or something.

I believe (though don't quote me on this) that a zero-weight rule will
still be checked if it's used as part of a metarule.

--Sean
A rule that's weighted exactly zero will never fire. The way __RULES get 
around this is by being scored 0.001.
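
Regardless of how a zero-scored rule behaves, the unambiguous way to keep a
test available for metas is the __ subrule pattern; a generic sketch with
invented rule names:

    # __ subrules carry no score of their own but are evaluated whenever
    # a meta rule references them.
    body     __LOCAL_EXAMPLE      /free money/i
    meta     LOCAL_EXAMPLE_COMBO  __LOCAL_EXAMPLE && HTML_MESSAGE
    describe LOCAL_EXAMPLE_COMBO  Unscored subrule combined with a stock rule
    score    LOCAL_EXAMPLE_COMBO  1.0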


Re: How to reject mails with special message-id (Debian, Amavis, Spamassassin)

2016-09-20 Thread Joe Quinn

On 9/20/2016 9:46 AM, Thomas Barth wrote:



Am 20.09.2016 um 15:27 schrieb Bowie Bailey:


X-Spam-Status: Yes, score=14.009 tag=2 tag2=6.31 kill=6.31
tests=[HTML_MESSAGE=0.001, MESSAGEID_LOCAL=8,
MIME_HTML_ONLY=1.105,
PYZOR_CHECK=1.985, RCVD_IN_BRBL_LASTEXT=1.644, RDNS_NONE=1.274]
autolearn=no autolearn_force=no


The base SA ruleset is optimized to detect spam with a score of 5.0.  If
you raise that score, you will allow more spam to come through. If you
lower that score, you will see more legitimate messages blocked as
spam.  Make sure you know what you are doing before you change this 
score.




I read that 5.0 is aggressive and suitable for single user setup, 
conservative values are 8.0 or 11.0.


required_score n.nn (default: 5)
https://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html 




I've checked most of the mails recognized as spam. The lowest score 
was 8.6x so far.


Here is another mail from ...local. It definitely was spam with zip 
attachment. Common is a sender address with digits.
 -> , quarantine: 
l/spam-lEHVGcheLkyq.gz, Message-ID: 
<20160920202635.6b90ec7...@allfromboats.com.local>, mail_id: 
lEHVGcheLkyq, Hits: 19.118


Maybe I should also block sender addresses with more than 2 digits in 
the name?

My experience has been that spam scoring gets error-dominated pretty 
rapidly outside the range near 5.0. That is to say, the difference in 
actual spamminess between messages scored 4 and 6 is far more 
predictable and significant than between -1 and 1, or 10 and 12. Even a 
score of 8.0 I would expect to take months of tuning to get right, 
between rescoring rules and RBLs appropriately and then giving the bayes 
thresholds accurate scores on top of that. The furthest I would probably 
go is 4.5 to 6.0. Outside that range, it's easy to run into 
unpredictable "why was this spam blocked and that spam wasn't" scenarios.


Many of the stock published rules are scored by AI, which runs an 
optimization problem to get the most spam on the right side of 5.0 and 
the most ham on the left side. For the purposes of solving that problem, 
the difference between a message scoring 4.8 and 4.9 is the same as the 
difference between 4.0 and 4.9, or -50 and 4.9. Developers smooth out 
the scoring curve by determining what rules the AI gets to score and for 
how much, but that effect is strongest where we can quantify its 
usefulness (near the default threshold).


Bayes is scored with a similar consideration, built around probability.


Re: X-Spam Tagging - Spam Status YESNO Flags - Sometimes not appended...

2016-09-16 Thread Joe Quinn

On 9/16/2016 12:59 PM, li...@rhsoft.net wrote:

...

in case you have postscreen or something else which does proper 
rbl-scoring in front of the content-scanners it's no problem, because 
only a small part of spam attempts are making it to SA

this may depend on the amount of ham, which can also be mitigated by 
shortcircuiting trustable senders with a large amount of mail

i have seen in the past a lot of junk with some 5-10 MB of crap attached, 
completely unrelated images, because spammers know that they can bypass 
many spamfilters that way (in case of a large binary it's also no 
problem for cpu resources, only when it has a wrong text mimetype)

Another strategy sometimes is to truncate the message to that max size 
before scanning, though making sure you get the most meaningful content 
of a message without breaking the MIME format is in general not an easy 
problem.


Re: Tuning recommendations?

2016-09-13 Thread Joe Quinn

On 9/13/2016 1:55 AM, John Hardin wrote:

On Mon, 12 Sep 2016, thomas cameron wrote:


Keep the tips coming, I appreciate learning from you!


Here's another: there's some anecdotal evidence that publishing your 
own SPF record reduces the likelihood you'll be joe-jobbed. I'm not 
sure whether that's still the case, but it did help a few years back.



Well, if the choice is between having an SPF record and not having an 
SPF record, I choose having it every time. ;)


Re: Matching infinite sets

2016-08-22 Thread Joe Quinn

On 8/22/2016 8:54 AM, Michael Orlitzky wrote:

On 08/21/2016 03:22 PM, Damian wrote:

There is no such set B, as it would contain itself.

The empty set contains itself.
That's an easy mistake to make. The empty set is {}, the set that 
contains only the empty set is {{}}. Sets are discrete elements that 
don't get "flattened".


In perl syntactic lists do get flattened though, which leads to some fun 
times. You can do silly things like @concatenated = (@listOne, @listTwo).


Re: Matching infinite sets

2016-08-22 Thread Joe Quinn

On 8/21/2016 5:55 PM, Sidney Markowitz wrote:

Dianne Skoll wrote on 22/08/16 8:56 AM:

And... why can't a set contain itself?


It can't in standard modern set theory (ZFC), through the foundation axioms,
also known as the axiom of regularity
   https://en.wikipedia.org/wiki/Axiom_of_regularity
which is a formulation that allows set theory to avoid Russell's Paradox.
(see also https://en.wikipedia.org/wiki/ZFC)

Just like Euclidean Geometry has the axiom that parallel lines never meet, and
you get various non-euclidean geometries by changing that axiom, there are
non-standard set theories that do not include the axiom of regularity, in
which there can be sets that include themselves.

None of that is relevant to the discussion of Marc Perkel's ideas because he
is talking about sets of tokens from email (or sets of potential tokens?) not
sets that contain sets. And all he needs to do with his infinite sets is be
able to test if a token is in it, which is easy to do since the set is defined
as the complement of a finite set. (I'm not saying this to agree with the
method as good or to argue against it. I'm one of those people he mentions who
understands how Bayesian spam filtering works who has yet to wrap my head
around what he is presenting - For now I'm staying agnostic about it until I
do understand it better).

  Sidney
This is a good summary. As a fun theoretical side-note, ZFC can be 
interpreted as a type theory and then used as a way to reason about the 
behavior of programs. One of its major weaknesses is that it's possible 
to formulate exactly this sort of issue where a set can contain other 
sets of unknown depth. This corresponds to untyped programming languages 
and is almost always resolved by formalizations that correspond to 
adding a type system (as your last paragraph does).


But back to discussing Bayes... ;)


Re: New Install - Tons of Spam Getting Through

2016-08-18 Thread Joe Quinn

On 8/18/2016 2:27 PM, Jerry Malcolm wrote:
I haven't figured out a way to get Thunderbird to allow me to 
copy/paste the headers.  But I did look at all of the headers. There 
are no headers in the email with names like you mentioned. There is 
only the X-Spam-Status header and X-Spam-Flag header that appear to be 
anywhere related to SA.
If you're not seeing a breakdown of the spam test, you should configure 
SA to add it if you can. Run "man Mail::SpamAssassin::Conf" for 
information on how to add that report.


I'm running ISC BIND in my server.  But it only serves my own domains' 
records.  I guess it forwards to my Peer1 host DNS servers to resolve 
anything that is not local.  Is that what you are referring to?  What 
would I do to get around this problem?
Set it to resolve recursively instead of by forwarding. A recursive 
resolver will seek out unknown answers by itself instead of asking an 
upstream resolver that's being shared and rate-limited. There's 
documentation elsewhere that describes how to do this, as it varies by 
what named you are using.
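
Roughly, that means changing the options block in named.conf from forwarding
to recursing; a sketch only, since file locations and ACLs differ per setup:

    options {
        // Resolve queries from the mail server by walking the DNS tree itself.
        recursion yes;
        allow-recursion { 127.0.0.1; };   // placeholder ACL: your mail host(s)

        // Remove or comment out the shared upstream resolvers:
        // forwarders { 10.0.0.1; 10.0.0.2; };
    };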


Re: Unsubscribe

2016-08-18 Thread Joe Quinn

On 8/18/2016 10:57 AM, Benjamin E. Nichols wrote:




 Benjamin  E. Nichols

http://www.squidblacklist.org


1-405-397-1360 

Documentation on how to unsubscribe from the list can be found on 
apache.org or in the notification you received when you first subscribed.


Re: Fwd: Re: New domain blacklist options available.

2016-08-18 Thread Joe Quinn

On 8/18/2016 10:50 AM, Benjamin E. Nichols wrote:


Im sorry. I thought this was an intelligent users list, if I had known 
it was loaded with cry babies and bitches I would never joined.



UNSUBSCRIBE ME.


 Benjamin  E. Nichols

http://www.squidblacklist.org


1-405-397-1360 


-- Original message--

*From: *Joe Quinn

*Date: *Thu, Aug 18, 2016 9:15 AM

*To: *users@spamassassin.apache.org 
<mailto:users@spamassassin.apache.org>;


*Cc: *

*Subject:*Re: Fwd: Re: New domain blacklist options available.


On 8/18/2016 10:03 AM, Benny Pedersen wrote:

no point in spamming freee maillists so ?

 Original Message 
Subject: Re: New domain blacklist options available.
Date: 2016-08-18 15:46
From: "Benjamin E. Nichols"
To: Benny Pedersen

Because we dont work for free bonehead.

To put it more politely, this is the SpamAssassin mailing list. Advertising 
is off topic here. Your original post is not of use to SA users nor does it 
encourage discussion that leads back to SA. Apache policy also is that all 
replies should be on-list and calling other users "bonehead" is against the 
code of conduct.

Keep replies on-list.


Re: Fwd: Re: New domain blacklist options available.

2016-08-18 Thread Joe Quinn

On 8/18/2016 10:03 AM, Benny Pedersen wrote:

no point in spamming freee maillists so ?

 Original Message 
Subject: Re: New domain blacklist options available.
Date: 2016-08-18 15:46
From: "Benjamin E. Nichols" 
To: Benny Pedersen 



Because we dont work for free bonehead.

To put it more politely, this is the SpamAssassin mailing list.

Advertising is off topic here. Your original post is not of use to SA 
users nor does it encourage discussion that leads back to SA. Apache 
policy also is that all replies should be on-list and calling other 
users "bonehead" is against the code of conduct.


Re: google spamming ?

2016-08-15 Thread Joe Quinn

On 8/15/2016 9:21 AM, Benny Pedersen wrote:

On 2016-08-15 15:16, Joe Quinn wrote:


Have you tried asking on either the rspamd or dnswl mailing lists?


why should i waste my time with it ?

i have reported spam to dnswl


If you reported it already, why are you still asking how?
how to report it to dnswl now since i use rspamd and not spamassassin 
any more 




Re: google spamming ?

2016-08-15 Thread Joe Quinn

On 8/15/2016 8:37 AM, Benny Pedersen wrote:

On 2016-08-15 14:21, Joe Quinn wrote:


This is not the mailing list for rspamd or dnswl. How is SA involved
in this issue?


:(

i give up !


Have you tried asking on either the rspamd or dnswl mailing lists?


Re: google spamming ?

2016-08-15 Thread Joe Quinn

On 8/15/2016 8:01 AM, Benny Pedersen wrote:

X-Spamd-Result: default: False [-10.25 / 15.00]
 WHITELIST_DMARC(-7.00)[google.com]
 WHITELIST_SPF_DKIM(-3.00)[google.com]
 SUSPICIOUS_RECIPS(1.50)[]
 CLAMAV_VIRUS_CLEAN(-2.00)[]
 DMARC_POLICY_ALLOW(-0.25)[google.com]
 MIME_GOOD(-0.10)[multipart/alternative, text/plain]
 FORGED_SENDER_MAILLIST(0.00)[]
 R_SPF_ALLOW(-0.20)[ip6:2607:f8b0:4000::/36]
 RCVD_IN_DNSWL_LOW(0.00)[]
 R_DKIM_ALLOW(-0.20)[google.com]
 HTML_SHORT_LINK_IMG_2(1.00)[]
X-Rspamd-Server: 127.0.0.1
X-Rspamd-Scan-Time: 0.22
X-Rspamd-Queue-ID: 654531BE12E

is what rspamd see them as, but its a spam, i can forward it to rule 
maintainer in spamassassin on request


how to report it to dnswl now since i use rspamd and not spamassassin 
any more




This is not the mailing list for rspamd or dnswl. How is SA involved in 
this issue?


Re: Spoofed Domain

2016-08-10 Thread Joe Quinn
DFS wrote some more about this technique (with code!) on the MD mailing 
list, if you search their archives.


On 8/10/2016 9:40 AM, Ruga wrote:

thank you for teasing us...

Sent from ProtonMail Mobile


On Wed, Aug 10, 2016 at 3:36 PM, Larry Starr 
<'lar...@fullcompass.com'> wrote:


That is what I'm doing here.

Rather than attempting that with SA, I wrote a MimeDefang routine to 
interrogate the "Magic" number of any office document, blocking all 
macro enabled documents, and any document that was renamed so that 
the Magic number does not match the extension ( I don't care if these 
are Macro enabled or not, there is no legitimate reason to rename 
them ).


On Wednesday, August 10, 2016 09:31:21 Joe Quinn wrote:

That's a very good warning indeed! Perhaps blocking .doc files with a 
zip-like file structure is in order? I can't think of a legitimate 
reason to use the old extension on the new file format.


On 8/10/2016 9:28 AM, Larry Starr wrote:

On Tuesday, August 09, 2016 18:01:57 Rob McEwen wrote:

> On 8/9/2016 5:56 PM, Anthony Hoppe wrote:

> > Here are the headers as an example:

> > http://pastebin.com/bnU0npLR

> > This particular email has a macro-enabled Word document attached, 
but I


> > don't want to assume this will be the case every time.

> > Any tips/tricks/suggestions would be greatly appreciated!

>

> I think there is a trend now... towards blocking ALL .docm files (if

> not, there should be!). I think it is EXTREMELY rare for normal human

> beings to send Word documents in that particularly dangerous format.

> Most would be send in .doc or .docx format.

>

> I'm not sure if there is already a SA rule for scoring against .docm

> files attachments? Perhaps someone else could help you with that.

Just a short warning, although word will not open a .docm that is 
renamed to .docx, it will open a .docm renamed to .doc.


I found this the hard way!

It is necessary, if you wish to be safe from macro enabled documents 
to verify that the file is what the attachment's extension claims to be.


--

Larry Starr

Software Engineer

Full Compass Systems

9770 Silicon Prairie Pkwy

Madison, WI 53593-8442

P: 608-831-7330 x1347

F: 608-831-6330

E: lar...@fullcompass.com <mailto:lar...@fullcompass.com>




--

Larry Starr

Software Engineer

Full Compass Systems

9770 Silicon Prairie Pkwy

Madison, WI 53593-8442

P: 608-831-7330 x1347

F: 608-831-6330

E: lar...@fullcompass.com





Re: Spoofed Domain

2016-08-10 Thread Joe Quinn
That's a very good warning indeed! Perhaps blocking .doc files with a 
zip-like file structure is in order? I can't think of a legitimate 
reason to use the old extension on the new file format.
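
A rough Perl sketch of that kind of check (the magic values are the standard
ZIP and legacy OLE2 signatures; everything else here is invented for
illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = shift or die "usage: $0 <attachment>\n";

    open my $fh, '<:raw', $file or die "open $file: $!\n";
    read $fh, my $magic, 8;
    close $fh;

    if (substr($magic, 0, 4) eq "PK\x03\x04") {
        # ZIP container: the modern .docx/.docm family. A ".doc" extension
        # on this format is the suspicious combination discussed above.
        print "zip-based OOXML container\n";
    }
    elsif (substr($magic, 0, 8) eq "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1") {
        # OLE2 compound file: the legacy binary .doc format.
        print "legacy OLE2 document\n";
    }
    else {
        print "neither ZIP nor OLE2\n";
    }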


On 8/10/2016 9:28 AM, Larry Starr wrote:


On Tuesday, August 09, 2016 18:01:57 Rob McEwen wrote:

> On 8/9/2016 5:56 PM, Anthony Hoppe wrote:

> > Here are the headers as an example:

> > http://pastebin.com/bnU0npLR

> > This particular email has a macro-enabled Word document attached, 
but I


> > don't want to assume this will be the case every time.

> > Any tips/tricks/suggestions would be greatly appreciated!

>

> I think there is a trend now... towards blocking ALL .docm files (if

> not, there should be!). I think it is EXTREMELY rare for normal human

> beings to send Word documents in that particularly dangerous format.

> Most would be send in .doc or .docx format.

>

> I'm not sure if there is already a SA rule for scoring against .docm

> files attachments? Perhaps someone else could help you with that.

Just a short warning, although word will not open a .docm that is 
renamed to .docx, it will open a .docm renamed to .doc.


I found this the hard way!

It is necessary, if you wish to be safe from macro enabled documents 
to verify that the file is what the attachment's extension claims to be.


--

Larry Starr

Software Engineer

Full Compass Systems

9770 Silicon Prairie Pkwy

Madison, WI 53593-8442

P: 608-831-7330 x1347

F: 608-831-6330

E: lar...@fullcompass.com





Re: USER_IN_WHITELIST

2016-07-07 Thread Joe Quinn

On 7/6/2016 11:42 PM, Bill Cole wrote:

On 6 Jul 2016, at 23:10, lorenzo wrote:

[...]
The output from spamassassin -t -D < In-whitelist.txt gives the 
answer, I believe:


address hefg...@hkjhkjhk.onmicrosoft.com matches whitelist or 
blacklist regexp: ^.*microsoft\.com$


Very sneaky. I think I can handle this one from here.
Thanks again.


Happy to be of help.

For what it's worth: *.onmicrosoft.com domains are part of free trials 
of Office365 and generate almost entirely spam. I suppose one could be 
a regular paying O365 customer and keep that free domain, but no one 
who does that can care much about their email. Spammers have been 
using those domains for years and MS really seems not to care about 
the fact that they've become a de facto indication of spam.
In addition to the above, it's easy for a spammer to register something 
like kajsdhfkjasghdskghlaskfhmicrosoft.com which would also be 
whitelisted for you. I would recommend against using wildcard whitelist 
patterns like that.
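
For anyone tidying up a similar entry, tying the whitelist to the sending
infrastructure is the safer pattern; a sketch (the relay domain here is an
assumption about where the legitimate mail actually comes from):

    # Instead of a bare pattern like ^.*microsoft\.com$, require both a
    # matching From address and a verified relay rDNS domain:
    whitelist_from_rcvd  *@*.microsoft.com  microsoft.com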


Re: Corpus of Spam/Ham headers(Source IP) for research

2016-06-29 Thread Joe Quinn

On 6/29/2016 11:50 AM, Shivram Krishnan wrote:

Hello Antony,

We will be getting headers from our University. The only reason why we 
want other list is that we are tailoring Blacklists for specific 
networks, to see how these blacklists perform. The idea being , your 
network may not be seeing the same attack vectors as what the USC 
network sees.



Also getting the IP's in anonymized last octet would also help , as we 
are creating Blacklists in terms of Prefixes.


You should look at what masscheck does. Instead of uploading messages to 
get tested, masscheckers run the rules against their corpus locally and 
upload the match set.


Provide a mechanism for people to generate their own results, which they 
can upload with absolutely no identifying information.


Re: Catching well directed spear phishing messages

2016-06-29 Thread Joe Quinn

On 6/29/2016 11:12 AM, Dianne Skoll wrote:

On Wed, 29 Jun 2016 15:04:04 +
David Jones  wrote:

If everyone (really Microsoft) had some sense, they will start
showing the full display name with the email address to help users
see the incorrect domain and possibly help users notice the wrong
address.  It's only going to get worse.

Yep.
Especially after going through all the bother of extending SPF to test 
against that information to begin with.


Re: How SA reactes to a bunch of garbage characters

2016-06-14 Thread Joe Quinn

On 6/14/2016 8:33 AM, Matus UHLAR - fantomas wrote:
that is just what I would like to know: If OCR produces results good 
enough

for BAYES and other rules.

I don't think there's difference between bayes and other rules.
It's also possible that BAYES would have better results with misread
characters than other rules.
I've dealt with OCR in the past, and have always had to go back 
afterwards and manually proofread the results. I expect the impact on 
Bayes would be a massively increased dictionary of rare words that 
result from poor "keming" in the image. Some PDFs are written in 
extractable text instead of images, but those tend to use 
fractional-width spaces for kerning so it's not always easy to figure 
out what's a real word there either.


That said, Google seems to use OCR on images in their filtering (quoth 
Wikipedia), so maybe it works when you have a sufficiently enormous data 
set that the OCR glitches are no longer rare and a decent inference can 
be made from them.


Re: SPF should always hit?

2016-06-09 Thread Joe Quinn

On 6/9/2016 11:23 AM, Robert Fitzpatrick wrote:
Excuse me if this is too lame a question, but I have the SPF plugin 
enabled and it hits a lot. Should SPF_ something hit on every message 
if the domain has an SPF record in DNS?


Furthermore, a message found as Google phishing did not get a hit on a 
email address where the domain has SPF setup. Not sure if it would 
fail anyway if the envelope from is the culprit?



In a perfect world, every message you scan will hit one of the following:
SPF_HELO_NONE
SPF_HELO_NEUTRAL
SPF_HELO_PASS
SPF_HELO_FAIL
SPF_HELO_SOFTFAIL
T_SPF_HELO_PERMERROR
T_SPF_HELO_TEMPERROR

And additionally one of the following:
SPF_NONE
SPF_NEUTRAL
SPF_PASS
SPF_FAIL
SPF_SOFTFAIL
T_SPF_PERMERROR
T_SPF_TEMPERROR

In practice, there's almost certainly a few edge cases where messages 
can avoid getting one in either category. For purposes of writing your 
own metas against these, the rules that matter most for measuring 
spamminess are the none, pass, and fail/softfail results. The rest are 
for total coverage of the results that an SPF query can yield, for 
debugging and documentation purposes.


Also, none of these will hit at all if you disable network tests.
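
If you do write metas against them, a hedged sketch (the rule name and score
are invented, and FREEMAIL_FROM assumes the stock FreeMail plugin is enabled;
the point is only that the fail/softfail results carry the signal):

    # Only worth points when combined with another indicator, e.g. the
    # stock FREEMAIL_FROM rule.
    meta     LOCAL_SPF_FAIL_FREEMAIL  (SPF_FAIL || SPF_SOFTFAIL) && FREEMAIL_FROM
    describe LOCAL_SPF_FAIL_FREEMAIL  SPF failure combined with a freemail sender
    score    LOCAL_SPF_FAIL_FREEMAIL  0.5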


Re: Where to find DETAIL for spamassassin default RULES

2016-06-09 Thread Joe Quinn

On 6/9/2016 7:55 AM, jimimaseye wrote:

Once upon a time the include rules for spamassassin was published in its wiki
(example here: http://spamassassin.apache.org/tests_3_3_x.html) which in
turn gave a link to an 'explanation' detail of the individual rules.

However, as you know, these wiki pages are no longer updated due to "rules
being updated nightly".  And googling an individual rule only gives
something useful as long as it appeared in the old 3.3 wiki (like in the
link above). So where does one find details of new rules?

For those of us who have no idea of the behind-the-scenes workings of
spamassassin (including the development contributions, scoring evaluations,
etc.), could someone give me a starter please as to where I can start to look
or find a page giving detail similar to the above, in order that I can then look it up.
(I assume that every rule has some form of explanation to it before it gets
committed and included).  How do I find such detail (in a readable, end-user
understandable form)?  (Links to development, discussion, commits or
whatever are fine just as long as ultimately it ends up giving the rule
detail).

(Currently it seems I am just having 'to trust' whatever scores are given to
the rules, and that the rules are pertinent to every system.  And without an
explanation of the rules, it seems a little strange that we, as admins, are
allowed to then tailor the scoring of such rules (if we wish to) even though
we have no idea what the rules are in the first place).

TIA



--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Where-to-find-DETAIL-for-spamassassin-default-RULES-tp121218.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
I have a bookmark in Firefox that points to 
http://ruleqa.spamassassin.org/?rule=%s==Change which is the 
status page for the nightly rule updates and is likely what you are 
looking for.


I give it a keyword too, so I can type "ruleqa RULENAME" and it will 
replace the "%s" with whatever I type.


Re: Email with attachment caused 100% CPU usage.

2016-06-08 Thread Joe Quinn

On 6/8/2016 1:20 PM, John Hardin wrote:

On Wed, 8 Jun 2016, Mark London wrote:

Hi - We received an email with several large postscript attachments,  
and the content type was "text/plain".   This caused our spamassassin 
server to use up 100% CPU, parsing the attachments as text.   I 
temporarily disabled spam scanning to allow the message to go 
through.   How can I prevent this in the future?   I know about the 
time limit feature, but this doesn't prevent the server from running 
100% of the time, before the time limit is reached. Any suggestions? 
Thanks. - Mark


Content-Transfer-Encoding: base64
Content-Type: text/plain;
name=OTBW_3D_256_ngtot100_de03_coll_dissip_1248.ps
Content-Disposition: attachment;
 filename=OTBW_3D_256_ngtot100_de03_coll_dissip_1248.ps


Ouch. Please file a bug for that.

Do you have something that could catch text/plain + *.ps before SA get 
handed the message (e.g. a regex milter or other test)?


Before you file a bug, do you have a size limit set? How large is the 
attachment?


Re: SA 3.4.1 on FC22/sendmail with a .procmailrc not triiggering spamc

2016-06-08 Thread Joe Quinn

On 6/8/2016 12:39 PM, Kris Deugau wrote:

kud...@netzero.com wrote:

We're running SA 3.4.1 with sendmail on Fedora Core 22. Every users has a 
.procmailrc upon creation of the user but we have some legacy users being 
inundated. If I just create a /etc/procmailrc will SA look at that first?

Usually.  However, it'll *also* be called before your non-legacy users'
individual .procmailrc files.

I would just either copy the standard file to your legacy users, or a
minimal file with just the call to SA.


Does anyone have an example of a 2016-friendly local.cf file?

I'm not sure what sort of advice you're looking for here, IME SA has
been "pretty good" out of the box for the last few releases.

About the only thing I'd recommend is if you intend to use autolearning
with Bayes, to drop bayes_auto_learn_threshold_nonspam to at least -0.1;
  the default 0.1 has a tendency to mislearn low-scoring spam as ham.
(At one point the default was 0.7, but my experience was that any
positive value tended to get spam autolearned as ham.)

-kgd
Usually you don't want to be autolearning at all, and only train with 
messages that have been reviewed by a human. It's very easy for a Bayes 
DB to spiral out of control after even just one or two wrong results.


Re: SPF_TEMPERROR now firing

2016-06-06 Thread Joe Quinn

On 6/5/2016 3:38 AM, Chalmers wrote:

SPF_TEMPERROR now firing now scoring 1. Good.
As I am still learning I now know something I didn't previously.
Interesting responses here.
It's worth noting that the rule may have a good S/O for you but it's 
still not a good idea to score it. Those rules only exist to get 
complete coverage of SPF query results, for diagnosing why a message 
doesn't SPF fail/pass. I would recommend leaving those scores as-is, and 
if there is a pattern to be learned from them use a meta that combines 
it with a true spam indicator.


Re: Bayes filter marking everything as ham

2016-06-01 Thread Joe Quinn

On 6/1/2016 3:06 AM, Reindl Harald wrote:



Am 01.06.2016 um 02:38 schrieb David Jones:

From: Reindl Harald 
Sent: Tuesday, May 31, 2016 6:27 PM
To: users@spamassassin.apache.org
Subject: Re: Bayes filter marking everything as ham



Am 31.05.2016 um 23:58 schrieb Peter Carlson:

May 30 09:04:53 www amavis[16577]: (16577-03) Passed CLEAN
{RelayedInbound},  Tests:
[BAYES_00=-1.9,RCVD_IN_MSPIKE_H2=-0.001,SPF_PASS=-0.001,URIBL_BLOCKED=0.001],
autolearn=ham autolearn_force=no, autolearnscore=-0.001, 3992 ms


https://wiki.apache.org/spamassassin/ImproveAccuracy


the next one with amavis and URIBL_BLOCKED
(http://uribl.com/refused.shtml) - i get tired of asking for help while
not doing basic homework



Amavis != pure SA
URIBL_BLOCKED == read some basics


Too bad we couldn't make SA do something very annoying and
more obvious when the URIBL_BLOCKED rule was hit.  Any ideas?


write 1000 times " YOUR SETUP IS CRIPPLED 
http://uribl.com/refused.shtml " in the report header and 
every 5 seconds into the maillog so that the biggest fool can't ignore it



Perhaps, score URIBL_BLOCKED -1000?


Re: Accidental Spam Forward

2016-05-31 Thread Joe Quinn

On 5/31/2016 12:06 PM, Anthony Hoppe wrote:

All,

I accidentally forwarded some spam to this list.  Autocomplete got the 
best of me and I chose "spamassassin" instead of "spamcop" in the "TO" 
field of the message.  I haven't received the message myself (not sure 
if I will), but wanted to apologize in case any of you got it.


Happy, uh, Tuesday? :-D

Thanks,
Anthony
It likely was caught by the list's filters, as I have not seen it 
either. Just as a warning to everyone else as I have done it recently as 
well, certain clients like Thunderbird have this nasty habit of 
reordering the tab-complete between (automatic background) updates. It's 
very easy to do this.


Re: Reporting gmail spam to Google

2016-05-18 Thread Joe Quinn

On 5/18/2016 11:10 AM, Alarig Le Lay wrote:

On Thu May 19 00:00:31 2016, Byung-Hee HWANG (황병희) wrote:

As far as i know, they are doing those best to reduce spam by DMARC.

DMARC is used to prevent incomming spam, not outgoing.

Well to be more specific, DMARC allows forgeries to be aggressively 
rejected. Doesn't help a bit when your users are sending spam.


Re: understanding HELO_DYNAMIC_IPADDR

2016-05-13 Thread Joe Quinn
SA uses IP-in-name as a machine-decidable definition of a dynamic IP, 
since you can't really automate it otherwise. This heuristic holds in 
the vast majority of cases, and is effective against a huge class of 
spam that comes from public ISPs who don't block port 25.


An ISP's customers are generally going to have hosts like 
ipXXX-XXX-XXX-XXX.city.region.isp.net, and the name includes their IP 
because simply being an IP address is that host's purpose. That same 
ISP's mail servers are going to have hostnames like mail-15.isp.net. 
It's more specific because the list of mail servers is far smaller than 
the list of IPs, and this is the 15th of them.


The solution is to give your mail servers better hostnames that clue 
into the narrower scope of their purpose.
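
To illustrate the heuristic, here is a simplified sketch in plain Perl (this 
is not the actual HELO_DYNAMIC_IPADDR pattern, just the general idea, and the 
hostnames are examples):

#!/usr/bin/perl
use strict;
use warnings;

# crude "IP embedded in the rDNS name" check: a leading letter, then the
# host's own IPv4 address, dot- or dash-separated, somewhere in the name
my $ip_in_name = qr/^[a-z][a-z0-9-]*?(\d{1,3})[.-](\d{1,3})[.-](\d{1,3})[.-](\d{1,3})\./i;

for my $host ('ip203-0-113-7.city.region.isp.net',
              'webmail-201.76.63.163.ig.com.br',
              'mail-15.isp.net') {
    printf "%-40s %s\n", $host, ($host =~ $ip_in_name ? 'IP-in-name' : 'ok');
}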


On 5/13/2016 12:42 PM, Robert Boyl wrote:

Thanks a lot for your answer, sorry for confusion.

But why add such a high score of 3,24 just because the host that sent 
my server mail is webmail-201.76.63.163.ig.com.br?


It's considered a dynamic IP? It isn't, it's IG's server sending mail to 
our server.


Can I ask Spamassassin folks to improve this?

Thanks

2016-05-01 11:06 GMT-03:00 RW:


On Sun, 1 May 2016 10:20:09 -0300
Robert Boyl wrote:

> Hi, everyone
>
> Ive seen some discussion in Spamassassin's bugzilla about this
> HELO_DYNAMIC_IPADDR rule, some unanswered over years.
>
> It says in description: # (require an alpha first, as legit
> HELO'ing-as-IP-address is hit otherwise)
>
> Is it talking about the host that first appears, that sent the email
> authenticated to his ISP or the host/ISP that delivers to our
server?

The latter.

> This is the host that delivered mail to my ISP:
>
> Received: from webmail-201.76.63.163.ig.com.br
> (webmail-201.76.63.163.ig.com.br [201.76.63.163]) by mx3.myisp.com with
> ESMTP id rDrGtcYe1PdHDBfh; Wed, 06 Apr 2016 09:02:10 -0400 (EDT)
> X-Barracuda-Envelope-From: some-sen...@ig.com.br

>

> I don't understand, since IMHO it shouldn't matter whether the host that sent
> mail to its ISP is dynamic or not. IMHO what should matter is
> the ISP sending mail to our ISP and in that case, the host does NOT
> start with a number.

It's not about whether it starts with a number.  The comment you quoted is
"require an alpha first", and alpha means a letter.


webmail-201.76.63.163.ig.com.br starts with a letter and contains an IP
address.






Re: Is this spam?

2016-04-19 Thread Joe Quinn

On 4/18/2016 10:52 PM, Alex wrote:

Hi,


I'm curious as to whether you think this email is spam?

http://pastebin.com/bFVSgwnR

It looks like your typical unsolicited "Buyers Guide" junk, but I've
heard of actonsoftware before, and this email appears to have a
legitimate unsubscribe link. It also doesn't appear on any blacklists.
Is it opt-in?

A few users have complained about it, and I'm now seeing there are a
couple hundred of them being received. Unsubscribing requires
confirmation of the email address, which seems a little suspect.

Thanks for any ideas.
Alex

They're a mass marketer, so their most visible emails are from the
misbehaving segment of their customer base. We have them on a lightly
scoring DNS marketers list. Some marketers are better than others at keeping
hammy company, and this domain probably deserves a slight positive score but
definitely not a blacklist. I suggest learning each campaign they send and
moving on.

Do you mean KAM_FROM_URIBL_PCCC? If so, it doesn't appear to be on it,
and at this point techproductupdate.com should be all over it.

Thanks,
Alex
I mean KAM_MARKETINGBL_PCCC, which looks for 127.0.0.32 instead of 
127.0.0.4. If you don't have that rule I suggest updating KAM.cf, as 
it's been rather effective for moderating the flood of ad-spam coming 
out of their servers.


Re: Is this spam?

2016-04-18 Thread Joe Quinn

On 4/18/2016 1:23 PM, Alex wrote:

Hi all,

I'm curious as to whether you think this email is spam?

http://pastebin.com/bFVSgwnR

It looks like your typical unsolicited "Buyers Guide" junk, but I've
heard of actonsoftware before, and this email appears to have a
legitimate unsubscribe link. It also doesn't appear on any blacklists.
Is it opt-in?

A few users have complained about it, and I'm now seeing there are a
couple hundred of them being received. Unsubscribing requires
confirmation of the email address, which seems a little suspect.

Thanks for any ideas.
Alex
They're a mass marketer, so their most visible emails are from the 
misbehaving segment of their customer base. We have them on a lightly 
scoring DNS marketers list. Some marketers are better than others at 
keeping hammy company, and this domain probably deserves a slight 
positive score but definitely not a blacklist. I suggest learning each 
campaign they send and moving on.


Re: How does SpamAssassin processing languages other than English

2016-04-12 Thread Joe Quinn

On 4/12/2016 1:16 PM, Reindl Harald wrote:



Am 12.04.2016 um 18:44 schrieb Yu Qian:

SpamAssassin used Bayes as classier, this is typical and efficient for
English. But how does it processing languages like Asian language?

Can anyone introduce that or anyone can show the code where SpamAssassin
do that?


bayes is by definition language agnostic

*you train* bayes with samples of ham and spam (at least a few hundred 
of both) and the tokenizer splits the messages in parts and creates a 
database which words appear how often in spam and ham (simplified 
explained)
While that's true, tokenizing languages that don't delimit words by 
whitespace is extremely difficult. For languages like Chinese, it can 
only be done by carrying around a language dictionary.


Yu Qian, if you're up to reading code you may want to look at 
lib/Mail/SpamAssassin/Bayes.pm and 
lib/Mail/SpamAssassin/Plugin/Bayes.pm. I'm not familiar enough with the 
Bayes side of SA to say for sure, but you might be able to configure it 
or write a plugin that can do the tokenization you desire. You may also 
be able to reuse existing research from http://nlp.stanford.edu/ and such.
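
As a toy illustration of why a dictionary is needed for that, here is a greedy 
longest-match segmenter over a two-entry stand-in dictionary (real segmenters 
are far more involved; the words and text below are just placeholders):

#!/usr/bin/perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# toy greedy longest-match segmentation; longer dictionary entries win
my @dict = sort { length($b) <=> length($a) } ('垃圾邮件', '过滤器');  # "spam", "filter"
my $text = '垃圾邮件过滤器';

my @tokens;
while (length $text) {
    my $hit;
    for my $word (@dict) {
        if (index($text, $word) == 0) { $hit = $word; last; }
    }
    $hit //= substr($text, 0, 1);           # fall back to single characters
    push @tokens, $hit;
    $text = substr($text, length $hit);
}
print join(' | ', @tokens), "\n";           # prints: 垃圾邮件 | 过滤器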


Re: Anyone else just blocking the ".top" TLD?

2016-03-28 Thread Joe Quinn

On 3/28/2016 3:02 PM, Vincent Fox wrote:
From:whoswho REJECT 
This is the one that really annoys me. KAM.cf has a 5.0-scored rule 
named exactly that, and there's an entire Wikipedia article on the 
subject! https://en.wikipedia.org/wiki/Who's_Who_scam. It really makes 
ICANN look like they do no research on the TLDs they accept.


Re: Regex problem

2016-03-28 Thread Joe Quinn

On 3/28/2016 11:59 AM, RW wrote:

On Mon, 28 Mar 2016 09:58:23 -0400
Joe Quinn wrote:


On 3/28/2016 9:55 AM, RW wrote:

Subject =~ /\$\b/

There's no word boundary between the $ and the ' ' because they're
both in \W.

Thanks, I'd forgotten what the definition of a boundary was.


I presume that, until spamassassin gets full unicode support,
non-ascii characters are seen as one or more \W characters.
So:

"  Ångström  "

would have boundaries at the points marked by "|"

   " Å|ngstr|ö|m| "

split into several words and without a boundary before the Å.
Possibly. Perl's documentation indicates it would work that way if /a is 
in effect. Otherwise 
(http://perldoc.perl.org/perlrecharclass.html#Word-characters):


For code points above 255 ...
\w matches the same as \p{Word} matches in this range. That is, it 
matches Thai letters, Greek letters, etc. This includes connector 
punctuation (like the underscore) which connect two words together, or 
diacritics, such as a COMBINING TILDE and the modifier letters, which 
are generally used to add auxiliary markings to letters.

For code points below 256 ...
if locale rules are in effect ...
\w matches the platform's native underscore character plus 
whatever the locale considers to be alphanumeric.

if Unicode rules are in effect ...
\w matches exactly what \p{Word} matches.
otherwise ...
\w matches [a-zA-Z0-9_].

It looks like the "Word" property might be a Perl extension to unicode 
(or at least it's very hard to google), so that's as far as my digging 
can go into the precise semantics of \w.


Re: Regex problem

2016-03-28 Thread Joe Quinn

On 3/28/2016 9:55 AM, RW wrote:

Am I missing something?

With the test message

   printf  'Subject:  x 555$ x\n\n '


I get a match on "$ " and "$" with

   Subject =~ /\$ /
   Subject =~ /\$/


but no match with

   Subject =~ /\$\b/
There's no word boundary between the $ and the ' ' because they're both 
in \W.
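
A quick way to see it from the command line (nothing SA-specific, just perl 
one-liners):

  perl -e 'print "x 555\$ x" =~ /\$\b/ ? "match\n" : "no match\n"'   # no match: $ and space are both \W
  perl -e 'print "x 555\$x"  =~ /\$\b/ ? "match\n" : "no match\n"'   # match: boundary between $ (\W) and x (\w)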


Re: Continuing - Re: How do I actually add these descriptions then...

2016-03-07 Thread Joe Quinn

On 3/7/2016 1:05 PM, RW wrote:

On Mon, 7 Mar 2016 15:12:25 +
Robert Chalmers wrote:


I've added descriptions, grabbing the actual RULE name with awk, and
creating the list that way.

{
a=$12;
 print "describe " a " Spam check applied.";
}


The result is like this.
describe LONG_TERM_PRICE Spam check applied.
describe MULTIPART_ALT_NON_TEXT Spam check applied.
describe TVD_IP_OCT Spam check applied.
describe HK_NAME_DR Spam check applied.

What's the benefit of this?

If someone gets around to creating descriptions for these rules you
wont see them.

Agreed. The right way to silence those warnings would be to learn what 
each rule does, write an accurate description, then commit it or submit 
a patch on the bug tracker.


Re: My new method for blocking spam - REVEALED!

2016-01-20 Thread Joe Quinn

On 1/20/2016 3:20 PM, Dianne Skoll wrote:

On Wed, 20 Jan 2016 12:11:02 -0800
Marc Perkel  wrote:


Again - it's not about matching as Bayes does. It's about not
matching.

It's not about not matching.  It's about a preprocessing step that
discards tokens that don't have extreme probabilities.

I think your method works as well as it does because you're using up
to four-word phrases as tokens.  The rest of the method is nonsense, but
the four-word phrase tokens are the magic ingredient; they'd make Bayes work
awesomely also.

Regards,

Dianne.
Differential analysis time? Add comparisons to 4-word Bayes and 4-word 
Bayes on the same subset of the message as this new method.


Re: DNS lookups - bug with recursive lookups, or shoddy bind config?

2016-01-04 Thread Joe Quinn

On 1/4/2016 3:39 PM, Quanah Gibson-Mount wrote:
--On Monday, January 04, 2016 8:28 PM + Chris J 
 wrote:



Before I raise this on Bugzilla, I just want to run this past people as
I'm quite happy that I've failed to configure something, but can't see
what.

In short, RBL blacklists haven't been working and I've finally, with
tcpdump, traced it to SpamAssassin not requesting recursive queries.

The setup is:
Linux - Debian Jessie 8.2
Bind - 9.9.5-9+deb8u3-Debian
SpamAssassin - installed from CPAN, 3.4.1
Perl - 5.20.2
Net::DNS - 1.01


If you're using Net::DNS 1.01 or later, you must patch SA.  There is 
an entire thread dedicated to this issue.






7265 is only required for 1.03 (not necessary for 1.01, 1.02, or 1.04).

--Quanah

--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.

Zimbra ::  the leader in open source messaging and collaboration
By the way, have you considered subscribing to the dev@ list and 
contributing to SA? You ran through this issue pretty much perfectly, 
other than the bad luck with our Bugzilla's results on Google.


Re: AWL ?

2015-12-23 Thread Joe Quinn

On 12/23/2015 10:53 AM, Olivier CALVANO wrote:

Hi

i have installed a new server on Centos with Postfix/Amavisd and 
SpamAssassin


my problems, 90% of mail are tagged spam:

X-Spam-Flag: YES
X-Spam-Score: 22.876
X-Spam-Level: **
X-Spam-Status: Yes, score=22.876 required=5.0 tests=[AWL=20.375,
FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RP_MATCHES_RCVD=-0.001,
CLASSIC_SUJET_GENERAL_1=2.5] autolearn=no autolearn_force=no


this mail is a very simple mail.

What is AWL ? why score is very big ?

thanks
Olivier

AWL is a poorly-named and deprecated module that does score averaging. 
It's been replaced by TxRep which does a better job at keeping scores 
sane. What likely happened is the sender of the email has a very bad 
reputation carried over from previous emails to you being marked as 
spam. Clearing your AWL database will fix it as a short-term measure.


Re: Google redirects

2015-12-18 Thread Joe Quinn

On 12/18/2015 11:32 AM, John Hardin wrote:

On Fri, 18 Dec 2015, Mark Martinec wrote:


On 2015-12-18 16:29, Axb wrote:

 On 12/18/2015 04:17 PM, Mark Martinec wrote:
>  On 2015-12-17 22:41, Axb wrote:
> >  could you make a version using redirector_pattern so the redirected
> >  target can be looked up via URIBL plugin?
> >  Isn't this already the case? Redirect targets are added
>  to a list of URIs and are subject to same rules as
>  directly collected URIs.
>
 I suggested converting the rawbody rule John was working on into a
 redirector_pattern


Note that the following rule as posted by John:

 uri __GOOG_MALWARE_DNLD 
m;^https?://[^/]*\.google\.com/[^?]*url\?.*[\?&]download=1;i


would not currently work as a redirector_pattern due to the problem
I posted in my today's reply (Re: redirector_pattern question);
i.e. where the redirector target contains "http:", followed
by other URI arguments (like "=1" here).


Right, and I would take that into account when composing the 
redirector_pattern. That extra bit is there to avoid treating *all* 
google redirects as malware downloads.


Question: has anyone ever seen a *legit* (non-spam, non-phishing, 
non-malware) google redirect like that in an email? Maybe this rule is 
too restrictive and we should be suspicious of *all* google redirects?


I do it occasionally, if I am sending a link to someone and I 
right-click -> "copy link location" on the search results. I'd be 
suspicious of those sorts of links, but not too suspicious.


Re: Google redirects

2015-12-17 Thread Joe Quinn

On 12/17/2015 1:34 PM, Alex wrote:

Hi,

Can someone explain why spamassassin is allowing apparent google
redirects? Cryptolocker :-( This one's blocked now.

https://www.google.com/url?q=http://www.mediafire.com/download/izdqjzml6dz68t3/1Z4566W50325036.ups.doc_.wsf08137322366IlRiZxJtpLvPq78WySF33Y=D=AFQjCNG6PWyLVrbpnrMhn12glB2txWOUgA;
style="color: rgb(89, 143, 222);
outline: 0px;" target="_blank">1Z4566W50378875...

# href="https://www.google.com/url?q=http://www.mediafire.com/download/izdqjzml6
rawbody GOOG_VIEW1
m;https?://www\.google\.com/url\?(q=http(s)?|sa=t\\;url=http);
describeGOOG_VIEW1Using google url
score   GOOG_VIEW16.0

Ideas for improving the rule or making it more flexible would be appreciated.

http://pastebin.com/MY7mZkjs
It goes without saying, but if your mail client automatically highlights 
URLs be very careful about clicking in this email.


I think your rawbody rule should probably be a URI rule, especially 
since it's looking for a protocol anyway. Then you can probably get a 
bit more aggressive in using (.*) and such to handle URLs like 
"https://www.google.com/url?asdhlf=laskjdhflkjasdhf=laskjfhlasdf&...={payload};. 
I'm not sure about scoring it 6.0 for myself, but it's fine if it works 
for you. I'd also be interested to see what RuleQA thinks of it.


Re: More on T_SPF_PERMERROR

2015-12-15 Thread Joe Quinn

On 12/15/2015 7:19 AM, Martin Gregorie wrote:

On Mon, 2015-12-14 at 21:42 -0500, Alex wrote:

Many times the domain actually has something wrong with SPF, but
other times openspf.org/why and kittermans say there's nothing wrong
with the domain.

Other domains that fail, such as gmail.com and wellsfargo.com, report
softfail on kitterman when testing due to a redirect,


For wellsfargo dmarcian.com/spf-survey follows the redirect link and
reports errors in that TXT (too many IPs and partial IPs). If
wellsfargo can make this mistake, so can others.

Could T_SPF_HELO_TEMPERROR be trying to say "I can't parse this
particular SPF TXT record"?


Martin
Syntax errors are permanent errors, as they will never go away on their 
own. I highly recommend reading 
http://www.rfc-editor.org/rfc/rfc7208.txt - it's a fairly easy read and 
important to understand well.


4.6.  Record Evaluation

   The check_host() function parses and interprets the SPF record to
   find a result for the current test.  The syntax of the record is
   validated first, and if there are any syntax errors anywhere in the
   record, check_host() returns immediately with the result "permerror",
   without further interpretation or evaluation.
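
If you want to see the raw result SA's SPF plugin would get, the Mail::SPF 
module (which SA uses when it is installed) can be queried directly. A minimal 
sketch; the identity and IP below are placeholders:

#!/usr/bin/perl
use strict;
use warnings;
use Mail::SPF;

# run one SPF check the way SA's SPF plugin does under the hood
my $server  = Mail::SPF::Server->new();
my $request = Mail::SPF::Request->new(
    scope      => 'mfrom',
    identity   => 'someone@example.com',
    ip_address => '192.0.2.1',
);
my $result = $server->process($request);
print $result->code, "\n";   # pass, fail, softfail, neutral, none, permerror or temperror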


Re: More on T_SPF_PERMERROR

2015-12-14 Thread Joe Quinn

On 12/14/2015 1:47 PM, Alex wrote:

Hi,

I'm seeing quite a few T_SPF_PERMERROR entries in my logs and not sure
if it's a problem, or a misunderstanding, or perhaps I've just started
to notice it more often since I started looking for it...

I'm seeing T_SPF_PERMERROR entries in my logs for sites with valid and
working SPF records, like expedia.com, but when I test the domains
manually, they're okay.

Among the reasons I can think of that would cause this are that
they're currently updating their SPF record, of course. I also suppose
a problem with DNS on my side or theirs could be an issue (but
expedia?).

I was wondering if there were any other conditions under which I would
have a T_SPF_PERMERROR in my logs but querying just a few hours later
produces no error?

What is the reason the rule is T_?

I was also very surprised to see just how many domains had broken SPF
records. walmart, cintas, sodexo, wellsfargo, salesforce... all
currently broken.

Does anyone have a shell script of some kind that takes a domain as an
argument to determine whether their SPF record is operating normally?
I can definitely see a nagios plugin useful for this... Or a python or
perl script.

Thanks,
Alex
I like http://www.openspf.org/Why for looking at SPF issues by hand. 
It's a test rule at the moment because there's no clear reason for it to 
have a significant score, and it's not entirely clear if it's a useful 
indicator for meta rules either. I mainly wrote it just as a diagnostic 
"yes, SA did check SPF and this is why it didn't FAIL or PASS".


Given the issue is intermittent, looking at DNS is a good place to 
start. You might also try enabling debug output. Expedia's SPF looks 
correct on my end as well.


Re: Trying to understand how bayes works.

2015-12-11 Thread Joe Quinn

On 12/11/2015 1:24 PM, Reindl Harald wrote:



Am 11.12.2015 um 19:12 schrieb Axb:

On 12/11/2015 06:51 PM, Reindl Harald wrote:
well, how many of you trained chistmas spam this year while my bayes 
did

know it from last year?


I like my Bayes fresh like bread out of the oven, new guitar strings and
clean sheets.


well, i like my bayes catch spam at every point in time without repeat 
to slip things through once already caught - tell me one reason why i 
should let phishing pass through to customers which was already detected


96% of all milter-rejected mails got 3.5-7.5 points from bayes while 
at the same time 77% of all scanned mail got -3.5 points - in other 
words most ham has BAYES_00 most spam has BAYES_80-BAYES_999 - that's 
what the bayes is supposed to do



Last years turkey doesn't appeal to me.


and what is last years spam making it now again through until relearn?

spammers would have so much more work if they didn't know that in a 
few months they can re-use their templates after a large enough break, 
as a spammer i would even schedule the usage of them automated


Agreed, and adding that we do see a large percentage of repeat seasonal 
spam templates. You need at least some of your data to carry over for at 
least a year, maybe two in order to stay effective.


Re: Very strange SA result!

2015-12-03 Thread Joe Quinn

On 12/3/2015 9:23 AM, Jari Fredriksson wrote:

On 3.12.2015 16.11, Kevin A. McGrail wrote:

You are using KAM.cf which isn't a project ruleset.

Please report the issue and a spample at
https://raptor.pccc.com/raptor.cgim?template=report_problem

We can likely look at it quickly and adjust.  However, the fact that SPF
failed makes me lean towards the fact that the rule fired correctly...

Regards,
KAM



There seems to be something off in the SPF detection. SPF claims that 
paypal is not allowed (by their SPF record) to send mail via my email 
relay. That relay IS in my trusted_networks. What am I missing now?


br. jarif

Probably this bug, which we are still working out a good solution for:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7182

The SPF RFC has a "MUST" constraint on 10 lookups per SPF check, which 
Paypal has broken before. The reasoning given is resistance to denial of 
service attacks via DNS traffic, which makes it a tricky fix. We'll 
discuss the KAM.cf issue privately, and bring it back on-list in dev@ if 
it comes back to new information on this issue.


Re: question re/ RDNS_NONE

2015-11-25 Thread Joe Quinn

On 11/25/2015 6:07 AM, Edda wrote:

Ouch, sorry, i tested it on 3.3.1 and "re-typed" that line in 3.4.1

Does the patch work for you?
Since we're currently developing in both 3.4.2 and 4.0 and now you have 
bumped into the same problem, I might as well share this:


repatch() {
  (cd $1 && svn diff) | (cd $2 && patch -p0)
}

Put that in your .bashrc - usage is "repatch ~/sa/trunk ~/sa/3.4.2" or 
whatever your directory structure is.


Re: Malware URI rule

2015-11-09 Thread Joe Quinn

On 11/9/2015 12:15 PM, Amir Caspi wrote:

On Nov 9, 2015, at 10:09 AM, John Hardin  wrote:

score  URI_MALWARE_CWALL6.000

Is your threshold higher than 5? Otherwise this is a poison pill for a 
"potential" hit.

--- Amir
thumbed via iPhone


There's a lot of things that can bring that down, like TxRep or white RBLs.

Definitely check the rule with a lower score first, but with the right 
S/O I would consider scoring it that high. If your users don't have 
backups, it might be worth some light FP potential to avoid giving the 
"you have to pay $600 or lose everything" talk.


Re: New SA install, configuring for retraining on false positives

2015-11-05 Thread Joe Quinn

On 11/5/2015 1:44 PM, Reindl Harald wrote:



Am 05.11.2015 um 19:24 schrieb Bill Cole:

On 5 Nov 2015, at 6:52, David Mehler wrote:

or SA as a milter called directly from my
MTA.


There is no such thing: SA is not a milter


tell that our spamass-milter setup running for more than a year now 
rejecting 99% of junk at MTA level (the piece making it through 
postscreen, spf, ptr/helo, greylisting and what not) with nearly zero 
false positives


Just to make sure it's clear, because this is an easy bit of authorship 
to be confused by:
David is correct in that spamass-milter isn't spamassassin. It's 
credited on their manpage to Georg C. F. Greve  and Dan 
Nelson  and GPLv2 licensed.


It falls under the snipped bit of the previous email, milters that can 
support SA. There's an overlap in concerns that makes it often on-topic 
here (much like with MD / Postfix / other elements of the mail stack), 
but we aren't the developers.


Re: New rules..

2015-11-02 Thread Joe Quinn

On 11/2/2015 12:00 PM, Richard Mealing wrote:


Hi there,

Would this be the best list to talk about new rules for spamassassin?

I'm new here..

Thanks,

Rich

This would be an excellent place, yes. The more technical discussion for 
things like bugs in eval rules will generally happen in dev@ but there 
can be some overlap.


Re: How to get rid of this spam? Spam assassin does not catch it

2015-11-02 Thread Joe Quinn

On 11/2/2015 11:25 AM, Reindl Harald wrote:



Am 02.11.2015 um 17:02 schrieb Benny Pedersen:

and why did he change spamd login permission when using sa-learn :(


because *as he explained* the service user has /sbin/nologin as shell 
and so "su - username" won't work until you change that or as i 
explained create a user with a shell training the correct site wide bayes



use spamc, not spamd if spamc is not used

one does not need to log in to apache to see a homepage; the same goes for
spamd, it is using port 783 so it needs to be started as root, but the
real work will happen as the user calling spamc



I would at least consider sudo or 'su -c' as well.


Re: SPF code change?

2015-10-16 Thread Joe Quinn

On 10/16/2015 10:18 AM, Benny Pedersen wrote:

Reindl Harald skrev den 2015-10-16 15:57:


and why the hell should a SPF test for mails coming with envelopes
from yahoo, google, hotmail care about *that* entry for *your* domain?


eh what ?

Slow your roll, guys.

Nick, can you give us a sample message and its debug output with -D?


Re: SpamAssassin Rules Regarding Abuse of New Top Level Domains

2015-10-14 Thread Joe Quinn

On 10/14/2015 12:00 PM, Bill Cole wrote:
Describe, in detail, the new SA technology which fights abuse of new 
TLDs.


Prior to v3.4.1, the mechanism for detecting and parsing hostnames to 
identify body URIs used an embedded array of hardcoded domains in 
Mail/SpamAssassin/Util/RegistrarBoundaries.pm. This resulted in many 
URIs in the new TLDs not being detected and filtered as URIs. In 
v3.4.1 there is the new Mail/SpamAssassin/RegistryBoundaries.pm and 
the file 20_aux_tlds.cf in the canonical rules set which now contains 
a comprehensive maintained list of TLDs and other registry-managed 
domains. 

A mention of why the list is even needed:

Most URLs are obvious and of the form 
"http://sub.domain.tld/blahblahblah; and easy to detect. However, mail 
clients will also accept things like "sub.domain.tld/blahblahblah" 
without the protocol. We want to detect as many URLs as possible and 
ideally zero non-URLs, because each can turn into multiple DNS lookups. 
The list of TLDs gives us a way to eliminate obvious non-URLs, but it 
was designed when the worst we had to deal with was 100-ish ccTLDs that 
rarely changed. Nowadays it's easy for spammers to buy up garbage 
domains like example.bacon / example.click / example.industries, making 
an up to date list of TLDs much more important.
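
To make that concrete, here is a toy sketch of the schemeless-URI decision 
(not SA's actual parser; the tiny TLD hash below stands in for 20_aux_tlds.cf, 
and the tokens are made up):

#!/usr/bin/perl
use strict;
use warnings;

# accept "something.tld/..." as a URI only when the last label is a known TLD
my %tld = map { $_ => 1 } qw(com net org click industries);

for my $token (qw(sub.domain.com/blah example.click/offer report.final/summary)) {
    my ($host) = split m{/}, $token;
    my $last   = (split /\./, $host)[-1];
    printf "%-25s %s\n", $token, ($tld{lc $last} ? 'treat as URI' : 'ignore');
}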


Re: Investigating facebook spam

2015-10-06 Thread Joe Quinn

On 10/6/2015 1:38 PM, Alex wrote:

Hi,

I've received a handful of messages that appear to be facebook
notifications, but fail SPF. They otherwise look completely legit -
links to profiles, only URLs to facebook.com and CDN caching sites,
and even appears to have been routed through facebook's outgoing mail.

All of that could be faked, but it would mean the payload is in the
actual facebook profiles themselves. Has anyone else found this to be
the case?

http://pastebin.com/jE8G5LXJ

Thanks,
Alex
I would say that because it passes DKIM with a signature from 
facebookmail.com, it's likely legitimate and they just suck at SPF 
(wouldn't be the first time a multi-billion dollar company can't get 
anti-forgery right). The rDNS of cox.net seems odd for a CDN, but 
there's not really any standard and I don't know offhand if that's the 
hostname format they use or not.


Re: The word on messages w/ no Message-Id

2015-09-28 Thread Joe Quinn

On 9/28/2015 2:22 PM, Philip Prindeville wrote:

Though listed as optional in the table in section 3.6, every message
SHOULD have a "Message-ID:" field.  Furthermore, reply messages
SHOULD have "In-Reply-To:" and "References:" fields as appropriate
and as described below.
This is much more plain-english and clearly says SHOULD, so my 
interpretation of the rest would be what MUST be done IF "Message-ID" is 
present. In any event, RFC compliance is orthogonal to being spam or ham 
and at the end of the day, SA is an "I don't want this email" spam 
classifier and not an RFC validator.


If you don't want to be getting those emails, they are spam and you 
should score it something reasonable that doesn't prevent you getting 
other desired messages. While I don't have any specific examples of ham 
without Message-ID, it's not a stretch to imagine they exist. I 
personally wouldn't write that rule.


Re: Rule Help

2015-09-25 Thread Joe Quinn

On 9/25/2015 10:28 AM, Dianne Skoll wrote:

On Fri, 25 Sep 2015 14:21:50 +
Dave  wrote:


I am trying to create a rule that scores TLD's in received headers if
they are not certain TLD's. What I have so far:

Your logic is wrong.  And you can do it all with one regex:

header GC_TLD_COM Received !~/\.(?:com|net|org|edu|uk)\b/i

I won't comment on the advisability of such a rule; the policy is up to you.
Also beware that this will trigger on IPs with no reverse DNS.

Regards,

Dianne.

I'll comment, since I like these sorts of rules.

There's a ridiculous amount of TLDs, and their use is starting to 
becoming more common to the point of almost routine, between t.co, 
goo.gl, etc. That makes a rule like yours hard to justify most of the 
time, but I can see industries where it's valid to give a very low score 
on the order of 0.5 tops. Typically you will want to work backwards and 
write rules for "these TLDs are particularly bad". You should think very 
long and hard before blacklisting a TLD entirely, as well. The only 
scenario I can imagine being valid would be if you were running a school 
and blacklisting the .xxx TLD.


This might be a nice light spam indicator, but like all broad rules it's 
easy to end up with a 10% FP rate. You need a full understanding of the 
mail traffic it operates on, which nobody else on-list has but you.


Re: Recommendations for mail with only an image

2015-09-17 Thread Joe Quinn

On 9/17/2015 2:31 PM, Alex wrote:

Hi,


There are a few rules that seem to overlap in these instances:

*  2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
*  Subject: text
*  1.0 FSL_EMPTY_BODY Message has completely empty body


Those two should probably be evaluated for overlap.


Or at best rescoring those for your personal installation.

Do we agree there is a potential overlap? Should I open a bug for
this, or just continue to adjust locally?

Thanks everyone,
Alex


Possibly a silly question, but where is FSL_EMPTY_BODY coming from? The 
string has no occurrences in the current trunk.


Re: Repository of rules

2015-09-09 Thread Joe Quinn

On 9/9/2015 5:43 AM, Sujit Acharyya-choudhury wrote:


Hi Joe,

I looked at the rule set and it was very interesting and I intend to 
use it. However, I did not see any *.pm file attached to it.  Is there 
any need for this?  Do you suggest I increase the default score from 
5.0 to 6.0 if I include this rule?  I am interested in your view, 
especially as the phishing rules will hit a lot of mail which is coming 
through at present and causing mayhem.


Regards

Sujit

*From:*Joe Quinn [mailto:jqu...@pccc.com]
*Sent:* 08 September 2015 16:27
*To:* users@spamassassin.apache.org
*Subject:* Re: Repository of rules

On 9/8/2015 11:13 AM, Anthony Hoppe wrote:

Hey All,

This is likely a n00b question, so I apologize.

I've been a member of this list for a while.  Periodically, I see
rules develop based on submissions of samples from other members.
 Is there, by chance, a repository of rules like that somewhere I
can reference?  I'm not often able to keep up and would love to go
back and add rules that I think will benefit my environment.

Thanks!

~ Anthony

You can find some of them in 
http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf which 
updates regularly, but it only includes rules for myself and Kevin. 
Off the top of my head, I can't think of a similar file for anyone 
else on the list. A lot of rules eventually end up committed and then 
it's up to RuleQA to decide if they merit going to sa-update.


Keep replies on-list. Just put it into /etc/mail/spamassassin and 
restart anything you need to, and it will be loaded. You can bump the 
threshold if you like, as the file notes:


#This cf file is designed for systems with a threshold of 5.0 or higher.




Re: Repository of rules

2015-09-08 Thread Joe Quinn

On 9/8/2015 11:13 AM, Anthony Hoppe wrote:

Hey All,

This is likely a n00b question, so I apologize.

I've been a member of this list for a while.  Periodically, I see 
rules develop based on submissions of samples from other members.  Is 
there, by chance, a repository of rules like that somewhere I can 
reference?  I'm not often able to keep up and would love to go back 
and add rules that I think will benefit my environment.


Thanks!

~ Anthony
You can find some of them in 
http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf which updates 
regularly, but it only includes rules for myself and Kevin. Off the top 
of my head, I can't think of a similar file for anyone else on the list. 
A lot of rules eventually end up committed and then it's up to RuleQA to 
decide if they merit going to sa-update.


Re: phishing rules

2015-08-25 Thread Joe Quinn

On 8/25/2015 7:51 AM, RW wrote:

On Tue, 25 Aug 2015 09:55:57 +0200
Tom Hendrikx wrote:



Basically every MUA I know will label the message as a possible scam
when you use the BAD version, which why you actually never see it in
non-spam mail, unless the editor was a real noob.

That applies to spam too.

Would this really have a significant effect on modern phishes?
It still works against a lot of people, even those who know what to look 
for. It's easy to get complacent and click a link without checking it 
first when you go through a hundred emails a day.


That said, it also works because it's common in ham to the point that 
you just sometimes have to ignore it. Lots of questionable but 
consented-to mass marketing emails will use a tracker domain for 
embedded URLs, so when someone links to <a 
href=http://apache.org>apache.org</a>, it gets rewritten and now it hits 
this new rule. Or perhaps if you ever are told to go to <a 
href=http://*www*.google.com>google.com</a> and log into <a 
href=http://*accounts.google.com*>gmail.com</a> you'll hit the rule too...


There's a lot of reasons to have such a rule and lots of reasons to not 
have it. Without any data, I would lean towards not having it, because 
there's usually a better pattern to match on.


But we can have data! Put the rule in a sandbox and see what RuleQA 
thinks of its stats.


Re: Hitting an address in the From:name

2015-08-20 Thread Joe Quinn

On 8/20/2015 2:42 PM, Olivier Coutu wrote:
I got a spearphishing e-mail the other day that had a From with the 
following form:


From: Mister President <presid...@company.com>
<phish...@freemailer.com>

I attempted to craft a SA rule to catch the @ in the From:name but I 
was unable to catch anything after the 

ex:
From:name =~ /Mister President/     hits
From:name =~ /Mister President \</   does not hit
From:name =~ /\@/                    does not hit
From:name =~ /company/               does not hit
From =~ /\@.*\@/                     hits but is inefficient

I believe that SA may be removing the presid...@company.com part 
from the From:name, am I correct? Is there any efficient way to detect 
such an occurrence of an @ in the From:name?


Using SA version 3.4.1 on Ubuntu with debug
Good catch! If you are using a new enough perl you might try the 
following which should have zero backtracking (the + modifier on 
quantifiers works like a cut in prolog):


From =~ /\@[^@]*+\@/

That said, header fields are likely never going to be long enough for 
what you currently have to be a performance concern.


(I was about to say it was impossible, but then I saw there is no length 
limit on headers: 
http://stackoverflow.com/questions/2721605/maximum-size-of-email-x-headers)
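
For reference, a tiny standalone test of that pattern (perl 5.10 or newer for 
the possessive quantifier; the From value below is invented):

#!/usr/bin/perl
use strict;
use warnings;

# a made-up From header whose display name carries a second address
my $from = 'Mister President <president@example.com> <phisher@freemailer.example>';

# [^@]*+ is possessive: once it has consumed the run of non-@ characters it
# never gives any back, so the engine cannot backtrack into it
print "two addresses present\n" if $from =~ /\@[^@]*+\@/;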


Re: Hitting an address in the From:name

2015-08-20 Thread Joe Quinn

On 8/20/2015 2:56 PM, John Hardin wrote:

On Thu, 20 Aug 2015, Olivier Coutu wrote:
I believe that SA may be removing the presid...@company.com part 
from the From:name, am I correct?


Define this rule:

   header   __ALL_FROMNAME   From:name =~ /.*/

...and run spamassassin on a test message using:
   --debug area=all,rules,rules-all

You'll be able to see exactly what's available to match against.

I'd suggest for a From address like that, if it *is* dropping the 
email address within the comment a bug should be filed.


Already opened a bug. The fact that From: name =~ /\@/ didn't match is 
proof enough for me that something is wrong.


Re: Return Path (TM) whitelists

2015-07-10 Thread Joe Quinn

On 7/9/2015 6:07 PM, Dianne Skoll wrote:

On Fri, 10 Jul 2015 07:58:39 +1000
Noel Butler noel.but...@ausics.net wrote:


+1

I'll throw my +1 in on this also.  Almost by definition, the kinds of
organizations who buy into these certifications to get their mail
delivered are unlikely to be the kinds of organizations I want to
hear from.

Just as SPF pass is a mild spam indicator nowadays, so is a pass
on these kinds of certifications.

Regards,

Dianne.
I think your information on SPF is a bit out of date (though indeed when 
the spec was new, you could easily score it quite heavily).


http://ruleqa.spamassassin.org/?daterev=20150709-r1690028-n&rule=SPF_PASS&srcpath=&g=Change
http://ruleqa.spamassassin.org/?daterev=20150709-r1690028-n&rule=SPF_HELO_PASS&srcpath=&g=Change

It's not good enough to give a negative score all by itself, since it's 
still very easy to make useless SPF records, but it's not what it used 
to be.


Re: PerMsgStatus Util warnings

2015-05-15 Thread Joe Quinn

On 5/15/2015 9:49 AM, Kevin A. McGrail wrote:

On 5/15/2015 9:43 AM, Axb wrote:

Karsten's GUDO plugin also uses uri_to_domain

What do we have to replace that function with?


The uri_to_domain is now in 
Mail::SpamAssassin::RegistryBoundaries::uri_to_domain.


Reiterating the announcement:

Notable Internal changes


Mail::SpamAssassin::Util::RegistrarBoundaries is being replaced by
Mail::SpamAssassin::RegistryBoundaries so that new TLDs can be updated
via 20_aux_tlds.cf delivered via sa-update.

The $VALID_TLDS_RE global in registrar boundaries is deprecated but kept
for third-party plugin compatibility.  It may be removed in a future
release. See Mail::SpamAssassin::Plugin::FreeMail for an example of the
new way of obtaining a valid list of TLDs.

The following functions and variables will be removed in the next
release after 3.4.1 excepting any emergency break/fix releases
immediately after 3.4.1:
  Mail::SpamAssassin::Util::RegistrarBoundaries::is_domain_valid
  Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain
  Mail::SpamAssassin::Util::RegistrarBoundaries::split_domain
  Mail::SpamAssassin::Util::uri_to_domain
  Mail::SpamAssassin::Util::RegistrarBoundaries::US_STATES
Mail::SpamAssassin::Util::RegistrarBoundaries::THREE_LEVEL_DOMAINS
  Mail::SpamAssassin::Util::RegistrarBoundaries::TWO_LEVEL_DOMAINS
  Mail::SpamAssassin::Util::RegistrarBoundaries::VALID_TLDS_RE
  Mail::SpamAssassin::Util::RegistrarBoundaries::VALID_TLDS

This change should only affect 3rd party plugin authors who will need
to update their code to utilize Mail::SpamAssassin::RegistryBoundaries.

Also from RegistrarBoundaries:

=item $domain = trim_domain($fqdn)
[snip]
This function has been moved !!! See 
Mail::SpamAssassin::RegistryBoundaries !!!

This is left as transition fallback for third party plugins.
It will be removed in the future.
=cut



Re: PerMsgStatus Util warnings

2015-05-15 Thread Joe Quinn

On 5/15/2015 10:00 AM, Joe Quinn wrote:

On 5/15/2015 9:49 AM, Kevin A. McGrail wrote:

On 5/15/2015 9:43 AM, Axb wrote:

Karsten's GUDO plugin also uses uri_to_domain

What do we have to replace that function with?


The uri_to_domain is now in 
Mail::SpamAssassin::RegistryBoundaries::uri_to_domain.


Reiterating the announcement:

Notable Internal changes


Mail::SpamAssassin::Util::RegistrarBoundaries is being replaced by
Mail::SpamAssassin::RegistryBoundaries so that new TLDs can be updated
via 20_aux_tlds.cf delivered via sa-update.

The $VALID_TLDS_RE global in registrar boundaries is deprecated but kept
for third-party plugin compatibility.  It may be removed in a future
release. See Mail::SpamAssassin::Plugin::FreeMail for an example of the
new way of obtaining a valid list of TLDs.

The following functions and variables will be removed in the next
release after 3.4.1 excepting any emergency break/fix releases
immediately after 3.4.1:
  Mail::SpamAssassin::Util::RegistrarBoundaries::is_domain_valid
  Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain
  Mail::SpamAssassin::Util::RegistrarBoundaries::split_domain
  Mail::SpamAssassin::Util::uri_to_domain
  Mail::SpamAssassin::Util::RegistrarBoundaries::US_STATES
Mail::SpamAssassin::Util::RegistrarBoundaries::THREE_LEVEL_DOMAINS
Mail::SpamAssassin::Util::RegistrarBoundaries::TWO_LEVEL_DOMAINS
  Mail::SpamAssassin::Util::RegistrarBoundaries::VALID_TLDS_RE
  Mail::SpamAssassin::Util::RegistrarBoundaries::VALID_TLDS

This change should only affect 3rd party plugin authors who will need
to update their code to utilize Mail::SpamAssassin::RegistryBoundaries.

Also from RegistrarBoundaries:

=item $domain = trim_domain($fqdn)
[snip]
This function has been moved !!! See 
Mail::SpamAssassin::RegistryBoundaries !!!

This is left as transition fallback for third party plugins.
It will be removed in the future.
=cut
Er, misread AXB's email. In any event, it's pretty thoroughly documented 
so programmers should notice immediately when their function calls start 
vanishing.


Re: DNSWL fp and other problems

2015-05-11 Thread Joe Quinn

On 5/11/2015 9:42 AM, Alex Regan wrote:

Hi,

I have a fp that was passed through thomsonreuters, hitting 
RCVD_IN_DNSWL_HI, receiving -5 points, from an obvious hacked account.


http://pastebin.com/5LYS7s2v

This is with v3.4.1, but an older bayes database, so perhaps it needs 
to be rebuilt. Even with BAYES_99, it still wouldn't have been tagged 
properly, however.


I'm curious if there's anything further that could have been done to 
block this outside of a body rule matching this specific pattern?


Is it also interesting that thomsonreuters.com has no SPF information?

Thanks,
Alex
It's definitely common to find domains hitting on 
KAM_LAZY_DOMAIN_SECURITY. You might bump the score of that rule into the 
3-4 range in addition to fixing the Bayes classification and writing a 
specific rule, however it would depend heavily on what your ham is. The 
potential for FP is huge.


In an ideal world, KAM_LAZY_DOMAIN_SECURITY would be poison-pill but 
there's just too many legitimate places that pay no regard to 
anti-forgery mechanisms.


Re: Particularly annoying spam

2015-05-01 Thread Joe Quinn

On 5/1/2015 10:55 AM, Larry Rosenman wrote:

http://pastebin.com/4gck7uLD

This one and one's like it seem to get through multiple times/day.

Any help here?  Today's is WITH 3.4.1..

That's a variant on a pretty old campaign that I haven't seen get 
through in a long while.


I've updated KAM.cf so it hits your sample, which you can set a cronjob 
to download from here:

http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf

The rule it will hit on is KAM_SALEA.


Re: TxRep $msgscore warning

2015-04-30 Thread Joe Quinn

On 4/30/2015 9:10 AM, Birta Levente wrote:

On 30/04/2015 15:55, Joe Quinn wrote:

On 4/30/2015 7:09 AM, Birta Levente wrote:

Hi

I saw the bug report about TxRep warning:
_WARN: Use of uninitialized value $msgscore in addition (+) at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 
1415.

_WARN: Use of uninitialized value $msgscore in subtraction (-)


I just wonder if there is any workaroung for that or if exists any 
effect of this warning?


Thanks

We know about the issue but have been having trouble reproducing it 
in a way we can experiment on to find where the bug is.


Can you generate a reproduction or post samples and relevant txrep 
rows to this bug?

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7164


I will, but only next week.
What you mean in samples? Full email with body or just headers?

Full email would be best. It can be something contrived if it can 
reproduce the issue on your system. And since we're not sure what causes 
it, your TxRep configuration and any rows of your txrep table that match 
will be helpful too.


The warning should be harmless and won't affect your mail flow in the 
meantime.


Re: TxRep $msgscore warning

2015-04-30 Thread Joe Quinn

On 4/30/2015 9:22 AM, Joe Quinn wrote:

On 4/30/2015 9:10 AM, Birta Levente wrote:

On 30/04/2015 15:55, Joe Quinn wrote:

On 4/30/2015 7:09 AM, Birta Levente wrote:

Hi

I saw the bug report about TxRep warning:
_WARN: Use of uninitialized value $msgscore in addition (+) at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 
1415.

_WARN: Use of uninitialized value $msgscore in subtraction (-)


I just wonder if there is any workaroung for that or if exists any 
effect of this warning?


Thanks

We know about the issue but have been having trouble reproducing it 
in a way we can experiment on to find where the bug is.


Can you generate a reproduction or post samples and relevant txrep 
rows to this bug?

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7164


I will, but only next week.
What you mean in samples? Full email with body or just headers?

Full email would be best. It can be something contrived if it can 
reproduce the issue on your system. And since we're not sure what 
causes it, your TxRep configuration and any rows of your txrep table 
that match will be helpful too.


The warning should be harmless and won't affect your mail flow in the 
meantime.
Can you also include how you call spamassassin? Are you using built-in 
glue from a milter like MD, or spamassassin, or spamc/spamd?


Re: TxRep $msgscore warning

2015-04-30 Thread Joe Quinn

On 4/30/2015 7:09 AM, Birta Levente wrote:

Hi

I saw the bug report about TxRep warning:
_WARN: Use of uninitialized value $msgscore in addition (+) at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.

_WARN: Use of uninitialized value $msgscore in subtraction (-)


I just wonder if there is any workaroung for that or if exists any 
effect of this warning?


Thanks

We know about the issue but have been having trouble reproducing it in a 
way we can experiment on to find where the bug is.


Can you generate a reproduction or post samples and relevant txrep rows 
to this bug?

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7164


Re: v=spf1 +all

2015-04-24 Thread Joe Quinn

On 4/24/2015 9:38 AM, Reindl Harald wrote:


Am 24.04.2015 um 15:22 schrieb Dianne Skoll:

On Fri, 24 Apr 2015 15:17:45 +0200
Reindl Harald h.rei...@thelounge.net wrote:


v=spf1 exists:gmail.com -all



makes no sense - the spammer don't own the domain in most cases and
if they do then they just don't add a SPF policy to use it with
infected clients


Spammers often register and use throwaway domains.  And check how the
exists: mechanism works


well, and how does SPF become part of the game in the case of a throw-away 
domain as long as SPF_NONE scores 0 - why in the world should a 
spammer add a TXT record to a throw-away domain?



Because passing SPF causes other checks to not trigger. For instance, 
KAM.cf has a lot of rules that meta on KAM_LAZY_DOMAIN_SECURITY. The 
default spamassassin rules also meta extensively on SPF failure, via 
__NOT_SPOOFED.


Re: v=spf1 +all

2015-04-24 Thread Joe Quinn

On 4/24/2015 11:23 AM, Dianne Skoll wrote:

On Fri, 24 Apr 2015 16:20:41 +0100
Paul Stead paul.st...@zeninternet.co.uk wrote:


I've had thoughts of an extension which calculates the number of IP
addresses specified in an SPF record, then calculating the % of
world-wide addresses this SPF declares... I don't seem to be able to
bend the Perl SPF module to spit out any numbers etc so seems it would
have to be coded separately

Someone sent me off-list some Perl that does that.  I haven't looked closely
at it.  If that person is on this list, maybe he'll send it on-list?

Regards,

Dianne.
I suppose it's safe enough to post publicly. Be aware that it's just a 
proof of concept and not tested thoroughly enough to guarantee it's 
correct, performant, or even if it terminates in all cases.


Theoretically, it does the following
detect +all and ?all (both of which specify to deliver without marking)
detect coverage of the IPv4 and v6 address spaces (by /16)
detect when followed records exceed a max depth
detect when an SPF record loops on itself
detect uninterpolated exists
detect syntax errors in exists macros

It also stores IP coverage as a bitmask, so it should measure somewhere 
around 16k - 20k of memory consumption as well. Script is attached, 
anyone can feel free to adapt it for SA.
use strict;
use warnings;

use Net::DNS;
use Net::IP;

# fetch spf record for domain
my $argument_domain = $ARGV[0];

print check_domain($argument_domain) . "\n";

# returns one of "not useless", "useless - $reason", "gave up - $reason", "invalid - $reason"
# for SPF syntax, see http://www.openspf.org/SPF_Record_Syntax
# for macro syntax, see http://www.openspf.org/RFC_4408#macros
sub check_domain {
  my ($domain, %params) = @_;
  my $dns = Net::DNS::Resolver->new;
  my $query = $dns->search($domain, 'TXT') or die "Error performing TXT query for $domain! " . $dns->errorstring;

  if (not defined $params{'domains_seen'}) {
$params{'domains_seen'} = [];
  }

  if (grep {$_ eq $domain} @{$params{'domains_seen'}}) {
    return "invalid - detected domain loop beginning with $domain";
  }

  push(@{$params{'domains_seen'}}, $domain);

  $params{'iteration'} ||= 1;
  $params{'max_iterations'} ||= 40;

  # build array of /16s for ip range masking
  # an spf record is useless if it allows at least one ip address in every /16
  # this is a messy heuristic to avoid resource exhaustion, especially with ipv6
  # array is 2 ** 16 flags stored as 32-bit bitmasks (each mask holding 2 ** 5 flags)
  if (not defined $params{'ipv4_coverage'}) {
$params{'ipv4_coverage'} = [];
$#{$params{'ipv4_coverage'}} = 2 ** (16 - 5) - 1;
  }
  if (not defined $params{'ipv6_coverage'}) {
$params{'ipv6_coverage'} = [];
$#{$params{'ipv6_coverage'}} = 2 ** (16 - 5) - 1;
  }

  if ($params{'iteration'} > $params{'max_iterations'}) {
    return "gave up - max dns query iteration limit ($params{'max_iterations'}) reached";
  }

  foreach my $result ($query->answer) {
    next unless $result->type eq 'TXT';
    my $spf_line = $result->txtdata;

if ($spf_line =~ /^v=spf[12]/i) {
  # split into clauses
  my @clauses = split / /, $spf_line;

  # first, search for replace and operate on that instead
  foreach my $clause (@clauses) {
if ($clause =~ /^redirect=(.*)$/) {
  my $domain = $1;
  if ($domain =~ /%[{_-]/) {
    return "gave up - macros in redirect modifier not supported ($domain)";
  } elsif ($domain =~ /%[^{_%-]/) {
    return "invalid - syntax error in macro interpolation for $domain";
  } else {
# format escaped percent literals
$domain =~ s/%%/%/g;

# return recursed result
    return check_domain($domain, %params, iteration => $params{'iteration'} + 1);
  }
}
  }

  foreach my $clause (@clauses) {
# for each clause that is pass or neutral

# clauses default to +
# + (pass) and ? (neutral) both specify to deliver mail
# - (fail) and ~ (soft fail) specify to deliver or mark
# we don't care about - and ~ results because they can't be used to 
falsely improve score
next if $clause =~ /^[-~]/;

# if ip address or range, add to ip coverage
# track ipv4 and ipv6 separately by /16
if ($clause =~ /^.?ip4:(.*)/) {
  my $address = $1;
  mark_ip_ranges($params{'ipv4_coverage'}, $params{'ipv6_coverage'}, $address);
} elsif ($clause =~ /^.?ip6:(.*)/) {
  my $address = $1;
  mark_ip_ranges($params{'ipv4_coverage'}, $params{'ipv6_coverage'}, $address);
} elsif ($clause =~ /^.?all/) {
  # if +all, rule is clearly useless
  return "useless - use of universal pass rule $clause";
} elsif ($clause =~ /^.?exists:(.*)/) {
  my $exists_domain = $1;

  # if using an exists rule without macros, rule is clearly useless
  if ($exists_domain !~ /%{/) {
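
# The attachment is truncated here in the archive, so the mark_ip_ranges
# helper the script calls never appears. Purely as a guess at what it might
# have looked like under the /16 bitmask layout described in the message
# (NOT the original code; only plain IPv4 a.b.c.d and a.b.c.d/nn handled):
sub mark_ip_ranges {
  my ($ipv4_coverage, $ipv6_coverage, $address) = @_;

  # only sketching IPv4 here; the real script presumably leaned on Net::IP
  return unless $address =~ m{^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}(?:/(\d{1,2}))?$};
  my ($o1, $o2, $prefix) = ($1, $2, defined $3 ? $3 : 32);

  my $count = $prefix >= 16 ? 1 : 2 ** (16 - $prefix);   # how many /16s the range spans
  my $first = ($o1 * 256 + $o2) & ~($count - 1);         # align to the start of the range

  for my $block ($first .. $first + $count - 1) {
    # set this /16's flag inside the packed 32-bit masks
    $ipv4_coverage->[$block >> 5] = ($ipv4_coverage->[$block >> 5] // 0) | (1 << ($block & 31));
  }
}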
 

Re: Awl on Redis

2015-04-17 Thread Joe Quinn

On 4/17/2015 7:58 AM, Kevin A. McGrail wrote:

On 4/17/2015 6:46 AM, ma...@nucleus.it wrote:

Hi to all,
a saw that from spamassassin 3.4 Bayes can be stored on a Redis
database.

Is it possible also for Awl (auto_whitelist) ?
Or maybe in the future ?
We are currently looking at TxRep as a replacement for AWL but no, 
neither of them lends themselves to a Redis backend.  Perhaps someone 
smarter than I can figure out how to do that!


Regards,
KAM
Or you could be that smarter someone! Submit a patch and it might be 
accepted. We're always looking for contributors.


Re: blacklist_uri_host

2015-04-03 Thread Joe Quinn

On 4/2/2015 4:23 PM, Axb wrote:

Gals (3?) & Guys

If you're being plagued by the new TLD spams AND using SA 3.4.x
don't forget blacklist_uri_host

per default it's scored
score URI_HOST_IN_BLACKLIST 100

but you may want to be less radical and just use a score but not treat it 
as a poison pill rule, so:


___
# Adjust score to match the weather
ifplugin Mail::SpamAssassin::Plugin::WLBLEval
score URI_HOST_IN_BLACKLIST  3.0

blacklist_uri_host science
blacklist_uri_host work
blacklist_uri_host click

endif


and just add the abused TLD_du_jour

 Happy Easter - May the bunny be with you.

Axb


By the way, you can add .rocks to your list. We're doing something 
similar in KAM.cf as KAM_OTHER_BAD_TLD.


Re: RBL/SPF if header exists

2015-03-31 Thread Joe Quinn

On 3/31/2015 12:12 PM, Mike Cardwell wrote:

* on the Tue, Mar 31, 2015 at 11:59:39AM -0400, Joe Quinn wrote:


Is it possible to enable or disable RBL and/or SPF checks according to
the existence or lack of a header?

Without going into too many details, I need a way of transmitting to
SpamAssassin at scan-time that it should not run SPF or RBL checks on
a particular message, which isn't based on a hardcoded per user or
IP setting.

Do you need the actual testing disabled, or just the score?

Ideally I'd like to disable the tests, but if I can just remove the
score, that would be sufficient.


You can fairly easily write a meta that reverses the score of each RBL
and SPF rule if your condition fires.

Any chance you could point me to an example of how to do this?

Here's an example from when Yahoo's internal Received headers were 
hitting RCVD_ILLEGAL_IP, taken from here:

http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf

  header __KAM_YAHOO_MISTAKE1 From =~ /\@yahoo\./i

  meta KAM_YAHOO_MISTAKE (SPF_PASS && __KAM_YAHOO_MISTAKE1 && RCVD_ILLEGAL_IP)
  describe KAM_YAHOO_MISTAKE Reversing score for some idiotic Yahoo received headers

  score KAM_YAHOO_MISTAKE -3.0

This rule undoes RCVD_ILLEGAL_IP, which has a score of 3.0.


Re: RBL/SPF if header exists

2015-03-31 Thread Joe Quinn

On 3/31/2015 12:23 PM, Mike Cardwell wrote:

* on the Tue, Mar 31, 2015 at 12:15:31PM -0400, Joe Quinn wrote:

Here's an example from when Yahoo's internal Received headers were
hitting RCVD_ILLEGAL_IP, taken from here:
http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf

header __KAM_YAHOO_MISTAKE1 From =~ /\@yahoo\./i

meta KAM_YAHOO_MISTAKE (SPF_PASS && __KAM_YAHOO_MISTAKE1 && RCVD_ILLEGAL_IP)
describe KAM_YAHOO_MISTAKE Reversing score for some idiotic Yahoo received headers
score KAM_YAHOO_MISTAKE -3.0

This rule undoes RCVD_ILLEGAL_IP, which has a score of 3.0.

Thanks for the example. The only problem with the above is that I believe
I would have to write a rule for every single RBL and keep those rules
up to date whenever a new RBL is added or score updated by upstream.
Is there any way of avoiding that?

Not an easy way that I know of offhand. Others might know, or if you 
have the coding ability you might try writing a plugin to automate at 
least tracking the RBL scores.


I remember there was a similar question asked a few months ago about 
canceling an AWL score or something similar which might be useful. I 
can't find it in Google, but you might have luck finding a better 
solution from that thread.


Re: RBL/SPF if header exists

2015-03-31 Thread Joe Quinn

On 3/31/2015 11:45 AM, Mike Cardwell wrote:

Is it possible to enable or disable RBL and/or SPF checks according to
the existence or lack of a header?

Without going into too many details, I need a way of transmitting to
SpamAssassin at scan-time that it should not run SPF or RBL checks on
a particular message, which isn't based on a hardcoded per user or
IP setting.

Do you need the actual testing disabled, or just the score? You can 
fairly easily write a meta that reverses the score of each RBL and SPF 
rule if your condition fires.


Re: Spamassassin not catching spam (Follow-up)

2015-03-26 Thread Joe Quinn

On 3/26/2015 9:19 AM, Reindl Harald wrote:



Am 26.03.2015 um 14:13 schrieb David F. Skoll:

On Thu, 26 Mar 2015 14:02:19 +0100
Robert Schetterer r...@sys4.de wrote:


Silent discard mail is mostly forbidden in the EU,


Is it?  Could you perhaps point me to the EU directive stating this?
I'm sure there must be lots of qualifications


in germany 2 years jail

§ 303a StGB - Data alteration (Datenveränderung)

(1) Whoever unlawfully deletes, suppresses, renders unusable or alters 
data (§ 202a (2)) shall be punished with imprisonment of up to two years 
or with a fine.


That's just the penalty clause, it doesn't define what's considered 
unlawful deletion of data.

