Re: RCVD_IN_SORBS_SPAM and google IPs

2016-09-08 Thread RW
On Thu, 8 Sep 2016 15:53:00 -0500 (CDT)
Shane Williams wrote:

> Hey all,
> 
> I'm seeing google IP ranges hit the RCVD_IN_SORBS_SPAM rule, and in
> digging deeper, I realize that there are zero hits on this rule for
> the two weeks prior to Aug. 31, and now I'm seeing it thousands of
> times per week (not just against google IPs).
> 
> Was this rule added/changed/re-scored in a recent sa-update? 

It was commented out for a long time because it had a delisting fee,
but was recently re-enabled.

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=2221#c16


Re: RCVD_IN_SORBS_SPAM and google IPs

2016-09-08 Thread li...@rhsoft.net



On 08.09.2016 at 22:53, Shane Williams wrote:

> I'm seeing google IP ranges hit the RCVD_IN_SORBS_SPAM rule, and in
> digging deeper, I realize that there are zero hits on this rule for
> the two weeks prior to Aug. 31, and now I'm seeing it thousands of
> times per week (not just against google IPs).
>
> Was this rule added/changed/re-scored in a recent sa-update?


rules are re-scored all the time

2.397 is *way* too high because SORBS has a ton of different listing 
categories, and landing on "spam" is not hard for large providers, which 
*in fact* send some amount of spam all day long, since large freemail 
providers have no way to avoid it completely


"spam.dnsbl.sorbs.net" (127.0.0.6 response) gets 3 points in postscreen 
here and 1.0 in SA; in both cases rejection begins at 8.0
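For context on the 127.0.0.6 code above: a DNSBL is queried by reversing the IPv4 octets and appending the zone, and the A-record answer encodes the listing category. A minimal Python sketch of that lookup using only the standard library; the `WEIGHTS` table is a hypothetical echo of the 3 (postscreen) / 1.0 (SA) numbers in the post, not anyone's published config. `gethostbyname_ex` is used because a DNSBL can return several codes for one IP.

```python
import ipaddress
import socket

# Hypothetical weights echoing the post: the 127.0.0.6 ("spam") answer from
# spam.dnsbl.sorbs.net counts 3 in postscreen and 1.0 in SpamAssassin.
# Other SORBS return codes are deliberately left out of this sketch.
WEIGHTS = {"127.0.0.6": {"postscreen": 3, "spamassassin": 1.0}}


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the IP
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_answers(ip: str, zone: str) -> list:
    """All A-record answers (e.g. ['127.0.0.6']); empty list if not listed."""
    try:
        return socket.gethostbyname_ex(dnsbl_query_name(ip, zone))[2]
    except (socket.gaierror, socket.herror):  # NXDOMAIN: not listed
        return []


def total_weight(answers, engine: str) -> float:
    """Sum the weights of all returned codes for one engine; unknown codes add 0."""
    return sum(WEIGHTS.get(a, {}).get(engine, 0) for a in answers)
```

For example, `dnsbl_query_name("192.0.2.10", "spam.dnsbl.sorbs.net")` gives `10.2.0.192.spam.dnsbl.sorbs.net`, and `total_weight(dnsbl_answers(...), "spamassassin")` would then be compared against whatever threshold your setup uses.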




Re: RCVD_IN_SORBS_SPAM and google IPs

2016-09-08 Thread Zinski, Steve
I’m seeing the same thing here; I’ve had to lower that score. I'm also 
seeing lots of RCVD_IN_SORBS_WEB false positives.


On 9/8/16, 4:53 PM, "Shane Williams"  wrote:

Hey all,

I'm seeing google IP ranges hit the RCVD_IN_SORBS_SPAM rule, and in
digging deeper, I realize that there are zero hits on this rule for
the two weeks prior to Aug. 31, and now I'm seeing it thousands of
times per week (not just against google IPs).

Was this rule added/changed/re-scored in a recent sa-update?  I looked
at ruleqa.spamassassin.org, and just at a glance notice that the rule
doesn't seem to be in commits previous to Aug. 30, but I may totally
be reading the site's information wrong.

I've turned the score down to a tiny, but non-zero value for now,
because it seems to be pushing legit emails close to (if not over) the
local threshold.

-- 
Public key #7BBC68D9 at| Shane Williams
http://pgp.mit.edu/|  System Admin - UT CompSci
=--+---
All syllogisms contain three lines |  sha...@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew





RCVD_IN_SORBS_SPAM and google IPs

2016-09-08 Thread Shane Williams

Hey all,

I'm seeing google IP ranges hit the RCVD_IN_SORBS_SPAM rule, and in
digging deeper, I realize that there are zero hits on this rule for
the two weeks prior to Aug. 31, and now I'm seeing it thousands of
times per week (not just against google IPs).

Was this rule added/changed/re-scored in a recent sa-update?  I looked
at ruleqa.spamassassin.org, and just at a glance notice that the rule
doesn't seem to be in commits previous to Aug. 30, but I may totally
be reading the site's information wrong.

I've turned the score down to a tiny, but non-zero value for now,
because it seems to be pushing legit emails close to (if not over) the
local threshold.

--
Public key #7BBC68D9 at| Shane Williams
http://pgp.mit.edu/|  System Admin - UT CompSci
=--+---
All syllogisms contain three lines |  sha...@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew


Re: New Mail::SpamAssassin::Plugin::HeadersEqual plugin

2016-09-08 Thread Amir Caspi
> On Sep 8, 2016, at 10:05 AM, apache.org+spamassas...@daniel-rudolf.de wrote:
> 
> As you can see, SA will increase the score by 0.5 when the From: and 
> Return-Path: headers don't match ("ne" for "not equal").

This particular rule will FP for most mailing list emails... including this 
one.  (Return-Path is to a special bounce-catching address.)  That's not to say 
the plugin isn't useful, but this particular rule is dangerous...

--- Amir



Re: drive-by malware customized to the From.RealName of actual Friends

2016-09-08 Thread John Hardin

On Thu, 8 Sep 2016, Chip M. wrote:

> Last week, I sent John Hardin some spamples, and he very kindly
> wrote & masschecked rules over the long weekend (Geek!). :)
> He found a significant FP risk.


It's possible meta'ing with some of the conditions mentioned above would 
reduce the FPs.


Unfortunately there is very little of this in the masscheck spam corpus so 
even if we got a totally clean rule there might not be enough activity to 
get a (safe) rule published.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The question of whether people should be allowed to harm themselves
  is simple. They *must*.   -- Charles Murray
---
 9 days until the 229th anniversary of the signing of the U.S. Constitution


New Mail::SpamAssassin::Plugin::HeadersEqual plugin

2016-09-08 Thread apache . org+spamassassin

Hi,

I would like to share a (pretty simple) SA plugin I developed recently 
for a basic task: comparing message headers against each other.


It is mostly useful for comparing the various address headers of an email; 
a frequent use case is comparing the Return-Path: and From: headers:


  loadplugin Mail::SpamAssassin::Plugin::HeadersEqual [/path/to/HeadersEqual.pm]
  header SENDER_MISMATCH eval:headers_equal("ne", "From:addr", "Return-Path:addr")
  score  SENDER_MISMATCH 0.5

As you can see, SA will increase the score by 0.5 when the From: and 
Return-Path: headers don't match ("ne" for "not equal"). The plugin 
passes each header argument to Mail::SpamAssassin::PerMsgStatus->get(). 
In particular, this allows you to append :raw, :addr and :name to header 
names to adjust what is compared. For example, :addr causes SA to remove 
everything except the first email address from the header field.


You're not limited to comparing two headers; you can use an arbitrary 
number of headers. Here's the full syntax:


  header HEADERS_EQUAL eval:headers_equal(header1, header2, ...)
  header HEADERS_EQUAL eval:headers_equal("eq", header1, header2, ...)
  header HEADERS_NOT_EQUAL eval:headers_equal("ne", header1, header2, ...)

This kind of functionality is traditionally implemented with regexps; 
however, regexps are very inflexible and expensive for this particular 
use case. AFAIK there's no similar plugin yet.
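To make the comparison semantics concrete, here is a rough Python emulation, not the plugin itself (which is Perl). Assumptions, all labelled: headers come from a plain dict, `:addr`/`:name` are approximated with `email.utils.getaddresses` rather than SA's own parser, and `"ne"` is taken to mean "not all values identical". The test data also reproduces Amir's point: a list message whose Return-Path is a bounce address fires the `"ne"` rule.

```python
from email.utils import getaddresses


def header_value(headers: dict, spec: str) -> str:
    """Resolve 'Name' or 'Name:suffix' against a dict of raw header values.

    Approximation only: ':addr' keeps the first address, ':name' the first
    display name, anything else returns the raw value.
    """
    name, _, suffix = spec.partition(":")
    raw = headers.get(name, "")
    if suffix in ("addr", "name"):
        pairs = getaddresses([raw])
        if not pairs:
            return ""
        display, addr = pairs[0]
        return addr if suffix == "addr" else display
    return raw


def headers_equal(headers: dict, *args: str) -> bool:
    """Hypothetical emulation of eval:headers_equal([op,] h1, h2, ...)."""
    op = "eq"  # default operator when none is given
    if args and args[0] in ("eq", "ne"):
        op, args = args[0], args[1:]
    values = {header_value(headers, spec) for spec in args}
    all_same = len(values) == 1
    # Assumption: "ne" is true whenever the values are not all identical.
    return all_same if op == "eq" else not all_same
```

With a list message such as `{"From": "Amir Caspi <amir@example.org>", "Return-Path": "<list-bounces@lists.example.net>"}`, `headers_equal(msg, "ne", "From:addr", "Return-Path:addr")` is true, which is exactly the mailing-list FP Amir describes.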


You can find it on GitHub Gist:
https://gist.github.com/PhrozenByte/4af0045b6507ceb24e4231988c7d9fcf

Feedback is highly appreciated!

Cheers,
Daniel


Re: spample of "data" URL in well-crafted Phish

2016-09-08 Thread John Hardin

On Thu, 8 Sep 2016, Chip M. wrote:

> On Sat, 3 Sep 2016, John Hardin wrote:
>
>> I've tweaked the FP avoidance a bit, maybe that will be enough
>> to get the S/O up high enough to publish it.
>
> John, do you have any detailed info about the Ham hits?


It's possible to look up what rules hit those messages, but to see the 
content and judge what might need to be changed I'd have to get in touch 
with the corpus owner and ask them about the messages - whether they were 
correctly classified as ham or spam, and whether they'd be willing to 
share them. That may not be possible as ham corpora are often private and 
sensitive.


To view the rule hits in masscheck, assuming that's of interest:
1. go to the detail page for the rule you're interested in, e.g.:
http://ruleqa.spamassassin.org/20160907-r1759562-n/URI_DATA/detail

2. in the "set 0, broken down by contributor", click on any links in the 
HAM% column.


You'll see something like:
.  1 
/data/archive/ham-misc//1433183357.M606569P40031.fumail03.cleanmail.ch,S=39348,W=40036%3A2,S 
HTML_MESSAGE,T_DKIM_INVALID,T_FSL_RCVD_EX_3,T_FSL_RCVD_TR_2,T_FSL_RCVD_UT_3,T_KAM_HTML_FONT_INVALID,T_NOT_A_PERSON,T_REMOTE_IMAGE,URI_DATA,URI_TRUNCATED,__ANY_TEXT_ATTACH,__ANY_TEXT_ATTACH_DOC,__BODY_TEXT_LINE,__BODY_TEXT_LINE,__BODY_TEXT_LINE,__BUGGED_IMG,__CT,__CTYPE_CHARSET_QUOTED,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__CTYPE_MULTIPART_ANY,__DKIM_EXISTS,__DOS_HAS_ANY_URI,__DOS_HAS_LIST_UNSUB,__DOS_RCVD_MON,__DOS_RCVD_SUN,__DOS_RELAYED_EXT,__FROM_ENCODED_QP,__FROM_FULL_NAME,__FROM_NEEDS_MIME,__FSL_COUNT_EXTERN,__FSL_COUNT_EXTERN,__FSL_COUNT_EXTERN,__FSL_COUNT_TRUST,__FSL_COUNT_TRUST,__FSL_COUNT_UNTRUST,__FSL_COUNT_UNTRUST,__FSL_COUNT_UNTRUST,__FSL_HAS_LIST_UNSUB,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_CAMPAIGN,__HAS_DATE,__HAS_DKIM_SIGHD,__HAS_DOMAINKEY_SIG,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_RCVD,__HAS_REPLY_TO,__HAS_SUBJECT,__HAS_TO,__HAS_URI,__HAVE_BOUNCE_RELAYS,__HTML_LINK_IMAGE,__JM_REACTOR_DATE,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_RELAY_NO_AUTH,__LIST_PARTIAL,__LOCAL_PP_NONPPURL,__MIME_HTML,__MIME_VERSION,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__MSGID_OK_HOST,__NAKED_TO,__NONEMPTY_BODY,__NOT_A_PERSON,__RATWARE_0_TZ_DATE,__RCD_RDNS_MX_MESSY,__REMOTE_IMAGE,__REPLYTO_EXISTS,__SANE_MSGID,__SINGLE_WORD_LINE,__SINGLE_WORD_LINE,__SUBJ_2UPPER,__SUBJ_4LOWER,__SUBJ_HAS_WORDS,__SUBJ_NOT_SHORT,__TAG_EXISTS_BODY,__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__TAG_EXISTS_META,__TOCC_EXISTS,__TO_NO_ARROWS_R,__TVD_BODY,__TVD_MIME_ATT_TP,__URI_DATA,__URI_DBL_DOM,__URI_MAILTO 
time=1433136576,scantime=0,format=f,reuse=no,set=0


...which is identification of the message in their corpora, and a list of 
all the rules that hit.
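If you pull a batch of these lines from ruleqa, a small parser makes it easy to see which other rules co-fire with the one under study. A Python sketch, assuming the five whitespace-separated fields of the sample above (a flag, a score, the message ID, the comma-separated rule list, and key=value metadata) and that the message ID contains no spaces; the field names are my own labels, not masscheck's.

```python
def parse_masscheck_line(line: str) -> dict:
    """Split one masscheck result line into labelled fields (layout assumed)."""
    flag, score, msg_id, rules, meta = line.split()  # ValueError if layout differs
    return {
        "flag": flag,
        "score": score,
        "id": msg_id,  # may itself contain commas and '=' (maildir file names)
        "rules": rules.split(","),
        "meta": dict(item.split("=", 1) for item in meta.split(",")),
    }


def scored_rules(record: dict) -> list:
    """Drop '__' sub-rules, which are meta building blocks and carry no score."""
    return [r for r in record["rules"] if not r.startswith("__")]
```

Feeding the sample line through `scored_rules` leaves only the handful of real rules (HTML_MESSAGE, URI_DATA, the T_ rules and so on), which is usually what you want when eyeballing an FP.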



> I just datamined my three best corpora, from the beginning of
> 2014 thru this weekend, and found zero FPs, except for two hits
> on that "img" test.  My data does NOT prove it's impossible for
> anybody else, but it does seem odd, so I'm wondering if the
> SA MassCheck mechanism has some means for the contributor to
> pull out the corpses of specific hits.


Yes. Given that ID on the first line the corpus owner can find the message 
in question, review it, potentially fix misclassifications (that has 
happened before), etc.


There's one more exclusion I can add that will take out the last of the 
FPs in masscheck.



> If it doesn't, that would be a cool feature to add. :)


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The Constitution is a written instrument. As such its meaning does
  not alter. That which it meant when adopted, it means now.
-- U.S. Supreme Court
   SOUTH CAROLINA v. US, 199 U.S. 437, 448 (1905)
---
 9 days until the 229th anniversary of the signing of the U.S. Constitution

Re: Anyone else just blocking the ".top" TLD?

2016-09-08 Thread Lindsay Haisley
On Thu, 2016-09-08 at 13:44 +, Chip M. wrote:
> On Thu, 8 Sep 2016, "lists [at] rhsoft.net" wrote:
> > 
> > i get a diff-output per mail each time the mailserver configs
> > are changing
> That's a completely valid approach, and I am a big fan of
> pre-emptive first strike (only as applied to potentially evil
> email).
> 
> However, the vast majority of those TLDs will never
> "go rogue", so I prefer to block on actual abuse
> (Jason's approach), or likelihood of abuse, specifically, very
> low cost.  Jason appears to have much higher volume than I do,
> so he'd be a good source of data for me and others.

The issue is much more nuanced. There are registrars who offer what's
called "domain name tasting", on newly created TLDs. Under this policy,
a name may be registered and put into service _before_ payment is made
for the registration. At one time Network Solutions had this policy
even for the common TLDs, .com, .org, etc. Spammers pay nothing for the
use of such a name, and discard it for a new one before payment for the
name is required.

One of the choke points for commercial spammers is the provision of an
authoritative name server for their domain names, and I've found it
very effective to do a recursive sequence of name server lookups,
starting from the domain name (DN) in the HELO or EHLO greeting, until a
name server is found with a DN for which the authoritative name server
has the same DN. This boils down to a list of fewer than 10 domain
names. I apply a rather strict form of rate limiting to messages
originating from the same /24 IP address block if the HELO DN resolves
to a name on this list. So far this has been 100% effective, with no
evidence of false positives.
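One possible reading of the name-server walk and /24 throttling described above, as a Python sketch. The NS lookup is passed in as a function (a hypothetical `ns_lookup` hook) so the logic runs without DNS; in practice you would plug in a resolver library. `registered_domain` is a naive last-two-labels reduction, and the limiter's limit and window are placeholders; the poster's actual courier-pythonfilter code is not shown in the thread.

```python
import ipaddress
import time
from collections import defaultdict, deque


def registered_domain(hostname: str) -> str:
    """Naive 'last two labels' reduction; wrong for suffixes such as co.uk."""
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


def ns_root(domain: str, ns_lookup, max_hops: int = 10):
    """Follow NS records until a domain is served by name servers under itself.

    ns_lookup(domain) must return a list of NS host names (hypothetical hook).
    Returns the self-hosted domain, or None if the chain breaks or runs
    longer than max_hops (e.g. a loop).
    """
    current = registered_domain(domain)
    for _ in range(max_hops):
        ns_hosts = ns_lookup(current)
        if not ns_hosts:
            return None
        ns_domains = {registered_domain(h) for h in ns_hosts}
        if current in ns_domains:
            return current  # the authoritative NS has the same domain name
        current = sorted(ns_domains)[0]  # follow one branch deterministically
    return None


class SlashTwentyFourLimiter:
    """Allow at most `limit` messages per IPv4 /24 in a sliding `window` (seconds)."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.seen = defaultdict(deque)  # /24 network -> recent timestamps

    def allow(self, ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        net = ipaddress.IPv4Network(f"{ip}/24", strict=False)
        stamps = self.seen[net]
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()  # forget messages that fell out of the window
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True


def should_throttle(helo, ip, ns_lookup, bad_roots, limiter, now=None) -> bool:
    """Rate-limit only when the HELO name's NS chain ends in a listed domain."""
    return ns_root(helo, ns_lookup) in bad_roots and not limiter.allow(ip, now)
```

Only HELO names whose chain ends in one of the (fewer than 10) listed domains ever touch the limiter, so ordinary senders are unaffected by it.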

This may be out of the realm of SA. I apply this test using a python
program written to work with Gordon Messmer's courier-pythonfilter for
Courier-MTA.

-- 
Lindsay Haisley   | "We have met the enemy and he is us."
FMP Computer Services |
512-259-1190  |  -- Pogo
http://www.fmp.com|




Re: Anyone else just blocking the ".top" TLD?

2016-09-08 Thread @lbutlr
On 09 Jul 2016, at 08:32, jaso...@mail-central.com wrote:
> 
> Fwiw, atm I block all of the following TLDs

> [big list]

> That list is auto-generated.  Any & all TLDs that have sent > 100 messages 
> within the last year *AND* have a spam/reject rate >= 99% get blocked by TLD, 
> never get past my mail server's 'edge', and don't impose any further load on 
> my server.
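The auto-generation criterion quoted above (more than 100 messages in the last year and a spam/reject rate of at least 99%) is simple to express. A Python sketch, assuming you already have per-TLD totals; the `stats` layout is my own choice, not Jason's.

```python
def tlds_to_block(stats: dict, min_messages: int = 100, min_bad_rate: float = 0.99) -> list:
    """stats maps TLD -> (total_messages, spam_or_rejected) over the last year.

    A TLD is blocked when total > min_messages AND bad/total >= min_bad_rate.
    """
    blocked = []
    for tld, (total, bad) in stats.items():
        if total > min_messages and bad / total >= min_bad_rate:
            blocked.append(tld)
    return sorted(blocked)
```

The volume floor matters: a TLD that sent 80 messages, all spam, stays off the list until it has enough history to judge.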

That’s a good list, but I take a different approach: I block ALL TLDs except 
for a few that I actually get mail from.

(com|net|org|edu|gov|mx|de|dk|uk|us|info|biz|eu|es|il|it|nl|name|jp)

(and I’m not sure about name anymore; I don’t think I get legit mail from that 
TLD.)

Of course, other people will have other lists, but this one works well for me.

.top is the biggest offender though, we get thousands of those.
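Where the allowlist pattern above is applied isn't stated; a Python sketch of one way to use it, assuming it is matched against the end of the sender or HELO domain. Anchoring matters: an unanchored search for `(com|...)` would also accept something like `complaint.top`.

```python
import re

# The allowlist from the post, anchored to a final ".tld" (optional trailing dot).
ALLOWED_TLD = re.compile(
    r"\.(com|net|org|edu|gov|mx|de|dk|uk|us|info|biz|eu|es|il|it|nl|name|jp)\.?$",
    re.IGNORECASE,
)


def tld_allowed(domain: str) -> bool:
    """True if the domain ends in one of the allowlisted TLDs."""
    return ALLOWED_TLD.search(domain) is not None
```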

I should write up an awk script that searches my maillog for all the tlds that 
try to connect. Well, I can throw something together in a 

Here are all the tlds that I’ve seen in the last week (only searching in 
from=<…> not helo):

.ae, .ar, .at, .au, .bd, .be, .bg, .bid, .biz, .bo, .br, .ca, .cc, .ch, .cl, 
.club, .cn, .co, .com, .coop, .cz, .date, .de, .dk, .ec, .edu, .es, .eu, .fi, 
.firewall, .fr, .gdn, .gov, .gr, .hk, .hr, .hu, .id, .ie, .il, .in, .info, .ir, 
.is, .it, .jp, .kh, .kornet, .kr, .lan, .localdomain, .lt, .lv, .ma, .mail, 
.md, .me, .men, .mk, .mobi, .mv, .mx, .my, .name, .net, .ng, .nl, .no, .nz, 
.online, .org, .orgt, .pa, .pe, .pl, .pt, .pw, .ro, .rs, .ru, .se, .sk, 
.stream, .tk, .tn, .top, .tr, .tw, .uk, .us, .vn, .website, .win, .xyz, .za

And this is the list from helo (ignoring all the IPs):

adsl, ae, ao, ar, arpa, au, bd, be, bg, bid, biz, bo, br, c, ca, cc, cl, club, 
cm, cn, co, com, cy, date, de, do, ec, edu, eg, es, eu, fi, firewall, gdn, gh, 
gov, gr, hu, id, il, in, info, internal, io, ir, it, jp, ke, kh, kornet, kr, 
la, lan, local, localdomain, lt, lv, ly, ma, mail, md, me, men, mobi, mv, mx, 
my, name, net, ni, nl, no, np, online, org, orgt, pe, pk, pl, pt, pw, rs, ru, 
sg, sk, so, space, stream, th, tk, top, tr, tv, tw, uk, us, uy, vn, website, 
win, ws, xyz, za, zw

How are people doing spam counts on a tld basis?
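A rough Python equivalent of the awk idea above: count TLDs from `from=<...>` (or `helo=<...>`) fields in a Postfix maillog, skipping null senders and IP literals. Restricting the input to `NOQUEUE: reject:` lines gives a crude per-TLD reject count to set against the totals. The regexes assume the usual Postfix log field formats and are not exhaustive.

```python
import re
from collections import Counter

FROM_RE = re.compile(r"\bfrom=<[^>@]*@([^>]+)>")  # null sender from=<> never matches
HELO_RE = re.compile(r"\bhelo=<([^>]+)>")


def tld_of(domain: str):
    """Last DNS label, lower-cased; None for an empty name."""
    label = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return label or None


def count_tlds(lines, pattern=FROM_RE) -> Counter:
    counts = Counter()
    for line in lines:
        for domain in pattern.findall(line):
            if domain.startswith("["):  # address literal such as [192.0.2.1]
                continue
            tld = tld_of(domain)
            if tld and not tld.isdigit():  # skip bare IPs used as HELO
                counts[tld] += 1
    return counts
```

For example, `count_tlds(l for l in lines if "NOQUEUE: reject:" in l)` counts only rejected attempts, and `count_tlds(lines, HELO_RE)` gives the HELO view.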




Re: Anyone else just blocking the ".top" TLD?

2016-09-08 Thread li...@rhsoft.net



On 08.09.2016 at 15:44, Chip M. wrote:

> On Thu, 8 Sep 2016, "lists [at] rhsoft.net" wrote:
>
>> i get a diff-output per mail each time the mailserver configs
>> are changing
>
> That's a completely valid approach, and I am a big fan of
> pre-emptive first strike (only as applied to potentially evil
> email).
>
> However, the vast majority of those TLDs will never
> "go rogue", so I prefer to block on actual abuse
> (Jason's approach), or likelihood of abuse, specifically, very
> low cost.  Jason appears to have much higher volume than I do,
> so he'd be a good source of data for me and others.


we require at least SPF or DNSWL for them instead of an unconditional 
reject, and the reject text contains a link to Wikipedia explaining what SPF is


the other part of using that file is to "DUNNO" specific TLDs ahead of 
the checks and to put a final line into the helo restrictions that 
applies when no DUNNO matched at all


/.*\.*/ REJECT Unacceptable HELO (Invalid TLD) see https://www.ietf.org/rfc/rfc2821.txt and https://www.ietf.org/rfc/rfc1912.txt
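The DUNNO-then-catch-all layout works because a Postfix regexp table returns the action of the first pattern that matches, and matching is case-insensitive by default. A small Python emulation of that lookup order; the table rows are a hypothetical excerpt in the style of blacklist_helo.cf, and `.*` stands in for the catch-all (the `/.*\.*/` above also matches any string).

```python
import re

# Hypothetical excerpt: known TLDs get DUNNO (no decision, continue with later
# restrictions); anything that falls through hits the final catch-all REJECT.
HELO_TABLE = [
    (re.compile(r"\.com$", re.IGNORECASE), "DUNNO"),
    (re.compile(r"\.eco$", re.IGNORECASE), "DUNNO"),
    (re.compile(r"\.vanguard$", re.IGNORECASE), "DUNNO"),
    (re.compile(r".*"), "REJECT Unacceptable HELO (Invalid TLD)"),
]


def lookup(table, key: str):
    """Action of the first matching pattern, mirroring regexp-table order."""
    for pattern, action in table:
        if pattern.search(key):
            return action
    return None
```

Order is everything here: if the catch-all were not last, every HELO would be rejected.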


 Forwarded Message 
Subject: Cron /usr/local/bin/update-spamfilter.sh
Date: Mon, 29 Aug 2016 16:30:03 +0200 (CEST)

UPDATED: /etc/postfix/blacklist_generic_ptr.cf
 1484a1485
 > /\.eco$/ DUNNO
 2375a2377
 > /\.vanguard$/ DUNNO
-
UPDATED: /etc/postfix/blacklist_helo.cf
 382a383
 > /\.eco$/ DUNNO
 1273a1275
 > /\.vanguard$/ DUNNO
-
UPDATED: /etc/postfix/blacklist_tld.cf
 271a272
 > /\.eco$/ REJECT Spam-TLD (SPF Required: .eco - see http://en.wikipedia.org/wiki/Sender_Policy_Framework)

 904a906
 > /\.vanguard$/ REJECT Spam-TLD (SPF Required: .vanguard - see http://en.wikipedia.org/wiki/Sender_Policy_Framework)

-

OK: /usr/bin/systemctl reload postfix.service



Re: Anyone else just blocking the ".top" TLD?

2016-09-08 Thread Chip M.
On Thu, 8 Sep 2016, "lists [at] rhsoft.net" wrote:
>i get a diff-output per mail each time the mailserver configs
>are changing

That's a completely valid approach, and I am a big fan of
pre-emptive first strike (only as applied to potentially evil
email).

However, the vast majority of those TLDs will never
"go rogue", so I prefer to block on actual abuse
(Jason's approach), or likelihood of abuse, specifically, very
low cost.  Jason appears to have much higher volume than I do,
so he'd be a good source of data for me and others.

IDIC... or to each his/her own preferred approach. :)
- "Chip"




Re: Anyone else just blocking the ".top" TLD?

2016-09-08 Thread li...@rhsoft.net


On 08.09.2016 at 10:33, Chip M. wrote:

> On Sat, 09 Jul 2016, jasonsu wrote:
>
>> Fwiw, atm I block all of the following TLDs
>> ...
>> men,
>> ..
>> That list is auto-generated.  Any & all TLDs that have
>> sent > 100 messages within the last year *AND* have a
>
> Great approach Jason! :)
> ".men" just recently appeared in my data, and is not showing up
> on that Surbl tld page.
>
> Please do share any more that you notice. :)


just download https://data.iana.org/TLD/tlds-alpha-by-domain.txt in a 
cronjob, compare it with the last version and re-generate your configs


i get a diff-output per mail each time the mailserver configs are changing
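A Python sketch of the cron job described above: fetch the IANA list, diff it against the previously saved copy, and render Postfix regexp lines for new TLDs, matching the `/\.eco$/ DUNNO` lines in the diff output earlier in the thread. Persisting the old copy, choosing DUNNO versus REJECT, and reloading Postfix are left to the surrounding script.

```python
import urllib.request

IANA_URL = "https://data.iana.org/TLD/tlds-alpha-by-domain.txt"


def fetch_iana_tlds(url: str = IANA_URL) -> str:
    """Download the current list (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("ascii")


def parse_iana_tlds(text: str) -> set:
    """One upper-case TLD per line; lines starting with '#' are comments."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def diff_tlds(old: set, new: set):
    """Return (added, removed) as sorted lists."""
    return sorted(new - old), sorted(old - new)


def postfix_lines(tlds, action: str) -> list:
    """One Postfix regexp-table line per TLD, e.g. '/\\.eco$/ DUNNO'.

    TLD labels contain only letters, digits and '-', none of which are regex
    metacharacters outside a bracket expression, so no escaping is done.
    """
    return [f"/\\.{tld}$/ {action}" for tld in sorted(tlds)]
```

An empty `added` list means nothing changed and the configs need not be regenerated.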



Re: Anyone else just blocking the ".top" TLD?

2016-09-08 Thread Chip M.
On Sat, 09 Jul 2016, jasonsu wrote:
>Fwiw, atm I block all of the following TLDs
...
>men,
..
>That list is auto-generated.  Any & all TLDs that have 
>sent > 100 messages within the last year *AND* have a 

Great approach Jason! :)
".men" just recently appeared in my data, and is not showing up
on that Surbl tld page.

Please do share any more that you notice. :)

".men" is going for as low as $1.49.
It's only appearing in some of my domains, but is running
between about 8% and 34% of their snowshoe spam.
- "Chip"



drive-by malware customized to the From.RealName of actual Friends

2016-09-08 Thread Chip M.
Spample:
http://puffin.net/software/spam/samples/0043_driveby_from-rn_in_url.txt
I removed 19 (of 20 original) email addresses out of the
To header, ST:TOS munged all remaining email addresses, and
munged the target URL to match the other mungings.
Everything else is exactly as received, immediately post-SA.

This campaign has been going on at a low but steady
rate (typically 0.2% to 0.4% of spam) since at least late May.
It uses very simple and effective social engineering which leads
the victim to a cracked legit-ish site, that redirects to a
drive-by malware site which is controlled by the miscreants.

*** Analysis:
The pattern is that the complete From.RealName is used as the
final subdir in the URL, with an underscore between each word
that was in the RealName.  The original cAsEs are always used
(e.g. "Montgomery Scott" goes to "/Montgomery_Scott/" and
"leonard mccoy" goes to "/leonard_mccoy/").

There are between zero and two trailing slashes.
There is always a subhost, except for the earliest instances.
There are no parameters, so the final subdir STANDS OUT well,
looking like a personal/vanity website at a free provider.
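The pattern described above can be tested directly. A Python sketch, not Chip's unpublished test: it builds the slug from the From display name (case preserved, words joined with "_"), then requires it to be the entire final path segment, with no query string and zero to two trailing slashes.

```python
import re
from urllib.parse import urlsplit


def realname_slug(realname: str) -> str:
    """'Montgomery Scott' -> 'Montgomery_Scott' (case preserved)."""
    return "_".join(realname.split())


def url_matches_realname(url: str, realname: str) -> bool:
    """True if the URL's last path segment is exactly the RealName slug."""
    slug = realname_slug(realname)
    if not slug:
        return False
    parts = urlsplit(url)
    if parts.query:  # the observed URLs never carry parameters
        return False
    # '/' + slug, then 0-2 trailing slashes, then end of path (case-sensitive).
    return re.search(r"/" + re.escape(slug) + r"/{0,2}$", parts.path) is not None
```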

All have those "Apple-Mail" boundaries.
They're usually To multiple people (20 being the most common),
but not always (particularly the early ones).
The body text is always brief with a general upbeat tone.
The Subject is almost always "Re:" (except in the beginning).


*** The impressive part is that the From.RN is always that of a
genuine Friend/correspondent, and often (about 64%) the
To.Realname is correct (otherwise it's blank, so it's never
"wrong").
The From.Address is always "wrong"/new/unknown.
The source of the data collection appears to be Yahoo account
cracks.

I've spot checked several of the URLs (using a raw HTTP tool),
and they always 302 to pure javascript booby-trapped pages at a
different domain.  I've substituted other subdir names, which
always 302 to the same (external) URL, so there's nothing 
sophisticated at that end.

The original URL is usually at a legit-ish semi-dormant GoDaddy
hosted domain.  I suspect GoDaddy must have a tool that makes it
easy to create subhosts, plus they're often targeted due to less
sophisticated endusers.  Until recently, most were never listed
on any Domain Blocklist.  Most of the redirects are eventually
taken down, though it often takes a couple of weeks.

Of the drive-by-malware sites I've checked, all have been recent
registrations (presumably by the miscreants), and typically
remain active long after the takedowns of the "cracked" sites.

Today, I checked the URL in the spample, and both it and the
drive-by-malware redirect are still "live", in case any of you
would like to investigate further. :)

The very first one I spotted was only "To" me, from an old
friend.  When I saw it, my first reaction was delight and
I genuinely was drawn to visit the link... even though I was
viewing it in quarantine, and quickly spotted lots of Bad Stuff
(Received IPs tour-of-the-world).  It's simple yet VERY effective
social engineering, while being light-weight and so obvious it's
not. :\
I had noticed the pattern before, but had assumed the
Realnames/subdirs were random.  If I hadn't been sent any myself,
I probably would NOT have recognized the effectiveness of the
pattern.

I wrote a batch regression test to find these, not in real-time
but in old data so I could verify the algorithm & datamine.
Unfortunately, I've had some :( Kobayashi Maru scale "schedule
disruptions", so have NOT been able to do much testing other
than my primary Geek domains, and partial testing by one of my
best Volunteers with a highly-IDIC corpus (I'm desperate enough
I'm going to try a hotel, so I can complete this and other
critical testing).

So far, all but one FP occurred when I matched "anywhere" 
(soft match) in the URL, instead of doing a word-boundary match
on the last token.  The signature is always at the very end,
without any parameters, though it would be easy for them to
obfuscate with param(s).  Granted, that would (IMO) reduce the
efficacy of the social engineering. :)

The one exception was a Twitter URL.  Using an existing skip
domain list eliminates that case.  It's still possible to have
other FPs, so a simple match is unlikely to be a Poison Pill
candidate.
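The difference between the "anywhere" match and the last-token match, plus a skip-domain list, is easy to see side by side. A Python sketch; `SKIP_DOMAINS` is a hypothetical stand-in for the existing skip list mentioned above, which isn't published.

```python
from urllib.parse import urlsplit

SKIP_DOMAINS = {"twitter.com"}  # hypothetical; the real skip list isn't shown


def is_skipped(host: str) -> bool:
    """True for a skip-listed domain or any of its subdomains."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in SKIP_DOMAINS)


def last_segment(path: str) -> str:
    """Final path token, tolerating at most two trailing slashes."""
    stripped = path.rstrip("/")
    if len(path) - len(stripped) > 2:
        return ""
    return stripped.rsplit("/", 1)[-1]


def friend_name_url(url: str, realname: str, mode: str = "strict") -> bool:
    slug = "_".join(realname.split())
    parts = urlsplit(url)
    if not slug or is_skipped(parts.hostname or ""):
        return False
    if mode == "soft":
        return slug in url  # "anywhere" match: the FP-prone variant
    return not parts.query and last_segment(parts.path) == slug
```

A news article URL such as `.../2016/Montgomery_Scott_interview.html` trips the soft match but not the strict one, while a Twitter profile would pass the strict test on its own and is only excluded by the skip list.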

Last week, I sent John Hardin some spamples, and he very kindly
wrote & masschecked rules over the long weekend (Geek!). :)
He found a significant FP risk.

Depending on your environment (quarantines rock!), this may be
worth the risk.  The non-Bayes SA killrates for these are running
in the range of 0% to 18%. :(  Even with Bayes, most are getting
thru.  Mine are mostly being killed by Nation-of-IPs, and a few
pre-existing specialty tests (all post-SA).  I have not yet
needed to add custom rules, however I am considering it, due to
the malware risk.


I'm posting this in the hope that someone(s) will nudge GoDaddy
and other cheap hosts to scan for offsite redirects, then test
them.  
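For a host that wants to do the scan suggested above on its own customers' sites, the core check is whether a 3xx response points at a different registered domain. A Python sketch; `registered_domain` is a naive last-two-labels reduction, and `probe` issues a single GET without following redirects (`http.client` never follows them).

```python
import http.client
from urllib.parse import urljoin, urlsplit


def registered_domain(host: str) -> str:
    """Naive 'last two labels' reduction; wrong for suffixes such as co.uk."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def is_offsite_redirect(request_url: str, status: int, location) -> bool:
    """True for a 3xx whose Location points at a different registered domain."""
    if not 300 <= status < 400 or not location:
        return False
    target = urljoin(request_url, location)  # resolves relative Location values
    src = urlsplit(request_url).hostname or ""
    dst = urlsplit(target).hostname or ""
    return registered_domain(src) != registered_domain(dst)


def probe(url: str, timeout: float = 10.0):
    """Single GET; returns (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Usage would be `is_offsite_redirect(url, *probe(url))` for each path found on a hosted site. Given the observation that random subdir names redirect to the same external URL, probing a made-up path is a cheap way to spot a site-wide hijack.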

Re: spample of "data" URL in well-crafted Phish

2016-09-08 Thread Chip M.
On Sat, 3 Sep 2016, John Hardin wrote:
>I've tweaked the FP avoidance a bit, maybe that will be enough
>to get the S/O up high enough to publish it.

John, do you have any detailed info about the Ham hits?

I just datamined my three best corpora, from the beginning of
2014 thru this weekend, and found zero FPs, except for two hits
on that "img" test.  My data does NOT prove it's impossible for
anybody else, but it does seem odd, so I'm wondering if the
SA MassCheck mechanism has some means for the contributor to
pull out the corpses of specific hits.
If it doesn't, that would be a cool feature to add. :)


On Wed, 31 Aug 2016, Axb wrote:
>IMG src="data  can FP a lot.

AXB,
You are correct.
A few months ago, I had moved that rule in with my other "data"
rules, apparently because they had the token "data" in common.

I dug thru my notes, and the image rule was originally added to
combat a semi-subtle snowshoe campaign sent via Linode (as hosts,
they're much better than the other big-cheap-VPSs, so I've been
resisting scoring their IP blocks, which means that snow sent
thru them is sometimes harder to catch).

When I checked all data for 2014 to now in my three best corpora
(about 840 K-spam), I found that all the image spam hits were in
snow, and were NOT overtly dangerous, whereas all the non-image
"data" stuff has been in well-crafted Phish (UBER-dangerous).

There were exactly two Ham hits, and both were :grind-teeth:
ostensibly legitimate, albeit non-urgent.

Perhaps ironically or merely sadly, one was an 800 Kb monstrosity
of HTML badness (yes, all in one single Part), with several 
images and :cring: fonts inlined via "data" statements.  When I
tried to view it as an HTML page in my raw corpse viewer (using
an old-ish open source HTML rendering engine), it grinded away
for a while then died. :(
Who was the Sender?
Norton.
Yes, THAT Norton.
... and the Subject header was:
"ClubNorton Newsletter: Avoiding Social Engineering Tricks on Social Networks"

I've been scoring my data img rule at about 2.3 so it's well
below Poison Pill, and would not have caused either of those two
Hams to die.  Though I would not have lost sleep over a
Mercy Killing of the "ClubNorton" monstrosity. ;)

Bottom-line:
I strongly recommend a high scoring non-img "data" rule, and
gently recommend a modest scoring img "data" rule.
Everyone's mileage will vary, as always. :)
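The bottom line above separates `data:` URIs used as `<img src>` from other `data:` URIs. A rough Python sketch of that split using the standard-library `html.parser`. It only inspects tag attributes, so CSS `url(data:...)` inside `<style>` blocks (like the inlined fonts in the ClubNorton newsletter) is not counted. How each count maps to a score is left open.

```python
from html.parser import HTMLParser


class DataUriCounter(HTMLParser):
    """Count data: URIs in <img src> separately from data: URIs elsewhere."""

    def __init__(self):
        super().__init__()
        self.img_data = 0
        self.other_data = 0

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; values may be None.
        for name, value in attrs:
            if value is None or not value.strip().lower().startswith("data:"):
                continue
            if tag == "img" and name == "src":
                self.img_data += 1  # candidate for the modest "img" rule
            else:
                self.other_data += 1  # e.g. <a href="data:...">: phish-style


def classify(html: str) -> dict:
    parser = DataUriCounter()
    parser.feed(html)
    parser.close()
    return {"img_data": parser.img_data, "non_img_data": parser.other_data}
```

Self-closing tags such as `<img ... />` are routed through `handle_starttag` by the parser's default `handle_startendtag`, so both spellings are counted.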
- "Chip"

P.S. Javascript... I agree 100% with John, while respecting AXB's
right to disagree and choose his own poison. ;)
I'll describe what I'm doing later, in a separate thread.
It's flexible enough to provide good protection, while letting
in all but the self-injurious Ham (e.g. someone at Amazon drank
some of the ClubNorton koolaid).