Re: Slipping through the cracks

2020-06-19 Thread John Hardin

On Fri, 19 Jun 2020, micah anderson wrote:

> John Hardin writes:
>
>> On Fri, 19 Jun 2020, micah anderson wrote:
>>
>>> So, what can I do to tweak these rules to score things up more,
>>> specifically the rules that provide a low false positive rate[1]. This
>>> seems something that should be done programmatically, and not
>>> manually. It seems like what 'masscheck' maybe does generically for all
>>> rules for all installations, but can I use that to just adjust our rules
>>> for our particular breed of spam that comes through?
>>
>> How about: analyze your spamtrap for recent source IP addresses on a
>> quick schedule (hourly?) and drive a local DNSBL from IPs seen more than
>> 2-3 times in the last 24-48 hours?
>
> Interesting possibility... but if I look at the current batch that made
> it through, I see:
>
> 1. amazon aws
> 2. gmail (amusingly saying my amazon prime membership is going to
> expire)
> 3. mailchimp
> 4. yahoo.com
>
> all of those would not be good to block :(

Amazon AWS, if not using a "real" (non-AWS) domain name, might be safe to
reject - there's been some discussion about that on the list lately.

> It's not always like that, but it does happen.

Hm. Perhaps you'd need whitelists too, to avoid some known mixed sources.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  I’ve seen firsthand how an ideological hatred of guns and the
  people who own them is more important to some people than the
  actual goal of saving lives.
 -- Dan Gross, former president of the Brady Campaign
---
 138 days until the Presidential Election

Re: Slipping through the cracks

2020-06-19 Thread Martin Gregorie
On Fri, 2020-06-19 at 13:54 -0400, micah anderson wrote:

> 2. gmail (amusingly saying my amazon prime membership is going to
> expire)
>
That would make an obvious local rule if you keep seeing messages like
that, since a Prime expiry notice that's NOT from Amazon is unlikely to
be valid:

Score 5+ if:
 - body or subject mention amazon prime 
and
 - sender and/or Message-ID do not contain a valid Amazon host name.

Remember to keep 2-3 example messages for testing your new rule before
you add it to your live system.
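
For illustration only, here is the logic of that rule as a small Python
sketch rather than as actual SpamAssassin rule syntax. The names
AMAZON_DOMAINS and fake_prime_notice are made up for this example, and the
list of "valid Amazon host names" is a placeholder assumption that would
need checking against real Amazon mail. The sketch reads "and/or" as "fire
only when neither the From address nor the Message-ID is at an Amazon
host", and it skips multipart bodies.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Placeholder assumption: which host names count as "valid Amazon".
AMAZON_DOMAINS = ("amazon.com",)
PRIME_RE = re.compile(r"amazon\s+prime", re.IGNORECASE)


def _is_amazon_host(host):
    """True if host is one of AMAZON_DOMAINS or a subdomain of one."""
    host = host.strip().rstrip(">").lower()
    return any(host == d or host.endswith("." + d) for d in AMAZON_DOMAINS)


def fake_prime_notice(raw_message):
    """True if the mail mentions Amazon Prime in subject or body, but
    neither the From address nor the Message-ID is at an Amazon host."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    payload = msg.get_payload()
    # Multipart messages return a list here; handling them is omitted.
    body = payload if isinstance(payload, str) else ""
    if not (PRIME_RE.search(subject) or PRIME_RE.search(body)):
        return False
    from_host = parseaddr(msg.get("From", ""))[1].rpartition("@")[2]
    msgid_host = msg.get("Message-ID", "").rpartition("@")[2]
    return not (_is_amazon_host(from_host) or _is_amazon_host(msgid_host))
```

Running the saved example messages through a check like this first is a
cheap way to see whether the idea is worth turning into a real local rule.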

Martin





Re: Slipping through the cracks

2020-06-19 Thread micah anderson
John Hardin  writes:

> On Fri, 19 Jun 2020, micah anderson wrote:
>
>> So, what can I do to tweak these rules to score things up more,
>> specifically the rules that provide a low false positive rate[1]. This
>> seems something that should be done programmatically, and not
>> manually. It seems like what 'masscheck' maybe does generically for all
>> rules for all installations, but can I use that to just adjust our rules
>> for our particular breed of spam that comes through?
>
> How about: analyze your spamtrap for recent source IP addresses on a 
> quick schedule (hourly?) and drive a local DNSBL from IPs seen more than 
> 2-3 times in the last 24-48 hours?

Interesting possibility... but if I look at the current batch that made
it through, I see:

1. amazon aws
2. gmail (amusingly saying my amazon prime membership is going to
expire)
3. mailchimp
4. yahoo.com

all of those would not be good to block :(

It's not always like that, but it does happen.

-- 
micah


Re: Slipping through the cracks

2020-06-19 Thread John Hardin

On Fri, 19 Jun 2020, micah anderson wrote:


> So, what can I do to tweak these rules to score things up more,
> specifically the rules that provide a low false positive rate[1]. This
> seems something that should be done programmatically, and not
> manually. It seems like what 'masscheck' maybe does generically for all
> rules for all installations, but can I use that to just adjust our rules
> for our particular breed of spam that comes through?


How about: analyze your spamtrap for recent source IP addresses on a 
quick schedule (hourly?) and drive a local DNSBL from IPs seen more than 
2-3 times in the last 24-48 hours?


Potentially relax it a bit by collecting on /30 or /28 netblocks instead 
of individual /32 IP addresses.
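
A rough sketch of the counting step, assuming you can already extract
(timestamp, source IP) pairs from the spamtrap mail. The function name
blocks_to_list and its parameters are made up for this example, it handles
IPv4 only, and turning the result into whatever zone format your local
DNSBL server expects is left out.

```python
from collections import Counter
from datetime import datetime, timedelta
import ipaddress


def blocks_to_list(sightings, now, window_hours=48, min_hits=3, prefix=28):
    """Return IPv4 netblocks seen at least min_hits times in the window.

    sightings: iterable of (timestamp, ip_string) pairs from spamtrap mail.
    prefix: 32 counts individual addresses; 30 or 28 relaxes it to netblocks.
    """
    cutoff = now - timedelta(hours=window_hours)
    counts = Counter()
    for seen_at, ip in sightings:
        if seen_at < cutoff:
            continue  # too old to count
        # strict=False masks the host bits: 192.0.2.17/28 -> 192.0.2.16/28
        block = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[block] += 1
    return sorted(block for block, hits in counts.items() if hits >= min_hits)


# Made-up sightings using documentation address ranges.
now = datetime(2020, 6, 19, 12, 0)
sightings = [
    (now - timedelta(hours=1), "192.0.2.17"),
    (now - timedelta(hours=2), "192.0.2.18"),
    (now - timedelta(hours=3), "192.0.2.30"),
    (now - timedelta(hours=5), "198.51.100.7"),
    (now - timedelta(hours=72), "198.51.100.7"),
    (now - timedelta(hours=73), "198.51.100.7"),
]
for block in blocks_to_list(sightings, now):
    print(block)
```

With these sample sightings only 192.0.2.16/28 is listed: three different
addresses in that /28 were seen inside the window, while 198.51.100.7 has
just one recent hit. With prefix=32 nothing would be listed, which is the
point of collecting on netblocks.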





Re: How to write a rule to block phishing?

2020-06-19 Thread John Hardin

On Fri, 19 Jun 2020, Daryl Rose wrote:


> I thought that a 5 was an average number and lowering it improves spam
> hits, I may end up getting legitimate emails flagged as spam but I can add
> the address to whitelist_from.  I read that in more than one location.
>
> I believe that I have the required score set to 2.0 or 2.5, or somewhere
> around that.  I'm not able to look at this moment.   But you're saying that
> if I change it back to the default score of 5, then I'll catch more spam?


All of the base repo rule scores are assigned with the assumption that 
spams should score 5.0 points.


If you change the local spam threshold to less than 5 without also broadly 
adjusting the rule scores, more messages will hit - this may potentially 
tag more spams (i.e. lower FN rate), but it *will* also tag more hams 
(i.e. higher FP rate), which is generally considered worse than some 
spams leaking through.


If you change the local spam threshold to more than 5 without also broadly 
adjusting the rule scores, fewer messages will hit. This will tag fewer 
spams (i.e. higher FN rate) but will tag fewer hams as well (lower FP 
rate).
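
To make the trade-off concrete, here is a toy Python sketch. The scores
below are invented for illustration and are not taken from any real corpus;
tag_counts is a made-up helper name. A message counts as tagged when its
score is at or above the threshold.

```python
def tag_counts(scored, threshold):
    """Return (false_positives, false_negatives) at the given threshold.

    scored: list of (score, is_spam) pairs.
    """
    fp = sum(1 for score, is_spam in scored
             if score >= threshold and not is_spam)
    fn = sum(1 for score, is_spam in scored
             if score < threshold and is_spam)
    return fp, fn


# Made-up scores: four hams followed by five spams.
sample = [
    (0.3, False), (1.9, False), (2.8, False), (4.1, False),
    (3.2, True), (4.7, True), (5.6, True), (8.4, True), (12.0, True),
]
for threshold in (2.5, 5.0, 7.5):
    print(threshold, tag_counts(sample, threshold))
```

On this sample a threshold of 2.5 gives 2 FPs and 0 FNs, 5.0 gives 0 FPs
and 2 FNs, and 7.5 gives 0 FPs and 3 FNs: moving the threshold only trades
one kind of error for the other.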


Generally if a given type of spam isn't scoring enough to be tagged as 
spam, you want to:


(1) make sure bayes is classifying it as "spammy" (e.g. BAYES_99) - train 
bayes using that message if it's not. If it's being classified as 
"hammy" (e.g. BAYES < 30), then review your overall training; your bayes 
database is probably mistrained.


(2) try to find some common feature in the spams and develop a local rule 
to detect that feature. This can be difficult. If it works, suggest it 
here and we may add it to the base rules so that everyone benefits.


Note that there may be rules in the repo that detect that, but if that 
feature isn't fairly common in spam that makes it into the masscheck repos 
then the rule might not be performing well enough to be promoted for 
publication.


Adjusting the spam threshold is generally only something you do if you 
really understand what's going on.





Slipping through the cracks

2020-06-19 Thread micah anderson


Hi folks,

I've spent a lot of time tuning our spamassassin setup over the
years. Channels, RBLs, pyzor, DCC, bayes, KAM rules, some home spun
rules, etc... and things do work fairly well: the catch rate is very high,
but the ones that get through are the ones that are designed to get
around the defenses before they are shut down. I get the feeling the
scores from many rules are too low, and I'm looking for the right way to
move forward.

The reason I say this is because I've got a spamtrap account, which is
comprised of several addresses that are heavily targeted by spam lists,
and these accounts seem to get the fast flux, rapid zone updates and ip
reputation burns (and other techniques) that are used to do initial spam
flooding before they are picked up by things. Once pyzor, dcc, and the
RBLs pick these up, they are usually scored high enough to get flagged
for everyone else, but without the RBLs, the scoring is too low to meet
that[0]. Of course I "learn" these messages when they come in.

I've been trying to analyze which techniques they use, to try and
come up with rules that will stop them, but so far it has been hard to
come up with something manually. I've taken several of these that got
through and later, after a day, checked them with network tests, and they
are all scored very high by the various lists, fuzzers, and checksums.
Often you will see these don't even hit RBLs... and the ones that do
aren't hitting enough of them to be caught. Usually, though, if an RBL is
hit, the message gets marked as spam, since most of the time several of
the RBLs all fire at once... but if they are not on RBLs, they don't get
flagged as spam by the regular rules.

So, what can I do to tweak these rules to score things up more,
specifically the rules that provide a low false positive rate[1]. This
seems something that should be done programmatically, and not
manually. It seems like what 'masscheck' maybe does generically for all
rules for all installations, but can I use that to just adjust our rules
for our particular breed of spam that comes through?

Thanks for any ideas,
micah


0. with some notable exceptions, like KAM_DMARC_REJECT and
HELO_DYNAMIC_SPLIT_IP

1. KAM_DMARC_STATUS and HTML_NO_CHARSET are possible ones; also, mails
that do not have a To: header only score 0.1

-- 
micah


Re: How to write a rule to block phishing?

2020-06-19 Thread LuKreme
On Jun 19, 2020, at 06:06, Daryl Rose  wrote:
> I thought that a 5 was an average number and lowering it improves spam hits, 
> I may end up getting legitimate emails flagged as spam but I can add the 
> address to whitelist_from.  I read that in more than one location.  
> 
> I believe that I have the required score set to 2.0 or 2.5, or somewhere 
> around that.  I'm not able to look at this moment.   But you're saying that 
> if I change it back to the default score of 5, then I'll catch more spam?

You said a message scored 5 and was not classified as spam. The only way this 
happens is if you INCREASE the required score from 5.0 to a higher number.

Setting your score to 2 will mark a huge amount of perfectly legitimate email 
as spam, but that is not what you described above.



Re: How to write a rule to block phishing?

2020-06-19 Thread Daryl Rose
I thought that a 5 was an average number and lowering it improves spam
hits, I may end up getting legitimate emails flagged as spam but I can add
the address to whitelist_from.  I read that in more than one location.

I believe that I have the required score set to 2.0 or 2.5, or somewhere
around that.  I'm not able to look at this moment.   But you're saying that
if I change it back to the default score of 5, then I'll catch more spam?

Thanks

Daryl

On Thu, Jun 18, 2020 at 11:02 AM @lbutlr  wrote:

> On 15 Jun 2020, at 17:18, Daryl Rose  wrote:
> > I analyzed the headers, the message comes from a server here in the
> > United States, the spam score is 5, and SpamAssassin says "No Spam".
>
> SpamAssassin thinks the mail is spam if it scored 5. Someone (you?) has
> changed the default spam score from 5.0 to some other number.
>
> Doing this will result in spam being marked as not spam.
>
>
>
>
> --
> The whole thing that makes a mathematician's life worthwhile is that
> he gets the grudging admiration of three or four colleagues
>
>
>