Re: Dealing with links to malicious documents

2018-03-14 Thread Markus Clardy
I created a plugin that reads the headers (it just makes a HEAD request) of
all URIs in an email, so you can write tests based on them. An example
would be looking at the MIME type served at the end of the
link, for example, checking whether the MIME type is application/msword.
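
To illustrate the technique in Python (the plugin itself is a SpamAssassin
plugin, so this is only a sketch of the idea, and the suspicious-type list
here is just an example):

import requests

SUSPICIOUS_TYPES = {"application/msword", "application/vnd.ms-excel"}

def uri_serves_suspicious_type(uri, timeout=5):
    """HEAD the URI and test the advertised Content-Type against a list."""
    try:
        resp = requests.head(uri, allow_redirects=True, timeout=timeout)
    except requests.RequestException:
        return False  # unreachable URI; let other tests handle it
    # Content-Type may carry parameters, e.g. "application/msword; charset=..."
    mimetype = resp.headers.get("Content-Type", "").split(";")[0].strip().lower()
    return mimetype in SUSPICIOUS_TYPES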

You can find it here: https://github.com/m50/spamassassin_uriheader

I will warn you, this may not be a great idea, as you may get
abuse notifications. I also need to make a few changes to it, but I haven't
touched it in a while.

The first thing I want to do is whitelist certain domains so they are
exempt from the header check.

The second is to replace the User-Agent with one from a legitimate browser
(spoofing a browser), just in case.
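
Roughly, both changes would look something like this (again a Python sketch
only, not the plugin's actual code; the domain list and UA string are
placeholders):

from urllib.parse import urlparse
import requests

WHITELISTED_DOMAINS = {"spamassassin.apache.org", "github.com"}  # placeholders
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0 Safari/537.36")

def head_check(uri, timeout=5):
    host = urlparse(uri).hostname or ""
    if any(host == d or host.endswith("." + d) for d in WHITELISTED_DOMAINS):
        return None  # whitelisted domain: skip the header check entirely
    # Spoof a browser User-Agent so the HEAD request looks less bot-like
    return requests.head(uri, allow_redirects=True, timeout=timeout,
                         headers={"User-Agent": BROWSER_UA})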


But if you are interested in it, you can take a look.

On Tue, Mar 13, 2018 at 10:03 PM, John Hardin  wrote:

> On Tue, 13 Mar 2018, Bill Cole wrote:
>
>> On 13 Mar 2018, at 14:21 (-0400), John Hardin wrote:
>>
>>> d) Don't accept emails from outside your organization that link to hosted
>>> documents. The document needs to be attached, so that it can be scanned.
>>> Unfortunately this is not feasible if you're not a (at least
>>> semi-)monolithic organization where you can apply such policies.
>>
>> Also not feasible if any users subscribe to this list or most technical
>> discussion mailing lists. For example, here you are likely to get links
>> into the SA Wiki or to KAM's rules. On the Postfix list it is a rare week
>> that does not have multiple links to the DEBUG_README file posted.
>
> I don't count a plain text file as a "document" in this context.
>
>> The example provided was apparently to a directory (URL ending in '/') but
>> redirected to a .doc.
>
> This of course is the weakness with that option.
>
>
> --
>  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
>  jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
>  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
> ---
>   The problem is when people look at Yahoo, slashdot, or groklaw and
>   jump from obvious and correct observations like "Oh my God, this
>   place is teeming with utter morons" to incorrect conclusions like
>   "there's nothing of value here".-- Al Petrofsky, in Y! SCOX
>
> ---
>  Tomorrow: Albert Einstein's 139th Birthday
>



-- 
 - Markus


Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)

2018-03-14 Thread Rob McEwen

On 2/20/2018 9:42 PM, Rob McEwen wrote:
> Google might easily start putting captchas in the way or otherwise
> consider such lookups to be abusive and/or mistake them for malicious
> bots...


This prediction turned out to be 100% true. Others have mentioned that they
were able to do high-volume lookups with no problems, and granted, I wasn't
implementing a multi-server or multi-IP lookup strategy, but I don't think
I was doing nearly as many lookups as others have claimed they could. I
took a batch of 55,000 spams I had collected over the past 4 weeks, where
those spams maliciously used the Google shortener to get delivered by
hiding their spammy domain names from spam filters. I started checking
those by looking up the redirect from Google's redirector, without actually
visiting the site that the redirector pointed to. Please note that I was
doing the lookups one at a time, not starting the next lookup until the
last one had completed. After ONLY about 1,400 lookups, ALL of my
subsequent lookups started hitting captchas. See attached screenshot. Also,
other than not sending from multiple IPs, I was doing everything correctly
to make my script look and act like a regular browser.
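
For reference, each lookup amounted to something like this (a Python sketch
of the general approach, not my actual script): fetch the shortener URL
without following the redirect, and read the Location header:

import requests

def resolve_short_url(short_url, timeout=10):
    """Ask the shortener where a link points, without visiting the target."""
    resp = requests.get(short_url, allow_redirects=False, timeout=timeout)
    if resp.status_code in (301, 302):
        return resp.headers.get("Location")
    return None  # no redirect came back (e.g. an error or captcha page)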


I'll try spreading the lookups across multiple IPs to try to avoid the rate
limits. However, this is still cause for concern about high-volume lookups
in large production systems; those may have to be implemented a little more
carefully if they're going to do this kind of lookup!
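
One way to spread the lookups (sketched in Python; the local addresses are
placeholders and the machine would actually have to own them) is to bind
each connection to a different source address:

import http.client
import itertools
from urllib.parse import urlparse

LOCAL_IPS = itertools.cycle(["192.0.2.10", "192.0.2.11", "192.0.2.12"])  # placeholders

def resolve_from_rotating_ip(short_url, timeout=10):
    parsed = urlparse(short_url)
    # Bind this connection's source address to the next IP in the rotation
    conn = http.client.HTTPSConnection(parsed.netloc, timeout=timeout,
                                       source_address=(next(LOCAL_IPS), 0))
    try:
        conn.request("GET", parsed.path or "/")
        resp = conn.getresponse()
        return resp.getheader("Location") if resp.status in (301, 302) else None
    finally:
        conn.close()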


Just because small or medium production systems are able to do this, or
because somebody went out of their way to build something more
sophisticated that works, doesn't mean it's going to work in high-volume
production systems that are trying to use "canned" software or plugins.
This is a particular challenge for anti-spam blacklists because they
typically process a very high volume of spam. Hopefully, the randomness of
the lookups I process as spam comes in will be spread out enough to avoid
rate limiting?


It was my hope to start processing these live with my own DNSBL engine, so
that I could start blacklisting the domains they redirect to, in those
cases where they were not already blacklisted. Now I'm going to have to
deal with constantly making sure that I'm not hitting this captcha, along
with implementing some other strategies to hopefully prevent that.
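
A guard along these lines might help (a Python sketch; the check for a
"google.com/sorry" redirect is an assumption about how the captcha
interstitial shows up, based on Google's usual rate-limit page):

def looks_rate_limited(location):
    # Heuristic: Google's rate-limit captcha typically lives under /sorry/
    return bool(location) and "google.com/sorry" in location

def resolve_or_backoff(short_url):
    target = resolve_short_url(short_url)  # from the sketch earlier
    if target is None or looks_rate_limited(target):
        return None  # treat as a captcha hit: pause, rotate IP, or retry later
    return target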


But this brings up a whole other issue, one that is more of a policy or
legal question: is Google basically making a statement that automated
lookups are not welcome, or are considered abusive?


(btw, I could have collected orders of magnitude more than 55,000 of
THESE types of spams, but this was merely what was left over from an
after-the-fact search of my archives, after a lot of otherwise-redundant
spams had already been purged from my system.)


PS - Once I've gathered this information, I will submit more details about
the results of this testing. But what is shocking right now is that less
than four-tenths of 1% of these redirect URLs have been terminated, even
though they average two weeks old, with some almost a month old.


--
Rob McEwen
https://www.invaluement.com
+1 (478) 475-9032