Jason Haar wrote:
>They aren't triggering (enough) network rule matches, contain a
>bayes-killer, and even FuzzyOCR can't manage the swirly image trick
>they pull. Has anyone come up with a way to fight these?

Jason, thanks for the cheerful Subject. I needed that today. :)

I'm catching all of these, with decent scores (15+). Here are a few easy
things you might score on (up to about 2.5 each):

1. non-huge image which does _NOT_ have an HTML part (this will also
   help with the "lonely girl" spams; it's highly unusual for images to
   be attached to pure text emails; usually only Nerds send pure text,
   and our most typical image attachment is a GIF/PNG screenshot, or a
   somewhat large JPEG)
2. metas for images that have hit any reliable blocklist (I have found
   Barracuda very helpful, though it definitely has a high FP rate, so
   score low if you don't have a decent false positives pipeline)
3. botnet test
4. metas for images sent from/through "unusual" nations

These may not be as easy, but may be of interest to our resident
developers:

5. all of these have a real name in the From header, with most being a
   single word, which is very unusual (note also that _NONE_ have a real
   name in the To header, which I do score, but that has a high FP rate
   so I cannot recommend it unless you have a solid FP pipeline)
6. size of the JPEG header (this may be easy to add to ImageInfo)

I just noticed #6 now, after dumping some image properties for wavy vs
non-wavy spam images, and was surprised by it. It never occurred to me
to export file hdr size. By now, I should have KNOWN better, and should
have added export of ALL properties to my image properties test last
time this sort of thing happened. I'll fix that next version.

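As a rough illustration of item 1 outside of SpamAssassin, here is a minimal Python sketch using only the standard library `email` package. The function name `has_image_without_html` and the 50,000-byte cutoff for "non-huge" are assumptions made for this example, not values from the post; a real rule would be written as a SpamAssassin meta and tuned against your own mail.

```python
from email.message import Message

# Assumed cutoff for a "non-huge" image; tune against your own corpus.
MAX_IMAGE_BYTES = 50_000


def has_image_without_html(msg: Message, max_image_bytes: int = MAX_IMAGE_BYTES) -> bool:
    """Return True if the message carries a smallish image but no text/html part."""
    has_html = False
    has_small_image = False
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        if part.get_content_type() == "text/html":
            has_html = True
        elif part.get_content_maintype() == "image":
            # decode=True undoes base64/quoted-printable so we measure real bytes
            payload = part.get_payload(decode=True) or b""
            if 0 < len(payload) <= max_image_bytes:
                has_small_image = True
    return has_small_image and not has_html
```

A message parsed with `email.message_from_bytes(raw)` can be passed straight in; the result is only one signal and would be combined with other tests rather than scored heavily alone.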
Here are the properties of my last few days of "wavy" images:

  1 MP#2(jpeg): Area=100804 Density=9.85 bytes=10854(hdr:623,dat:10231) (319x316)
  1 MP#2(jpeg): Area=103152 Density=9.32 bytes=11688(hdr:623,dat:11065) (336x307)
  1 MP#2(jpeg): Area=103206 Density=5.05 bytes=21045(hdr:623,dat:20422) (309x334)
  1 MP#2(jpeg): Area=104304 Density=5.33 bytes=20176(hdr:623,dat:19553) (318x328)
  1 MP#2(jpeg): Area=107584 Density=5.58 bytes=19896(hdr:623,dat:19273) (328x328)
  1 MP#2(jpeg): Area=108072 Density=9.51 bytes=11982(hdr:623,dat:11359) (342x316)
  1 MP#2(jpeg): Area=109472 Density=5.24 bytes=21501(hdr:623,dat:20878) (352x311)
  1 MP#2(jpeg): Area= 81104 Density=4.40 bytes=19067(hdr:623,dat:18444) (296x274)
  1 MP#2(jpeg): Area= 87809 Density=5.69 bytes=16064(hdr:623,dat:15441) (317x277)
  1 MP#2(jpeg): Area= 95142 Density=5.41 bytes=18223(hdr:623,dat:17600) (303x314)
  1 MP#2(jpeg): Area= 97148 Density=4.96 bytes=20208(hdr:623,dat:19585) (326x298)

The first column is the number of occurrences. The interesting column is
"hdr:623". If you're using ImageInfo, the other numbers are useful for
limiting your metas to the total size range typical of these.

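For anyone who wants to experiment with dump lines in the format above, here is a small Python sketch that parses one line and checks it against the "wavy" profile. The names `parse_line` and `looks_wavy` are made up for this example, and the 80,000 to 110,000 area window is an assumption read off the eleven samples listed, not a tested threshold.

```python
import re

# Matches lines such as:
#   1 MP#2(jpeg): Area= 81104 Density=4.40 bytes=19067(hdr:623,dat:18444) (296x274)
LINE_RE = re.compile(
    r"^\s*(?P<count>\d+)\s+MP#(?P<part>\d+)\((?P<fmt>\w+)\):\s+"
    r"Area=\s*(?P<area>\d+)\s+Density=\s*(?P<density>[\d.]+)\s+"
    r"bytes=(?P<total>\d+)\(hdr:(?P<hdr>\d+),dat:(?P<dat>\d+)\)\s+"
    r"\((?P<width>\d+)x(?P<height>\d+)\)\s*$"
)


def parse_line(line):
    """Parse one property-dump line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    props = {}
    for key, value in m.groupdict().items():
        if key == "fmt":
            props[key] = value
        elif key == "density":
            props[key] = float(value)
        else:
            props[key] = int(value)
    return props


def looks_wavy(props, hdr=623, min_area=80_000, max_area=110_000):
    """Hypothetical check: JPEG with the 623-byte header and an area in the assumed window."""
    return (
        props["fmt"] == "jpeg"
        and props["hdr"] == hdr
        and min_area <= props["area"] <= max_area
    )
```

Restricting by area as well as header size reflects the point that the header size alone is not unique to these images, so it is better used inside a meta than as a standalone test.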
Here are the properties of all NON-wavy spam images from the same period:

  3 MP#2(jpeg): Area=115062 Density= 4.36 bytes=27110(hdr:735,dat:26375) (254x453)
  1 MP#2(jpeg): Area=120300 Density= 6.40 bytes=19185(hdr:387,dat:18798) (300x401)
  2 MP#2(jpeg): Area=166410 Density=11.62 bytes=14700(hdr:383,dat:14317) (430x387)
  1 MP#2(jpeg): Area=166704 Density= 8.55 bytes=19891(hdr:398,dat:19493) (453x368)
  1 MP#2(jpeg): Area=197735 Density=13.10 bytes=15476(hdr:380,dat:15096) (355x557)
  1 MP#2(jpeg): Area=240800 Density=14.59 bytes=16901(hdr:392,dat:16509) (700x344)
  1 MP#3(jpeg): Area=197735 Density=13.10 bytes=15476(hdr:380,dat:15096) (355x557)
 17 MP#3(jpeg): Area=239500 Density= 5.53 bytes=43685(hdr:406,dat:43279) (479x500)

I dumped the last month's worth of ham image properties from my most
diverse domain, and did find a handful which had that same hdr size
("623"). However, they all had vastly different areas and/or occurred
with multiple images. I'll check a few more domains and months' worth
before using that for real. I expect to score this in the 2 to 3 range.

Mike Cardwell wrote:
>Presently it renders them as plain text. I'm fully aware of the
>potential problems with it. Ideally I'd like to be able to render
>those parts as HTML, but I need to be 100% sure that I've stripped
>out anything dangerous (including embedded remote content by
>default) first. It's on the "ToDo List" page.

Nice job Mike! :)

I wrestled with that same issue when I added direct viewing of HTML
content to my offline analysis/FP-pipeline/MassChecks tool. Originally,
I was using an ActiveX wrapper around IE, which (of course) made me
nervous. I added some VERY simple, crude tag stripping (script, iframe,
style), but was never happy with it. I ended up switching to an open
source HTML rendering component which lacked support for all the scary
stuff. Whatever you decide to do, please do post more about it, and
q'pla!

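To make the stripping idea concrete, here is a minimal Python sketch built on the standard library `html.parser`. Note that it swaps the blacklist approach described above (remove script, iframe, style) for a whitelist: only a short list of formatting tags survives, all attributes are discarded, and anything else, including images and other remote content, is dropped. The class name, the tag lists, and the `sanitize` helper are all assumptions for illustration; this is not the tool described in this thread and it has not been hardened the way a vetted sanitizer library would be.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: plain formatting tags only, emitted without attributes.
ALLOWED_TAGS = {"p", "br", "b", "i", "u", "em", "strong",
                "ul", "ol", "li", "pre", "blockquote"}
# Tags whose entire content is discarded, not just the tag itself.
DROP_WITH_CONTENT = {"script", "style", "iframe", "object"}


class CrudeSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are never copied

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data))  # re-escape text before emitting


def sanitize(html_text):
    """Return html_text reduced to whitelisted tags with no attributes."""
    parser = CrudeSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Because attributes are never copied and `img` is not whitelisted, remote content is off by default, which matches the requirement quoted above. The trade-off is that most layout is lost, so the output sits somewhere between plain text and the original HTML.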
>I'm also aware of the issues surrounding people potentially
>uploading images and then linking to them from spam websites or
>spam. That's why I've put http referer restrictions in place.

Perhaps redirecting to an image saying something like "this is spam"? :)

What about requiring registration? Yes, it's not enough to stop the most
determined, but will whittle it down to the least stupid.

- "Chip"