Freshly caught Spample:
http://puffin.net/software/spam/samples/0042_data_embedded_phish.txt
The only munging was inserting ".EXAMPLE" between "wellsfargo" and ".com".
Four years ago, I read this fascinating article:
http://isc.sans.edu/diary/%22Data%22+URLs+used+for+in-URL+phishing/13996
and promptly added a simple word test to score these. At the time, I had no idea whether these occurred in Ham. It seemed unlikely, but some Hammers are stunningly "thick" (*cough, iframe, un-cough*). Since then, I've seen a steady, very low volume of spam hits and zero Ham hits, on a volume of about a quarter million emails per month. Yes, "ZERO" Ham. :)

Most of them have followed the same pattern as this spample: the base64-encoded "data" URL decodes to a classic Phish page. Inside that, there's usually a small encoded bit of javascript, typically starting with:

  document.write(unescape('

In this case, it decodes to (target URL munged/replaced):

  <form action="http://EXAMPLE.COM/wp-content/uploads/vrr.php" class="button" method="post" name="submit" id="submit">'

I just did a raw HTTP GET on the actual final URL, and it returned a 302 with a Location header pointing at the genuine Wells Fargo site, with a path starting with:

  /login?ERROR_CODE=

followed by a 36-character code. I did another raw GET of that Location, and it returned a 302 with a Location of WF's plain URL (no parameters). The document body was terse, semi-"offshore"-speak:

  This document you requested has moved temporarily.

It appears someone reported it to WF, who successfully did a takedown. But instead of serving a pedagogic page explaining that the victim would have been toast, they chose just to passively track accesses. :(

** Mitigation:

The easiest way to catch these is with a simple body word match. Here are the exact matches I am currently using (some are recent additions; listed in order of addition):

  href="data:
  href='data:
  http://data:
  data:text/html;base64
  <img src="data:
  hta:application

*** Do any of you HTML gurus have additional suggestions?
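For anyone who wants to reproduce the decoding chain described above (data URL, then the unescape() payload, then the form target) without opening it in a browser, here is a minimal Python sketch. The regexes, the function names extract_form_actions and js_unescape, and the assumption that the payload is a base64 text/html data URL with at most one document.write(unescape('...')) layer are all mine, based only on the pattern described above. Real samples may nest or obfuscate further.

```python
import base64
import binascii
import re

# Assumed shapes, based on the samples described above (not an exhaustive grammar).
DATA_URL_RE = re.compile(r"data:text/html;base64,([A-Za-z0-9+/=]+)", re.IGNORECASE)
UNESCAPE_RE = re.compile(r"""document\.write\(unescape\(['"]([^'"]*)['"]\)\)""")
FORM_ACTION_RE = re.compile(r"""<form\b[^>]*\baction=["']([^"']+)["']""", re.IGNORECASE)
JS_ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def js_unescape(s: str) -> str:
    """Mimic JavaScript's unescape(): %uXXXX and %XX become characters."""
    return JS_ESCAPE_RE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), s)


def extract_form_actions(html: str) -> list[str]:
    """Decode each base64 text/html data URL and return any <form action> targets."""
    actions = []
    for m in DATA_URL_RE.finditer(html):
        try:
            page = base64.b64decode(m.group(1)).decode("utf-8", errors="replace")
        except binascii.Error:
            continue  # malformed or truncated payload; skip it
        # The form may sit in the decoded page itself, or inside
        # document.write(unescape('...')) as in the sample above.
        layers = [page] + [js_unescape(p) for p in UNESCAPE_RE.findall(page)]
        for layer in layers:
            actions.extend(FORM_ACTION_RE.findall(layer))
    return actions
```

Feeding it the raw HTML part of a message returns the phish's POST target(s), for example the vrr.php URL above, which is the piece you would then GET by hand.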
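The raw GETs above can be scripted the same way. This sketch uses Python's http.client, which never follows redirects by itself, so each 302 and its Location header can be inspected hop by hop. fetch_without_redirect, trace_redirects and the five-hop limit are hypothetical choices. The fetch parameter exists only so the walk can be exercised without touching the network.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def fetch_without_redirect(url: str, timeout: float = 10.0):
    """One raw GET; http.client does not follow redirects on its own."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("GET", target)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url: str, fetch=fetch_without_redirect, max_hops: int = 5):
    """Follow Location headers by hand, recording (url, status, location) per hop."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status, location))
        if status not in REDIRECT_STATUSES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops
```

Calling trace_redirects("http://EXAMPLE.COM/wp-content/uploads/vrr.php") against the real (unmunged) URL would return one (url, status, location) tuple per hop, which is the same chain walked manually above.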
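The word test itself can be as simple as a case-insensitive substring check over the strings listed above. Here is a Python sketch, assuming it runs against the raw HTML source of the body (after transfer decoding, with tags intact). A check that only sees rendered text would never see an href attribute. DATA_URL_MARKERS and data_url_hits are hypothetical names, and turning hits into a score is left to whatever filter wraps it.

```python
# The literal strings from the list above, in order of addition.
DATA_URL_MARKERS = (
    'href="data:',
    "href='data:",
    "http://data:",
    "data:text/html;base64",
    '<img src="data:',
    "hta:application",
)


def data_url_hits(raw_html_body: str) -> list[str]:
    """Return the markers found in the raw HTML body, ignoring case."""
    lowered = raw_html_body.lower()
    return [marker for marker in DATA_URL_MARKERS if marker in lowered]
```

Returning the list of matched markers, rather than a bare yes/no, makes it easy to see which variant a new sample used when deciding whether the list needs another entry.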
I also recommend giving at least a medium score to:

  http-equiv="refresh"

which typically occurs in these, and in many unrelated campaigns.

I have been thoroughly tempted to do the Klingon Coding Thing and de-MIME these in real time, then further decode the javascript to get the URL... but the volume is so low, my anti-Phish system already nails these, and sometimes a blaster really is a more suitable weapon than a lightsaber. ;)

- "Chip"