http://bugzilla.spamassassin.org/show_bug.cgi?id=2878

           Summary: Identify when plain text and HTML are different in
                    multipart/alternative
           Product: Spamassassin
           Version: 2.61
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Rules
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


Recently, I have received a lot of spam with the multipart/alternative MIME
type. There are some random words in the text/plain version and some other
random words in the HTML version; the actual message is mainly contained in a
linked image.

RFC 1521 (IIRC) says that the parts of a multipart/alternative message should
be essentially the same, so it should make a pretty good rule if it were
possible to compare the contents of the plain text and HTML versions to see
whether the same words can be found in each. Comments can be ignored, and the
words can be compared. I don't know what kind of algorithm would be used, but
surely something exists for the purpose of comparing texts...?
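
As a rough illustration of the kind of comparison meant here, the following
Python sketch strips tags and comments from the HTML part and measures how many
words the two parts share (Jaccard similarity of the word sets). This is only
one possible approach, not an existing SpamAssassin rule; the names
html_to_text, words_of and word_overlap are made up for this example, and any
cutoff for "too different" would have to be tuned against real mail.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; tags and comments are dropped by default."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the text content of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining without a separator re-assembles words that were split by
    # bogus tags ("Fr</battlefront>ee" -> "Free"). The trade-off is that
    # adjacent block elements with no whitespace between them get merged.
    return "".join(parser.chunks)


def words_of(text):
    """Lower-cased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def word_overlap(plain, html):
    """Jaccard similarity (0.0 to 1.0) between the words of both parts."""
    plain_words = words_of(plain)
    html_words = words_of(html_to_text(html))
    if not plain_words and not html_words:
        return 1.0  # two empty parts are trivially identical
    return len(plain_words & html_words) / len(plain_words | html_words)
```

A rule built on this would presumably fire when word_overlap() falls below
some small value, since legitimate mailers normally generate both parts from
the same text.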

I'm getting the same spam as in bug #2875, but I'll include a bit more of the
most relevant stuff:

----ALT--TCEF13321957421304
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit

swab companionway bagpipe elephant cucumber regal 
birmingham shuck soothe plethora arrogate phenolic lieu zombie 
cherub denote leland urania basket blight fairfield eat conqueror imposture 

----ALT--TCEF13321957421304
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 8bit

<HTML><HEAD>
<BODY>
<p>Fr</battlefront>ee Ca</courtyard>bleTV!N</histamine>o mo</bovine>re
p</consumptive>ay!&</p>
<a href="http://www.2004hosting.net/cable/">
<img border="0" src="http://www.2004hosting.net/fiter3.jpg"></a>
nature borealis chastity cow debra checkpoint ascribe deferring tabulate
marketeer lob eaton sophistry blockade eyepiece benthic exhibit oatmeal bacon
keen buckwheat champagne turtleback intoxicant defunct crewcut <BR>


Also quite common, and even easier to catch, are cases where the text/plain
part is empty.
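
The empty text/plain case could be checked along these lines. Again, this is
only a sketch using Python's standard email package, not how SpamAssassin
itself parses messages; the function name has_empty_plain_alternative is
invented for the example.

```python
from email import message_from_string


def has_empty_plain_alternative(raw_message):
    """True if a multipart/alternative contains a blank text/plain part."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() != "multipart/alternative":
            continue
        if not part.is_multipart():
            continue  # malformed: declared multipart but no parts parsed
        for sub in part.get_payload():
            if sub.get_content_type() != "text/plain":
                continue
            # decode=True undoes base64 / quoted-printable if present
            body = sub.get_payload(decode=True) or b""
            if not body.strip():
                return True
    return False
```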



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
