http://bugzilla.spamassassin.org/show_bug.cgi?id=3271

           Summary: new MIME parser FPs much more often on Mailman admin
                    messages
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: major
          Priority: P5
         Component: Libraries
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


Mailman 2.1.x has a (nifty) new feature.  When a list is set to require admin
approval for non-members to post, it'll send the moderation-required message in
this format:

From: [EMAIL PROTECTED]
Subject: blah post from [EMAIL PROTECTED] requires approval
Content-type: multipart/mixed ...

The multipart/mixed parts are:

   [text/plain]: a brief "please authorize this posting" msg
   [message/rfc822]: the original message
   [message/rfc822]: an approval message suitable for use as a response
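To make that layout concrete, here's a rough sketch that builds a message of the same shape with Python's standard `email` package. It isn't Mailman's code or SpamAssassin's (Perl) parser. Every header and body string is a made-up placeholder, the held post is assumed to be a text+HTML `multipart/alternative` (typical for spam), and `build_moderation_notice` / `top_level_types` are hypothetical helper names.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.message import MIMEMessage


def build_moderation_notice() -> MIMEMultipart:
    """Hypothetical helper: build a message shaped like the Mailman 2.1.x
    moderation notice. All headers and bodies are placeholders."""
    # Assumed held post: a text + HTML alternative message, as spam often is.
    held = MIMEMultipart("alternative")
    held["Subject"] = "example held post"
    held.attach(MIMEText("plain-text part", "plain"))
    held.attach(MIMEText("<html><body>html part</body></html>", "html"))

    # Assumed approval message the moderator can send back.
    approval = MIMEText("approval placeholder", "plain")
    approval["Subject"] = "confirm placeholder"

    notice = MIMEMultipart("mixed")
    notice["Subject"] = "post requires approval"
    notice.attach(MIMEText("please authorize this posting", "plain"))
    notice.attach(MIMEMessage(held))      # message/rfc822: the original message
    notice.attach(MIMEMessage(approval))  # message/rfc822: the approval message
    return notice


def top_level_types(msg):
    """Content types of the direct children of a multipart message."""
    return [part.get_content_type() for part in msg.get_payload()]


print(top_level_types(build_moderation_notice()))
# ['text/plain', 'message/rfc822', 'message/rfc822']
```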

This is great for list moderation to fend off spam.

Now, the problem: in 2.63 these messages were fine and got through without
trouble, presumably because of limitations in the 2.6x MIME parser.  However,
I've *just* installed 3.0.0svn on my server for dogfooding, and it doesn't
handle them well at all; every single 'requires approval' message relating to
a spam has been caught as spam.

It looks like the new MIME parser is descending into the message/rfc822 part.
Here are the rules hit by one message:

X-spam-report: 
        *  0.2 NO_REAL_NAME From: does not include a real name
        *  1.0 HTML_OBFUSCATE_20_30 BODY: Message is 20% to 30% HTML obfuscation
        *  0.0 HTML_10_20 BODY: Message is 10% to 20% HTML
        *  1.2 MIME_HTML_MOSTLY BODY: Multipart message mostly text/html MIME
        *  1.0 HTML_BADTAG_40_50 BODY: HTML message is 40% to 50% bad tags
        * -0.0 BAYES_44 BODY: Bayesian spam probability is 44 to 50%
        *      [score: 0.5000]
        *  3.0 MPART_ALT_DIFF BODY: HTML and text parts are different
        *  1.0 HTML_NONELEMENT_60_70 BODY: 60% to 70% of HTML elements are non-standard
        *  0.1 HTML_MESSAGE BODY: HTML included in message
        *  0.6 MIME_HTML_NO_CHARSET RAW: Message text in HTML without charset
        *  1.0 URIBL_SBL Contains a URL listed in the SBL blocklist
        *      [URIs: monnsid.com]
        *  1.0 LONGWORDS Long string of long words
        * -1.8 AWL AWL: From: address is in the auto white-list
X-spam-status: Yes, score=8.2 required=5.0 tests=AWL,BAYES_44,HTML_10_20,
        HTML_BADTAG_40_50,HTML_MESSAGE,HTML_NONELEMENT_60_70,
        HTML_OBFUSCATE_20_30,LONGWORDS,MIME_HTML_MOSTLY,MIME_HTML_NO_CHARSET,
        MPART_ALT_DIFF,NO_REAL_NAME,URIBL_SBL autolearn=no version=3.0.0-r9952

(msg attached)

I've manually whitelisted my list admin addresses to work around this, but I do
get a stack of spam sent directly to those addresses as well, so the workaround
is suboptimal and kludgy, and it requires user configuration; not good.

IMO it'd be better to just not descend into message/rfc822 parts.  After all,
*WE* use message/rfc822 as a "safe" encapsulation format, ourselves!
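As a rough illustration of the difference, here's a minimal Python sketch, again using the standard `email` package rather than SpamAssassin's Perl parser. Python's `Message.walk()` recurses into message/rfc822 parts (roughly what the 3.0 parser appears to do), while the hypothetical `walk_outer_only` below yields a message/rfc822 part but treats it as an opaque leaf. The message built here is a placeholder shaped like the Mailman notice.

```python
from typing import Iterator

from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.message import MIMEMessage


def walk_outer_only(msg: Message) -> Iterator[Message]:
    """Hypothetical walker: like Message.walk(), but does not descend into
    message/rfc822 parts; the encapsulated message is left unvisited."""
    yield msg
    if msg.get_content_type() == "message/rfc822":
        return  # treat the encapsulated message as opaque
    if msg.is_multipart():
        for sub in msg.get_payload():
            yield from walk_outer_only(sub)


# Placeholder notice: a short text part plus an encapsulated HTML-ish post.
held = MIMEMultipart("alternative")
held.attach(MIMEText("plain-text part", "plain"))
held.attach(MIMEText("<html><body>html part</body></html>", "html"))

notice = MIMEMultipart("mixed")
notice.attach(MIMEText("please authorize this posting", "plain"))
notice.attach(MIMEMessage(held))

full = [p.get_content_type() for p in notice.walk()]
outer = [p.get_content_type() for p in walk_outer_only(notice)]
print(full)
# ['multipart/mixed', 'text/plain', 'message/rfc822',
#  'multipart/alternative', 'text/plain', 'text/html']
print(outer)
# ['multipart/mixed', 'text/plain', 'message/rfc822']
```

The trade-off with this approach is that nothing inside an encapsulated message is seen by body rules, so HTML_*, MPART_ALT_DIFF and the like would no longer fire on the held spam.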



