Proposal: SpamAssassin Mailet enhancements

Jerry Malcolm Mon, 07 Oct 2019 12:05:21 -0700

As a bit of background for this proposal, I personally have no problem trading a small overhead of some additional headers in a delivered email if I can see the serviceability info I need directly in the email in 2 seconds vs scanning/filtering through massive logs (assuming the logs were even set to 'debug' level when the mail came through). I realize that other admins may or may not be willing to make this tradeoff. So the new function in this proposal is completely optional to the admin. Not enabling it will result in no changes for the user.

Problem: SpamAssassin is an incredible tool for controlling spam. But the current version of the mailet hands an email to SA where it runs a billion rules and comes back with "yup it's spam" or "nope it's clean". That's perfectly acceptable for every single email received where we agree with the SA result. The problem comes in when a client complains about false positives or false negatives. Currently, the only possible response we can give the client is "because SpamAssassin said so" with no analysis data. When SA needs tuning, I need a fighting chance at seeing how the (possibly incorrect) score was derived so I can make adjustments to the SA rules as needed.

Solution Proposal Background: The current implementation of the SpamAssassin mailet and SpamAssassinInvoker hardcodes the command to spamd as "CHECK", which returns yes or no with a bit of threshold info. Another valid command option to spamd is "REPORT". It gives back the same info as "CHECK". But it also returns analysis data. Example:


SPAMD/1.1 0 EX_OK
Spam: False ; 3.9 / 5.0

Spam detection software, running on the system "p5353013",
has NOT identified this incoming email as spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
webmas...@jwmhosting.com for details.

Content preview:  ======================================== View on Facebook
  https://www.facebook.com/nd/?Fox32Chic...
   [...]

Content analysis details:   (3.9 points, 5.0 required)

 pts rule name              description
---- ------------------------------------------------------------------------
-2.5 RCVD_IN_HOSTKARMA_W    RBL: Sender listed in HOSTKARMA-WHITE
                            [69.171.232.132 listed in hostkarma.junkemailfilter.com]
-0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
                            [69.171.232.132 listed in wl.mailspike.net]
-0.1 RCVD_IN_DNSWL_NONE     RBL: Sender listed at https://www.dnswl.org/,
                             no trust
                            [69.171.232.132 listed in list.dnswl.org]
 1.0 CK_HELO_DYNAMIC_SPLIT_IP Relay HELO'd using suspicious hostname
                            (Split IP)
 0.0 TVD_RCVD_IP            Message was received from an IP address
-0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or
                            identical to background
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.1 KAM_REALLYHUGEIMGSRC   RAW: Spam with image tags with ridiculously
                             huge http urls
 0.5 JAM_SMALL_FONT_SIZE    RAW: Body of mail contains parts with very
                            small font
 3.9 HELO_DYNAMIC_IPADDR2   Relay HELO'd using suspicious hostname (IP
                            addr 2)
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay
                            lines
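
To make the difference between the two commands concrete, here is a minimal Java sketch of the request head sent to spamd and of parsing the verdict line from the response above. The class and method names (SpamdProtocolSketch, requestHead, parseVerdict, Verdict) are hypothetical and not part of the existing mailet. The "SPAMC/1.2" version string and the "Content-length" header are assumptions about what the client sends, so check them against SpamAssassinInvoker before relying on this.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpamdProtocolSketch {
    // Matches the verdict line, e.g. "Spam: False ; 3.9 / 5.0".
    private static final Pattern VERDICT = Pattern.compile(
        "Spam:\\s*(True|False)\\s*;\\s*(-?\\d+(?:\\.\\d+)?)\\s*/\\s*(-?\\d+(?:\\.\\d+)?)",
        Pattern.CASE_INSENSITIVE);

    /** Verdict fields carried by both CHECK and REPORT responses. */
    static final class Verdict {
        final boolean spam;
        final double score;
        final double threshold;

        Verdict(boolean spam, double score, double threshold) {
            this.spam = spam;
            this.score = score;
            this.threshold = threshold;
        }
    }

    /** Builds the request head; the raw message bytes follow it on the wire. */
    static String requestHead(String command, byte[] message) {
        return command + " SPAMC/1.2\r\n"          // assumed protocol version
            + "Content-length: " + message.length + "\r\n"
            + "\r\n";
    }

    /** Returns the parsed verdict, or null when the line is not a verdict line. */
    static Verdict parseVerdict(String line) {
        Matcher m = VERDICT.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new Verdict(
            Boolean.parseBoolean(m.group(1)),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3)));
    }
}
```

With REPORT, everything after the verdict line and the blank line that follows it is the human-readable analysis; with CHECK, the response ends after the verdict line.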

Solution Proposal:

a) Add a new SpamAssassin mailet parameter (in addition to spamdPort, spamdHost) in mailetContainer.xml named "spamdCommand". Absence of this parameter will default to the current "CHECK" command. The two valid options are CHECK and REPORT.


b) Pass the specified spamdCommand (or default) to spamd in Invoker.

c) If spamdCommand is REPORT, add the report data as headers to the email using the following procedure: parse the full response into a TreeMap using "X-SpamAssassin_nnn" as keys. nnn is an incrementing number in case the headers get jumbled and/or alphabetized downstream.

d) The input stream from SA must be reset after REPORT processing. So add mark()/reset() to BufferedReader to rewind the reader so existing downstream processing is not affected. A limit (currently 2500 characters) must be set on mark(..). Check to ensure that limit is not exceeded during REPORT processing. If it gets close to the limit, stop and add a "more..." header and exit.

e) Pass the reportData TreeMap to SpamAssassinResult on both empty(..) and build(..) methods. SpamAssassinResult will walk the TreeMap data and add to HeadersPerRecipient in the same way existing processing adds the Spam flags. SpamAssassinResult will also log the TreeMap header data. Why on 'empty()'? empty() is called if the parser can't find a specific key in the string. There still may be some kind of error output even if the key is missing that could be very useful in determining why the expected key wasn't returned. So I recommend we dump what we got back from SA no matter what.

f) The final step is to process the HeadersPerRecipient data into real headers. It turns out this function is missing downstream and a defect report has been opened by Tellier. So when the code that processes HeadersPerRecipient into actual headers is available, no additional work is required.

g) Currently if REPORT, the reportData automatically goes both to the log and to headers. Not currently in my implementation, but we could add one more mailet parm "reportAsHeaders=true/false" so they could still get the report in the logs but not as headers.
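
Steps (a), (c) and (d) above can be sketched in Java roughly as follows. This is not the code from my implementation, just an illustration under stated assumptions: the names ReportHeadersSketch, resolveCommand and collectReport are hypothetical, the three-digit zero padding of nnn is an assumption, and "getting close to the limit" is interpreted as reading at most one character less than the mark() limit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Locale;
import java.util.TreeMap;

final class ReportHeadersSketch {
    // Read-ahead limit quoted in step (d).
    static final int MARK_LIMIT = 2500;

    /** Step (a): a missing parameter means CHECK; only CHECK and REPORT are accepted. */
    static String resolveCommand(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return "CHECK";
        }
        String command = configured.trim().toUpperCase(Locale.ROOT);
        if (!command.equals("CHECK") && !command.equals("REPORT")) {
            throw new IllegalArgumentException("spamdCommand must be CHECK or REPORT: " + configured);
        }
        return command;
    }

    /** Steps (c) and (d): copy the response into numbered headers, then rewind the reader. */
    static TreeMap<String, String> collectReport(BufferedReader reader) throws IOException {
        reader.mark(MARK_LIMIT);
        // Stay one character under the limit so the mark is still valid at reset().
        int budget = MARK_LIMIT - 1;
        char[] buffer = new char[budget];
        int length = 0;
        while (length < budget) {
            int read = reader.read(buffer, length, budget - length);
            if (read < 0) {
                break; // end of the spamd response
            }
            length += read;
        }
        boolean truncated = length == budget;

        TreeMap<String, String> headers = new TreeMap<>();
        int index = 0;
        for (String line : new String(buffer, 0, length).split("\r?\n")) {
            if (!line.trim().isEmpty()) {
                headers.put(key(++index), line.trim());
            }
        }
        if (truncated) {
            headers.put(key(++index), "more..."); // report was cut off at the limit
        }
        reader.reset(); // downstream parsing starts again at the marked position
        return headers;
    }

    // Zero padding keeps the TreeMap's key order equal to line order for up to 999 lines.
    private static String key(int index) {
        return String.format("X-SpamAssassin_%03d", index);
    }
}
```

Reading by character count rather than by line means the reader can never run past the mark() limit, at the cost of a possibly cut-off last line just before the "more..." header.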

Summary: The admin can change the mailet command parameter to REPORT. When a user reports a false positive or false negative on an email, open the email in Thunderbird, hit Ctrl-U to view raw source/headers, and immediately see the scoring details from SpamAssassin. Obviously the next step for the admin would be to do something in SA to alter the scoring. But that's beyond the scope of this proposal.

I currently have this function coded and tested (currently with a hack to get around the bug in (f) above). I have had an old implementation running in v3b5 for several years. I can give personal testimony to the hours it has saved me.


Comments/Suggestions welcome

Jerry

