Jerry Malcolm created JAMES-2923:
------------------------------------

             Summary: Add Option to Log More SpamAssassin Scoring Info to Log and Headers
                 Key: JAMES-2923
                 URL: https://issues.apache.org/jira/browse/JAMES-2923
             Project: James Server
          Issue Type: Improvement
          Components: Matchers/Mailets (bundled)
    Affects Versions: master
         Environment: Update existing SpamAssassin mailet and SpamAssassinInvoker.java
            Reporter: Jerry Malcolm
             Fix For: master


Problem: SpamAssassin is an incredible tool for controlling spam. But the 
current version of the mailet hands an email to SA where it runs a billion 
rules and comes back with "yup it's spam" or "nope it's clean". That's 
perfectly acceptable for every single email received where we agree with the SA 
result. The problem comes in when a client complains about false positives or 
false negatives. Currently, the only possible response we can give the client 
is "because SpamAssassin said so" with no analysis data. When SA needs tuning, 
I need a fighting chance at seeing how the (possibly incorrect) score was 
derived so I can make adjustments to the SA rules as needed.

Solution Proposal Background: The current implementation of the SpamAssassin 
mailet and SpamAssassinInvoker hardcodes the command to spamd as "CHECK" which 
returns yes or no with a bit of threshold info. Another valid command option to 
spamd is "REPORT". It gives back the same info as "CHECK". But it also returns 
analysis data. Example:

{{SPAMD/1.1 0 EX_OK}}
{{Spam: False ; 3.9 / 5.0}}
{{Spam detection software, running on the system "p5353013",}}
{{has NOT identified this incoming email as spam. The original}}
{{message has been attached to this so you can view it or label}}
{{similar future email. If you have any questions, see}}
{{webmas...@jwmhosting.com for details.}}
{{Content preview: ======================================== View on Facebook}}
{{ https://www.facebook.com/nd/?Fox32Chic...}}
{{ [...]}}
{{Content analysis details: (3.9 points, 5.0 required)}}
{{pts rule name description}}
{{---- ---------------------- --------------------------------------------------}}
{{-2.5 RCVD_IN_HOSTKARMA_W RBL: Sender listed in HOSTKARMA-WHITE}}
{{ [69.171.232.132 listed in hostkarma.junkemailfilter.com]}}
{{-0.0 RCVD_IN_MSPIKE_H2 RBL: Average reputation (+2)}}
{{ [69.171.232.132 listed in wl.mailspike.net]}}
{{-0.1 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/,}}
{{ no trust}}
{{ [69.171.232.132 listed in list.dnswl.org]}}
{{ 1.0 CK_HELO_DYNAMIC_SPLIT_IP Relay HELO'd using suspicious hostname}}
{{ (Split IP)}}
{{ 0.0 TVD_RCVD_IP Message was received from an IP address}}
{{-0.0 SPF_HELO_PASS SPF: HELO matches SPF record}}
{{ 0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or}}
{{ identical to background}}
{{ 0.0 HTML_MESSAGE BODY: HTML included in message}}
{{ 1.1 KAM_REALLYHUGEIMGSRC RAW: Spam with image tags with ridiculously}}
{{ huge http urls}}
{{ 0.5 JAM_SMALL_FONT_SIZE RAW: Body of mail contains parts with very}}
{{ small font}}
{{ 3.9 HELO_DYNAMIC_IPADDR2 Relay HELO'd using suspicious hostname (IP}}
{{ addr 2)}}
{{ 0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay}}
{{ lines}}
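
To illustrate what the mailet already gets from either command, here is a 
minimal Java sketch that pulls the verdict, score and threshold out of the 
"Spam:" line shown above. The class and method names (SpamVerdictSketch, 
Verdict, parse) are made up for this example and are not taken from the 
existing SpamAssassinInvoker; the sketch also assumes spamd always answers 
with True or False in that line.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of James: parses the "Spam:" line of a spamd reply.
class SpamVerdictSketch {

    // Matches lines such as "Spam: False ; 3.9 / 5.0" (assumed True/False only).
    private static final Pattern SPAM_LINE = Pattern.compile(
        "^Spam:\\s*(True|False)\\s*;\\s*(-?\\d+(?:\\.\\d+)?)\\s*/\\s*(-?\\d+(?:\\.\\d+)?)\\s*$",
        Pattern.CASE_INSENSITIVE);

    // Simple value holder for the three fields of the "Spam:" line.
    static final class Verdict {
        final boolean spam;
        final double score;
        final double threshold;

        Verdict(boolean spam, double score, double threshold) {
            this.spam = spam;
            this.score = score;
            this.threshold = threshold;
        }
    }

    // Returns an empty Optional for any line that is not a "Spam:" line.
    static Optional<Verdict> parse(String line) {
        Matcher m = SPAM_LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Verdict(
            Boolean.parseBoolean(m.group(1)),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3))));
    }

    public static void main(String[] args) {
        parse("Spam: False ; 3.9 / 5.0").ifPresent(v ->
            System.out.println(v.spam + " " + v.score + " / " + v.threshold));
    }
}
```

Everything after that line in the REPORT reply is the per-rule analysis, which 
is the part that CHECK never returns.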

Solution Proposal:

a) Add a new SpamAssassin mailet parameter (in addition to spamdPort, 
spamdHost) in mailetContainer.xml named "spamdCommand". Absence of this 
parameter will default to the current "CHECK" command. The two valid options 
are CHECK and REPORT.

b) Pass the specified spamdCommand (or the default) to spamd in 
SpamAssassinInvoker.

c) If spamdCommand is REPORT, add the report data as headers to the email using 
the following procedure: parse the full response into a TreeMap keyed by 
"X-SpamAssassin_nnn", where nnn is an incrementing number that preserves the 
line order in case the headers get jumbled and/or alphabetized downstream.

d) The input stream from SA must be reset after REPORT processing, so use 
mark()/reset() on the BufferedReader to rewind it and leave existing downstream 
processing unaffected. mark(..) requires a read-ahead limit (currently 2500 
characters). Check that the limit is not exceeded during REPORT processing: if 
reading gets close to the limit, stop, add a "more..." header, and exit.

e) Pass the reportData TreeMap to SpamAssassinResult on both empty(..) and 
build(..) methods. SpamAssassinResult will walk the TreeMap data and add to 
HeadersPerRecipient in the same way existing processing adds the Spam flags. 
SpamAssassinResult will also log the TreeMap header data. Why on 'empty()'? 
empty() is called if the parser can't find a specific key in the string. There 
still may be some kind of error output even if the key is missing that could be 
very useful in determining why the expected key wasn't returned. So I recommend 
we dump what we got back from SA no matter what.

f) The final step is to process the HeadersPerRecipient data into real headers. 
It turns out this function is missing downstream, and a defect report has been 
opened by Tellier. Once the code that processes HeadersPerRecipient into actual 
headers is available, no additional work is required.

g) Currently, if the command is REPORT, the reportData automatically goes both 
to the log and to headers. It is not in my current implementation, but we could 
add one more mailet parameter, "reportAsHeaders" (true/false), so admins could 
still get the report in the logs but not as headers.
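
Steps a) through d) can be sketched roughly as follows. This is an illustration 
under stated assumptions, not the patch itself: the names ReportHeaderSketch, 
SpamdCommand, fromConfig and readReport are invented here, blank lines are 
skipped, and the counter is zero-padded to three digits so that the TreeMap's 
alphabetical ordering matches the line order. The sketch reads at most one 
character less than the mark limit, so reset() cannot fail.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of steps a) to d); none of these names exist in James.
class ReportHeaderSketch {

    // Step a): the two valid values of the proposed spamdCommand parameter.
    enum SpamdCommand {
        CHECK, REPORT;

        // An absent or blank parameter falls back to the current CHECK behaviour.
        static SpamdCommand fromConfig(String value) {
            if (value == null || value.trim().isEmpty()) {
                return CHECK;
            }
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
    }

    // Step d): read-ahead limit passed to mark(..).
    static final int MARK_LIMIT = 2500;

    // Steps c) and d): copy the response into numbered headers, then rewind.
    static Map<String, String> readReport(BufferedReader reader) throws IOException {
        reader.mark(MARK_LIMIT);
        char[] buffer = new char[MARK_LIMIT - 1]; // stay strictly below the limit
        int total = 0;
        while (total < buffer.length) {
            int n = reader.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // end of the spamd response
            }
            total += n;
        }
        reader.reset(); // existing downstream parsing starts from the beginning

        boolean truncated = total == buffer.length;
        String text = new String(buffer, 0, total);
        String[] lines = text.split("\r?\n");
        // When cut off mid-line, drop the incomplete last line.
        boolean partialTail = truncated && !text.endsWith("\n");
        int usable = partialTail ? lines.length - 1 : lines.length;

        Map<String, String> headers = new TreeMap<>();
        int index = 1;
        for (int i = 0; i < usable; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // assumption: blank lines are not worth a header
            }
            headers.put(String.format(Locale.ROOT, "X-SpamAssassin_%03d", index++), lines[i]);
        }
        if (truncated) {
            headers.put(String.format(Locale.ROOT, "X-SpamAssassin_%03d", index), "more...");
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        String response = "SPAMD/1.1 0 EX_OK\r\nSpam: False ; 3.9 / 5.0\r\n\r\n"
            + "-2.5 RCVD_IN_HOSTKARMA_W RBL: Sender listed in HOSTKARMA-WHITE\r\n";
        BufferedReader reader = new BufferedReader(new StringReader(response));
        readReport(reader).forEach((k, v) -> System.out.println(k + ": " + v));
        System.out.println("next line after reset: " + reader.readLine());
    }
}
```

Reading a bounded chunk and splitting it afterwards, rather than calling 
readLine() in a loop, is a choice made for this sketch: readLine() cannot tell 
in advance how long the next line is, so a single long line could push past 
the mark limit and invalidate the mark.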

Summary: The admin can change the mailet command parameter to REPORT. When a 
user reports a false positive or false negative on an email, the admin can open 
the email in Thunderbird, hit Ctrl-U to view the raw source/headers, and 
immediately see the scoring details from SpamAssassin. Obviously the next step 
for the admin would be to do something in SA to alter the scoring, but that's 
beyond the scope of this proposal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

