I am trying to figure out why almost all spam continues to get through. I use Fedora 8, Evolution 2.12.3, and SpamAssassin 3.2.4.
I have marked more than 100 mails of each kind as junk or as non-junk, probably more than 200 by now.

I have saved to a file the source of one typical example spam. This mail contains sequences like

    <span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white"> rqz </span>

embedded in the middle of "sensitive" words. That makes the word look like spa massa ssin (substitute your favorite merchandise). The sequence above selects white letters on a white background and, in addition, makes the letters only two pixels high. In this way the words that would otherwise trigger a filter rule get split, and the pieces are separated by other words or letter combinations that do not show up on the screen.

Googling around, I found a list of SpamAssassin tests, including:

    Area tested:   body
    Description:   HTML font color similar to background
    Test name:     HTML_FONT_LOW_CONTRAST
    Default score: local: 0.131  net: 0.543  bayes: 0.663  bayes + net: 0.124

(I do not understand these scores. Why are they different? When do they apply, e.g. does the 'local' value apply if I run "spamassassin --local"? But if so, why is a low font contrast less significant when --local is used? etc.)

There was also another test named HTML_FONT_INVISIBLE, but I later found that this test appears to be associated with earlier versions of SpamAssassin.

Since Evolution runs "spamc --local", I tried "spamassassin --local" and looked at the output. Here is one:

    X-Spam-Status: No, score=3.4 required=5.0 tests=AWL,DATE_IN_PAST_24_48,
        HS_INDEX_PARAM,HTML_MESSAGE,RDNS_NONE autolearn=no version=3.2.4

There is no indication of the low-contrast rule having been triggered. Should this be so? Is this header supposed to show all tests with non-zero scores? How can I have SpamAssassin give me a complete list of tests with non-zero scores?

I added these lines to my .spamassassin/user_prefs:

    score HTML_FONT_INVISIBLE 9.99
    score HTML_FONT_LOW_CONTRAST 9.99

but could not see any change.
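To make the obfuscation described above concrete, here is a minimal Python sketch of what I would expect a filter to do with such a mail. This is not SpamAssassin code. The names style_hides_text, VisibleText and split_visible_hidden, the 3-pixel threshold and the list of "white" color values are all my own assumptions, made up for illustration. It also assumes a white page background and only looks at inline styles on span tags.

```python
import re
from html.parser import HTMLParser

# Hypothetical values, chosen for illustration only.
MAX_HIDDEN_FONT_PX = 3
HIDDEN_COLORS = {"white", "#fff", "#ffffff"}  # assumes a white background


def style_hides_text(style):
    """Return True if an inline style string makes text tiny or white."""
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    if props.get("color") in HIDDEN_COLORS:
        return True
    match = re.fullmatch(r"(\d+(?:\.\d+)?)px", props.get("font-size", ""))
    return bool(match) and float(match.group(1)) <= MAX_HIDDEN_FONT_PX


class VisibleText(HTMLParser):
    """Collect text, separating what sits inside a hidden <span>."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one bool per open <span>: is it hidden?
        self.visible = []  # text a reader would see
        self.hidden = []   # text only the filter sees

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            style = dict(attrs).get("style") or ""
            self.stack.append(style_hides_text(style))

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        target = self.hidden if any(self.stack) else self.visible
        target.append(data)


def split_visible_hidden(html):
    """Return (visible_text, hidden_text) for an HTML fragment."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.visible), "".join(parser.hidden)


# The span is copied from my example spam; the word around it is a stand-in.
HIDDEN_SPAN = '<span style="FONT-SIZE: 2px; FLOAT: right; COLOR: white"> rqz </span>'
SAMPLE = "spa" + HIDDEN_SPAN + "massa" + HIDDEN_SPAN + "ssin"

if __name__ == "__main__":
    visible, hidden = split_visible_hidden(SAMPLE)
    print("visible:", repr(visible))
    print("hidden: ", repr(hidden))
```

With the filler spans dropped, the visible text is the unbroken word again, which is what I would want the body rules and the Bayes learner to see.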
Then I tried to look at the source code. I found a function "html_font_invisible", which starts by computing the foreground and background colors. I inserted an extra line of code to have the function log its determinations. Here is some of the output:

    backgroud:#ffffff foreground:#000000
    backgroud:#ffffff foreground:#ffffff
    backgroud:#ffffff foreground:#000000
    backgroud:#ffffff foreground:#ffffff
    backgroud:#ffffff foreground:#000000
    backgroud:#ffffff foreground:#ffffff
    backgroud:#ffffff foreground:#000000
    backgroud:#ffffff foreground:#000000
    backgroud:#ffffff foreground:#000000
    backgroud:#ffffff foreground:#ffffff
    backgroud:#ffffff foreground:#000000

That is, the function assumes the background is white, and correctly finds that the text color is sometimes black, sometimes white. This shows that SpamAssassin does run that code, and does correctly determine that some of the text has the same color as the background.

However, finding one's way through all of SpamAssassin's code is likely to be a monumental task, so I wish to ask if somebody knows anything about this problem.

Further googling turned up some discussions showing that the combination Fedora + Evolution + junk filtering had more complaints than e.g. Ubuntu. However, I did not see any resolution (the web server went offline).

Any ideas? Any pointers?

Thanks
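P.S. In case it helps anyone reason about the log output above, here is a small Python sketch of the comparison I assumed the rule would make on each background/foreground pair. It is only my guess at the idea, not a translation of SpamAssassin's code. The names parse_log and classify and the distance threshold of 48 are hypothetical, and the pattern matches the misspelled "backgroud" exactly as my debug line printed it.

```python
import re

# Hypothetical threshold: summed per-channel difference at or below which
# I would call two colors "low contrast". Not taken from SpamAssassin.
LOW_CONTRAST_DISTANCE = 48

# Matches my own debug lines, including the "backgroud" typo in them.
LOG_PATTERN = re.compile(
    r"backgroud:(#[0-9a-fA-F]{6})\s+foreground:(#[0-9a-fA-F]{6})"
)


def hex_to_rgb(color):
    """Convert '#rrggbb' to a tuple of three integers."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected #rrggbb, got %r" % color)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(foreground, background):
    """Sum of absolute per-channel differences between two colors."""
    fg = hex_to_rgb(foreground)
    bg = hex_to_rgb(background)
    return sum(abs(a - b) for a, b in zip(fg, bg))


def classify(foreground, background):
    """Label a color pair as 'invisible', 'low contrast' or 'visible'."""
    distance = color_distance(foreground, background)
    if distance == 0:
        return "invisible"
    if distance <= LOW_CONTRAST_DISTANCE:
        return "low contrast"
    return "visible"


def parse_log(text):
    """Return a list of (background, foreground) pairs found in text."""
    return LOG_PATTERN.findall(text)


if __name__ == "__main__":
    log = (
        "backgroud:#ffffff foreground:#000000\n"
        "backgroud:#ffffff foreground:#ffffff\n"
    )
    for background, foreground in parse_log(log):
        print(background, foreground, classify(foreground, background))
```

By this measure every white-on-white pair in my log counts as "invisible", which is why I expected some font rule to show up in the X-Spam-Status header.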