Re: [Razor-users] Spamassassin's Razor scores
-----Original Message-----
> On Tuesday, 5. February 2008 18:15:24 you wrote:
> > I used to have higher SA scores for 95-100% spam confidence. However, I found that I could not increase the score very much. Occasionally, I would get a false positive for a blank email, no text with a few HTML tags and just attachments. The Razor database regularly contains data that indicates that a blank email is 100% known spam. There was no way to prevent the false positive because the whitelist feature for hash values was removed. I also tried combining scores for messages with a small amount of text and positive razor hits, but that allows too much spam.
>
> Hmm, that would be a little show stopper. What did the other tests of SpamAssassin report for such mails? I can imagine they report it as spam, too.
>
> Thomas

Here is an example of the SpamAssassin report for a blank email with a Word attachment:

 pts rule name                 description
---- ------------------------- ------------------------------------------------
-0.0 SPF_HELO_PASS             SPF: HELO matches SPF record
-0.0 SPF_PASS                  SPF: sender matches SPF record
 0.1 HTML_90_100               BODY: Message is 90% to 100% HTML
-2.6 BAYES_00                  BODY: Bayesian spam probability is 0 to 1%
                               [cf: 100]
 6.1 RAZOR2_CF_RANGE_91_100    Razor2 gives confidence between 91 and 100
                               [cf: 100]
 1.5 RAZOR2_CF_RANGE_E4_51_100 Razor2 gives engine 4 confidence level above 50%
                               [cf: 100]
 1.5 RAZOR2_CHECK              Listed in Razor2 (http://razor.sf.net/)
-1.5 RAZOR2_IGNORE             Message in RAZOR2 and has very little text

RAZOR2_CF_RANGE_91_100 and RAZOR2_IGNORE are my custom Razor rules. I could not get RAZOR2_IGNORE to recognize consistently when the RAZOR2_CF_RANGE_91_100 result should be ignored.
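As a quick check on the report above, the listed points can be totalled. This is a minimal sketch in Python, not part of the original message. The 5.0 threshold is SpamAssassin's default required_score and is an assumption here, since the thread does not say which threshold was configured.

```python
from decimal import Decimal

# Points copied from the report above (rule name -> score).
report = {
    "SPF_HELO_PASS": Decimal("-0.0"),
    "SPF_PASS": Decimal("-0.0"),
    "HTML_90_100": Decimal("0.1"),
    "BAYES_00": Decimal("-2.6"),
    "RAZOR2_CF_RANGE_91_100": Decimal("6.1"),
    "RAZOR2_CF_RANGE_E4_51_100": Decimal("1.5"),
    "RAZOR2_CHECK": Decimal("1.5"),
    "RAZOR2_IGNORE": Decimal("-1.5"),
}

# Assumption: SpamAssassin's default required_score; the configured
# value is not stated anywhere in the thread.
REQUIRED_SCORE = Decimal("5.0")

# Decimal avoids binary floating-point rounding when adding the points.
total = sum(report.values())
total_without_ignore = total - report["RAZOR2_IGNORE"]

print(f"total with RAZOR2_IGNORE:    {total}")
print(f"total without RAZOR2_IGNORE: {total_without_ignore}")
print(f"at or above {REQUIRED_SCORE}: {total >= REQUIRED_SCORE}")
```

Under that assumed threshold the -1.5 offset is not enough on its own: the three Razor rules contribute 9.1 points, which outweighs the -2.6 from BAYES_00.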
meta RAZOR2_IGNORE RAZOR2_CHECK + HTML_90_100 > 1
describe RAZOR2_IGNORE Message in RAZOR2 and has very little text
tflags RAZOR2_IGNORE net

meta RAZOR2_IGNORE2 RAZOR2_CHECK + MIME_HTML_MOSTLY > 1
describe RAZOR2_IGNORE2 Message in RAZOR2 and has very little text2
tflags RAZOR2_IGNORE2 net

full RAZOR2_CF_RANGE_00_01 eval:check_razor2_range('','00','01')
tflags RAZOR2_CF_RANGE_00_01 net
describe RAZOR2_CF_RANGE_00_01 Razor2 gives confidence between 00 and 01

full RAZOR2_CF_RANGE_02_10 eval:check_razor2_range('','02','10')
tflags RAZOR2_CF_RANGE_02_10 net
describe RAZOR2_CF_RANGE_02_10 Razor2 gives confidence between 02 and 10

full RAZOR2_CF_RANGE_11_50 eval:check_razor2_range('','11','50')
tflags RAZOR2_CF_RANGE_11_50 net
describe RAZOR2_CF_RANGE_11_50 Razor2 gives confidence between 11 and 50

full RAZOR2_CF_RANGE_51_90 eval:check_razor2_range('','51','90')
tflags RAZOR2_CF_RANGE_51_90 net
describe RAZOR2_CF_RANGE_51_90 Razor2 gives confidence between 51 and 90

full RAZOR2_CF_RANGE_91_100 eval:check_razor2_range('','91','100')
tflags RAZOR2_CF_RANGE_91_100 net
describe RAZOR2_CF_RANGE_91_100 Razor2 gives confidence between 91 and 100

Jim

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
_______________________________________________
Razor-users mailing list
Razor-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/razor-users
Re: [Razor-users] Spamassassin's Razor scores
-----Original Message-----
> On Thu, Feb 07, 2008 at 01:51:47PM -0600, Jim Hermann - UUN Hostmaster wrote:
> > The RAZOR2_CF_RANGE_91_100 and RAZOR2_IGNORE were my custom RAZOR rules. I could not get RAZOR2_IGNORE consistently to recognize when to ignore the RAZOR2_CF_RANGE_91_100 results.
> >
> > meta RAZOR2_IGNORE2 RAZOR2_CHECK + MIME_HTML_MOSTLY > 1
>
> a) Eww. RAZOR2_CHECK && MIME_HTML_MOSTLY
> b) MIME_HTML_MOSTLY probably doesn't help you with non-HTML mails (I'd have to look at the rule to figure out what it does)
> c) this feels like something razor-whitelist could help with, at least if it's a consistent checksum.

Vipul told me that razor-whitelist does not support checksums anymore. Even if it did, the blank checksum changes with each version of MS Outlook and each default FONT.

MIME_HTML_MOSTLY works some of the time because text length is zero or 1.

Here is the content of the blank message from MS Outlook:

end of headers
This is a multi-part message in MIME format.

--=_NextPart_000_0001_01C69F5D.B03731E0
Content-Type: multipart/alternative;
    boundary==_NextPart_001_0002_01C69F5D.B03731E0

--=_NextPart_001_0002_01C69F5D.B03731E0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

--=_NextPart_001_0002_01C69F5D.B03731E0
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; charset=3Dus-ascii">
<TITLE>Message</TITLE>
<META content=3D"MSHTML 6.00.2900.2912" name=3DGENERATOR></HEAD>
<BODY><FONT face=3D"Maiandra GD"></FONT></BODY></HTML>

--=_NextPart_001_0002_01C69F5D.B03731E0--

--=_NextPart_000_0001_01C69F5D.B03731E0
Content-Type: application/msword;
[snip]

Jim
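The objection in point a) is about style rather than behaviour. Assuming the operators lost from the archived text were `>` and `&&`, and treating each rule as 1 when it hits and 0 otherwise, the arithmetic form and the boolean form select the same messages. The following Python sketch, which is illustrative only and uses hypothetical function names, enumerates the four cases:

```python
from itertools import product


def sum_form(razor2_check: int, html_mostly: int) -> bool:
    """Arithmetic form: RAZOR2_CHECK + MIME_HTML_MOSTLY > 1."""
    return razor2_check + html_mostly > 1


def and_form(razor2_check: int, html_mostly: int) -> bool:
    """Boolean form: RAZOR2_CHECK && MIME_HTML_MOSTLY."""
    return bool(razor2_check and html_mostly)


# Each rule is modelled as 1 (hit) or 0 (no hit).
rows = [
    (a, b, sum_form(a, b), and_form(a, b))
    for a, b in product((0, 1), repeat=2)
]

for a, b, by_sum, by_and in rows:
    print(f"RAZOR2_CHECK={a} MIME_HTML_MOSTLY={b} sum>1={by_sum} and={by_and}")

# True when both forms agree on every combination of hits.
equivalent = all(by_sum == by_and for _, _, by_sum, by_and in rows)
print("equivalent:", equivalent)
```

Both forms fire only when both rules hit, so switching to the boolean form changes readability, not results. It does not address point b): neither form helps when the message has no HTML part.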
Re: [Razor-users] Spamassassin's Razor scores
Hello Jim,

On Tuesday, 5. February 2008 18:15:24 you wrote:
> I used to have higher SA scores for 95-100% spam confidence. However, I found that I could not increase the score very much. Occasionally, I would get a false positive for a blank email, no text with a few HTML tags and just attachments. The Razor database regularly contains data that indicates that a blank email is 100% known spam. There was no way to prevent the false positive because the whitelist feature for hash values was removed. I also tried combining scores for messages with a small amount of text and positive razor hits, but that allows too much spam.

Hmm, that would be a little show stopper. What did the other tests of SpamAssassin report for such mails? I can imagine they report it as spam, too.

Thomas
Re: [Razor-users] Spamassassin's Razor scores
-----Original Message-----
From: 'Thomas Jarosch'; ''
Subject: RE: [Razor-users] Spamassassin's Razor scores

> Hello together,
>
> I'm wondering if it would make sense to add additional rules for 95% to 100% spam confidence? Is anybody already using a setup like that? Any drawbacks?
>
> Cheers,
> Thomas

Thomas,

I used to have higher SA scores for 95-100% razor spam confidence. However, I found that I could not increase the score very much. Occasionally, I would get a false positive for a blank email, no text with a few HTML tags and just attachments. The Razor database regularly contains data that indicates that a blank email is 100% known spam. There was no way to prevent the false positive because the whitelist feature for hash values was removed.

I also tried combining scores for messages with a small amount of text and positive razor hits, but that allows too much spam.

Jim
Re: [Razor-users] Spamassassin's Razor scores
Hello Matt,

On Thursday, 31. January 2008 17:18:35 you wrote:
> e4 total hits:  981
> e4 cf 100:      857
> e4 cf 90-99:     12
> e4 cf 80-89:      2
> e4 cf 70-79:     10
> e4 cf 60-69:     21
> e4 cf 50-59:     16
> e4 cf 40-49:      8
> e4 cf 30-39:     30
> e4 cf 20-29:     25
> e4 cf 10-19:      0
> e4 cf 0-9:        0
> ---
> e8 total hits: 1532
> e8 cf 100:     1334
> e8 cf 90-99:     22
> e8 cf 80-89:     16
> e8 cf 70-79:     29
> e8 cf 60-69:     23
> e8 cf 50-59:     33
> e8 cf 40-49:     38
> e8 cf 30-39:     24
> e8 cf 20-29:     13
> e8 cf 10-19:      0
> e8 cf 0-9:        0

Interesting results! A separate category for 100% could improve things I guess. Could you make another run with the spam-data from the weekend?

We have a busy mailserver here, if you send me your patch I'll try to gather some data, too.

Cheers,
Thomas
Re: [Razor-users] Spamassassin's Razor scores
On Wednesday, 30. January 2008 20:36:09 Theo Van Dinter wrote:
> On Wed, Jan 30, 2008 at 01:11:53PM -0500, Matt Kettler wrote:
> > You can try it if you like. The existing rules are the result of some testing that was done several years ago. I think it was Theo that did it..
>
> Yeah, I wrote the rules + code way back when... I've been trying to find some stats for this stuff, but didn't come up with anything useful. My recollection was that w/ e8 the cf was either really low or really high, and we just took the 51_100 values from the older pre-e8 rules and made it all consistent. I don't recall e4 stats.

Maybe Vipul the great can provide some statistics if there is such a thing like 80% or 90% cf level and if it's worth expanding the SpamAssassin rules. As Theo noted there is probably more diversity for e4 than for e8, if at all.

Thomas
Re: [Razor-users] Spamassassin's Razor scores
Thomas Jarosch wrote:
> On Wednesday, 30. January 2008 20:36:09 Theo Van Dinter wrote:
> > My recollection was that w/ e8 the cf was either really low or really high, and we just took the 51_100 values from the older pre-e8 rules and made it all consistent. I don't recall e4 stats.
>
> Maybe Vipul the great can provide some statistics if there is such a thing like 80% or 90% cf level and if it's worth expanding the SpamAssassin rules. As Theo noted there is probably more diversity for e4 than for e8, if at all.

I'm currently seeing both e8 and e4 with 87% of their matches being cf=100, which matches what I started to see yesterday. My samples are still pretty small, but I can definitely see a trend.

Based on the numbers below, it *might* be valuable to split SA up into three cf ranges, i.e. 0-50, 51-99, 100-100. I'm not sure if there are more FPs in that 51-99 range that may be detracting from 100's performance, but it seems sensible to me to let the 100 grouping stand by itself since it is such a large percentage of hits.

I wrote a quick little grep and wc -l shell script that greps through my razor-agent.log so I can monitor it quickly (note: ac is currently 21, hence the 0's at the low end).

e4 total hits:  981
e4 cf 100:      857
e4 cf 90-99:     12
e4 cf 80-89:      2
e4 cf 70-79:     10
e4 cf 60-69:     21
e4 cf 50-59:     16
e4 cf 40-49:      8
e4 cf 30-39:     30
e4 cf 20-29:     25
e4 cf 10-19:      0
e4 cf 0-9:        0
---
e8 total hits: 1532
e8 cf 100:     1334
e8 cf 90-99:     22
e8 cf 80-89:     16
e8 cf 70-79:     29
e8 cf 60-69:     23
e8 cf 50-59:     33
e8 cf 40-49:     38
e8 cf 30-39:     24
e8 cf 20-29:     13
e8 cf 10-19:      0
e8 cf 0-9:        0
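To see what the proposed three-way split would look like on these numbers, the posted buckets can be regrouped. This is a rough Python sketch, not the shell script mentioned above. One caveat is built in: the posted histogram puts cf=50 in the 50-59 bucket, so the middle group below is 50-99 rather than the 51-99 range proposed in the message.

```python
# Counts copied from the histogram above; keys are the posted cf buckets.
histogram = {
    "e4": {"100": 857, "90-99": 12, "80-89": 2, "70-79": 10, "60-69": 21,
           "50-59": 16, "40-49": 8, "30-39": 30, "20-29": 25,
           "10-19": 0, "0-9": 0},
    "e8": {"100": 1334, "90-99": 22, "80-89": 16, "70-79": 29, "60-69": 23,
           "50-59": 33, "40-49": 38, "30-39": 24, "20-29": 13,
           "10-19": 0, "0-9": 0},
}


def regroup(buckets):
    """Collapse the ten-point buckets into cf=100, 50-99 and 0-49."""
    grouped = {"100": 0, "50-99": 0, "0-49": 0}
    for label, count in buckets.items():
        if label == "100":
            grouped["100"] += count
        elif int(label.split("-")[0]) >= 50:
            grouped["50-99"] += count
        else:
            grouped["0-49"] += count
    return grouped


summary = {engine: regroup(buckets) for engine, buckets in histogram.items()}

for engine, grouped in summary.items():
    total = sum(grouped.values())
    for label, count in grouped.items():
        share = 100 * count / total
        print(f"{engine} cf {label:>5}: {count:5d} ({share:.1f}%)")
```

The regrouped totals add up to the posted "total hits" for both engines (981 and 1532), and cf=100 accounts for roughly 87% of hits in each, in line with the figure quoted in the message.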
Re: [Razor-users] Spamassassin's Razor scores
Thomas Jarosch wrote:
> Hello together,
>
> I've been using Razor and SpamAssassin quite a while now using the standard rules distributed with SpamAssassin. SpamAssassin normally evaluates Razor's spam confidence level between 51% and 100%. This results in the following score:
>
> RAZOR2_CF_RANGE_51_100=0.5, RAZOR2_CF_RANGE_E4_51_100=1.5, RAZOR2_CF_RANGE_E8_51_100=1.5, RAZOR2_CHECK=0.5
> -> 4 points at maximum.
>
> I'm wondering if it would make sense to add additional rules for 95% to 100% spam confidence? Is anybody already using a setup like that? Any drawbacks?

You can try it if you like. The existing rules are the result of some testing that was done several years ago. I think it was Theo that did it..

Regardless, last time I looked I found that e8 tends to strongly gravitate towards 100, if it's listed. There are some hits below 100, but they're comparatively rare. This is probably due to really fast propagation of reports for this signature type.

I just re-tweaked my Core.pm to make cf comparison logging a lower-level event so I can check if this has changed. So far (only a minute or two) I've gotten 5 e8's and 1 e4, all cf=100.
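The "4 points at maximum" figure in the quoted question is simply the sum of the four rule scores, reached when every Razor rule hits on the same message. A minimal Python sketch, using the scores exactly as quoted in the message (they may differ between SpamAssassin versions and score sets):

```python
# Scores as quoted in the message above; not read from a live install.
razor_scores = {
    "RAZOR2_CF_RANGE_51_100": 0.5,
    "RAZOR2_CF_RANGE_E4_51_100": 1.5,
    "RAZOR2_CF_RANGE_E8_51_100": 1.5,
    "RAZOR2_CHECK": 0.5,
}

# Worst case for a message: all four Razor rules fire together.
max_razor_points = sum(razor_scores.values())

print(f"maximum Razor contribution: {max_razor_points} points")
```

Any extra rule for the top of the confidence range would add to this total, which is why the false-positive behaviour discussed elsewhere in the thread matters when choosing its score.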
Re: [Razor-users] Spamassassin's Razor scores
On Wed, Jan 30, 2008 at 01:11:53PM -0500, Matt Kettler wrote:
> You can try it if you like. The existing rules are the result of some testing that was done several years ago. I think it was Theo that did it..

Yeah, I wrote the rules + code way back when... I've been trying to find some stats for this stuff, but didn't come up with anything useful. My recollection was that w/ e8 the cf was either really low or really high, and we just took the 51_100 values from the older pre-e8 rules and made it all consistent. I don't recall e4 stats.

> I just re-tweaked my Core.pm to make cf comparison logging a lower-level event so I can check if this has changed. So far (only a minute or two) I've gotten 5 e8's and 1 e4, all cf=100.

Yeah, unfortunately we don't log the actual cf values anywhere by default, so it's hard to run some stats w/out rerunning all messages and pounding on the razor servers.

We have the NetCache plugin which was an initial attempt I was working on to grab all network-related test results and store them in an X-Spam-* header for later use via the --reuse option, but a) Razor2 is the only thing in there, and b) no one enables it by default since nothing actually uses the resulting header.

--
Randomly Selected Tagline:
"... are you nuts?" "Well, yeah, I got references ..." - Prof. Michaelson
Re: [Razor-users] Spamassassin's Razor scores
Theo Van Dinter wrote:
> On Wed, Jan 30, 2008 at 01:11:53PM -0500, Matt Kettler wrote:
> > You can try it if you like. The existing rules are the result of some testing that was done several years ago. I think it was Theo that did it..
>
> Yeah, I wrote the rules + code way back when... I've been trying to find some stats for this stuff, but didn't come up with anything useful. My recollection was that w/ e8 the cf was either really low or really high, and we just took the 51_100 values from the older pre-e8 rules and made it all consistent. I don't recall e4 stats.
>
> > I just re-tweaked my Core.pm to make cf comparison logging a lower-level event so I can check if this has changed. So far (only a minute or two) I've gotten 5 e8's and 1 e4, all cf=100.
>
> Yeah, unfortunately we don't log the actual cf values anywhere by default, so it's hard to runs some stats w/out rerunning all messages and pounding on the razor servers.

If you want some quick stats, I can post you a patch to Razor2's Core.pm that enables logging of cf levels in your razor-agent.log without flooding you. That's probably not useful for mass-checks, but can be useful for a little grep-based statistics gathering.

By default, if you set your debuglevel=6 it will log which engines matched, at what cf level, and what signature hash. However, there's a lot of other junk that's completely uninteresting to anyone outside the razor team (i.e. byte counts on a per-read/write basis come in at debuglevel=4).

The patch I made moves the byte counts up to level 5, and the engine match lines down to level 4.

But some quick stats for the past hour and 15 mins:

e8 - 140 matches, 16 of which were less than cf 100 (11.4% of hits).
e4 - 92 matches, 12 of which were less than cf 100 (13.0% of hits).

Admittedly the sample is small, but you do get the idea. There is a pretty strong gravitation towards 100, so differentiating between 51-95 and 95-100 isn't much different than 51-100.
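The two percentages above follow directly from the match counts. A short Python check, using only the figures given in the message:

```python
# (total matches, matches below cf 100) per engine, as reported above.
samples = {"e8": (140, 16), "e4": (92, 12)}

# Share of hits that fell below cf 100, rounded to one decimal place.
below_100_pct = {
    engine: round(100 * below / total, 1)
    for engine, (total, below) in samples.items()
}

for engine, pct in below_100_pct.items():
    print(f"{engine}: {pct}% below cf 100, {round(100 - pct, 1)}% at cf 100")
```

With fewer than one hit in seven below cf 100 in this sample, a rule covering only the top few points of the range would fire on nearly the same messages as the existing 51-100 rules.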
> We have the NetCache plugin which was an initial attempt I was working on to grab all network-related test results and store them in an X-Spam-* header for later use via the --reuse option, but a) Razor2 is the only thing in there, and b) no one enables it by default since nothing actually uses the resulting header.