On 01/09/2015 01:23 AM, Adam Katz wrote:
Ran these against my corpus.  Here are the worst performers (lots in
common with RW's complaints):

SPAM%   HAM%    S/O  NAME
0.013  0.153  0.080  __RULEGEN_PHISH_BLR6YY
0.006  0.286  0.022  __RULEGEN_PHISH_0ATBRI
0.008  0.334  0.023  __RULEGEN_PHISH_L3I0Z5
0.002  0.300  0.006  __RULEGEN_PHISH_LGYG7Q
0.017  1.387  0.012  __RULEGEN_PHISH_QVS6GE
0.045  2.490  0.018  __RULEGEN_PHISH_UNQ4VP
0.027  2.011  0.013  __RULEGEN_PHISH_B9HL3A
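For reference, the S/O column can be reproduced from the two hit-rate columns. This assumes the usual SpamAssassin hit-frequencies definition, S/O = SPAM% / (SPAM% + HAM%). The sketch below recomputes it from the rounded percentages in the table, so the last digit may differ slightly from the figures above. The 0.5 cutoff and the helper names (`s_over_o`, `fp_prone`) are illustrative assumptions, not part of any SA tool.

```python
# Hit rates (%) from the table above: rule name -> (SPAM%, HAM%).
HIT_RATES = {
    "__RULEGEN_PHISH_BLR6YY": (0.013, 0.153),
    "__RULEGEN_PHISH_0ATBRI": (0.006, 0.286),
    "__RULEGEN_PHISH_L3I0Z5": (0.008, 0.334),
    "__RULEGEN_PHISH_LGYG7Q": (0.002, 0.300),
    "__RULEGEN_PHISH_QVS6GE": (0.017, 1.387),
    "__RULEGEN_PHISH_UNQ4VP": (0.045, 2.490),
    "__RULEGEN_PHISH_B9HL3A": (0.027, 2.011),
}

# Illustrative cutoff (an assumption): below this, a rule hits mostly ham.
MIN_SO = 0.5


def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's normalized hits that fall on spam."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


def fp_prone(rates: dict, min_so: float = MIN_SO) -> list:
    """Names of rules whose S/O is below min_so, sorted alphabetically."""
    return sorted(name for name, (s, h) in rates.items() if s_over_o(s, h) < min_so)


if __name__ == "__main__":
    for name, (spam, ham) in HIT_RATES.items():
        print(f"{spam:.3f}  {ham:.3f}  {s_over_o(spam, ham):.3f}  {name}")
    print("FP-prone:", ", ".join(fp_prone(HIT_RATES)))
```

Every rule in the table comes out far below 0.5, i.e. each one fires on ham much more often than on spam.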

body __RULEGEN_PHISH_UNQ4VP  / may contain information that is /
body __RULEGEN_PHISH_QVS6GE  / or entity to which it is addressed/
body __RULEGEN_PHISH_B9HL3A  /The information contained in this /
body __RULEGEN_PHISH_0ATBRI  / it is addressed\. If you are n/
body __RULEGEN_PHISH_LGYG7Q  / you have received it in error. /
body __RULEGEN_PHISH_BLR6YY  /uthorised and regulated by the /
body __RULEGEN_PHISH_L3I0Z5  / is intended solely for the ..d/
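These patterns are plain phrase matches that also appear in ordinary confidentiality and regulatory footers, which is one plausible source of the ham hits. A quick check is to run them against such a footer. The sketch below uses Python's `re` module (these particular patterns behave the same as in Perl) and treats the footer as a single line of body text. The footer is made up for illustration and is not taken from any real sender. Note that the dot in LGYG7Q is unescaped, so it matches "error," as well as "error.".

```python
import re

# Body patterns copied from the rules above.
RULES = {
    "__RULEGEN_PHISH_UNQ4VP": r" may contain information that is ",
    "__RULEGEN_PHISH_QVS6GE": r" or entity to which it is addressed",
    "__RULEGEN_PHISH_B9HL3A": r"The information contained in this ",
    "__RULEGEN_PHISH_0ATBRI": r" it is addressed\. If you are n",
    "__RULEGEN_PHISH_LGYG7Q": r" you have received it in error. ",
    "__RULEGEN_PHISH_BLR6YY": r"uthorised and regulated by the ",
    "__RULEGEN_PHISH_L3I0Z5": r" is intended solely for the ..d",
}
COMPILED = {name: re.compile(pat) for name, pat in RULES.items()}

# Hypothetical legal footer of the kind legitimate senders often append.
FOOTER = (
    "The information contained in this email is confidential and "
    "is intended solely for the addressee. It may contain information that is "
    "privileged and is intended only for the person or entity to which it is "
    "addressed. If you are not the intended recipient, please delete it. "
    "If you have received it in error, please notify the sender. "
    "Example Ltd is authorised and regulated by the Financial Conduct Authority."
)


def matching_rules(text: str) -> list:
    """Sorted names of the rules whose pattern occurs anywhere in text."""
    return sorted(name for name, rx in COMPILED.items() if rx.search(text))


if __name__ == "__main__":
    for name in matching_rules(FOOTER):
        print("hit:", name)
```

All seven rules fire on this one footer, so any legitimate mail carrying similar boilerplate collects several of them at once.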

A large number of the FPs come from Paypal and similar services.

Agreed, the rules are not close to ideal.
The spam corpus is ancient and the ham corpus is too small.


Even controlling for those, I haven't found the phishing ruleset useful
at all.  The fraud rules do have limited utility.

Agreed - blame bad & stale data.

What relationship does this have to the 10+ year-old SARE stuff?

I was part of the SARE group, and saved the rules (for historical reasons) to SF before the web site was shut down for good.

As I don't have the means to set up an SA update channel, putting the RULEGEN rules on SF was the only option I had left.
