Daniel Quinlan <[EMAIL PROTECTED]> writes:

>   0.380   0.7601   0.0000    1.000   0.93    0.45  LOTS_OF_STUFF:1
>   0.043   0.0867   0.0000    1.000   0.93    0.45  LOTS_OF_STUFF:2
>
> bad

I figured this out.  It's really a URI rule which worked as a body rule
before because we stuffed URIs into the body.  I have some test rules
that out-do the original now and I'll check 'em in.

>   0.510   0.9801   0.0401    0.961   0.82    0.49  TO_HAS_SPACES:1
>   0.160   0.2934   0.0267    0.917   0.72    0.49  TO_HAS_SPACES:2
>
> bad

Ah, this broke because we changed the :addr code.  I'll try to
resurrect...

>   1.024   2.0401   0.0067    0.997   0.92    2.53  TRACKER_ID:1
>   0.814   1.6201   0.0067    0.996   0.92    2.53  TRACKER_ID:2
>
> bad

Hmmm... the rule is unchanged, so I think it's just the loss of URIs in
the body again.  The difference is not as big as LOTS_OF_STUFF, though,
so I'm not as inclined to pursue the missing 0.4% of spam hits.  Maybe
it's worth it, though.
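
For a rough sense of scale, here is a minimal Python sketch of the
comparison.  It assumes the second column in the quoted lines is the
percentage of spam hit and that :1 and :2 are the two runs being
compared; the function name is made up for illustration.

```python
def spam_hit_drop(run1_pct, run2_pct):
    """Return (absolute, relative) loss in spam hit rate between two runs."""
    absolute = run1_pct - run2_pct   # percentage points lost
    relative = absolute / run1_pct   # fraction of the rule's original hits lost
    return absolute, relative


# SPAM% figures copied from the quoted lines above (assumed column meaning).
rules = {
    "LOTS_OF_STUFF": (0.7601, 0.0867),
    "TRACKER_ID": (2.0401, 1.6201),
}

for name, (run1, run2) in rules.items():
    absolute, relative = spam_hit_drop(run1, run2)
    print(f"{name}: -{absolute:.2f} points ({relative:.0%} of its hits)")
```

In absolute terms the two drops are not far apart, but LOTS_OF_STUFF
loses most of its hits while TRACKER_ID keeps most of them.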

> There goes a perfectly good rule for me, even in STATISTICS.txt it was
> pretty good:
>
>   4.432   6.7680   0.0560    0.992   0.94    3.67  MSGID_FROM_MTA_SHORT

I have a replacement rule that's just about as good (possibly more
correct) and is 12 lines long instead of 96 lines for the original set
of MSGID_FROM_MTA* eval rules:

  4.473   5.2540   0.1261    0.977   0.87    3.67  MSGID_FROM_MTA_SHORT
  5.436   6.3822   0.1646    0.975   0.86    0.01  T_MSGID_FROM_MTA_1

I think the original's really high scores were mostly luck due to being
able to parse some lines and not others.  This rule uses the trusted
Received header code so I think it will also solve some of the FP
problems that some sites had with MSGID_FROM_MTA_SHORT.
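
As a sanity check on the figures above, the S/O column can be
recomputed from the spam and ham percentages.  This is only a sketch:
it assumes the columns are OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE and
that S/O is SPAM% / (SPAM% + HAM%).  The function name and the row
labels are mine, not part of any tool.

```python
def s_o_ratio(spam_pct, ham_pct):
    """Fraction of a rule's hits that are spam, from per-class hit rates."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule has no hits in either corpus")
    return spam_pct / total


# (SPAM%, HAM%) copied from the lines quoted in this message.
rows = {
    "MSGID_FROM_MTA_SHORT (first listing)": (6.7680, 0.0560),
    "MSGID_FROM_MTA_SHORT (second listing)": (5.2540, 0.1261),
    "T_MSGID_FROM_MTA_1": (6.3822, 0.1646),
}

for name, (spam, ham) in rows.items():
    print(f"{s_o_ratio(spam, ham):.3f}  {name}")
```

The recomputed values agree with the S/O column to three decimal
places, so the test rule gives up very little accuracy against the
second listing while hitting more spam.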

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
