Bob Proulx wrote on 02.11.2007 18:24:

  body FRT_OPPORTUN1 /<inter SP2><post P2>(?!opportun)<O><P><P><O><R><T><U><N>/I
  body FRT_OPPORTUN2 /<inter W0><post P2>(?!opportun)<O><P><P><O><R><T><U><N>/I

Huh?  How are those rules matching?  I am missing something.  That
can't be the right rule that is being hit here.  Can someone educate me
as to what is happening here?

This rule is preprocessed by the ReplaceTags plugin, which is a simple macro expander. Words between <> are macros that the plugin expands. For example, <P> expands to [p\xfe] according to line 2808 in 72_active.cf. This is done to make it easier to write rules for obfuscated words.

I don't know whether or how the processed rule can be output, but I guess that <post P2> is appended after every normal expansion. So <P> becomes <P><P2>, and since P2 expands to {1,2}, <P> finally expands to [p\xfe]{1,2}, which matches one or two p or \xfe characters. There are two <P> tags in a row, so pp, ppp and pppp all match this part of the rule.
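To make that guess concrete, here is a minimal Python sketch of the expansion as I imagine it. It is not the ReplaceTags code: the expand() function is hypothetical, it ignores the <inter ...> tag, and it only knows the two tag values and the P2 value quoted in this mail.

```python
import re

# Tag values quoted in this thread; everything else here is a guess.
TAGS = {
    "P": r"[p\xfe]",
    "O": r"[go0\xd2\xd3\xd4\xd5\xd6\xd8\xf0\xf2\xf3\xf4\xf5\xf6\xf8]",
}
POST = {"P2": "{1,2}"}


def expand(pattern):
    """Hypothetical expander illustrating the guess, not the plugin itself."""
    suffix = ""
    m = re.search(r"<post (\w+)>", pattern)
    if m:
        # Remember what <post ...> stands for and drop the tag itself.
        suffix = POST[m.group(1)]
        pattern = pattern.replace(m.group(0), "")
    # Replace each <TAG> with its character class followed by the suffix.
    return re.sub(r"<(\w+)>", lambda t: TAGS[t.group(1)] + suffix, pattern)


fragment = expand("<post P2><O><P><P><O>")
print(fragment)
```

With this sketch, the two <P> tags become [p\xfe]{1,2}[p\xfe]{1,2}, so the fragment accepts "oppo", "opppo" and "oppppo" but not "opo".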

On the other hand, I don't know whether "oppertun" matches this rule, although the rule is given this description:
describe FRT_OPPORTUN1          ReplaceTags: Oppertun (1)
The second <O> expands to [go0\xd2\xd3\xd4\xd5\xd6\xd8\xf0\xf2\xf3\xf4\xf5\xf6\xf8], and there is no e in it.

Because of the negative look-ahead (?!opportun), this rule matches only an obfuscated "opportun", never a plain "opportun" as in "opportunity". An "oppportunity" (three p's) is not rejected by the look-ahead, so it matches the pattern.
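The effect of the look-ahead can be tried out with a small Python approximation of the rule. This is only a sketch under assumptions: the <O> and <P> classes are the ones quoted in this mail, but I don't know the expansions of <R>, <T>, <U> and <N>, so plain letters stand in for them, and the <inter ...> tag is left out entirely.

```python
import re

# Character classes quoted in this thread for <O> and <P>.
O = r"[go0\xd2\xd3\xd4\xd5\xd6\xd8\xf0\xf2\xf3\xf4\xf5\xf6\xf8]"
P = r"[p\xfe]"

# Assumption: <R>, <T>, <U>, <N> are replaced by the plain letters here,
# and every element gets the {1,2} suffix from <post P2>.
body = "".join(c + "{1,2}" for c in (O, P, P, O, "r", "t", "u", "n"))
rule = re.compile("(?!opportun)" + body, re.IGNORECASE)


def hits(text):
    """Return True if the approximated rule matches anywhere in text."""
    return rule.search(text) is not None


for word in ("opportunity", "oppportunity", "0pportunity", "oppertunity"):
    print(word, hits(word))
```

Under these assumptions, "opportunity" is rejected by the look-ahead, "oppportunity" and "0pportunity" are hits, and "oppertunity" is not a hit because e is not in the <O> class.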

Since these rules were assigned such a high score, very little ham (if any) in the score-generating corpus seems to contain this misspelling. If I understand the process correctly, the scores are not set manually but by a lengthy automatic analysis of a large message corpus, which tries to minimize the overall score of known ham and maximize the overall score of known spam.

What you can do:
- lower the score for these rules manually
- and perhaps send your FP to the SA developers so they can include it in their corpus.
