Re: Crazy nonsensical white-space within words

Morgan Bishop Wed, 08 Jun 2011 16:46:17 -0700

On 6/8/2011 7:11 PM, John Hardin wrote:

On Wed, 8 Jun 2011, Martin Gregorie wrote:

On Wed, 2011-06-08 at 07:53 -0700, John Hardin wrote:

How about this (untested):
header __SUBJ_BROKEN_WORD Subject =~ /\s(?!i[PT])[a-z]{1,3}[A-Z][a-z]{2}/
tflags  __SUBJ_BROKEN_WORD  multiple
meta    __SUBJ_BROKEN_WORDS __SUBJ_BROKEN_WORD > 2
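If you want to sanity-check the rule outside SpamAssassin first, here is a rough Python approximation. It assumes Python's `re` treats this pattern the same way Perl does. It counts non-overlapping matches to roughly mimic `tflags multiple`, then applies the `> 2` threshold from the meta rule (three or more hits). The sample subjects are made up for illustration and are not from anyone's corpus.

```python
import re

# John's subject pattern: whitespace, then 1-3 lowercase letters, one
# uppercase letter and two lowercase letters (e.g. " tYou" in " tYour").
# The negative lookahead skips words starting "iP"/"iT", such as iPod/iTunes.
BROKEN_WORD = re.compile(r"\s(?!i[PT])[a-z]{1,3}[A-Z][a-z]{2}")


def broken_word_hits(subject: str) -> int:
    """Count non-overlapping matches, roughly what `tflags multiple` counts."""
    return len(BROKEN_WORD.findall(subject))


def subj_broken_words(subject: str) -> bool:
    """Approximates the meta rule: more than two hits, i.e. three or more."""
    return broken_word_hits(subject) > 2


if __name__ == "__main__":
    # Hypothetical obfuscated subject with spaces moved inside words.
    spammy = "Ge tYour Fre eCredit Sco reNow"
    normal = "Your statement is now available"
    print(broken_word_hits(spammy), subj_broken_words(spammy))
    print(broken_word_hits(normal), subj_broken_words(normal))
```

Requiring more than two hits means a single camel-cased brand name in an otherwise normal subject is not enough to fire the meta rule on its own.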


I tested this as well as my own variant:

describe MG_SPLIT322 Two or more words obfuscated with a "xxx xx xx" split
body     __MG_SPL322 /\b[a-z]{3} [a-z]{2} [a-z]{2}\b/i
tflags   __MG_SPL322 multiple
meta     MG_SPLIT322 __MG_SPL322 > 2
score    MG_SPLIT322 4

against a private collection of 491 spam messages which I use to test my
private rules.

I got 8 FPs (1.6%) with either regex because both hit on fairly common
text such as "Log in to", "rolling out up to", "want you to be" and "and
so on".

My version shouldn't hit on _any_ of those examples; it's intentionally case-sensitive.
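The case-sensitivity point is easy to check in Python. This sketch runs both patterns over the phrases Martin quoted. Martin's pattern is compiled with `re.IGNORECASE` to mirror its `/i` flag, and John's is left case-sensitive. It only exercises the regexes themselves, not SpamAssassin's body or header handling.

```python
import re

# Martin's body pattern, case-insensitive as in the /i rule.
MG_SPL322 = re.compile(r"\b[a-z]{3} [a-z]{2} [a-z]{2}\b", re.IGNORECASE)
# John's subject pattern, case-sensitive (no /i flag).
SUBJ_BROKEN_WORD = re.compile(r"\s(?!i[PT])[a-z]{1,3}[A-Z][a-z]{2}")

# Ordinary phrases reported as false positives.
PHRASES = ["Log in to", "rolling out up to", "want you to be", "and so on"]


def compare(phrases):
    """Return (phrase, Martin's hit count, John's hit count) for each phrase."""
    return [
        (p, len(MG_SPL322.findall(p)), len(SUBJ_BROKEN_WORD.findall(p)))
        for p in phrases
    ]


if __name__ == "__main__":
    for phrase, mg, jh in compare(PHRASES):
        print(f"{phrase!r}: MG_SPL322={mg} SUBJ_BROKEN_WORD={jh}")
```

Each phrase is a run of short, all-lowercase words, so the "xxx xx xx" shape matches it. The case-sensitive pattern needs an uppercase letter in the middle of a word, which none of these phrases contain.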

Oops, I realize I replied to Martin and not to the mailing list. I ran your expression against a random corpus of 500 ham messages and got only 2 hits (0.4%), both on messages with "eBay" and "eTrade" in the subject. They were something like "Your eTrade statement is now available" and "eBay expired notice...", and nothing else, such as WiFi, iPod, or similar strings, triggered it. Hit frequencies in a random corpus of spam were 100% for the spam I'm targeting and 3% for unrelated spam.
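To see why eTrade and eBay slip through while iPod and WiFi don't, the same pattern can be run over single words with a leading space added. This is a simplification, because real subjects have more context around each word. The last lines recompute the false-positive rates quoted in this thread from the raw counts.

```python
import re

# John's subject pattern, unchanged.
SUBJ_BROKEN_WORD = re.compile(r"\s(?!i[PT])[a-z]{1,3}[A-Z][a-z]{2}")

# Camel-cased names that show up in legitimate subjects.
WORDS = ["eTrade", "eBay", "iPod", "iTunes", "WiFi"]


def hits(word: str) -> bool:
    """True if the word, preceded by a space, matches the pattern."""
    return SUBJ_BROKEN_WORD.search(" " + word) is not None


def rate(hit_count: int, total: int) -> str:
    """Format a hit count as a percentage of the corpus size."""
    return f"{hit_count / total:.1%}"


if __name__ == "__main__":
    for w in WORDS:
        # eTrade/eBay: lowercase prefix + capital, nothing excludes them.
        # iPod/iTunes: blocked by the (?!i[PT]) lookahead.
        # WiFi: starts with a capital, so [a-z]{1,3} cannot match.
        print(w, hits(w))
    print("ham hits:", rate(2, 500))      # 2 of 500 ham messages
    print("Martin's FPs:", rate(8, 491))  # 8 of 491 test messages
```

Widening the lookahead, for example to `(?!i[PT]|e[BT])`, would be one hypothetical way to exempt those two brands, at the cost of also exempting any obfuscated word that happens to start that way.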

Everything is working well, and I have a very happy user (my old man). My default score for the new rule is 1.5, as that will bump the SPAM score to 5 for all my Bayes-trained accounts. I've bumped the user_pref score up to 5 for the account that is receiving Hotmail forwards, with a filter to correct legitimate eBay messages, since that is the ONLY problem account (no other accounts receive this spam). Once the account has been trained for a week or so, I'm confident Bayes will correctly tag this spam, so the problem will not be long-lived and I can remove the user preference score that could create false positives.


Thank you for your help.
