mouss wrote:
> Loren Wilton wrote:
>
>>> what is the \b for?
>>
>> Word break. There has to be a space or some other "non-word" character
>> following the things in parens. Which is why peinss manages to not
>> be hit.
>>
>> Word breaks are usually used to keep from hitting on unexpected
>> things, like the middle of a word that is benign. Offhand I'm not
>> positive that it is needed in the given rule. But I suspect that it
>> was probably put in because something there would hit where it
>> shouldn't if the \b wasn't there.
>
> I may hit "The penist", "Lepeniste" and probably other words, but is
> this really important? how many ham would contain those? I'd be
> interested in seeing the results of different mass checks, with and
> without \b.
>
> otherwise, it may be interesting to add the same rule without breaks but
> with a lower score?

SARE_ADLTSUB2 Subject =~ /\b(?:blow|climax |enlarg(e|ment)|fuck|inter+acial|lick|porn|penis|pervert|pussy|tits|tight|vagina|virgins?)\b/i

Without the trailing \b this rule would also match:

    tighten, tightened, tighter, tightening
        "I broke the bolt tightening it down."
    blower, blown, blowing, blows
        "Winds blowing over 50mph"
    Virginia
        "Virginia state legislature passes new spam bill"
    Enlarger, enlarges

Leaving off the starting \b adds things like:

    slick

And that's just off the top of my head...

Leaving off the \b's is generally bad, as the number of words you can hit
explodes rapidly, especially with a multi-possibility regex like this one.

Fix the rule, don't ditch the \b's for such a broad rule.

Besides, the whole rule is subject to all kinds of obfuscation tricks.
P.e.n.i.s still won't match, nor any other character-insertion obfuscation.
Removing the \b's fixes only a few obfuscation cases, but adds many extra
undesirable FP cases.

I'd suggest creating obfu rules to detect obfuscations, and don't try to
expand the scope of this already over-broad rule (which will match a few FP
cases as-is, such as "your photo enlargement is ready").