Re: Underscores
I am mainly using the rule to check the Subject header; I haven't added it to a body check.

So, between the 3 choices:

1. /(?:[^_]{1,30}_+){5}/
2. /\S+_+\S+_+\S+/
3. R02 /^\S{30,}$/m

Which covers the most territory given the example I submitted?

I'm basically interested in identifying those garbage subject lines laced with characters like underscores, periods, hyphens, semicolons, etc., so rather than use several rules to trap those individual characters, maybe there is a more effective way to resolve this.

Thanks,
Wes
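One way to compare the three candidates is to run them against a few sample subjects. The sketch below uses Python's re module as a stand-in for SpamAssassin's Perl regex engine (these three patterns use syntax both accept). The first sample is the one from the thread; the other two are invented for illustration, and `hits` is a hypothetical helper name.

```python
import re

# The three candidates from the question, unchanged apart from the
# /m flag being expressed as re.MULTILINE.
CANDIDATES = {
    "1": re.compile(r"(?:[^_]{1,30}_+){5}"),
    "2": re.compile(r"\S+_+\S+_+\S+"),
    "3": re.compile(r"^\S{30,}$", re.MULTILINE),
}

SAMPLES = [
    "This_sentenance_has_an_underscore_after_every_word",  # from the thread
    "Cheap.meds.shipped.to.your.door.today",               # invented
    "Re: LD_LIBRARY_PATH question",                        # invented
]

def hits(subject):
    """Return the names of the candidate patterns that match the subject."""
    return sorted(name for name, rx in CANDIDATES.items() if rx.search(subject))

for subject in SAMPLES:
    print(hits(subject), subject)
```

On these samples, only choice 3 catches the period-laced subject, and only choice 2 fires on the legitimate environment-variable subject. None of the three is specific to "punctuation between every word".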
Re: Underscores
On Sat, 18 Jul 2009, twofers wrote:

> I am mainly using the rule to check the Subject header; I haven't added it to a body check.
>
> So, between the 3 choices:
>
> 1. /(?:[^_]{1,30}_+){5}/
> 2. /\S+_+\S+_+\S+/
> 3. R02 /^\S{30,}$/m
>
> Which covers the most territory given the example I submitted?
>
> I'm basically interested in identifying those garbage subject lines laced with characters like underscores, periods, hyphens, semicolons, etc., so rather than use several rules to trap those individual characters, maybe there is a more effective way to resolve this.

Your original example only included underscores. Try this:

header XX Subject =~ /(?:[[:alnum:]]{1,30}[^[:alnum:]\s]{1,5}){5}/i

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
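A rough way to see what the suggested pattern catches: five consecutive runs of alphanumerics, each followed by one to five characters that are neither alphanumeric nor whitespace. Python's re module has no POSIX classes, so this sketch spells [[:alnum:]] out as [A-Za-z0-9], which is an ASCII-only approximation of the Perl original. The function name and the second and third test subjects are made up for illustration.

```python
import re

# ASCII approximation of /(?:[[:alnum:]]{1,30}[^[:alnum:]\s]{1,5}){5}/i.
# The /i flag is dropped because the character class already covers both cases.
MIXED_PUNCT = re.compile(r"(?:[A-Za-z0-9]{1,30}[^A-Za-z0-9\s]{1,5}){5}")

def looks_laced(subject):
    """True if the subject has five word+punctuation runs with no whitespace between them."""
    return MIXED_PUNCT.search(subject) is not None

print(looks_laced("This_sentenance_has_an_underscore_after_every_word"))
print(looks_laced("Buy.now-save;big_today!"))
print(looks_laced("Re: Meeting notes for Monday"))
```

Because whitespace breaks the chain, an ordinary subject with spaces between words does not match, while mixed separators (period, hyphen, semicolon, underscore) are all treated alike.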
Underscores
How can I pattern match when every word has an underscore after it? Example:

This_sentenance_has_an_underscore_after_every_word

I'm not really good at Perl pattern matching, but \w and \W see an underscore as a word character, so I'm just not sure what might work.

body =~ /^([a-z]+_+)+/i

Is that something that will work effectively?

Thanks.
Wes
Re: Underscores
twofers wrote:

> How can I pattern match when every word has an underscore after it? Example:
>
> This_sentenance_has_an_underscore_after_every_word
>
> I'm not really good at Perl pattern matching, but \w and \W see an underscore as a word character, so I'm just not sure what might work.
>
> body =~ /^([a-z]+_+)+/i
>
> Is that something that will work effectively?
>
> Thanks.
> Wes

I'd do something like this:

body MY_UNDERSCORES /\S+_+\S+_+\S+/

Unless you really want to restrict it to A-Z.

Regardless, ending any regex in + in a SA rule is redundant. Since + allows a one-instance match, it will devolve to that. You don't need to match the entire line with your rule, so the extra matches are redundant. It will match the first instance, and that's all it needs to be a match.

Also, any regex ending in * should just have its last element removed, as that will devolve to a zero-count match.
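The point about a trailing + can be checked directly: for an unanchored yes/no match, the pattern with and without the final + accepts exactly the same strings. A small sketch using Python's re module in place of Perl (the sample strings and the `verdicts` helper are invented for illustration):

```python
import re

WITH_PLUS = re.compile(r"\S+_+\S+_+\S+")
WITHOUT_PLUS = re.compile(r"\S+_+\S+_+\S")  # trailing + dropped

SAMPLES = [
    "a_b_c",
    "a_b_cdefgh tail",
    "a_b_",                 # nothing after the second underscore
    "no underscores here",
    "one_only",
]

def verdicts(rx):
    """Match/no-match result of rx for every sample, in order."""
    return [rx.search(s) is not None for s in SAMPLES]

print(verdicts(WITH_PLUS))
print(verdicts(WITHOUT_PLUS))
```

Both lists come out identical. The only difference is how much text the match consumes, which a rule that merely fires or doesn't fire never looks at.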
Re: Underscores
On Thu, 2009-07-16 at 08:52 -0400, Matt Kettler wrote:

> twofers wrote:
> > How can I pattern match when every word has an underscore after it? Example:
> >
> > This_sentenance_has_an_underscore_after_every_word
> >
> > body =~ /^([a-z]+_+)+/i
>
> I'd do something like this:
>
> body MY_UNDERSCORES /\S+_+\S+_+\S+/

That's quite a lot of backtracking, no? How about:

/(?:[^_]{1,30}_+){1,5}/

--
John Hardin
Re: Underscores
From: Matt Kettler mkettler...@verizon.net
Date: Thu, 16 Jul 2009 08:52:50 -0400

> twofers wrote:
> > How can I pattern match when every word has an underscore after it? Example:
> >
> > This_sentenance_has_an_underscore_after_every_word
> >
> > I'm not really good at Perl pattern matching, but \w and \W see an underscore as a word character, so I'm just not sure what might work.
> >
> > body =~ /^([a-z]+_+)+/i
> >
> > Is that something that will work effectively?

Is this for a spam rule?

> I'd do something like this:
>
> body MY_UNDERSCORES /\S+_+\S+_+\S+/
>
> Unless you really want to restrict it to A-Z.
>
> Regardless, ending any regex in + in a SA rule is redundant. Since + allows a one-instance match, it will devolve to that. You don't need to match the entire line with your rule, so the extra matches are redundant. It will match the first instance, and that's all it needs to be a match.
>
> Also, any regex ending in * should just have its last element removed, as that will devolve to a zero-count match.

The /\S+_+\S+_+\S+/ rule will match lots of technical email, for example discussions of shell environment variables like LD_LIBRARY_PATH.

-jeff
Re: Underscores
On Thu, 2009-07-16 at 06:27 -0700, John Hardin wrote:

> How about:
>
> /(?:[^_]{1,30}_+){1,5}/

Whoops! Make that:

/(?:[^_]{1,30}_+){5}/

--
John Hardin
Re: Underscores
> Whoops! Make that:
>
> /(?:[^_]{1,30}_+){5}/

Better. ;) However, while that indeed eliminates excessive backtracking as \S or \w results in (since they contain the underscore), this doesn't match words ending in underscores. A non-underscore [^_] includes space, punctuation, and any other unwanted char.

Exactly _five_ occurrences of an '_' underscore, with up to 30 _random_ chars in between. This paragraph matches. :)

--
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: [sa] Re: Underscores
On Thu, 16 Jul 2009, Karsten Bräckelmann wrote:

> > /(?:[^_]{1,30}_+){5}/
>
> Better. ;) However, while that indeed eliminates excessive backtracking as \S or \w results in (since they contain the underscore), this doesn't match words ending in underscores. A non-underscore [^_] includes space, punctuation, and any other unwanted char.

Given that OP said the entire *line* was word-underscore-word-underscore, then why not just:

body R01 /^\w{30,}$/m

Or perhaps the OP wasn't clear on whether 'word' might contain other punctuation, and so we might simply use:

body R02 /^\S{30,}$/m

I might add \s* at the end of the rule, just in case of trailing spaces...

- C
Re: [sa] Re: Underscores
On Thu, 2009-07-16 at 11:08 -0400, Charles Gregory wrote:

> Given that OP said the entire *line* was word-underscore-word-underscore, then why not just:
>
> body R01 /^\w{30,}$/m

Indeed, it really depends on what *exactly* the rule should match.

> Or perhaps the OP wasn't clear on whether 'word' might contain other punctuation, and so we might simply use:
>
> body R02 /^\S{30,}$/m

This one also matches a long-ish URL on a line of its own.

> I might add \s* at the end of the rule, just in case of trailing spaces...

Keep in mind that with body rules, the body is *rendered*: whitespace is normalized, and *paragraphs* are re-flowed to a single string with embedded newlines stripped. For instance, this very paragraph is a single ^line$ as far as body REs are concerned.
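To make the rendering point concrete, here is a rough simulation in Python. `render_paragraphs` is a hypothetical stand-in written for this example, not SpamAssassin's actual rendering code; it only imitates the behaviour described above (one string per paragraph, whitespace collapsed). The URL and sample text are invented.

```python
import re

def render_paragraphs(raw_body):
    """Hypothetical stand-in for body rendering: one string per paragraph,
    with embedded newlines and whitespace runs collapsed to single spaces."""
    paragraphs = re.split(r"\n\s*\n", raw_body.strip())
    return [" ".join(p.split()) for p in paragraphs]

R01 = re.compile(r"^\w{30,}$", re.MULTILINE)
R02 = re.compile(r"^\S{30,}$", re.MULTILINE)

URL = "http://www.example.com/some/long/path/index.html"

# A hard-wrapped paragraph in which the URL happens to sit on its own physical line.
WRAPPED = "see the following address\n" + URL + "\nfor details"

raw_hit = R02.search(WRAPPED) is not None
rendered_hit = any(R02.search(p) for p in render_paragraphs(WRAPPED))

# A URL that is a paragraph of its own still matches R02 after rendering,
# but not R01, because ':', '/' and '.' are not word characters.
url_r02 = any(R02.search(p) for p in render_paragraphs(URL))
url_r01 = any(R01.search(p) for p in render_paragraphs(URL))

print(raw_hit, rendered_hit, url_r02, url_r01)
```

Under these assumptions the physical-line match disappears once the paragraph is re-flowed, so ^ and $ effectively anchor to paragraph boundaries in a body rule.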
Re: Underscores
On Thu, 16 Jul 2009, Karsten Bräckelmann wrote:

> > Whoops! Make that:
> >
> > /(?:[^_]{1,30}_+){5}/
>
> Better. ;) However, while that indeed eliminates excessive backtracking as \S or \w results in (since they contain the underscore), this doesn't match words ending in underscores. A non-underscore [^_] includes space, punctuation, and any other unwanted char.
>
> Exactly _five_ occurrences of an '_' underscore, with up to 30 _random_ chars in between. This paragraph matches. :)

Sorry. I lost sight of that part...

/(?:[^_\s]{1,30}_+){5}/

--
John Hardin
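The difference between the two versions shows up on the very paragraph quoted above. A quick check, again using Python's re module as a stand-in for Perl (the constant names are made up for this sketch):

```python
import re

LOOSE = re.compile(r"(?:[^_]{1,30}_+){5}")     # filler may include spaces
TIGHT = re.compile(r"(?:[^_\s]{1,30}_+){5}")   # filler may not include whitespace

# The prose paragraph from the thread, which contains five underscores.
PROSE = ("Exactly _five_ occurrences of an '_' underscore, "
         "with up to 30 _random_ chars in between.")
SPAMMY = "This_sentenance_has_an_underscore_after_every_word"

for name, rx in (("loose", LOOSE), ("tight", TIGHT)):
    print(name, rx.search(PROSE) is not None, rx.search(SPAMMY) is not None)
```

The loose pattern fires on both strings, while the tight one fires only on the underscore-joined example, since every space in the prose breaks the chain of five.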
2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??
Why does SUBJ_HAS_UNIQ_ID fire on this subject:

Subject: =?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?=

It looks as if SA mistakenly interprets the underscores as underscores, which in an RFC2047 encoded string they're not; they stand for spaces (http://rfc.net/rfc2047.html).

Is this a bug in the RFC2047 decoding in SA 2.64?

--
Per Jessen, Zurich
Let your spam stop here -- http://www.spamchek.com
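For reference, this is what the subject looks like once the RFC 2047 "Q" encoding is undone. The sketch uses Python's standard email.header module purely as an independent decoder; it says nothing about how SA 2.64 decodes headers internally.

```python
from email.header import decode_header, make_header

RAW_SUBJECT = "=?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?="

# decode_header splits the encoded-word into (bytes, charset) pairs;
# make_header reassembles them into a single Unicode string.
decoded = str(make_header(decode_header(RAW_SUBJECT)))
print(decoded)
```

In the decoded text the underscores have become spaces and =F6 has become ö, so the only separators left inside words are the two hyphens.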
Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??
On Fri, Dec 10, 2004 at 01:25:43PM +0100, Per Jessen wrote:

> Why does SUBJ_HAS_UNIQ_ID fire on this subject:
>
> Subject: =?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?=
>
> Is this a bug in the RFC2047 decoding in SA 2.64?

No. The issue is that cumulus-bonuspunkten looks like an ID tag.
Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??
Theo Van Dinter wrote:

> On Fri, Dec 10, 2004 at 01:25:43PM +0100, Per Jessen wrote:
> > Why does SUBJ_HAS_UNIQ_ID fire on this subject:
> >
> > Subject: =?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?=
> >
> > Is this a bug in the RFC2047 decoding in SA 2.64?
>
> No. The issue is that cumulus-bonuspunkten looks like an ID tag.

Should SUBJ_HAS_UNIQ_ID really fire on that - simply a hyphenated word? There are plenty of those around (although fewer in German than in English).

--
Per Jessen, Zurich
Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??
On Fri, Dec 10, 2004 at 08:31:57PM +0100, Per Jessen wrote:

> > No. The issue is that cumulus-bonuspunkten looks like an ID tag.
>
> Should SUBJ_HAS_UNIQ_ID really fire on that - simply a hyphenated word? There are plenty of those around (although fewer in German than in English).

It's not simply a hyphenated word. It looks like two long sets of characters with a hyphen in the middle, which is the exact same thing as a unique id.

The rule doesn't do very well anyway:

1.039 1.1433 0.1190 0.906 0.73 0.90 SUBJ_HAS_UNIQ_ID

Hence the 1 score it receives.
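A note on reading that statistics line: assuming the second and third figures are the percentage of spam and of ham the rule hit (1.1433 and 0.1190), and the fourth is the S/O ratio, the ratio can be reproduced from the first two. The column meanings are an assumption here, since the thread does not label them, and the function name is made up.

```python
# Figures taken from the statistics line; treating them as SPAM% and HAM%
# is an assumption about the column layout.
SPAM_PCT = 1.1433
HAM_PCT = 0.1190

def spam_over_overall(spam_pct, ham_pct):
    """Share of the rule's hits that land on spam rather than ham."""
    return spam_pct / (spam_pct + ham_pct)

so = spam_over_overall(SPAM_PCT, HAM_PCT)
print(round(so, 3))
```

Under that reading, roughly one hit in ten is on legitimate mail, which is consistent with the remark that the rule doesn't do very well and only merits a low score.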