Re: Underscores

2009-07-18 Thread twofers
I am mainly using the rule to check the header subject; I haven't added it to a 
body check.
 
So, between the 3 choices:
1.  /(?:[^_]{1,30}_+){5}/
2. /\S+_+\S+_+\S+/
3. R02 /^\S{30,}$/m
 
Which covers the most territory given the example I submitted? I'm basically 
interested in identifying those garbage subject lines laced with characters 
like underscores, periods, hyphens, semi-colons, etc; so rather than use 
several rules to trap those individual characters, maybe there is a more 
effective way to resolve this.
 
Thanks, Wes


  

Re: Underscores

2009-07-18 Thread John Hardin

On Sat, 18 Jul 2009, twofers wrote:

I am mainly using the rule to check the header subject; I haven't added 
it to a body check.

So, between the 3 choices:
1.  /(?:[^_]{1,30}_+){5}/
2. /\S+_+\S+_+\S+/
3. R02 /^\S{30,}$/m

Which covers the most territory given the example I submitted? I'm
basically interested in identifying those garbage subject lines laced 
with characters like underscores, periods, hyphens, semi-colons, etc; so 
rather than use several rules to trap those individual characters, maybe 
there is a more effective way to resolve this.


Your original example only included underscores.

Try this:

  header XX Subject =~ /(?:[[:alnum:]]{1,30}[^[:alnum:]\s]{1,5}){5}/i
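
A quick way to try the idea behind this rule outside SpamAssassin is sketched below. It is only an approximation: Python's `re` has no POSIX classes, so `[[:alnum:]]` is rewritten as `[A-Za-z0-9]` (an assumption that ignores non-ASCII letters), and the sample subjects other than the one from this thread are made up.

```python
import re

# Hypothetical Python translation of the suggested header rule:
# [[:alnum:]] -> [A-Za-z0-9], [^[:alnum:]\s] -> [^A-Za-z0-9\s].
PUNCT_LACED = re.compile(r"(?:[A-Za-z0-9]{1,30}[^A-Za-z0-9\s]{1,5}){5}")

def is_punct_laced(subject):
    """True if five alphanumeric runs in a row are each followed by punctuation."""
    return PUNCT_LACED.search(subject) is not None

# The first subject is the example from the thread; the others are invented.
samples = [
    "This_sentenance_has_an_underscore_after_every_word",
    "Cheap.meds;-no_prescription..needed_today",
    "Re: Meeting notes for Tuesday",
]

for subject in samples:
    print(is_punct_laced(subject), subject)
```

Because a space breaks the chain, ordinary subjects with the odd colon or hyphen should not hit; only runs of five word-plus-punctuation groups with no whitespace between them do.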

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

Underscores

2009-07-16 Thread twofers
How can I pattern match when every word has an underscore after it.
Example:
This_sentenance_has_an_underscore_after_every_word

I'm not really good at Perl pattern matching, but \w and \W see an underscore 
as a word character, so I'm just not sure what might work.

body =~ /^([a-z]+_+)+/i

Is that something that will work effectively?

Thanks.

Wes


  

Re: Underscores

2009-07-16 Thread Matt Kettler


twofers wrote:
 How can I pattern match when every word has an underscore after it.
 Example:
 This_sentenance_has_an_underscore_after_every_word

 I'm not really good at Perl pattern matching, but \w and \W see an
 underscore as a word character, so I'm just not sure what might work.

 body =~ /^([a-z]+_+)+/i

 Is that something that will work effectively?

 Thanks.

 Wes



I'd do something like this:

body  MY_UNDERSCORES  /\S+_+\S+_+\S+/

Unless you really want to restrict it to A-Z.

Regardless, ending any regex in + in a SA rule is redundant. Since +
allows a one-instance match, it will devolve to that. You don't need to
match the entire line with your rule, so the extra matches are
redundant. It will match the first instance, and that's all it needs to
be a match.

Also any regex ending in * should just have its last element removed,
as that will devolve to a zero-count match.
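
The point about trailing quantifiers can be checked with a small sketch. It uses Python's `re` rather than SpamAssassin itself, on the assumption that an unanchored search behaves the same way for these patterns; the test lines other than the thread's example are invented.

```python
import re

# The pattern from the question, and the same pattern without the outer "+".
with_plus = re.compile(r"^([a-z]+_+)+", re.IGNORECASE)
without_plus = re.compile(r"^([a-z]+_+)", re.IGNORECASE)

# A pattern ending in "*", and the same pattern with that last element removed.
with_star = re.compile(r"[a-z]+_*", re.IGNORECASE)
without_star = re.compile(r"[a-z]+", re.IGNORECASE)

def hits(pattern, line):
    """A rule only needs to know whether the pattern matches somewhere."""
    return pattern.search(line) is not None

lines = [
    "This_sentenance_has_an_underscore_after_every_word",
    "no underscores here",
    "one_underscore only",
    "12345",
]

# Both comparisons hold: the trailing quantifier never changes hit/no-hit.
plus_same = all(hits(with_plus, s) == hits(without_plus, s) for s in lines)
star_same = all(hits(with_star, s) == hits(without_star, s) for s in lines)
print(plus_same, star_same)
```

The trailing quantifier only changes how much text is consumed, not whether the rule fires, which is all a rule hit depends on.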




Re: Underscores

2009-07-16 Thread John Hardin
On Thu, 2009-07-16 at 08:52 -0400, Matt Kettler wrote:
 
 twofers wrote:
  How can I pattern match when every word has an underscore after it.
  Example:
  This_sentenance_has_an_underscore_after_every_word
 
  body =~ /^([a-z]+_+)+/i

 I'd do something like this:
 
 body  MY_UNDERSCORES  /\S+_+\S+_+\S+/

That's quite a lot of backtracking, no?

How about:

  /(?:[^_]{1,30}_+){1,5}/




Re: Underscores

2009-07-16 Thread Jeff Mincy
   From: Matt Kettler mkettler...@verizon.net
   Date: Thu, 16 Jul 2009 08:52:50 -0400
   
   twofers wrote:
How can I pattern match when every word has an underscore after it.
Example:
This_sentenance_has_an_underscore_after_every_word
   
I'm not really good at Perl pattern matching, but \w and \W see an
underscore as a word character, so I'm just not sure what might work.
   
body =~ /^([a-z]+_+)+/i
   
Is that something that will work effectively?

Is this for a spam rule?

   I'd do something like this:
   
   body  MY_UNDERSCORES  /\S+_+\S+_+\S+/
   
   Unless you really want to restrict it to A-Z.
   
   Regardless, ending any regex in + in a SA rule is redundant. Since +
   allows a one-instance match, it will devolve to that. You don't need to
   match the entire line with your rule, so the extra matches are
   redundant. It will match the first instance, and that's all it needs to
   be a match.
   
   Also any regex ending in * should just have its last element removed,
   as that will devolve to a zero-count match.

The /\S+_+\S+_+\S+/ rule will match lots of technical email, for example
discussions on shell environment variables like LD_LIBRARY_PATH.
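
The false-positive concern is easy to reproduce. The sketch below runs the same pattern through Python's `re` (assumed to behave like Perl here) against a few invented lines of the kind found in technical mail.

```python
import re

MY_UNDERSCORES = re.compile(r"\S+_+\S+_+\S+")

# Invented examples of legitimate technical text.
ham_lines = [
    "export LD_LIBRARY_PATH=/usr/local/lib",
    "set HTTP_PROXY_HOST in your environment",
    "call my_helper_function() before exit",
]

false_positives = [s for s in ham_lines if MY_UNDERSCORES.search(s)]
print(len(false_positives), "of", len(ham_lines), "ham lines hit")
```

Any single token containing two underscores is enough to trigger the rule, so identifiers and environment variables hit it readily.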

-jeff


Re: Underscores

2009-07-16 Thread John Hardin
On Thu, 2009-07-16 at 06:27 -0700, John Hardin wrote:

 How about:
 
   /(?:[^_]{1,30}_+){1,5}/

Whoops! Make that:

  /(?:[^_]{1,30}_+){5}/
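
The difference between the two quantifiers can be seen in a short sketch, again using Python's `re` as a stand-in for Perl; apart from the thread's example, the sample lines are made up.

```python
import re

loose = re.compile(r"(?:[^_]{1,30}_+){1,5}")   # first suggestion
strict = re.compile(r"(?:[^_]{1,30}_+){5}")    # corrected version

samples = {
    "spammy": "This_sentenance_has_an_underscore_after_every_word",
    "single": "please see file_name for details",
    "env": "export LD_LIBRARY_PATH=/usr/local/lib",
}

# For each sample: (hit with {1,5}, hit with {5}).
results = {
    name: (bool(loose.search(text)), bool(strict.search(text)))
    for name, text in samples.items()
}
for name, pair in results.items():
    print(name, pair)
```

With `{1,5}` a single underscore anywhere is enough for a hit, so the lower bound is what makes the rule far too broad; `{5}` requires five underscore groups in a row.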




Re: Underscores

2009-07-16 Thread Karsten Bräckelmann
 Whoops! Make that:
 
   /(?:[^_]{1,30}_+){5}/

Better. ;)  However, while that indeed eliminates excessive backtracking
as \S or \w results in (since they contain the underscore), this doesn't
match words ending in underscores. A non-underscore [^_] includes
space, punctuation, and any other unwanted char.

Exactly _five_ occurrences of an '_' underscore, with up to 30 _random_
chars in between. This paragraph matches. :)


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: [sa] Re: Underscores

2009-07-16 Thread Charles Gregory

On Thu, 16 Jul 2009, Karsten Bräckelmann wrote:

  /(?:[^_]{1,30}_+){5}/

Better. ;)  However, while that indeed eliminates excessive backtracking
as \S or \w results in (since they contain the underscore), this doesn't
match words ending in underscores. A non-underscore [^_] includes
space, punctuation, and any other unwanted char.


Given that OP said the entire *line* was word-underscore-word-underscore,
then why not just:

body R01 /^\w{30,}$/m

Or perhaps the OP wasn't clear on whether 'word' might contain other 
punctuation, and so we might simply use:


body R02 /^\S{30,}$/m

I might add \s* at the end of the rule, just in case of trailing spaces...

- C

Re: [sa] Re: Underscores

2009-07-16 Thread Karsten Bräckelmann
On Thu, 2009-07-16 at 11:08 -0400, Charles Gregory wrote:
 Given that OP said the entire *line* was word-underscore-word-underscore,
 then why not just:
 
 body R01 /^\w{30,}$/m

Indeed, it really depends on what *exactly* the rule should match.

 Or perhaps the OP wasn't clear on whether 'word' might contain other 
 punctuation, and so we might simply use:
 
 body R02 /^\S{30,}$/m

This one also matches a long-ish URL on a line of its own.

 I might add \s* at the end of the rule, just in case of trailing spaces...

Keep in mind, that with body rules, the body is *rendered*. Whitespace
normalized, and *paragraphs* re-flowed to a single string with embedded
newlines stripped. For instance, this very paragraph is a single ^line$
as far as body REs are concerned.
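
A rough illustration of both remarks follows. The `render_paragraphs` helper is a hypothetical stand-in for body rendering (it only joins each blank-line-separated paragraph into one line and collapses whitespace); it is not SpamAssassin's actual code, and the body text and URL are invented.

```python
import re

R02 = re.compile(r"^\S{30,}$", re.MULTILINE)

def render_paragraphs(raw):
    """Hypothetical approximation: one line per paragraph, whitespace collapsed."""
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    return "\n".join(" ".join(p.split()) for p in paragraphs)

raw_body = (
    "Please review the attached\n"
    "report before Friday.\n"
    "\n"
    "http://www.example.com/some/long/path/to/a/page.html\n"
    "\n"
    "This_sentenance_has_an_underscore_after_every_word\n"
    "and then some ordinary text follows.\n"
)

rendered = render_paragraphs(raw_body)
hits_raw = R02.findall(raw_body)       # matching against the raw lines
hits_rendered = R02.findall(rendered)  # matching against re-flowed paragraphs

print(hits_raw)
print(hits_rendered)
```

Against the raw lines both the URL and the underscore line look like a complete line; once the paragraph is re-flowed, the underscore string is followed by ordinary text on the same "line" and no longer matches, while the lone URL still does.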





Re: Underscores

2009-07-16 Thread John Hardin

On Thu, 16 Jul 2009, Karsten Bräckelmann wrote:


Whoops! Make that:

  /(?:[^_]{1,30}_+){5}/


Better. ;)  However, while that indeed eliminates excessive backtracking
as \S or \w results in (since they contain the underscore), this doesn't
match words ending in underscores. A non-underscore [^_] includes
space, punctuation, and any other unwanted char.

Exactly _five_ occurrences of an '_' underscore, with up to 30 _random_
chars in between. This paragraph matches. :)


Sorry. I lost sight of that part...

  /(?:[^_\s]{1,30}_+){5}/
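
The effect of excluding whitespace can be checked against the two texts already quoted in this thread. This is a sketch using Python's `re`, assumed to treat these patterns the same way Perl does.

```python
import re

any_char = re.compile(r"(?:[^_]{1,30}_+){5}")     # previous version
no_space = re.compile(r"(?:[^_\s]{1,30}_+){5}")   # version above

# The paragraph that was pointed out as an accidental match.
paragraph = ("Exactly _five_ occurrences of an '_' underscore, "
             "with up to 30 _random_ chars in between.")
# The original example from the first post.
example = "This_sentenance_has_an_underscore_after_every_word"

results = {
    "paragraph": (bool(any_char.search(paragraph)), bool(no_space.search(paragraph))),
    "example": (bool(any_char.search(example)), bool(no_space.search(example))),
}
for name, pair in results.items():
    print(name, pair)
```

Once spaces are excluded, the five groups must sit inside one unbroken token, so ordinary prose with a few emphasised words no longer hits while the underscore-laced example still does.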


2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-10 Thread Per Jessen
Why does SUBJ_HAS_UNIQ_ID fire on this subject:

Subject: =?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?=

It looks as if SA mistakenly interprets the underscores as literal underscores, which
in an RFC 2047 encoded string they're not (they stand for spaces) - http://rfc.net/rfc2047.html

Is this a bug in the RFC2047 decoding in SA 2.64? 


-- 
Per Jessen, Zurich
Let your spam stop here -- http://www.spamchek.com




Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-10 Thread Theo Van Dinter
On Fri, Dec 10, 2004 at 01:25:43PM +0100, Per Jessen wrote:
 Why does SUBJ_HAS_UNIQ_ID fire on this subject:
 
 Subject: =?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?=
 
 Is this a bug in the RFC2047 decoding in SA 2.64? 

No.  The issue is that cumulus-bonuspunkten looks like an ID tag.
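
For reference, decoding the subject shows what remains once the RFC 2047 underscores are turned into spaces. The sketch uses Python's standard `email.header` module purely as an illustration; it says nothing about how SA 2.64 decodes headers internally.

```python
from email.header import decode_header

raw_subject = "=?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?="

def decode_subject(value):
    """Decode RFC 2047 encoded words into a plain string."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)

decoded = decode_subject(raw_subject)
print(decoded)
```

The decoded text has no underscores left, but the hyphenated token "Cumulus-Bonuspunkten" is still there, which is the part described above as looking like an ID tag.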

-- 
Randomly Generated Tagline:
Any similarity to person/persons now living to anyone or thing, dead or 
 undead, is entirely accidental and just one more irrefutable proof of the 
 paranormal.  - From the 7th Guest




Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-10 Thread Per Jessen
Theo Van Dinter wrote:

 On Fri, Dec 10, 2004 at 01:25:43PM +0100, Per Jessen wrote:
 Why does SUBJ_HAS_UNIQ_ID fire on this subject:
 
 Subject: =?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?=
 
 Is this a bug in the RFC2047 decoding in SA 2.64?
 
 No.  The issue is that cumulus-bonuspunkten looks like an ID tag.

Should SUBJ_HAS_UNIQ_ID really fire on that - simply a hyphenated word?  There
are plenty of those around (although fewer in German than in English). 
 

-- 
Per Jessen, Zurich
Let your spam stop here -- http://www.spamchek.com




Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-10 Thread Theo Van Dinter
On Fri, Dec 10, 2004 at 08:31:57PM +0100, Per Jessen wrote:
  No.  The issue is that cumulus-bonuspunkten looks like an ID tag.
 
 Should SUBJ_HAS_UNIQ_ID really fire on that - simply a hyphenated word?  There
 are plenty of those around (although fewer in German than in English). 

It's not simply a hyphenated word.  It looks like two long sets of characters
with a hyphen in the middle, which is the exact same thing as a unique id.

The rule doesn't do very well anyway:

  OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
    1.039   1.1433   0.1190   0.906   0.73    0.90  SUBJ_HAS_UNIQ_ID

Hence the 1 score it receives.
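
As a sanity check on those figures, the short calculation below assumes the usual hit-frequency column order (overall %, spam %, ham %, S/O, rank, score) and that S/O is spam % divided by the sum of spam % and ham %. Both are assumptions about the layout, not something stated in this thread.

```python
# Values read from the frequency line above, under the assumed column order.
spam_pct = 1.1433   # assumed: share of spam messages the rule hits
ham_pct = 0.1190    # assumed: share of ham messages the rule hits

# Assumed definition of the S/O column.
s_o = spam_pct / (spam_pct + ham_pct)
print(round(s_o, 3))
```

Under those assumptions roughly one hit in ten is on legitimate mail, which fits a rule that only earns about one point.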

-- 
Randomly Generated Tagline:
Linux poses a real challenge for those with a taste for late-night
 hacking (and/or conversations with God).
 (By Matt Welsh)

