[Declude.JunkMail] phone regex/pcre help

2007-07-03 Thread Scott Fisher
I'm looking to replace these lines with a pcre but it doesn't seem to be
working. Any suggestions?

 

BODY 175 CONTAINS 206 888-2083

BODY 175 CONTAINS 206.8882083

BODY 175 CONTAINS 2068882083

BODY 175 CONTAINS 206-8882083

BODY 175 CONTAINS 206 8882083

 

BODY   175   PCRE
(?i:[\(\{]?2[0o]6[\)\}]?{\-\_\.\s}?888{\-\_\.\s}?2[0o]83)

 

Scott Fisher

Dir of IT

Farm Progress Companies

191 S Gary Ave

Carol Stream, IL 60188

Tel: 630-462-2323

 

This email message, including any attachments, is for the sole use of the
intended recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or distribution is
prohibited. If you are not the intended recipient, please contact the sender
by reply email and destroy all copies of the original message. Although Farm
Progress Companies has taken reasonable precautions to ensure no viruses are
present in this email, the company cannot accept responsibility for any loss
or damage arising from the use of this email or attachments.

 



---
This E-mail came from the Declude.JunkMail mailing list.  To
unsubscribe, just send an E-mail to [EMAIL PROTECTED], and
type unsubscribe Declude.JunkMail.  The archives can be found
at http://www.mail-archive.com.


RE: [Declude.JunkMail] phone regex/pcre help

2007-07-03 Thread Colbeck, Andrew
Scott, my eyes water when I look at a long regexp.
 
So without trying to work out that specific PCRE syntax, I'll suggest
two things:
 
1) Make a generic detection that finds zero or more junk characters
between the text you're looking for.  The longer the parent string is,
the less likely you are to have a false positive, e.g.
 
finding filler between ab
 
BAD:
 
a.*b
 
This is bad because it is too greedy: it matches the longest stretch
that has an a, then any number of characters (up to the buffer size),
and then a b.
 
LESS BAD:
 
a.{0,2}b
 
This is less bad because we're restricting the count of the wildcard to
0 through 2 characters between the a and the b, but it's still bad
because the string is so short.  Even if this were gibberish, you will
likely hit it eventually as a false positive when finding it in the MIME
encoding of a binary file.
 
AWESOME:
 
Taking a long string like a phone number and dropping the:
 
.{0,2}
 
between each of the bits of text you think the bad guy will try to stuff
with junk, including whitespace.  Replace the 2 with however many
characters you think are sensible. I think Declude wants the brace
characters escaped, e.g.:
 
.\{0,2\}
 
is the syntax to use in a PCRE.
 
2) A while back I had to fix some ugly regexp that plain old didn't
work, and I used a Windows shareware app called The Regex Coach and it
worked for me.
 
http://weitz.de/regex-coach/
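
To make point 1 concrete, here is a minimal sketch using Python's re
module rather than Declude itself; the sample strings are made up for
illustration. Note that in PCRE-style syntax (Python's re included) the
quantifier braces are written unescaped, and an escaped \{ matches a
literal brace, so whether Declude really needs the escaped form is
worth testing before relying on it.

```python
import re

GREEDY = re.compile(r"a.*b")        # unbounded filler between a and b
BOUNDED = re.compile(r"a.{0,2}b")   # at most 2 filler characters
# Phone number allowing up to 2 junk characters between the groups.
PHONE = re.compile(r"206.{0,2}888.{0,2}2083")
# Escaped braces: any one character followed by the literal text "{0,2}".
ESCAPED = re.compile(r".\{0,2\}")

# Hypothetical test input: an a and a b separated by 50 other characters.
far_apart = "a" + "x" * 50 + "b"


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


print(hits(GREEDY, far_apart))    # unbounded filler spans the whole string
print(hits(BOUNDED, far_apart))   # bounded filler does not
print(hits(PHONE, "call 206--888..2083 now"))
print(hits(PHONE, "206 and then much later 888 2083"))
```

The bounded form only tolerates a small, fixed amount of stuffing, which
is what keeps the false positive rate down on long bodies.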
 
 
Andrew.
 
 
 






Re: [Declude.JunkMail] phone regex/pcre help

2007-07-03 Thread Matt

Scott,

The following should do the same.  Note that I do not know if Declude 
requires the whole match to be placed in parentheses.


   2[0Oo]6[\s\r\n\-\.]*888[\s\r\n\-\.]*2[0Oo]83
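
A quick way to check patterns like these outside Declude is to run them
against the five CONTAINS strings. The sketch below uses Python's re
module, so it only approximates Declude's PCRE engine. It also shows one
likely reason the original did not match: in PCRE syntax a { that does
not start a valid quantifier is taken as a literal character, so
{\-\_\.\s} is not a character class. The BRACKETED variant is an
illustrative rewrite with square brackets, not something posted in this
thread.

```python
import re

SAMPLES = [
    "206 888-2083",
    "206.8882083",
    "2068882083",
    "206-8882083",
    "206 8882083",
]

# Scott's original: {...} is not a character class, so the braces and the
# characters inside them are matched literally, one after another.
ORIGINAL = re.compile(
    r"(?i:[\(\{]?2[0o]6[\)\}]?{\-\_\.\s}?888{\-\_\.\s}?2[0o]83)"
)

# Same pattern with square brackets: each separator is one optional character.
BRACKETED = re.compile(
    r"(?i:[\(\{]?2[0o]6[\)\}]?[\-\_\.\s]?888[\-\_\.\s]?2[0o]83)"
)

# Matt's suggestion, unchanged.
MATT = re.compile(r"2[0Oo]6[\s\r\n\-\.]*888[\s\r\n\-\.]*2[0Oo]83")


def matched(pattern):
    """Return the sample strings that the pattern matches."""
    return [s for s in SAMPLES if pattern.search(s)]


for name, pattern in [("original", ORIGINAL),
                      ("bracketed", BRACKETED),
                      ("matt", MATT)]:
    print(name, len(matched(pattern)), "of", len(SAMPLES))
```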

Matt




Re: [Declude.JunkMail] phone regex/pcre help

2007-07-03 Thread David Barker
This would match all the variants you have provided. The . means any
character, including a space, and {0,1} means a minimum of 0 and a
maximum of 1:

(206.{0,1}888.{0,1}2083)

If you wanted to detect the letter O as well as the digit 0, use [o0].
You could also use ?i:, meaning case-insensitive:

(?i:2[o0]6.{0,1}888.{0,1}2[o0]83)
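
As a rough check of the two patterns above, here is a small sketch in
Python's re module (an approximation of Declude's PCRE engine). The
extra test strings are made up to show the trade-off of using . as the
separator: it accepts any single character, not just punctuation.

```python
import re

PLAIN = re.compile(r"(206.{0,1}888.{0,1}2083)")
WITH_O = re.compile(r"(?i:2[o0]6.{0,1}888.{0,1}2[o0]83)")

SAMPLES = [
    "206 888-2083",
    "206.8882083",
    "2068882083",
    "206-8882083",
    "206 8882083",
]


def found(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


print(all(found(PLAIN, s) for s in SAMPLES))   # every listed variant
print(found(WITH_O, "2O6-888-2o83"))           # letter O in place of zero
print(found(PLAIN, "206x888y2083"))            # . accepts letters too
print(found(PLAIN, "206--888-2083"))           # two separators exceed {0,1}
```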

David B




Re: [Declude.JunkMail] phone regex/pcre help

2007-07-03 Thread Matt

Dave,

{0,1} = ?
{0,} = *
{1,} = +
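
These equivalences are easy to confirm. A minimal sketch in Python's re
module, which shares this quantifier syntax with PCRE; the ab...c
patterns and test strings are made up for illustration:

```python
import re

# (brace form, shorthand form)
PAIRS = [
    (r"ab{0,1}c", r"ab?c"),
    (r"ab{0,}c", r"ab*c"),
    (r"ab{1,}c", r"ab+c"),
]

TEXTS = ["ac", "abc", "abbc", "abbbc", "adc"]


def accepted(pattern):
    """Return the test strings that the pattern matches in full."""
    return [t for t in TEXTS if re.fullmatch(pattern, t)]


for brace_form, shorthand in PAIRS:
    print(brace_form, shorthand, accepted(brace_form) == accepted(shorthand))
```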

Also note that beginning a sub-match with (? (for example (?: for a 
plain non-capturing group) improves PCRE's performance because it tells 
the engine not to track the sub-match. The engine likely has a hard 
limit on tracked sub-matches, to keep an expression from becoming 
overly complicated with sub-matches that don't need to be tracked, and 
hitting that limit can result in missed matches.  So never start a 
sub-match with just a parenthesis; always use (?: or another more 
specific form.
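
The difference in what gets tracked can be seen in a small sketch. It
uses Python's re module as a stand-in, so it shows only the capturing
behaviour; it does not measure PCRE's performance or any limit inside
Declude. The two patterns are illustrative, not taken from the thread.

```python
import re

# Plain parentheses: every group is captured and tracked.
CAPTURING = re.compile(r"(2[0o]6)[\s.\-]?(888)[\s.\-]?(2[0o]83)")

# (?: ... ) groups the same text without capturing it.
NON_CAPTURING = re.compile(r"(?:2[0o]6)[\s.\-]?(?:888)[\s.\-]?(?:2[0o]83)")

text = "call 206 888-2083"

cap = CAPTURING.search(text)
non = NON_CAPTURING.search(text)

print(CAPTURING.groups, cap.groups())      # number of groups, captured text
print(NON_CAPTURING.groups, non.groups())  # nothing is tracked
print(cap.group(0) == non.group(0))        # the overall match is the same
```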


A good thing to remember when dealing with regex and E-mail is that 
there can be code breaks (<CODE>888</CODE>), line breaks, and also 
quoted-printable encoding.  For instance, between every two characters 
that display immediately together and that you are attempting to match 
without normalizing, you would need to test for:


   (?>=\r\n|(?><[^>]+>)+)

It gets a lot worse when you start trying to account for spaces because 
of all the ways that they can appear.  If Declude wants to get serious 
about applying regular expressions to the bodies of E-mail, it would 
need to normalize the data; otherwise you would end up with too many 
permutations.  When I do this programmatically, I produce a range of 
variables: for instance, one that is the full original source, one that 
strips out all line breaks, one that removes quoted-printable encoding, 
one that removes HTML, and combinations thereof.  If you are going to 
try to use regular expressions for finding phrases, this is the only 
way to do it without leaving a huge gaping hole: even standard E-mail 
clients will produce source that would be missed.  If you are going 
after E-mail format and not the content, then what you have is perfect.
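
The normalization idea can be sketched roughly as follows. This is a
Python illustration, not how Declude works, and not necessarily the
approach described above: the function names, the crude tag stripper,
and the sample body are all assumptions made up for the example.

```python
import quopri
import re

TAG = re.compile(r"<[^>]+>")  # crude HTML tag matcher, for illustration only
PHONE = re.compile(r"2[0Oo]6[\s\-.]*888[\s\-.]*2[0Oo]83")


def variants(raw):
    """Return several progressively normalized views of a message body."""
    # Undo quoted-printable encoding, including soft line breaks.
    decoded = quopri.decodestring(raw.encode("latin-1")).decode("latin-1")
    no_html = TAG.sub("", decoded)               # drop HTML tags
    no_breaks = re.sub(r"[\r\n]+", "", no_html)  # drop remaining line breaks
    return {
        "original": raw,
        "decoded": decoded,
        "no_html": no_html,
        "no_breaks": no_breaks,
    }


def phone_found(raw):
    """Test the phone pattern against every normalized view."""
    return any(PHONE.search(view) for view in variants(raw).values())


# Hypothetical body: a tag splits the number and a quoted-printable
# soft line break ("=" followed by CRLF) falls inside the last group.
raw_body = "Call 206 <CODE>888</CODE>-20=\r\n83 today"

print(PHONE.search(raw_body) is not None)  # raw source only
print(phone_found(raw_body))               # with normalized views
```

Matching against each view keeps the pattern itself simple; the
alternative is to build every possible interruption into the expression.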


Matt



