RE: uri regex

Bret Miller Wed, 15 Jun 2005 10:38:54 -0700

> >> I flunked the IQ test so I need some help. I want to match all
> >> domains in the body that are not in .com, .org, .us, .edu, .gov and
> >> .mil. But there's more. I need to match some characters at the end
> >> of the URI that can often be found there such as >.?)*!"';
> >>
> >> The rule would match http://www.go.za and http://www.go.za), but
> >> not match http://www.go.com
> >>
> >> Here's my regex that does not work...
> >>
> >>
> >> m{https?://[^\s/:"')!?>*]+(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(?:"|'|:|\?|!|>|\*|\)|$)}
> >>
> >>
> >>
> >> It works for all of the characters except for an ending "." such as
> >> http://www.go.com.
> >>
> >> I have grappled with this for some time and read the pcrepattern.txt
> >> accompanying Exim source, but damn if I can get it to work. Anybody
> >> want to spit out the answer?
> >
> >
> > Assuming that you are creating a SA rule, have you considered using a
> > uri test?  That way you wouldn't have to worry about the extra
> > characters at the end.  SA would take care of it for you.
> >
> Yes, it is a uri test which I patterned after WEIRD_PORTS in 20_uri
>
> Mine is like this...
>
> uri SUSPECT_DOM_CJ =~ <expression>
> score SUSPECT_DOM_CJ <score>
>
> I didn't know that SA took care of the ending characters in
> uri tests. I'll take another look to consider this. Thanks.



That I do know a little about. The developers have been working on
handling extra characters on the end of URIs. I think the fix got into
SA 3.0.4, so you should probably upgrade if you haven't.
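
For what it's worth, here is a minimal sketch of why the trailing "."
slips through the quoted regex. It is written in Python rather than Perl
so it can be run standalone (an assumption on my part; Python's re module
treats these fixed-width lookbehinds the same way). The character class
greedily swallows the final dot, so the host ends in "com." and the
(?<!\.com) lookbehind never fires. FIXED is only one possible workaround,
not anything taken from SA, and the names ORIGINAL, FIXED and hits are
made up for illustration.

```python
import re

# The pattern from the quoted message, with the mail line-wrap removed.
ORIGINAL = re.compile(
    r"""https?://[^\s/:"')!?>*]+"""
    r"""(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)"""
    r"""(?:"|'|:|\?|!|>|\*|\)|$)"""
)

# Hypothetical variant: (?<!\.) forbids the host part from ending in a dot,
# and \.? then consumes an optional trailing dot before the terminator.
FIXED = re.compile(
    r"""https?://[^\s/:"')!?>*]+"""
    r"""(?<!\.)"""
    r"""(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)"""
    r"""\.?(?:"|'|:|\?|!|>|\*|\)|$)"""
)


def hits(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    samples = [
        "http://www.go.za",
        "http://www.go.za)",
        "http://www.go.za.",
        "http://www.go.com",
        "http://www.go.com.",
    ]
    for uri in samples:
        print(uri, "original:", hits(ORIGINAL, uri), "fixed:", hits(FIXED, uri))
```

With the original pattern, "http://www.go.com." is reported as a hit even
though "http://www.go.com" is not; the variant rejects both while still
hitting the .za examples. If SA strips the trailing characters before uri
rules run, none of this should be needed in the rule itself.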

Bret
