> >> I flunked the IQ test so I need some help. I want to match all domains
> >> in the body that are not in .com, .org, .us, .edu, .gov and .mil. But
> >> there's more. I need to match some characters at the end of the URI
> >> that can often be found there such as >.?)*!"';
> >>
> >> The rule would match http://www.go.za and http://www.go.za), but not
> >> match http://www.go.com
> >>
> >> Here's my regex that does not work...
> >>
> >> m{https?://[^\s/:"')!?>*]+(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(?:"|'|:|\?|!|>|\*|\)|$)}
> >>
> >> It works for all of the characters except for an ending "." such as
> >> http://www.go.com.
> >>
> >> I have grappled with this for some time and read the pcrepattern.txt
> >> accompanying Exim source, but damn if I can get it to work. Anybody
> >> want to spit out the answer?
> >
> > Assuming that you are creating a SA rule, have you considered using a
> > uri test? That way you wouldn't have to worry about the extra
> > characters at the end. SA would take care of it for you.
>
> Yes, it is a uri test which I patterned after WEIRD_PORTS in 20_uri
>
> Mine is like this...
>
> uri SUSPECT_DOM_CJ =~ <expression>
> score SUSPECT_DOM_CJ <score>
>
> I didn't know that SA took care of the ending characters in uri tests.
> I'll take another look to consider this. Thanks.
That I do know a little about. The developers have been working on handling extra characters at the end of URIs. I believe the fix made it into 3.0.4, so you should probably upgrade if you haven't already.

Bret
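For anyone who still wants the regex route: the trailing dot slips through because "." is inside the host character class, so for http://www.go.com. the host part greedily ends in "com." and none of the (?<!\.com)-style lookbehinds fire; $ then matches at the end of the string. Below is a minimal sketch that exercises the pattern with Python's re module, assuming it treats these fixed-width lookbehinds the same way Perl does for this pattern (not tested inside SpamAssassin itself). The names HOST_CHARS, HOST_LAST, ORIGINAL and FIXED are made up for illustration. The suggested change forbids the host from ending in a dot and then allows one optional dot before the terminator.

```python
import re

# Host characters, copied from the thread's pattern: anything except
# whitespace and the listed URI-ending punctuation. Note "." is allowed.
HOST_CHARS = r"""[^\s/:"')!?>*]"""
# Same class with "." also excluded, used for the host's final character.
HOST_LAST = r"""[^\s/:"')!?>*.]"""
# One negative lookbehind per excluded TLD, as in the original rule.
EXCLUDED_TLDS = ("com", "net", "org", "gov", "us", "edu", "mil")
NOT_EXCLUDED = "".join(rf"(?<!\.{tld})" for tld in EXCLUDED_TLDS)

# The thread's pattern, unchanged apart from being split into pieces.
ORIGINAL = re.compile(
    r"https?://" + HOST_CHARS + "+" + NOT_EXCLUDED
    + r"""(?:"|'|:|\?|!|>|\*|\)|$)"""
)

# Hypothetical fix: the host may not end in ".", so the lookbehinds see the
# real TLD; a single trailing "." is then allowed before the terminator.
FIXED = re.compile(
    r"https?://" + HOST_CHARS + "*" + HOST_LAST + NOT_EXCLUDED
    + r"""\.?(?:["':?!>*)]|$)"""
)

SAMPLES = [
    "http://www.go.za",
    "http://www.go.za)",
    "http://www.go.za.",
    "http://www.go.com",
    "http://www.go.com.",
]

for uri in SAMPLES:
    print(uri, "original:", bool(ORIGINAL.search(uri)),
          "fixed:", bool(FIXED.search(uri)))
```

The original pattern reports a match for http://www.go.com. while the fixed one does not; the .za samples match with either. If the trailing-character handling in 3.0.4 already strips that dot from uri tests, the extra terminator logic may be unnecessary, so it is worth checking against your SA version first.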
