RFC 2396 includes a regular expression for parsing URIs (Appendix B), as well as the Backus-Naur description of a URI (Appendix A), from which you could write a parser or your own regular expression.
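
As a rough sketch of how that expression might be used, here it is in Python (my choice of language, not anything from the original question). The pattern is the one given in RFC 2396, Appendix B; the function name split_uri, the returned field names, and the sample URI are made up for illustration. Note that the expression splits a single URI into its components; it does not find URLs inside running text, so it would need adapting for extraction.

```python
import re

# Regular expression from RFC 2396, Appendix B.
# Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def split_uri(uri):
    """Split one URI string into its five RFC 2396 components (hypothetical helper)."""
    m = URI_RE.match(uri)  # every part is optional, so this matches any string
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

# Made-up sample with an "@" both before the host and inside the path.
parts = split_uri("http://@example.org:80/a@b")

# The authority ends at the first "/", so an "@" in the path is not mistaken
# for the userinfo separator. Splitting the authority further is my own
# addition and not part of the RFC expression.
userinfo, _, hostport = parts["authority"].rpartition("@")
print(parts, repr(userinfo), hostport)
```

Because the authority is captured as "everything up to the next /, ? or #", the user:password part can be separated afterwards (or with a further sub-expression) without the path getting in the way.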

Charles Yeomans

On Apr 18, 2006, at 1:44 AM, Christer Olsson wrote:

I'm trying to write a regex for extracting URLs, and there is one tricky case I can't catch. I'm currently using the following (greedy) regex:

(http|https)://(((.*):(.*)@)?)(@)?([^:>/\s""]*)

which (as far as I can see) will catch URLs like

http://www.foo.com
http://www.foo.com/bar
http://www.foo.com:80/bar
http://www.foo.com:80/bar:bar
http://user:[EMAIL PROTECTED]:80/bar
http://@www.foo.com:80/bar

but will fail on this

http://@www.foo.com:80/[EMAIL PROTECTED]

Any help is very much appreciated (I would prefer to stay pure regex, with no post-processing of the match).
