Re: Use Regular Expressions to extract URL's

2010-05-01 Thread Walter Overby
A John Gruber post from November 2009 seems relevant.  I have not tried his
regex in any language.

http://daringfireball.net/2009/11/liberal_regex_for_matching_urls

Regards,

Walter.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Use Regular Expressions to extract URL's

2010-04-30 Thread Steven D'Aprano
On Thu, 29 Apr 2010 23:53:06 -0700, Jimbo wrote:

> Hello
>
> I am using regular expressions to grab URL's from a string (of HTML
> code). I am getting on very well and I seem to be grabbing the full URL,
> but I also get a '' character at the end of it. Do you know how I can get
> rid of the '' char at the end of my URL?

Live dangerously and just drop the last character from string s no matter 
what it is:

s = s[:-1]


Or be a little more cautious and test first:

if s.endswith(''):
    s = s[:-1]
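
A slightly more general take on the cautious version, as a rough sketch: drop
the last character only if it belongs to a known set of unwanted characters.
The helper name strip_trailing and the default set (double quote, single
quote, '>') are assumptions made for illustration, not something from the
original post.

```python
def strip_trailing(s, unwanted='"\'>'):
    """Drop at most one trailing character, and only if it is in `unwanted`.

    Both the function name and the default character set are hypothetical.
    """
    if s and s[-1] in unwanted:
        return s[:-1]
    return s


print(strip_trailing('http://www.example.com"'))
print(strip_trailing('http://www.example.com'))
```

Unlike s[:-1] on its own, this leaves a clean URL untouched and is safe on an
empty string.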


Or fix the problem at the source. Using regexes to parse HTML is always 
problematic. You should consider using a proper HTML parser. Otherwise, 
try this regex:

r'(http://(?:www)?\..*?)'
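
For the "proper HTML parser" route, here is a minimal sketch using the
standard library's HTMLParser class. It assumes the URLs of interest live in
the href attribute of <a> tags; the names LinkCollector, extract_links and the
sample markup are made up for the example.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser has already
        # removed the quotes around each attribute value.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values found in `html`, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample input.
sample = ('<p><a href="http://www.example.com/page">one</a> '
          '<a href="http://example.org">two</a></p>')
print(extract_links(sample))
```

Because the parser hands back attribute values without their delimiters,
there is no trailing character to trim afterwards.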



-- 
Steven


Re: Use Regular Expressions to extract URL's

2010-04-30 Thread Novocastrian_Nomad
Or perhaps more generically:

 import re

 string = ('scatter http://.yahoo.com quotes and text anywhere '
           'www.google.com www.bing.com or not')

 # Match anything that starts with http:// or www. up to the next whitespace.
 print(re.findall(r'(?:http://|www\.)\S+', string))

['http://.yahoo.com', 'www.google.com', 'www.bing.com']
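
When the input is HTML rather than free text, a pattern that runs to the next
whitespace will also swallow attribute delimiters. One possible variation,
offered only as a sketch: end the match at quotes and angle brackets as well.
The name URL_RE and the sample string are assumptions for illustration.

```python
import re

# End the match at whitespace, quotes, or angle brackets so that HTML
# attribute delimiters are not swallowed into the URL.
URL_RE = re.compile(r'''(?:http://|www\.)[^\s"'<>]+''')

# Hypothetical sample input.
html = '<a href="http://www.example.com/a">x</a> see www.example.org'
print(URL_RE.findall(html))
```

This is still a heuristic: it will happily pick up trailing punctuation such
as a comma or full stop that follows a URL in running text.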