[Tutor] RE expressions

Johan Nilsson Fri, 15 Aug 2008 13:59:36 -0700

Hi all Python experts,


I am trying to work with BeautifulSoup and re, and I am running into a problem.

What I want to do is open a webpage and get some information. This is working fine. I then want to follow some of the links on this page and process them. I manage to filter out the links I am interested in with simple re expressions. My problem is that I now have a number of strings that look like


'text  "http:\123\interesting_adress\etc\etc\" more text'

I have figured out that if it weren't for the \, a simple

p=re.compile('\"\w+\"')

would do the trick. From what I understand, \w only covers the set [a-zA-Z0-9_] and hence not the "\". I assume the solution is right in front of my eyes and I have just been looking at the screen for too long. Any hints would be appreciated.
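A minimal sketch of two possible patterns (Python 3, not from the original post). Since \w stops at ':' as well as at '\', the character class has to admit both, or the pattern can simply accept any run of non-quote characters between the quotes. The sample line is written as a raw string so its backslashes stay literal, as they would be in text scraped from a page; the variable name is only illustrative.

```python
import re

# Sample line shaped like the one in the post. It is a raw string, so every
# backslash is a literal backslash, as it would be in text scraped from a page.
line = r'text  "http:\123\interesting_adress\etc\etc\" more text'

# \w is [a-zA-Z0-9_] for ASCII text, so it stops at ':' and at '\'.
print(re.findall(r'"\w+"', line))       # []

# Option 1: also allow ':' and a literal backslash ('\\' inside the class).
print(re.findall(r'"[\w:\\]+"', line))  # ['"http:\\123\\interesting_adress\\etc\\etc\\"']

# Option 2: accept any run of characters that are not a double quote.
print(re.findall(r'"[^"]*"', line))     # same single match as option 1
```

The doubled backslashes in the printed list are just how Python shows a backslash inside a string's repr; the matched text itself contains single backslashes. Option 2 is the more forgiving choice if the quoted part may contain other punctuation such as '/' or '.'.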



In [72]: p=re.compile('"\w+\"')

In [73]: p.findall('asdsa"123abc123"jggfds')
Out[73]: ['"123abc123"']

In [74]: p.findall('asdsa"123abc\123"jggfds')
Out[74]: ['"123abcS"']
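For reference, the 'S' in Out[74] comes from Python's string-literal escapes rather than from re: inside an ordinary (non-raw) literal, \123 is an octal escape for chr(0o123), which is 'S'. Strings read from a downloaded page already contain real backslash characters, so this only affects literals typed at the prompt. A short Python 3 sketch with hypothetical variable names; writing test strings and patterns as raw strings (r'...') avoids the surprise.

```python
import re

# Ordinary literal: \123 is an octal escape, so Python stores 'S' here.
plain = 'asdsa"123abc\123"jggfds'
# Raw literal: the backslash and the digits 1, 2, 3 are kept as-is.
raw = r'asdsa"123abc\123"jggfds'

print(plain)                         # asdsa"123abcS"jggfds

pattern = re.compile(r'"\w+"')
print(pattern.findall(plain))        # ['"123abcS"']  (same as Out[74])
print(pattern.findall(raw))          # []  (\w does not match the backslash)

# Accepting any non-quote character keeps the backslash in the match.
print(re.findall(r'"[^"]*"', raw))   # ['"123abc\\123"']
```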

/Johan

_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
