Hi all Python experts,

I am trying to work with BeautifulSoup and re and running into one problem.

What I want to do is open a webpage and get some information. This is working fine. I then want to follow some of the links on this page and process them. I manage to filter out the links that I am interested in with simple re expressions. My problem is that I now have a number of strings that look like

'text  "http:\123\interesting_adress\etc\etc\" more text'

I have figured out that if it wasn't for the \ a simple
p=re.compile('\"\w+\"') would do the trick. From what I understand, \w only covers the set [a-zA-Z0-9_] and hence not the "\". I assume the solution is right in front of my eyes, and I have been looking at the screen for too long. Any hints would be appreciated.
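One possible approach, sketched here as an illustration rather than a definitive answer: match everything between two double quotes with a negated character class instead of \w. The sample string is the one from above, written as a raw string so the backslashes stay literal; the names text, quoted and matches are made up for this example.

```python
import re

# Raw string literals (r'...') keep every backslash as a literal character,
# both in the sample text and in the pattern.
text = r'text  "http:\123\interesting_adress\etc\etc\" more text'

# A double quote, then one or more characters that are NOT a double quote,
# then the closing double quote. Backslashes and colons are allowed in between.
quoted = re.compile(r'"[^"]+"')

matches = quoted.findall(text)
print(matches)
```

If only a known set of characters should be accepted, a class such as [\w\\:]+ (word characters, backslash, colon) would be a stricter alternative to [^"]+.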


In [72]: p=re.compile('"\w+\"')

In [73]: p.findall('asdsa"123abc123"jggfds')
Out[73]: ['"123abc123"']

In [74]: p.findall('asdsa"123abc\123"jggfds')
Out[74]: ['"123abcS"']
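A likely explanation for the 'S' in Out[74], offered as a sketch: in an ordinary (non-raw) string literal, Python reads \123 as an octal escape, and octal 123 is 83, the code for 'S'. The backslash is therefore gone before re ever sees the string. The variable names plain and raw below are only for illustration.

```python
import re

# Ordinary literal: \123 is an octal escape, 0o123 == 83 == ord('S').
plain = 'asdsa"123abc\123"jggfds'

# Raw literal: the backslash and the digits 1, 2, 3 are kept as typed.
raw = r'asdsa"123abc\123"jggfds'

# Same pattern as in the session, written as a raw string.
p = re.compile(r'"\w+"')

print(plain == 'asdsa"123abcS"jggfds')  # True
print(p.findall(plain))                 # ['"123abcS"']
print(p.findall(raw))                   # [] because \w does not match a backslash
```

Strings read from a web page already contain real backslashes, so they behave like the raw case here, not like the literal typed at the prompt.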

/Johan
