On 01/04/15 20:22, Abdullah Al Imran wrote:
> I have some HTML content where there are many links as the following pattern:
> <a href="http://example.com/2013/01/problem1.html">Problem No-1</a><br />
> I want to filter all the links into a list as:
> ['http://example.com/2013/01/problem1.html',
> 'http://example.com/2013/02/problem2.html']
> How to do it using Python Regular Expression?
You can try, but regular expressions are not a reliable way
to parse HTML.
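
For markup as regular as your sample, a naive pattern does work. This is
only a sketch: the second link in the sample string is my guess at what
your other entries look like, and the pattern assumes every href value
is double-quoted and sits inside an <a ...> tag.

```python
import re

# Sample input; the second link is assumed, modelled on the first.
html = '''<a href="http://example.com/2013/01/problem1.html">Problem No-1</a><br />
<a href="http://example.com/2013/02/problem2.html">Problem No-2</a><br />'''

# Naive pattern: an <a ...> tag containing a double-quoted href value.
links = re.findall(r'<a\s+[^>]*href="([^"]*)"', html)
print(links)
```

It will miss single-quoted or unquoted href values and upper-case tags,
and it will happily match links inside HTML comments, which is why it is
not a reliable approach for arbitrary pages.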
You are much better off using a dedicated HTML parser such
as html.parser in the standard library (the HTMLParser module
in Python 2; the older htmllib was removed in Python 3) or
a third-party tool like BeautifulSoup.
These recognise the different tag types and separate the markup
from the content for you. You can then just ask the parser to
find <a...> tags and fetch the href attribute from each one.
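
As a minimal sketch of the parser approach, using only the standard
library's html.parser (Python 3): the names LinkCollector and
extract_links are made up for illustration, and the second link in the
sample is assumed to follow the same pattern as the first.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in html, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Sample input; the second link is assumed, modelled on the first.
sample = '''<a href="http://example.com/2013/01/problem1.html">Problem No-1</a><br />
<a href="http://example.com/2013/02/problem2.html">Problem No-2</a><br />'''

print(extract_links(sample))
```

Unlike the regex route, this copes with single-quoted attributes and
upper-case tags without any extra work on your part.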
HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos