On 01/04/15 20:22, Abdullah Al Imran wrote:
I have some HTML content where there are many links as the following pattern:

<a href="http://example.com/2013/01/problem1.html">Problem No-1</a><br />

I want to filter all the links into a list as:
['http://example.com/2013/01/problem1.html',
'http://example.com/2013/02/problem2.html']

How can I do this using Python regular expressions?

You can try, but regular expressions are not a reliable way
to parse HTML.
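
For what it is worth, here is a minimal sketch of the regex approach,
assuming every link looks exactly like the one quoted above (a
double-quoted href inside a lowercase <a> tag). The sample text and the
variable names are made up for illustration. It stops working as soon as
the markup uses single quotes, unquoted values, or upper-case tag names,
which is why it is not reliable in general.

```python
import re

# Hypothetical sample in the same shape as the link in the question.
page = '''<a href="http://example.com/2013/01/problem1.html">Problem No-1</a><br />
<a href="http://example.com/2013/02/problem2.html">Problem No-2</a><br />'''

# Capture the double-quoted href value of each <a ...> tag.
# This deliberately handles only that one pattern.
links = re.findall(r'<a\s[^>]*href="([^"]*)"', page)
print(links)
```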

You are much better off using a dedicated HTML parser, such
as html.parser in the standard library (HTMLParser in Python 2;
the older htmllib is deprecated and was removed in Python 3), or
a third party tool like BeautifulSoup.

These recognise the different tag types and separate the markup
from the content for you. You can then just ask the parser to
find <a ...> tags and fetch the href attribute from each one.
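
As a rough sketch of that idea using only the standard library
(Python 3's html.parser), something like the following should work.
The class name LinkCollector and the sample text are invented for
illustration; they are not part of any library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag and attribute names in lower case;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical sample in the same shape as the link in the question.
sample = '''<a href="http://example.com/2013/01/problem1.html">Problem No-1</a><br />
<a href="http://example.com/2013/02/problem2.html">Problem No-2</a><br />'''

collector = LinkCollector()
collector.feed(sample)
print(collector.links)
```

With BeautifulSoup the same job is usually shorter still, but that
needs a separate install.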

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

