Santosh Kumar, 16.09.2012 09:20:
> I want to extract (no I don't want to download) all links that end in
> a certain extension.
>
> Suppose there is a webpage, and in the head of that webpage there are
> 4 different CSS files linked to external server. Let the head look
> like this:
>
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part3.css">
> <link rel="stylesheet" type="text/css" href="http://foo.bar/part4.css">
>
> Please note that I don't want to download those CSS, instead I want
> something like this (to stdout):
>
> http://foo.bar/part1.css
> http://foo.bar/part2.css
> http://foo.bar/part3.css
> http://foo.bar/part4.css
>
> Also I don't want to use external libraries.
That's too bad because lxml.html would make this really easy. See the
iterlinks() method here:

http://lxml.de/lxmlhtml.html#working-with-links

Note that this also handles links in embedded CSS code etc., although you
might not be interested in that if the example above is representative of
your task.

Stefan
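
If external libraries really are off the table, the standard library's HTML
parser can cover simple cases like the one quoted above. What follows is only
a minimal sketch, assuming Python 3 (the module is html.parser there; in
Python 2 it is called HTMLParser). The names LinkExtractor and extract_links
are made up for illustration. Unlike iterlinks(), it only looks at href and
src attributes and does not find links inside embedded CSS.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href/src attribute values that end in a given extension.

    Hypothetical helper class, not part of any library.
    """

    def __init__(self, extension):
        super().__init__()
        self.extension = extension
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes without a value, so guard against that.
        for name, value in attrs:
            if name in ("href", "src") and value and value.endswith(self.extension):
                self.links.append(value)


def extract_links(html, extension):
    """Return all href/src values in `html` that end with `extension`."""
    parser = LinkExtractor(extension)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = """
    <head>
    <link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
    <link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
    </head>
    """
    # Print one matching link per line to stdout; nothing is downloaded.
    for link in extract_links(sample, ".css"):
        print(link)
```

The page source still has to be fetched first (for example with urllib from
the standard library), but the linked CSS files themselves are never
requested. A plain endswith() check will miss URLs that carry a query string
after the extension, so treat this as a starting point only.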
