Re: [Tutor] list all links with certain extension in an html file python

Stefan Behnel Fri, 28 Sep 2012 07:01:34 -0700

Santosh Kumar, 16.09.2012 09:20:
> I want to extract (no, I don't want to download) all links that end in
> a certain extension.
> 
> Suppose there is a webpage, and in the head of that webpage there are
> 4 different CSS files linked from an external server. Let the head look
> like this:
> 
>     <link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
>     <link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
>     <link rel="stylesheet" type="text/css" href="http://foo.bar/part3.css">
>     <link rel="stylesheet" type="text/css" href="http://foo.bar/part4.css">
> 
> Please note that I don't want to download those CSS files; instead I want
> something like this (to stdout):
> 
>     http://foo.bar/part1.css
>     http://foo.bar/part2.css
>     http://foo.bar/part3.css
>     http://foo.bar/part4.css
> 
> Also I don't want to use external libraries.


That's too bad because lxml.html would make this really easy. See the
iterlinks() method here:

http://lxml.de/lxmlhtml.html#working-with-links
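
Something along these lines should do it. This is only a minimal sketch, assuming Python 3 with lxml installed; the helper name links_with_extension() and the sample PAGE are made up for illustration. iterlinks() yields (element, attribute, link, pos) tuples, so all that is left is filtering on the link text:

```python
import lxml.html

# Sample page, assumed for illustration only.
PAGE = """<html><head>
<link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
<link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
<link rel="stylesheet" type="text/css" href="http://foo.bar/part3.css">
<link rel="stylesheet" type="text/css" href="http://foo.bar/part4.css">
</head><body><a href="http://foo.bar/index.html">home</a></body></html>"""


# links_with_extension() is a hypothetical helper, not part of lxml itself.
def links_with_extension(html_text, extension):
    """Return all links in html_text whose URL ends with extension."""
    root = lxml.html.document_fromstring(html_text)
    found = []
    # iterlinks() yields (element, attribute, link, pos) for every link
    # it finds in the tree, in document order.
    for element, attribute, link, pos in root.iterlinks():
        if link.lower().endswith(extension.lower()):
            found.append(link)
    return found


if __name__ == "__main__":
    # Print one matching URL per line to stdout.
    for url in links_with_extension(PAGE, ".css"):
        print(url)
```

I used document_fromstring() rather than fromstring() here so that the result is always the document root, whatever the input looks like.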

Note that this also handles links in embedded CSS code etc., although you
might not be interested in that if the example above is representative of
your task.
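
If lxml really is not an option, the standard library's html.parser module (HTMLParser in Python 2) can get you most of the way for simple pages like yours. A rough sketch, assuming Python 3; LinkCollector and links_with_extension() are made-up names, and unlike iterlinks() it only looks at href and src attributes, so links inside embedded CSS are missed:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Sample page, assumed for illustration only.
PAGE = """<html><head>
<link rel="stylesheet" type="text/css" href="http://foo.bar/part1.css">
<link rel="stylesheet" type="text/css" href="http://foo.bar/part2.css">
<link rel="stylesheet" type="text/css" href="http://foo.bar/part3.css">
<link rel="stylesheet" type="text/css" href="http://foo.bar/part4.css">
</head><body><a href="http://foo.bar/index.html">home</a></body></html>"""


# LinkCollector and links_with_extension() are hypothetical names used
# only for this example.
class LinkCollector(HTMLParser):
    """Collect href/src values whose URL path ends with a given extension."""

    def __init__(self, extension):
        super().__init__()
        self.extension = extension.lower()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; HTMLParser lowercases the
        # names, and value can be None for bare attributes.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Compare against the path only, so "x.css?v=2" still matches.
                if urlsplit(value).path.lower().endswith(self.extension):
                    self.links.append(value)

    # Self-closing tags such as <link ... /> go through handle_startendtag(),
    # which calls handle_starttag() by default, so no override is needed.


def links_with_extension(html_text, extension):
    parser = LinkCollector(extension)
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Print one matching URL per line to stdout.
    for url in links_with_extension(PAGE, ".css"):
        print(url)
```

Matching on the URL path rather than the raw string is a deliberate choice here, so that stylesheet URLs carrying a cache-busting query string are still found.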

Stefan


