Re: Retrieve URLs of all JPEGs at a web page URL

Stefan Behnel Tue, 15 Sep 2009 21:37:27 -0700

Chris Rebert wrote:
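> from contextlib import closing
> import urllib
> from BeautifulSoup import BeautifulSoup
> 
> page_url = "http://the.url.here"
> 
> # urllib.urlopen()'s result does not support "with" in Python 2; closing() adds that
> with closing(urllib.urlopen(page_url)) as f:
>     soup = BeautifulSoup(f.read())
> for img_tag in soup.findAll("img"):
>     relative_url = img_tag.get("src")  # tag attributes use dict-style access, not img_tag.src
>     if not relative_url:
>         continue  # skip <img> tags without a src attribute
>     img_url = make_absolute(relative_url, page_url)
>     save_image_from_url(img_url)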
> 
> 2. Write make_absolute() and save_image_from_url()
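Step 2 of the outline could be filled in with the standard library alone. The sketch below assumes Python 3, where urllib2 and urlparse became urllib.request and urllib.parse. The names make_absolute() and save_image_from_url() come from the outline, but their bodies here are only one possible implementation; filename_for() is a hypothetical helper added for illustration. urljoin() leaves already-absolute URLs unchanged, so make_absolute() can be called on every src value.

```python
import os
import shutil
import urllib.parse
import urllib.request


def make_absolute(relative_url, page_url):
    """Resolve an <img src> value against the URL of the page it came from."""
    # urljoin returns relative_url unchanged if it is already absolute
    return urllib.parse.urljoin(page_url, relative_url)


def filename_for(img_url, default="image.jpg"):
    """Hypothetical helper: local file name from the last path segment of img_url."""
    # urlsplit drops any ?query or #fragment before we look at the path
    path = urllib.parse.urlsplit(img_url).path
    return os.path.basename(path) or default


def save_image_from_url(img_url, dest_dir="."):
    """Download img_url into dest_dir and return the path that was written."""
    target = os.path.join(dest_dir, filename_for(img_url))
    with urllib.request.urlopen(img_url) as response, open(target, "wb") as out:
        # copy in chunks so a large image is never held in memory all at once
        shutil.copyfileobj(response, out)
    return target
```

Note that two images with the same file name on different paths will overwrite each other in this sketch; a real script might add a counter or hash to the name.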


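Note that lxml.html provides make_links_absolute(), which resolves every relative link in a document (including <img src>) against a base URL, so you don't need to write make_absolute() yourself.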

Also untested:

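        from lxml import html

        # parse() returns an ElementTree; make_links_absolute() is a
        # method of the root element, so fetch that first
        doc = html.parse(page_url).getroot()
        doc.make_links_absolute(page_url)

        # lxml elements expose attributes via .get(), not as Python attributes
        urls = [img.get("src") for img in doc.xpath("//img[@src]")]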

Then use e.g. urllib2 to save the images.
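Since the question asks only for JPEGs, the URL list can be filtered before downloading anything. This is a minimal sketch that assumes the file extension in the URL path is a good enough signal: it ignores query strings and fragments, but it will miss JPEGs served without a .jpg/.jpeg suffix (checking each response's Content-Type header would be more robust). is_jpeg_url() and jpeg_urls() are hypothetical names, written for Python 3.

```python
import urllib.parse


def is_jpeg_url(url):
    """Return True if the URL's path ends in a common JPEG file extension."""
    # urlsplit separates the path from any ?query or #fragment
    path = urllib.parse.urlsplit(url).path
    return path.lower().endswith((".jpg", ".jpeg", ".jpe"))


def jpeg_urls(urls):
    """Keep only JPEG URLs, skipping empty values and duplicates, in original order."""
    seen = set()
    result = []
    for url in urls:
        if url and url not in seen and is_jpeg_url(url):
            seen.add(url)
            result.append(url)
    return result
```

The filtered list can then be passed to whatever download loop you use, e.g. one built on urllib2 (urllib.request in Python 3).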

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list
