downloading from links within a webpage
Hi,

Here is a small piece of code I wrote that downloads images from a specified webpage URL (you can limit how many downloads you want). However, I am looking at adding functionality to search external links from this page and download the same number of images from those pages as well (and limiting the depth it can go to). Any ideas? (I am using Python 3.4; I am a beginner.)

    import urllib.request
    import re

    url = 'http://www.abc.com'
    pagehtml = urllib.request.urlopen(url)
    myfile = pagehtml.read()

    # Group the extensions: without the grouping, the alternation matches
    # either 'http://...jpg' or the bare string 'jpeg' anywhere in the page.
    matches = re.findall(r'http://\S+?\.(?:jpg|jpeg)', str(myfile))

    for urltodownload in matches[0:50]:
        imagename = urltodownload[-12:]
        urllib.request.urlretrieve(urltodownload, imagename)

    print('Done!')

Thanks,
Shiva
--
https://mail.python.org/mailman/listinfo/python-list
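One way to sketch the depth-limited crawl the post asks about: extract image URLs and page links separately, then recurse with a decreasing depth counter and a `seen` set to avoid loops. The helper names, the `href` regex, and the `max_images`/`depth` parameters are my own naming for illustration, not from the original post:

```python
import re
import urllib.request

IMG_RE = re.compile(r'http://\S+?\.(?:jpg|jpeg)')
LINK_RE = re.compile(r'href=["\'](http://[^"\']+)["\']')

def find_image_urls(html, limit=50):
    """Return up to `limit` image URLs found in the page text."""
    return IMG_RE.findall(html)[:limit]

def find_links(html):
    """Return the http:// links found in href attributes."""
    return LINK_RE.findall(html)

def crawl(url, depth, seen=None, max_images=50):
    """Download images from `url`, then recurse into its links
    until `depth` reaches 0; `seen` avoids revisiting pages."""
    if seen is None:
        seen = set()
    if depth < 0 or url in seen:
        return
    seen.add(url)
    html = urllib.request.urlopen(url).read().decode('utf-8', 'replace')
    for imgurl in find_image_urls(html, max_images):
        urllib.request.urlretrieve(imgurl, imgurl.rsplit('/', 1)[-1])
    for link in find_links(html):
        crawl(link, depth - 1, seen, max_images)

if __name__ == '__main__':
    # depth=1 means the start page plus its directly linked pages.
    crawl('http://www.abc.com', depth=1)
```

Deriving the filename with `rsplit('/', 1)[-1]` instead of `[-12:]` avoids mangling names shorter or longer than 12 characters, though colliding names will still overwrite each other.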
Re: downloading from links within a webpage
On Wed, Oct 15, 2014 at 1:42 AM, Shiva <shivaji...@yahoo.com.dmarc.invalid> wrote:
> Here is a small code that I wrote that downloads images from a webpage
> url specified (you can limit to how many downloads you want). However,
> I am looking at adding functionality and searching external links from
> this page and downloading the same number of images from that page as
> well. (And limiting the depth it can go to) Any ideas? (I am using
> Python 3.4 I am a beginner)

First idea: Use wget, it does all this for you :)

ChrisA
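For reference, one possible wget invocation for this kind of task (the URL is the placeholder from the original post; adjust the flags to taste):

```shell
# -r        recursive retrieval
# -l 2      limit recursion depth to 2
# -H        span hosts, i.e. follow external links
# -nd       save everything into the current directory, no hierarchy
# -A        only accept files with these suffixes
wget -r -l 2 -H -nd -A jpg,jpeg http://www.abc.com/
```

This is a sketch of the idea rather than a tuned command; spanning hosts with `-H` can fetch far more than expected, so a depth limit and accept list matter.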
Re: downloading from links within a webpage
On Tue, Oct 14, 2014 at 10:54 AM, Chris Angelico <ros...@gmail.com> wrote:
> On Wed, Oct 15, 2014 at 1:42 AM, Shiva <shivaji...@yahoo.com.dmarc.invalid> wrote:
>> Here is a small code that I wrote that downloads images from a webpage
>> url specified (you can limit to how many downloads you want). However,
>> I am looking at adding functionality and searching external links from
>> this page and downloading the same number of images from that page as
>> well. (And limiting the depth it can go to) Any ideas? (I am using
>> Python 3.4 I am a beginner)
>
> First idea: Use wget, it does all this for you :)

You might look at the Requests and BeautifulSoup Python modules. Requests is easier to use than urllib for many things, and BeautifulSoup is an HTML parser that is both easier and more powerful than what you can do with a regex.

--
Joel Goldstick
http://joelgoldstick.com
Re: downloading from links within a webpage
On Tuesday, October 14, 2014 8:12:56 PM UTC+5:30, Shiva wrote:
> Hi, Here is a small code that I wrote that downloads images from a
> webpage url specified (you can limit to how many downloads you want).
> However, I am looking at adding functionality and searching external
> links from this page and downloading the same number of images from
> that page as well. (And limiting the depth it can go to) Any ideas?
> (I am using Python 3.4 I am a beginner)
>
> import urllib.request
> import re

Read this:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
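The gist of that answer is that HTML cannot be reliably parsed with regexes. Even without installing anything, the standard library's html.parser copes with things a naive regex trips over, such as single-quoted attributes, attribute order, and uppercase tags. A minimal stdlib sketch, with names of my own choosing:

```python
from html.parser import HTMLParser

class ImageCollector(HTMLParser):
    """Collect <img> src values with a real HTML parser.
    HTMLParser hands us tag and attribute names already lowercased,
    so <IMG SRC=...> and <img src=...> are treated alike."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value:
                    self.images.append(value)

def image_urls(html):
    collector = ImageCollector()
    collector.feed(html)
    return collector.images
```

This finds every image the markup actually declares, regardless of quoting style, rather than only the URLs that happen to fit one regex pattern.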