downloading from links within a webpage
Hi,

Here is a small piece of code I wrote that downloads images from a specified webpage URL (you can limit how many downloads you want). However, I am looking at adding functionality to search external links from this page and download the same number of images from those pages as well (and limiting the depth it can go to). Any ideas? (I am using Python 3.4; I am a beginner.)

    import urllib.request
    import re

    url = 'http://www.abc.com'
    pagehtml = urllib.request.urlopen(url)
    myfile = pagehtml.read()

    # Group the extensions: without the grouping, the alternation matches
    # either 'http://...jpg' or the bare string 'jpeg' anywhere in the page.
    matches = re.findall(r'http://\S+?\.(?:jpg|jpeg)', str(myfile))

    for urltodownload in matches[0:50]:
        imagename = urltodownload[-12:]
        urllib.request.urlretrieve(urltodownload, imagename)

    print('Done!')

Thanks,
Shiva
--
https://mail.python.org/mailman/listinfo/python-list
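One way to sketch the depth-limited crawl the post asks about: extract image URLs and page links separately, then recurse with a decreasing depth counter and a `seen` set to avoid loops. The helper names, the `href` regex, and the `max_images`/`depth` parameters are my own naming for illustration, not from the original post:

```python
import re
import urllib.request

IMG_RE = re.compile(r'http://\S+?\.(?:jpg|jpeg)')
LINK_RE = re.compile(r'href=["\'](http://[^"\']+)["\']')

def find_image_urls(html, limit=50):
    """Return up to `limit` image URLs found in the page text."""
    return IMG_RE.findall(html)[:limit]

def find_links(html):
    """Return the http:// links found in href attributes."""
    return LINK_RE.findall(html)

def crawl(url, depth, seen=None, max_images=50):
    """Download images from `url`, then recurse into its links
    until `depth` reaches 0; `seen` avoids revisiting pages."""
    if seen is None:
        seen = set()
    if depth < 0 or url in seen:
        return
    seen.add(url)
    html = urllib.request.urlopen(url).read().decode('utf-8', 'replace')
    for imgurl in find_image_urls(html, max_images):
        urllib.request.urlretrieve(imgurl, imgurl.rsplit('/', 1)[-1])
    for link in find_links(html):
        crawl(link, depth - 1, seen, max_images)

if __name__ == '__main__':
    # depth=1 means the start page plus its directly linked pages.
    crawl('http://www.abc.com', depth=1)
```

Deriving the filename with `rsplit('/', 1)[-1]` instead of `[-12:]` avoids mangling names shorter or longer than 12 characters, though colliding names will still overwrite each other.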
Re: downloading from links within a webpage
On Wed, Oct 15, 2014 at 1:42 AM, Shiva <shivaji...@yahoo.com.dmarc.invalid> wrote:
> Here is a small code that I wrote that downloads images from a webpage
> url specified (you can limit to how many downloads you want). However,
> I am looking at adding functionality and searching external links from
> this page and downloading the same number of images from that page as
> well. (And limiting the depth it can go to) Any ideas? (I am using
> Python 3.4 I am a beginner)

First idea: Use wget, it does all this for you :)

ChrisA
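For reference, one possible wget invocation for this kind of task (the URL is the placeholder from the original post; adjust the flags to taste):

```shell
# -r        recursive retrieval
# -l 2      limit recursion depth to 2
# -H        span hosts, i.e. follow external links
# -nd       save everything into the current directory, no hierarchy
# -A        only accept files with these suffixes
wget -r -l 2 -H -nd -A jpg,jpeg http://www.abc.com/
```

This is a sketch of the idea rather than a tuned command; spanning hosts with `-H` can fetch far more than expected, so a depth limit and accept list matter.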
Re: downloading from links within a webpage
On Tue, Oct 14, 2014 at 10:54 AM, Chris Angelico <ros...@gmail.com> wrote:
> On Wed, Oct 15, 2014 at 1:42 AM, Shiva <shivaji...@yahoo.com.dmarc.invalid> wrote:
>> Here is a small code that I wrote that downloads images from a webpage
>> url specified (you can limit to how many downloads you want). However,
>> I am looking at adding functionality and searching external links from
>> this page and downloading the same number of images from that page as
>> well. (And limiting the depth it can go to) Any ideas? (I am using
>> Python 3.4 I am a beginner)
>
> First idea: Use wget, it does all this for you :)

You might look at the Requests and BeautifulSoup Python modules. Requests is easier to use than urllib for many things, and BeautifulSoup is an HTML parser that is both easier and more powerful than what you can do with a regex.

--
Joel Goldstick
http://joelgoldstick.com
Re: downloading from links within a webpage
On Tuesday, October 14, 2014 8:12:56 PM UTC+5:30, Shiva wrote:
> Hi, Here is a small code that I wrote that downloads images from a
> webpage url specified (you can limit to how many downloads you want).
> However, I am looking at adding functionality and searching external
> links from this page and downloading the same number of images from
> that page as well. (And limiting the depth it can go to) Any ideas?
> (I am using Python 3.4 I am a beginner)
>
> import urllib.request
> import re

Read this:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
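The gist of that answer is that HTML cannot be reliably parsed with regexes. Even without installing anything, the standard library's html.parser copes with things a naive regex trips over, such as single-quoted attributes, attribute order, and uppercase tags. A minimal stdlib sketch, with names of my own choosing:

```python
from html.parser import HTMLParser

class ImageCollector(HTMLParser):
    """Collect <img> src values with a real HTML parser.
    HTMLParser hands us tag and attribute names already lowercased,
    so <IMG SRC=...> and <img src=...> are treated alike."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value:
                    self.images.append(value)

def image_urls(html):
    collector = ImageCollector()
    collector.feed(html)
    return collector.images
```

This finds every image the markup actually declares, regardless of quoting style, rather than only the URLs that happen to fit one regex pattern.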