downloading from links within a webpage

2014-10-14 Thread Shiva
Hi,

Here is a small script I wrote that downloads images from a specified webpage
URL (you can limit how many downloads you want). However, I am looking at
adding functionality: searching the external links on this page and
downloading the same number of images from each of those pages as well (and
limiting the depth it can go to).

Any ideas?  (I am using Python 3.4, and I am a beginner.)

import urllib.request
import re

url = 'http://www.abc.com'

pagehtml = urllib.request.urlopen(url)
myfile = pagehtml.read()
# Group the alternation: the original r'http://\S+jpg|jpeg' matched either
# a full .jpg URL or the bare word 'jpeg' anywhere in the page.
matches = re.findall(r'http://\S+?\.(?:jpg|jpeg)', str(myfile))

for urltodownload in matches[0:50]:
    # Use the last path component as the file name rather than a fixed
    # 12-character slice, which could contain '/' characters.
    imagename = urltodownload.rsplit('/', 1)[-1]
    urllib.request.urlretrieve(urltodownload, imagename)

print('Done!')
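One way the external-link crawl asked about here could be sketched, using only the stdlib (the class and function names, the placeholder URL, and the limits are all illustrative, not a finished implementation):

```python
# Hypothetical sketch: collect page links and image links with the stdlib
# HTMLParser, then recurse into linked pages up to a depth limit.
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects <a href> page links and <img src> image links."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pages = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and 'href' in attrs:
            self.pages.append(urljoin(self.base_url, attrs['href']))
        elif tag == 'img' and 'src' in attrs:
            self.images.append(urljoin(self.base_url, attrs['src']))

def crawl(url, depth, max_images=50, seen=None):
    """Download up to max_images from url, then recurse into its links."""
    seen = seen if seen is not None else set()
    if depth < 0 or url in seen:
        return
    seen.add(url)
    html = urllib.request.urlopen(url).read().decode('utf-8', 'replace')
    collector = LinkCollector(url)
    collector.feed(html)
    for image_url in collector.images[:max_images]:
        name = image_url.rsplit('/', 1)[-1] or 'image.jpg'
        urllib.request.urlretrieve(image_url, name)
    for page_url in collector.pages:
        crawl(page_url, depth - 1, max_images, seen)
```

A call like `crawl('http://www.abc.com', depth=1)` would then fetch the start page plus each page it links to, and stop there; the `seen` set keeps the recursion from revisiting pages.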
 
Thanks,
Shiva

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: downloading from links within a webpage

2014-10-14 Thread Chris Angelico
On Wed, Oct 15, 2014 at 1:42 AM, Shiva
shivaji...@yahoo.com.dmarc.invalid wrote:
> Here is a small code that I wrote that downloads images from a webpage url
> specified (you can limit to how many downloads you want). However, I am
> looking at adding functionality and searching external links from this page
> and downloading the same number of images from that page as well.(And
> limiting the depth it can go to)
>
> Any ideas?  (I am using Python 3.4  I am a beginner)

First idea: Use wget, it does all this for you :)

ChrisA
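For reference, a recursive, depth-limited image download along the lines Chris suggests might look like this with GNU wget (the URL and depth are placeholders):

```shell
#   -r  recurse into links          -l 2  limit recursion depth to 2
#   -H  follow links to other hosts (the "external links" case)
#   -A  keep only these suffixes    -nd   don't recreate directory trees
wget -r -l 2 -H -A jpg,jpeg -nd http://www.example.com/
```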


Re: downloading from links within a webpage

2014-10-14 Thread Joel Goldstick
On Tue, Oct 14, 2014 at 10:54 AM, Chris Angelico ros...@gmail.com wrote:
> On Wed, Oct 15, 2014 at 1:42 AM, Shiva
> shivaji...@yahoo.com.dmarc.invalid wrote:
>> Here is a small code that I wrote that downloads images from a webpage url
>> specified (you can limit to how many downloads you want). However, I am
>> looking at adding functionality and searching external links from this page
>> and downloading the same number of images from that page as well.(And
>> limiting the depth it can go to)
>>
>> Any ideas?  (I am using Python 3.4  I am a beginner)
>
> First idea: Use wget, it does all this for you :)

You might look at the Requests and BeautifulSoup Python modules.  Requests
is easier for many things than urllib.  BS is an HTML parser that may be
easier, and more powerful, than what you can do with a regex.
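A minimal sketch of the suggestion (assumes `pip install beautifulsoup4`; the Requests fetch is shown in a comment so the example itself needs no network access):

```python
# BeautifulSoup reads attribute values from parsed tags, regardless of
# tag case, attribute order, or quoting style.
from bs4 import BeautifulSoup

def image_urls(html):
    """Return the src attribute of every <img> tag in an HTML document."""
    soup = BeautifulSoup(html, 'html.parser')
    return [img['src'] for img in soup.find_all('img') if img.has_attr('src')]

# Live usage would be along the lines of:
#   import requests
#   urls = image_urls(requests.get('http://www.abc.com').text)
sample = '<img src="a.jpg"><IMG SRC="b.jpeg">'
print(image_urls(sample))
# → ['a.jpg', 'b.jpeg']
```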




-- 
Joel Goldstick
http://joelgoldstick.com


Re: downloading from links within a webpage

2014-10-14 Thread Rustom Mody
On Tuesday, October 14, 2014 8:12:56 PM UTC+5:30, Shiva wrote:
> Hi,
>
> Here is a small code that I wrote that downloads images from a webpage url
> specified (you can limit to how many downloads you want). However, I am
> looking at adding functionality and searching external links from this page
> and downloading the same number of images from that page as well.(And
> limiting the depth it can go to)
>
> Any ideas?  (I am using Python 3.4  I am a beginner)
>
> import urllib.request
> import re

Read this:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
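A small stdlib illustration of the linked point, using the regex from the original post (the URLs are made up):

```python
import re
from html.parser import HTMLParser

snippet = '<img src="http://x.com/a.jpeg"> and <img src="http://x.com/b.jpg">'

# The pattern parses as (http://\S+jpg) OR (jpeg): a .jpeg URL collapses
# to the bare word 'jpeg', and only the .jpg URL survives.
print(re.findall(r'http://\S+jpg|jpeg', snippet))
# → ['jpeg', 'http://x.com/b.jpg']

class ImgSrcParser(HTMLParser):
    """Pulls the src attribute out of every <img> start tag."""
    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            self.srcs.extend(v for k, v in attrs if k == 'src')

p = ImgSrcParser()
p.feed(snippet)
print(p.srcs)
# → ['http://x.com/a.jpeg', 'http://x.com/b.jpg']
```

The parser recovers both image URLs intact because it reads the attribute values rather than pattern-matching the raw text.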