cool1...@gmail.com writes:
> Here are some scripts, how do I put them together to create the script
> I want? (to search an online document and download all the links in it)
> P.S.: can I set a destination folder for the downloads?
You can use os.chdir to go to the desired folder.

> urllib.urlopen("http://....")
>
> possible_urls = re.findall(r'\S+:\S+', text)
>
> import urllib2
> response = urllib2.urlopen('http://www.example.com/')
> html = response.read()

If you insist on not using wget, here is a simple script with
BeautifulSoup (v4):

########################################################################
from bs4 import BeautifulSoup
from urllib2 import urlopen
from urlparse import urljoin
import os
import re

# Destination folder for the downloads; it must already exist.
os.chdir('OUT')

def generate_filename(url):
    # Strip the scheme (http://, https://, ...) and flatten the rest
    # into a usable file name.
    url = re.sub('^[a-zA-Z0-9+.-]+:/*', '', url)
    return url.replace('/', '_')

URL = "http://www.example.com/"

soup = BeautifulSoup(urlopen(URL).read(), 'html.parser')
links = soup.select('a[href]')            # all anchors with an href
for link in links:
    url = urljoin(URL, link['href'])      # resolve relative links
    print url
    html = urlopen(url).read()
    fn = generate_filename(url)
    with open(fn, 'wb') as outfile:
        outfile.write(html)
########################################################################

You should add a more intelligent filename generator, filter out
mailto: URLs (and possibly other schemes), and add exception handling
for HTTP errors.
--
Piet van Oostrum <p...@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
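For the last two points, something along these lines would do (an
untested sketch; the fetch() helper and its messages are my own
invention, not anything from the standard library):

########################################################################
from urllib2 import urlopen, HTTPError, URLError
from urlparse import urlparse

def fetch(url):
    """Download url, or return None if it is skipped or fails."""
    # Skip mailto:, javascript: and other non-HTTP schemes.
    if urlparse(url).scheme not in ('http', 'https'):
        return None
    try:
        return urlopen(url).read()
    except HTTPError as e:      # server answered with an error status
        print 'HTTP error %d on %s' % (e.code, url)
    except URLError as e:       # DNS failure, connection refused, ...
        print 'cannot reach %s: %s' % (url, e.reason)
    return None
########################################################################

Note that HTTPError must be caught before URLError, because it is a
subclass of it. In the loop above you would then use
html = fetch(url) and skip the link when it returns None.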