On Tue, Sep 22, 2009 at 2:39 PM, prasad rao <prasadarao...@gmail.com> wrote:
> hello friends
> I am trying to write a class to save a url.page.
> But it is not working.It is saving the html.page.But not getting
> images.I am unable to show the list (object.links).
> Please take a look at it and show me how to rectify it.
>
> import urllib2,ftplib,re
> class Collect:
>     def __init__(self,parent):
>         self.parent=parent
>         self.links=[]
>         self.ims=[]
>         s=urllib2.urlopen(self.parent)
>         data=s.read()
>         self.data=data
>         a=re.compile ('<[aA].*[\'"](.*)[\'"].*>')
>         b=re.compile('<src img[\'"](.+)[\'"].*')
>         try:
>             z=re.search(a,self.data).group(1)
>             self.links.extend(z)
>         except:pass
>         try:
>             y=re.search(b,self.data).group(1)
>             self.ims.extend(y)
>         except:pass
>         return
>
>     def save(self,data):
>         d=open('C:/%s .html'%self.parent[10:15],'w')
>         d.write(data)
>         return
>     def bring(self):
>         ftp=ftplib.FTP(self.parent)
>         ftp.login()
>         for x in self.ims:
>             data=ftp.retlines(x)
>             d=open('C:/%s'%x,'w')
>             d.write(data)
>         return
>
>     def show(self,z):
>         for x in z:
>             print x
>         return
>
>
> c=Collect('http://www.asstr.org')
> c.save(c.data)
> c.bring()
> #c.show(c.ims)
> c.links
>
> Thanks in advance.
> _______________________________________________
> Tutor maillist - tu...@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
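
A few things in the quoted code look like the likely causes. re.search() returns only the first match, and list.extend() called with a string appends its characters one at a time, so self.links ends up holding single letters rather than URLs. The image pattern looks for '<src img' where the tag is normally written '<img ... src=...>'. The bare "except:pass" then hides any failure. Below is a minimal sketch of the extraction step only, assuming regular expressions are good enough for the page in question (a real HTML parser is more robust). The names LINK_RE, IMG_RE, extract and the sample markup are made up for illustration, and the example.com URLs are placeholders.

```python
import re

try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2, as used in the original post

# Illustrative patterns, not a full HTML parser: they capture the quoted
# value of href= inside <a ...> and of src= inside <img ...>.
LINK_RE = re.compile(r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
IMG_RE = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract(html, base_url):
    """Return (links, images) as lists of absolute URLs found in html."""
    # findall() returns every captured group, not just the first match.
    links = [urljoin(base_url, href) for href in LINK_RE.findall(html)]
    images = [urljoin(base_url, src) for src in IMG_RE.findall(html)]
    return links, images


# Hypothetical sample page, standing in for the data read from the URL.
sample = '''<html><body>
<a href="/about.html">About</a>
<A HREF='http://example.com/x'>X</A>
<img alt="logo" src="img/logo.png">
</body></html>'''

links, images = extract(sample, 'http://example.com/index.html')
for url in links + images:
    print(url)
```

With the absolute URLs in hand, each image can usually be fetched over HTTP the same way as the page itself (urllib2.urlopen(url).read() on Python 2) and written to a file opened in 'wb' mode. ftplib.FTP() expects an FTP host name rather than an http:// URL, so bring() is unlikely to work as written. Also note that a bare "c.links" at the end of a script displays nothing outside the interactive prompt; it needs a print.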
It might be nice to mention, for anyone who reads these emails at work, that the site this script is scraping is not safe for work.

-Mal