Hello,

I tried it and it works, but if I change the web site from http://finance.blog.lemonde.fr to http://www. ...something else, it doesn't work.
Do I have to change the '([\S]+)' in x=re.findall(r"<img\s+src='([\S]+)'",page)? If so, into what?

Thanks a lot

2011/4/28 naheed arafat <naheed...@gmail.com>

> Observing the page source i think :
>
> page=urllib.urlopen('http://finance.blog.lemonde.fr').read()
>
> x=re.findall(r"<img\s+src='([\S]+)'",page)
> #matches image source of the pattern like:
> #<img src='http://finance.blog.lemonde.fr/filescropped/7642_300_400/2011/04/1157.1301668834.jpg'
> y=re.findall(r"<img\s+src=\"([\S]+)\"",page)
> # matches image source of the pattern like:
> # <img src="http://s2.lemde.fr/image/2011/02/16/87x0/1480844_7_87fe_bandeau-lycee-electrique.jpg"
> x.extend(y)
> x=list(set(x))
> for img in x:
>     image=img.split('.')[-1]
>     if image=='jpg':
>         print img
>
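
For reference, here is a rough sketch of a more tolerant pattern. Both quoted patterns require src to come directly after <img, so a page that writes something like <img class="..." src="..."> would probably not match, which may be why another site fails. The sketch below accepts either quote style and allows other attributes before src. It is written for Python 3; the helper name find_jpg_sources and the example.com URLs are made up for illustration. Regexes remain fragile on real HTML, so an HTML parser is likely the sturdier choice.

```python
import re

# Assumption: src is quoted with ' or ", and other attributes may precede it.
# (['"]) captures the opening quote and \1 requires the same closing quote.
IMG_SRC = re.compile(
    r"""<img\b[^>]*?\ssrc\s*=\s*(['"])(.*?)\1""",
    re.IGNORECASE | re.DOTALL,
)

def find_jpg_sources(page):
    """Return the sorted, de-duplicated .jpg image URLs found in an HTML string.

    Hypothetical helper, not part of the code quoted in this thread.
    """
    sources = {m.group(2).strip() for m in IMG_SRC.finditer(page)}
    return sorted(s for s in sources if s.lower().endswith('.jpg'))

# Made-up sample page covering both quote styles and an extra attribute.
sample = '''
<img src='http://example.com/a.jpg'>
<img class="x" src="http://example.com/b.jpg" alt="b">
<img src="http://example.com/c.png">
'''

print(find_jpg_sources(sample))
```

With a real page, the string returned by reading the URL would be passed in place of sample (decoded to text first under Python 3).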
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor