Looking at the page source, I think this will do it:

import re
import urllib

page = urllib.urlopen('http://finance.blog.lemonde.fr').read()

# Matches single-quoted image sources such as:
# <img src='http://finance.blog.lemonde.fr/filescropped/7642_300_400/2011/04/1157.1301668834.jpg'
x = re.findall(r"<img\s+src='([\S]+)'", page)

# Matches double-quoted image sources such as:
# <img src="http://s2.lemde.fr/image/2011/02/16/87x0/1480844_7_87fe_bandeau-lycee-electrique.jpg"
y = re.findall(r"<img\s+src=\"([\S]+)\"", page)

x.extend(y)
x = list(set(x))  # drop duplicate URLs

# Print only the sources whose extension is jpg
for img in x:
    image = img.split('.')[-1]
    if image == 'jpg':
        print img
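The two patterns differ only in the quote character, so they could probably be folded into one. Here is a minimal sketch of that idea, assuming Python 3 and run against a made-up sample string rather than the live page. The name jpg_sources and the example.com URLs are hypothetical, not from the code in this message. As with the two patterns, it only catches tags where src is the first attribute.

```python
import re

# Hypothetical helper for illustration.
# (['"]) captures the opening quote and \1 requires the same closing quote.
IMG_SRC = re.compile(r"""<img\s+src=(['"])(\S+?)\1""")

def jpg_sources(html):
    """Return the sorted, de-duplicated .jpg URLs found in <img src=...> tags."""
    urls = {m.group(2) for m in IMG_SRC.finditer(html)}
    # Case-insensitive extension check, so .JPG is kept as well
    return sorted(u for u in urls if u.lower().endswith('.jpg'))

# Made-up sample input (assumption: stands in for the downloaded page)
sample = (
    "<img src='http://example.com/a.jpg'>"
    '<img src="http://example.com/b.png">'
    '<img src="http://example.com/a.jpg">'
    '<img src="http://example.com/c.JPG">'
)

for url in jpg_sources(sample):
    print(url)
```

Using a set for the matches gives the same de-duplication as list(set(x)), and endswith avoids splitting each URL on dots. A regex is still fragile on real HTML; a proper parser would be the safer choice if the page layout changes.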