Alfonso wrote:
> I'm writing a script to retrieve and print some of the links on a page. These
> links begin with "/dog/", so I use a regular expression to try to find
> them. The problem is that the script retrieves only one link per line of
> the page. I mean, if a line has several links, the script reports only
> the first. I can't find where the mistake is. Does anyone have an idea
> what I have done wrong?

You are reading the data line by line with readlines(), and you call regex.search() only once per line, so you get at most one match per line. regex.findall() or regex.finditer() would be a better choice than regex.search().

You might also be interested in sgmllib-based solutions to this problem, which will generally be more robust than regex-based searching. For example, see
http://diveintopython.org/html_processing/extracting_data.html
http://www.w3journal.com/6/s3.vanrossum.html#MARKER-9-26

Kent

> Thank you very much for your help.
>
> import re
> from urllib import urlopen
>
> fileObj = urlopen("http://name_of_the_page")
> links = []
> regex = re.compile ( "((/dog/)[^ \"\'<>;:,]+)",re.I)
>
> for a in fileObj.readlines():
>     result = regex.search(a)
>     if result:
>         print result.group()

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
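
A minimal sketch of both suggestions from the reply, written for Python 3 rather than the Python 2 of the original script. Several things here are assumptions, not part of the original post: the sample HTML string, the names find_dog_links, DogLinkParser and parse_dog_links, and the use of html.parser from the standard library in place of sgmllib, which is not available in Python 3. The pattern also drops the two capturing groups of the original, because findall() returns tuples of groups when a pattern contains more than one group.

```python
import re
from html.parser import HTMLParser

# Same character class as the original pattern, but with no capturing
# groups, so findall() returns plain strings instead of tuples.
LINK_RE = re.compile(r"""/dog/[^ "'<>;:,]+""", re.I)


def find_dog_links(text):
    """Return every /dog/ match in the text, not just the first per line."""
    return LINK_RE.findall(text)


class DogLinkParser(HTMLParser):
    """Collect href values of <a> tags that start with /dog/."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("/dog/"):
                self.links.append(value)


def parse_dog_links(text):
    """Parser-based alternative to the regex search."""
    parser = DogLinkParser()
    parser.feed(text)
    parser.close()
    return parser.links


# Hypothetical sample: two matching links on one line, one non-matching link.
sample = (
    '<a href="/dog/beagle">Beagle</a> <a href="/dog/collie">Collie</a>\n'
    '<a href="/cat/tabby">Tabby</a>'
)

for link in find_dog_links(sample):
    print(link)
```

With the real page, the text would come from the downloaded document (decoded to str) instead of the sample string. Searching the whole text at once, or calling findall() on each line, both report every link on a line; the parser version only looks at href attributes, so it ignores "/dog/" strings that appear elsewhere in the page.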