Kent Johnson wrote:
> Alfonso wrote:
>
>> I'm writing a script to retrieve and print some links from a page. These
>> links begin with "/dog/", so I use a regular expression to try to find
>> them. The problem is that the script only retrieves one link per line of
>> the page. I mean, if a line has several links, the script only reports
>> the first. I can't find the mistake. Does anyone have an idea
>> what I have done wrong?
>
> You are reading the data by line using readlines(). You only search each
> line once. regex.findall() or regex.finditer() would be a better choice
> than regex.search().
>
> You might also be interested in sgmllib-based solutions to this problem,
> which will generally be more robust than regex-based searching. For
> example, see
> http://diveintopython.org/html_processing/extracting_data.html
> http://www.w3journal.com/6/s3.vanrossum.html#MARKER-9-26
>
> Kent
>
>> Thank you very much for your help.
>>
>> import re
>> from urllib import urlopen
>>
>> fileObj = urlopen("http://name_of_the_page")
>> links = []
>> regex = re.compile("((/dog/)[^ \"\'<>;:,]+)", re.I)
>>
>> for a in fileObj.readlines():
>>     result = regex.search(a)
>>     if result:
>>         print result.group()
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor

Thank you very much, Kent, it works with findall(). I will also have a look at the links about sgmllib.
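For anyone reading this thread later, here is a minimal sketch of the findall() variant, not necessarily the exact script I ended up with. It is written in Python 3 syntax (the quoted script is Python 2), the function name find_dog_links and the sample string are made up for illustration, and the sample stands in for the downloaded page so that nothing has to be fetched. One thing to watch: when a pattern contains groups, findall() returns tuples of the groups instead of the whole match, so the sketch drops the parentheses from the original pattern.

```python
import re

# Same character class as the original pattern, but without the
# capturing groups, so findall() returns plain strings.
LINK_RE = re.compile("/dog/[^ \"'<>;:,]+", re.I)


def find_dog_links(text):
    """Return every /dog/ link in text, not only the first one per line."""
    return LINK_RE.findall(text)


# Hypothetical sample data standing in for the contents of the page.
sample = (
    '<a href="/dog/beagle">Beagle</a> <a href="/dog/poodle">Poodle</a>\n'
    '<a href="/cat/siamese">Siamese</a> <a href="/DOG/husky">Husky</a>'
)

for link in find_dog_links(sample):
    print(link)
```

With a real page, the text would presumably come from reading the whole response at once (in Python 3, something like urllib.request.urlopen(url).read().decode(...)) rather than from the sample string. If the groups are needed, regex.finditer() with match.group() should give the full match for each hit.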