Re: [Tutor] printing the links of a page (regular expressions)

Alfonso Sat, 06 May 2006 08:02:29 -0700

Kent Johnson wrote:
> Alfonso wrote:
>   
>> I'm writing a script to retrieve and print some of the links on a page. These
>> links begin with "/dog/", so I use a regular expression to try to find
>> them. The problem is that the script only retrieves one link per line of
>> the page. I mean, if a line has several links, the script only reports
>> the first. I can't find where the mistake is. Does anyone have an idea
>> what I have done wrong?
>>     
>
> You are reading the data by line using readlines(). You only search each 
> line once. regex.findall() or regex.finditer() would be a better choice 
> than regex.search().
>
> You might also be interested in sgmllib-based solutions to this problem, 
> which will generally be more robust than regex-based searching. For 
> example, see
> http://diveintopython.org/html_processing/extracting_data.html
> http://www.w3journal.com/6/s3.vanrossum.html#MARKER-9-26
>
> Kent
>
>   
>> Thank you very much for your help.
>>
>>
>> import re
>> from urllib import urlopen
>>
>> fileObj = urlopen("http://name_of_the_page")
>> links = []
>> regex = re.compile ( "((/dog/)[^ \"\'<>;:,]+)",re.I)
>>
>> for a in fileObj.readlines():
>>         result = regex.search(a)
>>         if result:
>>                 print result.group()
>>
>>
>>
>>              
>> Tutor maillist  -  [email protected]
>> http://mail.python.org/mailman/listinfo/tutor
>>
>>
>>     
>
>
>
>   
Thank you very much, Kent, it works with findall(). I will also have a
look at the links about sgmllib.
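
For anyone reading this later, here is a minimal sketch of the findall()
change, not necessarily the exact code I ended up with. It is written for
Python 3 (the script above used the Python 2 print statement), it works on a
made-up sample line instead of a real page, and the function name
find_dog_links is only for illustration. One thing to watch: because the
pattern has two groups, findall() returns a tuple per match, so the full link
is the first item of each tuple.

```python
import re

# Same pattern as in the script above: "/dog/" followed by link characters.
regex = re.compile("((/dog/)[^ \"\'<>;:,]+)", re.I)


def find_dog_links(text):
    """Return every "/dog/..." link in text, not just the first one.

    find_dog_links is a hypothetical helper name used for this sketch.
    """
    # The pattern has two groups, so findall() yields (full_link, "/dog/")
    # tuples; keep only the full link.
    return [match[0] for match in regex.findall(text)]


# Made-up sample line with two links on it.
line = '<a href="/dog/beagle">Beagle</a> <a href="/dog/poodle">Poodle</a>'

# search() stops at the first match on the line.
print(regex.search(line).group())

# findall() reports every match on the line.
for link in find_dog_links(line):
    print(link)
```

With a real page, the same function could be called on each line from
readlines(), or once on the whole text of the page.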

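The parser-based approach Kent mentions could look roughly like the sketch
below. This is an assumption on my part rather than code from the linked
articles: sgmllib is not available in Python 3, so the sketch uses
html.parser.HTMLParser from the standard library instead, and the names
DogLinkParser and extract_dog_links are made up for the example.

```python
from html.parser import HTMLParser


class DogLinkParser(HTMLParser):
    """Collect href values of <a> tags that begin with "/dog/".

    DogLinkParser is a hypothetical name for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            # Compare case-insensitively, like re.I in the regex version.
            if name == "href" and value and value.lower().startswith("/dog/"):
                self.links.append(value)


def extract_dog_links(html):
    """Parse an HTML string and return the matching links in page order."""
    parser = DogLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample markup with two matching links and one that is ignored.
page = ('<p><a href="/dog/beagle">Beagle</a> '
        '<a href="/cat/tabby">Tabby</a> '
        '<A HREF="/dog/poodle">Poodle</A></p>')

for link in extract_dog_links(page):
    print(link)
```

Unlike the regular expression, this only looks at href attributes of anchor
tags, so text that merely happens to contain "/dog/" is not reported.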

                
