Benji99 wrote:
I've managed to load the HTML source I want into an object called htmlSource using:


import urllib
sock = urllib.urlopen("URL Link")
htmlSource = sock.read()
sock.close()


I'm assuming that htmlSource is a string with \n at the end of each line.
NOTE: I've become very accustomed to the TStringList class in Delphi, so forgive me if I'm trying to work in that way with Python...


Basically, I want to search through the whole string (htmlSource) for a specific keyword. When it's found, I want to know which line it's on so that I can retrieve that line; then I should be able to parse/extract what I need using regular expressions (which I'm getting quite comfortable with). So how can this be accomplished?

The Pythonic way to do this is to iterate through the lines of htmlSource and process them one at a time.
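htmlSource = htmlSource.split('\n') # Split on newline, making a list of lines
for line in htmlSource:
    # Do something with line - check to see if it has the text of interest
    if 'keyword' in line: # 'keyword' is a placeholder for your search text
        pass # parse/extract what you need from line here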
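
A slightly fuller sketch of the same loop, in case it helps. It assumes Python 3 and uses a made-up sample string in place of htmlSource; the function name find_lines and the "Price" pattern are only illustrative, not anything from the original page.

```python
import re

def find_lines(text, keyword):
    """Return (line_number, line) pairs for every line containing keyword."""
    # splitlines() copes with \n, \r\n and \r line endings alike
    return [(n, line)
            for n, line in enumerate(text.splitlines(), start=1)
            if keyword in line]

# Hypothetical sample standing in for htmlSource
sample = ("<html>\n"
          "<p>Price: 42 USD</p>\n"
          "<p>nothing here</p>\n"
          "<p>Price: 7 USD</p>\n")

matches = find_lines(sample, "Price")

# Once the lines are found, a regular expression can pull out the detail
prices = [int(re.search(r"Price: (\d+)", line).group(1)) for _, line in matches]

print(matches)
print(prices)
```

Numbering from 1 with enumerate gives the line number along with the line itself, which is roughly what indexing into a TStringList would give you in Delphi.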


You might want to look at Beautiful Soup. If you can find the links of interest by the tags around them it might do what you want:
http://www.crummy.com/software/BeautifulSoup/
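
To give a feel for what "finding things by the tags around them" means, here is a rough sketch. Note that it does not use Beautiful Soup; it uses the standard library's html.parser (Python 3) as a stand-in so that it runs without installing anything. The class name LinkCollector and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag along with the line it starts on."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                line_number, _column = self.getpos()
                self.links.append((line_number, href))

# Hypothetical sample standing in for htmlSource
sample = ('<html>\n'
          '<body>\n'
          '<a href="http://example.com/a">A</a>\n'
          '<p>text</p>\n'
          '<a href="/b">B</a>\n'
          '</body>\n'
          '</html>\n')

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.links)
```

The advantage over a line-by-line search is that the parser does not care how the page happens to be broken into lines; a tag split across two lines is still found.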


Kent
--
http://mail.python.org/mailman/listinfo/python-list
