[Tutor] extract hosts from html write to file

sacha rook Tue, 11 Sep 2007 09:19:10 -0700

Hi, I wonder if anyone can help with the following.
 
I am trying to read an HTML page, extract only the fully qualified hostnames
from the page, and write these hostnames to a file on disk to be used later as
input to another program.
 
I have this so far:
 
import urllib2

f = open("c:/tmp/newfile.txt", "w")
for line in urllib2.urlopen("http://www.somedomain.uk"):
    if "href" in line and "http://" in line:
        print line
        f.write(line)
f.close()

fu = open("c:/tmp/newfile.txt", "r")
for line in fu.readlines():
    print line
 
So I have opened a file to write to, fetched a page of HTML, and printed and
written to the file those lines that contain href and http:// references.
I then closed the file, reopened it, read all the lines back and printed them.
 
Can someone please point me in the right direction on the flow of this program,
and on the best way to extract just the hostnames and write these to a file on
disk?
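 
To make the goal concrete, here is a minimal sketch of what the extraction
step might look like. It is not the code above: it assumes Python 3's standard
library (html.parser and urllib.parse rather than urllib2), it treats the
hostname of any absolute http:// or https:// href as a "fully qualified
hostname", and the names HrefHostCollector, extract_hosts and write_hosts are
made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefHostCollector(HTMLParser):
    """Collect hostnames from absolute http(s) URLs found in href attributes."""

    def __init__(self):
        super().__init__()
        self.hosts = set()  # a set drops duplicate hostnames

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "href" or not value:
                continue
            try:
                parts = urlsplit(value)
                host = parts.hostname  # lower-cased, without any port
            except ValueError:
                continue  # skip malformed URLs
            # Relative links and other schemes (mailto:, ftp:, ...) are skipped.
            if parts.scheme in ("http", "https") and host:
                self.hosts.add(host)


def extract_hosts(html_text):
    """Return the sorted, de-duplicated hostnames linked from html_text."""
    collector = HrefHostCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(collector.hosts)


def write_hosts(hosts, path):
    """Write one hostname per line to the file at path."""
    with open(path, "w") as out:
        for host in hosts:
            out.write(host + "\n")
```

Under the same assumptions, the page text (for example the decoded bytes
returned by urllib.request.urlopen(...).read()) would be passed to
extract_hosts, and the result to write_hosts(hosts, "c:/tmp/newfile.txt"),
giving one hostname per line for the other program to read.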
 
As you can see, I am newish to this.
 
Thanks in advance for any help given!
 
s

