Hi, I wonder if anyone can help with the following.
I am trying to read an HTML page, extract only the fully qualified hostnames from the
page, and output these hostnames to a file on disk to be used later as input to
another program.
I have this so far:

import urllib2
f = open("c:/tmp/newfile.txt", "w")
for line in urllib2.urlopen("http://www.somedomain.uk"):
    if "href" in line and "http://" in line:
        print line
        f.write(line)
f.close()
fu = open("c:/tmp/newfile.txt", "r")
for line in fu.readlines():
    print line

So I have opened a file to write to, fetched a page of HTML, and printed and
written to the file those lines that contain both href and http:// references.
I then closed the file, reopened it, read all the lines back and printed them out.
Can someone please point me in the right direction on the flow of this program, and
on the best way to extract just the hostnames and write them to a file on disk?
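
One direction I have been wondering about, as a rough sketch only: instead of testing whole lines for "href" and "http://", parse the page for href attributes and take the hostname part of each absolute link. The sketch below is written for Python 3 and assumes the standard library modules html.parser and urllib.parse (under Python 2 the corresponding modules would be HTMLParser and urlparse). The names HostnameCollector, extract_hostnames and save_hostnames are made up for illustration, and the sample HTML string stands in for the real page.

```python
# Python 3 sketch: collect hostnames from absolute http(s) links in a page.
from html.parser import HTMLParser
from urllib.parse import urlparse


class HostnameCollector(HTMLParser):
    """Collect hostnames found in href attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hostnames = set()  # a set drops duplicate hostnames

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                parts = urlparse(value)
                # Relative links have no scheme or hostname, so skip them.
                if parts.scheme in ("http", "https") and parts.hostname:
                    self.hostnames.add(parts.hostname)


def extract_hostnames(html):
    """Return the sorted, de-duplicated hostnames linked from an HTML string."""
    collector = HostnameCollector()
    collector.feed(html)
    return sorted(collector.hostnames)


def save_hostnames(hostnames, path):
    """Write one hostname per line so another program can read them later."""
    with open(path, "w") as out:
        for host in hostnames:
            out.write(host + "\n")


# Stand-in for the fetched page; only the absolute link yields a hostname.
sample = '<a href="http://www.example.com/a">A</a> <a href="/local">B</a>'
print(extract_hostnames(sample))
```

To run this against a live page, the HTML string would come from something like urllib.request.urlopen(url).read().decode("utf-8"), where the encoding is an assumption about the page, and the result would then go to save_hostnames(hosts, "c:/tmp/newfile.txt").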
As you can see, I am fairly new to this.
Thanks in advance for any help given!
s
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor