Hi. I have a ton of HTML files from which I want to extract the plain English words and then write those words into a single text file.
Example:

    <html>
    <head>
    <... all kinds of html tags ...>
    <font color=99cccc size=5> this is text </font>

From the above, I want to extract the string 'this is text' and write it out to a text file.

Note that all of the HTML files have the same format, i.e. the text is always surrounded by the same HTML tags. Also, I am sorting through thousands of HTML files, so whatever I do needs to be fast.

Any ideas?

Marc

The apocalyptic vision of a criminally insane charismatic cult leader
http://www.marcbuehler.net

Tutor maillist
http://mail.python.org/mailman/listinfo/tutor
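One possible approach, sketched here only as a starting point: since the text is said to always sit inside the same tags, a precompiled regular expression may be fast enough for thousands of files. The sketch assumes the opening tag is exactly `<font color=99cccc size=5>` as in the example, that the files end in `.html`, and that they are UTF-8 (undecodable bytes are replaced). The names `extract_text` and `collect` are hypothetical helpers, not part of any library. If the marked text can contain nested tags, a real HTML parser would be the safer choice.

```python
import re
from pathlib import Path

# Assumption: the wanted text is always wrapped in this exact tag pair,
# as in the example from the post.
FONT_RE = re.compile(
    r"<font\s+color=99cccc\s+size=5>(.*?)</font>",
    re.IGNORECASE | re.DOTALL,
)


def extract_text(html):
    """Return the text of every matching <font> block, whitespace-normalized."""
    return [" ".join(match.split()) for match in FONT_RE.findall(html)]


def collect(src_dir, out_path):
    """Write the text found in every .html file under src_dir to out_path."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).rglob("*.html")):
            # Assumption: files are UTF-8; bad bytes are replaced, not fatal.
            html = path.read_text(encoding="utf-8", errors="replace")
            for text in extract_text(html):
                if text:
                    out.write(text + "\n")
```

Calling `collect("pages", "words.txt")` (both names are placeholders) would walk the `pages` directory and write one extracted string per line to `words.txt`.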
