hi.

i have a ton of html files from which i want to
extract the plain english words, and then write
those words into a single text file.

example:
<html>
<head>
<... all kinds of html tags ...>
<font color=99cccc size=5>
this is text
</font>

from the above, i want to extract the string 
'this is text' and write it out to a text file.
note that all of the html files have the same 
format, i.e. the text is always surrounded by the same
html tags.
also, i am sorting through thousands of
html files, so whatever i do needs to be
fast.
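
here is a rough sketch of the kind of thing i am picturing, not a
finished solution. it assumes the text always sits between the exact
opening tag shown above and the next </font>. the "pages/*.html"
pattern, the "words.txt" output name, and the function names are
made up for illustration. a regular expression is just one possible
approach; any tags nested inside the font block would be left in
the output.

```python
import glob
import re

# Assumption: the wanted text always sits between this exact opening
# tag (copied from the example) and the next closing </font> tag.
FONT_BLOCK = re.compile(
    r"<font color=99cccc size=5>(.*?)</font>",
    re.IGNORECASE | re.DOTALL,
)


def extract_text(html):
    """Return the stripped text of every matching font block in one page."""
    return [m.strip() for m in FONT_BLOCK.findall(html) if m.strip()]


def extract_all(pattern, out_path):
    """Write the text from every file matching `pattern` to one output file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(pattern)):
            # errors="replace" keeps one badly encoded page from stopping the run
            with open(path, encoding="utf-8", errors="replace") as f:
                for text in extract_text(f.read()):
                    out.write(text + "\n")


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever the html files live.
    extract_all("pages/*.html", "words.txt")
```

the pattern is compiled once and each file is read in a single call,
which should help keep the per-file cost low across thousands of files.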

any ideas?

marc


---------------------------------------------------------------------------------------
The apocalyptic vision of a criminally insane charismatic cult leader 

   http://www.marcbuehler.net
----------------------------------------------------------------------------------------


                
_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
