Wow, thanks for all the great responses! Here's my summary:
- demoroniser (from John Walker) is designed to fix a few very specific problems that could be considered bugs. However, it doesn't remove the unnecessary html generated by Word.
  http://www.fourmilab.ch/webtools/demoroniser/

- The tool from Microsoft can be used in two ways: you can copy html to the clipboard or export to "compact html". The former results in slightly cleaner html but doesn't include the style sheet, so the rendering isn't as nice; the latter does include the style sheet but has slightly more junk in it. Both approaches preserve the "blank" paragraphs (basically, <p> </p>) used for spacing, which are unnecessary and clutter up the html. This tool did properly preserve the footnotes in my test document.
  http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN
  BTW, I didn't know this, but much of the extra html was added by Microsoft to allow round-tripping between html and Word.

- Tidy with Win2000 configuration: It's already bundled with my editor (PSPad), so this was a nice surprise (I guess I never explored that submenu -- that's the "problem" with modern editors and their zillions of features). The tidy output could use more whitespace to improve html readability, but I assume I can change the config file to do this. No "blank" paragraphs (better than the Microsoft tool), but the footnotes were messed up.
  http://www.w3.org/People/Raggett/tidy/

-- jeff
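
For the leftover "blank" paragraphs that the Microsoft tool keeps, a small post-processing step in Python might be enough. This is only a rough sketch: the function name strip_blank_paragraphs is made up for illustration, and it assumes the blank paragraphs contain nothing but whitespace or &nbsp; (real Word output may wrap them in extra tags, which this would not catch).

```python
import re

# Assumption: a "blank" paragraph is a <p ...> element whose body is only
# whitespace and/or &nbsp; entities. Anything more elaborate is left alone.
BLANK_P = re.compile(r"<p\b[^>]*>(?:\s|&nbsp;)*</p>\s*", re.IGNORECASE)


def strip_blank_paragraphs(html: str) -> str:
    """Return html with empty spacer paragraphs removed."""
    return BLANK_P.sub("", html)


if __name__ == "__main__":
    sample = (
        "<p>Hello</p>\n"
        "<p> </p>\n"
        '<p class="MsoNormal">&nbsp;</p>\n'
        "<p>World</p>"
    )
    print(strip_blank_paragraphs(sample))
```

A regex is fine for a narrow pattern like this one, but it is not a general html parser, so it is probably best run on the output of one of the tools above rather than on raw Word html.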