On Sun, 2005-02-27 at 12:37 -0500, David A. Desrosiers wrote:
> > 1)Do you know if there is problems when converting *.htm files,
> > created using Microsoft Word 2000, to plucker format?
>
> Yes, Microsoft doesn't understand how to use standards, and
> they made up their own invalid HTML standard instead. You'll need to
> make sure you run the *.html files through something to remove the
> Microsoft-specific bits from the code.
>
> Try looking into the Microsoft Office 2000 HTML Filter:
>
> http://tinyurl.com/3okuf
>
> "The Office HTML Filter is a tool you can use to remove
> Office-specific markup tags embedded in Office 2000 documents
> saved as HTML."
>
If you have any difficulty converting "MS HTML" into something standard, I have a program you might try. It's a very small Python program built on top of the BeautifulSoup Python module, which is intended for parsing possibly very broken HTML and making sense of it. Alternatively, if you have a stubborn .htm file you want to translate, you could mail me the URL and I'll try it.
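
To give a rough idea of what this kind of cleanup involves, here is a minimal sketch. It is not the program mentioned above, and to keep it free of third-party installs it uses Python's standard html.parser module rather than BeautifulSoup. The names WordHtmlCleaner and clean_word_html are made up for illustration, and the rules it applies (drop namespaced tags such as <o:p>, drop comments including the "[if gte mso 9]" blocks, drop "Mso..." classes and "mso-" style declarations) are assumptions about typical Word 2000 output, not a complete list.

```python
from html import escape
from html.parser import HTMLParser


class WordHtmlCleaner(HTMLParser):
    """Re-emit HTML while dropping markup that Word typically adds.

    Hypothetical helper for illustration; the filtering rules are assumptions.
    """

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _clean_attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if ":" in name:
                continue  # namespaced attributes such as xmlns:o
            if value is None:
                kept.append(" " + name)  # boolean attribute, no value
                continue
            if name == "class" and value.lower().startswith("mso"):
                continue  # Word style classes such as MsoNormal
            if name == "style":
                decls = [d.strip() for d in value.split(";")]
                value = "; ".join(
                    d for d in decls if d and not d.lower().startswith("mso-")
                )
                if not value:
                    continue  # nothing left once mso-* declarations are gone
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "".join(kept)

    def handle_starttag(self, tag, attrs):
        if ":" not in tag:  # skip Office elements such as <o:p>
            self.parts.append("<%s%s>" % (tag, self._clean_attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        if ":" not in tag:
            self.parts.append("<%s%s />" % (tag, self._clean_attrs(attrs)))

    def handle_endtag(self, tag):
        if ":" not in tag:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)

    def handle_decl(self, decl):
        self.parts.append("<!%s>" % decl)  # keep the DOCTYPE

    # Comments and processing instructions are not re-emitted: the default
    # handle_comment and handle_pi do nothing, which drops the conditional
    # "[if gte mso 9]" blocks along with ordinary comments.


def clean_word_html(html):
    """Return html with the Word-specific markup described above removed."""
    parser = WordHtmlCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    import sys
    sys.stdout.write(clean_word_html(sys.stdin.read()))
```

Saved under a file name of your choosing, it reads the Word-generated HTML on standard input and writes the cleaned version to standard output. Note that it does not touch the contents of <style> blocks, which in Word output usually carry more "mso-" rules, so treat it as a starting point only.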

