On Sun, 2005-02-27 at 12:37 -0500, David A. Desrosiers wrote:
> > 1)Do you know if there is problems when converting *.htm files, 
> > created using Microsoft Word 2000, to plucker format?
> 
>       Yes, Microsoft doesn't understand how to use standards, and 
> they made up their own invalid HTML standard instead. You'll need to 
> make sure you run the *.html files through something to remove the 
> Microsoft-specific bits from the code. 
> 
>       Try looking into the Microsoft Office 2000 HTML Filter: 
> 
>       http://tinyurl.com/3okuf
> 
>       "The Office HTML Filter is a tool you can use to remove 
>        Office-specific markup tags embedded in Office 2000 documents 
>        saved as HTML."
> 
> 

If you have any trouble converting the "MS HTML" to something standard,
I have a program you might try.

It's a very small Python program built on top of the BeautifulSoup
module, which is intended for parsing possibly very broken HTML and
making sense of it.

Or alternatively, if you have a sticky .htm you want to translate, you
could mail me the URL and I'll try it.
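
For a rough idea of what this kind of cleanup involves, here is a
minimal sketch. It is not the program mentioned above: it uses Python's
standard-library html.parser instead of BeautifulSoup so it runs without
extra installs, and the names OfficeStripper and strip_office_markup, as
well as the choice of what counts as Office-specific, are assumptions
made for illustration.

```python
import html
from html.parser import HTMLParser


class OfficeStripper(HTMLParser):
    """Re-emit HTML, dropping markup that Word adds (illustrative only)."""

    def __init__(self):
        # Keep entities such as &nbsp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    @staticmethod
    def _keep_attr(name, value):
        # Assumption: inline styles, Mso* classes and namespaced or xmlns
        # attributes are Office-specific and safe to drop.
        if name == "style" or ":" in name or name.startswith("xmlns"):
            return False
        if name == "class" and (value or "").startswith("Mso"):
            return False
        return True

    def _open_tag(self, tag, attrs, end=">"):
        parts = [tag]
        for name, value in attrs:
            if not self._keep_attr(name, value):
                continue
            if value is None:
                parts.append(name)  # bare attribute, e.g. "disabled"
            else:
                parts.append('%s="%s"' % (name, html.escape(value, quote=True)))
        return "<" + " ".join(parts) + end

    def handle_starttag(self, tag, attrs):
        if ":" not in tag:  # skip namespaced tags such as <o:p>
            self.out.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if ":" not in tag:
            self.out.append(self._open_tag(tag, attrs, end=" />"))

    def handle_endtag(self, tag):
        if ":" not in tag:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    # Comments are not re-emitted, which also drops Word's
    # <!--[if gte mso 9]> ... <![endif]--> blocks.


def strip_office_markup(html_text):
    """Return html_text with the Office-specific bits removed."""
    parser = OfficeStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


sample = '<p class=MsoNormal style="margin:0in">Hello <b>world</b><o:p></o:p></p>'
print(strip_office_markup(sample))  # <p>Hello <b>world</b></p>
```

Note that this only drops the namespaced tags themselves, so any text
inside an <o:p> element is kept, and all comments are removed, not just
the Office ones. A real converter would probably want to be pickier.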
