> Is there some reason you really want to convert to PDF first? You can
> get much better HTML right from the Word doc. You'll lose a lot of info
> going from PDF to HTML.

Right now, two reasons:

First, printing to PDF lets me create the PDF "for the web", which gives a much smaller file for downloading. This is mainly accomplished through automatic image resizing.

Second, htmltidy simply doesn't work on the HTML files I get from Word. I need the conversion to work automagically, without intervention, and to be usable by people who are not really PC literate:

D:\tmp\pdftohtml>tidy auberer.htm -o aub2.htm
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 172 column 70 - Error: <o:p> is not recognized!
line 172 column 70 - Warning: discarding unexpected <o:p>
line 172 column 75 - Warning: discarding unexpected </o:p>
line 176 column 47 - Error: <o:p> is not recognized!
line 176 column 47 - Warning: discarding unexpected <o:p>
line 176 column 52 - Warning: discarding unexpected </o:p>
line 179 column 77 - Error: <o:p> is not recognized!
line 179 column 77 - Warning: discarding unexpected <o:p>
line 179 column 82 - Warning: discarding unexpected </o:p>
line 182 column 55 - Error: <o:p> is not recognized!
line 182 column 55 - Warning: discarding unexpected <o:p>
line 182 column 60 - Warning: discarding unexpected </o:p>
line 185 column 57 - Error: <o:p> is not recognized!
line 185 column 57 - Warning: discarding unexpected <o:p>
line 185 column 62 - Warning: discarding unexpected </o:p>
line 188 column 55 - Error: <o:p> is not recognized!
94 warnings, 38 errors were found! Not all warnings/errors were shown.

This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.

So far, pdftohtml has worked flawlessly and has produced much saner HTML out of the box than Word 2000.
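
For reference, the <o:p> errors in the Tidy output above come from Office namespace tags that Word writes into its HTML. Below is a minimal Python sketch of stripping such tags before handing the file to tidy. It assumes the namespace-prefixed tags are the only thing Tidy objects to, which has not been verified against this file (not all errors were shown), and `strip_office_tags` is a hypothetical helper name, not part of any library.

```python
import re

# Matches opening and closing tags whose name carries a namespace prefix,
# for example <o:p>, </o:p>, or <v:shape id="x">. Ordinary tags such as
# <p> or <a href="http://..."> are left alone, because the colon must
# directly follow the tag-name prefix.
_NS_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*:[^>]*>")


def strip_office_tags(html):
    """Remove namespace-prefixed tags, keeping any text between them."""
    return _NS_TAG.sub("", html)


sample = '<p class="MsoNormal">Hello<o:p></o:p></p>'
cleaned = strip_office_tags(sample)
print(cleaned)
```

The cleaned text could then be written to a new file and passed to tidy as before. A regular expression is a blunt tool for HTML, so this is only a starting point and may need adjusting for other Word-specific markup.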