Hi Matthew

Sounds cool!

Tidy has an MS Word option (word-2000) that can be used to clean up the kludge that passes for HTML generated by this piece of software.

See: http://tidy.sourceforge.net/docs/quickref.html#word-2000

From memory, a couple of other options have to be used as well, like drop-proprietary-attributes and drop-empty-paras, and I added show-body-only to remove all the <head>guff</head>. I also did a str_replace() to get rid of those weirdo <o:p> things.
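
For anyone wanting to try the same clean-up, here is a rough sketch in Python rather than the PHP hinted at above. It only builds the Tidy command line from the four options named in this message and strips the <o:p> tags with a regular expression instead of a literal str_replace(). The function names, the "yes" values and the regex approach are assumptions for illustration, not the exact setup described here.

```python
import re

# Tidy options named in the message. Setting each to "yes" is an assumption
# about how they were enabled; check the Tidy quick reference for your version.
TIDY_OPTIONS = {
    "word-2000": "yes",
    "drop-proprietary-attributes": "yes",
    "drop-empty-paras": "yes",
    "show-body-only": "yes",
}


def tidy_args(options=TIDY_OPTIONS):
    """Build an argument list for the tidy executable (hypothetical helper).

    The list is only constructed here, not executed; it could be passed to
    subprocess.run() with the Word HTML supplied on stdin.
    """
    args = ["tidy"]
    for name, value in options.items():
        args += ["--" + name, value]
    return args


# Matches opening, closing and self-closing <o:p> tags.
# The text between the tags, if any, is left in place.
O_P_TAG = re.compile(r"</?o:p\b[^>]*>", re.IGNORECASE)


def strip_office_paragraphs(html):
    """Remove Word's <o:p> tags from an HTML string (hypothetical helper)."""
    return O_P_TAG.sub("", html)
```

Stripping the <o:p> tags before or after running Tidy should both work in principle; which order gives cleaner output is something to test against real Word exports.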

Have you used this in your converter or rolled your own?

Cheers
James

On 9/13/05, Matthew Cruickshank <[EMAIL PROTECTED]> wrote:
Hi,

For a while now one problem that kept coming up was dealing with MS Word
files, and getting them into a format that was easy to parse. Anyway, I
figure some people might also have this problem so I wrote a web service
that converts MS Word to OASIS OpenDocument 1.0 format, and then
optionally runs the XML through an XSLT pipeline. So, basically, this
web service converts MS Word to arbitrary HTML or XML, and returns the
results in a .zip file.

The software, called Docvert, has reached 1.0 and is at
http://holloway.co.nz/docvert

Let me know what you think.


.Matthew Cruickshank
http://holloway.co.nz/

******************************************************
The discussion list for  http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
******************************************************

