On 18/11/2011 11:01, Hussein Shafie wrote:
> On 11/17/2011 09:15 PM, klaus e. werner wrote:
>>
>> I have made good experiences with LibreOffice (ex OpenOffice). Just open
>> it from LibreOffice Writer, and save it as "docbook *.xml".
>>
>> The structure might not come out very well at the first try, but slight
>> changes (Heading -> Header 1, Header 1 -> Header 2) do wonders.
>>
>> Then updating to DocBook 5 with XXE, maybe some simple XSLT stylesheet
>> and you're done.
>>
>> I'd say: make a test and have a look at the outcome.
>>
>> p.s. You don't need any special setup to do this, just a normal
>> LibreOffice install is enough.
>
> Thank you for this information. Because we are *very* *interested* in
> this feature, we immediately put what you have suggested to the test.
>
> For that, we used a *simple* .doc file containing headings, nested
> lists, tables, figures, etc, styled exclusively using normal styles.
>
> See attached files: simple.doc, the input file and simple_from_doc.xml
> the DocBook file generated by LibreOffice 3.3.1.
>
> Then we did the same test with an equivalent .docx file with no better
> results.
>
> Our conclusions are:
>
> LibreOffice does a poor job at opening .doc and .docx files. In
> consequence, it does a poor job at saving these files as DocBook XML.

It seems that the problem lies more on the Word side than on the LibreOffice side. My assumption is that lists in .doc files are represented internally as sequences of specialized paragraphs. If you export the Word document as HTML, as native WML (the proprietary XML format), or as RTF, all of them human readable, you always see sequences of paragraphs instead of list structures marked as such.

Some time ago I converted a big document from an old WordPerfect format into XHTML. MS Word was used to open the legacy WP file and export it as HTML, which was then converted to XHTML and edited with XXE. To ease the manual editing task, I wrote a customized version of the XHTML configuration with additional editing macros, in particular one to convert a sequence of paragraphs into a properly structured list.

--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

--
XMLmind XML Editor Support List
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
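
For illustration only, the paragraph-to-list conversion mentioned above could be sketched as follows. This is not the XXE macro from the message (its source is not shown here); it is a minimal Python sketch using the standard xml.etree.ElementTree module. The function name group_list_paragraphs and the class value "ListParagraph" are assumptions made up for the example; a real Word export may mark list paragraphs differently, and nested lists are not handled.

```python
import xml.etree.ElementTree as ET


def group_list_paragraphs(parent, list_class="ListParagraph"):
    """Wrap each run of consecutive <p class=list_class> children in a <ul>.

    `list_class` is a hypothetical marker; adjust it to whatever the
    exported HTML actually uses to flag list paragraphs.
    """
    new_children = []
    current_list = None  # the <ul> being filled, if we are inside a run
    for child in list(parent):
        if child.tag == "p" and child.get("class") == list_class:
            if current_list is None:
                current_list = ET.Element("ul")
                new_children.append(current_list)
            item = ET.SubElement(current_list, "li")
            item.text = child.text
            item.extend(list(child))  # keep inline markup such as <b> or <i>
        else:
            current_list = None  # any other element ends the current run
            new_children.append(child)
    # Replace the old children with the regrouped ones.
    for child in list(parent):
        parent.remove(child)
    parent.extend(new_children)
    return parent


sample = (
    "<body>"
    "<p>Intro</p>"
    '<p class="ListParagraph">First</p>'
    '<p class="ListParagraph">Second</p>'
    "<p>Outro</p>"
    "</body>"
)
body = group_list_paragraphs(ET.fromstring(sample))
print(ET.tostring(body, encoding="unicode"))
```

The sketch only groups adjacent marked paragraphs at one level; deciding the nesting depth would need extra information from the export, such as indentation or list-level attributes.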

