On 18/11/2011 11:01, Hussein Shafie wrote:
> On 11/17/2011 09:15 PM, klaus e. werner wrote:
>>
>> I have had good experiences with LibreOffice (formerly OpenOffice). Just
>> open the file in LibreOffice Writer and save it as "docbook *.xml".
>>
>> The structure might not come out very well on the first try, but slight
>> changes (Heading -> Header 1, Header 1 -> Header 2) do wonders.
>>
>> Then upgrade to DocBook 5 with XXE, perhaps apply a simple XSLT
>> stylesheet, and you're done.
>>
>> I'd say: run a test and have a look at the outcome.
>>
>> p.s. You don't need any special setup to do this, just a normal
>> LibreOffice install is enough.
>
> Thank you for this information. Because we are *very* *interested* in
> this feature, we immediately put what you suggested to the test.
>
> For that, we used a *simple* .doc file containing headings, nested
> lists, tables, figures, etc., styled exclusively using normal styles.
>
> See the attached files: simple.doc, the input file, and simple_from_doc.xml,
> the DocBook file generated by LibreOffice 3.3.1.
>
> Then we did the same test with an equivalent .docx file with no better
> results.
>
> Our conclusions are:
>
> LibreOffice does a poor job at opening .doc and .docx files.
> Consequently, it does a poor job at saving these files as DocBook XML.

It seems that the problem lies more on the Word side than on the
LibreOffice side. My assumption is that lists in .doc files are
represented internally as sequences of specialized paragraphs. If you
export the Word document as HTML, as native WordML (the proprietary XML
format), or as RTF, all of which are human readable, you always see
sequences of paragraphs instead of list structures marked as such.
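
A quick way to check this on your own files is to count real list elements (<ul>/<ol>) versus list-styled paragraphs in the HTML that Word exports. The following is only a minimal Python sketch using the standard html.parser module. It assumes that Word tags such paragraphs with a class name starting with "MsoList" (e.g. MsoListParagraph); that prefix is an assumption and may differ between Word versions.

```python
from html.parser import HTMLParser

# Assumption: Word's HTML export marks list-like paragraphs with a
# class attribute starting with this prefix (e.g. "MsoListParagraph").
LIST_CLASS_PREFIX = "MsoList"


class ListMarkupCounter(HTMLParser):
    """Counts real list elements versus paragraphs that only look like
    list items because of a Word list style class."""

    def __init__(self) -> None:
        super().__init__()
        self.real_lists = 0
        self.list_paragraphs = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag in ("ul", "ol"):
            self.real_lists += 1
        elif tag == "p":
            # An attribute written without a value yields None, hence "or".
            css_class = dict(attrs).get("class") or ""
            if css_class.startswith(LIST_CLASS_PREFIX):
                self.list_paragraphs += 1


def count_list_markup(html_text: str) -> tuple[int, int]:
    """Return (number of <ul>/<ol> elements, number of list-styled <p>)."""
    parser = ListMarkupCounter()
    parser.feed(html_text)
    parser.close()
    return parser.real_lists, parser.list_paragraphs


if __name__ == "__main__":
    sample = ('<p class=MsoListParagraph>first</p>'
              '<p class=MsoListParagraph>second</p><p>body text</p>')
    print(count_list_markup(sample))
```

A result with zero real lists but many list-styled paragraphs would be consistent with the explanation above.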

Some time ago I converted a large document from an old WordPerfect
format into XHTML. I used MS Word to open the legacy WordPerfect file and
export it as HTML; the HTML was then converted to XHTML and edited with XXE.

To ease the manual editing task, I wrote a customized version of the
XHTML configuration with additional editing macros, in particular one
that converts a sequence of paragraphs into a properly structured list.
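
The same paragraphs-to-list transformation can also be done outside the editor. What follows is a standalone Python sketch using xml.etree.ElementTree; it is not the XXE macro described above. It assumes un-namespaced XHTML, single-level lists only, and the same hypothetical "MsoList" class prefix for list paragraphs.

```python
import xml.etree.ElementTree as ET

# Hypothetical marker: list items exported from Word are assumed to carry
# a class attribute starting with "MsoList" (e.g. "MsoListParagraph").
LIST_CLASS_PREFIX = "MsoList"


def is_list_paragraph(elem: ET.Element) -> bool:
    """Return True for a <p> that is really a list item in disguise."""
    return elem.tag == "p" and elem.get("class", "").startswith(LIST_CLASS_PREFIX)


def group_list_paragraphs(parent: ET.Element) -> None:
    """Replace each run of consecutive list paragraphs in `parent` with a
    single <ul> whose <li> children keep the paragraphs' content.
    Only flat (single-level) lists are handled."""
    children = list(parent)
    for child in children:
        parent.remove(child)

    current_list = None  # the <ul> being filled, or None between runs
    for child in children:
        if is_list_paragraph(child):
            if current_list is None:
                current_list = ET.SubElement(parent, "ul")
            item = ET.SubElement(current_list, "li")
            item.text = child.text
            for sub in child:  # move inline markup such as <b> or <i>
                item.append(sub)
            item.tail = child.tail
        else:
            parent.append(child)
            current_list = None  # a normal paragraph ends the current list


if __name__ == "__main__":
    body = ET.fromstring(
        '<body><p>Intro</p>'
        '<p class="MsoListParagraph">one</p>'
        '<p class="MsoListParagraph">two <b>bold</b></p>'
        '<p>Outro</p></body>'
    )
    group_list_paragraphs(body)
    print(ET.tostring(body, encoding="unicode"))
```

Nested Word lists (several indentation levels) would need extra information about each paragraph's level, which this sketch deliberately ignores.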

-- 
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

 
--
XMLmind XML Editor Support List
[email protected]
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
