Hello,

When it comes to document conversion, the effectiveness of the tools
depends directly on how well structured the original document is.
This is even more true for conversions done with office software
(Libre/OpenOffice) and with some of the "word2*" tools out there.

If your original document has structure (read: all styling is done
with the word processor's "Styles" tool, so that the document's
structural elements - chapter, section, title, paragraph, etc. - are
identifiable beyond their rendering or visual appearance), chances are
you will have to do very little post-processing, or none at all.

On the other hand, if your original document has little or no
structure (read: the author concentrated on the rendering, on the way
things looked, and did not use the "Styles" tool, but instead a
combination of font settings and the B, U, and I buttons, or their
equivalents), chances are you will have to do a lot of post-processing,
or even redo the document from scratch, or copy/paste large parts of it
into the structured authoring tool of your choice.

This is not the fault of XXE (or of any other conversion tool or
script); it is due to the lack of structural information in the
original document, the one being converted.
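
One rough way to see how much structure a .docx file carries before
converting it is to count how many paragraphs reference a named
paragraph style. The following is only a sketch under assumptions: the
function names are made up for illustration, and it relies on the usual
.docx layout (a zip archive whose word/document.xml marks a styled
paragraph with a w:pStyle element). Paragraphs without w:pStyle fall
back to the default style, so a document whose headings all show up
under "(no named style)" was most likely formatted by hand.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML main namespace used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

NO_STYLE = "(no named style)"


def paragraph_style_counts(document_xml):
    """Count paragraphs per paragraph-style id in a WordprocessingML document.

    Paragraphs that carry no w:pStyle element are counted under NO_STYLE.
    """
    root = ET.fromstring(document_xml)
    counts = Counter()
    for para in root.iter(f"{{{W}}}p"):
        style = para.find("w:pPr/w:pStyle", NS)
        name = style.get(f"{{{W}}}val") if style is not None else NO_STYLE
        counts[name] += 1
    return counts


def docx_style_counts(path):
    """Hypothetical convenience wrapper: read word/document.xml from a .docx."""
    with zipfile.ZipFile(path) as archive:
        return paragraph_style_counts(archive.read("word/document.xml"))
```

Calling docx_style_counts() on the file to be converted gives a quick
picture: a healthy share of heading and list style ids suggests the
conversion has something to work with, while almost everything under
"(no named style)" suggests a lot of post-processing ahead. The style
ids themselves depend on the template and the language of the
authoring tool, so the counts are only a hint.
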
On Fri, Nov 18, 2011 at 7:59 AM, Manuel Collado <[email protected]> wrote:
> On 18/11/2011 11:01, Hussein Shafie wrote:
>> On 11/17/2011 09:15 PM, klaus e. werner wrote:
>>>
>>> I have had good experiences with LibreOffice (ex OpenOffice). Just open
>>> the file in LibreOffice Writer, and save it as "docbook *.xml".
>>>
>>> The structure might not come out very well at the first try, but slight
>>> changes (Heading -> Header 1, Header 1 -> Header 2) do wonders.
>>>
>>> Then updating to DocBook 5 with XXE, maybe some simple XSLT stylesheet
>>> and you're done.
>>>
>>> I'd say: make a test and have a look at the outcome.
>>>
>>> p.s. You don't need any special setup to do this, just a normal
>>> LibreOffice install is enough.
>>
>> Thank you for this information. Because we are *very* *interested* in
>> this feature, we immediately put what you have suggested into test.
>>
>> For that, we used a *simple* .doc file containing headings, nested
>> lists, tables, figures, etc., styled exclusively using normal styles.
>>
>> See attached files: simple.doc, the input file and simple_from_doc.xml
>> the DocBook file generated by LibreOffice 3.3.1.
>>
>> Then we did the same test with an equivalent .docx file with no better
>> results.
>>
>> Our conclusions are:
>>
>> LibreOffice does a poor job at opening .doc and .docx files. As a
>> consequence, it does a poor job at saving these files as DocBook XML.
>
> It seems that the problem lies more in the Word side than in the
> LibreOffice side. My assumption is that lists in .doc files are
> represented internally as sequences of specialized paragraphs. If you
> export the Word document as HTML, or as native wml (the proprietary
> XML format), or as RTF, all of them human-readable, you always see
> sequences of paragraphs instead of list structures marked as such.
>
> I once converted a big document from an old WordPerfect format into
> XHTML. MS Word was used to open the WP legacy format and export it as
> HTML; the result was then converted to XHTML and edited with XXE.
>
> To ease the manual editing task, I wrote a customized version of the
> XHTML configuration, with additional editing macros, in particular one
> to convert a sequence of paragraphs into a properly structured list.
>
> --
> Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
>
>
> --
> XMLmind XML Editor Support List
> [email protected]
> http://www.xmlmind.com/mailman/listinfo/xmleditor-support



-- 
Fabián Mandelbaum
IS Engineer
 