Hi Rainer,

Yes, I think this is the level of detail I am interested in. As I expected
when I wrote my previous message, I will have a lot of work (code) to do.
My final goal is to transform Word documents into XML (following the DITA
specification). So I hope to have the support of this community during my
effort.

Thanks a lot!!!


On 9/5/07, Rainer Schwarze <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> Themistoklis Dakanalis wrote:
> > I have some fundamental questions about HWPF and the Word elements it
> > supports. Particularly, I am interested in the following Word elements:
>
> General limitation: HWPF can read a lot of real life Word files, but not
> write them. (I seem to remember that "fast saved" files cannot be read
> reliably.)
>
> >    1. Text (headings, paragraphs,...)
>
> Text can be retrieved, YMMV with 2-byte characters.
> Headings should be identifiable by reading the paragraph style and
> comparing its id with the predefined heading style numbers.
> (I think that these codes are not yet declared conveniently in HWPF, so
> you need to look at the Word spec if you need them [get yourself a nice
> comfy chair when reading the Word spec and put away all heavy and
> dangerous objects in case you become furious while reading it :-) ].)
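
A minimal sketch of the heading check described above. It assumes that the
built-in "Heading 1" to "Heading 9" styles occupy the fixed style indices 1
to 9 (with 0 being "Normal"), as in the Word binary format's reserved style
slots, and that the index would be read from the paragraph, presumably via
Paragraph.getStyleIndex() in HWPF. HeadingStyles and headingLevel are
hypothetical names, not part of HWPF, and custom heading styles derived from
the built-in ones would not be detected this way.

```java
/** Hypothetical helper for illustration; not part of HWPF. */
final class HeadingStyles {
    // Assumption: built-in headings use the fixed style indices 1..9
    // (index 0 is "Normal"). Verify against the Word spec before relying on it.
    static final int FIRST_HEADING = 1;
    static final int LAST_HEADING = 9;

    private HeadingStyles() {}

    /** Returns 1..9 for a built-in heading style index, or 0 for anything else. */
    static int headingLevel(int styleIndex) {
        if (styleIndex >= FIRST_HEADING && styleIndex <= LAST_HEADING) {
            return styleIndex - FIRST_HEADING + 1;
        }
        return 0; // not a built-in heading style
    }
}
```

For a DITA conversion, a non-zero level could decide whether a paragraph
becomes a title or ordinary body text.
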
>
> Formatting information (bold, italic, ...) can be retrieved. If you need
> very accurate formatting information, you need to create a few relevant
> test cases and check that everything works as expected. I remember that
> HWPF sometimes uses other defaults than Word and may for instance
> sometimes say font size is "12pt" when it actually is "10pt".
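
The kind of test case suggested above could be sketched roughly like this.
It assumes that the font size reported for a character run (for example by
CharacterRun.getFontSize() in HWPF) is given in half-points, so 20 would
mean 10pt. FontSizeCheck and its methods are hypothetical names used only
for illustration.

```java
/** Hypothetical test helper for illustration; not part of HWPF. */
final class FontSizeCheck {
    private FontSizeCheck() {}

    // Assumption: the reported size is in half-points (24 -> 12pt, 20 -> 10pt).
    static double halfPointsToPoints(int halfPoints) {
        return halfPoints / 2.0;
    }

    /** True if the reported size equals the size known from the test document. */
    static boolean matchesExpected(int reportedHalfPoints, double expectedPoints) {
        // Sizes come in 0.5pt steps, so a tolerance below that is enough.
        return Math.abs(halfPointsToPoints(reportedHalfPoints) - expectedPoints) < 0.25;
    }
}
```

A test document with known sizes would then flag the "12pt instead of 10pt"
case: matchesExpected(24, 10.0) is false, while matchesExpected(20, 10.0) is
true.
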
>
> >    2. Tables
>
> There is a getTable method in Range which retrieves a Table instance for
> the table starting at the supplied Paragraph. I don't know how well that
> works. You should expect that you get most, but not all Table
> properties, because several property codes occurring in table information
> are not documented in the public Word spec.
>
> >    3. Images
>
> Simple images should work; Office Art is not supported.
> I doubt that images referenced only by fields ( {INCLUDEPICTURE} ) are
> supported (fields are not supported). If the image is cached in the Word
> file, it may be accessible via the PicturesTable class, but I did not
> check that.
>
> >    4. Headers & Footers
> >    5. Footnotes
>
> Those are not fully supported. Trouble increases with 2-byte characters,
> fields and bookmarks in these parts.
>
>
> I hope that this is roughly the detail level you were interested in. If
> you need more details, just ask :-)
>
> If anybody has a different view on these topics, please correct me.
>
> Best wishes, Rainer


-- 
Themistoklis Dakanalis
