Thanks Angelo.

I gave XDocReport a go and had limited success with a .docx file that
contains Microsoft equations.

I believe the equations created with the equation editor are stored as
WMF images. The relevant part of the document looks like this:

<xml-fragment w:dxaOrig="1542" w:dyaOrig="300">
  <v:shape id="_x0000_i1027" o:spid="_x0000_i1028" type="#_x0000_t75"
      style="width:77pt;height:15pt;mso-position-horizontal-relative:page;mso-position-vertical-relative:page"
      o:ole="">
    <v:imagedata r:id="rId12" o:title=""/>
  </v:shape>
  <o:OLEObject Type="Embed" ProgID="Equation.DSMT4" ShapeID="_x0000_i1027"
      DrawAspect="Content" ObjectID="_1336171117" r:id="rId13">
    <o:FieldCodes>\* MERGEFORMAT</o:FieldCodes>
  </o:OLEObject>
</xml-fragment>
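The fragment carries two relationship IDs: r:id on v:imagedata (rId12, the preview picture) and r:id on o:OLEObject (rId13, the embedded equation object). A minimal JDK-only sketch of pulling them out with a namespace-aware DOM parser follows. It assumes the fragment is available as a string with its namespace prefixes declared; the printout above omits the xmlns declarations, so the sample adds the standard VML, Office and relationships namespace URIs. The class name EquationRefs is hypothetical and not part of POI or XDocReport.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts the relationship IDs from a w:object-style fragment.
public class EquationRefs {
    static final String V_NS = "urn:schemas-microsoft-com:vml";
    static final String O_NS = "urn:schemas-microsoft-com:office:office";
    static final String R_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // The fragment from above, trimmed, with namespace declarations added (assumption).
    static final String SAMPLE =
        "<xml-fragment"
        + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
        + " xmlns:v=\"" + V_NS + "\" xmlns:o=\"" + O_NS + "\" xmlns:r=\"" + R_NS + "\""
        + " w:dxaOrig=\"1542\" w:dyaOrig=\"300\">"
        + "<v:shape id=\"_x0000_i1027\" type=\"#_x0000_t75\" o:ole=\"\">"
        + "<v:imagedata r:id=\"rId12\" o:title=\"\"/></v:shape>"
        + "<o:OLEObject Type=\"Embed\" ProgID=\"Equation.DSMT4\" ShapeID=\"_x0000_i1027\""
        + " DrawAspect=\"Content\" ObjectID=\"_1336171117\" r:id=\"rId13\"/>"
        + "</xml-fragment>";

    /** Returns {previewImageRelId, oleObjectRelId, progId}; anything missing comes back as "". */
    public static String[] extract(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Element image = (Element) doc.getElementsByTagNameNS(V_NS, "imagedata").item(0);
            Element ole = (Element) doc.getElementsByTagNameNS(O_NS, "OLEObject").item(0);
            return new String[] {
                image == null ? "" : image.getAttributeNS(R_NS, "id"),
                ole == null ? "" : ole.getAttributeNS(R_NS, "id"),
                ole == null ? "" : ole.getAttribute("ProgID") // unprefixed attribute
            };
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable fragment", e);
        }
    }

    public static void main(String[] args) {
        String[] ids = extract(SAMPLE);
        // Prints: image=rId12 ole=rId13 progId=Equation.DSMT4
        System.out.println("image=" + ids[0] + " ole=" + ids[1] + " progId=" + ids[2]);
    }
}
```

Inside a visitor, the same two IDs would be the ones to look up in the document part's relationships: the image ID for something displayable, the OLE ID if the equation itself is needed.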


I overrode XWPFDocumentVisitor and found that the CTObject instance
parsed from the fragment above is not handled by visitRun().


I'm wondering how to retrieve the picture data from the CTObject.


Reading the line <v:imagedata r:id="rId12" o:title=""/>, I would imagine
that the picture data is stored somewhere under the relationship ID rId12.
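A .docx is a ZIP package, and an ID like rId12 is resolved through the main document's relationships part, word/_rels/document.xml.rels, which maps each Id to a Target such as media/image1.wmf relative to the source part. POI wraps this in its package classes; to show just the mechanism without tying it to a particular POI version, here is a minimal JDK-only sketch. Assumptions: the main part is word/document.xml (as in files saved by Word), the whole file fits in memory, and the names RelLookup, partBytes, readParts and resolve are hypothetical, not POI or XDocReport API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: looks up a relationship ID (e.g. "rId12") in a .docx without POI.
public class RelLookup {
    static final String PKG_REL_NS =
        "http://schemas.openxmlformats.org/package/2006/relationships";
    // Assumption: the main part is word/document.xml.
    static final String MAIN_PART = "/word/document.xml";
    static final String MAIN_RELS = "word/_rels/document.xml.rels";

    /** Reads every ZIP entry into memory, keyed by entry name. */
    static Map<String, byte[]> readParts(InputStream in) throws IOException {
        Map<String, byte[]> parts = new HashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            byte[] chunk = new byte[8192];
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(chunk)) > 0) {
                    buf.write(chunk, 0, n);
                }
                parts.put(entry.getName(), buf.toByteArray());
            }
        }
        return parts;
    }

    /** Maps a relationship ID to a ZIP entry name, or null if the ID is absent or external. */
    static String resolve(byte[] relsXml, String relId) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(relsXml));
        NodeList rels = doc.getElementsByTagNameNS(PKG_REL_NS, "Relationship");
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            if (relId.equals(rel.getAttribute("Id"))
                    && !"External".equals(rel.getAttribute("TargetMode"))) {
                // Resolve the Target against the source part: "media/image1.wmf" -> "word/media/image1.wmf".
                return URI.create(MAIN_PART).resolve(rel.getAttribute("Target")).getPath().substring(1);
            }
        }
        return null;
    }

    /** Returns the raw bytes of the part relId points to, or null if it cannot be found. */
    public static byte[] partBytes(InputStream docx, String relId) {
        try {
            Map<String, byte[]> parts = readParts(docx);
            byte[] relsXml = parts.get(MAIN_RELS);
            if (relsXml == null) {
                return null;
            }
            String name = resolve(relsXml, relId);
            return name == null ? null : parts.get(name);
        } catch (Exception e) {
            throw new IllegalStateException("could not read " + relId, e);
        }
    }

    public static void main(String[] args) throws IOException {
        String relId = args.length > 1 ? args[1] : "rId12";
        try (InputStream in = new FileInputStream(args[0])) {
            byte[] data = partBytes(in, relId);
            System.out.println(data == null ? relId + " not found" : relId + ": " + data.length + " bytes");
        }
    }
}
```

Usage would be something like `java RelLookup some.docx rId12` (file name illustrative). The bytes returned for rId12 are the WMF preview; rId13 should lead to the embedded OLE object (in Word-saved files typically under word/embeddings/), which holds the equation itself rather than its picture.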


Any help is highly appreciated!


Bing





2014-06-09 5:12 GMT+08:00 Angelo zerr <[email protected]>:

> Hi Bing,
>
> XDocReport provides a docx->xhtml converter based on POI. See at
> https://code.google.com/p/xdocreport/wiki/XWPFConverterXHTML
>
> @nick I didn't know about Tika; I will try it and perhaps integrate it
> into XDocReport if it works well. Thanks for this info.
>
> Regards Angelo
>
>
> 2014-06-08 20:13 GMT+02:00 Nick Burch <[email protected]>:
>
> > On Mon, 9 Jun 2014, Bing Ran wrote:
> >
> >> Now I'm looking at some docx files and wondering if there's something
> >> similar to the hwpf WordToHtmlConverter/WordToTextConverter which has
> >> served me very well for extracting text and images for doc files.
> >>
> >
> > For plain text, try XWPFWordExtractor. For HTML, try Apache Tika (which
> > wraps Apache POI)
> >
> > Nick
> >
>
