Hi Bing,

The XDocReport converter doesn't handle shapes (I must update our wiki to
list the limitations of our converter).

But any contributions are welcome!
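
For anyone who wants to try adding shape support, a first step could be reading the relationship ID from the v:imagedata element of the object XML. Below is a minimal sketch using only the JDK DOM parser (no POI calls). The class name ImageDataIdSketch, the method imageRelationId and the w:object wrapper with explicit namespace declarations are assumptions for illustration; this is not XDocReport code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ImageDataIdSketch {
    // Standard namespace URIs for VML and for OOXML relationship attributes.
    static final String VML_NS = "urn:schemas-microsoft-com:vml";
    static final String REL_NS =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** Returns the r:id of the first v:imagedata element, or null if there is none. */
    public static String imageRelationId(String objectXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the *NS lookups below
        Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(objectXml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagNameNS(VML_NS, "imagedata");
        if (nodes.getLength() == 0) {
            return null;
        }
        Element imageData = (Element) nodes.item(0);
        // getAttributeNS returns an empty string when the attribute is absent.
        String id = imageData.getAttributeNS(REL_NS, "id");
        return id.isEmpty() ? null : id;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: the fragment from the quoted mail, wrapped in an
        // element that declares the namespaces so it parses on its own.
        String xml =
                "<w:object"
                + " xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\""
                + " xmlns:v=\"urn:schemas-microsoft-com:vml\""
                + " xmlns:o=\"urn:schemas-microsoft-com:office:office\""
                + " xmlns:r=\"" + REL_NS + "\">"
                + "<v:shape id=\"_x0000_i1027\" type=\"#_x0000_t75\" o:ole=\"\">"
                + "<v:imagedata r:id=\"rId12\" o:title=\"\"/>"
                + "</v:shape>"
                + "</w:object>";
        System.out.println(imageRelationId(xml));
    }
}
```

The returned value is a relationship ID of the document part, so it would still have to be resolved to the image part through POI. That step is not shown here, and I have not tested it with the WMF previews of equations.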

If you wish to discuss the XDocReport converter, I suggest posting on the
XDocReport forum to avoid disturbing the POI forum.

Regards, Angelo


2014-06-09 10:17 GMT+02:00 Bing Ran <[email protected]>:

> Thanks Angelo.
>
> I gave XDocReport a go and had limited success with a docx file which
> contains Microsoft Equations.
>
> I believe the equations edited by the equation editor are in the format of
> wmf files. The reference section is like this:
>
> <xml-fragment w:dxaOrig="1542" w:dyaOrig="300">
>   <v:shape id="_x0000_i1027" o:spid="_x0000_i1028" type="#_x0000_t75"
>       style="width:77pt;height:15pt;mso-position-horizontal-relative:page;mso-position-vertical-relative:page"
>       o:ole="">
>     <v:imagedata r:id="rId12" o:title=""/>
>   </v:shape>
>   <o:OLEObject Type="Embed" ProgID="Equation.DSMT4" ShapeID="_x0000_i1027"
>       DrawAspect="Content" ObjectID="_1336171117" r:id="rId13">
>     <o:FieldCodes>\* MERGEFORMAT</o:FieldCodes>
>   </o:OLEObject>
> </xml-fragment>
>
>
> I overrode the XWPFDocumentVisitor and realized that the CTObject instance
> derived from the above fragment was not handled by visitRun().
>
>
> I'm wondering how I'm going to retrieve the picture data from CTObject.
>
>
> Reading this line: <v:imagedata r:id="rId12" o:title=""/>, I would imagine
> that the picture data is stored somewhere with an ID of rId12.
>
>
> Any help is highly appreciated!
>
>
> Bing
>
>
>
>
>
> 2014-06-09 5:12 GMT+08:00 Angelo zerr <[email protected]>:
>
> > Hi Bing,
> >
> > XDocReport provides a docx->xhtml converter based on POI. See
> > https://code.google.com/p/xdocreport/wiki/XWPFConverterXHTML
> >
> > @nick I didn't know about Tika. I will try it and perhaps integrate it
> > into XDocReport if it works well. Thanks for this info.
> >
> > Regards Angelo
> >
> >
> > 2014-06-08 20:13 GMT+02:00 Nick Burch <[email protected]>:
> >
> > > On Mon, 9 Jun 2014, Bing Ran wrote:
> > >
> > >> Now I'm looking at some docx files and wondering if there's something
> > >> similar to the hwpf WordToHtmlConverter/WordToTextConverter which has
> > >> served me very well for extracting text and images for doc files.
> > >>
> > >
> > > For plain text, try XWPFWordExtractor. For HTML, try Apache Tika (which
> > > wraps Apache POI)
> > >
> > > Nick
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > >
> >
>