Hi Bing,

XDocReport converter doesn't handle shapes (I must update our wiki to document the limitations of our converter).
But any contributions are welcome! If you wish to discuss the XDocReport converter, I suggest you post on the XDocReport forum to avoid disturbing the POI forum.

Regards Angelo

2014-06-09 10:17 GMT+02:00 Bing Ran <[email protected]>:

> Thanks Angelo.
>
> I gave XDocReport a go and had limited success with a docx file which
> contains Microsoft Equations.
>
> I believe the equations edited by the equation editor are in the format of
> wmf files. The reference section is like this:
>
> <xml-fragment w:dxaOrig="1542" w:dyaOrig="300">
>
> <v:shape id="_x0000_i1027" o:spid="_x0000_i1028" type="#_x0000_t75"
> style="width:77pt;height:15pt;mso-position-horizontal-relative:page;mso-position-vertical-relative:page"
> o:ole="">
>
> <v:imagedata r:id="rId12" o:title=""/>
>
> </v:shape>
>
> <o:OLEObject Type="Embed" ProgID="Equation.DSMT4" ShapeID="_x0000_i1027"
> DrawAspect="Content" ObjectID="_1336171117" r:id="rId13">
>
> <o:FieldCodes>\* MERGEFORMAT</o:FieldCodes>
>
> </o:OLEObject>
>
> </xml-fragment>
>
> I overrode the XWPFDocumentVisitor and realized that the CTObject instance
> derived from the above fragment was not handled by visitRun().
>
> I'm wondering how I'm going to retrieve the picture data from CTObject.
>
> Reading this line: <v:imagedata r:id="rId12" o:title=""/>, I would imagine
> that the picture data is stored somewhere with an ID of rId12.
>
> Any help is highly appreciated!
>
> Bing
>
> 2014-06-09 5:12 GMT+08:00 Angelo zerr <[email protected]>:
>
> > Hi Bing,
> >
> > XDocReport provides a docx->xhtml converter based on POI. See
> > https://code.google.com/p/xdocreport/wiki/XWPFConverterXHTML
> >
> > @nick I didn't know Tika, I will try it and perhaps will integrate it into
> > XDocReport if it works well. Thanks for this info.
> >
> > Regards Angelo
> >
> > 2014-06-08 20:13 GMT+02:00 Nick Burch <[email protected]>:
> >
> > > On Mon, 9 Jun 2014, Bing Ran wrote:
> > >
> > >> Now I'm looking at some docx files and wondering if there's something
> > >> similar to the hwpf WordToHtmlConverter/WordToTextConverter which has
> > >> served me very well for extracting text and images for doc files.
> > >
> > > For plain text, try XWPFWordExtractor. For HTML, try Apache Tika (which
> > > wraps Apache POI)
> > >
> > > Nick
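
Regarding the question in the thread about getting from the CTObject fragment to the picture data: a minimal sketch of the first step, pulling the relationship ids out of the quoted XML. It uses only the JDK's DOM parser, not POI. The class name ObjectRelationIds and the helper relationId are hypothetical names chosen for illustration. The fragment as posted carries no xmlns declarations, so the sketch parses it without namespace awareness and matches on the prefixed names exactly as they appear in the mail. Resolving the id to the actual image bytes is a separate step through the document's relationships; in POI, XWPFDocument.getPictureDataByID or getRelationById may be a starting point, but whether they cover this embedded-object case is an assumption that has not been verified here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of POI or XDocReport.
public class ObjectRelationIds {

    /** Returns the r:id of the first element with the given prefixed name, or null if absent. */
    public static String relationId(String xml, String qualifiedName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The fragment has undeclared prefixes (v:, o:, r:), so namespace awareness stays off.
        factory.setNamespaceAware(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList nodes = doc.getElementsByTagName(qualifiedName);
        if (nodes.getLength() == 0) {
            return null;
        }
        Element element = (Element) nodes.item(0);
        return element.hasAttribute("r:id") ? element.getAttribute("r:id") : null;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed copy of the fragment quoted in the thread.
        String fragment =
              "<xml-fragment w:dxaOrig=\"1542\" w:dyaOrig=\"300\">"
            + "<v:shape id=\"_x0000_i1027\" type=\"#_x0000_t75\" o:ole=\"\">"
            + "<v:imagedata r:id=\"rId12\" o:title=\"\"/>"
            + "</v:shape>"
            + "<o:OLEObject Type=\"Embed\" ProgID=\"Equation.DSMT4\" r:id=\"rId13\"/>"
            + "</xml-fragment>";

        // Relationship id of the preview image shown in place of the equation.
        System.out.println(relationId(fragment, "v:imagedata"));
        // Relationship id of the embedded OLE object itself.
        System.out.println(relationId(fragment, "o:OLEObject"));
    }
}
```

The two ids point at different parts: the one on v:imagedata refers to the rendered preview image, while the one on o:OLEObject refers to the embedded equation object.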
