Re: WordToHtmlConverter in xwpf

Angelo zerr Mon, 09 Jun 2014 01:24:27 -0700

Hi Bing,

The XDocReport converter doesn't handle shapes (I must update our wiki to
list the converter's limitations).


But any contributions are welcome!
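
On the rId12 question: in a docx package, r:id values are resolved through
the relationships part (word/_rels/document.xml.rels for the main document),
which maps each Id to a Target such as an image under word/media/. Below is
only a minimal sketch of that lookup using plain JDK classes, not POI or
XDocReport code. The class and method names (RelationshipLookup,
resolveTarget, readPart) and the sample target media/image5.wmf are made up
for illustration, and the sketch assumes an internal target that is relative
to word/.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI or XDocReport.
class RelationshipLookup {

    /** Returns the Target of the relationship with the given Id, or null if there is none. */
    static String resolveTarget(InputStream relsXml, String relId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(relsXml);
            // "*" matches any namespace, so only the local name matters here.
            NodeList rels = doc.getElementsByTagNameNS("*", "Relationship");
            for (int i = 0; i < rels.getLength(); i++) {
                Element rel = (Element) rels.item(i);
                if (relId.equals(rel.getAttribute("Id"))) {
                    return rel.getAttribute("Target");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse relationships part", e);
        }
    }

    /** Reads the bytes of the part that relId points to, or null if it cannot be found. */
    static byte[] readPart(String docxPath, String relId) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry relsEntry = zip.getEntry("word/_rels/document.xml.rels");
            if (relsEntry == null) {
                return null;
            }
            String target;
            try (InputStream in = zip.getInputStream(relsEntry)) {
                target = resolveTarget(in, relId);
            }
            if (target == null) {
                return null;
            }
            // Assumption: the target is internal and relative to word/
            // (absolute or external targets are not handled in this sketch).
            ZipEntry part = zip.getEntry("word/" + target);
            if (part == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(part)) {
                return in.readAllBytes();
            }
        }
    }

    public static void main(String[] args) {
        // In-memory sample of a relationships part; the target name is invented.
        String rels =
            "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
            + "<Relationship Id=\"rId12\""
            + " Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/image\""
            + " Target=\"media/image5.wmf\"/>"
            + "</Relationships>";
        InputStream in = new ByteArrayInputStream(rels.getBytes(StandardCharsets.UTF_8));
        System.out.println("rId12 -> " + resolveTarget(in, "rId12"));
    }
}
```

Inside POI itself, XWPFDocument.getPictureDataByID("rId12") may be worth
trying first; I have not checked whether it returns the WMF preview of an
embedded equation.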

If you wish to discuss the XDocReport converter, I suggest you post on the
XDocReport forum to avoid disturbing the POI forum.

Regards, Angelo


2014-06-09 10:17 GMT+02:00 Bing Ran <[email protected]>:

> Thanks Angelo.
>
> I gave XDocReport a go and had limited success with a docx file which
> contains Microsoft Equations.
>
> I believe the equations edited by the equation editor are stored as WMF
> files. The XML that references one looks like this:
>
> <xml-fragment w:dxaOrig="1542" w:dyaOrig="300">
>   <v:shape id="_x0000_i1027" o:spid="_x0000_i1028" type="#_x0000_t75"
>       style="width:77pt;height:15pt;mso-position-horizontal-relative:page;mso-position-vertical-relative:page"
>       o:ole="">
>     <v:imagedata r:id="rId12" o:title=""/>
>   </v:shape>
>   <o:OLEObject Type="Embed" ProgID="Equation.DSMT4" ShapeID="_x0000_i1027"
>       DrawAspect="Content" ObjectID="_1336171117" r:id="rId13">
>     <o:FieldCodes>\* MERGEFORMAT</o:FieldCodes>
>   </o:OLEObject>
> </xml-fragment>
>
>
> I overrode the XWPFDocumentVisitor and found that the CTObject instance
> derived from the above fragment is not handled by visitRun().
>
>
> I'm wondering how to retrieve the picture data from the CTObject.
>
>
> Reading this line: <v:imagedata r:id="rId12" o:title=""/>, I would imagine
> that the picture data is stored somewhere under the ID rId12.
>
>
> Any help is highly appreciated!
>
>
> Bing
>
>
>
>
>
> 2014-06-09 5:12 GMT+08:00 Angelo zerr <[email protected]>:
>
> > Hi Bing,
> >
> > XDocReport provides a docx->xhtml converter based on POI. See
> > https://code.google.com/p/xdocreport/wiki/XWPFConverterXHTML
> >
> > @Nick, I didn't know about Tika. I will try it and perhaps integrate it
> > into XDocReport if it works well. Thanks for the info.
> >
> > Regards Angelo
> >
> >
> > 2014-06-08 20:13 GMT+02:00 Nick Burch <[email protected]>:
> >
> > > On Mon, 9 Jun 2014, Bing Ran wrote:
> > >
> > >> Now I'm looking at some docx files and wondering if there's something
> > >> similar to the hwpf WordToHtmlConverter/WordToTextConverter which has
> > >> served me very well for extracting text and images for doc files.
> > >>
> > >
> > > For plain text, try XWPFWordExtractor. For HTML, try Apache Tika (which
> > > wraps Apache POI)
> > >
> > > Nick
> > >
> > >
> >
>
