Hi Bing,

XDocReport provides a docx->XHTML converter based on POI. See https://code.google.com/p/xdocreport/wiki/XWPFConverterXHTML

@Nick: I didn't know about Tika. I will try it and, if it works well, perhaps integrate it into XDocReport. Thanks for the info.

Regards,
Angelo

2014-06-08 20:13 GMT+02:00 Nick Burch <[email protected]>:

> On Mon, 9 Jun 2014, Bing Ran wrote:
>
>> Now I'm looking at some docx files and wondering if there's something
>> similar to the hwpf WordToHtmlConverter/WordToTextConverter which has
>> served me very well for extracting text and images for doc files.
>
> For plain text, try XWPFWordExtractor. For HTML, try Apache Tika (which
> wraps Apache POI)
>
> Nick
