Hi Bing,

XDocReport provides a docx->xhtml converter based on POI. See
https://code.google.com/p/xdocreport/wiki/XWPFConverterXHTML

@nick I didn't know about Tika. I will try it and perhaps integrate it into
XDocReport if it works well. Thanks for the info.

Regards, Angelo


2014-06-08 20:13 GMT+02:00 Nick Burch <[email protected]>:

> On Mon, 9 Jun 2014, Bing Ran wrote:
>
>> Now I'm looking at some docx files and wondering if there's something
>> similar to the hwpf WordToHtmlConverter/WordToTextConverter which has
>> served me very well for extracting text and images for doc files.
>>
>
> For plain text, try XWPFWordExtractor. For HTML, try Apache Tika (which
> wraps Apache POI)
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
