pof <MelbourneBeerBaron <at> gmail.com> writes:

> Hi, I was wondering if someone could provide an example how to parse out the
> plain text from a docx using poi 3.5 beta5?
>
> Cheers, Brett.
I discovered it's fairly easy to get all (or at least most) of the text from a DOCX file with the basic Java libraries. A .docx file is just a zip file containing a number of XML files. I posted an example of this on my blog at http://www.maxstocker.com/blog.php?en=c6270d6e2bde17ae8c6f9659b3b863773 but the basic steps are:

1) Open the docx as a ZipFile.
2) Get the XML file as the ZipEntry "word/document.xml".
3) Parse the XML document and get all tags named "w:t".
4) Extract the content from those tags.
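The four steps above might look roughly like this using only the JDK (java.util.zip plus the built-in DOM parser). This is a minimal sketch, not the code from the blog post: the class name DocxTextExtractor is made up for illustration, and starting a new line whenever the text moves to a different w:p (paragraph) element is my own addition so that paragraphs don't run together.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class: plain-JDK text extraction from a .docx
public class DocxTextExtractor {

    // WordprocessingML main namespace, bound to the "w:" prefix in document.xml
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static String extractText(String path) throws Exception {
        // 1) Open the .docx as an ordinary zip archive
        try (ZipFile zip = new ZipFile(path)) {
            // 2) The main document body lives in word/document.xml
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + path);
            }
            // 3) Parse it with a namespace-aware DOM parser
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc;
            try (InputStream in = zip.getInputStream(entry)) {
                doc = builder.parse(in);
            }
            // 4) Collect the content of every w:t element, in document order
            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            StringBuilder out = new StringBuilder();
            Node lastParagraph = null;
            for (int i = 0; i < texts.getLength(); i++) {
                Node t = texts.item(i);
                Node paragraph = enclosingParagraph(t);
                // Assumption: begin a new line when the text belongs to a different w:p
                if (lastParagraph != null && paragraph != lastParagraph) {
                    out.append('\n');
                }
                out.append(t.getTextContent());
                lastParagraph = paragraph;
            }
            return out.toString();
        }
    }

    // Walk up the tree to the nearest w:p ancestor (null if there is none)
    private static Node enclosingParagraph(Node node) {
        for (Node n = node.getParentNode(); n != null; n = n.getParentNode()) {
            if (W_NS.equals(n.getNamespaceURI()) && "p".equals(n.getLocalName())) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractText(args[0]));
    }
}
```

Usage would be something like `java DocxTextExtractor some.docx`. Parsing namespace-aware and matching on the namespace URI, rather than on the literal string "w:t", keeps it working even if a file binds the WordprocessingML namespace to a different prefix. Note that headers, footers and footnotes are stored in separate parts of the zip (e.g. word/header1.xml, word/footnotes.xml), so this only picks up the main body text, which is part of why this approach gets "most" rather than all of the text.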
