Re: got docx?

2016-12-12 Thread Mark Murphy
Yes, there is no glossary support, and I don't think templates are supported very well either, if at all. I tried once to read a template and save it as a document to another file, and things didn't go well. I'm sure this just scratches the surface. Of course you are looking at things from an

RE: got docx?

2016-12-12 Thread Allison, Timothy B.
This is very helpful, Mark. Thank you. Yes, I'd add handling of the glossary document, as well. As I was working on the SAX parser for Tika, it "feels" more robust from an extraction standpoint because it is extracting all "w:t",...with a few exceptions (deltext, moveFrom, alternatecontent,
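As a rough illustration of the "extract every w:t" idea mentioned above, here is a minimal sketch using only the JDK's SAX API. It is not Tika's parser: the class name `RunTextExtractor` and the method `extract` are hypothetical, and it assumes the listed exceptions mean that deleted text (`w:delText`) and moved-from text (`w:moveFrom`) are left out. It makes no attempt to handle AlternateContent.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not Tika's implementation: collect the text of every
// w:t element, ignoring w:delText and anything nested inside w:moveFrom.
class RunTextExtractor extends DefaultHandler {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private final StringBuilder text = new StringBuilder();
    private int skipDepth = 0;      // > 0 while inside a w:moveFrom subtree
    private boolean inText = false; // true while inside a w:t element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (skipDepth > 0) {
            skipDepth++;            // track nesting so we know when moveFrom closes
            return;
        }
        if (!W_NS.equals(uri)) {
            return;
        }
        if ("moveFrom".equals(localName)) {
            skipDepth = 1;
        } else if ("t".equals(localName)) {
            inText = true;          // w:delText has a different local name, so it never matches
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        if (W_NS.equals(uri) && "t".equals(localName)) {
            inText = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inText) {
            text.append(ch, start, length);
        }
    }

    // Parses a WordprocessingML string and returns the concatenated w:t text.
    static String extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        RunTextExtractor handler = new RunTextExtractor();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }
}
```

Because the handler keys on element names rather than on a document object model, it picks up `w:t` wherever it occurs in the part, which may be what makes this style of extraction feel more robust.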

RE: got docx?

2016-12-12 Thread Murphy, Mark
Lol, just from looking through the code, and standard, there are a number of things that I know are not handled or not handled properly in XWPF. A quick subset from the top of my head includes:
* Pictures that are not inlined in the main document, header, or footer parts.
* Sections
* SDT

RE: got docx?

2016-12-12 Thread Allison, Timothy B.
To close the loop and share my gratitude publicly... Thank you, Dominik, for transferring 41k, 5GB of docx/dotx to our regression corpus! I’ve already found a number of “areas for improvement” in Tika's experimental docx SAX parser, and a few areas for improvement in POI's XWPFDocument/DOM

[Bug 60471] New: Not loading AlternateContent in XWPF

2016-12-12 Thread bugzilla
https://bz.apache.org/bugzilla/show_bug.cgi?id=60471
Bug ID: 60471
Summary: Not loading AlternateContent in XWPF
Product: POI
Version: 3.16-dev
Hardware: PC
Status: NEW
Severity: normal
Priority: P2