Yes, there is no glossary support, and I don't think templates are
supported very well either, if at all. I tried once to read a template and
save it as a document to another file, and things didn't go well. I'm sure
this just scratches the surface. Of course you are looking at things from
an
This is very helpful, Mark. Thank you. Yes, I'd add handling of the glossary
document, as well.
As I was working on the SAX parser for Tika, it "feels" more robust from an
extraction standpoint because it extracts every "w:t", with a few
exceptions (delText, moveFrom, AlternateContent,
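
A minimal sketch of the "collect every w:t" idea, using only the JDK's SAX API. This is not Tika's actual parser: the class name WtTextSketch, the extract helper, and the choice to skip the contents of w:moveFrom and mc:Fallback are assumptions made for illustration. Deleted text lives in w:delText rather than w:t, so it is never collected in the first place.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; names and skip rules are assumptions.
public class WtTextSketch {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String MC_NS =
        "http://schemas.openxmlformats.org/markup-compatibility/2006";

    static class Handler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int skipDepth = 0;     // > 0 while inside a skipped subtree
        boolean inT = false;   // true while inside a w:t we want to keep

        // Assumed skip rules: moved-from runs and AlternateContent fallbacks.
        private static boolean isSkipped(String uri, String localName) {
            return (W_NS.equals(uri) && "moveFrom".equals(localName))
                || (MC_NS.equals(uri) && "Fallback".equals(localName));
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (skipDepth > 0 || isSkipped(uri, localName)) {
                skipDepth++;   // track nesting so we know when the subtree ends
            } else if (W_NS.equals(uri) && "t".equals(localName)) {
                inT = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (skipDepth > 0) {
                skipDepth--;
            } else if (W_NS.equals(uri) && "t".equals(localName)) {
                inT = false;
            } else if (W_NS.equals(uri) && "p".equals(localName)) {
                text.append('\n');   // one line per paragraph
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inT) {
                text.append(ch, start, length);
            }
        }
    }

    // Parses a WordprocessingML part given as a string and returns its text.
    static String extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // needed to match on namespace URIs
        SAXParser parser = factory.newSAXParser();
        Handler handler = new Handler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }
}
```

Because the handler keys on w:t wherever it appears, text nested in containers it knows nothing about still comes through, which is one plausible reason a SAX pass can feel more robust for extraction than walking a typed DOM.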
Lol, just from looking through the code and the standard, there are a number of
things that I know are not handled, or not handled properly, in XWPF. A quick
subset off the top of my head includes:
* Pictures that are not inlined in the main document, header, or footer parts.
* Sections
* SDT
To close the loop and share my gratitude publicly...
Thank you, Dominik, for transferring 41k files (5GB) of docx/dotx to our
regression corpus!
I’ve already found a number of “areas for improvement” in Tika's experimental
docx SAX parser, and a few areas for improvement in POI's XWPFDocument/DOM
https://bz.apache.org/bugzilla/show_bug.cgi?id=60471
Bug ID: 60471
Summary: Not loading AlternateContent in XWPF
Product: POI
Version: 3.16-dev
Hardware: PC
Status: NEW
Severity: normal
Priority: P2