Yes, the second file produces this in the console log:
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 100,000,015, but the maximum length for this record type is 100,000,000.
If the file is not corrupt and not large, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
You can set a higher override value with IOUtils.setByteArrayMaxOverride()
    at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:599) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:276) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:230) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:203) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:82) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:98) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:132) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:319) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:127) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:115) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
    ... 41 more
So I googled the error message and found this:
https://stackoverflow.com/a/64221068/535646
I then added that setting to the config.xml file from
https://cwiki.apache.org/confluence/display/TIKA/TikaServer+in+Tika+2.x
and now it works. The meta output now comes back as XML instead of
text, though. Maybe that default config file changes something rather
than keeping the defaults, but that's another story.
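
For anyone hitting the same exception, here is a minimal sketch of what
such an override can look like in the parsers section of a Tika config.
It assumes the byteArrayMaxOverride parameter of the Office parsers as
described in the linked answer. The value 200000000 is an arbitrary
illustration, not necessarily the value used here, and the rest of the
server config is omitted:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Keep the default parsers, but take out the two Office parsers
         so they can be re-declared below with custom parameters. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
      <parser-exclude class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
    </parser>
    <!-- Illustrative limit only; pick something above the size in the error. -->
    <parser class="org.apache.tika.parser.microsoft.OfficeParser">
      <params>
        <param name="byteArrayMaxOverride" type="int">200000000</param>
      </params>
    </parser>
    <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
      <params>
        <param name="byteArrayMaxOverride" type="int">200000000</param>
      </params>
    </parser>
  </parsers>
</properties>
```

The limit only has to exceed the 100,000,015 bytes reported in the
exception. Setting it much higher than needed means a corrupt or
malicious file can make the server allocate that much memory.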
Tilman