Yes, the second file produces this in the console log:

Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 100,000,015, but the maximum length for this record type is 100,000,000. If the file is not corrupt and not large, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
You can set a higher override value with IOUtils.setByteArrayMaxOverride()
        at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:599) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:276) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:230) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:203) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:82) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:98) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:132) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:319) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:127) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:115) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        ... 41 more
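To make the failing check concrete: the requested length (100,000,015) is 15 bytes over the limit for this record type (100,000,000), and the message says a global override can be set with the static setter IOUtils.setByteArrayMaxOverride(). Below is a minimal, self-contained sketch of that kind of guard. It is a simplified model written for illustration, NOT POI's actual source: the class ByteArrayLimitDemo, its methods, the "override <= 0 means no override" rule, the 200,000,000 override value and the use of IllegalStateException (instead of POI's RecordFormatException) are all assumptions.

```java
// Hypothetical, simplified model of the length guard described in the error
// message above. This is NOT Apache POI's IOUtils source code.
public class ByteArrayLimitDemo {

    // Assumed semantics: a value <= 0 means "no override, use the record's own max".
    private static int byteArrayMaxOverride = -1;

    // Mimics the idea of IOUtils.setByteArrayMaxOverride(): one JVM-wide setting.
    static void setByteArrayMaxOverride(int maxOverride) {
        byteArrayMaxOverride = maxOverride;
    }

    // Throws if the requested length exceeds the effective limit.
    static void checkLength(long length, int recordMax) {
        int effectiveMax = byteArrayMaxOverride > 0 ? byteArrayMaxOverride : recordMax;
        if (length > effectiveMax) {
            throw new IllegalStateException(String.format(
                "Tried to allocate an array of length %,d, but the maximum length for this record type is %,d.",
                length, effectiveMax));
        }
    }

    // Convenience wrapper: true if the allocation would be allowed.
    static boolean fits(long length, int recordMax) {
        try {
            checkLength(length, recordMax);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        final int recordMax = 100_000_000;   // limit reported in the log above
        final long requested = 100_000_015L; // length reported in the log above

        System.out.println("default limit: " + fits(requested, recordMax)); // prints "default limit: false"
        setByteArrayMaxOverride(200_000_000); // hypothetical higher override
        System.out.println("with override: " + fits(requested, recordMax)); // prints "with override: true"
    }
}
```

Because the override in this model is a single static field, it has to be set before parsing starts and it then applies to every document processed in that JVM, which is why it is usually set once at startup or via configuration rather than per file.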

So I googled for the error message and found this:

https://stackoverflow.com/a/64221068/535646

I then added this to the config.xml file from https://cwiki.apache.org/confluence/display/TIKA/TikaServer+in+Tika+2.x and now it works. However, the meta output now comes back as XML instead of text. Maybe that default config file changes something rather than keeping the defaults, but that's another story.

Tilman