The problem has been solved.

Thank you.



From: Tilman Hausherr
Date: 2023-04-20 11:09
To: user
Subject: Re: Tika server extraction failed
Yes, the second file produces this in the console log:
 
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 100,000,015, but the maximum length for this record type is 100,000,000.
If the file is not corrupt and not large, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
You can set a higher override value with IOUtils.setByteArrayMaxOverride()
        at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:599) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:276) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:230) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.util.IOUtils.toByteArray(IOUtils.java:203) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:82) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:98) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:132) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:319) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:127) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:115) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-server-standard-2.7.1-SNAPSHOT.jar:2.7.1-SNAPSHOT]
        ... 41 more
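In other words, POI refuses to allocate a buffer larger than a per-record limit, and here the request (100,000,015 bytes) is just 15 bytes over the 100,000,000 limit. The Java sketch below is a simplified, hypothetical model of that guard, not POI's actual code. It assumes that a positive override (the value POI's IOUtils.setByteArrayMaxOverride() sets) replaces the per-record maximum. The class LengthGuard and its members are made up for illustration.

```java
import java.util.Locale;

// Hypothetical, simplified model of the size guard named in the exception message.
// Not Apache POI code: it only mirrors the behaviour the message describes.
public class LengthGuard {
    // Stand-in for POI's JVM-wide override; values <= 0 mean "no override" (assumption).
    static int byteArrayMaxOverride = -1;

    // Throws if 'length' exceeds the effective limit for this record type.
    static void checkLength(long length, int maxLength) {
        int limit = byteArrayMaxOverride > 0 ? byteArrayMaxOverride : maxLength;
        if (length > limit) {
            throw new IllegalStateException(String.format(Locale.ROOT,
                    "Tried to allocate an array of length %,d, but the maximum length for this record type is %,d.",
                    length, limit));
        }
    }

    public static void main(String[] args) {
        long requested = 100_000_015L; // size reported in the log
        int recordMax = 100_000_000;   // per-record limit reported in the log

        try {
            checkLength(requested, recordMax);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        byteArrayMaxOverride = 200_000_000; // raise the limit (arbitrary example value)
        checkLength(requested, recordMax);  // no exception this time
        System.out.println("accepted with override " + byteArrayMaxOverride);
    }
}
```

In an embedded setup the real equivalent would be calling org.apache.poi.util.IOUtils.setByteArrayMaxOverride() with a larger value before parsing. Because it is a JVM-wide setting, it has to take effect in the process that actually does the parsing (here, the Tika server).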
 
So I googled for the error message and found this:
 
https://stackoverflow.com/a/64221068/535646
 
I then added this to the config.xml file from
https://cwiki.apache.org/confluence/display/TIKA/TikaServer+in+Tika+2.x
and with that it works. However, the meta output now comes as XML instead
of as text, so maybe that default config file changes something rather than
keeping the defaults, but that's another story.
 
Tilman
 
