I confirmed that this will require the next version of POI due to a bug that is my fault: https://bz.apache.org/bugzilla/show_bug.cgi?id=63569
Many thanks to Dominik Stadler for fixing this. If you are able to build POI-4.1.2-SNAPSHOT, the above configuration file will work. The next version of POI should be out fairly soon (???); I've asked on POI's dev list.

On Thu, Jan 23, 2020 at 9:51 AM Tim Allison <[email protected]> wrote:

> Hans,
>
> I'm sorry for my delay. There was a bug found in setting the global max
> in POI, which may require us to wait for the next release, but I _think_
> you should be ok with this:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>   <parsers>
>     <parser class="org.apache.tika.parser.DefaultParser"/>
>     <parser class="org.apache.tika.parser.microsoft.OfficeParser">
>       <params>
>         <param name="byteArrayMaxOverride" type="int">2000000</param>
>       </params>
>     </parser>
>   </parsers>
> </properties>
>
> On Tue, Jan 21, 2020 at 3:44 PM <[email protected]> wrote:
>
>> Hi,
>>
>> I'm still stuck on this issue and am taking it up again to see if Tika
>> can be an option.
>>
>> I still get the error message, although I have tika-server 1.23 and
>> python tika 1.23.
>>
>> The call to Tika with a file in the Python code is
>> parser.from_file(filename).
>>
>> I have tried setting the ByteMaxOverride using a Tika config file:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <properties>
>>   <parsers>
>>     <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
>>       <params>
>>         <param name="ByteArrayMaxOverride" type="int">2048000</param>
>>       </params>
>>     </parser>
>>   </parsers>
>> </properties>
>>
>> But no luck: the error message is still there. It seems like all the
>> content is parsed, though, but I would prefer not to get the warning
>> message:
>>
>> WARN Ignoring unexpected exception while parsing summary entry
>> DocumentSummaryInformation
>> org.apache.poi.util.RecordFormatException: Tried to allocate an array of
>> length 1186960, but 100000 is the maximum for this record type.
>> If the file is not corrupt, please open an issue on bugzilla to request
>> increasing the maximum allowable size for this record type.
>> As a temporary workaround, consider setting a higher override value with
>> IOUtils.setByteArrayMaxOverride()
>> at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:591)
>>
>> Any hints on how to get rid of it?
>> Everything is version 1.23, and I am using the Python library.
>>
>> I'd really appreciate any hints!
>>
>> Kind regards
>> Hans
>>
>> *From:* Tim Allison <[email protected]>
>> *Sent:* 18 December 2019 14:52
>> *To:* [email protected]
>> *Cc:* [email protected]
>> *Subject:* Re: 100000 is the maximum for this record type
>>
>> SummaryInformation parsing can be buggy, so we catch pretty much
>> everything there and parse the rest of the document.
>>
>> As of Tika 1.23, you can bump the global ByteArrayMaxOverride via the
>> OfficeParserConfig if you're calling Tika programmatically, or via
>> tika-config.xml.
>>
>> On Wed, Dec 18, 2019 at 8:39 AM Hans Meijer <[email protected]>
>> wrote:
>>
>> Tika version 1.23:
>> When trying to parse a larger Excel file (10038272 bytes), this
>> error occurs:
>> WARN Ignoring unexpected exception while parsing summary entry
>> DocumentSummaryInformation
>> org.apache.poi.util.RecordFormatException: Tried to allocate an array of
>> length 1186960, but 100000 is the maximum for this record type.
>> If the file is not corrupt, please open an issue on bugzilla to request
>> increasing the maximum allowable size for this record type.
>> As a temporary workaround, consider setting a higher override value with
>> IOUtils.setByteArrayMaxOverride()
>>
>> However, it seems like all the text gets extracted, but I still get the
>> warning message.
>>
>> Is there any way to analyze further why the warning is still emitted if
>> the content gets extracted from the Excel spreadsheet?
>>
>> --
>> Sent from: http://apache-tika-users.1629097.n2.nabble.com/
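The two configurations in this thread differ in where the override goes. Hans's attempt names `org.apache.tika.parser.microsoft.OfficeParserConfig` as the parser class and spells the parameter `ByteArrayMaxOverride`, while Tim's suggestion sets `byteArrayMaxOverride` inside `<params>` of `org.apache.tika.parser.microsoft.OfficeParser`, listed after `DefaultParser`. Both values (2048000 and 2000000) already exceed the 1186960 bytes POI tried to allocate, so the value itself was not what went wrong. Below is a minimal Python sketch (Python only because Hans calls Tika through the tika-python library) that builds a config in the shape Tim posted and checks whether a given config sets the override on `OfficeParser`. The helpers `build_config`, `find_override` and `lengths_from_warning` are hypothetical illustrations, not part of Tika, POI or tika-python, and the check mirrors the thread's example rather than Tika's own config loader.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Class and parameter names copied from the thread. The helpers below are
# hypothetical illustrations, not part of Tika, POI or tika-python.
DEFAULT_PARSER = "org.apache.tika.parser.DefaultParser"
OFFICE_PARSER = "org.apache.tika.parser.microsoft.OfficeParser"
PARAM_NAME = "byteArrayMaxOverride"

# Hans's attempt from the thread: OfficeParserConfig is used as the parser
# class, and the parameter name starts with a capital "B".
HANS_CONFIG = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
      <params>
        <param name="ByteArrayMaxOverride" type="int">2048000</param>
      </params>
    </parser>
  </parsers>
</properties>
"""

# Matches POI's warning text, tolerating the line wraps seen in the emails.
_RFE_PATTERN = re.compile(
    r"array\s+of\s+length\s+(\d+),\s+but\s+(\d+)\s+is\s+the\s+maximum"
)


def lengths_from_warning(message: str) -> Tuple[int, int]:
    """Return (requested length, current maximum) from the POI warning text."""
    match = _RFE_PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like POI's RecordFormatException")
    return int(match.group(1)), int(match.group(2))


def build_config(max_override: int) -> str:
    """Build a tika-config.xml string in the shape Tim posted on Jan 23, 2020."""
    properties = ET.Element("properties")
    parsers = ET.SubElement(properties, "parsers")
    ET.SubElement(parsers, "parser", {"class": DEFAULT_PARSER})
    office = ET.SubElement(parsers, "parser", {"class": OFFICE_PARSER})
    params = ET.SubElement(office, "params")
    param = ET.SubElement(params, "param", {"name": PARAM_NAME, "type": "int"})
    param.text = str(max_override)
    body = ET.tostring(properties, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def find_override(config_xml: str) -> Optional[int]:
    """Return the byteArrayMaxOverride set on OfficeParser, or None if absent."""
    # Parse from bytes so the UTF-8 declaration is handled by the XML parser.
    root = ET.fromstring(config_xml.encode("utf-8"))
    for parser in root.iter("parser"):
        if parser.get("class") != OFFICE_PARSER:
            continue  # e.g. DefaultParser, or Hans's OfficeParserConfig entry
        for param in parser.iter("param"):
            if param.get("name") == PARAM_NAME and param.text:
                return int(param.text.strip())
    return None


if __name__ == "__main__":
    warning = ("Tried to allocate an array of length 1186960, "
               "but 100000 is the maximum for this record type.")
    requested, maximum = lengths_from_warning(warning)
    tim_config = build_config(2000000)
    print(find_override(HANS_CONFIG))  # None: no param set on OfficeParser
    print(find_override(tim_config) >= requested)  # True
    print(tim_config)
```

As the first message in the thread notes, even a correctly placed override only takes effect with a POI build that includes the fix for bug 63569 (POI-4.1.2-SNAPSHOT at the time). When tika-server is started separately, the 1.x server should accept a config file through its `--config` option; check the server jar's `--help` output for your build before relying on it.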
