Thanks, I have tried that, but I am a bit unsure about the tika-config.xml.
This is what I tried:

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
      <params>
        <param name="ByteArrayMaxOverride" type="int">40960000</param>
      </params>
    </parser>
  </parsers>
</properties>

From: Tim Allison <[email protected]>
Sent: 18 December 2019 14:52
To: [email protected]
Cc: [email protected]
Subject: Re: 100000 is the maximum for this record type

SummaryInformation parsing can be buggy, so we catch pretty much everything there and parse the rest of the document.

As of Tika 1.23, you can bump the global ByteArrayMaxOverride via the OfficeParserConfig if you're calling Tika programmatically or via tika-config.xml.

On Wed, Dec 18, 2019 at 8:39 AM Hans Meijer <[email protected]> wrote:

> Tika version 1.23:
>
> When trying to parse a larger Excel file (size in bytes: 10038272), this error occurs:
>
> WARN Ignoring unexpected exception while parsing summary entry DocumentSummaryInformation
> org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1186960, but 100000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
>
> However, it seems that all the text still gets extracted, yet I still get the warning message.
> Is there any way to analyze further why the warning still appears if the content is extracted from the Excel spreadsheet?
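
For comparison, here is a hedged sketch of how such an override is usually declared in tika-config.xml: the parameter is attached to a parser class rather than to OfficeParserConfig, and the default parser is told to exclude that class so it is not registered twice. The class name org.apache.tika.parser.microsoft.OfficeParser, the lower-case parameter name byteArrayMaxOverride, and the parser-exclude layout are assumptions that this thread does not confirm; please check them against the Tika documentation for the version in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Keep every default parser except the one configured explicitly below. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
    </parser>
    <!-- Assumed parser class and parameter name; 40960000 is the value from the attempt above. -->
    <parser class="org.apache.tika.parser.microsoft.OfficeParser">
      <params>
        <param name="byteArrayMaxOverride" type="int">40960000</param>
      </params>
    </parser>
  </parsers>
</properties>
```

If this layout applies, the override only covers the legacy OLE2 formats handled by that parser; the warning in the log comes from a DocumentSummaryInformation entry, which is an OLE2 structure.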
