Thanks,

I have tried, but I am a bit uncertain about the tika-config.xml.

This is what I tried:

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
            <params>
                <param name="ByteArrayMaxOverride" type="int">40960000</param>
            </params>
        </parser>
    </parsers>
</properties>

From: Tim Allison <[email protected]>
Sent: 18 December 2019 14:52
To: [email protected]
Cc: [email protected]
Subject: Re: 100000 is the maximum for this record type

 

SummaryInformation parsing can be buggy, so we catch pretty much everything
there, log it as a warning, and carry on parsing the rest of the document.

 

As of Tika 1.23, you can bump the global ByteArrayMaxOverride via the 
OfficeParserConfig if you're calling Tika programmatically or via 
tika-config.xml.  
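
For the tika-config.xml route, a minimal sketch might look like the one below. It rests on two assumptions that should be checked against the Tika 1.23 source: that the parameter is set on the parser class itself (org.apache.tika.parser.microsoft.OfficeParser) rather than on OfficeParserConfig, and that the parameter name is byteArrayMaxOverride with a lowercase first letter. The value 40960000 is only an illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <!-- Keep every default parser except the one configured explicitly below. -->
        <parser class="org.apache.tika.parser.DefaultParser">
            <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
        </parser>
        <!-- Assumption: the override is exposed as a param on OfficeParser. -->
        <parser class="org.apache.tika.parser.microsoft.OfficeParser">
            <params>
                <!-- Assumed param name; the value is illustrative, not a recommendation. -->
                <param name="byteArrayMaxOverride" type="int">40960000</param>
            </params>
        </parser>
    </parsers>
</properties>
```

Since the override is global, it would presumably apply to every POI record type and not only to the summary information, so it seems safer to raise it no further than the files at hand require.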

 

On Wed, Dec 18, 2019 at 8:39 AM Hans Meijer <[email protected]> wrote:

Tika version 1.23:
When trying to parse a larger Excel file (size in bytes: 10038272), this
error occurs:
WARN  Ignoring unexpected exception while parsing summary entry
DocumentSummaryInformation
org.apache.poi.util.RecordFormatException: Tried to allocate an array of
length 1186960, but 100000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with
IOUtils.setByteArrayMaxOverride()

However, it seems like all the text gets extracted, but I still get the
warning message.

Is there any way to analyze further why the warning still appears even though
the content gets extracted from the Excel spreadsheet?




--
Sent from: http://apache-tika-users.1629097.n2.nabble.com/
