Thank you!

 

Appreciate it.

I did find a build in Jenkins, "POI-DSL-1.8#876", with 
poi-src-4.1.2-SNAPSHOTY-20200123.tar.gz. 

I assume that would be ok also?

 

What would be the max value to set? Any recommendations?

 

Kind regards

Hans

 

From: Tim Allison <[email protected]>
Sent: 23 January 2020 16:21
To: [email protected]
Subject: Re: 100000 is the maximum for this record type

 

I confirmed that this will require the next version of POI due to a bug that is 
my fault: https://bz.apache.org/bugzilla/show_bug.cgi?id=63569

 

Many thanks to Dominik Stadler for fixing this.

 

If you are able to build POI-4.1.2-SNAPSHOT, the above configuration file will 
work.  The next version of POI should be out fairly soon(???); I've asked on 
POI's dev list.

 

On Thu, Jan 23, 2020 at 9:51 AM Tim Allison <[email protected]> wrote:

Hans,

  I'm sorry for my delay.  There was a bug found in setting the global max in 
POI, which may require us to wait for the next release, but I _think_ you 
should be ok with this:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <parser class="org.apache.tika.parser.microsoft.OfficeParser">
            <params>
                <param name="byteArrayMaxOverride" type="int">2000000</param>
            </params>
        </parser>
    </parsers>
</properties>
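
For anyone who wants to try several override values, here is a minimal Python sketch that builds the same config and assembles a tika-server launch command. The helper names (build_tika_config, server_command) and the jar file name are hypothetical, and the sketch assumes tika-server is started with its --config option. As noted in this thread, the override may only take effect with a POI build that contains the fix.

```python
import xml.etree.ElementTree as ET


def build_tika_config(max_bytes: int) -> ET.ElementTree:
    """Build the tika-config.xml structure quoted above with a chosen override value."""
    properties = ET.Element("properties")
    parsers = ET.SubElement(properties, "parsers")
    ET.SubElement(parsers, "parser",
                  {"class": "org.apache.tika.parser.DefaultParser"})
    office = ET.SubElement(parsers, "parser",
                           {"class": "org.apache.tika.parser.microsoft.OfficeParser"})
    params = ET.SubElement(office, "params")
    param = ET.SubElement(params, "param",
                          {"name": "byteArrayMaxOverride", "type": "int"})
    param.text = str(max_bytes)
    return ET.ElementTree(properties)


def server_command(jar_path: str, config_path: str) -> list:
    """Assemble (but do not run) a command that starts tika-server with a config file."""
    return ["java", "-jar", jar_path, "--config", config_path]


if __name__ == "__main__":
    tree = build_tika_config(2000000)
    # Print the generated XML instead of writing it, so nothing is overwritten.
    print(ET.tostring(tree.getroot(), encoding="unicode"))
    # "tika-server.jar" is a placeholder name, not a file taken from this thread.
    print(" ".join(server_command("tika-server.jar", "tika-config.xml")))
```

To save the file, tree.write("tika-config.xml", encoding="UTF-8", xml_declaration=True) should produce the same layout as the hand-written version.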

 

 

 

On Tue, Jan 21, 2020 at 3:44 PM <[email protected]> wrote:

Hi

Still stuck on this issue. Trying to take it up again to see if Tika can be an 
option.

 

I still get the error message although I have tika-server 1.23 and the Python 
tika library 1.23.

 

The call to Tika in the Python code is parser.from_file(filename).

 

I have tried setting ByteArrayMaxOverride using a Tika config file:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
            <params>
                <param name="ByteArrayMaxOverride" type="int">2048000</param>
            </params>
        </parser>
    </parsers>
</properties>

 

But no luck: the error message is still there. All the content seems to be 
parsed, but I would prefer not to get the warning message:

 

WARN  Ignoring unexpected exception while parsing summary entry 
DocumentSummaryInformation
org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 
1186960, but 100000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with 
IOUtils.setByteArrayMaxOverride()
        at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:591)
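
The warning itself carries the two numbers needed to size an override: the requested array length and the current maximum. Below is a small Python sketch that pulls them out of the message text. The function names and the 1.5x headroom factor are illustrative assumptions, not a recommendation from Tika or POI; the only firm constraint is that the override must be at least the requested length.

```python
import re

# Message text as it appears in the warning above.
MESSAGE = ("Tried to allocate an array of length 1186960, "
           "but 100000 is the maximum for this record type.")

# \s+ between words tolerates the line wrapping seen in mail and log output.
PATTERN = re.compile(
    r"array\s+of\s+length\s+(\d+),\s+but\s+(\d+)\s+is\s+the\s+maximum")


def parse_limits(message: str) -> tuple:
    """Return (requested_length, current_maximum) from the POI warning text."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like the POI allocation warning")
    return int(match.group(1)), int(match.group(2))


def suggest_override(requested: int, headroom: float = 1.5) -> int:
    """Hypothetical rule of thumb: requested length times an arbitrary headroom factor."""
    return int(requested * headroom)


if __name__ == "__main__":
    requested, maximum = parse_limits(MESSAGE)
    print(requested, maximum, suggest_override(requested))
```

Both values tried in this thread (2000000 and 2048000) are above the requested length of 1186960 bytes.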

 

Any hints on how to get rid of it?

Everything is version 1.23 and I am using the Python library.

 

 

Really appreciate any hints!

 

Kind regards

Hans

 

From: Tim Allison <[email protected]>
Sent: 18 December 2019 14:52
To: [email protected]
Cc: [email protected]
Subject: Re: 100000 is the maximum for this record type

 

SummaryInformation parsing can be buggy so we catch pretty much everything 
there and parse the rest of the document.

 

As of Tika 1.23, you can bump the global ByteArrayMaxOverride via the 
OfficeParserConfig if you're calling Tika programmatically or via 
tika-config.xml.  

 

On Wed, Dec 18, 2019 at 8:39 AM Hans Meijer <[email protected]> wrote:

Tika version 1.23:
When trying to parse a larger Excel file (size in bytes: 10038272), this
error occurs:
WARN  Ignoring unexpected exception while parsing summary entry
DocumentSummaryInformation
org.apache.poi.util.RecordFormatException: Tried to allocate an array of
length 1186960, but 100000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request
increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with
IOUtils.setByteArrayMaxOverride()

However, it seems that all the text gets extracted, yet I still get the
warning message.

Is there any way to analyze further why the warning still appears if the
content gets extracted from the Excel spreadsheet?




--
Sent from: http://apache-tika-users.1629097.n2.nabble.com/
