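SummaryInformation parsing can be buggy, so we catch pretty much every
exception there, log it as a warning, and carry on parsing the rest of the
document. That is why you still get the extracted text along with the warning.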

As of Tika 1.23, you can raise the global ByteArrayMaxOverride, either via
the OfficeParserConfig if you're calling Tika programmatically, or via
tika-config.xml. For this file the value needs to be at least 1186960, the
size POI tried to allocate.
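
For the tika-config.xml route, something along these lines should work. Treat
it as a sketch, not a tested config: the parameter name byteArrayMaxOverride
and the choice of OfficeParser as the parser to configure are my assumptions
about how the setting is exposed, and 2000000 is just an arbitrary value above
the 1186960 from your log.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Keep the default parsers, but take OfficeParser out so it can be
         configured separately below. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
    </parser>
    <!-- Assumed parameter name; the value must exceed the size POI
         reported in the exception (1186960 here). -->
    <parser class="org.apache.tika.parser.microsoft.OfficeParser">
      <params>
        <param name="byteArrayMaxOverride" type="int">2000000</param>
      </params>
    </parser>
  </parsers>
</properties>
```

Bear in mind that the override is global, so it applies to every record type
POI guards this way, not just DocumentSummaryInformation.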

On Wed, Dec 18, 2019 at 8:39 AM Hans Meijer <[email protected]>
wrote:

> Tika version 1.23:
> When trying to parse a larger Excel file (size in bytes: 10038272), this
> error occurs:
> WARN  Ignoring unexpected exception while parsing summary entry
> DocumentSummaryInformation
> org.apache.poi.util.RecordFormatException: Tried to allocate an array of
> length 1186960, but 100000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request
> increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with
> IOUtils.setByteArrayMaxOverride()
>
> However, it seems like all the text gets extracted, but I still get the
> warning message.
>
> Is there any way to find out why the warning still appears if the content
> gets extracted from the Excel spreadsheet?
