Thank you!
I just found out how to set this in the Tika config XML.
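
For the archive: the protection Tim describes in the quoted message below (read a record length, check it against a maximum, and only then allocate) can be sketched roughly as follows. This is only an illustration of the pattern, not POI's actual code. The class name RecordLengthGuard, the readRecord helper, and the 4-byte big-endian length prefix are all assumptions made for the example; the 1000000 limit simply mirrors the number in the error message.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class RecordLengthGuard {
    // Hypothetical limit, mirroring the 1000000 from the error message in this thread.
    public static final int MAX_RECORD_LENGTH = 1_000_000;

    /**
     * Reads a length-prefixed record: a 4-byte big-endian length (an assumption
     * for this sketch), followed by that many payload bytes. The declared length
     * is validated BEFORE the array is allocated, so a tiny file cannot force a
     * huge allocation just by claiming a huge record.
     */
    public static byte[] readRecord(InputStream in, int maxLength) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int declared = data.readInt();
        if (declared < 0 || declared > maxLength) {
            throw new IOException("Tried to allocate an array of length " + declared
                    + ", but " + maxLength + " is the maximum for this record type.");
        }
        byte[] payload = new byte[declared];
        data.readFully(payload); // throws EOFException if the stream is shorter than declared
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Well-formed record: declares 3 bytes and supplies 3 bytes.
        byte[] ok = {0, 0, 0, 3, 'a', 'b', 'c'};
        System.out.println(readRecord(new ByteArrayInputStream(ok), MAX_RECORD_LENGTH).length);

        // Hostile record: 4 bytes on disk that claim a payload of about 2GB.
        byte[] hostile = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff};
        try {
            readRecord(new ByteArrayInputStream(hostile), MAX_RECORD_LENGTH);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Raising the override has the same effect as passing a larger maxLength here: a legitimately large record gets through, at the cost of allowing larger allocations for every file.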

On Thu, 2 Feb 2023 at 18:12 Tim Allison <[email protected]> wrote:

> Thank you! The error message gives a hint on how to fix this.  You can
> configure Tika's OfficeParser to override this maximum record
> length:
> https://tika.apache.org/2.6.0/api/org/apache/tika/parser/microsoft/AbstractOfficeParser.html#setByteArrayMaxOverride-int-
>
> I can send a link on how to do this via tika-config.xml if this is the
> path you'd like to pursue.
>
> I was responsible for adding this code in POI because throughout the
> older MSOffice docs (doc, ppt, xls), there's a common pattern of: read
> a record length, allocate that length in memory, then read the stream
> into the byte array.  The problem is that a very small file can be
> carefully crafted or modified so that parsing it allocates 2GB.  This
> check is a protection against that behavior.
>
> On Wed, Feb 1, 2023 at 11:03 PM Tilman Hausherr <[email protected]>
> wrote:
> >
> > Hi,
> >
> > A complete stack trace would be useful. If it isn't in the log, then
> > using tika-app would be helpful. At this time the only thing we know is
> > that it's an office file, which may or may not be corrupt.
> >
> > The exception happens as part of a call to  IOUtils.toByteArray()
> >
> > A Google search for that error finds several pages that answer your
> > original question:
> >
> > https://stackoverflow.com/questions/64221010/apache-tika-tried-to-allocate-an-array-of-length-1835606-but-1000000-is-the-ma
> > https://bz.apache.org/bugzilla/show_bug.cgi?id=65639
> > https://www.ibm.com/support/pages/converter-dropped-some-document-tried-allocate-array-length-xxxx-1000000-maximum-record-type-message
> >
> > Tilman
> >
> > On 01.02.2023 23:17, שי ברק wrote:
> >
> > The logs I got:
> >
> > Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate
> > an array of length 1835606, but 1000000 is the maximum for this record type.
> > If the file is not corrupt, please open an issue on bugzilla to request
> > increasing the maximum allowable size for this record type.
> > As a temporary workaround, consider setting a higher override value with
> > IOUtils.setByteArrayMaxOverride()
> >
> >
> > On Wed, 1 Feb 2023 at 23:49 Tim Allison <[email protected]> wrote:
> >>
> >> As Tilman said, I don't think the issue is on the Tika side, but I
> >> can't tell without testing.  What happens when you curl the file to
> >> the server?  You might have to use multipart/form-data?
> >>
> >> Again, as Tilman said, it would be useful to see what the logs are.
> >> Try sending the file to the /rmeta endpoint to get the stacktrace if
> >> you can't otherwise see the logs.
> >>
> >>
> >> On Wed, Feb 1, 2023 at 12:04 PM Tilman Hausherr <[email protected]> wrote:
> >> >
> >> > How would you know that it is size related? Try what I mentioned, or
> >> > look at the server logs, or share the file.
> >> >
> >> > Tilman
> >> >
> >> > On 01.02.2023 17:08, שי ברק wrote:
> >> >
> >> > I work on a C# project that uses Tika Server over HTTP requests, so I'm
> >> > wondering if there's something I can do with the server's config file.
> >> > Maybe there's a way to modify the size limit?
> >> >
> >> > On Wed, 1 Feb 2023 at 17:51 Tilman Hausherr <[email protected]> wrote:
> >> >>
> >> >> On 01.02.2023 09:40, שי ברק wrote:
> >> >> > I have a 150 MB PowerPoint document, and when I send a request to
> >> >> > Tika Server I get a 422 response back saying it's an unprocessable
> >> >> > entity. Is there a size limitation in Tika, or is the issue with my
> >> >> > specific document?
> >> >>
> >> >> What happens if you do the same with tika-app from the command line?
> >> >>
> >> >> Tilman
> >> >>
> >> >>
> >> >>
> >> >
> >
> >
>
