Thank you! I just found out how to configure this in the config XML.

On Thu, 2 Feb 2023 at 18:12 Tim Allison <[email protected]> wrote:
> Thank you! The error message gives a hint on how to fix this. You can
> configure Tika's OfficeParser to override this maximum record
> length:
> https://tika.apache.org/2.6.0/api/org/apache/tika/parser/microsoft/AbstractOfficeParser.html#setByteArrayMaxOverride-int-
>
> I can send a link on how to do this via tika-config.xml if this is the
> path you'd like to pursue.
>
> I was responsible for adding this code in POI because throughout the
> older MSOffice docs (doc, ppt, xls), there's a common pattern of: read
> a record length, allocate that length in memory, then read the stream
> into the byte array. The problem is that files can be carefully
> modified/created to have a very small file allocate 2GB. This is a
> protection against that behavior.
>
> On Wed, Feb 1, 2023 at 11:03 PM Tilman Hausherr <[email protected]>
> wrote:
> >
> > Hi,
> >
> > A complete stack trace would be useful; if it isn't in the log, then
> > using tika-app would be helpful. At this time the only thing we know is
> > that it's an office file, which may or may not be corrupt.
> >
> > The exception happens as part of a call to IOUtils.toByteArray()
> >
> > A Google search for that error finds several pages that answer your
> > original question:
> >
> > https://stackoverflow.com/questions/64221010/apache-tika-tried-to-allocate-an-array-of-length-1835606-but-1000000-is-the-ma
> > https://bz.apache.org/bugzilla/show_bug.cgi?id=65639
> > https://www.ibm.com/support/pages/converter-dropped-some-document-tried-allocate-array-length-xxxx-1000000-maximum-record-type-message
> >
> > Tilman
> >
> > On 01.02.2023 23:17, שי ברק wrote:
> >
> > The logs I got:
> >
> > Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate
> > an array of length 1835606, but 1000000 is the maximum for this record type.
> > If the file is not corrupt, please open an issue on bugzilla to request
> > increasing the maximum allowable size for this record type.
> > As a temporary workaround, consider setting a higher override value with
> > IOUtils.setByteArrayMaxOverride()
> >
> >
> > On Wed, 1 Feb 2023 at 23:49 Tim Allison <[email protected]> wrote:
> >>
> >> As Tilman said, I don't think the issue is on the Tika side, but I
> >> can't tell without testing. What happens when you curl the file to
> >> the server? You might have to use multipart/form-data?
> >>
> >> Again, as Tilman said, it would be useful to see what the logs are.
> >> Try sending the file to the /rmeta endpoint to get the stacktrace if
> >> you can't otherwise see the logs.
> >>
> >>
> >> On Wed, Feb 1, 2023 at 12:04 PM Tilman Hausherr <[email protected]> wrote:
> >> >
> >> > How would you know that it is size related? Try what I mentioned, or
> >> > look at the server logs, or share the file.
> >> >
> >> > Tilman
> >> >
> >> > On 01.02.2023 17:08, שי ברק wrote:
> >> >
> >> > I work on a C# project that uses Tika Server over HTTP requests, so I'm
> >> > wondering if there's something I can do with the config file of the
> >> > server... maybe there's a way to modify the size limit.
> >> >
> >> > On Wed, 1 Feb 2023 at 17:51 Tilman Hausherr <[email protected]> wrote:
> >> >>
> >> >> On 01.02.2023 09:40, שי ברק wrote:
> >> >> > I have a 150 MB PowerPoint document, and when I send a request to
> >> >> > Tika Server I get a 422 response back, saying it's an unprocessable
> >> >> > entity. Is there a size limitation in Tika, or is the issue with my
> >> >> > specific document?
> >> >>
> >> >> What happens if you do the same with tika-app from the command line?
> >> >>
> >> >> Tilman
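
The pattern Tim describes in the thread (read a declared record length, allocate that many bytes, then fill the array from the stream) and the guard against it can be sketched in plain Java roughly as follows. This is a minimal illustration and not POI's actual implementation: the class GuardedRecordReader, the method readRecord, and the 4-byte big-endian length prefix are assumptions made up for the example. Only the 1,000,000 default and the wording of the message are borrowed from the log quoted in the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of POI or Tika.
final class GuardedRecordReader {

    // Assumed default cap, mirroring the 1000000 shown in the quoted log.
    static final int DEFAULT_MAX_RECORD_LENGTH = 1_000_000;

    // Reads one record: a 4-byte length prefix followed by that many bytes.
    // The declared length is checked against maxLength BEFORE allocating.
    static byte[] readRecord(InputStream in, int maxLength) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int declared = data.readInt(); // length claimed by the file (big-endian in this sketch)
        if (declared < 0 || declared > maxLength) {
            throw new IOException("Tried to allocate an array of length " + declared
                    + ", but " + maxLength + " is the maximum for this record type.");
        }
        byte[] payload = new byte[declared]; // safe: bounded by maxLength
        data.readFully(payload);             // throws EOFException if the stream is too short
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // A well-formed record: declared length 3, followed by 3 payload bytes.
        byte[] small = ByteBuffer.allocate(7).putInt(3).put(new byte[] {1, 2, 3}).array();
        byte[] payload = readRecord(new ByteArrayInputStream(small), DEFAULT_MAX_RECORD_LENGTH);
        System.out.println("read " + payload.length + " bytes");

        // A 4-byte "file" that claims a record of roughly 2GB.
        byte[] hostile = ByteBuffer.allocate(4).putInt(Integer.MAX_VALUE).array();
        try {
            readRecord(new ByteArrayInputStream(hostile), DEFAULT_MAX_RECORD_LENGTH);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch, passing a larger maxLength plays the role that the byteArrayMaxOverride setting plays in the thread: raising it lets larger legitimate records through, at the cost of permitting larger allocations for a crafted file.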
