Regrettably, you have to start from scratch.

For potentially very large files -- like some zips, pst, ost, etc. --
I've seen preprocessing "unravellers" that pull out the top-level binaries
so that Tika doesn't have to hold the whole thing in memory.
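
To make that concrete, here is a rough sketch of what such an unraveller
could look like for zips. It uses only java.util.zip from the JDK, not any
Tika API, and the names ZipUnraveller, unravel and skip are made up for
illustration. It streams each top-level entry to its own file and can skip
the first N entries, so a restarted run can pick up at the entry where the
previous one failed. Treat it as a starting point under those assumptions,
not a tested tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: not part of Tika.
public class ZipUnraveller {

    /**
     * Streams the entries of a zip into outDir, one file per entry,
     * ignoring directories and the first `skip` entries.
     * Returns the number of files written.
     */
    public static int unravel(Path zip, Path outDir, int skip) throws IOException {
        Files.createDirectories(outDir);
        int index = 0;
        int written = 0;
        try (InputStream in = Files.newInputStream(zip);
             ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                int current = index++;
                if (entry.isDirectory() || current < skip) {
                    // The next getNextEntry() call reads past this entry.
                    continue;
                }
                // Name output by entry index rather than entry name, so a
                // malicious name like "../../x" cannot escape outDir.
                Path target = outDir.resolve(String.format("%06d.bin", current));
                // Files.copy reads until the end of the current entry and
                // does not close the zip stream.
                Files.copy(zis, target, StandardCopyOption.REPLACE_EXISTING);
                written++;
            }
        }
        return written;
    }
}
```

Each extracted file can then be handed to Tika on its own, and the entry
index is the "position" to record for a restart. Note that skipped entries
are still read through by ZipInputStream, so skipping saves the parsing
work but not the I/O.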

Any other recs from the community?

On Mon, Feb 6, 2023 at 11:13 AM Rob McCoy <[email protected]> wrote:

> First off apologies if this is done incorrectly, I haven't used mailing
> lists before :)
>
> Does Tika offer any way of starting an embedded file parse from a certain
> "position"? For example, if we are attempting to extract files from a very
> large zip and the parsing is stopped halfway due to an error (E.g. an OOM
> from the Java process), but I had tracked what file I was on and knew where
> I had failed, and I wanted to start the unzipping with Tika again at the
> same file/location in the ZIP in a new instance of the service, is it
> possible for me to tell Tika specifically where to "jump ahead" to, or do
> we have to start from scratch?
>
> Thanks
>
