Hi,

I wonder if we should just increase the default thresholds to allow deeper
nesting before the exception gets thrown. The defaults should be tuned to
make the false-positive rate as low as possible without opening the door
for false negatives that could result in denial-of-service attacks.

The package-entry depth limit added in
https://issues.apache.org/jira/browse/TIKA-741 should make it OK to
increase the default maxDepth from 100 to, say, 200 if people are hitting
this limit with valid documents.
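
To make the trade-off concrete, here is a minimal, self-contained sketch of
the kind of nesting check under discussion. It is not Tika's
SecureContentHandler: DepthLimitSketch, DepthLimitingHandler and accepts()
are hypothetical names made up for illustration, and the SAX events are
fired by hand instead of coming from a real parser. It only shows that a
document nested 150 levels deep is rejected with a limit of 100 and
accepted with a limit of 200.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class DepthLimitSketch {

    /** Hypothetical handler (not Tika code): rejects nesting deeper than maxDepth. */
    static class DepthLimitingHandler extends DefaultHandler {
        private final int maxDepth;
        private int depth = 0;

        DepthLimitingHandler(int maxDepth) {
            this.maxDepth = maxDepth;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            depth++;
            if (depth > maxDepth) {
                throw new SAXException(
                        "Nesting depth " + depth + " exceeds limit " + maxDepth);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            depth--;
        }
    }

    /**
     * Simulates a document with documentDepth nested elements by firing the
     * start/end events directly, and reports whether the limit was respected.
     */
    public static boolean accepts(int documentDepth, int maxDepth) {
        DepthLimitingHandler handler = new DepthLimitingHandler(maxDepth);
        try {
            for (int i = 0; i < documentDepth; i++) {
                handler.startElement("", "e", "e", new AttributesImpl());
            }
            for (int i = 0; i < documentDepth; i++) {
                handler.endElement("", "e", "e");
            }
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(150, 100)); // prints false
        System.out.println(accepts(150, 200)); // prints true
    }
}
```

A plain element counter like this cannot tell a legitimately deep document
from a hostile one, which is why the default has to be a compromise.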

Markus, what kind of documents are triggering the exception for you? What
would be a good maxDepth setting for your case?

Best,

Jukka


On Mon, Aug 26, 2019 at 1:40 PM Tim Allison <[email protected]> wrote:

> Oh, ok.  This is helpful.  Got it.  The AutoDetectParser automatically
> wraps the incoming handler in a SecureContentHandler.  Some options...
>
> 1) We could have the AutoDetectParser skip wrapping a
> SecureContentHandler around the incoming handler if the user calls
> parse with a SecureContentHandler...
> 2) We could add SecureContentHandler parameter settings to the
> AutoDetectParser, and it would configure the SecureContentHandler
> accordingly...I think there are a few subtleties, but this might get
> you configurability via tika-config.xml.
>
> I'm not offering static thresholds on the SecureContentHandler. :D
>
> Fellow devs, how else might we make this work and make it configurable
> via tika-config.xml?
>
> Cheers,
>
>            Tim
>
>
> On Mon, Aug 26, 2019 at 1:24 PM Markus Jelsma
> <[email protected]> wrote:
> >
> > Hello Tim,
> >
> > I use Tika embedded in another Java application, passing it a custom
> > ContentHandler which collects interesting stuff, which we, after the
> > parse, use to construct meaningful text.
> >
> >     ReadableContentHandler handler = new ReadableContentHandler(url, config);
> >
> >     AutoDetectParser parser = new AutoDetectParser(tikaConfig);
> >     parser.parse(stream, handler,  new Metadata(), context);
> >
> > My ContentHandler does not extend SecureContentHandler, so I never have
> > a chance to pass some different value for the nesting limit check.
> >
> > Many thanks,
> > Markus
> >
> > -----Original message-----
> > > From:Tim Allison <[email protected]>
> > > Sent: Monday 26th August 2019 19:11
> > > To: [email protected]
> > > Subject: Re: How to increase ZIP bomb maximum depth
> > >
> > > Hi Markus,
> > >
> > >   This requires some work...the zip bomb protections are currently
> > > handled by the handler.  We allow for configuration of the parsers,
> > > detectors, charset detectors, but not yet the handlers.  IIRC, we've
> > > talked a bit about specifying a custom handler via the commandline at
> > > least in tika-server.  I wonder if we should allow for a default
> > > handler configuration that would specify a handler to be used by the
> > > facade Tika.parse(inputStream)?
> > >
> > >   Fellow devs have any recommendations?
> > >
> > >   How are you currently calling Tika?  Via tika-server, Solr's DIH or
> > > something else?
> > >
> > >           Best,
> > >
> > >                 Tim
> > >
> > > On Mon, Aug 26, 2019 at 11:20 AM Markus Jelsma
> > > <[email protected]> wrote:
> > > >
> > > > Hello,
> > > >
> > > > > I've been looking around to increase the limit, but I don't seem
> > > > > to be able to find how. I know there is a setter for it, but using
> > > > > AutoDetectParser, I'd like to set it via tika-config. I haven't
> > > > > seen a parameter for tika-config that would set that value and the
> > > > > manual on Configuring Tika doesn't mention it.
> > > >
> > > > Many thanks,
> > > > Markus
> > > >
> > > >
> > >
>
