Hello Jukka,

This is a customer's output from a WYSIWYG editor, and in my opinion it is an 
error that it generated such a deeply nested structure. So no, this is not a 
valid document. A depth of 100 could well occur in real documents, but I have 
never seen that before, and I have seen thousands of unique sites.

I think 100 is fine; I was just looking for a way to work around it through 
configuration, without bothering the customer's customer with it.
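
For anyone reading this later, here is a rough sketch of the kind of nesting check under discussion, written against plain JDK SAX only. It is not Tika's SecureContentHandler: the class name DepthLimitingHandler, the maxDepth constructor parameter, the withinLimit helper and the exception message are all hypothetical. It simply counts open elements and gives up once nesting exceeds a configurable limit.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration only; this is not Tika's SecureContentHandler. */
class DepthLimitingHandler extends DefaultHandler {
    private final int maxDepth; // hypothetical configurable nesting limit
    private int depth = 0;      // number of elements currently open

    DepthLimitingHandler(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        depth++;
        if (depth > maxDepth) {
            // Abort the parse as soon as the limit is exceeded.
            throw new SAXException("Nesting depth " + depth + " exceeds limit " + maxDepth);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    /**
     * Parses the given XML and reports whether it stayed within maxDepth.
     * Note: any SAXException (including malformed input) yields false.
     */
    static boolean withinLimit(String xml, int maxDepth) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)),
                           new DepthLimitingHandler(maxDepth));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a limit of 3, a three-level document such as `<a><b><c/></b></a>` passes, while a limit of 2 rejects it. The point of the sketch is only that the limit lives in the handler, which is why it cannot currently be reached from tika-config.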

Thanks,
Markus
 
-----Original message-----
> From:Jukka Zitting <[email protected]>
> Sent: Monday 26th August 2019 19:48
> To: Tika Users <[email protected]>; [email protected]
> Subject: Re: How to increase ZIP bomb maximum depth
> 
> Hi, 
> 
> I wonder if we should just increase the default thresholds to allow deeper 
> nesting before the exception gets thrown. The defaults should be tuned to 
> make the false-positive rate as low as possible without opening the door for 
> false negatives that could result in denial-of-service attacks. 
> 
> The package-entry depth limit added in 
> https://issues.apache.org/jira/browse/TIKA-741 should make it OK to 
> increase the default maxDepth from 100 to, say, 200 if people are hitting this 
> limit with valid documents. 
> 
> Markus, what kind of documents are triggering the exception for you? What 
> would be a good maxDepth setting for your case? 
> 
> Best, 
> 
> Jukka 
> 
> 
> On Mon, Aug 26, 2019 at 1:40 PM Tim Allison <[email protected]> wrote:
> Oh, ok.  This is helpful.  Got it.  The AutoDetectParser automatically
> wraps the incoming handler in a SecureContentHandler.  Some options...
> 
> 1) We could have the AutoDetectParser skip wrapping a
> SecureContentHandler around the incoming handler if the user calls
> parse with a SecureContentHandler...
> 2) We could add SecureContentHandler parameter settings to the
> AutoDetectParser, and it would configure the SecureContentHandler
> accordingly...I think there are a few subtleties, but this might get
> you configurability via tika-config.xml.
> 
> I'm not offering static thresholds on the SecureContentHandler. :D
> 
> Fellow devs, how else might we make this work and make it configurable
> via tika-config.xml?
> 
> Cheers,
> 
>            Tim
> 
> 
> On Mon, Aug 26, 2019 at 1:24 PM Markus Jelsma
> <[email protected]> wrote:
> >
> > Hello Tim,
> >
> > I use Tika embedded in another Java application, passing it a custom 
> > ContentHandler which collects interesting stuff, which we, after the parse, 
> > use to construct meaningful text.
> >
> >     ReadableContentHandler handler = new ReadableContentHandler(url, config);
> >
> >     AutoDetectParser parser = new AutoDetectParser(tikaConfig);
> >     parser.parse(stream, handler, new Metadata(), context);
> >
> > My ContentHandler does not extend SecureContentHandler, so I never have a 
> > chance to pass a different value for the nesting limit check.
> >
> > Many thanks,
> > Markus
> >
> > -----Original message-----
> > > From:Tim Allison <[email protected]>
> > > Sent: Monday 26th August 2019 19:11
> > > To: [email protected]
> > > Subject: Re: How to increase ZIP bomb maximum depth
> > >
> > > Hi Markus,
> > >
> > >   This requires some work...the zip bomb protections are currently
> > > handled by the handler.  We allow for configuration of the parsers,
> > > detectors, charset detectors, but not yet the handlers.  IIRC, we've
> > > talked a bit about specifying a custom handler via the commandline at
> > > least in tika-server.  I wonder if we should allow for a default
> > > handler configuration that would specify a handler to be used by the
> > > facade Tika.parse(inputStream)?
> > >
> > >   Fellow devs have any recommendations?
> > >
> > >   How are you currently calling Tika?  Via tika-server, Solr's DIH or
> > > something else?
> > >
> > >           Best,
> > >
> > >                 Tim
> > >
> > > On Mon, Aug 26, 2019 at 11:20 AM Markus Jelsma
> > > <[email protected]> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I've been looking around to increase the limit, but I don't seem to be 
> > > > able to find how. I know there is a setter for it, but using 
> > > > AutoDetectParser, I'd like to set it via tika-config. I haven't seen a 
> > > > parameter for tika-config that would set that value, and the manual on 
> > > > Configuring Tika doesn't mention it.
> > > >
> > > > Many thanks,
> > > > Markus
> > > >
> > > >
> > >
