Hello,

A couple of thoughts, but no great solution.  An Error is a Throwable, and
I've definitely seen some tough-to-find problems when Errors were caught and
not correctly dealt with (e.g., logged, propagated).  I'm sure we all have,
so I'd definitely stress being wary of "solving" problems by catching
Throwable.  It seems odd that excluding a jar/class would cause a
NoSuchMethodError rather than a ClassNotFoundException or maybe a
NoSuchMethodException.  In my experience, the Error forms were caused by
weird things like multiple versions of a class on the classpath and/or
issues resulting from bytecode manipulation.  This can definitely be a
tricky area in which to balance jar size, debuggability, and
performance.  Maybe some kind of initialization routine that would
trigger/highlight configuration errors of the "enabled" parsers as early as
possible would be a good compromise?
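
To make the initialization idea a bit more concrete, here is a rough
sketch of what I have in mind.  None of these names (ParserSelfCheck,
Probe, check) exist in Tika; they are made up for illustration, and the
"probes" stand in for whatever cheap call would exercise each enabled
parser's dependencies.  It catches LinkageError (the parent of
NoSuchMethodError and NoClassDefFoundError) instead of Throwable, so
things like OutOfMemoryError still propagate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Tika.
final class ParserSelfCheck {

    /** A cheap call that exercises the code paths one parser depends on. */
    interface Probe {
        void run() throws Exception;
    }

    /**
     * Runs every probe once and returns the failures, keyed by parser name.
     * LinkageError covers NoSuchMethodError and NoClassDefFoundError, which
     * keeps the catch narrower than Throwable.
     */
    static Map<String, Throwable> check(Map<String, Probe> probes) {
        Map<String, Throwable> failures = new LinkedHashMap<>();
        for (Map.Entry<String, Probe> entry : probes.entrySet()) {
            try {
                entry.getValue().run();
            } catch (LinkageError | Exception e) {
                // Record the problem instead of letting it surface mid-job.
                failures.put(entry.getKey(), e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, Probe> probes = new LinkedHashMap<>();
        probes.put("text", () -> { });
        // Simulates a parser whose supporting jar was excluded or mismatched.
        probes.put("office", () -> {
            throw new NoSuchMethodError("simulated missing method");
        });
        check(probes).forEach((name, problem) ->
                System.err.println("Parser '" + name + "' is misconfigured: " + problem));
    }
}
```

Run once at startup, something like this would report the broken parser
before any documents are processed, and the caller could then decide
whether to disable that parser or fail fast.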

Kind regards,
- Luke

On Tue, Dec 15, 2009 at 7:33 PM, Jukka Zitting <jukka.zitt...@gmail.com> wrote:

> Hi,
>
> On Wed, Dec 16, 2009 at 1:31 AM, Ken Krugler
> <kkrugler_li...@transpac.com> wrote:
> > But I ran into a problem, where the Tika auto-detect code was correctly
> > identifying a file as being a Microsoft format, even though the server
> > said it was text/plain. The Tika Microsoft parser would try to
> > dynamically figure out which support code to call, and in the end it
> > throws a NoSuchMethodError.
> >
> > Note that this is an Error, not an Exception. As such, it flies on past
> > all of the Tika catch blocks, and my own code's catch blocks, and kills
> > the Hadoop job in weird and wonderful ways.
> >
> > It seems like Errors shouldn't be thrown for situations where dynamic
> > configuration could result in a class not existing, but before I started
> > writing up an issue I wanted to get input from the community about this.
> > It's a bit gray to me, since I essentially "did it to myself" by
> > excluding jars.
>
> As a general rule I think Tika should be more resilient about such issues.
>
> The TikaConfig code that tries to load and instantiate the configured
> parser classes was already made to catch and ignore any Throwables,
> but I guess in this case the problem occurs outside TikaConfig when
> the parse() method of the instantiated parser is called.
>
> Catching Errors is a bit questionable, but it sounds like in this case
> we should do it. See the code in CompositeParser that already catches
> any RuntimeExceptions and wraps them into TikaExceptions. Perhaps we
> should add similar handling also for Errors.
>
> BR,
>
> Jukka Zitting
>
