Hi Arkadi,

Thanks for reporting this. Could you open a Jira ticket [1] for this
bug?

This is really a bug in the parse-tika plugin and should be fixed there,
cf. https://issues.apache.org/jira/browse/TIKA-1240
A plugin should be able to load all the classes it requires.
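
In case it helps when filing the ticket: the message "has interface
org.objectweb.asm.ClassVisitor as super class" usually points to an older
ASM jar (where ClassVisitor was still an interface) being picked up ahead
of the one Tika was compiled against. A minimal sketch to check which jar
a class really comes from and whether it is an interface; the class name
ClassOrigin and its methods are made up for illustration and are not part
of Nutch or Tika:

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Nutch or Tika.
final class ClassOrigin {

    /** Returns the location a class was loaded from, or a marker if the JVM reports none. */
    public static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "<bootstrap>"; // core JDK classes report no code source
        }
        return src.getLocation().toString();
    }

    /** Loads the named class without initializing it and reports its kind and origin. */
    public static String describe(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            String kind = c.isInterface() ? "interface" : "class";
            return kind + " " + c.getName() + " from " + locationOf(c);
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace; pass another name as argument.
        String name = args.length > 0 ? args[0] : "org.objectweb.asm.ClassVisitor";
        System.out.println(describe(name, Thread.currentThread().getContextClassLoader()));
    }
}
```

The result is only meaningful if this runs with the same classpath as the
failing parse job (ideally from inside the plugin), since a stand-alone
run may see a different set of jars.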

Thanks,
Sebastian

[1] https://issues.apache.org/jira/browse/NUTCH

2015-06-23 3:59 GMT+02:00 <[email protected]>:

> Hi,
>
> This is what happened:
>
> java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>         at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:213)
>         <...>
> Caused by: java.lang.IncompatibleClassChangeError: class org.apache.tika.parser.asm.XHTMLClassVisitor has interface org.objectweb.asm.ClassVisitor as super class
>         at java.lang.ClassLoader.defineClass1(Native Method)
>         at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>         at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at org.apache.tika.parser.asm.ClassParser.parse(ClassParser.java:51)
>         at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:98)
>         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:103)
>
> Suggested fix in ParseUtil:
>
> Replace
>
>       if (maxParseTime != -1)
>         parseResult = runParser(parsers[i], content);
>       else
>         parseResult = parsers[i].getParse(content);
>
> with
>
>       try {
>         if (maxParseTime != -1)
>           parseResult = runParser(parsers[i], content);
>         else
>           parseResult = parsers[i].getParse(content);
>       } catch (Throwable e) {
>         LOG.warn("Parsing " + content.getUrl() + " with "
>             + parsers[i].getClass().getName() + " failed: " + e.getMessage());
>         parseResult = null;
>       }
>
> Also replace
>
>       if (maxParseTime != -1)
>         parseResult = runParser(p, content);
>       else
>         parseResult = p.getParse(content);
>
> with
>
>       try {
>         if (maxParseTime != -1)
>           parseResult = runParser(p, content);
>         else
>           parseResult = p.getParse(content);
>       } catch (Throwable e) {
>         LOG.warn("Parsing " + content.getUrl() + " with "
>             + p.getClass().getName() + " failed: " + e.getMessage());
>       }
>
> Regards,
> Arkadi
>
