Hi Arkadi,

thanks for reporting that. Can you open a Jira ticket [1] to address this bug?
It is rather a bug in the parse-tika plugin and should be fixed there, cf.
  https://issues.apache.org/jira/browse/TIKA-1240
A plugin should be able to load all the classes it requires.

Thanks,
Sebastian

[1] https://issues.apache.org/jira/browse/NUTCH

2015-06-23 3:59 GMT+02:00 <[email protected]>:
> Hi,
>
> This is what happened:
>
> java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>     at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:213)
> <...>
> Caused by: java.lang.IncompatibleClassChangeError: class org.apache.tika.parser.asm.XHTMLClassVisitor has interface org.objectweb.asm.ClassVisitor as super class
>     at java.lang.ClassLoader.defineClass1(Native Method)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at org.apache.tika.parser.asm.ClassParser.parse(ClassParser.java:51)
>     at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:98)
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:103)
>
> Suggested fix in ParseUtil:
>
> Replace
>
>     if (maxParseTime!=-1)
>       parseResult = runParser(parsers[i], content);
>     else
>       parseResult = parsers[i].getParse(content);
>
> with
>
>     try
>     {
>       if (maxParseTime!=-1)
>         parseResult = runParser(parsers[i], content);
>       else
>         parseResult = parsers[i].getParse(content);
>     } catch( Throwable e )
>     {
>       LOG.warn( "Parsing " + content.getUrl() + " with " +
>         parsers[i].getClass().getName() + " failed: " + e.getMessage() ) ;
>       parseResult = null ;
>     }
>
> Also replace
>
>     if (maxParseTime!=-1)
>       parseResult = runParser(p, content);
>     else
>       parseResult = p.getParse(content);
>
> with
>
>     try
>     {
>       if (maxParseTime!=-1)
>         parseResult = runParser(p, content);
>       else
>         parseResult = p.getParse(content);
>     } catch( Throwable e )
>     {
>       LOG.warn( "Parsing " + content.getUrl() + " with " +
>         p.getClass().getName() + " failed: " + e.getMessage() ) ;
>     }
>
> Regards,
> Arkadi
>
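
For illustration, the fallback behaviour of the suggested try/catch can be sketched in isolation. This is a minimal sketch and not the Nutch code: the `Parser` interface, the `parseWithFallback` method, and the simulated failure are hypothetical stand-ins for `ParseUtil` and the real parser plugins, and results are plain strings rather than `ParseResult` objects.

```java
import java.util.Arrays;
import java.util.List;

public class ParserFallbackSketch {

    /** Hypothetical stand-in for a parser plugin; not the Nutch Parser API. */
    interface Parser {
        String getParse(String content) throws Exception;
    }

    /**
     * Tries each parser in order and returns the first non-null result.
     * A parser that throws, including a linkage Error such as
     * IncompatibleClassChangeError, is logged and skipped.
     */
    static String parseWithFallback(List<Parser> parsers, String content) {
        for (Parser p : parsers) {
            String result = null;
            try {
                result = p.getParse(content);
            } catch (Throwable t) {
                // Assumption: logging to stderr replaces LOG.warn(...) here.
                System.err.println("Parsing with " + p.getClass().getName()
                        + " failed: " + t.getMessage());
            }
            if (result != null) {
                return result;
            }
        }
        return null; // no parser succeeded
    }

    public static void main(String[] args) {
        // Simulates a plugin whose classes fail to link at runtime.
        Parser broken = c -> {
            throw new IncompatibleClassChangeError("simulated linkage failure");
        };
        Parser working = c -> "parsed:" + c;

        // prints "parsed:doc"
        System.out.println(parseWithFallback(Arrays.asList(broken, working), "doc"));
    }
}
```

Catching `Throwable` also swallows errors such as `OutOfMemoryError`, so a narrower catch (for example `Exception` plus `LinkageError`) may be preferable. Either way this only hides the symptom; the class loading problem in the plugin still needs its own fix.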

