Hi Arkadi,

does the problem persist?
Which version of Nutch are you using?
Can you point to one file or URL to reproduce it?

Thanks,
Sebastian

On 06/26/2015 03:26 PM, Sebastian Nagel wrote:
> Hi Arkadi,
> 
> thanks for reporting that. Can you open a Jira ticket [1] to address this bug?
> 
> It's rather a bug in the parse-tika plugin and should be solved there,
> cf. https://issues.apache.org/jira/browse/TIKA-1240
> A plugin should be able to load all required classes.
> 
> Thanks,
> Sebastian
> 
> [1] https://issues.apache.org/jira/browse/NUTCH
> 
> 2015-06-23 3:59 GMT+02:00 <[email protected]>:
> 
>     Hi,
> 
>     This is what happened:
> 
>     java.io.IOException: Job failed!
>             at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>             at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:213)
>             <...>
>     Caused by: java.lang.IncompatibleClassChangeError: class org.apache.tika.parser.asm.XHTMLClassVisitor has interface org.objectweb.asm.ClassVisitor as super class
>                     at java.lang.ClassLoader.defineClass1(Native Method)
>                     at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
>                     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>                     at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>                     at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>                     at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>                     at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>                     at java.security.AccessController.doPrivileged(Native Method)
>                     at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>                     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>                     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>                     at org.apache.tika.parser.asm.ClassParser.parse(ClassParser.java:51)
>                     at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:98)
>                     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:103)
> 
>     Suggested fix in ParseUtil:
> 
>     Replace
> 
>                 if (maxParseTime!=-1)
>                            parseResult = runParser(parsers[i], content);
>                 else
>                            parseResult = parsers[i].getParse(content);
> 
>     with
> 
>           try
>           {
>                 if (maxParseTime!=-1)
>                            parseResult = runParser(parsers[i], content);
>                 else
>                            parseResult = parsers[i].getParse(content);
>           } catch( Throwable e )
>           {
>             LOG.warn( "Parsing " + content.getUrl() + " with " +
>                 parsers[i].getClass().getName() + " failed: " + e.getMessage() ) ;
>             parseResult = null ;
>           }
> 
>     Also replace
> 
>           if (maxParseTime!=-1)
>                       parseResult = runParser(p, content);
>            else
>                       parseResult = p.getParse(content);
> 
>     with
> 
>         try
>         {
>           if (maxParseTime!=-1)
>                       parseResult = runParser(p, content);
>            else
>                       parseResult = p.getParse(content);
>         } catch( Throwable e )
>         {
>           LOG.warn( "Parsing " + content.getUrl() + " with " +
>               p.getClass().getName() + " failed: " + e.getMessage() ) ;
>         }
> 
>     Regards,
>     Arkadi
> 
> 
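
For illustration, the idea behind the suggested ParseUtil change can be sketched in isolation. This is only a minimal sketch, not Nutch code: the SimpleParser interface, the parseWithFallback method, the warnings list (standing in for LOG.warn) and the example URL are all hypothetical names introduced here. It shows that catching Throwable, rather than Exception, also covers linkage errors such as the IncompatibleClassChangeError in the trace above, so one failing parser does not abort the whole job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParserFallbackSketch {

    /** Hypothetical stand-in for a parser plugin; not the real Nutch Parser API. */
    interface SimpleParser {
        String parse(String content);
    }

    /** Warnings are collected here instead of being sent to a real logger. */
    static final List<String> warnings = new ArrayList<>();

    /**
     * Tries each parser in order and returns the first non-null result.
     * Catching Throwable (not just Exception) also covers Errors such as
     * IncompatibleClassChangeError raised while loading parser classes.
     */
    static String parseWithFallback(List<SimpleParser> parsers, String url, String content) {
        for (SimpleParser p : parsers) {
            try {
                String result = p.parse(content);
                if (result != null) {
                    return result;
                }
            } catch (Throwable e) {
                // Record the failure and continue with the next parser.
                warnings.add("Parsing " + url + " with " + p.getClass().getName()
                        + " failed: " + e.getMessage());
            }
        }
        return null; // no parser produced a result
    }

    public static void main(String[] args) {
        // Simulates a parser whose classes cannot be linked at runtime.
        SimpleParser broken = c -> {
            throw new IncompatibleClassChangeError("simulated linkage error");
        };
        SimpleParser working = c -> "parsed:" + c;

        String result = parseWithFallback(
                Arrays.asList(broken, working), "http://example.com/a.class", "data");
        System.out.println(result);          // prints parsed:data
        System.out.println(warnings.size()); // prints 1
    }
}
```

One design caveat: catching Throwable also swallows errors such as OutOfMemoryError, so a narrower catch (for example LinkageError plus Exception) may be preferable if only class-loading problems should be tolerated.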
