Hi,

Parsing a segment failed with the following exception:

java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:213)
        <...>
Caused by: java.lang.IncompatibleClassChangeError: class org.apache.tika.parser.asm.XHTMLClassVisitor has interface org.objectweb.asm.ClassVisitor as super class
                at java.lang.ClassLoader.defineClass1(Native Method)
                at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
                at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
                at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
                at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
                at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
                at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
                at java.security.AccessController.doPrivileged(Native Method)
                at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
                at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
                at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
                at org.apache.tika.parser.asm.ClassParser.parse(ClassParser.java:51)
                at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:98)
                at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:103)

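Suggested fix in ParseUtil, so that a failure inside one parser is logged as a warning instead of propagating out of ParseUtil: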

Replace

      if (maxParseTime != -1)
        parseResult = runParser(parsers[i], content);
      else
        parseResult = parsers[i].getParse(content);

with

      try
      {
        if (maxParseTime != -1)
          parseResult = runParser(parsers[i], content);
        else
          parseResult = parsers[i].getParse(content);
      }
      catch (Throwable e)
      {
        LOG.warn("Parsing " + content.getUrl() + " with "
            + parsers[i].getClass().getName() + " failed: " + e.getMessage());
        parseResult = null;
      }

Also replace

    if (maxParseTime != -1)
      parseResult = runParser(p, content);
    else
      parseResult = p.getParse(content);

with

    try
    {
      if (maxParseTime != -1)
        parseResult = runParser(p, content);
      else
        parseResult = p.getParse(content);
    }
    catch (Throwable e)
    {
      LOG.warn("Parsing " + content.getUrl() + " with "
          + p.getClass().getName() + " failed: " + e.getMessage());
    }
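
The catch has to be on Throwable rather than Exception because IncompatibleClassChangeError is a java.lang.Error (via LinkageError), so a catch (Exception e) block would let it through. Below is a minimal self-contained sketch of that behaviour. It is not the Nutch code: SimpleParser, parseWithFallback and the warnings list are hypothetical stand-ins for the parser plugins, the loop in ParseUtil and LOG.warn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParserFallbackSketch {

    /** Hypothetical stand-in for a parser plugin; not the real Nutch Parser interface. */
    interface SimpleParser {
        String parse(String content);
    }

    /** Collected warning messages; stands in for LOG.warn(...) in this sketch. */
    static final List<String> warnings = new ArrayList<>();

    /**
     * Tries each parser in order and returns the first non-null result.
     * Any Throwable (including Errors such as IncompatibleClassChangeError)
     * is recorded as a warning, and the next parser is tried.
     */
    static String parseWithFallback(List<SimpleParser> parsers, String url, String content) {
        for (SimpleParser p : parsers) {
            try {
                String result = p.parse(content);
                if (result != null) {
                    return result;
                }
            } catch (Throwable t) {
                // catch (Exception e) would NOT catch a LinkageError subclass
                warnings.add("Parsing " + url + " with " + p.getClass().getName()
                        + " failed: " + t);
            }
        }
        return null; // no parser produced a result
    }

    public static void main(String[] args) {
        // Simulates a parser whose classes fail to link at runtime.
        SimpleParser broken = c -> {
            throw new IncompatibleClassChangeError("simulated linkage failure");
        };
        SimpleParser working = c -> "parsed:" + c;

        String result = parseWithFallback(
                Arrays.asList(broken, working), "http://example.com/", "hello");
        System.out.println(result);          // prints "parsed:hello"
        System.out.println(warnings.size()); // prints "1"
    }
}
```

The trade-off is that catching Throwable also swallows errors such as OutOfMemoryError, so logging the Throwable itself (not only its message) makes it easier to see what was suppressed.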

Regards,
Arkadi
