Hello,
This Wednesday we ran into trouble running the Nutch 1.12 injector on Hadoop
2.7.3. We ran 2.7.2 before and had no trouble running jobs.
2017-01-18 15:36:53,005 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
	at org.apache.nutch.crawl.Injector$InjectMapper.map(Injector.java:216)
	at org.apache.nutch.crawl.Injector$InjectMapper.map(Injector.java:100)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
	at org.apache.nutch.crawl.Injector.inject(Injector.java:383)
	at org.apache.nutch.crawl.Injector.run(Injector.java:467)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.nutch.crawl.Injector.main(Injector.java:441)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Our processes kept retrying the inject for a few minutes until we shut them
down manually. Meanwhile, our CrawlDB had disappeared from HDFS. Thanks to
snapshots and/or backups we could restore it, so enable those if you haven't
done so yet.
These odd Hadoop errors can be notoriously difficult to debug, but it seems we
are in luck: recompile Nutch against Hadoop 2.7.3 instead of 2.4.0. You are
also in luck if your job file uses the old org.apache.hadoop.mapred.* API;
only jobs using the newer org.apache.hadoop.mapreduce.* API seem to fail.
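If you want to check what your job actually sees at runtime, before or after
recompiling, a small diagnostic along these lines may help. This is a sketch,
not part of Nutch or Hadoop: the class name ClasspathCheck is hypothetical. It
only uses Class.forName, Class.isInterface and the class's CodeSource, plus a
reflective call to org.apache.hadoop.util.VersionInfo.getVersion() so that it
still compiles without Hadoop on the classpath. Run it with the same classpath
as the injector (for example the output of `hadoop classpath`) to see which
Hadoop version and which jar supply org.apache.hadoop.mapreduce.Counter.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic (not a Nutch or Hadoop tool): reports how classes
 * resolve on the current classpath, to help track down errors such as
 * "Found interface X, but class was expected".
 */
public class ClasspathCheck {

    /** Says whether a class is an interface or a class, and where it was loaded from. */
    static String describe(String className) {
        try {
            Class<?> c = Class.forName(className);
            String kind = c.isInterface() ? "an interface" : "a class";
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no CodeSource.
            String location = (src != null && src.getLocation() != null)
                    ? src.getLocation().toString()
                    : "<bootstrap>";
            return className + " is " + kind + " loaded from " + location;
        } catch (ClassNotFoundException e) {
            return className + " not found on classpath";
        }
    }

    /** Returns the Hadoop version on the classpath, or "unknown" if Hadoop is absent. */
    static String hadoopVersion() {
        try {
            // Reflection keeps this compilable without Hadoop jars.
            Class<?> v = Class.forName("org.apache.hadoop.util.VersionInfo");
            return (String) v.getMethod("getVersion").invoke(null);
        } catch (ReflectiveOperationException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("Hadoop version: " + hadoopVersion());
        System.out.println(describe("org.apache.hadoop.mapreduce.Counter"));
    }
}
```

If the reported jar or version differs from the one Nutch was compiled
against, that mismatch is a likely candidate for this kind of
IncompatibleClassChangeError.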
Reference issue: https://issues.apache.org/jira/browse/NUTCH-2354
Regards,
Markus