Hello,

I have created a custom Nutch Solr indexer, packaged it as a jar file, and put it under NUTCH_HOME/lib. It runs successfully in local mode, but in deploy mode it fails with the error below. The same jar is included both in the job file and under lib/.

java.lang.RuntimeException: java.io.IOException: WritableName can't load class: org.apache.nutch.parse.ParseData
        at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1673)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: WritableName can't load class: org.apache.nutch.parse.ParseData
        at org.apache.hadoop.io.WritableName.getClass(WritableName.java:73)
        at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
        ... 13 more
Caused by: java.lang.ClassNotFoundException: org.apache.nutch.parse.ParseData
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
        at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
        ... 14 more

This error is triggered by the following line of code:

    FileInputFormat.addInputPath(job, new Path(segment, ParseData.DIR_NAME));


If I instead add the CrawlDatum directory:

    FileInputFormat.addInputPath(job, new Path(segment, CrawlDatum.PARSE_DIR_NAME));

I get exactly the same error, with CrawlDatum as the class that cannot be loaded.

Also, I have done exactly the same thing with Nutch 2.x, and there it runs successfully in both local and deploy modes.
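For what it's worth, the bottom of the trace shows the lookup failing inside Class.forName, i.e. the class simply isn't visible to the task JVM's classloader at runtime. This small standalone probe (just a diagnostic sketch I wrote, not Nutch code; the class name is copied from the error) reproduces that lookup:

```java
// Standalone classpath probe: tries to load a class the same way
// WritableName ultimately does (via Class.forName) and reports the result.
public class ClasspathCheck {
    // Returns true if the current classloader can resolve the class.
    static boolean canLoad(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.nutch.parse.ParseData";
        System.out.println(name + (canLoad(name) ? " is on the classpath" : " is MISSING"));
    }
}
```

If this prints MISSING when run with the classpath the tasks actually see, the jar is not reaching the task JVMs, regardless of what sits in the job file or lib/.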


Any ideas how to fix this issue?

Thanks.
Alex.
