Hi,
I took the Nutch 1.7 source, copied
mapred-site.xml, hdfs-site.xml, yarn-site.xml, hadoop-env.sh, and core-site.xml
over from my Hadoop 2.3.0-cdh5.1.0 installation, and did an ant build.
Then I went to runtime/deploy/bin to start the crawl. The jobs were
submitted to YARN successfully, but later, during indexing to Solr, I get
the exception below.
I have copied schema-solr4.xml into my Solr setup and added
exceptions in regex-urlfilter.txt for the particular website I give
for crawling in urls/seed.txt.
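For reference, this is roughly how I run the crawl and indexing from runtime/deploy (the crawl directory name and the Solr URL below are placeholders, not my real values):

```shell
# Placeholder invocation: seed dir, crawl dir, Solr URL, number of rounds.
bin/crawl urls/ crawl/ http://localhost:8983/solr/collection1 2

# The equivalent manual steps; the dedup step is where the
# SolrDeleteDuplicates exception below is thrown:
# bin/nutch solrindex http://localhost:8983/solr/collection1 crawl/crawldb \
#     -linkdb crawl/linkdb crawl/segments/*
# bin/nutch solrdedup http://localhost:8983/solr/collection1
```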
Error:
java.lang.NullPointerException
	at org.apache.hadoop.io.Text.encode(Text.java:443)
	at org.apache.hadoop.io.Text.set(Text.java:198)
	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
	at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Can anyone tell me how to resolve this? I'm stuck here.