Hi,
I took the Nutch 1.7 source, copied mapred-site.xml, hdfs-site.xml,
yarn-site.xml, hadoop-env.sh and core-site.xml from my Hadoop
2.3.0-cdh5.1.0 installation, and ran an ant build.
I then went to runtime/deploy/bin to start the crawl. The jobs were
submitted to my YARN cluster successfully, but later, while indexing to
Solr, I get the exception below.
I have copied schema-solr4.xml to my Solr and added exceptions in
regex-urlfilter.txt for the particular website I am crawling, which is
listed in urls/seed.txt.
Error:
java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:443)
        at org.apache.hadoop.io.Text.set(Text.java:198)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

         

Can anyone tell me how to solve this issue? I'm stuck at this point.
