Hi Talat,
Thanks for the information. I tried with Nutch 1.8; it works fine and
the job completed.
However, I was not able to find the data indexed in Solr, even though I
gave the command below, where the Solr URL is specified:

./crawl /user/nutch/urls /tmp/nutch_1_8_first_output http://solr-server:8983/solr 1
I was assuming that, after migrating, specifying the Solr server URL on
the command line would ensure that the crawled data gets indexed to
Solr automatically. Is that not the case? If not, how do I do it
manually?
:)
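
P.S. My guess at the manual fallback is the standalone indexing job,
something like the line below (the segment name is made up from my run,
and the exact command and flags may differ between 1.7 and 1.8, so
please correct me if this is wrong):

./nutch index -Dsolr.server.url=http://solr-server:8983/solr /tmp/nutch_1_8_first_output/crawldb -linkdb /tmp/nutch_1_8_first_output/linkdb /tmp/nutch_1_8_first_output/segments/20140902123456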
________________________________________________
From:"Talat Uyarer" <[email protected]>
Sent:[email protected]
Date:Tue, September 2, 2014 8:35 pm
Subject: Re: NullPointerException occurred during indexing to solr from nutch 1.7 source build.
> Hi,
>
> This is a known issue. In the SolrDeleteDuplicates class from the
> Nutch 1.7 trunk, the Solr record is deleted by its id field. Since the
> documents don't have the url field, the id of each document is empty,
> so a NullPointerException is thrown when the job runs.
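>
> A minimal standalone illustration of that failure (a sketch, not the
> verbatim 1.7 source; it assumes hadoop-common on the classpath, and
> NpeDemo is a made-up name):
>
> import org.apache.hadoop.io.Text;
>
> public class NpeDemo {
>     public static void main(String[] args) {
>         // SolrDeleteDuplicates$SolrInputFormat$1.next() reads the "id"
>         // field of each Solr document and passes it to Text.set(). When
>         // the url/id field was never indexed, that value is null.
>         String id = null;
>         Text key = new Text();
>         key.set(id); // NullPointerException inside Text.encode(), as in the trace
>     }
> }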
>
> Right now I am writing on my phone, so I did not find this issue in
> the tracker. But if you update from 1.7 to a newer version, you will
> not get this error.
>
> Talat
> On Sep 2, 2014 10:22 AM, <[email protected]> wrote:
>
>>
>> Hi,
>> I have taken the Nutch 1.7 source, copied mapred-site.xml,
>> hdfs-site.xml, yarn-site.xml, hadoop-env.sh, and core-site.xml from
>> my Hadoop 2.3.0-cdh5.1.0 cluster, and did an ant build.
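>>
>> Roughly, the steps I ran (paths are from my environment, with
>> HADOOP_CONF_DIR standing in for wherever the CDH client configs live):
>>
>> cd apache-nutch-1.7
>> cp $HADOOP_CONF_DIR/core-site.xml $HADOOP_CONF_DIR/hdfs-site.xml \
>>    $HADOOP_CONF_DIR/yarn-site.xml $HADOOP_CONF_DIR/mapred-site.xml \
>>    $HADOOP_CONF_DIR/hadoop-env.sh conf/
>> ant clean runtime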
>> Then I went to runtime/deploy/bin to start the crawl. It successfully
>> submitted the jobs to my YARN cluster, but later, during indexing to
>> Solr, I'm getting the exception below.
>> I have copied schema-solr4.xml to my Solr and added an accept rule in
>> regex-urlfilter.txt for the particular website that I give for
>> crawling in urls/seed.txt.
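>>
>> The filter entry is the usual accept pattern (example.com standing in
>> for the real site):
>>
>> # accept everything under the crawled site (hypothetical domain)
>> +^http://([a-z0-9]*\.)*example\.com/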
>> Error:
>> java.lang.NullPointerException
>>     at org.apache.hadoop.io.Text.encode(Text.java:443)
>>     at org.apache.hadoop.io.Text.set(Text.java:198)
>>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
>>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
>>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>>
>> Kindly, can anyone tell me how to solve this issue? I'm basically
>> stuck here!