Hi Talat,
Thanks for the information. I tried with Nutch 1.8; it works fine and the job completed.
However, I was not able to find the data indexed in Solr, even though I gave the command below, where the Solr URL is mentioned:

./crawl /user/nutch/urls /tmp/nutch_1_8_first_output http://solr-server:8983/solr 1

I was assuming that, after migrating, specifying the Solr server URL while running would ensure that the crawled data gets indexed to Solr automatically. Is that not the case?
If not, how do I do it manually?
:)
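In case it helps, here is my guess at what the manual route would look like, assuming the crawl script left the usual layout (crawldb, linkdb, segments) under the output directory, and that the `solrindex` and `solrdedup` commands are still available in the 1.8 deploy scripts; paths and the Solr URL below are just the ones from my run:

```shell
# Assumed layout: the crawl script writes crawldb, linkdb, and segments
# under the output directory given on its command line.
CRAWL_DIR=/tmp/nutch_1_8_first_output
SOLR_URL=http://solr-server:8983/solr

# Push the crawled segments to Solr manually:
bin/nutch solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" \
  -linkdb "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments"

# Optionally remove duplicate documents afterwards:
bin/nutch solrdedup "$SOLR_URL"

# Check whether anything actually landed in Solr (total doc count):
curl "$SOLR_URL/select?q=*:*&rows=0&wt=json"
```

Is that roughly right, or does 1.8 expect a different invocation?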
________________________________________________

From: "Talat Uyarer" <[email protected]>
Sent: [email protected]
Date: Tue, September 2, 2014 8:35 pm
Subject: Re: NullPointerException occured during indexing to solr from nutch 1.7 source build.

> Hi,
>
> This is an issue. In the SolrDeleteDuplicates class from Nutch 1.7 trunk,
> the Solr record is deleted by its id field. Since the documents don't have
> the url field, the id of the documents is empty, so it throws a
> NullPointerException when it runs.
>
> I am writing on my phone right now, so I did not find this issue. But if
> you update from 1.7 to a newer version, you will not get this error.
>
> Talat
> On Sep 2, 2014 10:22 AM, <[email protected]> wrote:

>

>> Hi,
>> I have taken the Nutch 1.7 source, copied mapred-site.xml, hdfs-site.xml,
>> yarn-site.xml, hadoop-env.sh, and core-site.xml from my Hadoop
>> 2.3.0-cdh5.1.0, and did an ant build.
>> Then I went to runtime/deploy/bin to start the crawling. It successfully
>> submitted the jobs to my YARN cluster, but later, during indexing to
>> Solr, I get the exceptions below.
>> I have copied schema-solr4.xml to my Solr and added exceptions in
>> regex-urlfilter.txt for the particular website which I give for crawling
>> in the directory urls/seed.txt.
>>
>> Error:
>> java.lang.NullPointerException
>>   at org.apache.hadoop.io.Text.encode(Text.java:443)
>>   at org.apache.hadoop.io.Text.set(Text.java:198)
>>   at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
>>   at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
>>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
>>   at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
>>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>>
>> Kindly, can anyone tell me how to solve this issue? I'm basically stuck
>> here!!


 
