Good Morning Friends:
Since I could not solve my problem with Nutch 1.7/2.2.1 and Solr 4.4.0,
I will describe what I have done from the beginning.
1 - I downloaded Solr 4.4.0
2 - I downloaded Nutch 1.7
3 - I copied the file schema-solr4.xml to example/solr/collection1/conf and 
renamed it to schema.xml
4 - When I started Solr 4.4.0, there was the following error: msg=SolrCore 
'collection1' is not available due to init failure: Unable to use updateLog: 
_version_ field must exist in schema, using indexed="true" stored="true" and 
multiValued="false" (_version_ does not exist), 
trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not 
available due to init failure: Unable to use updateLog: _version_ field must 
exist in schema, using indexed="true" stored="true" and multiValued="false" 
(_version_ does not exist)
5 - To resolve this error, I added the following line to schema.xml: <field 
name="_version_" indexed="true" type="long" stored="true"/>
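For anyone comparing against a stock install: Solr 4.x's bundled example schema ships the same declaration (attribute order aside). As I understand it, the `<updateLog>` feature enabled in solrconfig.xml is what requires this field:

```xml
<!-- Required by <updateLog> in solrconfig.xml: must be indexed, stored,
     single-valued (multiValued defaults to false), and of type long. -->
<field name="_version_" type="long" indexed="true" stored="true"/>
```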
6 - The Nutch configuration files can be found here:
   nutch-site.xml: http://pastebin.com/Dh3tTacL
   regex-urlfilter: http://pastebin.com/eRdxPB1b
   seed.txt: http://pastebin.com/unNgJdmU
7 - When I run the following command: ./bin/nutch solrdedup http://localhost:8983/solr/

I get the following in hadoop.log:
2013-10-21 14:22:31,645 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-10-21 14:22:31
2013-10-21 14:22:31,647 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
2013-10-21 14:22:32,050 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-10-21 14:22:32,927 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2013-10-21 14:22:32,928 WARN  mapred.LocalJobRunner - job_local741622751_0001
java.lang.Exception: java.lang.NullPointerException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:388)
        at org.apache.hadoop.io.Text.set(Text.java:178)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)



Talat, can you explain how to check the Solr index for committed documents? 
Sorry, I'm new to Solr and Nutch.
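For anyone else reading along, here is my rough understanding of what "check for committed documents" could mean: ask the core's select handler for everything and read numFound from the JSON response, which only counts documents that have been committed. This is just a sketch under assumptions (a locally running Solr and the default core name collection1); the helper name is mine:

```python
import json
from urllib.parse import urlencode

def committed_count_url(solr_base: str, core: str) -> str:
    """Build the Solr select URL whose JSON response carries numFound,
    the number of committed documents in the core. (Hypothetical helper;
    pass the URL to curl or urllib to actually query the server.)"""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{solr_base.rstrip('/')}/{core}/select?{params}"

# The select handler answers with JSON shaped like this sample;
# response.numFound is the committed-document count.
sample = '{"responseHeader":{"status":0},"response":{"numFound":42,"start":0,"docs":[]}}'
print(json.loads(sample)["response"]["numFound"])
print(committed_count_url("http://localhost:8983/solr", "collection1"))
```

If numFound is 0 right after indexing, the documents were likely never committed.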
I don't know what I'm doing wrong. Is it necessary to change to Solr 3.x, or is 
Solr 4.4.0 fine? Can someone give me a step-by-step tutorial to integrate Solr 
and Nutch? I have followed the Nutch tutorial on the web:
http://wiki.apache.org/nutch/NutchTutorial , but I can't get the job done.

Any ideas are welcome.
Thanks for your time, friends,
Luis Armando
________________________________________
From: Talat UYARER [[email protected]]
Sent: Friday, October 18, 2013, 10:59 p.m.
To: [email protected]
Subject: Re: Nutch 1.7 and Solr 4.4.0 Integrate

Hi Luis,

I am not sure what would cause that. Did you check your Solr index for
committed documents? Maybe it didn't commit. You don't need to run all the
Nutch jobs; the other jobs work fine. You can run only the dedup job with:
bin/nutch solrdedup solr_url
After that, can you share your solr.log?

Talat



