Good morning friends,

Since I could not solve my problem with Nutch 1.7/2.2.1 and Solr 4.4.0, I am going to describe what I have done from the beginning.

1 - I downloaded Solr 4.4.0.
2 - I downloaded Nutch 1.7.
3 - I copied the file schema-solr4.xml to example/solr/collection1/conf and renamed it to schema.xml.
4 - When I started Solr 4.4.0, I got the following error:

    msg=SolrCore 'collection1' is not available due to init failure: Unable to use updateLog: _version_ field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist),
    trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Unable to use updateLog: _version_ field must exist in schema, using indexed="true" stored="true" and multiValued="false" (_version_ does not exist)

5 - To resolve this error, I added the following line to schema.xml:

    <field name="_version_" indexed="true" type="long" stored="true"/>

6 - The Nutch configuration files can be found here:

    nutch-site.xml: http://pastebin.com/Dh3tTacL
    regex-urlfilter: http://pastebin.com/eRdxPB1b
    seed.txt: http://pastebin.com/unNgJdmU

7 - When I run the following command:

    ./bin/nutch solrdedup http://localhost:8983/solr/
I get this in the hadoop.log file:

    2013-10-21 14:22:31,645 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-10-21 14:22:31
    2013-10-21 14:22:31,647 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
    2013-10-21 14:22:32,050 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2013-10-21 14:22:32,927 WARN mapred.FileOutputCommitter - Output path is null in cleanup
    2013-10-21 14:22:32,928 WARN mapred.LocalJobRunner - job_local741622751_0001
    java.lang.Exception: java.lang.NullPointerException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
    Caused by: java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:388)
        at org.apache.hadoop.io.Text.set(Text.java:178)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

Talat, can you explain to me how to check the Solr index for committed documents?
Sorry, I'm new to Solr and Nutch. I don't know what I'm doing wrong. Is it necessary to switch to Solr 3.x, or is Solr 4.4.0 fine?

Can someone give me a step-by-step tutorial for integrating Solr and Nutch? I have followed the Nutch tutorial on the web (http://wiki.apache.org/nutch/NutchTutorial), but I can't get the job done.

Any ideas are welcome.

Thanks for your time, friends,
Luis Armando

________________________________________
From: Talat UYARER [[email protected]]
Sent: Friday, October 18, 2013 10:59 p.m.
To: [email protected]
Subject: Re: Nutch 1.7 and Solr 4.4.0 Integrate

Hi Luis,

I am not sure what is causing that. Did you check your Solr index for committed documents? Maybe it didn't commit. You don't need to run all the Nutch jobs again; the other jobs work fine. You can run only the dedup job with:

    bin/nutch solrdedup solr_url

After that, can you share your solr.log?

Talat
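
On the question of how to check the Solr index for committed documents: one common way is to send a match-all query to the core's select handler with rows=0 and read the numFound value in the response, since only committed documents show up in search results. Below is a minimal Python sketch of that idea, not a definitive tool. It assumes the core is named collection1 (as in step 3) and that Solr answers at http://localhost:8983/solr/. The function names build_count_url, parse_num_found and count_committed are hypothetical helpers made up for this illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_count_url(solr_base, core="collection1", query="*:*"):
    """Build a select URL that asks Solr only for the match count (rows=0).

    The core name defaults to 'collection1', which is an assumption taken
    from the setup described in the message.
    """
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{solr_base.rstrip('/')}/{core}/select?{params}"


def parse_num_found(body):
    """Return response.numFound from a Solr JSON response body."""
    return json.loads(body)["response"]["numFound"]


def count_committed(solr_base, core="collection1", query="*:*"):
    """Query a running Solr and return how many documents match the query."""
    with urlopen(build_count_url(solr_base, core, query), timeout=10) as resp:
        return parse_num_found(resp.read().decode("utf-8"))
```

Calling count_committed("http://localhost:8983/solr/") against the running server would return the number of searchable documents; a result of 0 would suggest that nothing was indexed or committed. The query parameter can be changed to look for problem documents. For example, if the NullPointerException in Text.set comes from documents that lack a value the dedup job reads (an assumption, not something the log confirms), a query such as -digest:[* TO *] would count documents without a digest value, assuming the field is named digest in your schema.xml.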

