You might be running into this issue: https://issues.apache.org/jira/browse/NUTCH-1100
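
One way to check the index for committed documents is to ask Solr for a hit count. The sketch below is only an illustration under a few assumptions: that the default core answers at /solr/select (as in the stock Solr 4.x example setup), that the Nutch schema names its signature field `digest`, and that the NullPointerException in the trace comes from a document where that field is missing. The trace itself only shows that a null value reached Text.set. The helper names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(solr_url, query):
    """Build a /select URL that asks only for the hit count (rows=0)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return solr_url.rstrip("/") + "/select?" + params


def num_found(response_body):
    """Extract the number of matching documents from a Solr JSON response."""
    return json.loads(response_body)["response"]["numFound"]


def count_documents(solr_url, query):
    """Send the query to a running Solr instance and return the hit count."""
    with urlopen(select_url(solr_url, query)) as response:
        return num_found(response.read().decode("utf-8"))
```

With Solr running, `count_documents("http://localhost:8983/solr/", "*:*")` should give the number of committed documents, and `count_documents("http://localhost:8983/solr/", "-digest:[* TO *]")` the number of documents without a `digest` value (again assuming that field name). A non-zero second count would be consistent with the issue linked above.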
2013/10/21 Luis Armando Roca Fumero <[email protected]>:
> Good morning, friends:
> Since I could not solve my problem with Nutch 1.7/2.2.1 and Solr 4.4.0,
> I will describe what I have done from the beginning.
> 1 - I downloaded Solr 4.4.0.
> 2 - I downloaded Nutch 1.7.
> 3 - I copied the file schema-solr4.xml to example/solr/collection1/conf
> and renamed it to schema.xml.
> 4 - When I started Solr 4.4.0, I got the following error:
> msg=SolrCore 'collection1' is not available due to init failure:
> Unable to use updateLog: _version_field must exist in schema, using
> indexed="true" stored="true" and multiValued="false" (_version_ does not
> exist), trace=org.apache.solr.common.SolrException: SolrCore
> 'collection1' is not available due to init failure: Unable to use
> updateLog: _version_field must exist in schema, using indexed="true"
> stored="true" and multiValued="false" (_version_ does not exist)
> 5 - To resolve this error, I added the following line to schema.xml:
> <field name="_version_" indexed="true" type="long" stored="true"/>
> 6 - The Nutch configuration files can be found here:
> nutch-site.xml: http://pastebin.com/Dh3tTacL
> regex-urlfilter: http://pastebin.com/eRdxPB1b
> seed.txt: http://pastebin.com/unNgJdmU
> 7 - When I run the following command:
> ./bin/nutch solrdedup http://localhost:8983/solr/
>
> I get this in hadoop.log:
> 2013-10-21 14:22:31,645 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-10-21 14:22:31
> 2013-10-21 14:22:31,647 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
> 2013-10-21 14:22:32,050 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2013-10-21 14:22:32,927 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> 2013-10-21 14:22:32,928 WARN mapred.LocalJobRunner - job_local741622751_0001
> java.lang.Exception: java.lang.NullPointerException
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.io.Text.encode(Text.java:388)
>     at org.apache.hadoop.io.Text.set(Text.java:178)
>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
>
> Talat, can you explain to me how to check the Solr index for committed
> documents? Sorry, I'm new to Solr and Nutch.
> I don't know what I'm doing wrong. Is it necessary to change to Solr 3.x,
> or is Solr 4.4.0 fine? Can someone give me a step-by-step tutorial for
> integrating Solr and Nutch? I have followed the Nutch tutorial on the web,
> http://wiki.apache.org/nutch/NutchTutorial , but I can't get the job done.
>
> Any ideas are welcome.
> Thanks for your time, friends,
> Luis Armando
> ________________________________________
> From: Talat UYARER [[email protected]]
> Sent: Friday, 18 October 2013 10:59 p.m.
> To: [email protected]
> Subject: Re: Nutch 1.7 and Solr 4.4.0 Integrate
>
> Hi Luis,
>
> I am not sure what is causing that. Did you check your Solr index for
> committed documents? Maybe it didn't commit. You don't need to rerun all
> the Nutch jobs; the other jobs work fine. You can run only the dedup job
> with:
> bin/nutch solrdedup solr_url
> After that, can you share your solr.log?
>
> Talat
>
>
> The Universidad Central "Marta Abreu" de Las Villas on its 60th anniversary.
> Founded on 30 November 1952. Visit us at: http://www.uclv.edu.cu
> Take part in Universidad 2014, 10 to 14 February 2014, Havana, Cuba.
> http://www.congresouniversidad.cu/

