You might be running into this issue:

https://issues.apache.org/jira/browse/NUTCH-1100
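
As for Talat's question quoted below, one way to check the index for committed documents is to ask Solr for its hit count. Here is a minimal Python sketch, not a tested recipe. The core name collection1 is taken from your error message, the helper names select_url, num_found and count_docs are made up for illustration, and the digest field name is an assumption based on the schema shipped with Nutch. The second query in the usage note counts documents that have no digest value, which is the kind of document I would look for if a missing field is what triggers the NullPointerException (that is a guess about your case).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core_url, query):
    """Build a Solr /select URL that asks only for the hit count, as JSON."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return core_url.rstrip("/") + "/select?" + params


def num_found(response_text):
    """Extract numFound from the text of a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def count_docs(core_url, query="*:*"):
    """Return how many committed documents match the query (needs a running Solr)."""
    with urlopen(select_url(core_url, query)) as resp:
        return num_found(resp.read().decode("utf-8"))
```

For example, count_docs("http://localhost:8983/solr/collection1") should be greater than zero after a successful index and commit, and count_docs("http://localhost:8983/solr/collection1", "*:* -digest:[* TO *]") would show how many documents lack a digest.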

2013/10/21 Luis Armando Roca Fumero <[email protected]>:
> Good morning, friends:
> Since I could not solve my problem with Nutch 1.7/2.2.1 and Solr 4.4.0,
> I will describe what I have done from the beginning.
> 1 - I downloaded Solr 4.4.0
> 2 - I downloaded Nutch 1.7
> 3 - I copied the file schema-solr4.xml to example/solr/collection1/conf
> and renamed it to schema.xml
> 4 - When I started Solr 4.4.0, I got the following error: msg=SolrCore
> 'collection1' is not available due to init failure:
> Unable to use updateLog: _version_ field must exist in schema, using
> indexed="true" stored="true" and multiValued="false" (_version_ does not
> exist),trace=org.apache.solr.common.SolrException: SolrCore
> 'collection1' is not available due to init failure: Unable to use updateLog:
> _version_ field must exist in schema, using indexed="true" stored="true"
> and multiValued="false" (_version_ does not exist)
> 5 - To resolve this error, I added the following line to schema.xml:
> <field name="_version_" indexed="true" type="long" stored="true"/>
> 6 - The Nutch configuration files can be found here :
>    nutch-site.xml : http://pastebin.com/Dh3tTacL
>    regex-urlfilter : http://pastebin.com/eRdxPB1b
>    seed.txt : http://pastebin.com/unNgJdmU
> 7 - When I run the following command:
>     ./bin/nutch solrdedup http://localhost:8983/solr/
>
> I get this in hadoop.log:
> 2013-10-21 14:22:31,645 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2013-10-21 14:22:31
> 2013-10-21 14:22:31,647 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
> 2013-10-21 14:22:32,050 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2013-10-21 14:22:32,927 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
> 2013-10-21 14:22:32,928 WARN  mapred.LocalJobRunner - job_local741622751_0001
> java.lang.Exception: java.lang.NullPointerException
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.io.Text.encode(Text.java:388)
>         at org.apache.hadoop.io.Text.set(Text.java:178)
>         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
>         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:230)
>         at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:210)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
>
>
>
> Talat, can you explain to me how to check the Solr index for committed
> documents? Sorry, I'm new to Solr and Nutch.
> I don't know what I'm doing wrong. Do I need to switch to Solr 3.x, or is
> Solr 4.4.0 fine? Can someone give me a step-by-step tutorial for integrating
> Solr and Nutch? I followed the Nutch tutorial on the web,
> http://wiki.apache.org/nutch/NutchTutorial , but I can't get the job done.
>
> Any ideas are welcome.
> Thanks for your time, friends,
> Luis Armando
> ________________________________________
> From: Talat UYARER [[email protected]]
> Sent: Friday, October 18, 2013 10:59 PM
> To: [email protected]
> Subject: Re: Nutch 1.7 and Solr 4.4.0 Integrate
>
> Hi Luis,
>
> I am not sure what is causing that. Did you check your Solr index for
> committed documents? Maybe the commit did not happen. You don't need to
> rerun all the Nutch jobs, since the other jobs work fine. You can run only
> the dedup job with:
> bin/nutch solrdedup solr_url
> After that, can you share your solr.log?
>
> Talat
