I have already installed Solr and I want to integrate it with Nutch. Whenever I issue this command to Nutch:

bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize

I always get an error:

Indexer: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
	at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Here is the complete Hadoop log for the process. I have highlighted the error part in it (the lines marked with asterisks).

2015-04-07 09:38:06,613 INFO indexer.IndexingJob - Indexer: starting at 2015-04-07 09:38:06
2015-04-07 09:38:06,684 INFO indexer.IndexingJob - Indexer: deleting gone documents: false
2015-04-07 09:38:06,685 INFO indexer.IndexingJob - Indexer: URL filtering: true
2015-04-07 09:38:06,685 INFO indexer.IndexingJob - Indexer: URL normalizing: true
2015-04-07 09:38:06,893 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2015-04-07 09:38:06,893 INFO indexer.IndexingJob - Active IndexWriters :
SOLRIndexWriter
	solr.server.url : URL of the SOLR instance (mandatory)
	solr.commit.size : buffer size when sending to SOLR (default 1000)
	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
	solr.auth : use authentication (default false)
	solr.auth.username : use authentication (default false)
	solr.auth : username for authentication
	solr.auth.password : password for authentication
2015-04-07 09:38:06,898 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
2015-04-07 09:38:06,898 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
2015-04-07 09:38:06,898 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20150406231502
2015-04-07 09:38:07,036 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-04-07 09:38:07,540 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2015-04-07 09:38:07,565 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
2015-04-07 09:38:09,552 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
2015-04-07 09:38:10,642 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
2015-04-07 09:38:10,734 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
2015-04-07 09:38:10,895 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
2015-04-07 09:38:11,088 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
2015-04-07 09:38:11,219 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: content dest: content
2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: title dest: title
2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: host dest: host
2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: segment dest: segment
2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: boost dest: boost
2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: digest dest: digest
2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2015-04-07 09:38:11,526 INFO solr.SolrIndexWriter - Indexing 250 documents
2015-04-07 09:38:11,526 INFO solr.SolrIndexWriter - Deleting 0 documents
2015-04-07 09:38:11,644 INFO solr.SolrIndexWriter - Indexing 250 documents
*2015-04-07 09:38:11,699 WARN mapred.LocalJobRunner - job_local1245074757_0001*
*org.apache.solr.common.SolrException: Not Found*
*Not Found*
*request: http://localhost:8983/solr/update?wt=javabin&version=2*
*	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)*
*	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)*
*	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)*
*	at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:135)*
*	at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:88)*
*	at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)*
*	at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)*
*	at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:458)*
*	at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:500)*
*	at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)*
*	at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)*
*	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)*
*	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)*
*	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)*
*2015-04-07 09:38:12,408 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!*
*	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)*
*	at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)*
*	at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)*
*	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)*
*	at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)*

On Tue, 7 Apr 2015 at 03:18 Jeff Cocking <[email protected]> wrote:

> With Solr5.0.0 you can skip that step. Solr will auto create your schema
> document based on the data being provided.
>
> One of the new features with Solr5 is the install/service feature. I did a
> quick write up on how to install Solr5 on Centos. Might be something
> useful there for you.
>
> http://www.cocking.com/apache-solr-5-0-install-on-centos-7/
>
> jeff
>
> On Mon, Apr 6, 2015 at 3:13 PM, Anchit Jain <[email protected]>
> wrote:
>
> > I want to index nutch results using *Solr 5.0* but as mentioned in
> > https://wiki.apache.org/nutch/NutchTutorial there is no directory
> > ${APACHE_SOLR_HOME}/example/solr/collection1/conf/
> > in solr 5.0 . So where I have to copy *schema.xml*?
> > Also there is no *start.jar* present in example directory.
> >
>
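
For reference, the "Not Found" in the log above can be checked outside Nutch by requesting the same update URL directly and looking at the HTTP status. Below is a minimal Python sketch of that check. The helper names update_url and http_status are hypothetical, and the path and query string are simply copied from the "request:" line in the log, not taken from the SolrJ source. It only reports the status code and does not send any documents.

```python
import urllib.error
import urllib.request


def update_url(base_url):
    """Build the update URL as it appears in the log's "request:" line.

    Assumption: the "/update?wt=javabin&version=2" suffix is copied from
    the log above, not derived from SolrJ itself.
    """
    return base_url.rstrip("/") + "/update?wt=javabin&version=2"


def http_status(url, timeout=5):
    """Return the HTTP status code for a GET on url (e.g. 404 for Not Found).

    A connection failure (Solr not running, wrong port) is not caught here
    and surfaces as urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code


if __name__ == "__main__":
    # Same Solr base URL that was passed to "bin/nutch solrindex".
    url = update_url("http://localhost:8983/solr")
    print(url, "->", http_status(url))
```

If this prints 404 for the URL as well, the failure is in how that URL maps onto the Solr installation rather than in the crawl data handed over by Nutch.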

