On 30/01/2012 10:29, Denis Sinner wrote:
Hello,
I have Nutch set up on a Linux server, with Solr running on the same server. When 
I try to crawl some websites, the job fails at the indexing part of the crawl:

2012-01-30 10:11:29,445 INFO  solr.SolrWriter - Adding 1 documents
2012-01-30 10:11:29,698 WARN  mapred.LocalJobRunner - job_local_0009
org.apache.solr.common.SolrException: Not Found

Not Found

request: http://127.0.0.1:8080/solr_3-5/searchdkdde_en/update?wt=javabin&version=2
         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
         at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2012-01-30 10:11:30,362 ERROR solr.SolrIndexer - java.io.IOException: Job failed!

This message means that either:

* there is no Solr instance listening on that URL, or
* no handler is registered at that URL, i.e. the path /solr_3-5/searchdkdde_en/update is not in fact a Solr update handler.
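
One way to tell these two cases apart is to request the URL from the log by hand and look at what comes back. The following is only a minimal sketch in Python: the helper names probe and diagnose are made up for illustration, and the wording of the verdicts is my own, not anything Nutch or Solr prints.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, unknown host: nothing is listening.
        return None


def diagnose(status):
    """Map a probe() result onto the two cases listed above (hypothetical wording)."""
    if status is None:
        return "no server listening at that address"
    if status == 404:
        return "server is up, but no handler is mapped to that path"
    return "path answered with HTTP %d" % status


# Example call, using the URL taken from the log above:
# print(diagnose(probe("http://127.0.0.1:8080/solr_3-5/searchdkdde_en/update")))
```

A status other than 404 only shows that something is mapped to that path; it does not prove that an update would succeed.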

Now, when I call the 'solrindex' command with the freshly created crawl folders 
as arguments, the document gets indexed.

Most likely it uses a different Solr URL (perhaps without the .../update part?).


So - what is the difference between the 'solrindex' command and the indexing 
part of the 'crawl' command? What could be the reason why the latter doesn't 
work?
My solution would be to write a shell script that simulates the 'crawl' command 
by calling the single inject/fetch/index/... commands.




--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
