On 30/01/2012 10:29, Denis Sinner wrote:
Hello,
I have Nutch set up on a Linux server with Solr running on the same server. When
I try to crawl some websites, the job fails at the indexing step of the crawl:
2012-01-30 10:11:29,445 INFO solr.SolrWriter - Adding 1 documents
2012-01-30 10:11:29,698 WARN mapred.LocalJobRunner - job_local_0009
org.apache.solr.common.SolrException: Not Found
Not Found
request: http://127.0.0.1:8080/solr_3-5/searchdkdde_en/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2012-01-30 10:11:30,362 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
This message means that either:
* there is no solr instance listening on that URL, or
* no handler is registered at that URL, i.e. the path
/solr_3-5/searchdkdde_en/update is not actually a Solr update handler.
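To tell the two cases apart, one option is to probe the update URL directly. Below is a minimal Java sketch that uses only java.net.HttpURLConnection. The class and method names are hypothetical and are not part of Nutch or SolrJ.

A refused connection points to the first case, and an HTTP 404 points to the second. The sketch assumes that a real update handler answers a bare GET with some status other than 404, because it expects a posted document body. Verify that assumption against your Solr version.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.HttpURLConnection;
import java.net.URI;

public class SolrUpdateProbe {

    /** Maps the probe outcome onto the two possible causes listed above. */
    static String diagnose(Integer httpStatus, boolean connectionRefused) {
        if (connectionRefused) {
            return "no Solr instance listening on that host/port";
        }
        if (httpStatus != null && httpStatus == HttpURLConnection.HTTP_NOT_FOUND) {
            return "server reachable, but no handler at that path";
        }
        // Assumption: a real update handler rejects a body-less GET with a non-404 status.
        return "a handler appears to exist at that path (HTTP " + httpStatus + ")";
    }

    /** Sends a plain GET to the URL and returns a diagnosis string. */
    static String probe(String updateUrl) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode(); // returns 404 etc. without throwing
            conn.disconnect();
            return diagnose(status, false);
        } catch (ConnectException e) {
            return diagnose(null, true);
        } catch (IOException e) {
            return "probe failed: " + e;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0]
                : "http://127.0.0.1:8080/solr_3-5/searchdkdde_en/update";
        System.out.println(url + " -> " + probe(url));
    }
}
```

Run it as `java SolrUpdateProbe <update-url>`. With no argument it probes the URL from the log.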
Now, when I call the 'solrindex' command with the freshly created crawl folders
as arguments, the document gets indexed.
Most likely it uses a different Solr URL (perhaps one without the .../update
part?).
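One way to check this is to normalize the two URLs and compare them. Judging by the request line in the log, SolrJ appends /update to the configured base URL. A base URL that already ends in /update would therefore produce a request for .../update/update, which would fail with the same Not Found.

The sketch below strips trailing slashes and a trailing /update before comparing. It is plain Java with no SolrJ dependency; the helper names and the default URLs in main are hypothetical examples.

```java
public class SolrUrls {

    /** Strips trailing slashes and one trailing "/update" segment, leaving the core's base URL. */
    static String baseUrl(String configured) {
        String url = configured.trim();
        while (url.endsWith("/")) {
            url = url.substring(0, url.length() - 1);
        }
        if (url.endsWith("/update")) {
            url = url.substring(0, url.length() - "/update".length());
        }
        return url;
    }

    /** The URL an update request would go to, assuming "/update" is appended to the base URL. */
    static String updateUrl(String configured) {
        return baseUrl(configured) + "/update";
    }

    public static void main(String[] args) {
        // Hypothetical defaults: pass the URLs you actually gave to 'crawl' and 'solrindex'.
        String forCrawl = args.length > 0 ? args[0] : "http://127.0.0.1:8080/solr_3-5/searchdkdde_en";
        String forSolrindex = args.length > 1 ? args[1] : "http://127.0.0.1:8080/solr_3-5/searchdkdde_en/update/";
        System.out.println("crawl     -> " + updateUrl(forCrawl));
        System.out.println("solrindex -> " + updateUrl(forSolrindex));
        System.out.println(baseUrl(forCrawl).equals(baseUrl(forSolrindex)) ? "same core" : "different cores");
    }
}
```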
So what is the difference between the 'solrindex' command and the indexing
step of the 'crawl' command? What could be the reason the latter doesn't
work?
My solution would be to write a shell script that simulates the 'crawl' command
by calling the individual inject/fetch/index/... commands.
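Such a driver could look like the following Java sketch; a shell script would do the same job. Note these assumptions:

* The directory names crawl/crawldb, crawl/linkdb, crawl/segments and urls, and the bin/nutch path, are hypothetical placeholders.
* The argument order is assumed to follow the Nutch 1.4 usage messages. Run each bin/nutch command without arguments to confirm the usage for your version.
* The sketch relies on segment directories being named by timestamp, so the newest one sorts last.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NutchSteps {

    // Hypothetical paths: adjust to your own Nutch installation and crawl directory.
    static final String NUTCH = "bin/nutch";
    static final String CRAWLDB = "crawl/crawldb";
    static final String LINKDB = "crawl/linkdb";
    static final String SEGMENTS = "crawl/segments";
    static final String SEEDS = "urls";

    static List<String> cmd(String... parts) {
        return Arrays.asList(parts);
    }

    /** Segment directories are named by timestamp, so the newest one sorts last. */
    static String latestSegment(List<String> names) {
        if (names.isEmpty()) {
            throw new IllegalStateException("no segment found under " + SEGMENTS);
        }
        return Collections.max(names);
    }

    /** Steps run against one freshly generated segment (assumed Nutch 1.4 argument order). */
    static List<List<String>> segmentSteps(String segment, String solrUrl) {
        return Arrays.asList(
                cmd(NUTCH, "fetch", segment),
                cmd(NUTCH, "parse", segment),
                cmd(NUTCH, "updatedb", CRAWLDB, segment),
                cmd(NUTCH, "invertlinks", LINKDB, "-dir", SEGMENTS),
                cmd(NUTCH, "solrindex", solrUrl, CRAWLDB, LINKDB, segment));
    }

    /** Prints the command and, unless dryRun is set, runs it and stops on a non-zero exit code. */
    static void run(List<String> command, boolean dryRun) throws IOException, InterruptedException {
        System.out.println("$ " + String.join(" ", command));
        if (dryRun) {
            return;
        }
        int exit = new ProcessBuilder(command).inheritIO().start().waitFor();
        if (exit != 0) {
            throw new IllegalStateException("step failed with exit code " + exit + ": " + command);
        }
    }

    public static void main(String[] args) throws Exception {
        String solrUrl = args.length > 0 ? args[0] : "http://127.0.0.1:8080/solr_3-5/searchdkdde_en";
        int depth = args.length > 1 ? Integer.parseInt(args[1]) : 1;
        boolean dryRun = args.length > 2 && args[2].equals("--dry-run");

        run(cmd(NUTCH, "inject", CRAWLDB, SEEDS), dryRun);
        for (int round = 0; round < depth; round++) {
            run(cmd(NUTCH, "generate", CRAWLDB, SEGMENTS), dryRun);
            String segment;
            if (dryRun) {
                segment = SEGMENTS + "/<newest>"; // placeholder: nothing was generated
            } else {
                String[] dirs = new File(SEGMENTS).list();
                segment = SEGMENTS + "/" + latestSegment(
                        dirs == null ? Collections.<String>emptyList() : Arrays.asList(dirs));
            }
            for (List<String> step : segmentSteps(segment, solrUrl)) {
                run(step, dryRun);
            }
        }
    }
}
```

Passing --dry-run as the third argument only prints the commands. Every step's exit code is checked, so the sequence stops at the first failing step. That makes it easy to see whether fetching or indexing is the problem. This sketch indexes each segment right after updating the link database; indexing all segments once at the end would also work.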
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com