Hello,
I have Nutch set up on a Linux server, with Solr running on the same server. 
When I try to crawl some websites, the job fails at the indexing step of the 
crawl:

2012-01-30 10:11:29,445 INFO  solr.SolrWriter - Adding 1 documents
2012-01-30 10:11:29,698 WARN  mapred.LocalJobRunner - job_local_0009
org.apache.solr.common.SolrException: Not Found

Not Found

request: http://127.0.0.1:8080/solr_3-5/searchdkdde_en/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2012-01-30 10:11:30,362 ERROR solr.SolrIndexer - java.io.IOException: Job failed!



However, when I call the 'solrindex' command with the freshly created crawl 
folders as arguments, the document gets indexed.

So what is the difference between the 'solrindex' command and the indexing 
step of the 'crawl' command? What could be the reason the latter doesn't 
work?
My workaround would be to write a shell script that simulates the 'crawl' 
command by calling the individual inject/fetch/index/... commands.
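
To make that concrete, here is a rough sketch of what I have in mind. The 
function name crawl_steps and its arguments are made up for illustration, and 
the sub-command arguments (in particular the 'solrindex' argument order and 
the exit status of 'generate') are assumptions based on the Nutch 1.4 usage 
strings, so they may need adjusting for other versions:

```shell
# Sketch: run the crawl cycle as separate Nutch commands instead of 'crawl'.
# NUTCH may be overridden; by default it points at the launcher script.
NUTCH="${NUTCH:-bin/nutch}"

# crawl_steps SEED_DIR CRAWL_DIR SOLR_URL DEPTH TOPN   (hypothetical helper)
crawl_steps() {
    seed_dir=$1
    crawl_dir=$2
    solr_url=$3
    depth=$4
    top_n=$5
    crawldb="$crawl_dir/crawldb"
    linkdb="$crawl_dir/linkdb"
    segments="$crawl_dir/segments"

    # Seed the crawldb with the start URLs.
    "$NUTCH" inject "$crawldb" "$seed_dir" || return 1

    round=0
    while [ "$round" -lt "$depth" ]; do
        # Assumption: generate exits non-zero when nothing is due for fetching.
        "$NUTCH" generate "$crawldb" "$segments" -topN "$top_n" || break
        # Segment names are timestamps, so the newest one sorts last.
        segment=$(ls -d "$segments"/* | tail -n 1)
        "$NUTCH" fetch "$segment" || return 1
        "$NUTCH" parse "$segment" || return 1
        "$NUTCH" updatedb "$crawldb" "$segment" || return 1
        round=$((round + 1))
    done

    "$NUTCH" invertlinks "$linkdb" -dir "$segments" || return 1
    # Assumption: 1.4-style arguments (solr url, crawldb, -linkdb, segments).
    "$NUTCH" solrindex "$solr_url" "$crawldb" -linkdb "$linkdb" "$segments"/* || return 1
}
```

It would be called from the Nutch directory, for example as 
crawl_steps urls crawl http://127.0.0.1:8080/solr_3-5/searchdkdde_en 3 50 
where 'urls', 'crawl', 3 and 50 are placeholder values.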

-- 

[Developer]

dkd Internet Service GmbH
development // kommunikation // design
Kaiserstraße 73
60329 Frankfurt/Main

fon:  +49 69 2475218-0
fax:  +49 69 2475218-99
e-mail: [email protected]
twitter: http://twitter.com/dkd_de
facebook: http://www.facebook.com/www.dkd.de
web: http://www.dkd.de

Court of registration: Amtsgericht Frankfurt am Main
Registration number: HRB 45590
Managing directors: Olivier Dobberkau, Søren Schaffstein, Götz Wegenast, 
Christian Zabanski
