Greetings, I am trying to integrate Nutch 1.3 with Solr 3.4. I am running the bin/nutch crawl command with the solr parameter, but before the process finishes completely I get the following output in my terminal:

SolrIndexer: starting at 2011-11-10 15:58:39
java.io.IOException: Job failed!
SolrDeleteDuplicates: starting at 2011-11-10 15:58:44
SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
SolrDeleteDuplicates: finished at 2011-11-10 15:58:45, elapsed: 00:00:01
crawl finished: ../data

I think something is wrong, because the job fails with java.io.IOException. The last lines in my hadoop.log are:

no segments* file found in org.apache.lucene.store.NIOFSDirectory@/opt/apache-solr-3.4.0/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@9b1670: files: [write.lock]
org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.NIOFSDirectory@/opt/apache-solr-3.4.0/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@9b1670: files: [write.lock]
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:712)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:593)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1152)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:83)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:175)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:223)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.se

request: http://localhost:8983/solr/update?wt=javabin&version=2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:436)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-11-10 15:58:44,309 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
2011-11-10 15:58:44,311 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2011-11-10 15:58:44
2011-11-10 15:58:44,311 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
2011-11-10 15:58:45,512 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: finished at 2011-11-10 15:58:45, elapsed: 00:00:01
2011-11-10 15:58:45,512 INFO crawl.Crawl - crawl finished: ../data

Any idea?
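
In case it helps with the diagnosis: the log says the Solr index directory contains only write.lock and no segments* file. That condition can be checked on disk with a small script. This is only a minimal sketch in Python; the function name index_state and its return labels are made up for illustration, and the default path is simply the one printed in hadoop.log.

```python
import os
import sys


def index_state(index_dir):
    """Classify a Lucene index directory by the files it contains."""
    if not os.path.isdir(index_dir):
        return "missing"
    names = os.listdir(index_dir)
    # Lucene keeps its commit points in files whose names start with "segments".
    if any(name.startswith("segments") for name in names):
        return "has-segments"
    # This is the situation reported in the log: files: [write.lock]
    if names == ["write.lock"]:
        return "lock-only"
    return "no-segments"


if __name__ == "__main__":
    # Default is the path from hadoop.log; another path can be given as argv[1].
    default = "/opt/apache-solr-3.4.0/example/solr/data/index"
    path = sys.argv[1] if len(sys.argv) > 1 else default
    print(path, "->", index_state(path))
```

A result of "lock-only" would match what the IndexNotFoundException reports. The script only inspects the directory and does not change anything in it.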
Greetings,

--
Yusniel Hidalgo Delgado
Universidad de las Ciencias Informáticas
https://twitter.com/#!/yhdelgado
La Habana, Cuba.

