Well, I found two tutorials:

http://thewiki4opentech.org/index.php/Nutch
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/

As for the benefits of Solr, I am not entirely sure. Solr is a search engine, but Nutch seems to have one of its own. You can either run solrindex to index into Solr, or run index to build an index for Nutch's own search engine. The regular index step seems to work fine with Hadoop's TaskTracker and JobTracker daemons; solrindex, on the other hand, seems to have to run single-threaded.

What I am doing right now in my script is:

1. Start the Hadoop daemons with one configuration that takes advantage of multiple cores and threads.
2. Stop the daemons right before the script kicks off solrindex.
3. Start the daemons again with the number of tasks set to one, by passing --config /path/to/different/conf/dir to the hadoop script.

The problem with that is that solrindex is really slow on a Sun SPARC processor when it runs single-threaded. I just tried creating a processor set and assigning all the Nutch/Hadoop processes to it, and that caused errors like:

org.apache.solr.common.SolrException: no segments* file found in org.apache.lucene.store.FSDirectory@/nutchnew/crawl/index: files:
java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/search_fodors/nutchnew/crawl/index: files:
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:604)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:272)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1158)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:167)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
    at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)

So I am still figuring out how to get solrindex to run faster.

On Thu, Oct 7, 2010 at 8:38 AM, Israel wrote:
> Hi Steve, I don't have the answer to your question, but wanted to ask how
> to integrate SOLR with nutch 1.2 and what benefits it brings. Or if you
> have your own tutorial.
>
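For anyone who wants to reproduce the restart sequence described above, here is a minimal Python sketch of it. It is only a sketch under assumptions: the two conf directory paths, the Solr URL, the crawl/ directory layout, and the use of bin/start-all.sh and bin/stop-all.sh to start and stop the daemons are all hypothetical placeholders, not taken from my actual script. The solrindex and index invocations follow the usual Nutch 1.x argument order (solrindex is the form used in the Lucid Imagination post). By default it only prints the commands.

```python
"""Hypothetical sketch of the Nutch/Solr indexing script described above.

Assumed (not from the original script): the conf directory paths, the Solr
URL, the crawl/ directory layout, and the start-all.sh/stop-all.sh helpers.
"""
import subprocess

# Hypothetical conf dir tuned to use multiple cores and threads.
MULTI_TASK_CONF = "/path/to/multi/task/conf/dir"
# Hypothetical conf dir with tasks set to one (assumed to set
# mapred.tasktracker.map.tasks.maximum and
# mapred.tasktracker.reduce.tasks.maximum to 1).
SINGLE_TASK_CONF = "/path/to/different/conf/dir"
SOLR_URL = "http://127.0.0.1:8983/solr/"  # assumed local Solr instance
CRAWL = "crawl"  # assumed crawl directory


def build_steps(use_solr=True):
    """Return the shell commands to run, in order."""
    # Start the Hadoop daemons with the multi-task configuration.
    steps = [f"bin/start-all.sh --config {MULTI_TASK_CONF}"]
    # (inject/generate/fetch/parse/updatedb/invertlinks steps omitted here)
    inputs = f"{CRAWL}/crawldb {CRAWL}/linkdb {CRAWL}/segments/*"
    if use_solr:
        steps += [
            # Stop the daemons right before solrindex...
            f"bin/stop-all.sh --config {MULTI_TASK_CONF}",
            # ...and restart them with tasks limited to one.
            f"bin/start-all.sh --config {SINGLE_TASK_CONF}",
            f"bin/nutch solrindex {SOLR_URL} {inputs}",
        ]
    else:
        # Nutch's own index step runs fine under the normal configuration.
        steps.append(f"bin/nutch index {CRAWL}/indexes {inputs}")
    return steps


def run(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(cmd)
        if not dry_run:
            # shell=True so the shell expands the segments/* glob.
            subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    run(build_steps())
```

Running it as-is only prints the commands; call run(build_steps(), dry_run=False) from the Nutch install directory to actually execute them. Note that this just automates the workaround, it does nothing about solrindex being slow when single-threaded.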

