Hi, I'm a Nutch newbie, using Nutch 0.9 on Solaris. Like a sensible newbie, I followed the tutorial here: http://wiki.apache.org/nutch/NutchTutorial.
Alas, when I reached the subsection called "Step-by-Step: Indexing", I encountered a problem:

bash-3.00$ nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
Indexer: starting
Indexer: linkdb: crawl/linkdb
Indexer: adding segment: crawl/segments/20081221122211
Indexer: adding segment: crawl/segments/20081221122227
Indexer: adding segment: crawl/segments/20081221122356
Indexer: adding segment: crawl/segments/20081221123015
Indexer: adding segment: crawl/segments/20081221134456
Indexer: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /d2/nutch/crawl/indexes already exists
        at org.apache.hadoop.mapred.OutputFormatBase.checkOutputSpecs(OutputFormatBase.java:96)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:273)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:295)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:278)

I've read a bit in the mailing lists about similar problems, but I'd like to get a recommended solution. Also, the tutorial should be changed to reflect the solution, because as it stands it doesn't work as expected.

Secondly, I can do a search from the command line:

bash-3.00$ nutch org.apache.nutch.searcher.NutchBean tcl
Total hits: 3
 0 20081221122227/http://www.apache.org/
 ... SpamAssassin STDCXX Struts Synapse Tapestry TCL Tiles Tomcat Turbine Tuscany Velocity ...
[snipped for brevity]

But the same query using Tomcat gives a blank page and this error:

21-Dec-2008 15:03:18 org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.ArithmeticException: / by zero
        at org.apache.hadoop.mapred.lib.HashPartitioner.getPartition(HashPartitioner.java:35)
        at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:85)
        at org.apache.nutch.searcher.FetchedSegments$Segment.getEntry(FetchedSegments.java:95)
        at org.apache.nutch.searcher.FetchedSegments$Segment.getParseText(FetchedSegments.java:86)
        at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:159)
        at org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)

Some queries work fine, though, e.g. the search for "apache" in the example page.

Can somebody help me with my newbieness?

Thanks.
M.
