Newbie + problems with nutch

Freminlins Sun, 21 Dec 2008 08:07:49 -0800

Hi,

I'm a Nutch newbie, using Nutch 0.9 on Solaris. So, like a sensible newbie, I
followed the tutorial here: http://wiki.apache.org/nutch/NutchTutorial.


Alas, when I reached the subsection called "Step-by-Step: Indexing", I
encountered a problem:

bash-3.00$ nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
Indexer: starting
Indexer: linkdb: crawl/linkdb
Indexer: adding segment: crawl/segments/20081221122211
Indexer: adding segment: crawl/segments/20081221122227
Indexer: adding segment: crawl/segments/20081221122356
Indexer: adding segment: crawl/segments/20081221123015
Indexer: adding segment: crawl/segments/20081221134456
Indexer: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /d2/nutch/crawl/indexes already exists
  at org.apache.hadoop.mapred.OutputFormatBase.checkOutputSpecs(OutputFormatBase.java:96)
  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
  at org.apache.nutch.indexer.Indexer.index(Indexer.java:273)
  at org.apache.nutch.indexer.Indexer.run(Indexer.java:295)
  at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
  at org.apache.nutch.indexer.Indexer.main(Indexer.java:278)

I've read a bit on the mailing lists about similar problems, but I'd like to
get a recommended solution. Also, the tutorial should be changed to reflect
the solution, because as it stands it doesn't work as expected.
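A minimal sketch of the usual workaround, assuming the existing crawl/indexes (presumably left over from an earlier crawl or indexing run) only needs to be kept as a backup: move it aside before re-running the indexing step, since Hadoop refuses to submit a job whose output directory already exists, which is what checkOutputSpecs reports in the trace above. The move_aside function is a hypothetical helper, not part of Nutch.

```shell
#!/bin/bash
# Hypothetical helper (not part of Nutch): rename an existing directory to
# <dir>.old.<timestamp> so that a Hadoop job can create it afresh.
move_aside() {
    dir="$1"
    if [ -e "$dir" ]; then
        backup="$dir.old.$(date +%Y%m%d%H%M%S)"
        # Refuse to clobber a backup created in the same second.
        if [ -e "$backup" ]; then
            echo "backup $backup already exists" >&2
            return 1
        fi
        mv "$dir" "$backup" || return 1
        echo "moved $dir to $backup"
    fi
}

# Clear the way for the tutorial's indexing step, then re-run it.
move_aside crawl/indexes
# nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
```

If the old index isn't needed at all, removing it with rm -r crawl/indexes works just as well; moving it simply keeps a copy around.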


Secondly, I can do a search from the command line:

bash-3.00$ nutch org.apache.nutch.searcher.NutchBean tcl
Total hits: 3
 0 20081221122227/http://www.apache.org/
 ... SpamAssassin STDCXX Struts Synapse Tapestry TCL Tiles Tomcat Turbine Tuscany Velocity ...

[snipped for brevity]

But the same query through the web interface in Tomcat gives a blank page and this error:

21-Dec-2008 15:03:18 org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.ArithmeticException: / by zero
  at org.apache.hadoop.mapred.lib.HashPartitioner.getPartition(HashPartitioner.java:35)
  at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:85)
  at org.apache.nutch.searcher.FetchedSegments$Segment.getEntry(FetchedSegments.java:95)
  at org.apache.nutch.searcher.FetchedSegments$Segment.getParseText(FetchedSegments.java:86)
  at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:159)
  at org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)

Some queries do work fine, though, e.g. the search for "apache" on the example
page.
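A rough way to check one possible cause, sketched under an assumption rather than confirmed: HashPartitioner.getPartition takes the key's hash modulo the number of partitions, and here that number is presumably the count of map files opened for a segment's parse_text, so "/ by zero" suggests some segment has no part-* map files there (for instance a segment that was generated but never fetched or parsed). The check_segments function below is hypothetical and only lists such segments.

```shell
#!/bin/bash
# Hypothetical diagnostic (not part of Nutch): print each segment whose
# parse_text directory has no part-* entries. Assumption: such a segment
# would leave FetchedSegments with zero readers, i.e. a partition count of 0.
check_segments() {
    for seg in "$1"/*; do
        [ -d "$seg" ] || continue
        found=no
        for part in "$seg"/parse_text/part-*; do
            if [ -e "$part" ]; then
                found=yes
                break
            fi
        done
        if [ "$found" = no ]; then
            echo "no parse_text parts: $seg"
        fi
    done
}

check_segments crawl/segments
```

Since the same index answers from the command line, it may also be worth confirming that the web app's searcher.dir property points at the same crawl directory.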

Can somebody help me with my newbieness?

Thanks.

M.
