Any opinions on a problem we are seeing with Nutch 0.9 would be appreciated.
The problem is that a command-line NutchBean invocation finds hits (333 in this small test case), but the result set comes back with zero hits because of the exception. Luke can open these same indexes without any trouble.

I got the Hadoop patch that was referred to in the archives, because its description seemed applicable. However, it appears to be the same version of hadoop-core (12.2.2) that came with Nutch 0.9. Is that patch already integrated into the most recent Nutch 0.9 release, or is it otherwise not applicable? Can someone tell me what the problem is, given the exception in the log below?

This looks similar to a problem I had when I was trying to use an older crawl (one generated by a version of Nutch in between 0.8.1 and 0.9) with the 0.9 distribution.

For example, if the page content was saved using an older version of Nutch, you can run into this exception when the summarizer tries to load that content.
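
To illustrate the kind of failure I mean, here is a minimal, self-contained Java sketch. It does not use any Hadoop or Nutch code: OldKey, NewKey, find and describe are hypothetical stand-ins I made up for illustration. It assumes one reading of the trace, namely that the keys stored in the older segment are of one class (UTF8 in the trace) while the lookup key supplied by the newer code is of another (Text), so the cast inside compareTo fails during the binary search.

```java
import java.util.Arrays;
import java.util.List;

final class KeyClassMismatch {

    // Hypothetical stand-in for the older key class (the role UTF8 plays in the trace).
    static final class OldKey implements Comparable<Object> {
        final String value;
        OldKey(String value) { this.value = value; }
        @Override
        public int compareTo(Object other) {
            // Assumes the other key has the same class; this cast is what fails.
            return value.compareTo(((OldKey) other).value);
        }
    }

    // Hypothetical stand-in for the newer key class (the role Text plays in the trace).
    static final class NewKey implements Comparable<Object> {
        final String value;
        NewKey(String value) { this.value = value; }
        @Override
        public int compareTo(Object other) {
            return value.compareTo(((NewKey) other).value);
        }
    }

    // Binary search over sorted stored keys; each stored key is compared to the probe.
    static int find(List<? extends Comparable<Object>> stored, Object probe) {
        int low = 0;
        int high = stored.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = stored.get(mid).compareTo(probe);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    // Runs the lookup and reports the outcome as a short string.
    static String describe(List<? extends Comparable<Object>> stored, Object probe) {
        try {
            int index = find(stored, probe);
            return index >= 0 ? "found at index " + index : "not found";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // Keys as an older writer might have stored them, already sorted.
        List<OldKey> stored = Arrays.asList(new OldKey("a"), new OldKey("b"), new OldKey("c"));
        System.out.println(describe(stored, new OldKey("b"))); // same key class
        System.out.println(describe(stored, new NewKey("b"))); // mismatched key class
    }
}
```

If that is what is happening here, the index itself can be perfectly fine (which would fit with Luke reading it and with the hit count being reported), and the failure only shows up once the summaries are fetched from the segment data.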

-- Ken


Thanks.
Lauren Massa-Lochridge
eXlr8, Inc.

   $ bin/nutch org.apache.nutch.searcher.NutchBean news
   Total hits: 333
   Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text
           at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:204)
           at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:344)
           at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:395)
   Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
           at org.apache.hadoop.io.UTF8.compareTo(UTF8.java:123)
           at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:107)
           at org.apache.hadoop.io.MapFile$Reader.binarySearch(MapFile.java:369)
           at org.apache.hadoop.io.MapFile$Reader.seek(MapFile.java:338)
           at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:392)
           at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:86)
           at org.apache.nutch.searcher.FetchedSegments$Segment.getEntry(FetchedSegments.java:95)
           at org.apache.nutch.searcher.FetchedSegments$Segment.getParseText(FetchedSegments.java:86)
           at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:159)
           at org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)


--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
