Any opinions on a problem we are having with Nutch 0.9 would be
appreciated. The problem is that hits are found when NutchBean is
invoked from the command line (333 hits in this small test case), but
the result set ends up with zero hits because of the exception. Luke
also accesses these same indexes just fine.
I got the Hadoop patch that was referred to in the archives because
its description seemed applicable; however, it appears to be the same
version of hadoop-core (12.2.2) that came with Nutch 0.9. Is that
patch already integrated into the most recent Nutch 0.9 release, or
is it otherwise not applicable? Can someone tell me what the problem
is, given the exception in the log below?
This looks similar to a problem I had when I was trying to use an
older crawl (one generated by a version of Nutch in between 0.8.1 and
0.9) with the 0.9 distribution.
For example, if the page content was saved using an older version of
Nutch, you can run into this exception when the summarizer tries to
load that content.
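The trace seems consistent with that: the failing frame is
UTF8.compareTo, and the object it rejects is an
org.apache.hadoop.io.Text, which is what you would expect if the keys
stored in the segment are of one type while the lookup key is of
another. Below is a minimal, self-contained Java sketch of that failure
mode. OldKey, NewKey, binarySearch and lookupFails are hypothetical
stand-ins invented for illustration; they are not Hadoop or Nutch
classes, and the real code paths may differ in detail.

```java
import java.util.Arrays;
import java.util.List;

public class KeyMismatchSketch {

    /** Hypothetical stand-in for the key type the older data was written with. */
    static final class OldKey implements Comparable<Object> {
        final String value;

        OldKey(String value) {
            this.value = value;
        }

        @Override
        public int compareTo(Object other) {
            // Unchecked downcast: throws ClassCastException if other is not an OldKey.
            OldKey that = (OldKey) other;
            return value.compareTo(that.value);
        }
    }

    /** Hypothetical stand-in for the key type the newer reader passes in. */
    static final class NewKey {
        final String value;

        NewKey(String value) {
            this.value = value;
        }
    }

    /** Binary search over sorted stored keys; each stored key is compared to the probe. */
    static int binarySearch(List<OldKey> stored, Object probe) {
        int low = 0;
        int high = stored.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = stored.get(mid).compareTo(probe);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1); // not found
    }

    /** Returns true if looking up the probe fails with a ClassCastException. */
    static boolean lookupFails(Object probe) {
        List<OldKey> stored =
                Arrays.asList(new OldKey("a"), new OldKey("b"), new OldKey("c"));
        try {
            binarySearch(stored, probe);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("same-type probe fails:  " + lookupFails(new OldKey("b")));
        System.out.println("mixed-type probe fails: " + lookupFails(new NewKey("b")));
    }
}
```

In this sketch the search works when the probe has the same type as the
stored keys and fails only when the types differ, which would also
explain why the hit count is still reported before the summary lookup
throws.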
-- Ken
Thanks.
Lauren Massa-Lochridge
eXlr8, Inc.
$ bin/nutch org.apache.nutch.searcher.NutchBean news
Total hits: 333
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text
    at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:204)
    at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:344)
    at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:395)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
    at org.apache.hadoop.io.UTF8.compareTo(UTF8.java:123)
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:107)
    at org.apache.hadoop.io.MapFile$Reader.binarySearch(MapFile.java:369)
    at org.apache.hadoop.io.MapFile$Reader.seek(MapFile.java:338)
    at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:392)
    at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:86)
    at org.apache.nutch.searcher.FetchedSegments$Segment.getEntry(FetchedSegments.java:95)
    at org.apache.nutch.searcher.FetchedSegments$Segment.getParseText(FetchedSegments.java:86)
    at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:159)
    at org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"