Ken,
Thanks very much - you were right. I had never made this mistake before:
I copied the newly created 0.9 /crawl over the existing one, which added
its segments to the existing 0.8.1 segments instead of deleting all of the
old 0.8.1 data and thereby replacing /crawl entirely. Your response
prompted me to look at that again, and sure enough, that's what it was!
Thanks.
Lauren Massa-Lochridge
eXlr8, Inc.
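
For anyone who hits the same thing, here is a minimal Python sketch of the
"replace, don't merge" step described above. The function name and the
paths are hypothetical and not part of Nutch; it assumes the crawl lives on
the local filesystem and that nothing is searching it while it is swapped.

```python
import os
import shutil


def replace_crawl_dir(new_crawl, live_crawl):
    """Replace live_crawl with a copy of new_crawl.

    The old directory is removed first, so no segments written by an
    older Nutch version are left behind next to the new ones.
    """
    if not os.path.isdir(new_crawl):
        raise FileNotFoundError(new_crawl)
    if os.path.exists(live_crawl):
        shutil.rmtree(live_crawl)  # drop ALL old segments, indexes, etc.
    shutil.copytree(new_crawl, live_crawl)
```

Copying the new crawl on top of the old one without the removal step is
what leaves the two generations of segments side by side.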
Ken Krugler wrote:
Any opinions about a problem we have with 0.9 are appreciated.
The problem is that a command-line NutchBean invocation finds hits (333
in this small test case), but the result set comes back with zero hits
because of the exception below. Luke also opens these same indexes just
fine.
I got the Hadoop patch that was referred to in the archives, because
its description seemed applicable. However, it appears to be the same
version of hadoop-core (0.12.2) that came with Nutch 0.9. Is that
patch already integrated into the most recent Nutch 0.9 release, or is
it otherwise not applicable? Can someone tell me what the problem is,
given the exception in the log below?
This looks similar to a problem I had when I was trying to use an
older crawl (one generated by a version of Nutch in between 0.8.1 and
0.9) with the 0.9 distribution.
E.g. if the page content was saved using an older version of Nutch,
then when the summarizer tries to load the content, you can run into
this exception.
-- Ken
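
One rough way to check for leftover old segments is sketched below in
Python. It is a heuristic, not a Nutch or Hadoop tool: it assumes the usual
local layout crawl/segments/<segment>/parse_text/<part>/data, and it simply
looks at the start of each data file for the key class name recorded in the
file header. The trace shows a UTF8 comparator being handed a Text key, so
a UTF8 key class in parse_text is taken as the sign of an older segment.
The function names are made up for illustration.

```python
import os

OLD_KEY = b"org.apache.hadoop.io.UTF8"
NEW_KEY = b"org.apache.hadoop.io.Text"


def key_class_hint(data_file, header_bytes=512):
    """Guess the key class from the first bytes of a MapFile data file."""
    with open(data_file, "rb") as f:
        head = f.read(header_bytes)
    if not head.startswith(b"SEQ"):
        return None  # does not look like a SequenceFile header
    # The key class name is written before the value class name,
    # so the earliest match is taken as the key class.
    hits = [(head.find(name), label)
            for name, label in ((OLD_KEY, "UTF8"), (NEW_KEY, "Text"))
            if name in head]
    return min(hits)[1] if hits else None


def segments_with_old_keys(crawl_dir):
    """Return names of segments whose parse_text data uses UTF8 keys."""
    segments_dir = os.path.join(crawl_dir, "segments")
    old = []
    for segment in sorted(os.listdir(segments_dir)):
        parse_text = os.path.join(segments_dir, segment, "parse_text")
        if not os.path.isdir(parse_text):
            continue
        for part in sorted(os.listdir(parse_text)):
            data_file = os.path.join(parse_text, part, "data")
            if os.path.isfile(data_file) and key_class_hint(data_file) == "UTF8":
                old.append(segment)
                break
    return old
```

Any segment this reports would be a candidate for removal or re-fetching
before searching with the 0.9 distribution.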
Thanks.
Lauren Lochridge
eXlr8, Inc.
$ bin/nutch org.apache.nutch.searcher.NutchBean news
Total hits: 333
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.io.Text
        at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:204)
        at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:344)
        at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:395)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
        at org.apache.hadoop.io.UTF8.compareTo(UTF8.java:123)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:107)
        at org.apache.hadoop.io.MapFile$Reader.binarySearch(MapFile.java:369)
        at org.apache.hadoop.io.MapFile$Reader.seek(MapFile.java:338)
        at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:392)
        at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:86)
        at org.apache.nutch.searcher.FetchedSegments$Segment.getEntry(FetchedSegments.java:95)
        at org.apache.nutch.searcher.FetchedSegments$Segment.getParseText(FetchedSegments.java:86)
        at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:159)
        at org.apache.nutch.searcher.FetchedSegments$SummaryThread.run(FetchedSegments.java:177)