I have found the problem: Nutch does not support Chinese out of the box. In Chinese text, two tokens may overlap. For example, "可爱的小女生" may be parsed into "可爱", "小女", and "女生", so the two tokens "小女" and "女生" overlap. This overlap causes the error at org.apache.nutch.summary.basic.BasicSummarizer.getSummary(BasicSummarizer.java:188).
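
To illustrate the failure mode, here is a minimal sketch. It is not the actual BasicSummarizer code; the class OverlapDemo, the Token type, and the methods naiveGaps and safeGaps are hypothetical names made up for this example. It assumes the summarizer copies the text between consecutive tokens with String.substring and expects each token to start at or after the end of the previous one. With the tokens from the example above (offsets 0-2, 3-5, and 4-6), the third token starts before the second one ends, so the begin index is larger than the end index and substring throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; none of these names come from Nutch.
class OverlapDemo {
    /** A token with [start, end) character offsets into the source text. */
    static final class Token {
        final int start;
        final int end;
        Token(int start, int end) { this.start = start; this.end = end; }
    }

    /** Naive version: assumes every token starts at or after the previous token's end. */
    static List<String> naiveGaps(String text, List<Token> tokens) {
        List<String> gaps = new ArrayList<>();
        int prevEnd = 0;
        for (Token t : tokens) {
            // Throws StringIndexOutOfBoundsException when t.start < prevEnd.
            gaps.add(text.substring(prevEnd, t.start));
            prevEnd = t.end;
        }
        return gaps;
    }

    /** Defensive version: an overlapping token yields an empty gap instead of an exception. */
    static List<String> safeGaps(String text, List<Token> tokens) {
        List<String> gaps = new ArrayList<>();
        int prevEnd = 0;
        for (Token t : tokens) {
            int from = Math.min(prevEnd, t.start); // never begin past the token start
            gaps.add(text.substring(from, t.start));
            prevEnd = Math.max(prevEnd, t.end);    // never move the cursor backwards
        }
        return gaps;
    }

    public static void main(String[] args) {
        // The six-character example phrase from the message above, written as
        // Unicode escapes so the file compiles under any source encoding.
        String text = "\u53ef\u7231\u7684\u5c0f\u5973\u751f";
        List<Token> tokens = Arrays.asList(
                new Token(0, 2),   // first two characters
                new Token(3, 5),   // fourth and fifth characters
                new Token(4, 6));  // fifth and sixth characters, overlapping the previous token
        try {
            naiveGaps(text, tokens);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive version failed: " + e.getMessage());
        }
        System.out.println("safe version gaps: " + safeGaps(text, tokens));
    }
}
```

Clamping the offsets is only one possible workaround. Using an analyzer that does not emit overlapping tokens for the summary would avoid the problem at its source.
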

2010/12/13 Bupo Jung <[email protected]>

> Hi,
> I use "org.apache.nutch.searcher.NutchBean" to search for some Chinese words.
> It returns the right results most of the time, but sometimes it returns only
> the total hits and no summaries.
> I also found a StringIndexOutOfBoundsException in hadoop.log, as follows:
>
> 2010-12-13 19:43:54,277 ERROR searcher.NutchBean - Exception occured while executing search: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: -2
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: -2
>     at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:297)
>     at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:350)
>     at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:410)
> Caused by: java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: -2
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>     at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:292)
>     ... 2 more
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -2
>     at java.lang.String.substring(String.java:1937)
>     at org.apache.nutch.summary.basic.BasicSummarizer.getSummary(BasicSummarizer.java:188)
>     at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:263)
>     at org.apache.nutch.searcher.FetchedSegments$SummaryTask.call(FetchedSegments.java:63)
>     at org.apache.nutch.searcher.FetchedSegments$SummaryTask.call(FetchedSegments.java:53)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> Is there any clue as to what causes this error?
>
> from bupo.jung

