Hi,

In short, no.
I see that just before we distribute the score to outlinks at line 70 of DbUpdateMapper.java [0], there is a TODO which reads:

    // TODO: Outlink filtering (i.e. "only keep the first n outlinks")

I wonder if this could be why the if condition in the toInt() method (line 739) of Bytes.java [1] is satisfied, i.e. why the byte array being decoded holds only 2 bytes when 4 are needed.

Can you reproduce this, and explain a bit more about the outlinks.size() for the URL?

Thanks
Lewis

[0] http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdateMapper.java?view=markup
[1] http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/util/Bytes.java?view=markup

On Wed, Dec 19, 2012 at 11:35 AM, Stanislav Orlenko wrote:

> Hello
> Has anyone faced such a problem?
>
> java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 2
>     at org.apache.nutch.util.Bytes.explainWrongLengthOrOffset(Bytes.java:559)
>     at org.apache.nutch.util.Bytes.toInt(Bytes.java:740)
>     at org.apache.nutch.util.Bytes.toFloat(Bytes.java:611)
>     at org.apache.nutch.util.Bytes.toFloat(Bytes.java:598)
>     at org.apache.nutch.scoring.opic.OPICScoringFilter.distributeScoreToOutlinks(OPICScoringFilter.java:128)
>     at org.apache.nutch.scoring.ScoringFilters.distributeScoreToOutlinks(ScoringFilters.java:117)
>     at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:70)
>     at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:37)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>
> Nutch version is 2.1.
>
> Thanks

--
*Lewis*
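To help reproduce this outside a full crawl, here is a minimal, hypothetical Java sketch of the check that appears to fail in the trace. It is not the actual Nutch `Bytes` code. The trace shows `toFloat()` asking `toInt()` for 4 bytes at offset 0 of an array that only holds 2, so whatever value is decoded at OPICScoringFilter.java:128 seems to be stored with the wrong length. The class name `ScoreBytesDemo` and the method `toFloatChecked` are made up for illustration. The error message simply copies the text from the stack trace. The sketch also assumes a big-endian float encoding.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch (not Nutch code): decode a 4-byte big-endian float
 * and apply the same kind of bounds check that fails in the trace.
 */
public class ScoreBytesDemo {

    // Decode a float from bytes[offset .. offset + 4), rejecting arrays that are too short.
    static float toFloatChecked(byte[] bytes, int offset) {
        int length = Float.BYTES; // a float needs 4 bytes
        if (bytes == null || offset < 0 || offset + length > bytes.length) {
            int capacity = (bytes == null) ? 0 : bytes.length;
            // Message text copied from the stack trace for easy comparison.
            throw new IllegalArgumentException("offset (" + offset + ") + length (" + length
                    + ") exceed the capacity of the array: " + capacity);
        }
        // ByteBuffer reads in big-endian order by default.
        return ByteBuffer.wrap(bytes, offset, length).getFloat();
    }

    public static void main(String[] args) {
        // A correctly sized 4-byte value decodes fine.
        byte[] ok = ByteBuffer.allocate(Float.BYTES).putFloat(1.0f).array();
        System.out.println(toFloatChecked(ok, 0)); // prints 1.0

        // A 2-byte value (hypothetical: e.g. something not written as a float) fails the check.
        byte[] tooShort = new byte[2];
        try {
            toFloatChecked(tooShort, 0);
        } catch (IllegalArgumentException e) {
            // prints: offset (0) + length (4) exceed the capacity of the array: 2
            System.out.println(e.getMessage());
        }
    }
}
```

If the bytes reaching line 128 can be dumped (e.g. by logging their length just before the call), comparing them against this check should show quickly whether the stored value was written with something other than a 4-byte float.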

