Hi,

In short, no.

I see that just before we distribute the score to outlinks in line 70 of
DbUpdateMapper.java [0] there is a TODO which reads

// TODO: Outlink filtering (i.e. "only keep the first n outlinks")

I wonder if this could be why the if condition in the toInt() method
(line 739) of Bytes.java [1] is satisfied, which is what throws the
exception in your trace.
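To illustrate what I suspect is happening, here is a minimal standalone
sketch. It is not the Nutch source: the class name ToIntSketch and the exact
form of the check are my assumptions, modelled on the message in your trace.
The idea is that a float is decoded from exactly 4 bytes, so a 2-byte value
fails the bounds check before any decoding takes place.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the check in Bytes.toInt(); not the Nutch code.
class ToIntSketch {
    static final int SIZEOF_INT = Integer.BYTES; // 4 bytes

    // Decodes a big-endian int, refusing arrays that are too short.
    static int toInt(byte[] bytes, int offset, int length) {
        // Assumed shape of the "if condition" discussed above.
        if (length != SIZEOF_INT || offset + length > bytes.length) {
            throw new IllegalArgumentException("offset (" + offset
                + ") + length (" + length
                + ") exceed the capacity of the array: " + bytes.length);
        }
        int n = 0;
        for (int i = offset; i < offset + length; i++) {
            n <<= 8;
            n ^= bytes[i] & 0xFF;
        }
        return n;
    }

    // A float is read as a 4-byte int and then reinterpreted bit for bit.
    static float toFloat(byte[] bytes) {
        return Float.intBitsToFloat(toInt(bytes, 0, SIZEOF_INT));
    }

    public static void main(String[] args) {
        // A properly serialized float occupies 4 bytes and decodes fine.
        byte[] ok = ByteBuffer.allocate(4).putFloat(1.0f).array();
        System.out.println(toFloat(ok));

        // A 2-byte value cannot hold a float, so the check rejects it.
        byte[] tooShort = new byte[2];
        try {
            toFloat(tooShort);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that is roughly what the real method does, then whatever value
OPICScoringFilter tries to read as a float is only 2 bytes long, so it
would help to know what that value is and where it was written.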

Can you reproduce this and explain a bit more about the outlinks.size() for
the URL?

Thanks

Lewis

[0]
http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdateMapper.java?view=markup
[1]
http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/util/Bytes.java?view=markup

On Wed, Dec 19, 2012 at 11:35 AM, Stanislav Orlenko
<[email protected]> wrote:

> Hello
> Has anyone faced such a problem?
>
> java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 2
>         at org.apache.nutch.util.Bytes.explainWrongLengthOrOffset(Bytes.java:559)
>         at org.apache.nutch.util.Bytes.toInt(Bytes.java:740)
>         at org.apache.nutch.util.Bytes.toFloat(Bytes.java:611)
>         at org.apache.nutch.util.Bytes.toFloat(Bytes.java:598)
>         at org.apache.nutch.scoring.opic.OPICScoringFilter.distributeScoreToOutlinks(OPICScoringFilter.java:128)
>         at org.apache.nutch.scoring.ScoringFilters.distributeScoreToOutlinks(ScoringFilters.java:117)
>         at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:70)
>         at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:37)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>
> Nutch version is 2.1.
>
> Thanks
>



-- 
*Lewis*