Hi,

In Alex's case, the exception means that a row key in the database is
malformed. Judging by the TableUtil lines in the stack trace, a URL was
stored without a protocol (or at least without a ":"). The likely cause is
that redirected URLs are not checked for well-formedness: line 664 of
FetcherReducer (HEAD) writes the new URL directly as a row in the database.
I have never hit this exception myself, possibly because I changed some
behaviour so that a redirected URL is handled more like a regular outlink.
I have created an issue for this and will update it shortly:
https://issues.apache.org/jira/browse/NUTCH-1448
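
To make the failure mode concrete, here is a minimal sketch of the key
layout described in the quoted message below (reversed host, then protocol,
then optional port, then path). This is not the actual TableUtil code; the
class and method names are made up for illustration, and the explicit
well-formedness checks are an assumption about what a fix could look like.
A key without a ":" before the path is rejected with a clear message
instead of failing with an ArrayIndexOutOfBoundsException.

```java
import java.net.URI;

// Hypothetical illustration only; not the Nutch TableUtil implementation.
final class ReversedUrlSketch {

    // "bar.foo.com" -> "com.foo.bar" (and back again when applied twice).
    static String reverseHost(String host) {
        String[] labels = host.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]);
            if (i > 0) {
                sb.append('.');
            }
        }
        return sb.toString();
    }

    // "http://bar.foo.com:8983/to/index.html?a=b"
    //   -> "com.foo.bar:http:8983/to/index.html?a=b"
    static String reverseUrl(String urlString) {
        URI uri = URI.create(urlString);
        // Refuse anything without a scheme and host, e.g. a redirect
        // target that was never resolved to an absolute URL.
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Not an absolute URL: " + urlString);
        }
        StringBuilder key = new StringBuilder(reverseHost(uri.getHost()));
        key.append(':').append(uri.getScheme());
        if (uri.getPort() != -1) {
            key.append(':').append(uri.getPort());
        }
        if (uri.getRawPath() != null) {
            key.append(uri.getRawPath());
        }
        if (uri.getRawQuery() != null) {
            key.append('?').append(uri.getRawQuery());
        }
        return key.toString();
    }

    // Inverse of reverseUrl. Validates the key before indexing into the
    // split result, which is where an unchecked version would blow up.
    static String unreverseUrl(String key) {
        int pathStart = key.indexOf('/');
        String head = pathStart == -1 ? key : key.substring(0, pathStart);
        String path = pathStart == -1 ? "" : key.substring(pathStart);
        String[] parts = head.split(":");
        if (parts.length < 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Malformed row key (no protocol): " + key);
        }
        StringBuilder url = new StringBuilder(parts[1]);
        url.append("://").append(reverseHost(parts[0]));
        if (parts.length > 2) {
            url.append(':').append(parts[2]);
        }
        return url.append(path).toString();
    }

    public static void main(String[] args) {
        String key = reverseUrl("http://bar.foo.com:8983/to/index.html?a=b");
        System.out.println(key);
        System.out.println(unreverseUrl(key));
        try {
            // A key that was written without a protocol part.
            unreverseUrl("com.foo.bar/to/index.html");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running the same kind of check on a redirect target before it is written
as a row would keep such keys out of the table in the first place.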

Ferdy.

On Mon, Aug 13, 2012 at 2:52 AM, <[email protected]> wrote:

> The URL is stored in a different order (reversed domain
> name:protocol:port and path) from the one you normally see in your web
> browser so that it can be looked up more quickly in NoSQL data stores
> like HBase. Nutch provides a brief explanation and convenience utility
> methods for this in TableUtil
> (http://nutch.apache.org/apidocs-2.0/org/apache/nutch/util/TableUtil.html)
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Monday, August 13, 2012 9:25 AM
> To: [email protected]
> Subject: updatedb error in nutch-2.0
>
>
>
> Hello,
>
>
> I get the following error when I run bin/nutch updatedb in nutch-2.0 with
> HBase:
>
> java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.nutch.util.TableUtil.unreverseUrl(TableUtil.java:98)
>         at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:54)
>         at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:37)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>
> I see this is because of reversing and unreversing urls. What is the
> idea behind this reversal and unreversal in nutch-2.0?
>
> Thanks.
> Alex.
>
>
>
