I found out that the key sent to unreverseUrl in DbUpdateMapper.map was ":index.php/http".
This happened at depth 3. I checked the seed file, and there was no line of the form http:/index.php.

Thanks.
Alex.

-----Original Message-----
From: Ferdy Galema <[email protected]>
To: user <[email protected]>
Sent: Mon, Aug 13, 2012 1:53 am
Subject: Re: updatedb error in nutch-2.0

Hi,

In the specific case of Alex, it means that a row name in the database is malformed. Looking at the stack trace lines in TableUtil, it looks like a URL is stored without a protocol (or at least without a ":"). This is probably because redirected URLs are not correctly checked for well-formedness. If you look at line 664 in the FetcherReducer (HEAD), it writes out a new URL directly as a row in the database. I have never experienced this exception, which might be because I changed some behaviour so that a redirected URL is handled a bit more like a general outlink.

I have created an issue for this that I will update shortly:
https://issues.apache.org/jira/browse/NUTCH-1448

Ferdy.

On Mon, Aug 13, 2012 at 2:52 AM, <[email protected]> wrote:

> The url is stored in a different order (reversed domain
> name:protocol:port and path) from the order normally seen in your web
> browser so that it can be searched more quickly in NoSQL data stores
> like hbase.
> Nutch has a brief explanation and convenience utility
> methods around this at TableUtil
> (http://nutch.apache.org/apidocs-2.0/org/apache/nutch/util/TableUtil.html)
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Monday, August 13, 2012 9:25 AM
> To: [email protected]
> Subject: updatedb error in nutch-2.0
>
> Hello,
>
> I get the following error when I do bin/nutch updatedb in nutch-2.0 with
> hbase
>
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.nutch.util.TableUtil.unreverseUrl(TableUtil.java:98)
>     at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:54)
>     at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:37)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>
> I see this is because of reversing and unreversing urls. What is the
> idea behind this reversal and unreversal in nutch-2.0?
>
> Thanks.
> Alex.
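
For readers following the thread, the key layout described in the quoted reply (reversed domain name:protocol:port and path) can be illustrated with a minimal sketch. This is not the code in Nutch's TableUtil: the class name ReversedUrlSketch and both method bodies are assumptions written only for illustration, and IPv6 host literals are not handled. It shows the round trip for a well-formed URL, and one way a key with an empty host part, such as the ":index.php/http" reported in this thread, could be rejected explicitly rather than failing on an array index.

```java
import java.net.URI;

// Illustrative sketch only; not the Nutch TableUtil implementation.
class ReversedUrlSketch {

    /** Builds "reversed.host:protocol[:port]/path" from an absolute URL. */
    static String reverseUrl(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        String scheme = uri.getScheme();
        // "http:/index.php" parses with no authority, so getHost() is null.
        if (host == null || host.isEmpty() || scheme == null) {
            throw new IllegalArgumentException("URL needs a scheme and a host: " + url);
        }
        StringBuilder key = new StringBuilder();
        appendReversed(host.split("\\."), key);
        key.append(':').append(scheme);
        if (uri.getPort() != -1) {
            key.append(':').append(uri.getPort());
        }
        if (uri.getRawPath() != null) {
            key.append(uri.getRawPath());
        }
        if (uri.getRawQuery() != null) {
            key.append('?').append(uri.getRawQuery());
        }
        return key.toString();
    }

    /** Inverse of reverseUrl; rejects keys whose host or protocol part is missing. */
    static String unreverseUrl(String key) {
        int pathBegin = key.indexOf('/');
        if (pathBegin == -1) {
            pathBegin = key.length();
        }
        // parts = {reversed host, protocol[, port]}; -1 keeps empty tokens.
        String[] parts = key.substring(0, pathBegin).split(":", -1);
        if (parts.length < 2 || parts.length > 3
                || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Malformed reversed URL key: " + key);
        }
        StringBuilder url = new StringBuilder(parts[1]).append("://");
        appendReversed(parts[0].split("\\."), url);
        if (parts.length == 3) {
            url.append(':').append(parts[2]);
        }
        url.append(key.substring(pathBegin));
        return url.toString();
    }

    /** Appends the dot-separated labels in reverse order. */
    private static void appendReversed(String[] labels, StringBuilder out) {
        for (int i = labels.length - 1; i >= 0; i--) {
            out.append(labels[i]);
            if (i > 0) {
                out.append('.');
            }
        }
    }

    public static void main(String[] args) {
        String key = reverseUrl("http://nutch.apache.org/index.php");
        System.out.println(key);               // org.apache.nutch:http/index.php
        System.out.println(unreverseUrl(key)); // http://nutch.apache.org/index.php
        try {
            unreverseUrl(":index.php/http");   // the key reported in this thread
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because keys start with the reversed host, rows from the same domain sort next to each other in the store, which fits the quoted explanation about faster lookups. Validating at write time (for example, before a redirected URL is written out as a row) would surface a bad URL earlier than at updatedb time; whether that matches the eventual fix for NUTCH-1448 is not something this sketch claims.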

