Oft. This is a brutal issue to raise on a Friday afternoon ;)
For reference, the Cassandra issue is
https://issues.apache.org/jira/browse/NUTCH-1390
There have been no changes to the WebTableReader for over two months:
http://svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/crawl/WebTableReader.java
It is great that more people are using 2.x HEAD, especially you guys using it with HBase.
Are you able to check whether page.getPrevModifiedTime() and getBatchId() return values? If not, this may be why we are throwing the NPE.
All I can think of (right now) is that some URL entries do not have such fields for when they were generated... or something similar with the other accessor.
By the sounds of it, we've introduced one or more bugs in the Generator area with the most recent commits to 2.x HEAD!

On Fri, Mar 29, 2013 at 4:35 PM, <[email protected]> wrote:

> Yes, with hbase. Here is the error
>
> 13/03/29 16:33:29 INFO zookeeper.ZooKeeper: Session: 0x13d7770d67d005f closed
> 13/03/29 16:33:29 ERROR crawl.WebTableReader: WebTableReader: java.lang.NullPointerException
>     at org.apache.gora.hbase.store.HBaseStore.addFields(HBaseStore.java:398)
>     at org.apache.gora.hbase.store.HBaseStore.execute(HBaseStore.java:360)
>     at org.apache.nutch.crawl.WebTableReader.read(WebTableReader.java:234)
>     at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:476)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:412)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> If I revert to previous release it works fine.
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: Lewis John Mcgibbney <[email protected]>
> To: user <[email protected]>
> Sent: Fri, Mar 29, 2013 4:30 pm
> Subject: Re: error using generate in 2.x
>
> Hi Alex,
> With HBase also?
> There 'was' a bug in gora-cassandra module for this command + params
> however I thought it had been addressed and therefore resolved it.
> Lewis
>
> On Fri, Mar 29, 2013 at 4:00 PM, <[email protected]> wrote:
>
> > Hi,
> >
> > It seems that trunk has a few bugs. I found out that readdb -url urlname
> > also gives errors.
> >
> > Thanks.
> > Alex.
> >
> > -----Original Message-----
> > From: kaveh minooie <[email protected]>
> > To: user <[email protected]>
> > Sent: Fri, Mar 29, 2013 1:53 pm
> > Subject: Re: error using generate in 2.x
> >
> > Hi lewis
> >
> > the mapping file that I am using is the one that comes with nutch, and I
> > haven't touched it. this message in the log is caused by using the
> > -crawlId on the command line. for example this log was the result of
> > this command:
> >
> > bin/nutch generate -topN 1000 -crawlId t1
> >
> > which causes nutch (or I guess technically gora) to use a table
> > name 't1_webpage'. though, I have to say that I don't understand the
> > rationale behind the code generating a warning like this (I mean I know
> > it is not actually a warning, just that the way the message has been
> > phrased makes it look like a warning) for something that should be a
> > routine operation for someone like me who is crawling (I mean hoping
> > to, because it is not working right now) thousands of websites and
> > maintaining multiple crawldbs (or the equivalent in gora, webpage tables)
> > for different groups of websites.
> >
> > Now that being said, it has nothing to do with the problem that I am
> > having. it is the same when I omit the -crawlId parameter (forcing it
> > to use the default name webpage), and more importantly it is new. I
> > haven't had this problem before, it just started happening 2 days ago
> > when I pulled the latest commits to the 2.x branch.
> >
> > On 03/29/2013 09:50 AM, Lewis John Mcgibbney wrote:
> > > Hi Kaveh,
> > > Firstly, as logged below, Gora attempts to associate your HBase table
> > > configuration with specified tables (from within gora-hbase-mapping.xml)
> > > however it seems that your case satisfies the condition
> > > "if (!tableName.equals(tableNameFromMapping))", meaning that the table
> > > name is not equal to the value for the table name attribute or that this
> > > value is null.
> > > This is allowed, but I am interested to find out what the mapping file
> > > looks like... the entire file is not required, just the
> > > <class name="value" snippet if this is possible.
> > > I am not using the gora-hbase module and haven't ever seen anyone come
> > > across this problem before.
> > > Thanks
> > > Lewis
> > >
> > > On Thursday, March 28, 2013, kaveh minooie <[email protected]> wrote:
> > >
> > > > 2013-03-28 11:06:25,158 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 't1_webpage' , assuming they are the same.
> >
> > --
> > Kaveh Minooie
>
> --
> *Lewis*

--
*Lewis*
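
For anyone reading this thread later, the table-name check quoted above (the "if (!tableName.equals(tableNameFromMapping))" condition and the 't1_webpage' log line) can be sketched roughly as follows. This is only an illustration under assumptions, not Gora's or Nutch's actual code: the class TableNameCheck and the methods resolveTableName and describe are hypothetical names, and the "crawlId + underscore + table name" rule is inferred solely from the -crawlId t1 example in the thread.

```java
public class TableNameCheck {

    // Hypothetical helper (not a Gora/Nutch API): derive the table name that
    // is actually used. Assumption taken from the thread: "-crawlId t1" with a
    // mapping-file name of "webpage" yields "t1_webpage"; with no crawl id the
    // mapping-file name is used unchanged.
    public static String resolveTableName(String crawlId, String tableNameFromMapping) {
        if (crawlId == null || crawlId.isEmpty()) {
            return tableNameFromMapping;
        }
        return crawlId + "_" + tableNameFromMapping;
    }

    // Mirrors the quoted condition. The branch is taken when the names differ,
    // and also when tableNameFromMapping is null, because equals(null) is false.
    public static String describe(String tableName, String tableNameFromMapping) {
        if (!tableName.equals(tableNameFromMapping)) {
            return "mismatching table names: mapping file schema is '"
                    + tableNameFromMapping + "' vs actual schema '"
                    + tableName + "', assuming they are the same";
        }
        return "table names match: '" + tableName + "'";
    }

    public static void main(String[] args) {
        String fromMapping = "webpage";

        // With a crawl id the names differ, so the informational message is produced.
        String actual = resolveTableName("t1", fromMapping);
        System.out.println(describe(actual, fromMapping));

        // Without a crawl id the default name is used and the names match.
        System.out.println(describe(resolveTableName(null, fromMapping), fromMapping));
    }
}
```

Under these assumptions the mismatch is informational only: a crawl-id-prefixed table is an expected configuration, which is consistent with Kaveh's point that the NPE also occurs without -crawlId.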

