I use git and I fetch from GitHub (https://github.com/apache/nutch.git).
Currently I am on this commit:
commit 4bb01d6b908dc230c8be89d398b03a86581ec42b
Author: lufeng <[email protected]>
Date: Thu Mar 28 13:09:09 2013 +0000
NUTCH-1547 BasicIndexingFilter - Problem to index full title
git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1462079 13f79535-47bb-0310-9956-ffa450edef68
Before that, I was on this commit:
commit f02dcf62566583551426c08bd388080e5b2bc93e
> f02dcf6 NUTCH-XX remove unused db.max.inlinks from nutch-default.xml
On 03/29/2013 04:35 PM, [email protected] wrote:
Yes, with hbase. Here is the error
13/03/29 16:33:29 INFO zookeeper.ZooKeeper: Session: 0x13d7770d67d005f closed
13/03/29 16:33:29 ERROR crawl.WebTableReader: WebTableReader:
java.lang.NullPointerException
at org.apache.gora.hbase.store.HBaseStore.addFields(HBaseStore.java:398)
at org.apache.gora.hbase.store.HBaseStore.execute(HBaseStore.java:360)
at org.apache.nutch.crawl.WebTableReader.read(WebTableReader.java:234)
at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:476)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:412)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
If I revert to the previous release it works fine.
Thanks.
Alex.
-----Original Message-----
From: Lewis John Mcgibbney <[email protected]>
To: user <[email protected]>
Sent: Fri, Mar 29, 2013 4:30 pm
Subject: Re: error using generate in 2.x
Hi Alex,
With HBase also?
There 'was' a bug in the gora-cassandra module for this command + params;
however, I thought it had been addressed and therefore resolved it.
Lewis
On Fri, Mar 29, 2013 at 4:00 PM, <[email protected]> wrote:
Hi,
It seems that trunk has a few bugs. I found out that readdb -url urlname
also gives errors.
Thanks.
Alex.
-----Original Message-----
From: kaveh minooie <[email protected]>
To: user <[email protected]>
Sent: Fri, Mar 29, 2013 1:53 pm
Subject: Re: error using generate in 2.x
Hi Lewis,
The mapping file that I am using is the one that comes with Nutch, and I
haven't touched it. This message in the log is caused by using
-crawlId on the command line. For example, this log was the result of
this command:
bin/nutch generate -topN 1000 -crawlId t1
which causes Nutch (or, I guess, technically Gora) to use the table
name 't1_webpage'. Though, I have to say that I don't understand the
rationale behind the code generating a warning like this (I know it is
not actually a warning, just that the way the message has been phrased
makes it look like one) for something that should be a routine
operation for someone like me, who is crawling (I mean hoping to,
because it is not working right now) thousands of websites and
maintains multiple crawldbs (or their equivalent in Gora, webpage
tables) for different groups of websites.
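For readers following the thread, the naming behaviour described here can be sketched roughly as below. This is only an illustration, not Nutch or Gora source code: the class `TableNameSketch` and the method `tableNameFor` are hypothetical names, and the sketch assumes the table name is simply the crawl ID, an underscore, and the default schema name, as the 't1_webpage' example suggests.

```java
// Illustrative sketch only; class and method names are hypothetical,
// not part of the Nutch or Gora API.
class TableNameSketch {

    // Assumption: with a crawl ID the table is "<crawlId>_<defaultName>",
    // and without one the default name is used unchanged.
    static String tableNameFor(String crawlId, String defaultName) {
        if (crawlId == null || crawlId.isEmpty()) {
            return defaultName;
        }
        return crawlId + "_" + defaultName;
    }

    public static void main(String[] args) {
        // Mirrors "bin/nutch generate -topN 1000 -crawlId t1"
        System.out.println(tableNameFor("t1", "webpage"));
        // Mirrors running the same command without -crawlId
        System.out.println(tableNameFor(null, "webpage"));
    }
}
```

Under these assumptions the first call yields the name reported in the log ('t1_webpage') and the second falls back to the default 'webpage'.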
Now, that being said, it has nothing to do with the problem that I am
having. It is the same when I omit the -crawlId parameter (forcing it
to use the default name, webpage), and more importantly it is new. I
haven't had this problem before; it just started happening 2 days ago
when I pulled the latest commits to the 2.x branch.
On 03/29/2013 09:50 AM, Lewis John Mcgibbney wrote:
Hi Kaveh,
Firstly, as logged below, Gora attempts to associate your HBase table
configuration with the specified tables (from within gora-hbase-mapping.xml);
however, it seems that your case satisfies the condition "if
(!tableName.equals(tableNameFromMapping))", meaning that the table name
is not equal to the value of the table name attribute, or that this
value is null.
This is allowed, but I am interested to find out what the mapping file
looks like... the entire file is not required, just the <class name="value"
snippet if this is possible.
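The condition quoted above can be illustrated with a small standalone sketch. It is not the Gora implementation: the class `TableNameCheckSketch` and the method `namesMismatch` are hypothetical, and only the expression `!tableName.equals(tableNameFromMapping)` is taken from this thread. It shows why both a different name and a null mapping value satisfy the condition.

```java
// Illustrative sketch only; not Gora source code. Names are hypothetical.
class TableNameCheckSketch {

    // Same expression as quoted in the thread. String.equals(null) returns
    // false, so a null value from the mapping also counts as a mismatch.
    static boolean namesMismatch(String tableName, String tableNameFromMapping) {
        return !tableName.equals(tableNameFromMapping);
    }

    public static void main(String[] args) {
        String actual = "t1_webpage";      // table name in use
        String fromMapping = "webpage";    // value from the mapping file
        if (namesMismatch(actual, fromMapping)) {
            // Informational only: processing continues with the actual name.
            System.out.println("mapping file schema is '" + fromMapping
                    + "' vs actual schema '" + actual + "'");
        }
    }
}
```

In this sketch a mismatch is only reported, never treated as an error, which matches the "This is allowed" remark.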
I am not using the gora-hbase module and haven't ever seen anyone come
across this problem before.
Thanks
Lewis
On Thursday, March 28, 2013, kaveh minooie <[email protected]> wrote:
2013-03-28 11:06:25,158 INFO store.HBaseStore - Keyclass and nameclass
match but mismatching table names mappingfile schema is 'webpage' vs
actual schema 't1_webpage' , assuming they are the same.
--
Kaveh Minooie