I solved my issue, and I'm writing up how I fixed it in case somebody else runs into the same problem.
*/etc/hosts* was not configured properly: I had to configure it as described
in [0]. On each machine of my cluster, I had to comment out the line
*127.0.0.1 localhost* and add *localhost* to the line where my master's
address was written. A before/after sketch of the change is at the bottom of
this message, below the quoted thread.

[0] http://stackoverflow.com/questions/7791788/hbase-client-do-not-able-to-connect-with-remote-hbase-server

2013/1/31 Adriana Farina <[email protected]>

> Hello,
>
> I've set up a cluster of 4 machines with Hadoop 1.0.4 and I'm trying to
> run Nutch 2.0 in distributed mode, using HBase 0.90.4 to store crawling
> information.
> I've followed the Nutch2Tutorial <https://wiki.apache.org/nutch/Nutch2Tutorial>
> and configured HBase following the guide
> http://hbase.apache.org/book/quickstart.html.
> However, when I try to run Nutch, the crawling process runs for a little
> bit and then I get the following exception:
>
> org.apache.gora.util.GoraException: org.apache.hadoop.hbase.MasterNotRunningException: master:60000
>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:118)
>         at org.apache.gora.mapreduce.GoraOutputFormat.getRecordWriter(GoraOutputFormat.java:88)
>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hadoop.hbase.MasterNotRunningException: master:60000
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:396)
>         at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
>         at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:108)
>         at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
>         at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
>         ... 10 more
>
> After that, the crawling process keeps on running, but after some
> map/reduce cycles it outputs that exception again, and so on.
> The strange thing is that the HBase master is up and running: there are no
> errors in the log files and I can access http://localhost:60010/ with no
> problem.
>
> My hbase-site.xml is:
>
> <configuration>
>
>   <property>
>     <name>hbase.master</name>
>     <value>crawler1a:60000</value>
>     <description>The host and port that the HBase master runs
>     at.</description>
>   </property>
>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://*master ip address*:54310/hbase</value>
>     <description>The directory shared by region servers.</description>
>   </property>
>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>     <description>The mode the cluster will be in. Possible values are
>       false: standalone and pseudo-distributed setups with managed Zookeeper
>       true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
>     </description>
>   </property>
>
>   <!--<property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>*master ip address*</value>
>   </property>-->
>
>   <property>
>     <name>hbase.zookeeper.property.dataDir</name>
>     <value>/usr/local/hbase-0.90.4/zookeeper_data</value>
>   </property>
>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>*cluster machines addresses*</value>
>     <description>Comma separated list of servers in the ZooKeeper Quorum.
>       For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>       By default this is set to localhost for local and pseudo-distributed
>       modes of operation. For a fully-distributed setup, this should be set
>       to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set
>       in hbase-env.sh this is the list of servers which we will start/stop
>       ZooKeeper on.
>     </description>
>   </property>
>
>   <property>
>     <name>zookeeper.session.timeout</name>
>     <value>30000</value>
>     <description>ZooKeeper session timeout.
>       HBase passes this to the zk quorum as suggested maximum time for a
>       session (this setting becomes zookeeper's 'maxSessionTimeout'). See
>       http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
>       "The client sends a requested timeout, the server responds with the
>       timeout that it can give the client." In milliseconds.
>     </description>
>   </property>
>
> </configuration>
>
> Searching on Google, I found that it can be an issue with /etc/hosts, but
> it's correctly configured:
>
>   127.0.0.1 crawler1a localhost.localdomain localhost
>
> where crawler1a is the master machine both for Hadoop and for HBase.
>
> Can anybody help?
>
> Thank you very much.
>
> --
> Adriana Farina

--
Adriana Farina
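P.S. For reference, here is a before/after sketch of the /etc/hosts change on
each cluster machine. The address 192.168.1.10 is a made-up placeholder for
the master's real IP; substitute your own (and keep whatever entries you have
for the other nodes):

    # Before (crawler1a resolved to the loopback address, so remote clients
    # failed with MasterNotRunningException):
    127.0.0.1   crawler1a localhost.localdomain localhost

    # After (the loopback line is commented out; localhost is added to the
    # line carrying the master's real address):
    # 127.0.0.1   crawler1a localhost.localdomain localhost
    192.168.1.10   crawler1a localhost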
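P.P.S. To double-check that the master is reachable after editing /etc/hosts
and before re-running the crawl, a minimal sketch like the one below can be
compiled against the HBase 0.90.x client jars. The class name is my own
invention, and it assumes hbase-site.xml is on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.MasterNotRunningException;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CheckHBaseMaster {
        public static void main(String[] args) throws Exception {
            // Picks up hbase-site.xml (and hbase-default.xml) from the classpath.
            Configuration conf = HBaseConfiguration.create();
            try {
                // Throws MasterNotRunningException when the master cannot be
                // reached -- the same failure Gora surfaced in the trace above.
                HBaseAdmin.checkHBaseAvailable(conf);
                System.out.println("HBase master is reachable.");
            } catch (MasterNotRunningException e) {
                System.out.println("Master not running: " + e.getMessage());
            }
        }
    }

Running "status" in the hbase shell on the master gives a similar quick check
from the command line.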

