Hi Laxmi,

Sorry for my late reply. Is your total RAM 6 GB? If that is the total memory of the computer, the operating system and other processes use it too, so you should decrease your heap size. You can read about every property in the HBase book, but I will try to explain why these properties are used.
- You should increase your *hbase.client.scanner.caching* property from 1 to 200. Gora has no filter method yet, so in Nutch 2.x every row is fetched from HBase for the GeneratorJob or FetchJob, which is expensive. If you increase this property as much as possible, each map/reduce task needs far fewer round trips.
- *hbase.regionserver.handler.count* is the number of RPC listener instances spun up on RegionServers. A little note on why the RPC count matters: "The importance of reducing the number of separate RPC calls is tied to the round-trip time, which is the time it takes for a client to send a request and the server to send a response over the network. This does not include the time required for the data transfer; it is simply the overhead of sending packets over the wire. On average, a round trip takes about 1 ms on a LAN, which means you can handle only 1,000 round trips per second. The other important factor is the message size: if you send large requests over the network, you already need a much lower number of round trips, as most of the time is spent transferring data. But when doing, for example, counter increments, which are small in size, you will see better performance when batching updates into fewer requests." (From: HBase: The Definitive Guide, page 86.)
- *hbase.client.write.buffer* is the default size of the HTable client write buffer in bytes. A bigger buffer takes more memory on both the client and server side, since the server instantiates the passed write buffer to process it, but a larger buffer reduces the number of RPCs made. For an estimate of the server-side memory used (your region server runs on the same machine), evaluate hbase.client.write.buffer * hbase.regionserver.handler.count per region server.
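To make the round-trip arithmetic above concrete, here is a small back-of-the-envelope sketch. It assumes the ~1 ms LAN round trip quoted from the book, ignores data-transfer time, and uses illustrative row counts only:

```python
# Rough model of scan RPC overhead. Assumes ~1 ms per round trip on a LAN
# (the figure quoted above) and ignores actual data-transfer time.
ROUND_TRIP_MS = 1.0

def scan_overhead_ms(rows, caching):
    """Round-trip overhead for scanning `rows` rows with a given
    hbase.client.scanner.caching value (one RPC per cached batch)."""
    rpcs = -(-rows // caching)  # ceiling division
    return rpcs * ROUND_TRIP_MS

# Scanning 100,000 rows:
print(scan_overhead_ms(100_000, 1))    # caching=1   -> 100000.0 ms of overhead
print(scan_overhead_ms(100_000, 200))  # caching=200 -> 500.0 ms
```

With caching at 1, round-trip overhead alone dominates a full-table scan, which is why Nutch jobs that read every row suffer so much.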
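The write-buffer memory estimate can be worked through with the values that appear later in this thread (20 MB buffer, 20 handlers; these are Talat's numbers, not universal recommendations):

```python
# Server-side memory that client write buffers can pin per region server:
# hbase.client.write.buffer * hbase.regionserver.handler.count
write_buffer_bytes = 20_971_520   # hbase.client.write.buffer (20 MB)
handler_count = 20                # hbase.regionserver.handler.count

per_regionserver_bytes = write_buffer_bytes * handler_count
print(per_regionserver_bytes // (1024 * 1024))  # -> 400 (MB per region server)
```

So on a 6 GB machine that also runs Hadoop and Nutch, those two settings alone can claim 400 MB of region-server heap in the worst case.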
In my opinion, your problem is *hbase.client.scanner.caching*. If you increase it, your problem should be solved. If not, you can try the other properties.
Have a nice day,
Talat

On 21-10-2013 04:31, A Laxmi wrote:
Hi Talat - Since I am running HBase in pseudo-distributed mode for Nutch, I have changed some of your properties: *hbase.client.scanner.caching* to 1, *hbase.regionserver.handler.count* to 10 and *hbase.client.write.buffer* to 2097152. I emailed the hbase-user list with the out-of-memory error, hoping to find some help. I don't understand why crawling goes fine until about the 5th iteration, and then, during the parsing stage of some URLs in that iteration, Nutch just hangs with an out-of-memory error: heap space. I have 6 GB of RAM, but I am not crawling a million records; as a test run I am just trying to crawl a URL with depth 7 and topN 1000. I am not sure what can be done in this case. Thanks for your help, Laxmi

On Sun, Oct 20, 2013 at 3:55 AM, Talat UYARER <[email protected]> wrote:

Hey Laxmi, First of all, please send your email to our mailing list; maybe somebody can share their experiences. If you use my settings without changing any values, your heap will run out of memory. That is normal: I have 64 GB of RAM on my datanodes. You should change my settings to suit your computer.

On 20-10-2013 05:13, A Laxmi wrote:

Hi Talat! Update - So I added some of the properties you recommended for tuning, and I have some good and bad news. The good news is that the RegionServer was not getting disconnected under heavy crawl (thanks to Talat!!!!), and the bad news is that I am getting an out-of-memory: heap space exception in the 5th crawl iteration. I have set 8 GB for the heap in hbase-env.sh. Not sure why I have this out-of-memory: heap space issue. Please comment. Thanks, Laxmi

On Fri, Oct 18, 2013 at 4:16 PM, A Laxmi <[email protected]> wrote:

Thanks Talat! I will try it out again, and you will be the first person I notify if mine works. I will keep you posted. Thanks, Laxmi

On Fri, Oct 18, 2013 at 3:52 PM, Talat UYARER <[email protected]> wrote:

For the issue, I didn't see any problem. You need some properties for tuning.
But it is not our subject. I share my hbase-site.xml; if I remember correctly, this issue is related only to HBase.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hdpnn01.secret.local:8080/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hdpzk01.secret.local,hdpzk02.secret.local,hdpzk03.secret.local</value>
  </property>
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>200</value>
  </property>
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>120000</value>
  </property>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>900000</value>
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>900000</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>20</value>
  </property>
  <property>
    <name>hbase.client.write.buffer</name>
    <value>20971520</value>
  </property>
</configuration>

On 18-10-2013 20:34, A Laxmi wrote:

Thanks, Talat! I will try with those properties you recommended. Please look at my other properties here and let me know your comments.
HBase: 0.90.6
Hadoop: 0.20.205.0
Nutch: 2.2.1

Note: I have set 8 GB for the heap size in *hbase/conf/hbase-env.sh*

=========================================
Below is my *hbase/conf/hbase-site.xml*:
=========================================

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/hadoop/hbase-0.90.6/zookeeper</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>

=========================================
Below is my *hadoop/conf/hdfs-site.xml*:
=========================================

<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.safemode.extension</name>
  <value>0</value>
</property>
<property>
  <name>dfs.safemode.min.datanodes</name>
  <value>1</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/dfs/tmp</value>
</property>

=========================================
Below is my *hadoop/conf/core-site.xml*:
=========================================

<property>
  <name>fs.default.name</name>
  <!-- <value>hdfs://0.0.0.0:8020</value> -->
  <value>hdfs://localhost:8020</value>
</property>

=========================================
Below is my *hadoop/conf/mapred-site.xml*:
=========================================

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>0.0.0.0:8021</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>3600000</value>
  </property>
</configuration>

On Fri, Oct 18, 2013 at 12:01 PM, Talat UYARER <[email protected]> wrote:

Ooh no :(. Sorry Laxmi, I had the same issue; I gave you the wrong settings :) You should set:

- hbase.client.scanner.caching (my value: 200)
- hbase.regionserver.handler.count (my value: 20)
- hbase.client.write.buffer (my value: 20971520)

You should set these for your region load. Then it will be solved. Talat

On 18-10-2013 17:52, A Laxmi wrote:

Hi Talat, I am sorry to say I have not fixed it yet. I have spent literally sleepless nights debugging that issue. No matter what I do, the RegionServer always gets disconnected. :( Since I now have a deadline in two days, I will go with the advice from one of your emails to use HBase *standalone* mode, since I am crawling about 10 URLs to reach about 300,000 URLs. Once I get that done, I will retry debugging the RegionServer issue. I remember you use Hadoop 1.x and not 0.20.205.0 like me; I am not sure if there is a bug in the version I am using? Thanks, Laxmi

On Fri, Oct 18, 2013 at 10:47 AM, Talat UYARER <[email protected]> wrote:

Hi Laxmi, You are welcome; I know that feeling very well. I haven't used Cloudera; I use a plain Hadoop/HBase cluster installed on CentOS. I am happy that you fixed your issue. Talat

On 18-10-2013 17:36, A Laxmi wrote:

Thanks for the article, Talat! It was so annoying to see the RegionServer getting disconnected under heavy load while everything else works. Have you used Cloudera for Nutch?
On Thu, Oct 17, 2013 at 6:50 PM, Talat UYARER <[email protected]> wrote:

Hi Laxmi, It didn't reach me. I understand: your RegionServer has gone away, because your HBase heap size or xceivers count is not enough. I had the same issue and raised my xceivers count. I am not sure what the count should be, but you should first check your heap usage; if that is enough, you can raise this property. This article [1] is very good about this property.

[1] http://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/

Talat

On 17-10-2013 22:20, A Laxmi wrote:

Hi Talat, I hope all is well. Could you please advise me on this issue? Thanks for your help!

---------- Forwarded message ----------
From: A Laxmi <[email protected]>
Date: Tue, Oct 15, 2013 at 12:07 PM
Subject: HBase Pseudo mode - RegionServer disconnects after some time
To: [email protected]

Hi - Please find below the log of the HBase master. I have tried all sorts of fixes mentioned in various threads, yet I could not overcome this issue. I made sure I don't have 127.0.1.1 in my /etc/hosts file. I pinged my localhost (hostname), which gives back the actual IP and not 127.0.0.1, using ping -c 1 localhost. I have 'localhost' in my /etc/hostname and the actual IP address mapped to localhost.localdomain with localhost as an alias, something like:

/etc/hosts - 192.***.*.*** localhost.localdomain localhost
/etc/hostname - localhost

I am using *Hadoop 0.20.205.0 and HBase 0.90.6 in pseudo-distributed mode* for storing data crawled by Apache Nutch 2.2.1. I can start Hadoop and HBase, and when I do jps everything looks good; then, after I start the Nutch crawl, after about 40 minutes of crawling or so I can see Nutch hanging in about the 4th iteration of parsing, and at the same time, when I do jps, I can see everything except HRegionServer.
Below is the log. I tried all possible ways but couldn't overcome this issue. I really need someone from the HBase list to help me with it.

2013-10-15 02:02:08,285 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=56 entries from hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816329235
2013-10-15 02:02:08,285 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 28 of 29: hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816367672, length=64818440
2013-10-15 02:02:08,285 WARN org.apache.hadoop.hbase.util.FSUtils: Running on HDFS without append enabled may result in data loss
2013-10-15 02:02:08,554 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [127.0.0.1,60020,1381814216471]
2013-10-15 02:02:08,556 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1 at address=127.0.0.1:60020; java.net.ConnectException: Connection refused
2013-10-15 02:02:08,559 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Current cached META location is not valid, resetting
2013-10-15 02:02:08,601 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (2147483647ms)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:390)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:422)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:237)
        at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
        at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:88)
        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
2013-10-15 02:02:08,842 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: syncFs -- HDFS-200 -- not available, dfs.support.append=false
2013-10-15 02:02:08,842 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://localhost:8020/hbase/1_webpage/853ef78be7c0853208e865a9ff13d5fb/recovered.edits/0000000000000001556.temp region=853ef78be7c0853208e865a9ff13d5fb
2013-10-15 02:02:09,443 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=39 entries from hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816367672
2013-10-15 02:02:09,444 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 29 of 29: hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816657239, length=0

Thanks for your help!
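As an aside, the xceivers limit Talat mentions earlier in the thread is set in hdfs-site.xml. A typical fragment looks like the following; the value 4096 is only a commonly cited example (e.g. in the linked Cloudera article), not something this thread settled on, so treat it as a starting point:

```xml
<!-- hdfs-site.xml: raise the per-DataNode transceiver limit.
     Note the property name is historically misspelled ("xcievers"). -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```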

