Hey Laxmi,

First of all, please send your email to our mailing list; maybe somebody there can share their experiences.

If you use my settings without changing any values, you will run out of heap space. That is normal: I have 64 GB of RAM on my datanodes. You should adjust my settings to suit your own machine.
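
For example, the heap is set in conf/hbase-env.sh (a minimal sketch; the 8 GB figure is only an illustration, size it to your own RAM):

# conf/hbase-env.sh -- maximum heap for the HBase daemons, in MB.
# Example value only; do not copy my settings blindly on a smaller machine.
export HBASE_HEAPSIZE=8000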

On 20-10-2013 05:13, A Laxmi wrote:
Hi Talat!

Update - So I added some of the properties you recommended for tuning, and I have some good news and bad news. The good news is that the RegionServer was no longer getting disconnected under heavy crawl (thanks, Talat!); the bad news is that I hit an out-of-memory (heap space) exception in the 5th crawl iteration.

I have set an 8 GB heap in hbase-env.sh, so I am not sure why I am hitting this heap space issue. Please comment.

Thanks
Laxmi




On Fri, Oct 18, 2013 at 4:16 PM, A Laxmi <[email protected]> wrote:

    Thanks Talat! I will try it out again, and you will be the first
    person I notify if it works. I will keep you posted.

    Thanks
    Laxmi


    On Fri, Oct 18, 2013 at 3:52 PM, Talat UYARER <[email protected]> wrote:

        Looking at your setup, I didn't see any problem. You do need some
        properties for tuning, but that is a separate subject. I am
        sharing my hbase-site.xml below. If I remember correctly, this
        issue is related only to HBase.

        <configuration>
          <property>
            <name>hbase.rootdir</name>
            <value>hdfs://hdpnn01.secret.local:8080/hbase</value>
          </property>
          <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
          </property>
          <property>
            <name>hbase.hregion.max.filesize</name>
            <value>10737418240</value>
          </property>
          <property>
            <name>hbase.zookeeper.quorum</name>
            <value>hdpzk01.secret.local,hdpzk02.secret.local,hdpzk03.secret.local</value>
          </property>
          <property>
            <name>hbase.client.scanner.caching</name>
            <value>200</value>
          </property>
          <property>
            <name>hbase.client.scanner.timeout.period</name>
            <value>120000</value>
          </property>
          <property>
            <name>hbase.regionserver.lease.period</name>
            <value>900000</value>
          </property>
          <property>
            <name>hbase.rpc.timeout</name>
            <value>900000</value>
          </property>
          <property>
            <name>dfs.support.append</name>
            <value>true</value>
          </property>
          <property>
            <name>hbase.hregion.memstore.mslab.enabled</name>
            <value>true</value>
          </property>
          <property>
            <name>hbase.regionserver.handler.count</name>
            <value>20</value>
          </property>
          <property>
            <name>hbase.client.write.buffer</name>
            <value>20971520</value>
          </property>
        </configuration>

        On 18-10-2013 20:34, A Laxmi wrote:
        Thanks, Talat! I will try the properties you recommended. Please
        look at my other properties here and let me know your comments.

        HBase: 0.90.6
        Hadoop: 0.20.205.0
        Nutch: 2.2.1

        Note: I have set 8 GB for heap size in hbase/conf/hbase-env.sh

        =========================================
        Below is my hbase/conf/hbase-site.xml:
        =========================================
        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://localhost:8020/hbase</value>
        </property>
        <property>
          <name>hbase.cluster.distributed</name>
          <value>true</value>
        </property>
        <property>
          <name>hbase.zookeeper.quorum</name>
          <value>localhost</value>
        </property>
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.clientPort</name>
          <value>2181</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.dataDir</name>
          <value>/home/hadoop/hbase-0.90.6/zookeeper</value>
        </property>
        <property>
          <name>zookeeper.session.timeout</name>
          <value>60000</value>
        </property>

        ========================================
        Below is my hadoop/conf/hdfs-site.xml:
        ========================================

        <property>
          <name>dfs.support.append</name>
          <value>true</value>
        </property>
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
        <property>
          <name>dfs.safemode.extension</name>
          <value>0</value>
        </property>
        <property>
          <name>dfs.safemode.min.datanodes</name>
          <value>1</value>
        </property>
        <property>
          <name>dfs.permissions.enabled</name>
          <value>false</value>
        </property>
        <property>
          <name>dfs.permissions</name>
          <value>false</value>
        </property>
        <property>
          <name>dfs.webhdfs.enabled</name>
          <value>true</value>
        </property>
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/home/hadoop/dfs/tmp</value>
        </property>

        ==================================
        Below is my hadoop/conf/core-site.xml:
        ==================================
        <property>
          <name>fs.default.name</name>
          <!-- <value>hdfs://0.0.0.0:8020</value> -->
          <value>hdfs://localhost:8020</value>
        </property>

        ===================================
        Below is my hadoop/conf/mapred-site.xml:
        ===================================
        <configuration>
          <property>
            <name>mapred.job.tracker</name>
            <value>0.0.0.0:8021</value>
          </property>
          <property>
            <name>mapred.task.timeout</name>
            <value>3600000</value>
          </property>
        </configuration>
        ====================================


        On Fri, Oct 18, 2013 at 12:01 PM, Talat UYARER <[email protected]> wrote:

            Ooh no :( Sorry Laxmi, I had the same issue; I gave you the
            wrong settings :) You should set up:

            - hbase.client.scanner.caching (my value: 200)
            - hbase.regionserver.handler.count (my value: 20)
            - hbase.client.write.buffer (my value: 20971520)

            Tune these for your region load and it should be solved; see
            the example snippet below.
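
            For example, in hbase-site.xml (these are the same values as
            in my full config elsewhere in this thread; tune them for your
            own cluster):

            <!-- Example only: tune these values for your own workload. -->
            <property>
              <name>hbase.client.scanner.caching</name>
              <value>200</value>
            </property>
            <property>
              <name>hbase.regionserver.handler.count</name>
              <value>20</value>
            </property>
            <property>
              <name>hbase.client.write.buffer</name>
              <value>20971520</value>
            </property>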

            Talat


            On 18-10-2013 17:52, A Laxmi wrote:
            Hi Talat,
            I am sorry to say I have not fixed it yet. I have spent
            literally sleepless nights debugging that issue. No matter
            what I do, the RegionServer always ends up disconnected. :(

            Since I now have a deadline in two days, I will go with the
            advice from one of your earlier emails and use HBase
            standalone mode, since I am crawling about 10 URLs to reach
            about 300,000 URLs. Once that is done, I will retry debugging
            the RegionServer issue. I remember you use Hadoop 1.x and not
            0.20.205.0 like me; I am not sure if there is a bug in the
            version I am using?

            Thanks,
            Laxmi


            On Fri, Oct 18, 2013 at 10:47 AM, Talat UYARER <[email protected]> wrote:

                Hi Laxmi,
                You are welcome. I know that feeling very well. I haven't
                used Cloudera; I run a plain Hadoop/HBase cluster
                installed on CentOS. I am happy that you fixed your issue.

                Talat


                On 18-10-2013 17:36, A Laxmi wrote:
                Thanks for the article, Talat! It was so annoying to see
                the RegionServer getting disconnected under heavy load
                while everything else works. Have you used Cloudera for
                Nutch?


                On Thu, Oct 17, 2013 at 6:50 PM, Talat UYARER <[email protected]> wrote:

                    Hi Laxmi,

                    Your message didn't reach me directly. As I
                    understand it, your RegionServer has gone away. The
                    cause is usually that either the HBase heap size or
                    the xceivers count is not enough. I had the same
                    issue and fixed it by raising my xceivers count. I am
                    not sure what the right count will be for you, but
                    first check your heap usage; if the heap is fine,
                    raise the xceivers property (see the sketch below).
                    This article [1] is very good on this property.

                    [1]
                    http://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/
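
                    For example, in hdfs-site.xml (a minimal sketch; 4096
                    is a commonly used value, not one measured for your
                    setup, and note the property name really is spelled
                    "xcievers" in this Hadoop version):

                    <!-- Sketch only: raise the datanode transceiver
                         limit; adjust the value for your own load. -->
                    <property>
                      <name>dfs.datanode.max.xcievers</name>
                      <value>4096</value>
                    </property>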

                    Talat

                    On 17-10-2013 22:20, A Laxmi wrote:
                    Hi Talat,

                    I hope all is well. Could you please advise me
                    on this issue?

                    Thanks for your help!

                    ---------- Forwarded message ----------
                    From: A Laxmi <[email protected]>
                    Date: Tue, Oct 15, 2013 at 12:07 PM
                    Subject: HBase Pseudo mode - RegionServer
                    disconnects after some time
                    To: [email protected]


                    Hi -

                    Please find below the log from the HBase master. I
                    have tried all sorts of fixes mentioned in various
                    threads, yet I could not overcome this issue. I made
                    sure I don't have 127.0.1.1 in my /etc/hosts file.
                    Pinging my hostname with ping -c 1 localhost returns
                    the actual IP and not 127.0.0.1. I have 'localhost'
                    in my /etc/hostname, and the actual IP address mapped
                    to localhost.localdomain with localhost as an alias -
                    something like:

                    /etc/hosts -

                    192.***.*.*** localhost.localdomain localhost

                    /etc/hostname -

                    localhost

                    I am using Hadoop 0.20.205.0 and HBase 0.90.6 in
                    pseudo-distributed mode for storing data crawled by
                    Apache Nutch 2.2.1. I can start Hadoop and HBase, and
                    jps shows everything running. Then, after about 40
                    minutes of crawling, Nutch hangs in roughly the 4th
                    parsing iteration, and at the same time jps shows
                    everything except HRegionServer. Below is the log.

                    I tried everything I could think of but couldn't
                    overcome this issue. I really need someone from the
                    HBase list to help me with it.


                    2013-10-15 02:02:08,285 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=56 entries from hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816329235
                    2013-10-15 02:02:08,285 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 28 of 29: hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816367672, length=64818440
                    2013-10-15 02:02:08,285 WARN org.apache.hadoop.hbase.util.FSUtils: Running on HDFS without append enabled may result in data loss
                    2013-10-15 02:02:08,554 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [127.0.0.1,60020,1381814216471]
                    2013-10-15 02:02:08,556 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1 at address=127.0.0.1:60020; java.net.ConnectException: Connection refused
                    2013-10-15 02:02:08,559 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Current cached META location is not valid, resetting
                    2013-10-15 02:02:08,601 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
                    org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (2147483647ms)
                            at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:390)
                            at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:422)
                            at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
                            at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:237)
                            at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
                            at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:88)
                            at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
                    2013-10-15 02:02:08,842 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: syncFs -- HDFS-200 -- not available, dfs.support.append=false
                    2013-10-15 02:02:08,842 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://localhost:8020/hbase/1_webpage/853ef78be7c0853208e865a9ff13d5fb/recovered.edits/0000000000000001556.temp region=853ef78be7c0853208e865a9ff13d5fb
                    2013-10-15 02:02:09,443 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=39 entries from hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816367672
                    2013-10-15 02:02:09,444 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 29 of 29: hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816657239, length=0

                    Thanks for your help!