Hi Talat - Since I am running HBase in pseudo-distributed mode for Nutch, I
have changed some of your properties: *hbase.client.scanner.caching* to 1,
*hbase.regionserver.handler.count* to 10, and *hbase.client.write.buffer* to
2097152. I emailed the hbase user list about the out-of-memory error, hoping
to find some help. I don't understand why crawling goes fine until about the
5th iteration, and then, during the parsing stage of some URLs, Nutch just
hangs with an out-of-memory error: heap space. I have 6 GB of RAM, but I am
not crawling a million records; as a test run I am just trying to crawl one
URL with depth 7 and topN 1000. I am not sure what can be done in this case.
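For reference, here is roughly how those overrides sit in my
*hbase/conf/hbase-site.xml* (a sketch of just the values described above,
not the full file):

<!-- pseudo-distributed overrides mentioned above (illustrative sketch) -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>1</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>10</value>
</property>
<property>
  <name>hbase.client.write.buffer</name>
  <value>2097152</value> <!-- 2 MB -->
</property>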
Thanks for your help,
Laxmi

On Sun, Oct 20, 2013 at 3:55 AM, Talat UYARER <talat.uyarer@agmlab.com> wrote:

> Hey Laxmi,
>
> First of all, please send your email to our mailing list. Maybe somebody
> there can share their experiences.
>
> If you use my settings without changing any values, your heap will run
> out of memory. That is normal - I have 64 GB of RAM on my datanodes. You
> should adjust my settings to suit your machine.
>
> On 20-10-2013 05:13, A Laxmi wrote:
>
>> Hi Talat!
>>
>> Update - So I added some of the properties you recommended for tuning,
>> and I have some good news and some bad news.
>> The good news is that the RegionServer was no longer getting
>> disconnected under heavy crawl (thanks to Talat!!!!), and the bad news
>> is that I got an out of memory: heap space exception in the 5th crawl
>> iteration.
>>
>> I have set 8GB for the heap in hbase-env.sh. Not sure why I have this
>> out of memory: heap space issue. Please comment.
>>
>> Thanks
>> Laxmi
>>
>> On Fri, Oct 18, 2013 at 4:16 PM, A Laxmi <[email protected]> wrote:
>>
>> Thanks Talat! I will try it out again, and you will be the first person
>> to be notified if mine works. I will keep you posted.
>>
>> Thanks
>> Laxmi
>>
>> On Fri, Oct 18, 2013 at 3:52 PM, Talat UYARER
>> <talat.uyarer@agmlab.com> wrote:
>>
>> Regarding the issue, I didn't see any problem. You need some properties
>> for tuning, but that is a separate subject. I am sharing my
>> hbase-site.xml. If I remember correctly, this issue is related only to
>> HBase.
>>
>> <configuration>
>>   <property>
>>     <name>hbase.rootdir</name>
>>     <value>hdfs://hdpnn01.secret.local:8080/hbase</value>
>>   </property>
>>   <property>
>>     <name>hbase.cluster.distributed</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>hbase.hregion.max.filesize</name>
>>     <value>10737418240</value>
>>   </property>
>>   <property>
>>     <name>hbase.zookeeper.quorum</name>
>>     <value>hdpzk01.secret.local,hdpzk02.secret.local,hdpzk03.secret.local</value>
>>   </property>
>>   <property>
>>     <name>hbase.client.scanner.caching</name>
>>     <value>200</value>
>>   </property>
>>   <property>
>>     <name>hbase.client.scanner.timeout.period</name>
>>     <value>120000</value>
>>   </property>
>>   <property>
>>     <name>hbase.regionserver.lease.period</name>
>>     <value>900000</value>
>>   </property>
>>   <property>
>>     <name>hbase.rpc.timeout</name>
>>     <value>900000</value>
>>   </property>
>>   <property>
>>     <name>dfs.support.append</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>hbase.hregion.memstore.mslab.enabled</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>hbase.regionserver.handler.count</name>
>>     <value>20</value>
>>   </property>
>>   <property>
>>     <name>hbase.client.write.buffer</name>
>>     <value>20971520</value>
>>   </property>
>> </configuration>
>>
>> On 18-10-2013 20:34, A Laxmi wrote:
>>
>>> Thanks, Talat! I will try with those properties you recommended.
>>> Please look at my other properties here and let me know your comments.
>>>
>>> HBase: 0.90.6
>>> Hadoop: 0.20.205.0
>>> Nutch: 2.2.1
>>>
>>> Note: I have set 8GB for heap size in *hbase/conf/hbase-env.sh*
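>>> (In HBase 0.90.x that is the HBASE_HEAPSIZE line in hbase-env.sh,
>>> given in MB - a sketch of just that line, not the whole file:
>>>
>>> # conf/hbase-env.sh (sketch) - max heap for the HBase daemons, in MB
>>> export HBASE_HEAPSIZE=8192   # 8 GB
>>> )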
>>>
>>> =========================================
>>> Below is my *hbase/conf/hbase-site.xml*:
>>> =========================================
>>> <property>
>>>   <name>hbase.rootdir</name>
>>>   <value>hdfs://localhost:8020/hbase</value>
>>> </property>
>>> <property>
>>>   <name>hbase.cluster.distributed</name>
>>>   <value>true</value>
>>> </property>
>>> <property>
>>>   <name>hbase.zookeeper.quorum</name>
>>>   <value>localhost</value>
>>> </property>
>>> <property>
>>>   <name>dfs.replication</name>
>>>   <value>1</value>
>>> </property>
>>> <property>
>>>   <name>hbase.zookeeper.property.clientPort</name>
>>>   <value>2181</value>
>>> </property>
>>> <property>
>>>   <name>hbase.zookeeper.property.dataDir</name>
>>>   <value>/home/hadoop/hbase-0.90.6/zookeeper</value>
>>> </property>
>>> <property>
>>>   <name>zookeeper.session.timeout</name>
>>>   <value>60000</value>
>>> </property>
>>>
>>> =========================================
>>> Below is my *hadoop/conf/hdfs-site.xml*:
>>> =========================================
>>> <property>
>>>   <name>dfs.support.append</name>
>>>   <value>true</value>
>>> </property>
>>> <property>
>>>   <name>dfs.replication</name>
>>>   <value>1</value>
>>> </property>
>>> <property>
>>>   <name>dfs.safemode.extension</name>
>>>   <value>0</value>
>>> </property>
>>> <property>
>>>   <name>dfs.safemode.min.datanodes</name>
>>>   <value>1</value>
>>> </property>
>>> <property>
>>>   <name>dfs.permissions.enabled</name>
>>>   <value>false</value>
>>> </property>
>>> <property>
>>>   <name>dfs.permissions</name>
>>>   <value>false</value>
>>> </property>
>>> <property>
>>>   <name>dfs.webhdfs.enabled</name>
>>>   <value>true</value>
>>> </property>
>>> <property>
>>>   <name>hadoop.tmp.dir</name>
>>>   <value>/home/hadoop/dfs/tmp</value>
>>> </property>
>>>
>>> =========================================
>>> Below is my *hadoop/conf/core-site.xml*:
>>> =========================================
>>> <property>
>>>   <name>fs.default.name</name>
>>>   <!-- <value>hdfs://0.0.0.0:8020</value> -->
>>>   <value>hdfs://localhost:8020</value>
>>> </property>
>>>
>>> =========================================
>>> Below is my *hadoop/conf/mapred-site.xml*:
>>> =========================================
>>> <configuration>
>>>   <property>
>>>     <name>mapred.job.tracker</name>
>>>     <value>0.0.0.0:8021</value>
>>>   </property>
>>>   <property>
>>>     <name>mapred.task.timeout</name>
>>>     <value>3600000</value>
>>>   </property>
>>> </configuration>
>>> =========================================
>>>
>>> On Fri, Oct 18, 2013 at 12:01 PM, Talat UYARER
>>> <talat.uyarer@agmlab.com> wrote:
>>>
>>> Ooh no :(. Sorry Laxmi, I had the same issue - I gave you the wrong
>>> settings :). I should have had you set:
>>>
>>> - hbase.client.scanner.caching (my value: 200)
>>> - hbase.regionserver.handler.count (my value: 20)
>>> - hbase.client.write.buffer (my value: 20971520)
>>>
>>> Set these to match your region load. That should solve it.
>>>
>>> Talat
>>>
>>> On 18-10-2013 17:52, A Laxmi wrote:
>>>
>>>> Hi Talat,
>>>> I am sorry to say - I have not fixed it yet. I have spent literally
>>>> sleepless nights debugging that issue.
>>>> No matter what I do, the RegionServer always used to get
>>>> disconnected. :(
>>>>
>>>> Since I now have a deadline in two days, I will go with the advice
>>>> from one of your emails and use HBase in *standalone* mode, since I
>>>> am crawling about 10 URLs to reach about 300,000 URLs. Once I get
>>>> that done, I will retry debugging the RegionServer issue. I remember
>>>> you use Hadoop 1.x and not 0.20.205.0 like me - I am not sure if
>>>> there is a bug in the version I am using?
>>>>
>>>> Thanks,
>>>> Laxmi
>>>>
>>>> On Fri, Oct 18, 2013 at 10:47 AM, Talat UYARER
>>>> <talat.uyarer@agmlab.com> wrote:
>>>>
>>>> Hi Laxmi,
>>>> You are welcome. I know that feeling very well. I haven't used
>>>> Cloudera; I use a plain Hadoop/HBase cluster installed on CentOS. I
>>>> am glad you fixed your issue.
>>>>
>>>> Talat
>>>>
>>>> On 18-10-2013 17:36, A Laxmi wrote:
>>>>
>>>>> Thanks for the article, Talat! It was so annoying to see the
>>>>> RegionServer getting disconnected under heavy load while everything
>>>>> else works. Have you used Cloudera for Nutch?
>>>>>
>>>>> On Thu, Oct 17, 2013 at 6:50 PM, Talat UYARER
>>>>> <talat.uyarer@agmlab.com> wrote:
>>>>>
>>>>> Hi Laxmi,
>>>>>
>>>>> Your earlier email didn't reach me. I understand - your RegionServer
>>>>> has gone away. The cause of this problem is that your HBase heap
>>>>> size or your xceivers count is not enough. I had the same issue and
>>>>> raised my xceivers count. I am not sure what the right count will
>>>>> be, but first check your heap usage; if the heap is sufficient, then
>>>>> raise this property. This article[1] is very good on the subject.
>>>>>
>>>>> [1] http://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/
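>>>>> On Hadoop 0.20.x/1.x that limit is the dfs.datanode.max.xcievers
>>>>> property (note the historical misspelling) in hdfs-site.xml. A
>>>>> minimal sketch, using a commonly recommended starting value - see
>>>>> the article for how to size it for your load:
>>>>>
>>>>> <!-- hdfs-site.xml (illustrative): raise the DataNode connection
>>>>>      limit; restart the DataNode afterwards -->
>>>>> <property>
>>>>>   <name>dfs.datanode.max.xcievers</name>
>>>>>   <value>4096</value>
>>>>> </property>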
>>>>>
>>>>> Talat
>>>>>
>>>>> On 17-10-2013 22:20, A Laxmi wrote:
>>>>>
>>>>>> Hi Talat,
>>>>>>
>>>>>> I hope all is well. Could you please advise me on this issue?
>>>>>>
>>>>>> Thanks for your help!
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: A Laxmi <[email protected]>
>>>>>> Date: Tue, Oct 15, 2013 at 12:07 PM
>>>>>> Subject: HBase Pseudo mode - RegionServer disconnects after some time
>>>>>> To: [email protected]
>>>>>>
>>>>>> Hi -
>>>>>>
>>>>>> Please find below the log of the HBase master. I have tried all
>>>>>> sorts of fixes mentioned in various threads, yet I could not
>>>>>> overcome this issue. I made sure I don't have 127.0.1.1 in my
>>>>>> /etc/hosts file. I pinged my localhost (hostname), which gives back
>>>>>> the actual IP and not 127.0.0.1, using ping -c 1 localhost. I have
>>>>>> 'localhost' in my /etc/hostname and the actual IP address mapped to
>>>>>> localhost.localdomain with localhost as an alias - something like:
>>>>>>
>>>>>> /etc/hosts -
>>>>>>
>>>>>> 192.***.*.*** localhost.localdomain localhost
>>>>>>
>>>>>> /etc/hostname -
>>>>>>
>>>>>> localhost
>>>>>>
>>>>>> I am using *Hadoop 0.20.205.0 and HBase 0.90.6 in Pseudo mode* for
>>>>>> storing crawled data from a crawler - Apache Nutch 2.2.1. I can
>>>>>> start Hadoop and HBase, and when I do jps everything shows up fine.
>>>>>> Then, when I start the Nutch crawl, after about 40 minutes of
>>>>>> crawling or so I can see Nutch hanging in about the 4th iteration
>>>>>> of parsing, and at the same time, when I do jps on the HBase side,
>>>>>> I can see everything except HRegionServer. Below is the log.
>>>>>>
>>>>>> I tried all possible ways but couldn't overcome this issue. I
>>>>>> really need someone from the HBase list to help me with this issue.
>>>>>>
>>>>>> 2013-10-15 02:02:08,285 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=56 entries from hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816329235
>>>>>> 2013-10-15 02:02:08,285 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 28 of 29: hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816367672, length=64818440
>>>>>> 2013-10-15 02:02:08,285 WARN org.apache.hadoop.hbase.util.FSUtils: Running on HDFS without append enabled may result in data loss
>>>>>> 2013-10-15 02:02:08,554 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [127.0.0.1,60020,1381814216471]
>>>>>> 2013-10-15 02:02:08,556 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1 at address=127.0.0.1:60020; java.net.ConnectException: Connection refused
>>>>>> 2013-10-15 02:02:08,559 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Current cached META location is not valid, resetting
>>>>>> 2013-10-15 02:02:08,601 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
>>>>>> org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (2147483647ms)
>>>>>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:390)
>>>>>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:422)
>>>>>>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
>>>>>>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:237)
>>>>>>     at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
>>>>>>     at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:88)
>>>>>>     at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>>>>>> 2013-10-15 02:02:08,842 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: syncFs -- HDFS-200 -- not available, dfs.support.append=false
>>>>>> 2013-10-15 02:02:08,842 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://localhost:8020/hbase/1_webpage/853ef78be7c0853208e865a9ff13d5fb/recovered.edits/0000000000000001556.temp region=853ef78be7c0853208e865a9ff13d5fb
>>>>>> 2013-10-15 02:02:09,443 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=39 entries from hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816367672
>>>>>> 2013-10-15 02:02:09,444 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 29 of 29: hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816657239, length=0
>>>>>>
>>>>>> Thanks for your help!

