Hi Talat! Thanks so much for the explanation. I got the basic definitions from the HBase documentation but couldn't quite work out what is going on with some of the properties, so your explanation really does help. I will retry and keep you posted. Thanks so much for all your help!
Regards,
Laxmi

On Mon, Oct 21, 2013 at 2:08 PM, Talat UYARER <[email protected]> wrote:

> Hi Laxmi,
>
> Sorry for my late reply. Is your total RAM 6 GB? If that is the total
> memory of the computer, the rest of the system uses part of it too, so
> you should decrease your heap size. You can read about every property in
> the HBase book, but I will try to explain why I use each one.
>
> - You should increase your hbase.client.scanner.caching property from 1
> to 20. Gora has no filter method yet, so in Nutch 2.x every row is
> fetched from HBase for GeneratorJob or FetchJob. That is expensive. If
> you increase this property as much as possible, each map/reduce task
> needs far fewer round trips.
>
> - hbase.regionserver.handler.count is the count of RPC Listener
> instances spun up on RegionServers.
> A little note: "The importance of reducing the number of separate RPC
> calls is tied to the round-trip time, which is the time it takes for a
> client to send a request and the server to send a response over the
> network. This does not include the time required for the data transfer.
> It simply is the overhead of sending packages over the wire. On average,
> these take about 1ms on a LAN, which means you can handle 1,000
> round-trips per second only. The other important factor is the message
> size: if you send large requests over the network, you already need a
> much lower number of round-trips, as most of the time is spent
> transferring data. But when doing, for example, counter increments,
> which are small in size, you will see better performance when batching
> updates into fewer requests." (From: HBase: The Definitive Guide,
> page 86.)
>
> - hbase.client.write.buffer is the default size of the HTable client
> write buffer in bytes. A bigger buffer takes more memory -- on both the
> client and the server side, since the server instantiates the passed
> write buffer to process it -- but a larger buffer size reduces the
> number of RPCs made. For an estimate of server-side memory used (your
> regionserver runs on the same machine), evaluate
> hbase.client.write.buffer * hbase.regionserver.handler.count *
> regionserver count.
>
> In my opinion, your problem is hbase.client.scanner.caching. If you
> increase it, your problem will be solved. If not, you can try the other
> properties.
>
> Have a nice day,
> Talat
>
> On 21-10-2013 04:31, A Laxmi wrote:
>
>> Hi Talat -
>>
>> Since I am running HBase in pseudo-distributed mode for Nutch, I have
>> changed some of your properties: hbase.client.scanner.caching to 1,
>> hbase.regionserver.handler.count to 10, and hbase.client.write.buffer
>> to 2097152. I emailed the hbase user list with the out-of-memory error
>> hoping to find some help. I don't understand why crawling goes fine
>> until about the 5th iteration, and then, during the parsing stage of
>> some URLs in that iteration, Nutch just hangs with an out of memory
>> error: heap space. I have 6 GB of RAM, but I am not crawling a million
>> records; as a test run I am just trying to crawl a URL with depth 7 and
>> topN 1000. I am not sure what can be done in this case.
>>
>> Thanks for your help,
>> Laxmi
>>
>> On Sun, Oct 20, 2013 at 3:55 AM, Talat UYARER <[email protected]>
>> wrote:
>>
>>> Hey Laxmi,
>>>
>>> First of all, please send your email to our mailing list; maybe
>>> somebody there can share their experience.
>>>
>>> If you use my settings without changing any value, your heap will run
>>> out of memory. That is normal: I have 64 GB of RAM on my datanodes.
>>> You should adapt my settings to suit your computer.
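
To make the client-side tuning discussed above concrete, here is a minimal
sketch against the HBase 0.90-era Java client API. The table name "webpage",
the column family "f", the qualifier, and the row key are only placeholders
for this example; Nutch's actual schema names may differ.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ClientTuningSketch {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "webpage"); // placeholder name

            // Scanner caching: fetch 200 rows per RPC instead of 1, which
            // is what hbase.client.scanner.caching controls globally.
            Scan scan = new Scan();
            scan.setCaching(200);
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result row : scanner) {
                    // process row...
                }
            } finally {
                scanner.close();
            }

            // Write buffer: batch puts client-side until ~20 MB accumulate,
            // mirroring hbase.client.write.buffer = 20971520.
            table.setAutoFlush(false);
            table.setWriteBufferSize(20971520L);
            Put put = new Put(Bytes.toBytes("com.example.www:http/"));
            put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            table.put(put);       // buffered on the client, not yet sent
            table.flushCommits(); // one RPC for the whole buffered batch
            table.close();
        }
    }

Plugging Talat's values into the estimate above: 20971520 bytes * 20 handlers
* 1 regionserver (a pseudo-distributed setup) is roughly 400 MB that the
server side may need just for in-flight write buffers.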
>>>
>>> On 20-10-2013 05:13, A Laxmi wrote:
>>>
>>>> Hi Talat!
>>>>
>>>> Update - so I added some of the properties you recommended for
>>>> tuning, and I have some good news and some bad news. The good news is
>>>> that the Region Server no longer gets disconnected under heavy crawl
>>>> (thanks to Talat!!!!); the bad news is that I am getting an out of
>>>> memory: heap space exception in the 5th crawl iteration.
>>>>
>>>> I have set 8 GB for the heap in hbase-env.sh. Not sure why I still
>>>> have this out of memory: heap space issue. Please comment.
>>>>
>>>> Thanks
>>>> Laxmi
>>>>
>>>> On Fri, Oct 18, 2013 at 4:16 PM, A Laxmi <[email protected]> wrote:
>>>>
>>>> Thanks Talat! I will try it out again, and you will be the first
>>>> person I notify if mine works. I will keep you posted.
>>>>
>>>> Thanks
>>>> Laxmi
>>>>
>>>> On Fri, Oct 18, 2013 at 3:52 PM, Talat UYARER <[email protected]>
>>>> wrote:
>>>>
>>>> As for the issue itself, I didn't see any problem; you just need some
>>>> properties for tuning, though that is a separate subject. I am
>>>> sharing my hbase-site.xml. If I remember correctly, this issue is
>>>> related only to HBase.
>>>>
>>>> <configuration>
>>>>   <property>
>>>>     <name>hbase.rootdir</name>
>>>>     <value>hdfs://hdpnn01.secret.local:8080/hbase</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.cluster.distributed</name>
>>>>     <value>true</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.hregion.max.filesize</name>
>>>>     <value>10737418240</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.zookeeper.quorum</name>
>>>>     <value>hdpzk01.secret.local,hdpzk02.secret.local,hdpzk03.secret.local</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.client.scanner.caching</name>
>>>>     <value>200</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.client.scanner.timeout.period</name>
>>>>     <value>120000</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.regionserver.lease.period</name>
>>>>     <value>900000</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.rpc.timeout</name>
>>>>     <value>900000</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>dfs.support.append</name>
>>>>     <value>true</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.hregion.memstore.mslab.enabled</name>
>>>>     <value>true</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.regionserver.handler.count</name>
>>>>     <value>20</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.client.write.buffer</name>
>>>>     <value>20971520</value>
>>>>   </property>
>>>> </configuration>
>>>>
>>>> On 18-10-2013 20:34, A Laxmi wrote:
>>>>
>>>>> Thanks, Talat! I will try with the properties you recommended.
>>>>> Please look at my other properties here and let me know your
>>>>> comments.
>>>>>
>>>>> HBase: 0.90.6
>>>>> Hadoop: 0.20.205.0
>>>>> Nutch: 2.2.1
>>>>>
>>>>> Note: I have set 8 GB for the heap size in hbase/conf/hbase-env.sh
>>>>>
>>>>> =========================================
>>>>> Below is my hbase/conf/hbase-site.xml:
>>>>> =========================================
>>>>> <property>
>>>>>   <name>hbase.rootdir</name>
>>>>>   <value>hdfs://localhost:8020/hbase</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>hbase.cluster.distributed</name>
>>>>>   <value>true</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>hbase.zookeeper.quorum</name>
>>>>>   <value>localhost</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>dfs.replication</name>
>>>>>   <value>1</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>hbase.zookeeper.property.clientPort</name>
>>>>>   <value>2181</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>hbase.zookeeper.property.dataDir</name>
>>>>>   <value>/home/hadoop/hbase-0.90.6/zookeeper</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>zookeeper.session.timeout</name>
>>>>>   <value>60000</value>
>>>>> </property>
>>>>>
>>>>> ========================================
>>>>> Below is my hadoop/conf/hdfs-site.xml:
>>>>> ========================================
>>>>> <property>
>>>>>   <name>dfs.support.append</name>
>>>>>   <value>true</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>dfs.replication</name>
>>>>>   <value>1</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>dfs.safemode.extension</name>
>>>>>   <value>0</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>dfs.safemode.min.datanodes</name>
>>>>>   <value>1</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>dfs.permissions.enabled</name>
>>>>>   <value>false</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>dfs.permissions</name>
>>>>>   <value>false</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>dfs.webhdfs.enabled</name>
>>>>>   <value>true</value>
>>>>> </property>
>>>>> <property>
>>>>>   <name>hadoop.tmp.dir</name>
>>>>>   <value>/home/hadoop/dfs/tmp</value>
>>>>> </property>
>>>>>
>>>>> ==================================
>>>>> Below is my hadoop/conf/core-site.xml:
>>>>> ==================================
>>>>> <property>
>>>>>   <name>fs.default.name</name>
>>>>>   <!-- <value>hdfs://0.0.0.0:8020</value> -->
>>>>>   <value>hdfs://localhost:8020</value>
>>>>> </property>
>>>>>
>>>>> ===================================
>>>>> Below is my hadoop/conf/mapred-site.xml:
>>>>> ===================================
>>>>> <configuration>
>>>>>   <property>
>>>>>     <name>mapred.job.tracker</name>
>>>>>     <value>0.0.0.0:8021</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>mapred.task.timeout</name>
>>>>>     <value>3600000</value>
>>>>>   </property>
>>>>> </configuration>
>>>>> ====================================
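
A quick way to sanity-check a pseudo-distributed setup like the one above is
a minimal connectivity probe against the 0.90-era Java API. This is only a
sketch: the quorum and port values mirror the hbase-site.xml just shown and
should be adjusted if yours differ.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PseudoModeCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Mirror the pseudo-distributed settings from hbase-site.xml above.
            conf.set("hbase.zookeeper.quorum", "localhost");
            conf.set("hbase.zookeeper.property.clientPort", "2181");
            // Throws MasterNotRunningException or ZooKeeperConnectionException
            // if the master cannot be reached.
            HBaseAdmin.checkHBaseAvailable(conf);
            System.out.println("HBase master is reachable.");
        }
    }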
>>>>> On Fri, Oct 18, 2013 at 12:01 PM, Talat UYARER <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Ooh no :(. Sorry Laxmi, I had the same issue. I gave you the wrong
>>>>> settings :) I should have told you to set up:
>>>>>
>>>>> - hbase.client.scanner.caching (my value: 200)
>>>>> - hbase.regionserver.handler.count (my value: 20)
>>>>> - hbase.client.write.buffer (my value: 20971520)
>>>>>
>>>>> You should tune these to your region load. Then it will be solved.
>>>>>
>>>>> Talat
>>>>>
>>>>> On 18-10-2013 17:52, A Laxmi wrote:
>>>>>
>>>>>> Hi Talat,
>>>>>> I am sorry to say I have not fixed it yet. I have spent literally
>>>>>> sleepless nights debugging that issue. No matter what I do, the
>>>>>> RegionServer always gets disconnected. :(
>>>>>>
>>>>>> Since I now have a deadline in two days, I will go with the advice
>>>>>> from one of your earlier emails and use HBase standalone mode,
>>>>>> since I am crawling about 10 URLs to reach about 300,000 URLs. Once
>>>>>> I get that done, I will retry debugging the regionserver issue. I
>>>>>> remember you use Hadoop 1.x and not 0.20.205.0 like me; I am not
>>>>>> sure if there is a bug in the version I am using?
>>>>>>
>>>>>> Thanks,
>>>>>> Laxmi
>>>>>>
>>>>>> On Fri, Oct 18, 2013 at 10:47 AM, Talat UYARER <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Laxmi,
>>>>>> You are welcome. I know that feeling very well. I haven't used
>>>>>> Cloudera; I use a plain Hadoop-HBase cluster installed on CentOS.
>>>>>> I am happy that you fixed your issue.
>>>>>>
>>>>>> Talat
>>>>>>
>>>>>> On 18-10-2013 17:36, A Laxmi wrote:
>>>>>>
>>>>>>> Thanks for the article, Talat! It was so annoying to see the
>>>>>>> RegionServer getting disconnected under heavy load while
>>>>>>> everything else works. Have you used Cloudera for Nutch?
>>>>>>>
>>>>>>> On Thu, Oct 17, 2013 at 6:50 PM, Talat UYARER <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Laxmi,
>>>>>>>
>>>>>>> It didn't reach me. I understand: your RegionServer has gone away.
>>>>>>> The cause of this problem is that either your HBase heap size or
>>>>>>> your xceivers count is not enough. I had the same issue and raised
>>>>>>> my xceivers count. I am not sure what the right count will be, but
>>>>>>> you should first check your heap usage; if that is enough, you can
>>>>>>> set this property. This article [1] is very good on the subject.
>>>>>>>
>>>>>>> [1] http://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/
>>>>>>>
>>>>>>> Talat
>>>>>>>
>>>>>>> On 17-10-2013 22:20, A Laxmi wrote:
>>>>>>>
>>>>>>>> Hi Talat,
>>>>>>>>
>>>>>>>> I hope all is well. Could you please advise me on this issue?
>>>>>>>>
>>>>>>>> Thanks for your help!
>>>>>>>>
>>>>>>>> ---------- Forwarded message ----------
>>>>>>>> From: A Laxmi <[email protected]>
>>>>>>>> Date: Tue, Oct 15, 2013 at 12:07 PM
>>>>>>>> Subject: HBase Pseudo mode - RegionServer disconnects after some time
>>>>>>>> To: [email protected]
>>>>>>>>
>>>>>>>> Hi -
>>>>>>>>
>>>>>>>> Please find the below log of the HBase master. I have tried all
>>>>>>>> sorts of fixes mentioned in various threads, yet I could not
>>>>>>>> overcome this issue.
>>>>>>>> I made sure I don't have 127.0.1.1 in my /etc/hosts file. I
>>>>>>>> pinged localhost (the hostname), which gives back the actual IP
>>>>>>>> and not 127.0.0.1, using ping -c 1 localhost. I have 'localhost'
>>>>>>>> in my /etc/hostname, and the actual IP address mapped to
>>>>>>>> localhost.localdomain with localhost as an alias - something like:
>>>>>>>>
>>>>>>>> /etc/hosts -
>>>>>>>>
>>>>>>>> 192.***.*.*** localhost.localdomain localhost
>>>>>>>>
>>>>>>>> /etc/hostname -
>>>>>>>>
>>>>>>>> localhost
>>>>>>>>
>>>>>>>> I am using Hadoop 0.20.205.0 and HBase 0.90.6 in pseudo-distributed
>>>>>>>> mode for storing crawled data from a crawler, Apache Nutch 2.2.1.
>>>>>>>> I can start Hadoop and HBase, and when I do jps everything shows
>>>>>>>> up fine. Then, after I start the Nutch crawl, after about 40
>>>>>>>> minutes of crawling or so, I can see Nutch hanging in about the
>>>>>>>> 4th iteration of parsing, and at the same time, when I do jps on
>>>>>>>> the HBase side, I can see everything except HRegionServer. Below
>>>>>>>> is the log.
>>>>>>>>
>>>>>>>> I tried all possible ways but couldn't overcome this issue. I
>>>>>>>> really need someone from the HBase list to help me with it.
>>>>>>>>
>>>>>>>> 2013-10-15 02:02:08,285 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=56 entries from hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816329235
>>>>>>>> 2013-10-15 02:02:08,285 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 28 of 29: hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816367672, length=64818440
>>>>>>>> 2013-10-15 02:02:08,285 WARN org.apache.hadoop.hbase.util.FSUtils: Running on HDFS without append enabled may result in data loss
>>>>>>>> 2013-10-15 02:02:08,554 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [127.0.0.1,60020,1381814216471]
>>>>>>>> 2013-10-15 02:02:08,556 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1 at address=127.0.0.1:60020; java.net.ConnectException: Connection refused
>>>>>>>> 2013-10-15 02:02:08,559 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Current cached META location is not valid, resetting
>>>>>>>> 2013-10-15 02:02:08,601 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
>>>>>>>> org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (2147483647ms)
>>>>>>>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:390)
>>>>>>>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:422)
>>>>>>>>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
>>>>>>>>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:237)
>>>>>>>>     at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
>>>>>>>>     at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:88)
>>>>>>>>     at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>>>>>>>> 2013-10-15 02:02:08,842 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: syncFs -- HDFS-200 -- not available, dfs.support.append=false
>>>>>>>> 2013-10-15 02:02:08,842 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://localhost:8020/hbase/1_webpage/853ef78be7c0853208e865a9ff13d5fb/recovered.edits/0000000000000001556.temp region=853ef78be7c0853208e865a9ff13d5fb
>>>>>>>> 2013-10-15 02:02:09,443 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=39 entries from hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816367672
>>>>>>>> 2013-10-15 02:02:09,444 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 29 of 29: hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816657239, length=0
>>>>>>>>
>>>>>>>> Thanks for your help!
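
One detail worth noticing in that log, alongside the earlier FSUtils warning
about data loss: the line "syncFs -- HDFS-200 -- not available,
dfs.support.append=false" suggests the running HBase did not pick up the
dfs.support.append=true set in hadoop/conf/hdfs-site.xml (note that Talat's
hbase-site.xml earlier in the thread sets that property there as well). A
minimal sketch, assuming HBase's conf directory is on the classpath, of
printing the value HBase's own configuration actually resolves:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CheckAppendSupport {
        public static void main(String[] args) {
            // HBaseConfiguration.create() layers hbase-default.xml and
            // hbase-site.xml over Hadoop's defaults; a dfs.support.append
            // set only in hdfs-site.xml may therefore not be visible here.
            Configuration conf = HBaseConfiguration.create();
            System.out.println("dfs.support.append = "
                    + conf.getBoolean("dfs.support.append", false));
        }
    }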

