Hi Talat - Since I am running HBase in pseudo-distributed mode for Nutch, I
have changed some of your properties: *hbase.client.scanner.caching* to 1,
*hbase.regionserver.handler.count* to 10, and *hbase.client.write.buffer* to
2097152. I emailed the hbase user list about the out-of-memory error, hoping
to find some help. I don't understand why crawling goes fine until about the
5th iteration, and then, during the parsing stage of some URLs, Nutch just
hangs with an out-of-memory error: heap space. I have 6 GB of RAM, but I am
not crawling a million records; as a test run I am just trying to crawl one
URL with depth 7 and topN 1000. I am not sure what can be done in this case.
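For reference, here is roughly how those overrides sit in my
*hbase/conf/hbase-site.xml* (a sketch of just the values described above,
not the full file):

<!-- pseudo-distributed overrides mentioned above (illustrative sketch) -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>1</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>10</value>
</property>
<property>
  <name>hbase.client.write.buffer</name>
  <value>2097152</value> <!-- 2 MB -->
</property>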
Thanks for your help,
Laxmi

On Sun, Oct 20, 2013 at 3:55 AM, Talat UYARER <talat.uyarer@agmlab.com> wrote:

> Hey Laxmi,
>
> First of all, please send your email to our mailing list. Maybe somebody
> there can share their experiences.
>
> If you use my settings without changing any values, your heap will run
> out of memory. That is normal - I have 64 GB of RAM on my datanodes. You
> should adjust my settings to suit your machine.
>
> On 20-10-2013 05:13, A Laxmi wrote:
>
>> Hi Talat!
>>
>> Update - So I added some of the properties you recommended for tuning,
>> and I have some good news and some bad news.
>> The good news is that the RegionServer was no longer getting
>> disconnected under heavy crawl (thanks to Talat!!!!), and the bad news
>> is that I got an out of memory: heap space exception in the 5th crawl
>> iteration.
>>
>> I have set 8GB for the heap in hbase-env.sh. Not sure why I have this
>> out of memory: heap space issue. Please comment.
>>
>> Thanks
>> Laxmi
>>
>> On Fri, Oct 18, 2013 at 4:16 PM, A Laxmi <[email protected]> wrote:
>>
>> Thanks Talat! I will try it out again, and you will be the first person
>> to be notified if mine works. I will keep you posted.
>>
>> Thanks
>> Laxmi
>>
>> On Fri, Oct 18, 2013 at 3:52 PM, Talat UYARER
>> <talat.uyarer@agmlab.com> wrote:
>>
>> Regarding the issue, I didn't see any problem. You need some properties
>> for tuning, but that is a separate subject. I am sharing my
>> hbase-site.xml. If I remember correctly, this issue is related only to
>> HBase.
>>
>> <configuration>
>>   <property>
>>     <name>hbase.rootdir</name>
>>     <value>hdfs://hdpnn01.secret.local:8080/hbase</value>
>>   </property>
>>   <property>
>>     <name>hbase.cluster.distributed</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>hbase.hregion.max.filesize</name>
>>     <value>10737418240</value>
>>   </property>
>>   <property>
>>     <name>hbase.zookeeper.quorum</name>
>>     <value>hdpzk01.secret.local,hdpzk02.secret.local,hdpzk03.secret.local</value>
>>   </property>
>>   <property>
>>     <name>hbase.client.scanner.caching</name>
>>     <value>200</value>
>>   </property>
>>   <property>
>>     <name>hbase.client.scanner.timeout.period</name>
>>     <value>120000</value>
>>   </property>
>>   <property>
>>     <name>hbase.regionserver.lease.period</name>
>>     <value>900000</value>
>>   </property>
>>   <property>
>>     <name>hbase.rpc.timeout</name>
>>     <value>900000</value>
>>   </property>
>>   <property>
>>     <name>dfs.support.append</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>hbase.hregion.memstore.mslab.enabled</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>hbase.regionserver.handler.count</name>
>>     <value>20</value>
>>   </property>
>>   <property>
>>     <name>hbase.client.write.buffer</name>
>>     <value>20971520</value>
>>   </property>
>> </configuration>
>>
>> On 18-10-2013 20:34, A Laxmi wrote:
>>
>>> Thanks, Talat! I will try with those properties you recommended.
>>> Please look at my other properties here and let me know your comments.
>>>
>>> HBase: 0.90.6
>>> Hadoop: 0.20.205.0
>>> Nutch: 2.2.1
>>>
>>> Note: I have set 8GB for heap size in *hbase/conf/hbase-env.sh*
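>>> (In HBase 0.90.x that is the HBASE_HEAPSIZE line in hbase-env.sh,
>>> given in MB - a sketch of just that line, not the whole file:
>>>
>>> # conf/hbase-env.sh (sketch) - max heap for the HBase daemons, in MB
>>> export HBASE_HEAPSIZE=8192   # 8 GB
>>> )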
>>>
>>> =========================================
>>> Below is my *hbase/conf/hbase-site.xml*:
>>> =========================================
>>> <property>
>>>   <name>hbase.rootdir</name>
>>>   <value>hdfs://localhost:8020/hbase</value>
>>> </property>
>>> <property>
>>>   <name>hbase.cluster.distributed</name>
>>>   <value>true</value>
>>> </property>
>>> <property>
>>>   <name>hbase.zookeeper.quorum</name>
>>>   <value>localhost</value>
>>> </property>
>>> <property>
>>>   <name>dfs.replication</name>
>>>   <value>1</value>
>>> </property>
>>> <property>
>>>   <name>hbase.zookeeper.property.clientPort</name>
>>>   <value>2181</value>
>>> </property>
>>> <property>
>>>   <name>hbase.zookeeper.property.dataDir</name>
>>>   <value>/home/hadoop/hbase-0.90.6/zookeeper</value>
>>> </property>
>>> <property>
>>>   <name>zookeeper.session.timeout</name>
>>>   <value>60000</value>
>>> </property>
>>>
>>> =========================================
>>> Below is my *hadoop/conf/hdfs-site.xml*:
>>> =========================================
>>> <property>
>>>   <name>dfs.support.append</name>
>>>   <value>true</value>
>>> </property>
>>> <property>
>>>   <name>dfs.replication</name>
>>>   <value>1</value>
>>> </property>
>>> <property>
>>>   <name>dfs.safemode.extension</name>
>>>   <value>0</value>
>>> </property>
>>> <property>
>>>   <name>dfs.safemode.min.datanodes</name>
>>>   <value>1</value>
>>> </property>
>>> <property>
>>>   <name>dfs.permissions.enabled</name>
>>>   <value>false</value>
>>> </property>
>>> <property>
>>>   <name>dfs.permissions</name>
>>>   <value>false</value>
>>> </property>
>>> <property>
>>>   <name>dfs.webhdfs.enabled</name>
>>>   <value>true</value>
>>> </property>
>>> <property>
>>>   <name>hadoop.tmp.dir</name>
>>>   <value>/home/hadoop/dfs/tmp</value>
>>> </property>
>>>
>>> =========================================
>>> Below is my *hadoop/conf/core-site.xml*:
>>> =========================================
>>> <property>
>>>   <name>fs.default.name</name>
>>>   <!-- <value>hdfs://0.0.0.0:8020</value> -->
>>>   <value>hdfs://localhost:8020</value>
>>> </property>
>>>
>>> =========================================
>>> Below is my *hadoop/conf/mapred-site.xml*:
>>> =========================================
>>> <configuration>
>>>   <property>
>>>     <name>mapred.job.tracker</name>
>>>     <value>0.0.0.0:8021</value>
>>>   </property>
>>>   <property>
>>>     <name>mapred.task.timeout</name>
>>>     <value>3600000</value>
>>>   </property>
>>> </configuration>
>>> =========================================
>>>
>>> On Fri, Oct 18, 2013 at 12:01 PM, Talat UYARER
>>> <talat.uyarer@agmlab.com> wrote:
>>>
>>> Ooh no :(. Sorry Laxmi, I had the same issue - I gave you the wrong
>>> settings :). I should have had you set:
>>>
>>> - hbase.client.scanner.caching (my value: 200)
>>> - hbase.regionserver.handler.count (my value: 20)
>>> - hbase.client.write.buffer (my value: 20971520)
>>>
>>> Set these to match your region load. That should solve it.
>>>
>>> Talat
>>>
>>> On 18-10-2013 17:52, A Laxmi wrote:
>>>
>>>> Hi Talat,
>>>> I am sorry to say - I have not fixed it yet. I have spent literally
>>>> sleepless nights debugging that issue.
>>>> No matter what I do, the RegionServer always used to get
>>>> disconnected. :(
>>>>
>>>> Since I now have a deadline in two days, I will go with the advice
>>>> from one of your emails and use HBase in *standalone* mode, since I
>>>> am crawling about 10 URLs to reach about 300,000 URLs. Once I get
>>>> that done, I will retry debugging the RegionServer issue. I remember
>>>> you use Hadoop 1.x and not 0.20.205.0 like me - I am not sure if
>>>> there is a bug in the version I am using?
>>>>
>>>> Thanks,
>>>> Laxmi
>>>>
>>>> On Fri, Oct 18, 2013 at 10:47 AM, Talat UYARER
>>>> <talat.uyarer@agmlab.com> wrote:
>>>>
>>>> Hi Laxmi,
>>>> You are welcome. I know that feeling very well. I haven't used
>>>> Cloudera; I use a plain Hadoop/HBase cluster installed on CentOS. I
>>>> am glad you fixed your issue.
>>>>
>>>> Talat
>>>>
>>>> On 18-10-2013 17:36, A Laxmi wrote:
>>>>
>>>>> Thanks for the article, Talat! It was so annoying to see the
>>>>> RegionServer getting disconnected under heavy load while everything
>>>>> else works. Have you used Cloudera for Nutch?
>>>>>
>>>>> On Thu, Oct 17, 2013 at 6:50 PM, Talat UYARER
>>>>> <talat.uyarer@agmlab.com> wrote:
>>>>>
>>>>> Hi Laxmi,
>>>>>
>>>>> Your earlier email didn't reach me. I understand - your RegionServer
>>>>> has gone away. The cause of this problem is that your HBase heap
>>>>> size or your xceivers count is not enough. I had the same issue and
>>>>> raised my xceivers count. I am not sure what the right count will
>>>>> be, but first check your heap usage; if the heap is sufficient, then
>>>>> raise this property. This article[1] is very good on the subject.
>>>>>
>>>>> [1] http://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/
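>>>>> On Hadoop 0.20.x/1.x that limit is the dfs.datanode.max.xcievers
>>>>> property (note the historical misspelling) in hdfs-site.xml. A
>>>>> minimal sketch, using a commonly recommended starting value - see
>>>>> the article for how to size it for your load:
>>>>>
>>>>> <!-- hdfs-site.xml (illustrative): raise the DataNode connection
>>>>>      limit; restart the DataNode afterwards -->
>>>>> <property>
>>>>>   <name>dfs.datanode.max.xcievers</name>
>>>>>   <value>4096</value>
>>>>> </property>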
>>>>>
>>>>> Talat
>>>>>
>>>>> On 17-10-2013 22:20, A Laxmi wrote:
>>>>>
>>>>>> Hi Talat,
>>>>>>
>>>>>> I hope all is well. Could you please advise me on this issue?
>>>>>>
>>>>>> Thanks for your help!
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> From: A Laxmi <[email protected]>
>>>>>> Date: Tue, Oct 15, 2013 at 12:07 PM
>>>>>> Subject: HBase Pseudo mode - RegionServer disconnects after some time
>>>>>> To: [email protected]
>>>>>>
>>>>>> Hi -
>>>>>>
>>>>>> Please find below the log of the HBase master. I have tried all
>>>>>> sorts of fixes mentioned in various threads, yet I could not
>>>>>> overcome this issue. I made sure I don't have 127.0.1.1 in my
>>>>>> /etc/hosts file. I pinged my localhost (hostname), which gives back
>>>>>> the actual IP and not 127.0.0.1, using ping -c 1 localhost. I have
>>>>>> 'localhost' in my /etc/hostname and the actual IP address mapped to
>>>>>> localhost.localdomain with localhost as an alias - something like:
>>>>>>
>>>>>> /etc/hosts -
>>>>>>
>>>>>> 192.***.*.*** localhost.localdomain localhost
>>>>>>
>>>>>> /etc/hostname -
>>>>>>
>>>>>> localhost
>>>>>>
>>>>>> I am using *Hadoop 0.20.205.0 and HBase 0.90.6 in Pseudo mode* for
>>>>>> storing crawled data from a crawler - Apache Nutch 2.2.1. I can
>>>>>> start Hadoop and HBase, and when I do jps everything shows up fine.
>>>>>> Then, when I start the Nutch crawl, after about 40 minutes of
>>>>>> crawling or so I can see Nutch hanging in about the 4th iteration
>>>>>> of parsing, and at the same time, when I do jps on the HBase side,
>>>>>> I can see everything except HRegionServer. Below is the log.
>>>>>>
>>>>>> I tried all possible ways but couldn't overcome this issue. I
>>>>>> really need someone from the HBase list to help me with this issue.
>>>>>>
>>>>>> 2013-10-15 02:02:08,285 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=56 entries from hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816329235
>>>>>> 2013-10-15 02:02:08,285 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 28 of 29: hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816367672, length=64818440
>>>>>> 2013-10-15 02:02:08,285 WARN org.apache.hadoop.hbase.util.FSUtils: Running on HDFS without append enabled may result in data loss
>>>>>> 2013-10-15 02:02:08,554 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [127.0.0.1,60020,1381814216471]
>>>>>> 2013-10-15 02:02:08,556 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1 at address=127.0.0.1:60020; java.net.ConnectException: Connection refused
>>>>>> 2013-10-15 02:02:08,559 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Current cached META location is not valid, resetting
>>>>>> 2013-10-15 02:02:08,601 WARN org.apache.hadoop.hbase.master.CatalogJanitor: Failed scan of catalog table
>>>>>> org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (2147483647ms)
>>>>>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:390)
>>>>>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:422)
>>>>>>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:255)
>>>>>>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:237)
>>>>>>     at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:120)
>>>>>>     at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:88)
>>>>>>     at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>>>>>> 2013-10-15 02:02:08,842 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: syncFs -- HDFS-200 -- not available, dfs.support.append=false
>>>>>> 2013-10-15 02:02:08,842 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://localhost:8020/hbase/1_webpage/853ef78be7c0853208e865a9ff13d5fb/recovered.edits/0000000000000001556.temp region=853ef78be7c0853208e865a9ff13d5fb
>>>>>> 2013-10-15 02:02:09,443 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Pushed=39 entries from hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816367672
>>>>>> 2013-10-15 02:02:09,444 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 29 of 29: hdfs://localhost:8020/hbase/.logs/127.0.0.1,60020,1381814216471/127.0.0.1%3A60020.1381816657239, length=0
>>>>>>
>>>>>> Thanks for your help!

