HBase Regionserver randomly dies

2014-12-01 Thread Robert Kent
Hi, I have a collection of HBase clusters. Each cluster is running separate instances of Zookeeper, Hadoop and HBase. The clusters are either single node or three node setups. I am getting constant stability problems with the HBase Regionserver: it dies randomly every day or every other

Re: HBase Regionserver randomly dies

2014-12-01 Thread Bharath Vissapragada
Looks like an HDFS issue. Are you sure your HDFS is working fine?

Re: After hadoop QJM failover,hbase can not write

2014-12-01 Thread Bharath Vissapragada
Did you override dfs.client.retry.policy.enabled to true in the regionserver configs? On Mon, Dec 1, 2014 at 9:13 AM, 聪聪 175998...@qq.com wrote: Hi there, I have encountered a problem that is frustrating me. I am using Hadoop version hadoop-2.3.0-cdh5.1.0; NameNode HA uses the Quorum Journal Manager (QJM)

Re: HBase - Zookeeper Error.

2014-12-01 Thread Bharath Vissapragada
Did you propagate the following config to all regionservers and clients? <property> <name>zookeeper.znode.parent</name> <value>/hbase-unsecure</value> </property> On Mon, Dec 1, 2014 at 12:06 PM, dhamodharan.ramalin...@tcs.com wrote: Hi, I am using Hadoop 2.5.1 and HBase 0.98.8. I an
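
A minimal sketch of how one might check which value a node's configuration carries for zookeeper.znode.parent, assuming the usual Hadoop-style layout of property elements inside a configuration element. The helper name read_property and the sample XML are illustrative only and are not taken from the thread; in practice the text would come from the hbase-site.xml on each regionserver and client.

```python
import xml.etree.ElementTree as ET


def read_property(site_xml_text, name):
    """Return the value of a Hadoop-style <property> by name, or None if absent."""
    root = ET.fromstring(site_xml_text)
    for prop in root.iter("property"):
        # Strip whitespace so hand-edited files with padded names still match.
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative sample mirroring the property quoted in the message above.
SAMPLE = """<configuration>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase-unsecure</value>
  </property>
</configuration>"""

print(read_property(SAMPLE, "zookeeper.znode.parent"))
```

Running the same check against the configuration on every regionserver and client would show whether the setting was actually propagated everywhere.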

RE: HBase Regionserver randomly dies

2014-12-01 Thread Robert Kent
Looks like an HDFS issue. Are you sure your HDFS is working fine? HDFS appears to be working correctly - HBase will process requests properly and everything appears to work correctly for hours/days, until the regionserver randomly falls over. If there were HDFS issues I would expect to see

Re: Newbie Question about 37TB binary storage on HBase

2014-12-01 Thread Aleks Laz
Dear Michael. On 29-11-2014 23:49, Michael Segel wrote: Guys, KISS. You can use a sequence file to store the images since the images are static. Sorry, but what do you mean by this sentence? Use HBase to index the images. If you want… you could use ES or SOLR to take the HBase index

Re: Newbie Question about 37TB binary storage on HBase

2014-12-01 Thread Michael Segel
You receive images. You can store the images in sequence files. (Since HDFS is a WORM file system, you will have to do some work here, storing individual images in a folder on HDFS where you would sweep the images into a single sequence file and then use HBase to track the location of the

Re: After hadoop QJM failover,hbase can not write

2014-12-01 Thread 聪聪
Thank you! Following your suggestion, I set dfs.client.retry.policy.enabled to true in core-site.xml and restarted so that it took effect. I see some changes in the HBase master log: retry information now appears in the master log. But it still takes a long time before writes succeed. I want to ask how long

Re: HBase Regionserver randomly dies

2014-12-01 Thread Ted Yu
Can you check the namenode log around the time the 'Failed to close inode' error was thrown? Thanks On Mon, Dec 1, 2014 at 4:10 AM, Robert Kent robert.k...@inps.co.uk wrote: Looks like an HDFS issue. Are you sure your HDFS is working fine? HDFS appears to be working correctly - HBase will process

RE: HBase Regionserver randomly dies

2014-12-01 Thread Robert Kent
From: Ted Yu [yuzhih...@gmail.com] Sent: 01 December 2014 15:31 To: user@hbase.apache.org Subject: Re: HBase Regionserver randomly dies Can you check the namenode log around the time the 'Failed to close inode' error was thrown? Thanks Here are the errors from the logs: 2014-11-29 21:12:59,277

RE: HBase Regionserver randomly dies

2014-12-01 Thread Robert Kent
Sorry, those logs were from the Regionserver. The NameNode logs are: 2014-11-29 21:12:59,493 WARN [IPC Server handler 0 on 8020] blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(313)) - Failed to place enough replicas, still in need of 1 to reach 1. For

Re: HBase Regionserver randomly dies

2014-12-01 Thread Ted Yu
There could be multiple reasons why the single datanode came to be considered dead, e.g. the datanode came under load which it couldn't handle. I would recommend adding more datanode(s) so that the client (HBase) can ride over a (slow) datanode. Cheers On Mon, Dec 1, 2014 at 8:21 AM, Robert Kent

Re: UI tool

2014-12-01 Thread Jignesh Patel
Can we use Apache HBase/Hadoop with Hue? On Thu, Nov 6, 2014 at 12:41 AM, Dima Spivak dspi...@cloudera.com wrote: Yep, you just need to set up an HBase Thrift gateway that Hue can connect to (lots of tutorials online for that). Cheers, Dima On Wed, Nov 5, 2014 at 9:13 PM, jeevi tesh

Re: UI tool

2014-12-01 Thread Shahab Yunus
Yes, you can. In fact, in some vendors' distributions it comes with the standard installation. You can also use Hive, or the more elaborate, powerful but complex Phoenix. Regards, Shahab On Mon, Dec 1, 2014 at 6:15 PM, Jignesh Patel jigneshmpa...@gmail.com wrote: Can we use Apache

Re: Is there anyway I can list out history of transitions made by a region?

2014-12-01 Thread Ted Yu
+1 to Sean's suggestion. On Mon, Dec 1, 2014 at 4:01 PM, Sean Busbey bus...@cloudera.com wrote: Is this something we should be adding to our AUDIT log? On Sun, Nov 30, 2014 at 10:07 PM, Ted Yu yuzhih...@gmail.com wrote: To my knowledge, there is no such tool. You can grep master log

RE: Is there anyway I can list out history of transitions made by a region?

2014-12-01 Thread Bijieshan
All possible scenarios: 1. Assignments during cluster start-up or restart. 2. RegionServer failover. 3. Load balance. 4. Movement triggered manually. I worry that it will generate a lot of audit log entries. Should we figure out the key scenarios and log them at different levels? Jieshan.

how to tell there is a OOM in regionserver

2014-12-01 Thread Liu, Ming (HPIT-GADSC)
Hi all, Recently, one of our HBase 0.98.5 instances ran into issues: when running some specific workload, all region servers suddenly shut down at the same time, but the master is still running. When I check the logs, in the master log I can see messages like 2014-12-01 08:28:11,072 DEBUG

Re: how to tell there is a OOM in regionserver

2014-12-01 Thread Otis Gospodnetic
Hi Ming, 1) There typically is an OOM message from the JVM itself 2) I would monitor the server instead of relying on log messages mentioning OOMs. For example, in SPM http://sematext.com/spm/ we have heartbeat alerts that tell us when we stop hearing from RegionServers and other types of

Re: how to tell there is a OOM in regionserver

2014-12-01 Thread Bharath Vissapragada
I agree with Otis' response. Adding a few more details: there is a .out file in the logs/ directory, which is the stdout for each of these daemons, and in case of an OOM crash it prints something like this # java.lang.OutOfMemoryError: Java heap space # -XX:OnOutOfMemoryError=kill -9 %p #
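
A minimal sketch of checking a daemon's .out text for the JVM message described above. The helper name find_oom_lines is hypothetical, and the sample text only mirrors the fragment quoted in this message; it is not a complete or authoritative reproduction of what a JVM prints. In practice the text would be read from the relevant .out file in the logs/ directory.

```python
def find_oom_lines(out_text):
    """Return lines of a daemon's stdout (.out) text that report a JVM OutOfMemoryError."""
    return [
        line.strip()
        for line in out_text.splitlines()
        # Match the error class name, not the -XX:OnOutOfMemoryError option line.
        if "java.lang.OutOfMemoryError" in line
    ]


# Illustrative sample based on the fragment quoted in the message above.
SAMPLE_OUT = """#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError=kill -9 %p
#"""

for hit in find_oom_lines(SAMPLE_OUT):
    print(hit)
```

An empty result does not prove there was no OOM; as noted in the thread, monitoring the server is more reliable than depending on log text alone.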

RE: how to tell there is a OOM in regionserver

2014-12-01 Thread Liu, Ming (HPIT-GADSC)
Thank you both! Yes, I can see the '.out' file, with clear proof that the process was 'killed'. So we can prove this issue now! And it is also true that we must rely on the JVM itself for proof that the kill operation is due to OOM. Thank you both, this was a very good lesson. Thanks, Ming