>> Any advice is appreciated.

Do not store your files in HBase; store only references. Keep the files themselves in HDFS and put just the path (plus whatever metadata you need) into the HBase row. Multi-MB cell values put heavy pressure on the memstore, flushes, and compactions, which lines up with the compaction-time stalls you describe.
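[Editor's note] The store-a-reference pattern suggested above can be modeled in a few lines. This is a minimal sketch in plain Python: a local directory stands in for HDFS and a dict stands in for an HBase table, so the shape of the pattern is visible without a cluster. A real deployment would write the blob through the HDFS client and do the Put through the HBase client API; `put_file`/`get_file` and the `file:path`/`file:size` columns are illustrative names, not anything from the thread.

```python
import hashlib
import os
import tempfile

# Stand-ins for the real stores: a local directory models HDFS,
# and a plain dict models an HBase table (rowkey -> {column: value}).
blob_root = tempfile.mkdtemp()
hbase_table = {}

def put_file(rowkey: str, payload: bytes) -> None:
    """Write the payload to the blob store; put only a reference into 'HBase'."""
    # Content-addressed path keeps blob names unique and deduplicates
    # repeated loads of the same file.
    digest = hashlib.sha256(payload).hexdigest()
    path = os.path.join(blob_root, digest)
    with open(path, "wb") as f:
        f.write(payload)
    # The HBase cell stays tiny: a path and some metadata, not megabytes.
    hbase_table[rowkey] = {"file:path": path, "file:size": str(len(payload))}

def get_file(rowkey: str) -> bytes:
    """Resolve the reference stored in 'HBase' and read the blob back."""
    path = hbase_table[rowkey]["file:path"]
    with open(path, "rb") as f:
        return f.read()

payload = b"x" * (3 * 1024 * 1024)  # a 3 MB file, like the ones in the load
put_file("doc-0001", payload)
assert get_file("doc-0001") == payload
```

The point of the pattern: region splits, flushes, and major compactions now move short path strings instead of 3-20 MB values, so compaction I/O no longer scales with the raw file volume.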
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]

________________________________________
From: Bill Sanchez [[email protected]]
Sent: Tuesday, December 03, 2013 3:45 PM
To: [email protected]
Subject: HBase Large Load Issue

Hello,

I am seeking some advice on my HBase issue. I am trying to configure a system that will eventually load and store approximately 50-80 GB of data daily. The data consists of files that are roughly 3-5 MB each, with some reaching 20 MB and some as small as 1 MB. The load job does roughly 20,000 puts to the same table, spread across an initial set of 20 pre-split regions on 20 region servers. During the first load I see some splitting (ending with around 50 regions), and in subsequent loads the number of regions goes much higher. After running similarly sized loads about 4 or 5 times, I start to see behavior that I cannot explain. The table in question has VERSIONS=1, and some of these test loads use the same data, but not all. Below is a summary of the behavior along with a few of the configuration settings I have tried so far.

Environment:
- HBase 0.94.13-security with Kerberos enabled
- ZooKeeper 3.4.5
- Hadoop 1.0.4

Symptoms:
1. Requests per second fall to 0 for all region servers.
2. Log files show socket timeout exceptions after waiting for scans of META.
3. Region servers sometimes eventually show up as dead.
4. Once HBase reaches a broken state, some regions show up as in transition indefinitely.
5. All of these issues seem to happen around the time of major compaction events.

This issue seems to be sensitive to hbase.rpc.timeout, which I increased significantly, but that only served to lengthen the amount of time until I see socket timeout exceptions.

A few notes:
1. I don't see massive GC pauses in the GC log.
2. Snappy compression was originally enabled, but turning it off as a test made no apparent difference.
3. The WAL is disabled for the table involved in the load.
4. TeraSort appears to run normally on HDFS.
5. The HBase randomWrite and randomRead tests appear to run normally on this cluster (although randomWrite does not write values anywhere close to 3-5 MB).
6. Ganglia is available in my environment.

Settings already altered:
1. hbase.rpc.timeout=900000 (I realize this may be too high)
2. hbase.regionserver.handler.count=100
3. ipc.server.max.callqueue.size=10737418240
4. hbase.regionserver.lease.period=900000
5. hbase.hregion.majorcompaction=0 (I have been manually compacting between loads, with no difference in behavior)
6. hbase.hregion.memstore.flush.size=268435456
7. dfs.datanode.max.xcievers=131072
8. dfs.datanode.handler.count=100
9. ipc.server.listen.queue.size=256
10. -Xmx16384m -Xms16384m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseMembar -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:PrintFLSStatistics=1 -Xloggc:/logs/gc.log
11. I have tried other GC settings, but they don't seem to have any real impact on GC performance in this case.

Any advice is appreciated.

Thanks
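[Editor's note] For reference, the HBase-side settings listed above can be consolidated into an hbase-site.xml fragment like the sketch below (values copied verbatim from the list; this only restates what was tried, it is not a recommendation — the 15-minute timeouts, in particular, mostly hide the stall rather than fix it). The dfs.datanode.* properties belong in hdfs-site.xml, and the JVM flags in item 10 go in hbase-env.sh, not here.

```xml
<configuration>
  <!-- RPC and scanner-lease timeouts raised to 15 min; this lengthened,
       but did not remove, the socket timeout exceptions -->
  <property><name>hbase.rpc.timeout</name><value>900000</value></property>
  <property><name>hbase.regionserver.lease.period</name><value>900000</value></property>

  <!-- more handler threads and a larger call queue -->
  <property><name>hbase.regionserver.handler.count</name><value>100</value></property>
  <property><name>ipc.server.max.callqueue.size</name><value>10737418240</value></property>
  <property><name>ipc.server.listen.queue.size</name><value>256</value></property>

  <!-- time-based major compaction disabled; compactions run manually between loads -->
  <property><name>hbase.hregion.majorcompaction</name><value>0</value></property>

  <!-- 256 MB memstore flush size -->
  <property><name>hbase.hregion.memstore.flush.size</name><value>268435456</value></property>
</configuration>
```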
