>> Any advice is appreciated.

Do not store your files in HBase; store only references. Keep the files themselves in HDFS and put just the path (plus whatever metadata you need) into the HBase row. Multi-MB cell values put heavy pressure on the memstore, flushes, and compactions, which lines up with the compaction-time stalls you describe.
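[Editor's note] The store-a-reference pattern suggested above can be modeled in a few lines. This is a minimal sketch in plain Python: a local directory stands in for HDFS and a dict stands in for an HBase table, so the shape of the pattern is visible without a cluster. A real deployment would write the blob through the HDFS client and do the Put through the HBase client API; `put_file`/`get_file` and the `file:path`/`file:size` columns are illustrative names, not anything from the thread.

```python
import hashlib
import os
import tempfile

# Stand-ins for the real stores: a local directory models HDFS,
# and a plain dict models an HBase table (rowkey -> {column: value}).
blob_root = tempfile.mkdtemp()
hbase_table = {}

def put_file(rowkey: str, payload: bytes) -> None:
    """Write the payload to the blob store; put only a reference into 'HBase'."""
    # Content-addressed path keeps blob names unique and deduplicates
    # repeated loads of the same file.
    digest = hashlib.sha256(payload).hexdigest()
    path = os.path.join(blob_root, digest)
    with open(path, "wb") as f:
        f.write(payload)
    # The HBase cell stays tiny: a path and some metadata, not megabytes.
    hbase_table[rowkey] = {"file:path": path, "file:size": str(len(payload))}

def get_file(rowkey: str) -> bytes:
    """Resolve the reference stored in 'HBase' and read the blob back."""
    path = hbase_table[rowkey]["file:path"]
    with open(path, "rb") as f:
        return f.read()

payload = b"x" * (3 * 1024 * 1024)  # a 3 MB file, like the ones in the load
put_file("doc-0001", payload)
assert get_file("doc-0001") == payload
```

The point of the pattern: region splits, flushes, and major compactions now move short path strings instead of 3-20 MB values, so compaction I/O no longer scales with the raw file volume.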
Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]

________________________________________
From: Bill Sanchez [[email protected]]
Sent: Tuesday, December 03, 2013 3:45 PM
To: [email protected]
Subject: HBase Large Load Issue

Hello,

I am seeking some advice on my HBase issue. I am trying to configure a system that will eventually load and store approximately 50-80 GB of data daily. The data consists of files that are roughly 3-5 MB each, with some reaching 20 MB and some as small as 1 MB. The load job does roughly 20,000 puts to the same table, spread across an initial set of 20 pre-split regions on 20 region servers. During the first load I see some splitting (ending with around 50 regions), and in subsequent loads the number of regions goes much higher. After running similarly sized loads about 4 or 5 times, I start to see behavior that I cannot explain. The table in question has VERSIONS=1, and some of these test loads use the same data, but not all. Below is a summary of the behavior along with a few of the configuration settings I have tried so far.

Environment:
- HBase 0.94.13-security with Kerberos enabled
- ZooKeeper 3.4.5
- Hadoop 1.0.4

Symptoms:
1. Requests per second fall to 0 for all region servers.
2. Log files show socket timeout exceptions after waiting for scans of META.
3. Region servers sometimes eventually show up as dead.
4. Once HBase reaches a broken state, some regions show up as in transition indefinitely.
5. All of these issues seem to happen around the time of major compaction events.

This issue seems to be sensitive to hbase.rpc.timeout, which I increased significantly, but that only served to lengthen the amount of time until I see socket timeout exceptions.

A few notes:
1. I don't see massive GC pauses in the GC log.
2. Snappy compression was originally enabled, but turning it off as a test made no apparent difference.
3. The WAL is disabled for the table involved in the load.
4. TeraSort appears to run normally on HDFS.
5. The HBase randomWrite and randomRead tests appear to run normally on this cluster (although randomWrite does not write values anywhere close to 3-5 MB).
6. Ganglia is available in my environment.

Settings already altered:
1. hbase.rpc.timeout=900000 (I realize this may be too high)
2. hbase.regionserver.handler.count=100
3. ipc.server.max.callqueue.size=10737418240
4. hbase.regionserver.lease.period=900000
5. hbase.hregion.majorcompaction=0 (I have been manually compacting between loads, with no difference in behavior)
6. hbase.hregion.memstore.flush.size=268435456
7. dfs.datanode.max.xcievers=131072
8. dfs.datanode.handler.count=100
9. ipc.server.listen.queue.size=256
10. -Xmx16384m -Xms16384m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseMembar -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:PrintFLSStatistics=1 -Xloggc:/logs/gc.log
11. I have tried other GC settings, but they don't seem to have any real impact on GC performance in this case.

Any advice is appreciated.

Thanks
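[Editor's note] For reference, the HBase-side settings listed above can be consolidated into an hbase-site.xml fragment like the sketch below (values copied verbatim from the list; this only restates what was tried, it is not a recommendation — the 15-minute timeouts, in particular, mostly hide the stall rather than fix it). The dfs.datanode.* properties belong in hdfs-site.xml, and the JVM flags in item 10 go in hbase-env.sh, not here.

```xml
<configuration>
  <!-- RPC and scanner-lease timeouts raised to 15 min; this lengthened,
       but did not remove, the socket timeout exceptions -->
  <property><name>hbase.rpc.timeout</name><value>900000</value></property>
  <property><name>hbase.regionserver.lease.period</name><value>900000</value></property>

  <!-- more handler threads and a larger call queue -->
  <property><name>hbase.regionserver.handler.count</name><value>100</value></property>
  <property><name>ipc.server.max.callqueue.size</name><value>10737418240</value></property>
  <property><name>ipc.server.listen.queue.size</name><value>256</value></property>

  <!-- time-based major compaction disabled; compactions run manually between loads -->
  <property><name>hbase.hregion.majorcompaction</name><value>0</value></property>

  <!-- 256 MB memstore flush size -->
  <property><name>hbase.hregion.memstore.flush.size</name><value>268435456</value></property>
</configuration>
```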
