About your new crash:
2010-07-15 04:09:03,248 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
Total=3.3544312MB (3517376), Free=405.42056MB (425114272),
Max=408.775MB (428631648), Counts: Blocks=0, Access=1985476, Hit=0,
Miss=1985476, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss
Ratio=100.0%, Evicted/Run=NaN
2010-07-15 04:11:32,791 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
Total=3.3544312MB (3517376), Free=405.42056MB (425114272),
Max=408.775MB (428631648), Counts: Blocks=0, Access=1985759, Hit=0,
Miss=1985759, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss
Ratio=100.0%, Evicted/Run=NaN
2010-07-15 04:11:32,965 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes:
Total=3.3544312MB (3517376), Free=405.42056MB (425114272),
Max=408.775MB (428631648), Counts: Blocks=0, Access=1985768, Hit=0,
Miss=1985768, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss
Ratio=100.0%, Evicted/Run=NaN
2010-07-15 04:11:33,330 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 176029ms for
sessionid 0x129c87a7f980045, closing socket connection and attempting
reconnect
2010-07-15 04:11:33,330 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 175969ms for
sessionid 0x129c87a7f980052, closing socket connection and attempting
reconnect
Your machine wasn't able to talk to ZooKeeper for almost 3 minutes (the
176029ms in the ClientCnxn message), and we can see that it stopped
logging for approximately that long. Again, I'd like to point out that
this is a JVM/hardware issue.
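If you want to confirm the pause on your side, one easy check (a minimal
sketch; HBASE_OPTS is the standard hook in hbase-env.sh, the log path is
just an example) is to turn on GC logging for the region server JVM:

  # conf/hbase-env.sh -- log every collection with timestamps
  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
      -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-regionserver-gc.log"

A stop-the-world collection (or the box swapping) that lasts longer than
the ZooKeeper session timeout will show up in that file as one long pause
right before the "Client session timed out" line.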
My other comments inline.
J-D
On Thu, Jul 15, 2010 at 9:55 AM, Jinsong Hu <[email protected]>
wrote:
I tested HBase with client throttling and I noticed that in the beginning
the process is limited by CPU speed. After running for several hours, I
noticed that the speed is limited by disk. Then I realized that there is
a serious flaw in the HBase design:
In the beginning the number of regions is very small, so HBase only has
to compact that limited number of regions. But as insertion continues we
get more and more regions, and the clients keep inserting records into
them, ideally uniformly. At that point lots of regions need to be
compacted. Most clusters have a relatively stable number of physical
machines, and more regions lead to more compactions, which means a given
physical machine has to do more compaction work as time goes on.
Compaction is mostly disk IO, so sooner or later the disk IO capacity
will be exhausted. At that moment the machine is no longer responsive,
timeouts eventually happen, and the HBase region server crashes.
To me it looks like a GC pause with swapping and/or CPU starvation,
which leads to the region server shutting itself down.
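A quick way to check the swapping part (just a sketch with stock Linux
tools; the value below is an example, not a universal recommendation) is
to watch swap while the import runs and, if it's being touched, tell the
kernel to stay away from the JVM heap:

  free -m                          # watch the Swap "used" column during the job
  sysctl vm.swappiness             # default is usually 60
  sudo sysctl -w vm.swappiness=0   # make the kernel avoid swapping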
This seems to be a design flaw in HBase. In my testing, even a sustained
insertion rate of 1000 can crash the cluster. Is there any solution for
this?
Well, first, the values you are inserting are a lot bigger than what
HBase is optimized for out of the box.
WRT your IO exhaustion description, what you really described is the
scalability limit of a single region server. If HBase had a single-server
architecture I'd say you're right, but this is not the case. In the
Bigtable paper they say a tablet server (our RegionServer) usually serves
around 100 tablets (regions). When you grow beyond that, you add more
machines! That's the scalability model: HBase can grow to hundreds or
thousands of servers without any fancy third-party software.
WRT solutions:
1- if this is your initial data import, I'd recommend using
HFileOutputFormat since it will be much faster and you don't have to wait
for splits/compactions to happen (there's a rough sketch of the job setup
after this list). See
http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.html
2- if you are trying to see how HBase handles your expected "normal"
load, then I would recommend getting production-grade machines.
3- if you wish to keep the same hardware, you need to give more resources
to HBase. Running maps and reduces on the same machine, when you have
only 4 cores, easily leads to resource contention. We recommend running
HBase with at least 4GB of heap; you have only 2GB. It also needs cores:
at StumbleUpon for example we use 2x i7, which comes to 16 CPUs counting
hyperthreading, so we run 8 maps and 6 reducers per machine, leaving at
least 2 for HBase (it's a database, it needs processing power). Even when
we did the initial import of our data we didn't have any GC issues. This
is also the setup for our production cluster, although we (almost) never
run MR jobs on it since they are usually IO-heavy. (Example settings for
the heap and the task slots are sketched after this list.)
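To make #1 concrete, here is a rough sketch of a job against the 0.20.5
mapreduce API from the link above. The class name, column family,
qualifier and paths are all made up for illustration, and I'm cheating by
using a single reducer so that rows reach the output format in sorted
order (HFileOutputFormat requires that); for a really big import you'd
want proper partitioning instead.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class BulkImportSketch {

    // Turns each "rowkey<TAB>value" input line into one KeyValue.
    // "f" and "q" are placeholders, use your own family/qualifier.
    static class HFileMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
      private static final byte[] FAMILY = Bytes.toBytes("f");
      private static final byte[] QUALIFIER = Bytes.toBytes("q");

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        String[] parts = line.toString().split("\t", 2);
        byte[] row = Bytes.toBytes(parts[0]);
        context.write(new ImmutableBytesWritable(row),
            new KeyValue(row, FAMILY, QUALIFIER, Bytes.toBytes(parts[1])));
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "bulk import");
      job.setJarByClass(BulkImportSketch.class);
      job.setMapperClass(HFileMapper.class);
      job.setOutputKeyClass(ImmutableBytesWritable.class);
      job.setOutputValueClass(KeyValue.class);
      // HFiles are written by the output format, no region servers involved
      job.setOutputFormatClass(HFileOutputFormat.class);
      // single identity reducer => rows arrive at the output format sorted
      job.setNumReduceTasks(1);
      FileInputFormat.addInputPath(job, new Path("/user/you/input"));     // example
      FileOutputFormat.setOutputPath(job, new Path("/user/you/hfiles"));  // example
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

If I remember right, in 0.20.x you then load the resulting directory into
a new table with the bin/loadtable.rb script that ships with HBase.

And for #3, the knobs are the usual config files. The numbers below are
only an example for a 4-core box, not a recommendation for every setup:

  # conf/hbase-env.sh: give the region server a 4GB heap (value is in MB)
  export HBASE_HEAPSIZE=4000

  <!-- conf/mapred-site.xml: cap the task slots so MapReduce doesn't
       starve HBase and the DataNode of CPU -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>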
Jimmy.