Hi, I'm running HBase 0.94.12 on top of Hadoop 1.2.1, with one node for
ZooKeeper, one node for the NameNode/HMaster and three DataNode/RegionServer
nodes. All the machines are m2.xlarge instances on Amazon EC2.
I set the replication factor to two, so I expect that if I kill an
HRegionServer/DataNode (for example by killing all Java processes), all the
regions on that node will be recovered on one of the other two live
HRegionServers.
But when I kill the node, I lose the regions on it and, worst of all, if
that node hosts the .META. or -ROOT- table, the entire cluster stops working
altogether!
In case it is helpful: I loaded 500000 rows into the 'usertable' table with
the YCSB tool, and these are the outputs of status 'simple' and
hadoop fsck /hbase before and after killing the node:
before:
hbase(main):001:0> status 'simple'
3 live servers
ip-10-235-11-139:60020 1385632293907
requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=57,
maxHeapMB=14983
ip-10-253-29-220:60020 1385632293955
requestsPerSecond=0, numberOfOnlineRegions=2, usedHeapMB=74,
maxHeapMB=14983
ip-10-253-29-249:60020 1385632294162
requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=1935,
maxHeapMB=14983
0 dead servers
Aggregate load: 0, regions: 4
FSCK started by ubuntu from /10.253.91.250 for path /hbase at Thu Nov 28
09:57:20 UTC 2013
..................................Status: HEALTHY
Total size: 2122147158 B
Total dirs: 31
Total files: 34 (Files currently being written: 3)
Total blocks (validated): 59 (avg. block size 35968595 B) (Total open
file blocks (not validated): 2)
Minimally replicated blocks: 59 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Thu Nov 28 09:57:20 UTC 2013 in 23 milliseconds
The filesystem under path '/hbase' is HEALTHY
-------------------------------------------------------------------------
-------------------------------------------------------------------------
and after (about 15 minutes later):
hbase(main):001:0> status 'simple'
2 live servers
ip-10-235-11-139:60020 1385632293907
requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=63,
maxHeapMB=14983
ip-10-253-29-220:60020 1385632293955
requestsPerSecond=0, numberOfOnlineRegions=2, usedHeapMB=117,
maxHeapMB=14983
1 dead servers
ip-10-253-29-249,60020,1385632294162
Aggregate load: 0, regions: 3
FSCK started by ubuntu from /10.253.91.250 for path /hbase at Thu Nov 28
10:13:29 UTC 2013
....................Status: HEALTHY
Total size: 948168097 B
Total dirs: 27
Total files: 20 (Files currently being written: 3)
Total blocks (validated): 29 (avg. block size 32695451 B) (Total open
file blocks (not validated): 2)
Minimally replicated blocks: 29 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Thu Nov 28 10:13:29 UTC 2013 in 7 milliseconds
The filesystem under path '/hbase' is HEALTHY
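To make the difference between the two fsck reports easier to see, here is a
minimal Python sketch that diffs the summary fields quoted above. The helper
names parse_fsck and fsck_delta are hypothetical, and the sketch only handles
the simple "Name: number" lines copied from this post; it is not a general
fsck parser.

```python
import re

# Summary values copied from the two fsck reports in this post.
BEFORE = """Total size: 2122147158 B
Total dirs: 31
Total files: 34 (Files currently being written: 3)
Total blocks (validated): 59 (avg. block size 35968595 B)
Number of data-nodes: 3"""

AFTER = """Total size: 948168097 B
Total dirs: 27
Total files: 20 (Files currently being written: 3)
Total blocks (validated): 29 (avg. block size 32695451 B)
Number of data-nodes: 2"""


def parse_fsck(text):
    """Map each field name to the first integer that follows its colon."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(\d+)", line)
        if match:
            fields[match.group(1).strip()] = int(match.group(2))
    return fields


def fsck_delta(before, after):
    """Return after-minus-before for every field present in both reports."""
    b, a = parse_fsck(before), parse_fsck(after)
    return {key: a[key] - b[key] for key in b if key in a}


if __name__ == "__main__":
    for name, diff in fsck_delta(BEFORE, AFTER).items():
        print(f"{name}: {diff:+d}")
```

On the numbers above, this shows about 1.17 GB less data under /hbase after
the kill, along with 14 fewer files and 30 fewer blocks, even though fsck
still reports the path as HEALTHY.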
I hope I have been clear and have provided sufficient information; if not, I
can post the hbase-site.xml and hdfs-site.xml configuration.
Thank you for your help!
Andrea