So one region of usertable got lost? Can you pastebin the master server log from around the time you killed the region server?
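If the master log is large, a rough filter like the one below may help cut it down to the relevant window before pasting. It is only a sketch: it assumes each log line starts with a log4j-style timestamp such as 2013-11-28 10:01:02,345 (adjust TS_FORMAT if yours differs), and parse_ts and log_window are names made up for illustration, not part of HBase.

```python
from datetime import datetime, timedelta

# Assumption: log lines begin with a timestamp like "2013-11-28 10:01:02,345".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LEN = len("2013-11-28 10:01:02,345")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None


def log_window(lines, center, minutes=10):
    """Yield the lines logged within `minutes` of `center`.

    Lines without a timestamp (stack traces, wrapped messages) are kept or
    dropped together with the most recent timestamped line.
    """
    lo = center - timedelta(minutes=minutes)
    hi = center + timedelta(minutes=minutes)
    keep = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keep = lo <= ts <= hi
        if keep:
            yield line
```

Going by your two fsck runs, the kill happened somewhere between 09:57 and 10:13 UTC, so something like log_window(open(path), datetime(2013, 11, 28, 10, 5)) should cover it, where path is wherever your master log lives and the log timestamps are assumed to be in UTC as well.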
Thanks

On Nov 28, 2013, at 2:13 AM, Andrea <[email protected]> wrote:

> Hi, I'm using HBase 0.94.12 on top of Hadoop 1.2.1, and I have one node for
> zookeeper, one node for a Namenode/Hmaster and three Datanode/Regionservers.
> All the machines are on Amazon EC2, instance m2.xlarge.
>
> I set the replication factor to two, so I expect that if I kill a
> HregionServer/Datanode (for example by killing all java processes), all the
> regions on that node will be recovered on one of the other two live
> HRegionservers.
>
> But when I kill the node, I lose the regions on it and, worst of all, if
> that node hosts the .META. or -ROOT- table, the entire cluster stops working
> altogether!
>
> In case it is helpful, I loaded 500000 rows into the 'usertable' table with
> the YCSB tool, and these are the status 'simple' and /hadoop fsck /hbase
> outputs before and after killing the node:
>
> before:
>
> hbase(main):001:0> status 'simple'
> 3 live servers
>     ip-10-235-11-139:60020 1385632293907
>         requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=57, maxHeapMB=14983
>     ip-10-253-29-220:60020 1385632293955
>         requestsPerSecond=0, numberOfOnlineRegions=2, usedHeapMB=74, maxHeapMB=14983
>     ip-10-253-29-249:60020 1385632294162
>         requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=1935, maxHeapMB=14983
> 0 dead servers
> Aggregate load: 0, regions: 4
>
>
> FSCK started by ubuntu from /10.253.91.250 for path /hbase at Thu Nov 28 09:57:20 UTC 2013
> ..................................Status: HEALTHY
>  Total size: 2122147158 B
>  Total dirs: 31
>  Total files: 34 (Files currently being written: 3)
>  Total blocks (validated): 59 (avg. block size 35968595 B) (Total open file blocks (not validated): 2)
>  Minimally replicated blocks: 59 (100.0 %)
>  Over-replicated blocks: 0 (0.0 %)
>  Under-replicated blocks: 0 (0.0 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor: 2
>  Average block replication: 2.0
>  Corrupt blocks: 0
>  Missing replicas: 0 (0.0 %)
>  Number of data-nodes: 3
>  Number of racks: 1
> FSCK ended at Thu Nov 28 09:57:20 UTC 2013 in 23 milliseconds
>
>
> The filesystem under path '/hbase' is HEALTHY
>
> -------------------------------------------------------------------------
> -------------------------------------------------------------------------
>
> and after (about 15 minutes):
>
> hbase(main):001:0> status 'simple'
> 2 live servers
>     ip-10-235-11-139:60020 1385632293907
>         requestsPerSecond=0, numberOfOnlineRegions=1, usedHeapMB=63, maxHeapMB=14983
>     ip-10-253-29-220:60020 1385632293955
>         requestsPerSecond=0, numberOfOnlineRegions=2, usedHeapMB=117, maxHeapMB=14983
> 1 dead servers
>     ip-10-253-29-249,60020,1385632294162
> Aggregate load: 0, regions: 3
>
>
> FSCK started by ubuntu from /10.253.91.250 for path /hbase at Thu Nov 28 10:13:29 UTC 2013
> ....................Status: HEALTHY
>  Total size: 948168097 B
>  Total dirs: 27
>  Total files: 20 (Files currently being written: 3)
>  Total blocks (validated): 29 (avg. block size 32695451 B) (Total open file blocks (not validated): 2)
>  Minimally replicated blocks: 29 (100.0 %)
>  Over-replicated blocks: 0 (0.0 %)
>  Under-replicated blocks: 0 (0.0 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor: 2
>  Average block replication: 2.0
>  Corrupt blocks: 0
>  Missing replicas: 0 (0.0 %)
>  Number of data-nodes: 2
>  Number of racks: 1
> FSCK ended at Thu Nov 28 10:13:29 UTC 2013 in 7 milliseconds
>
>
> The filesystem under path '/hbase' is HEALTHY
>
>
> I hope I have been clear and have provided sufficient information; if not, I
> can post the hbase-site.xml and hdfs-site.xml configuration.
>
> Thank you for your help!
>
> Andrea
