Hi,
We're running an HBase cluster with 25 regionservers on CDH3u0.
1. We ran into several JVM crashes on the master today. They look like JVM
issues; I have attached the hs_err_pid files
to this message. I just want to confirm whether this is really a JVM issue,
or whether some master issue is triggering the
low-level one.
2. We also had two regionservers go down today. After they were
restarted, the number of regions assigned to them
is much lower than on the others. The master log shows:
====
2011-07-20 23:14:42,842 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=
hd0022-s4.c.wx-gj.sdo.com,60020,1310764957312, load=(requests=0, regions=0,
usedHeap=0, maxHeap=0) returned
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Received close for
S3Table,ku6_ku6upload_1307149487260,1311102507829.41491bc74321aeb578f00aae2725eefc.
but we are not serving it for 41491bc74321aeb578f00aae2725eefc
2011-07-20 23:14:43,228 DEBUG org.apache.hadoop.hbase.master.HMaster: Not
running balancer because 2 region(s) in transition:
{1efae368e8d64cc59aeadbb3289fddac=S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
state=PENDING_CLOSE, ts=1311174872837,
41491bc74321aeb578f00aae2725eefc=S3Table,ku6_ku6upload_1307149487260,13111025078...
2011-07-20 23:15:02,845 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
timed out:
S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
state=PENDING_CLOSE, ts=1311174872837
2011-07-20 23:15:02,845 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Region has been
PENDING_CLOSE for too long, running forced unassign again on
region=S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
2011-07-20 23:15:02,845 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Region has been
PENDING_CLOSE for too long, running forced unassign again on
region=S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
2011-07-20 23:15:02,845 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of
region
S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
(offlining)
2011-07-20 23:15:02,845 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign
region
S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
which is already pending close but forcing an additional close
2011-07-20 23:15:02,850 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=
hd0022-s4.c.wx-gj.sdo.com,60020,1310764957312, load=(requests=0, regions=0,
usedHeap=0, maxHeap=0) returned
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Received close for
S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
but we are not serving it for 1efae368e8d64cc59aeadbb3289fddac
2011-07-20 23:15:12,848 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
timed out:
S3Table,ku6_ku6upload_1307149487260,1311102507829.41491bc74321aeb578f00aae2725eefc.
state=PENDING_CLOSE, ts=1311174882840
2011-07-20 23:15:12,848 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Region has been
PENDING_CLOSE for too long, running forced unassign again on
region=S3Table,ku6_ku6upload_1307149487260,1311102507829.41491bc74321aeb578f00aae2725eefc.
2011-07-20 23:15:12,849 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of
region
S3Table,ku6_ku6upload_1307149487260,1311102507829.41491bc74321aeb578f00aae2725eefc.
(offlining)
2011-07-20 23:15:12,849 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign
region
S3Table,ku6_ku6upload_1307149487260,1311102507829.41491bc74321aeb578f00aae2725eefc.
which is already pending close but forcing an additional close
2011-07-20 23:15:12,853 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=
hd0022-s4.c.wx-gj.sdo.com,60020,1310764957312, load=(requests=0, regions=0,
usedHeap=0, maxHeap=0) returned
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Received close for
S3Table,ku6_ku6upload_1307149487260,1311102507829.41491bc74321aeb578f00aae2725eefc.
but we are not serving it for 41491bc74321aeb578f00aae2725eefc
2011-07-20 23:15:32,854 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
timed out:
S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
state=PENDING_CLOSE, ts=1311174902846
2011-07-20 23:15:32,854 INFO
org.apache.hadoop.hbase.master.AssignmentManager: Region has been
PENDING_CLOSE for too long, running forced unassign again on
region=S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
2011-07-20 23:15:32,855 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of
region
S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
(offlining)
2011-07-20 23:15:32,855 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Attempting to unassign
region
S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
which is already pending close but forcing an additional close
2011-07-20 23:15:32,859 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager: Server serverName=
hd0022-s4.c.wx-gj.sdo.com,60020,1310764957312, load=(requests=0, regions=0,
usedHeap=0, maxHeap=0) returned
org.apache.hadoop.hbase.NotServingRegionException:
org.apache.hadoop.hbase.NotServingRegionException: Received close for
S3Table,EStore_everbox_HlYLqw4bMBDuLsfFZXg0kesvVjg=,1305255180483.1efae368e8d64cc59aeadbb3289fddac.
but we are not serving it for 1efae368e8d64cc59aeadbb3289fddac
====
It looks like the balancer is blocked by two regions in transition, and
the master is having trouble closing those regions.
What operation can I perform to work around this issue?
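In case it helps anyone check the same symptom in their own master log, here is a minimal sketch (plain Python, not part of HBase) that counts how often the master retried a forced unassign for each region. It assumes the log uses the timestamp and message format shown in the excerpt above; the names split_records and count_forced_unassigns are made up for illustration.

```python
import re
from collections import Counter

# Assumption: every log record starts with a timestamp such as
# "2011-07-20 23:15:02,845"; wrapped continuation lines do not.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}", re.MULTILINE)

# Encoded region name: the 32 hex characters between the last two dots
# of a full region name, e.g. "...1305255180483.<32 hex chars>."
ENCODED_NAME = re.compile(r"\.([0-9a-f]{32})\.")


def split_records(log_text):
    """Re-join wrapped lines so that each list item is one log record."""
    starts = [m.start() for m in RECORD_START.finditer(log_text)]
    ends = starts[1:] + [len(log_text)]
    # Collapse all whitespace (including line breaks) to single spaces.
    return [" ".join(log_text[s:e].split()) for s, e in zip(starts, ends)]


def count_forced_unassigns(log_text):
    """Count 'PENDING_CLOSE for too long' records per encoded region name."""
    counts = Counter()
    for record in split_records(log_text):
        if "PENDING_CLOSE for too long" in record:
            match = ENCODED_NAME.search(record)
            if match:
                counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_unassigns.py < master.log
    for region, n in count_forced_unassigns(sys.stdin.read()).most_common():
        print(region, n)
```

A region whose count keeps growing while the regionserver answers NotServingRegionException is one the master keeps trying to close on a server that no longer hosts it.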
Thanks and regards,
Mao Xu-Feng