Hi,

I'm seeing an issue on our hbase cluster which is preventing snapshots from 
working. So far the only way i can get it working again is to restart all the 
regionservers which is not ideal.

I'm running hbase 0.98.3 on hadoop 2.4.0, the stacktraces are

>From host given the snapshot command via hbase shell 

2014-09-23 10:00:16,116 INFO  [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
event: /hbase/online-snapshot/acquired
2014-09-23 10:00:16,116 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Looking for new procedures under 
znode:'/hbase/online-snapshot/acquired'
2014-09-23 10:00:16,117 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Found procedure znode: 
/hbase/online-snapshot/acquired/offlineScans-20140923-100009
2014-09-23 10:00:16,117 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: start proc data length is 59
2014-09-23 10:00:16,117 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Found data for 
znode:/hbase/online-snapshot/acquired/offlineScans-20140923-100009
2014-09-23 10:00:16,118 DEBUG [regionserver16020-EventThread] 
snapshot.RegionServerSnapshotManager: Launching subprocedure for snapshot 
offlineScans-20140923-100009 from table offlineScans
2014-09-23 10:00:16,118 DEBUG [regionserver16020-EventThread] 
procedure.ProcedureMember: Submitting new 
Subprocedure:offlineScans-20140923-100009
2014-09-23 10:00:16,120 DEBUG [member: '############,16020,1410126483043' 
subprocedure-pool1-thread-314] procedure.Subprocedure: Starting subprocedure 
'offlineScans-20140923-100009' with timeout 60000ms
2014-09-23 10:00:16,120 DEBUG [member: '############,16020,1410126483043' 
subprocedure-pool1-thread-314] errorhandling.TimeoutExceptionInjector: 
Scheduling process timer to run in: 60000 ms
2014-09-23 10:00:16,121 DEBUG [member: '############,16020,1410126483043' 
subprocedure-pool1-thread-314] procedure.Subprocedure: Subprocedure 
'offlineScans-20140923-100009' starting 'acquire' stage
2014-09-23 10:00:16,121 DEBUG [member: '############,16020,1410126483043' 
subprocedure-pool1-thread-314] procedure.Subprocedure: Subprocedure 
'offlineScans-20140923-100009' locally acquired
2014-09-23 10:00:16,121 DEBUG [member: '############,16020,1410126483043' 
subprocedure-pool1-thread-314] procedure.ZKProcedureMemberRpcs: Member: 
'##########,16020,1410126483043' joining acquired barrier for procedure 
(offlineScans-20140923-100009) in zk
2014-09-23 10:00:16,123 DEBUG [member: '############,16020,1410126483043' 
subprocedure-pool1-thread-314] procedure.ZKProcedureMemberRpcs: Watch for 
global barrier 
reached:/hbase/online-snapshot/reached/offlineScans-20140923-100009
2014-09-23 10:00:16,124 DEBUG [member: '############,16020,1410126483043' 
subprocedure-pool1-thread-314] procedure.Subprocedure: Subprocedure 
'offlineScans-20140923-100009' coordinator notified of 'acquire', waiting on 
'reached' or 'abort' from coordinator
2014-09-23 10:00:16,133 INFO  [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Received created 
event:/hbase/online-snapshot/abort/offlineScans-20140923-100009
2014-09-23 10:00:16,134 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Aborting procedure member for znode 
/hbase/online-snapshot/abort/offlineScans-20140923-100009
2014-09-23 10:00:16,134 DEBUG [regionserver16020-EventThread] 
procedure.ProcedureMember: Request received to abort procedure 
offlineScans-20140923-100009
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via 
###############,16020,1408633802567:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
 
        at 
org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:171)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.abort(ZKProcedureMemberRpcs.java:320)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeCreated(ZKProcedureMemberRpcs.java:94)
        at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:327)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: 
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
        at 
org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:270)
        at 
org.apache.hadoop.hbase.procedure.ProcedureMember.submitSubprocedure(ProcedureMember.java:171)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:214)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:172)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:55)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:107)
        at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
        ... 2 more



>From the regionserver it was trying to contact


2014-09-23 10:00:16,124 INFO  [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Received procedure start children changed 
event: /hbase/online-snapshot/acquired
2014-09-23 10:00:16,124 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Looking for new procedures under 
znode:'/hbase/online-snapshot/acquired'
2014-09-23 10:00:16,125 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Found procedure znode: 
/hbase/online-snapshot/acquired/offlineScans-20140923-100009
2014-09-23 10:00:16,127 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: start proc data length is 59
2014-09-23 10:00:16,127 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Found data for 
znode:/hbase/online-snapshot/acquired/offlineScans-20140923-100009
2014-09-23 10:00:16,127 DEBUG [regionserver16020-EventThread] 
snapshot.RegionServerSnapshotManager: Launching subprocedure for snapshot 
offlineScans-20140923-100009 from table offlineScans
2014-09-23 10:00:16,127 DEBUG [regionserver16020-EventThread] 
procedure.ProcedureMember: Submitting new 
Subprocedure:offlineScans-20140923-100009
2014-09-23 10:00:16,128 ERROR [regionserver16020-EventThread] 
procedure.Subprocedure: Subprocedure pool is full!
2014-09-23 10:00:16,128 DEBUG [regionserver16020-EventThread] 
errorhandling.ForeignExceptionDispatcher: offlineScans-20140923-100009 
accepting received exception
2014-09-23 10:00:16,128 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Aborting procedure 
(offlineScans-20140923-100009) in zk
2014-09-23 10:00:16,134 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Finished creating abort 
znode:/hbase/online-snapshot/abort/offlineScans-20140923-100009
2014-09-23 10:00:16,134 ERROR [regionserver16020-EventThread] 
procedure.ProcedureMember: Failed to start subprocedure 
'offlineScans-20140923-100009'
2014-09-23 10:00:16,134 INFO  [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Received created 
event:/hbase/online-snapshot/abort/offlineScans-20140923-100009
2014-09-23 10:00:16,134 DEBUG [regionserver16020-EventThread] 
procedure.ZKProcedureMemberRpcs: Aborting procedure member for znode 
/hbase/online-snapshot/abort/offlineScans-20140923-100009
2014-09-23 10:00:16,145 DEBUG [regionserver16020-EventThread] 
procedure.ProcedureMember: Request received to abort procedure 
offlineScans-20140923-100009
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via 
###########,16020,1408633802567:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
        at 
org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:171)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.abort(ZKProcedureMemberRpcs.java:320)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeCreated(ZKProcedureMemberRpcs.java:94)
        at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:327)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: 
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
        at 
org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:270)
        at 
org.apache.hadoop.hbase.procedure.ProcedureMember.submitSubprocedure(ProcedureMember.java:171)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:214)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:172)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:55)
        at 
org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:107)
        at 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:348)
        ... 2 more

Any ideas on how I can resolve this?

-Ian

Reply via email to