Hi all,

I've been running into these issues after restoring from snapshots:

https://issues.apache.org/jira/browse/HBASE-16464
https://issues.apache.org/jira/browse/HBASE-17992

Essentially, HRegion#addRegionToSnapshot has been timing out in 
TakeSnapshotHandler, leaving behind some tmp files. These leftover tmp files 
cause archivedHFileCleaner to fail, which manifests as an extremely large 
archive folder that never gets cleaned up.

HBASE-16464 addresses the bloated archive folder by preventing the 
SnapshotRegionManifest from being written if the operation has timed out (see: 
https://github.com/apache/hbase/commit/ab011391ab392f1a62b6ea9bdca87fc950af42a9#diff-4ec74c1b12f2be4f52c33260fd8b73efR86)

My question is: is it safe to ignore these TimeoutExceptions? If the 
SnapshotRegionManifests are not being written due to a timeout, does that mean 
we are losing data or ending up with inconsistencies?

If so, what are some potential remedies? I'm thinking we could just increase 
the timeout 'hbase.snapshot.master.timeout.millis', but is there a better 
way?
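
For concreteness, raising the timeout would look roughly like this in 
hbase-site.xml. The value below (10 minutes) is only a placeholder picked for 
illustration, not a tested or recommended setting, and I'm assuming the master 
needs a restart to pick it up:

```xml
<!-- hbase-site.xml: sketch only; 600000 ms (10 min) is an arbitrary example value -->
<property>
  <name>hbase.snapshot.master.timeout.millis</name>
  <value>600000</value>
</property>
```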

Thanks
