There seems to be no very simple way to do this. I am not sure whether closing/unassigning
regions gradually via a script before dropping would help a little.
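If you do try the gradual route, here is a minimal sketch of what I mean. It only prints `unassign` commands in small throttled batches; in practice you would pipe the output into `hbase shell` and feed it the encoded region names from a `.META.` scan. The batch size and pause are guesses, not tuned values.

```shell
# Emit one `unassign` shell command per region name read from stdin,
# pausing between batches so the master/ZK can drain the close events.
# Usage sketch (hypothetical): region_list.txt | emit_unassigns 50 5 | hbase shell
emit_unassigns() {
  batch=${1:-50}   # regions per batch (assumed default)
  pause=${2:-5}    # seconds to sleep between batches (assumed default)
  i=0
  while read region; do
    echo "unassign '$region'"
    i=$((i + 1))
    if [ $((i % batch)) -eq 0 ]; then
      sleep "$pause"
    fi
  done
}
```

This spreads the close events out over time instead of letting disable fire them all at once.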

The pain derives from the current master assignment design, which relies on ZK to
track assign/split progress/status. When creating/dropping/restarting tables with
a very large number of regions, ZK can be overwhelmed by heavy
creation/update/deletion operations arriving at almost the same time.

I wonder if this is a kind of abuse of ZK: by design, ZK is expected to store a
small amount of meta/config data with sparse access, not such a huge amount of
data/nodes (if the region count reaches 20K-100K) under intensive access.

Why not store the assignment progress/status info in another system table, like
the META table, rather than in ZK?
________________________________________
From: Michael Webster [[email protected]]
Sent: September 10, 2013 7:36
To: [email protected]
Subject: Dropping a very large table

Hello,

I have a very large HBase table running on 0.90, large meaning >20K regions
with a max region size of 1GB. This table is legacy and can be dropped, but
we aren't sure what impact disabling/dropping that large a table will
have on our cluster.

We are using dropAsync and polling HTable#isEnabled instead of the standard
shell disable command to avoid a timeout during disable like in
https://issues.apache.org/jira/browse/HBASE-3432.
Is there any risk of overwhelming ZooKeeper or the master with region-closed
events during the disable, or would it be comparable to what happens
during a cluster restart when the RSs close out their regions? Additionally, are
there any concerns with wiping out that much data in HDFS at once during
the drop?

Thank you in advance,
Michael