There seems to be no very simple way to do this. I'm not sure whether closing/unassigning regions gradually via a script before dropping would help a little.
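As a rough illustration of the "gradual unassign" idea, here is a minimal sketch that generates batched `unassign` commands to feed into `hbase shell`, pausing between batches so the master and ZK are not hit all at once. Note the assumptions: the input file (here `regions.txt`) holding one region name per line, the batch size, and the pause length are all hypothetical, and the shell `unassign` command does not exist in 0.90, so this would only apply on a newer release.

```shell
#!/usr/bin/env sh
# Sketch: emit batched `unassign` commands for hbase shell, so a huge
# table's regions are closed gradually before disable/drop.
# Assumptions (hypothetical, not from the thread): one region name per
# input line; a release whose shell has `unassign` (0.90 does not).

# emit_unassign_batches <regions-file> <batch-size> <pause-seconds>
emit_unassign_batches() {
  file=$1
  batch=$2
  pause=$3
  i=0
  while read -r region; do
    # ask the master to close (and not immediately reopen) this region
    printf "unassign '%s'\n" "$region"
    i=$((i + 1))
    # after every full batch, let the cluster settle before continuing
    [ $((i % batch)) -eq 0 ] && printf "sleep %s\n" "$pause"
  done < "$file"
}

# Typical use (hypothetical file name and tuning values):
#   emit_unassign_batches regions.txt 200 30 | hbase shell
```

The point is only to spread the close events over time instead of letting a single disable fire tens of thousands of ZK updates at once; batch size and pause would need tuning per cluster.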
The pain stems from the current master assignment design, which relies on ZK to track assign/split progress and status. When creating, dropping, or restarting tables with a very large number of regions, ZK can be overwhelmed by heavy create/update/delete operations arriving at almost the same time. I wonder whether this is a kind of abuse of ZK: by design, ZK is expected to store a small amount of meta/config data with sparse access, not such a huge amount of data/nodes (if the region count reaches 20K-100K) under intensive access. Why not store the assignment progress/status info in another system table, like the META table, rather than in ZK?

________________________________________
From: Michael Webster [[email protected]]
Sent: September 10, 2013 7:36
To: [email protected]
Subject: Dropping a very large table

Hello,

I have a very large HBase table running on 0.90, large meaning >20K regions with a max region size of 1GB. This table is legacy and can be dropped, but we aren't sure what impact disabling/dropping a table that large will have on our cluster. We are using dropAsync and polling HTable#isEnabled instead of the standard shell disable command, to avoid a timeout during disable like in https://issues.apache.org/jira/browse/HBASE-3432.

Is there any risk of overwhelming ZooKeeper or the master with region-closed events during the disable, or would it be comparable to what happens during a cluster restart when region servers close out their regions? Additionally, are there any concerns with wiping out that much data in HDFS at once during the drop?

Thank you in advance,
Michael
--
