After several hours I have figured out how to get the disable command to work and how to delete tables manually, but in the process I ran into 4 problems that I think are areas that could be improved (or where my understanding could be improved).

1) The client timeout is used for the disable command, which was my problem. Does this totally make sense? Should a DML-minded timeout be used for DDL statements that we know can normally take a very long time on a large cluster?

2) If the disable command fails the first time it does not "roll back". The ONLY way to proceed is to enable and then try to disable again. The first disable attempt is the only one that seems to do anything. Subsequent disable statements usually return without errors but never seem to take effect. After issuing this command, either the entire table should be disabled or the entire table should still be enabled. I was caught in a half-disabled or mostly-disabled state, which was very frustrating.

3) The biggest issue of all is why certain regions do not report back to the disable command. What are the various states of a region that could cause this? Compaction I know is one; what else could cause the disable command to take too long? Shouldn't a disable force itself through and wait long enough to be able to disable every region? Again, a longer wait time or a more forceful operation would help.

4) Through all of the attempts to disable I saw regions coming and going and nothing was consistent. The UI showed the table as disabled and listed 1 region in the table (there were 1000s). The node view listed several other regions, but not the same one as the table view. It was a very strange situation. The UI to browse the tables and regions is great, but it would be even better if it gave a 100% view of regions and their current states. A summary view of region counts per table based on state or status would be fantastic. There is a compaction count, but what about in split, read/write lock, disabled, etc.? What is the precise list of region states that could occur? The UI could show a summary count per state as well as the detailed state for each specific region in the list.
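
To make the request in point 4 concrete, here is a rough sketch in Python of the kind of per-table summary I have in mind. Everything in it is made up for illustration: the table names, region names, and state names are placeholders, not the actual HBase region states or any HBase API.

```python
from collections import Counter

# Hypothetical (table, region, state) records. The state names are
# illustrative placeholders, not HBase's real list of region states.
REGIONS = [
    ("table_a", "region-0001", "disabled"),
    ("table_a", "region-0002", "disabled"),
    ("table_a", "region-0003", "compacting"),
    ("table_b", "region-0001", "disabled"),
    ("table_b", "region-0002", "disabled"),
]


def summarize(regions):
    """Return {table: Counter({state: region_count})}."""
    summary = {}
    for table, _region, state in regions:
        summary.setdefault(table, Counter())[state] += 1
    return summary


def fully_disabled(summary, table):
    """True only if the table has regions and every one is 'disabled'."""
    counts = summary.get(table, Counter())
    return bool(counts) and set(counts) == {"disabled"}


if __name__ == "__main__":
    summary = summarize(REGIONS)
    for table in sorted(summary):
        print(table, dict(summary[table]),
              "fully disabled:", fully_disabled(summary, table))
```

With a view like this, a table that is only mostly disabled (table_a above, with one region still compacting) would be obvious at a glance, and drop or truncate could be held back until every region reports disabled.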

Fundamentally this is the health monitor of the system, and as a DBA I really need to know the 100% count of regions and where they all stand in terms of availability. Are they disabled, blocked for writes, blocked for reads, in compaction, etc.? If there are various states that cause disabling to be blocked, they can be reported here so that I at least know when a disable command can be executed successfully (and this should be documented).

Thanks

On Thu, Jan 20, 2011 at 9:01 PM, Wayne <[email protected]> wrote:

> I need to delete some tables and I am not sure the best way to do it. The
> shell does not work. The disable command says it runs ok but every time I
> run drop or truncate I get an exception that says the table is not
> disabled. The UI shows it as disabled but truncate/drop still do not work.
> I have even tried to restart the cluster as sometimes that makes the disable
> "stick".
>
> What is the best way to delete a table manually? My assumption is that with
> 10k regions in 3 tables that I need to delete that the shell is not going to
> work. How can I do this without a completely fresh install of everything?
> How can the data/tables be removed manually without too much pain?
>
> Thanks.
>
