After several hours I have figured out how to get the disable command to
work and how to delete tables manually, but along the way I ran into 4
problems that I think point to areas that could be improved (or where my
understanding could be improved).

1) The disable command uses the client timeout, which was my problem. Does
this make sense? Should a DML-minded timeout be used for DDL statements that
we know normally take a very long time on a large cluster?
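
To make the question concrete, here is a rough sketch of what I mean by
separating the two kinds of timeout. Everything in it is made up for
illustration: the constant names, the numbers, and the idea of scaling by
region count are my assumptions, not existing HBase settings.

```python
# Illustrative policy only: none of these names or values are HBase settings.
DML_TIMEOUT_S = 60.0          # hypothetical client timeout for reads and writes
DDL_BASE_TIMEOUT_S = 300.0    # hypothetical floor for table-level operations
DDL_PER_REGION_S = 0.5        # hypothetical extra allowance per region

DDL_OPERATIONS = ("disable", "enable", "drop", "truncate")


def timeout_for(operation, region_count=0):
    """Return a timeout in seconds chosen by the kind of operation."""
    if operation in DDL_OPERATIONS:
        # Table-level operations touch every region, so scale with table size.
        return DDL_BASE_TIMEOUT_S + DDL_PER_REGION_S * region_count
    return DML_TIMEOUT_S
```

With something like this, a disable on a table with thousands of regions
would get a far longer allowance than a single get or put.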

2) If the disable command fails the first time it does not "roll back". The
ONLY way to proceed is to enable and then try to disable again. Only the
first disable attempt seems to do anything. Subsequent disable statements
usually return without errors but never seem to actually disable the rest
of the table. After issuing this command the entire table should be
disabled or the entire table should still be enabled. I was caught in a
half-disabled or mostly disabled state, which was very frustrating.
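
The all-or-nothing behavior I am asking for could look roughly like the
sketch below. The `admin` object is hypothetical: it stands in for whatever
client is in use and is assumed to offer `disable`, `enable` and
`online_region_count`. It is not the real HBase API, and the default wait
times are arbitrary.

```python
import time


def disable_all_or_nothing(admin, table, timeout_s=600.0, poll_s=5.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Disable `table`, or re-enable it if some regions never go offline.

    `admin` is a hypothetical wrapper (not the HBase client API) assumed to
    provide disable(table), enable(table) and online_region_count(table).
    Returns True if the table ended up fully disabled, False if it was
    rolled back to enabled.
    """
    admin.disable(table)
    deadline = clock() + timeout_s
    while clock() < deadline:
        if admin.online_region_count(table) == 0:
            return True        # every region is offline: fully disabled
        sleep(poll_s)          # give slow regions more time to report back
    admin.enable(table)        # roll back rather than stay half disabled
    return False
```

The point of the design is that the caller only ever sees one of two
outcomes, which is what the manual enable-then-disable workaround is trying
to achieve by hand.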

3) The biggest issue of all is why certain regions do not report back to the
disable command. What are the various states of a region that could cause
this? Compaction I know is one. What else could cause the disable command to
take too long? Shouldn't a disable force itself through and wait long enough
to be able to disable every region? Again, a longer wait time or a more
forceful operation would help.

4) Through all of the attempts to disable I saw regions coming and going and
nothing was consistent. The UI showed the table as disabled and listed 1
region in the table (there were 1000s). The node view listed several other
regions but not the same one as the table view. It was a very strange
situation. The UI to browse the tables and regions is great but it would be
even better if it gave a 100% view of regions and their current states. A
summary view of region counts per table based on state or status would be
fantastic. There is a compaction count, but what about in split, read/write
lock, disabled, etc.? What is the precise list of region states that could
occur? The UI could show a summary count per state as well as the detailed
state for each specific region in the list. Fundamentally this is the health
monitor of the system, and as a DBA I really need to know the 100% count of
regions and where they all are in terms of availability. Are they disabled,
blocked for writes, blocked for reads, in compaction, etc.? If there are
various states that cause disabling to be blocked, they could be reported
here so that I at least know when a disable command can be executed
successfully (and this should be documented).
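
As an illustration of the summary view I have in mind, here is a small
sketch that counts regions per table and state. The state names
(`ONLINE`, `COMPACTING`, `SPLITTING`, `DISABLED`) and the choice of which
states block a disable are my guesses, since the real list is exactly what
I am asking for, and the rows are made-up sample data.

```python
from collections import Counter, defaultdict


def summarize_region_states(regions):
    """Return {table: Counter({state: count})} from (table, region, state) rows."""
    summary = defaultdict(Counter)
    for table, _region, state in regions:
        summary[table][state] += 1
    return summary


def safe_to_disable(summary, table,
                    blocking_states=("COMPACTING", "SPLITTING")):
    """True when no region of `table` is in a state assumed to block disable."""
    return not any(summary[table][state] for state in blocking_states)


# Made-up sample rows; state names are assumptions, not the HBase state list.
rows = [
    ("t1", "r1", "ONLINE"),
    ("t1", "r2", "COMPACTING"),
    ("t1", "r3", "DISABLED"),
    ("t2", "r1", "DISABLED"),
]
summary = summarize_region_states(rows)
for table, counts in sorted(summary.items()):
    print(table, dict(counts), "safe to disable:", safe_to_disable(summary, table))
```

A per-table line like this, plus the detailed per-region list, would have
told me right away that the table was only partly disabled.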

Thanks

On Thu, Jan 20, 2011 at 9:01 PM, Wayne <[email protected]> wrote:

> I need to delete some tables and I am not sure the best way to do it. The
> shell does not work. The disable command says it runs ok but every time I
> run drop or truncate I get an exception that says the table is not
> disabled.  The UI shows it as disabled but truncate/drop still do not work.
> I have even tried to restart the cluster as sometimes that makes the disable
> "stick".
>
> What is the best way to delete a table manually? My assumption is that with
> 10k regions in 3 tables that I need to delete that the shell is not going to
> work. How can I do this without a completely fresh install of everything?
> How can the data/tables be removed manually without too much pain?
>
> Thanks.
>
