It seems there is a typo: "we'll no interrupt the running compaction" should be "we'll now interrupt the running compaction".
On Fri, Jan 21, 2011 at 10:47 AM, Stack <[email protected]> wrote:

> On Fri, Jan 21, 2011 at 4:51 AM, Wayne <[email protected]> wrote:
> > After several hours I have figured out how to get the Disable command to
> > work and how to delete manually, but in the process there are 4 problems I
> > encountered that I think are areas that could be improved (or my
> > understanding improved).
> >
> > 1) The client timeout is used for the disable command which was my problem.
> > Does this totally make sense? Should a DML minded timeout be used for DDL
> > statements that we know can take a very long time normally with a large
> > cluster?
> >
>
> Sorry Wayne. I meant to respond yesterday to your original query.
>
> Enable/Disable has been redone in 0.90. Now there are added
> enabling/disabling states that are maintained up in zk and in shell
> there are commands is_enabled and is_disabled. We still have the same
> (DML) timeout (sortof -- see below for more) but at least now if it
> times out, you are not hosed. The disable or enable process is still
> running and you can query its state. There is also notion of async
> enable/disable though this latter facility is not exposed in shell,
> only in the HBaseAdmin API.
>
> > 2) If the disable command fails the first time it does not "roll back". The
> > ONLY way to proceed is to enable and then try to disable again. The first
> > disable attempt is all that seems to work. Subsequent disable statements
> > usually work without errors but never seem to "work". The entire table
> > should be disabled after issuing this command or the entire table should
> > still be enabled. I was caught in this half disabled or mostly disabled
> > which was very frustating.
> >
>
> Sorry about that. Should be better in 0.90.0.
>
> Things should run a bit faster in 0.90.0 too because disable used to
> include an update of .META. per region plus a close of all regions
> that make up the table. In 0.90.0 there is no longer the .META.
> update and close is more prompt now; in the past close would wait on
> any running compactions to complete before proceeding. In 0.90.0
> we'll no interrupt the running compaction so close happens the sooner.
>
> There is room for a bunch more improvement. For example, deleting a
> table, there should be short-circuit that punts on flush of in-memory
> state and clean-close of open regions.
>
> > 3) The biggest issue of all is why certain regions do not report back to the
> > disable command. What are the various states of a region that could cause
> > this? Compaction I know is one, what else could cause the disable command to
> > take too long? Shouldn't a disable force itself through and wait long enough
> > to be able to disable every region? Again a long wait time or a more
> > forceful operation would help.
> >
>
> It wasn't that smart in 0.20/0.89. Its still pretty dumb but better in 0.90.0.
>
> Master process runs the enable/disable process in both old and new
> HBase. In 0.20/0.89, it was a sync process w/ master waiting on
> regions to flip to 'offline' after successful close. The state of
> disabledness was when all regions in table had 'offline' state. Any
> hiccup, a problem closing or a failure to update .META. w/ offline per
> region would bork the disabling process. It was super fragile. We
> tried to talk it up as so.
>
> In 0.90, client queues in master an executor that flips table to
> disabling in zk and then in parallel sends out unassigns of all table
> regions. The executor then hangs around with a more DDL-like timeout
> of hbase.bulk.assignment.waiton.empty.rit (10minutes by default).
> Meantime clients can check state of the disable. After all unassigns
> complete, the table is flipped to disabled.
>
> > 4) Through all of the attempts to disable I saw regions coming and going and
> > nothing was consistent. The UI showed the table as disabled and listed 1
> > region in the table (there were 1000s). The node view listed several other
> > regions but not the same one as the table view. It was a very strange
> > situation. The UI to browse the tables and regions is great but it would be
> > even better if it gave a 100% view of regions and their current states. A
> > summary view of region counts per table based on state or status would be
> > fantastic.
>
> Please file a JIRA. Sounds like good idea. We could hoist stuff up
> out of hbck tool up into UI.
>
> > There is a compaction count, but what about in split, read/rite
> > lock, disabled, etc. What is the precise list of regions states that could
> > occur and show a summary count per state as well as detailed state for each
> > specific region in the list. Fundamentally this is the health monitor of the
> > system and as a dba I really need to know the 100% count of regions and
> > where they are all at in terms of availability. Are they disabled, blocked
> > for writes, blocked for reads, in compaction, etc. etc. If there are various
> > states that cause disabling to be blocked it can be reported here so that I
> > at least know when a disable command can be executed successfully (and this
> > should be documented).
> >
>
> Please file a JIRA. This is great stuff.
>
> Sorry for pain caused messing w/ broke enable/disable. It should be
> better in 0.90 and easier to fix if bugs.
>
> St.Ack
>
> > Thanks
> >
> > On Thu, Jan 20, 2011 at 9:01 PM, Wayne <[email protected]> wrote:
> >
> >> I need to delete some tables and I am not sure the best way to do it. The
> >> shell does not work. The disable command says it runs ok but every time I
> >> run drop or truncate I get an exception that says the table is not
> >> disabled. The UI shows it as disabled but truncate/drop still do not work.
> >> I have even tried to restart the cluster as sometimes that makes the disable
> >> "stick".
> >>
> >> What is the best way to delete a table manually? My assumption is that with
> >> 10k regions in 3 tables that I need to delete that the shell is not going to
> >> work. How can I do this without a completely fresh install of everything?
> >> How can the data/tables be removed manually without too much pain?
> >>
> >> Thanks.
> >>
> >
>
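
For anyone reading this thread later, the 0.90 disable flow described in the quoted reply (flip the table to disabling, send out unassigns for all regions in parallel, wait with a long DDL-style timeout, then flip to disabled) can be sketched as a toy model. This is not HBase code: ToyTable, unassign, and disable are hypothetical names made up for illustration, the worker count is arbitrary, and the only detail taken from the thread is the 10 minute default wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Toy model only: none of these names come from HBase itself.
ENABLED, DISABLING, DISABLED = "enabled", "disabling", "disabled"


class ToyTable:
    def __init__(self, name, region_count):
        self.name = name
        self.state = ENABLED
        self.open_regions = set(range(region_count))


def unassign(table, region, close_seconds=0.0):
    """Pretend to close one region; close_seconds simulates a slow close."""
    time.sleep(close_seconds)
    table.open_regions.discard(region)


def disable(table, timeout_seconds=600.0, close_seconds=0.0):
    """Flip to DISABLING, unassign all regions in parallel, then wait.

    The 600 second default mirrors the 10 minute wait mentioned in the
    thread. If the wait times out, the table stays in DISABLING and the
    outstanding unassigns keep running, so a caller can check the state
    again later instead of being left with a half-disabled table.
    """
    table.state = DISABLING
    pool = ThreadPoolExecutor(max_workers=8)
    futures = [
        pool.submit(unassign, table, region, close_seconds)
        for region in list(table.open_regions)
    ]
    _, pending = wait(futures, timeout=timeout_seconds)
    pool.shutdown(wait=False)  # do not block on stragglers
    if not pending:
        table.state = DISABLED
    return table.state
```

Calling disable(ToyTable("t1", 20)) returns once every simulated region has closed. Passing a small timeout_seconds together with a larger close_seconds shows the other path, where the call returns while the table is still in the disabling state.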
