I enthusiastically created a ticket: https://issues.apache.org/jira/browse/HBASE-3463
This might be a dumb question I should already know the answer to... but when is 0.90 coming out, and what is its current state? Isn't there an RC out? We are on 0.89.20100924 and thought that was the latest for us to work off of...

As always, thanks for the detailed responses.

FYI: I ended up reformatting. I could drop all tables and get the phantom regions to go away after several restarts, but the .META. table was still stuck reporting as 150MB with no tables, even after issuing major_compact (which never seemed to have any effect)...

Thanks.

On Fri, Jan 21, 2011 at 1:47 PM, Stack <[email protected]> wrote:

> On Fri, Jan 21, 2011 at 4:51 AM, Wayne <[email protected]> wrote:
> > After several hours I have figured out how to get the Disable command to
> > work and how to delete manually, but in the process there are 4 problems
> > I encountered that I think are areas that could be improved (or my
> > understanding improved).
> >
> > 1) The client timeout is used for the disable command which was my
> > problem. Does this totally make sense? Should a DML minded timeout be
> > used for DDL statements that we know can take a very long time normally
> > with a large cluster?
> >
>
> Sorry Wayne.  I meant to respond yesterday to your original query.
>
> Enable/Disable has been redone in 0.90.  Now there are added
> enabling/disabling states that are maintained up in zk and in shell
> there are commands is_enabled and is_disabled.  We still have the same
> (DML) timeout (sort of -- see below for more) but at least now if it
> times out, you are not hosed.  The disable or enable process is still
> running and you can query its state.  There is also notion of async
> enable/disable though this latter facility is not exposed in shell,
> only in the HBaseAdmin API.
>
>
> > 2) If the disable command fails the first time it does not "roll back".
> > The ONLY way to proceed is to enable and then try to disable again. The
> > first disable attempt is all that seems to work.
> > Subsequent disable statements usually work without errors but never seem
> > to "work". The entire table should be disabled after issuing this
> > command or the entire table should still be enabled. I was caught in
> > this half disabled or mostly disabled state, which was very frustrating.
> >
>
> Sorry about that.  Should be better in 0.90.0.
>
> Things should run a bit faster in 0.90.0 too because disable used to
> include an update of .META. per region plus a close of all regions
> that make up the table.  In 0.90.0 there is no longer the .META.
> update and close is more prompt now; in the past close would wait on
> any running compactions to complete before proceeding.  In 0.90.0
> we'll now interrupt the running compaction so close happens the sooner.
>
> There is room for a bunch more improvement.  For example, deleting a
> table, there should be short-circuit that punts on flush of in-memory
> state and clean-close of open regions.
>
> > 3) The biggest issue of all is why certain regions do not report back to
> > the disable command. What are the various states of a region that could
> > cause this? Compaction I know is one, what else could cause the disable
> > command to take too long? Shouldn't a disable force itself through and
> > wait long enough to be able to disable every region? Again a long wait
> > time or a more forceful operation would help.
> >
>
> It wasn't that smart in 0.20/0.89.  It's still pretty dumb but better in
> 0.90.0.
>
> Master process runs the enable/disable process in both old and new
> HBase.  In 0.20/0.89, it was a sync process w/ master waiting on
> regions to flip to 'offline' after successful close.  The state of
> disabledness was when all regions in table had 'offline' state.  Any
> hiccup, a problem closing or a failure to update .META. w/ offline per
> region would bork the disabling process.  It was super fragile.  We
> tried to talk it up as so.
>
> In 0.90, client queues in master an executor that flips table to
> disabling in zk and then in parallel sends out unassigns of all table
> regions.  The executor then hangs around with a more DDL-like timeout
> of hbase.bulk.assignment.waiton.empty.rit (10 minutes by default).
> Meantime clients can check state of the disable.  After all unassigns
> complete, the table is flipped to disabled.
>
>
> > 4) Through all of the attempts to disable I saw regions coming and going
> > and nothing was consistent. The UI showed the table as disabled and
> > listed 1 region in the table (there were 1000s). The node view listed
> > several other regions but not the same one as the table view. It was a
> > very strange situation. The UI to browse the tables and regions is great
> > but it would be even better if it gave a 100% view of regions and their
> > current states. A summary view of region counts per table based on state
> > or status would be fantastic.
>
> Please file a JIRA.  Sounds like good idea.  We could hoist stuff up
> out of hbck tool up into UI.
>
>
> > There is a compaction count, but what about in split, read/write lock,
> > disabled, etc. What is the precise list of region states that could
> > occur and show a summary count per state as well as detailed state for
> > each specific region in the list. Fundamentally this is the health
> > monitor of the system and as a dba I really need to know the 100% count
> > of regions and where they are all at in terms of availability. Are they
> > disabled, blocked for writes, blocked for reads, in compaction, etc.
> > etc. If there are various states that cause disabling to be blocked it
> > can be reported here so that I at least know when a disable command can
> > be executed successfully (and this should be documented).
> >
>
> Please file a JIRA.  This is great stuff.
>
> Sorry for pain caused messing w/ broke enable/disable.  It should be
> better in 0.90 and easier to fix if bugs.
>
> St.Ack
>
>
> > Thanks
> >
> > On Thu, Jan 20, 2011 at 9:01 PM, Wayne <[email protected]> wrote:
> >
> >> I need to delete some tables and I am not sure the best way to do it.
> >> The shell does not work. The disable command says it runs ok but every
> >> time I run drop or truncate I get an exception that says the table is
> >> not disabled. The UI shows it as disabled but truncate/drop still do
> >> not work. I have even tried to restart the cluster as sometimes that
> >> makes the disable "stick".
> >>
> >> What is the best way to delete a table manually? My assumption is that
> >> with 10k regions in 3 tables that I need to delete that the shell is
> >> not going to work. How can I do this without a completely fresh install
> >> of everything? How can the data/tables be removed manually without too
> >> much pain?
> >>
> >> Thanks.
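
For readers who want the 0.90 disable flow from Stack's reply in concrete form, here is a toy model of it in Python. It is not HBase code and does not use any HBase API: `ToyMaster`, `disable_async`, `is_disabled`, the region names, and the timings are all hypothetical names invented for illustration. It only mirrors the steps described in the thread: flip the table to disabling, send out the unassigns in parallel, wait with a long DDL-style timeout, flip to disabled once every unassign completes, and let the client poll the state in the meantime.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait
from enum import Enum


class TableState(Enum):
    ENABLED = "enabled"
    DISABLING = "disabling"
    DISABLED = "disabled"


class ToyMaster:
    """Hypothetical stand-in for the master process; NOT the HBase API."""

    def __init__(self, regions, close_seconds=0.01):
        self.regions = {name: "open" for name in regions}
        self.state = TableState.ENABLED
        self.close_seconds = close_seconds
        self._lock = threading.Lock()

    def _unassign(self, region):
        # Pretend a region server takes a moment to close the region.
        time.sleep(self.close_seconds)
        with self._lock:
            self.regions[region] = "closed"

    def _run_disable(self, wait_timeout):
        # Send out all unassigns in parallel, then wait with a long timeout.
        with ThreadPoolExecutor(max_workers=8) as pool:
            futures = [pool.submit(self._unassign, r) for r in self.regions]
            _, pending = wait(futures, timeout=wait_timeout)
        # Flip to DISABLED only if every unassign finished in time.
        # (Toy choice: on timeout the table simply stays DISABLING.)
        if not pending:
            with self._lock:
                self.state = TableState.DISABLED

    def disable_async(self, wait_timeout=600.0):
        # Record the intermediate state first so clients can observe it.
        with self._lock:
            self.state = TableState.DISABLING
        worker = threading.Thread(target=self._run_disable, args=(wait_timeout,))
        worker.start()
        return worker

    def is_disabled(self):
        with self._lock:
            return self.state is TableState.DISABLED


# Client side: kick off the disable, then poll instead of blocking on one call.
master = ToyMaster([f"region-{i}" for i in range(100)])
worker = master.disable_async(wait_timeout=5.0)
deadline = time.monotonic() + 10.0
while not master.is_disabled() and time.monotonic() < deadline:
    time.sleep(0.05)
worker.join()
print(master.state.name)
```

The point of the intermediate DISABLING state is that a client-side timeout no longer leaves the caller guessing: the work keeps running on the master side and the client can ask again later. Against a real 0.90 cluster the polling would go through the shell's is_disabled command or the HBaseAdmin API mentioned above, not through anything in this sketch.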
