On Sat, Jan 22, 2011 at 11:25 AM, Ted Dunning <[email protected]> wrote:
> Is it necessary to checkpoint ZK when checkpointing an hbase cluster?
>
I'd think that on restart, Ted, it should be ok. We clear zk state on clean restart of cluster.
St.Ack

> On Sat, Jan 22, 2011 at 10:27 AM, Bill Graham <[email protected]> wrote:
>
>> Hi,
>>
>> Last night while experimenting with getting lzo set up I managed to
>> somehow lose all .META. data and all my tables. My regions still exist
>> in HDFS, but the shell tells me I have no tables. At this point I'm
>> pretty sure I need to reinstall HBase clean-slate on HDFS, hence
>> losing all data, but I'm sharing my story in case there are JIRAs to
>> be created or lessons to be learned.
>>
>> Specifics:
>> - 4 Node cluster running 0.90.0.rc1
>> - 1 table of a few GBs and 24 regions, let's call it TableA
>> - CDH3b2
>>
>> 1. Just for kicks I decided to issue an alter table command to change
>> COMPRESSION to 'lzo' for TableA to see what would happen. I hadn't yet
>> taken any steps to install the native lzo libs in HBase (they exist in
>> HDFS), so this was probably a stupid thing to do. After issuing the
>> command I wasn't able to re-enable the table, nor could I fully
>> disable it. I was in a state somewhere in between the two, as
>> described in a thread earlier this week. The shell said enabled, the
>> master.jsp said disabled. Calls to do either would time out. The
>> master server was logging the same exceptions as in HBASE-3406 ad
>> infinitum. hbck -fix wasn't doing anything. After bouncing the entire
>> cluster a few times (master, RSs, zookeepers), I was able to finally
>> get back to normal state, with COMPRESSION set to 'none' with hbck
>> -fix.
>>
>> Besides HBASE-3406, maybe there's another JIRA here where the shell
>> permits setting COMPRESSION => 'lzo' when lzo isn't set up and leaves
>> the table in a nasty state.
>>
>> At this point I should have been grateful and called it a night, but
>> noooooo... Instead I shut down the cluster again and symlinked
>> lib/native to the same dir in my hadoop home, which is lzo-enabled and
>> I restarted the cluster. All seemed ok.
>>
>> 2. At this point I decided to experiment with a new table after
>> reading http://wiki.apache.org/hadoop/UsingLzoCompression more
>> closely. After creating 'mytable' with lzo enabled, I saw similar
>> behavior as I did in 1. so I used the same techniques to try to
>> just drop the table. After bouncing the cluster and issuing a hbck
>> -fix, the shell reported that HBase had no tables at all. It seemed
>> like all the .META. data was wiped out but I still had all of my
>> orphaned regions in HDFS. This was very bad.
>>
>> It was clear that these tables weren't coming back so in a last ditch
>> effort I stopped the HBase cluster, the SNN and the NN and I restored
>> HDFS from the checkpoint taken about an hour before. Now everything
>> was out of whack and HBase wouldn't even come up and -ROOT- couldn't
>> be located, .log/ files weren't being read properly and things were a
>> mess.
>>
>> One could make the argument that I was beating on HBase a bit and
>> maybe even trying to break things, but it didn't take a lot of effort
>> to get to a pretty dire state.
>>
>> thanks,
>> Bill
>>
>
