Ah!! I always forget to check the region server log:

java.io.IOException: Compression algorithm 'lzo' previously failed test.
        at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:77)
        at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:2555)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2544)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2532)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:262)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Our upgrade script unpacked the LZO libs in the wrong place. I put them back where they should have been and the problem resolved itself.

Thanks J-D!

On Tue, Apr 12, 2011 at 6:38 PM, Jean-Daniel Cryans <[email protected]> wrote:

> Could you upgrade to the newly released CDH3 instead? It has a few more
> fixes.
>
> So regarding your issue, I don't see regions stuck. The first one did
> timeout on opening but then it was reassigned (and then I can't see
> anything in the log that says it timed out again).
>
> By the way can you check what the region server was doing instead of
> opening it? Maybe it just has too many to open and it took some time
> to get it opened? I've seen that on our clusters but it eventually
> gets ok.
>
> J-D
>
> On Tue, Apr 12, 2011 at 3:23 PM, George P. Stathis <[email protected]> wrote:
> > In the middle of upgrading our dev environment from 0.89 to 0.90.2CDH3B4.
> > When we did the upgrade locally (Macs), no issues came up. Different story
> > on our EC2 dev box it seems.
> >
> > Background:
> > - dev is running in pseudo-cluster mode
> > - we neglected to set replication to 1 from 2 the first time we started it
> >   but we shut it off and fixed that setting
> >
> > It seems now that some regions are perpetually stuck in transition mode:
> > https://gist.github.com/916562
> >
> > Looked at https://issues.apache.org/jira/browse/HBASE-3406 and
> > https://issues.apache.org/jira/browse/HBASE-3637 trying to find similarities
> > but I'm not sure it's quite the same issue.
> >
> > hbase hbck -fix does not seem to rectify the problem. Here is its output:
> > https://gist.github.com/916567
> >
> > Any pointers are appreciated. Happy to give more info.
> >
> > -GS
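
For anyone who lands on this thread with regions stuck in transition: below is a rough sketch of how one might scan a region server log for the codec failure quoted at the top of this reply. It is only an illustration, not an HBase tool. The function name failed_codecs and the regular expression are my own assumptions, based solely on the wording of the IOException above; other HBase versions may phrase the message differently.

```python
import re
from typing import Iterable, List

# Assumption: the message wording matches the IOException quoted in this thread.
FAILED_CODEC = re.compile(r"Compression algorithm '([^']+)' previously failed test")


def failed_codecs(log_lines: Iterable[str]) -> List[str]:
    """Return the distinct codec names reported as failed, in first-seen order.

    Hypothetical helper for illustration; it only pattern-matches log text.
    """
    seen: List[str] = []
    for line in log_lines:
        for match in FAILED_CODEC.finditer(line):
            codec = match.group(1)
            if codec not in seen:
                seen.append(codec)
    return seen


# Example input: the first line of the trace from this thread plus an unrelated line.
sample = [
    "java.io.IOException: Compression algorithm 'lzo' previously failed test.",
    "some unrelated log line",
]
print(failed_codecs(sample))
```

To run it against a real log, pass an open file handle instead of the sample list, since iterating a text file yields its lines. The log path depends on your install, so it is left out here.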
