We've been working on a backup/restore program, Hbacker (https://github.com/rberger/hbacker), that uses the HBase Hadoop Export/Import jobs.

It's pretty tuned to our use case (we only do appends and never delete, which allows us to do very simple incremental backups). It's still really rough and only appropriate for folks who might want to help us make it work. (Yeah, I've been talking about this for a long time, but I finally got some help to get it finished.)

The Export phase writes the tables to s3n, which we thought was an HBase-version-independent format. It also stores each table's column descriptor in a MySQL DB.
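To make the Export step concrete, here is roughly the underlying call per table, expressed as a direct invocation of the stock MapReduce Export job (a sketch only; the table name and S3 bucket/path are made up):

    import org.apache.hadoop.hbase.mapreduce.Export;

    public class ExportToS3Sketch {
        public static void main(String[] args) throws Exception {
            // Run the stock HBase Export job for one table, writing
            // its sequence files to an s3n path (placeholder names).
            Export.main(new String[] {
                "merchant_consumer_summary",
                "s3n://example-backup-bucket/merchant_consumer_summary"
            });
        }
    }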
The Import phase uses the column descriptors stored in MySQL to create the new table on the destination HBase cluster.

One of our goals is to be able to take backups of our now-ancient production HBase cluster running 0.20.3 and use them to populate a shiny new 0.90 cluster. The Export is run on the 0.20.3 cluster and saves to S3; the Import is then run on the 0.90 cluster to load the S3 files into HBase.

But in recent testing we hit a problem where the Import on 0.90 (CDH3 in pseudo-distributed mode for testing) gets through 50% or so and then fails with the following error on the CDH3 machine (error log also at https://gist.github.com/1189897).

Do we need to make some changes to the HBase 0.20.3 column descriptors before we use them to create the new table on 0.90? Or is it something completely different?

2011-09-02 19:47:43,605 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for furtive_production_frylock_merchant_consumer_summary_e52c36e1-7851-08c1-bbdf-3fc2a84a1cb6,,1314992665740.8307caeae1de46237c1b85f80a92a0d0., current region memstore size 64.9m
2011-09-02 19:47:43,606 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
2011-09-02 19:47:43,907 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=10.2.119.218,60020,1314992502715, load=(requests=11461, regions=3, usedHeap=86, maxHeap=998): Replay of HLog required. Forcing server shutdown
org.apache.hadoop.hbase.DroppedSnapshotException: region: furtive_production_frylock_merchant_consumer_summary_e52c36e1-7851-08c1-bbdf-3fc2a84a1cb6,,1314992665740.8307caeae1de46237c1b85f80a92a0d0.
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:995)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:900)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:852)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:392)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
Caused by: java.lang.IllegalArgumentException: No enum const class org.apache.hadoop.hbase.regionserver.StoreFile$BloomType.0
        at java.lang.Enum.valueOf(Enum.java:196)
        at org.apache.hadoop.hbase.regionserver.StoreFile$BloomType.valueOf(StoreFile.java:90)
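If it turns out the descriptors do need massaging, the failure at the bottom of that trace (StoreFile$BloomType.valueOf choking on the value "0") suggests something along these lines on the 0.90 side. This is just a sketch of what I'm imagining, not what Hbacker does today; the table and family names are placeholders, and I'm only guessing that the stored BLOOMFILTER value is what trips valueOf:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.regionserver.StoreFile;

    public class CreateTableSketch {
        public static void main(String[] args) throws Exception {
            // Rebuild the descriptor from the values saved in MySQL
            // (placeholder names; "0" stands in for whatever 0.20.3 stored).
            HTableDescriptor table = new HTableDescriptor("merchant_consumer_summary");
            HColumnDescriptor family = new HColumnDescriptor("summary");
            family.setValue(HColumnDescriptor.BLOOMFILTER, "0");

            // If the stored value is not a valid 0.90 BloomType name
            // (NONE, ROW, ROWCOL), fall back to NONE before creating the table.
            String bloom = family.getValue(HColumnDescriptor.BLOOMFILTER);
            boolean valid = false;
            if (bloom != null) {
                try {
                    StoreFile.BloomType.valueOf(bloom);
                    valid = true;
                } catch (IllegalArgumentException e) {
                    // e.g. "0" or "false" carried over from 0.20.x
                }
            }
            if (!valid) {
                family.setValue(HColumnDescriptor.BLOOMFILTER,
                                StoreFile.BloomType.NONE.name());
            }
            table.addFamily(family);

            // Create the table on the destination 0.90 cluster.
            Configuration conf = HBaseConfiguration.create();
            new HBaseAdmin(conf).createTable(table);
        }
    }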
__________________
Robert J Berger - CTO
Runa Inc.
+1 408-838-8896
http://blog.ibd.com