I was thinking about this and have a couple of thoughts. While Stack's solution above would work, it means a couple of things:

1. If you haven't saved splits, you're going to have to figure out how to pre-split for a full restore.
2. You have to wait for the data to be re-sorted at recovery time instead of backup time, so recovery will take substantially longer.

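To make the first point concrete, here is a rough sketch of one way you might work out split points after the fact when none were saved: take a sorted sample of row keys and pick keys at evenly spaced positions. This is plain Python, not the HBase API; `choose_split_points` and the sample keys are made up for illustration, and it assumes you can get a representative key sample in the first place.

```python
def choose_split_points(sorted_keys, num_regions):
    """Pick up to num_regions - 1 split keys from a sorted sample of row keys.

    Hypothetical helper for illustration only; not part of HBase.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    if not sorted_keys:
        return []
    n = len(sorted_keys)
    splits = []
    last = sorted_keys[0]  # a split equal to the first key would leave an empty region
    for i in range(1, num_regions):
        key = sorted_keys[i * n // num_regions]
        if key > last:  # skip duplicates so split points stay strictly increasing
            splits.append(key)
            last = key
    return splits


# Assumed sample: 1000 zero-padded row keys, already in sorted order.
sample = sorted(b"row%04d" % i for i in range(1000))
splits = choose_split_points(sample, 4)
print(splits)  # [b'row0250', b'row0500', b'row0750']
```

If the sample is skewed the regions will be too, which is part of why saving the real splits at backup time is attractive.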
It seems like we should make a new script, like export, that automatically exports the data in bulk-importable form along with the table's schema and split information. We could then make an import script that simply creates the backed-up table (potentially under a different target name) and then bulk imports it, pre-splitting using the splits recorded at export. (We actually did something like this recently to migrate data from one format to another.)

It wouldn't work for a merged restore (e.g. into a pre-existing table), but it seems like recovery would be really quick. I suppose you could allow importing into an existing table, but then you may have to wait for splits on a bunch of the files. (I know the bulk import script is designed to do this, but I'm not sure how it would handle a large number of splits if your target table has diverged substantially since the backup was taken.)

Jacques

On Mon, Feb 20, 2012 at 9:19 PM, Stack <[email protected]> wrote:

> On Mon, Feb 20, 2012 at 1:58 PM, Paul Mackles <[email protected]> wrote:
> > Actually, an hbase export to "bulk load" facility sounds like a great
> > idea. We have been using bulk loads to migrate data from an older data
> > store and they have worked awesome for us. It also doesn't seem like it
> > would be that hard to implement. So what am I missing?
> >
>
> Little?
>
> Check out the Import.java in mapreduce package. See how its pulling
> from SequenceFiles into a map that outputs to a TableOutputFormat
> inside in the map. Make a new MR job that has same input but that
> outputs to HFileOutputFormat instead (you'll need the total order
> partitioner and a reducer in the mix which Import doesn't have).
>
> St.Ack
>
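
A rough sketch of the export/import idea from my message, in case it helps. This is plain Python, not the HBase API: the manifest layout and the names `export_manifest`, `region_index`, and `partition_rows` are all invented for illustration. The export side records the schema and split points; the import side reads them back and routes each sorted row to the region whose key range contains it, which is the same kind of range lookup a total order partitioner does.

```python
import bisect
import json


def export_manifest(table_name, families, split_points):
    """Serialize what the import side needs to recreate the table pre-split.

    Hypothetical manifest format; row keys are bytes, so they are hex-encoded.
    """
    return json.dumps({
        "table": table_name,
        "families": families,
        "splits": [s.hex() for s in split_points],
    })


def region_index(split_points, row_key):
    """Return the region for row_key; region i covers [split[i-1], split[i])."""
    return bisect.bisect_right(split_points, row_key)


def partition_rows(manifest_json, rows):
    """Group (key, value) rows by region, sorted within each region."""
    manifest = json.loads(manifest_json)
    splits = [bytes.fromhex(s) for s in manifest["splits"]]
    regions = [[] for _ in range(len(splits) + 1)]
    for key, value in sorted(rows):  # the sort happens here, at export time
        regions[region_index(splits, key)].append((key, value))
    return manifest["table"], regions


manifest = export_manifest("t1", ["cf"], [b"g", b"p"])
name, regions = partition_rows(
    manifest, [(b"z", b"4"), (b"a", b"1"), (b"g", b"2"), (b"m", b"3")]
)
for i, region_rows in enumerate(regions):
    print(name, i, region_rows)
```

Since each output group already lines up with one pre-split region, a restore into a freshly created table would not need to split any files; that only becomes a concern when the target table's regions have drifted from the recorded splits.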
