I'd go with the snapshots, since you avoid all the I/O of the export/import. But the consistency model is different, and you don't have Export's start/end time option... you'd have to delete the rows with timestamps < tstart or > tend after the clone.
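The post-clone cleanup described above amounts to scanning the cloned table and deleting every cell whose timestamp falls outside [tstart, tend]. A minimal Python sketch of just the selection logic (the tuples and names are illustrative stand-ins, not the HBase client API):

```python
def cells_to_delete(cells, tstart, tend):
    """Pick out the cells a post-clone cleanup would have to delete:
    everything whose timestamp is < tstart or > tend.

    `cells` is a list of (row, timestamp, value) tuples, standing in
    for the result of a full scan of the cloned table."""
    return [(row, ts) for (row, ts, _value) in cells
            if ts < tstart or ts > tend]

# Illustrative scan result: three cells, one before and one after the window.
scan = [(b"r1", 100, b"a"), (b"r2", 250, b"b"), (b"r3", 900, b"c")]
print(cells_to_delete(scan, 200, 500))  # [(b'r1', 100), (b'r3', 900)]
```

In practice each returned (row, timestamp) pair would become a Delete issued against the clone; with snapshots this pass replaces the time-range filtering that Export would otherwise have done for free.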
Matteo

On Tue, May 14, 2013 at 1:48 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> Hi Jeremy,
>
> Thanks for sharing this.
>
> I will take a look at it, and also most probably give a try to the snapshot
> option....
>
> JM
>
> 2013/5/7 Jeremy Carroll <[email protected]>
>
> > https://github.com/phobos182/hadoop-hbase-tools/blob/master/hbase/copy_table.rb
> >
> > I wrote a quick script to do it with mechanize + ruby. I have a new tool
> > which I'm polishing up that does the same thing in Python, but using the
> > HBase REST interface to get the data.
> >
> > On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > When we are doing an export, we are only exporting the data. Then when
> > > we are importing it back, we need to make sure the table is
> > > pre-split correctly, or else we might hotspot some servers.
> > >
> > > If you simply export then import without pre-splitting at all, you
> > > will most probably bring some servers down, because they will be
> > > overwhelmed with splits and compactions.
> > >
> > > Do we have any tool to pre-split a table the same way another table is
> > > already pre-split?
> > >
> > > Something like:
> > >
> > >   duplicate 'source_table', 'target_table'
> > >
> > > which would create a new table called 'target_table' with exactly the
> > > same parameters as 'source_table' and the same region boundaries?
> > >
> > > If we don't have one, would it be useful to have one?
> > >
> > > Or even something like:
> > >
> > >   create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
> > >
> > > JM
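JM's SPLITS_MODEL idea boils down to reading the source table's region boundaries and passing them as the SPLITS of the new table. Since the first region always starts at the empty key, which is implicit in a `create`, only the non-empty start keys are split points. A minimal Python sketch of that derivation (the function name and region keys are illustrative, not an HBase API):

```python
def splits_from_regions(region_start_keys):
    """Turn the region start keys of an existing table into the SPLITS
    list for creating a matching table: drop the first region's empty
    start key, which is implicit, and keep the rest as split points."""
    return [key for key in region_start_keys if key]

# Illustrative region layout of a source table with four regions.
source_regions = [b"", b"row1000", b"row2000", b"row3000"]
print(splits_from_regions(source_regions))
# [b'row1000', b'row2000', b'row3000']
```

Feeding that list as the SPLITS option when creating 'target_table' would reproduce the source table's region boundaries, so the subsequent import lands evenly across servers instead of hammering one region.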
