I almost forgot: for 0.94.6.1 and newer releases, you can:

1. take a snapshot of the original table
2. export the snapshot to target cluster
3. clone the exported snapshot to a new table.
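
For reference, the three snapshot steps could be sketched roughly as follows. This is a minimal sketch, not a tested procedure: the snapshot name, the target HDFS URL, and the mapper count are made-up placeholders, and the ExportSnapshot class name is the one given in the HBase reference guide, so it may differ in your release. The script only prints the commands instead of running them. As far as I know, a cloned table keeps the region layout stored in the snapshot, which is why this avoids pre-splitting by hand.

```python
def snapshot_copy_commands(table, snapshot, target_root, new_table, mappers=16):
    """Return (where to run it, command) pairs for the three snapshot steps."""
    return [
        # Step 1: take a snapshot of the original table.
        ("source cluster, hbase shell",
         "snapshot '%s', '%s'" % (table, snapshot)),
        # Step 2: export the snapshot to the target cluster's HBase root dir.
        ("source cluster, command line",
         "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot"
         " -snapshot %s -copy-to %s -mappers %d" % (snapshot, target_root, mappers)),
        # Step 3: clone the exported snapshot into a new table.
        ("target cluster, hbase shell",
         "clone_snapshot '%s', '%s'" % (snapshot, new_table)),
    ]


# All names below are hypothetical placeholders.
for where, command in snapshot_copy_commands(
        "source_table", "source_table_snap",
        "hdfs://target-nn:8020/hbase", "target_table"):
    print("[%s] %s" % (where, command))
```
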
Cheers

On Tue, May 7, 2013 at 4:11 PM, Ted Yu <[email protected]> wrote:

> Currently the Import tool doesn't create the table on the target cluster.
> If we choose approach #2, the Import tool should be enhanced with table
> creation capability.
>
> Cheers
>
>
> On Tue, May 7, 2013 at 4:02 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>
>> @Mohammad: The end goal is really more about the splits than the
>> model. So I don't think Lars' options are good for this use case.
>> @Mike: I agree that things were not configured correctly. The user
>> should have split the table before doing the import. I like the idea of
>> looking at the files to get the region boundaries. That way you don't
>> need to have the source_table still there...
>>
>> So we have 2 different things here.
>> 1) a command on the shell to duplicate a table structure
>> 2) an option on the import command to split the table regions based on
>> the file names.
>>
>> If we agree on that I will open one JIRA for each...
>>
>> JM
>>
>> 2013/5/7 Michael Segel <[email protected]>:
>> > Silly question...
>> >
>> > If you're doing a simple export, then you end up with all of your prior
>> > regions as separate files in a directory, right?
>> >
>> > So in theory, you could find the first row and the last complete row of
>> > each file and then do your pre-splits based on the start key and end key
>> > that you find.
>> >
>> > That would be your tool so to speak.
>> >
>> > But to the point that reading back in these files will cause you to
>> > crash your RS and HBase?
>> > That doesn't sound like it's well tuned or right.
>> >
>> > HTH
>> > -Mike
>> >
>> > On May 7, 2013, at 5:29 PM, Ted Yu <[email protected]> wrote:
>> >
>> >> I am not aware of a tool which can pre-split a table using another
>> >> table's region boundaries as a template.
>> >>
>> >> Such a tool would be nice to have.
>> >>
>> >> Cheers
>> >>
>> >> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> When we are doing an export, we are only exporting the data. Then when
>> >>> we are importing that back, we need to make sure the table is
>> >>> pre-split correctly, else we might hotspot some servers.
>> >>>
>> >>> If you simply export then import without pre-splitting at all, you
>> >>> will most probably bring some servers down because they will be
>> >>> overwhelmed with splits and compactions.
>> >>>
>> >>> Do we have any tool to pre-split a table the same way another table is
>> >>> already pre-split?
>> >>>
>> >>> Something like
>> >>>> duplicate 'source_table', 'target_table'
>> >>>
>> >>> Which will create a new table called 'target_table' with exactly the
>> >>> same parameters as 'source_table' and the same region boundaries?
>> >>>
>> >>> If we don't have one, would it be useful to have one?
>> >>>
>> >>> Or even something like:
>> >>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
>> >>>
>> >>>
>> >>> JM
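
Mike's idea of deriving the pre-splits from the exported files can be illustrated with a small sketch. It assumes you have already extracted the first and last row key of each exported file by some other means (that step is not shown, and the sample keys are invented), and that the keys are plain ASCII strings without quotes, so that Python's string ordering matches HBase's byte ordering. It then prints an HBase shell `create` statement using the standard SPLITS option; no `duplicate` command or SPLITS_MODEL option exists as far as this thread establishes.

```python
def split_points(file_ranges):
    """Return split keys: the first row of every file except the lowest one.

    file_ranges is a list of (first_row, last_row) tuples, one per exported
    file. The lowest file needs no split key because the first region of a
    table always starts at the empty key.
    """
    ordered = sorted(file_ranges)  # order files by their first row key
    return [first for first, _last in ordered[1:]]


def create_statement(table, family, splits):
    """Build an HBase shell 'create' command that pre-splits the table."""
    quoted = ", ".join("'%s'" % key for key in splits)
    return "create '%s', '%s', SPLITS => [%s]" % (table, family, quoted)


# Hypothetical boundaries of three exported files, in arbitrary order.
ranges = [("row-4000", "row-6999"),
          ("row-0000", "row-3999"),
          ("row-7000", "row-9999")]

print(create_statement("target_table", "f1", split_points(ranges)))
```

With three files this yields two split keys, so the target table starts with three regions matching the exported ones.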
