Keep in mind that unless writes to this table are paused, any data that arrives between steps #1 and #2 will not be in the snapshot, and the truncate in step #2 will then remove it from the table as well.
Cheers

On Mon, Feb 15, 2016 at 6:21 PM, Anil Gupta <[email protected]> wrote:

> I don't think there are any atomic operations in HBase to support DDL
> across 2 tables.
>
> But maybe you can use HBase snapshots:
> 1. Create an HBase snapshot.
> 2. Truncate the table.
> 3. Write data to the table.
> 4. Create a table from the snapshot taken in step #1 as table_old.
>
> Now you have two tables: one with the current run's data and the other
> with the last run's data.
> I think the above process will suffice. But keep in mind that it is not
> atomic.
>
> HTH,
> Anil
> Sent from my iPhone
>
> > On Feb 15, 2016, at 4:25 PM, Pat Ferrel <[email protected]> wrote:
> >
> > Any other way to do what I was asking? With Spark it is very normal
> > to treat a table as immutable and create another to replace the old.
> >
> > Can you lock two tables and rename them in 2 actions, then unlock, in a
> > very short period of time?
> >
> > Or an alias for table names?
> >
> > Didn’t see these in any docs or Googling, any help is appreciated.
> > Writing all this data back to the original table would be a huge load on a
> > table being written to by external processes and therefore under large load
> > to begin with.
> >
> >> On Feb 14, 2016, at 5:03 PM, Ted Yu <[email protected]> wrote:
> >>
> >> There is currently no native support for renaming two tables in one atomic
> >> action.
> >>
> >> FYI
> >>
> >>> On Sun, Feb 14, 2016 at 4:18 PM, Pat Ferrel <[email protected]> wrote:
> >>>
> >>> I use Spark to take an old table, clean it up to create an RDD of cleaned
> >>> data. What I’d like to do is write all of the data to a new table in HBase,
> >>> then rename the table to the old name. If possible it could be done by
> >>> changing an alias to point to the new table as long as all external code
> >>> uses the alias, or by a 2 table rename operation. But I don’t see how to do
> >>> this for HBase. I am dealing with a lot of data so don’t want to do table
> >>> modifications with deletes and upserts, this would be incredibly slow.
> >>> Furthermore I don’t want to disable the table for more than a tiny span of
> >>> time.
> >>>
> >>> Is it possible to have 2 tables and rename both in an atomic action, or
> >>> change some alias to point to the new table in an atomic action? If not,
> >>> what is the quickest way to achieve this to minimize time disabled?
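
For reference, the snapshot-based steps quoted in this thread could be sketched roughly as below. This is only an illustration, not a tested procedure: the helper name `snapshot_swap_commands` and the table and snapshot names (`events`, `events_snap`, `events_old`) are made up for the example. It just builds the HBase shell commands (`snapshot`, `truncate`, `clone_snapshot`) in the order of steps 1, 2 and 4; step 3 is whatever job writes the new data. As noted in the thread, the sequence is not atomic, and writes should be paused between the snapshot and the truncate.

```python
def snapshot_swap_commands(table, snapshot_name, old_table):
    """Build HBase shell commands for the snapshot-based (non-atomic) procedure.

    All three arguments are caller-chosen names; nothing here talks to HBase.
    Returns (before_load, after_load): commands to run before and after the
    job that writes the new data (step 3).
    """
    before_load = [
        # Step 1: snapshot the table while writes are paused.
        f"snapshot '{table}', '{snapshot_name}'",
        # Step 2: truncate the table (the shell disables, drops and recreates it).
        f"truncate '{table}'",
    ]
    after_load = [
        # Step 4: materialise the last run's data as a separate table.
        f"clone_snapshot '{snapshot_name}', '{old_table}'",
    ]
    return before_load, after_load


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    before, after = snapshot_swap_commands("events", "events_snap", "events_old")
    print("\n".join(before))
    print("# step 3: run the job that writes the new data to the table")
    print("\n".join(after))
```

The printed lines could be pasted into `hbase shell` or saved to a script file; splitting them into "before" and "after" lists keeps the write job (step 3) outside the sketch.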
