Please see http://hbase.apache.org/book.html#ops.snapshots for background on snapshots.
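
For what it's worth, the ordering of Anil's four steps (quoted further down) can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `ToyCluster`, `rotate`, the table name `items` and the snapshot name `snap_prev` are hypothetical names, not HBase API. The comments name the HBase shell commands (`snapshot`, `truncate`, `clone_snapshot`) that I believe correspond to each step; please check them against the book for your version. The toy copies rows eagerly, whereas a real snapshot only references existing files.

```python
class ToyCluster:
    """Hypothetical stand-in for a cluster: tables and snapshots are plain dicts."""

    def __init__(self):
        self.tables = {}     # table name -> {row key: value}
        self.snapshots = {}  # snapshot name -> copy of a table's rows

    def put(self, table, row, value):
        self.tables.setdefault(table, {})[row] = value

    def snapshot(self, table, snapshot_name):
        # Shell counterpart (assumed): snapshot 'table', 'snapshot_name'
        self.snapshots[snapshot_name] = dict(self.tables[table])

    def truncate(self, table):
        # Shell counterpart (assumed): truncate 'table'
        self.tables[table] = {}

    def clone_snapshot(self, snapshot_name, new_table):
        # Shell counterpart (assumed): clone_snapshot 'snapshot_name', 'new_table'
        self.tables[new_table] = dict(self.snapshots[snapshot_name])


def rotate(cluster, table, new_rows, late_write=None):
    """Run the four steps in order.

    late_write simulates a writer that was not paused and lands a row
    between steps 1 and 2.
    """
    cluster.snapshot(table, "snap_prev")               # step 1: snapshot
    if late_write is not None:
        row, value = late_write
        cluster.put(table, row, value)                 # arrives after the snapshot
    cluster.truncate(table)                            # step 2: truncate
    for row, value in new_rows.items():                # step 3: write new data
        cluster.put(table, row, value)
    cluster.clone_snapshot("snap_prev", "table_old")   # step 4: clone as table_old


cluster = ToyCluster()
cluster.put("items", "r1", "dirty")
rotate(cluster, "items", {"r1": "clean"}, late_write=("r2", "arrived late"))
print(cluster.tables["items"])      # {'r1': 'clean'}
print(cluster.tables["table_old"])  # {'r1': 'dirty'}
```

Row `r2` ends up in neither table: it arrived after the snapshot and was removed by the truncate. That is the window to watch for if writes are not paused, and it is also why the sequence as a whole is not atomic.
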
In Anil's description, table_old is the result of cloning the snapshot which is taken in step #1.
See http://hbase.apache.org/book.html#ops.snapshots.clone

Cheers

On Tue, Feb 16, 2016 at 6:35 AM, Pat Ferrel <[email protected]> wrote:

> I think I can work out the algorithm if I knew precisely what a “snapshot”
> does. From my reading it seems to be a lightweight fast alias (for lack of
> a better word) since it creates something that refers to the same physical
> data. So if I create a new table with cleaned data, call it table_new. Then
> I drop table_old and “snapshot” table_new into table_old? Is this what is
> suggested?
>
> This leaves me with a small time where there is no table_old, which is the
> time between dropping table_old and creating a snapshot. Is it feasible to
> lock the DB for this time?
>
>
> On Feb 15, 2016, at 7:13 PM, Ted Yu <[email protected]> wrote:
> >
> > Keep in mind that if the writes to this table are not paused, there would
> > be some data coming in between steps #1 and #2 which would not be in the
> > snapshot.
> >
> > Cheers
> >
> > On Mon, Feb 15, 2016 at 6:21 PM, Anil Gupta <[email protected]> wrote:
> >
> >> I don't think there are any atomic operations in HBase to support DDL
> >> across 2 tables.
> >>
> >> But maybe you can use HBase snapshots.
> >> 1. Create an HBase snapshot.
> >> 2. Truncate the table.
> >> 3. Write data to the table.
> >> 4. Create a table from the snapshot taken in step #1 as table_old.
> >>
> >> Now you have two tables: one with the current run's data and the other
> >> with the last run's data.
> >> I think the above process will suffice. But keep in mind that it is not
> >> atomic.
> >>
> >> HTH,
> >> Anil
> >> Sent from my iPhone
> >>
> >>> On Feb 15, 2016, at 4:25 PM, Pat Ferrel <[email protected]> wrote:
> >>>
> >>> Is there any other way to do what I was asking? With Spark it is a very
> >>> normal thing to treat a table as immutable and create another to
> >>> replace the old.
> >>>
> >>> Can you lock two tables and rename them in 2 actions then unlock in a
> >>> very short period of time?
> >>>
> >>> Or an alias for table names?
> >>>
> >>> Didn't see these in any docs or Googling, any help is appreciated.
> >>> Writing all this data back to the original table would be a huge load
> >>> on a table being written to by external processes and therefore under
> >>> large load to begin with.
> >>>
> >>>> On Feb 14, 2016, at 5:03 PM, Ted Yu <[email protected]> wrote:
> >>>>
> >>>> There is currently no native support for renaming two tables in one
> >>>> atomic action.
> >>>>
> >>>> FYI
> >>>>
> >>>>> On Sun, Feb 14, 2016 at 4:18 PM, Pat Ferrel <[email protected]> wrote:
> >>>>>
> >>>>> I use Spark to take an old table, clean it up to create an RDD of
> >>>>> cleaned data. What I'd like to do is write all of the data to a new
> >>>>> table in HBase, then rename the table to the old name. If possible it
> >>>>> could be done by changing an alias to point to the new table as long
> >>>>> as all external code uses the alias, or by a 2 table rename
> >>>>> operation. But I don't see how to do this for HBase. I am dealing
> >>>>> with a lot of data so don't want to do table modifications with
> >>>>> deletes and upserts, this would be incredibly slow. Furthermore I
> >>>>> don't want to disable the table for more than a tiny span of time.
> >>>>>
> >>>>> Is it possible to have 2 tables and rename both in an atomic action,
> >>>>> or change some alias to point to the new table in an atomic action?
> >>>>> If not, what is the quickest way to achieve this to minimize time
> >>>>> disabled?
> >>>
> >>
> >
