Keep in mind that unless writes to this table are paused, any data that
arrives between steps #1 and #2 will not be in the snapshot, and it will be
lost when the table is truncated.
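
To make that gap concrete, here is a toy in-memory sketch of the four steps
quoted below. It is not HBase client code: ToyCluster, rotate and every other
name in it are made up for illustration, and a "table" is just a list of row
keys. In a real cluster I believe the corresponding calls are Admin.snapshot,
Admin.truncateTable and Admin.cloneSnapshot, but treat that mapping as an
assumption and check it against your HBase version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotSwapSketch {

    /** Toy in-memory stand-in for a cluster. Hypothetical, NOT the HBase API. */
    static class ToyCluster {
        final Map<String, List<String>> tables = new HashMap<>();
        final Map<String, List<String>> snapshots = new HashMap<>();

        void put(String table, String row) {
            tables.computeIfAbsent(table, t -> new ArrayList<>()).add(row);
        }

        void snapshot(String name, String table) {
            // Copy the rows as they are right now; later writes are not included.
            snapshots.put(name, new ArrayList<>(rows(table)));
        }

        void truncate(String table) {
            tables.put(table, new ArrayList<>());
        }

        void cloneSnapshot(String name, String newTable) {
            tables.put(newTable, new ArrayList<>(snapshots.get(name)));
        }

        List<String> rows(String table) {
            return tables.getOrDefault(table, new ArrayList<>());
        }
    }

    /**
     * Runs steps 1-4. lateWrite (may be null) simulates an external writer
     * that was not paused and writes between steps 1 and 2.
     * Returns the rows that end up in neither the table nor table_old.
     */
    static List<String> rotate(ToyCluster c, String table,
                               List<String> newRows, String lateWrite) {
        c.snapshot("snap", table);                     // step 1: snapshot
        if (lateWrite != null) {
            c.put(table, lateWrite);                   // write slips in here
        }
        List<String> beforeTruncate = new ArrayList<>(c.rows(table));
        c.truncate(table);                             // step 2: truncate
        for (String row : newRows) {
            c.put(table, row);                         // step 3: write new data
        }
        c.cloneSnapshot("snap", table + "_old");       // step 4: restore as table_old

        List<String> lost = new ArrayList<>(beforeTruncate);
        lost.removeAll(c.rows(table + "_old"));
        return lost;
    }

    public static void main(String[] args) {
        ToyCluster c = new ToyCluster();
        c.put("t", "r1");
        c.put("t", "r2");
        List<String> lost = rotate(c, "t", Arrays.asList("n1"), "late");
        System.out.println("rows lost between steps 1 and 2: " + lost);
    }
}
```

Passing null for lateWrite models the case where writers are paused for the
duration of steps 1 and 2; the returned list is then empty.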

Cheers

On Mon, Feb 15, 2016 at 6:21 PM, Anil Gupta <[email protected]> wrote:

> I don't think there are any atomic operations in HBase to support DDL across
> 2 tables.
>
> But maybe you can use HBase snapshots:
> 1. Create an HBase snapshot.
> 2. Truncate the table.
> 3. Write data to the table.
> 4. Create a table from the snapshot taken in step #1 as table_old.
>
> Now you have two tables: one with the current run's data and the other with
> the last run's data.
> I think the above process will suffice. But keep in mind that it is not
> atomic.
>
> HTH,
> Anil
> Sent from my iPhone
>
> > On Feb 15, 2016, at 4:25 PM, Pat Ferrel <[email protected]> wrote:
> >
> > Is there any other way to do what I was asking? With Spark it is very
> > normal to treat a table as immutable and create another to replace the
> > old one.
> >
> > Can you lock two tables and rename them in 2 actions, then unlock, in a
> > very short period of time?
> >
> > Or is there an alias for table names?
> >
> > I didn’t see these in any docs or by Googling; any help is appreciated.
> > Writing all this data back to the original table would be a huge load on
> > a table that is being written to by external processes and is therefore
> > already under heavy load.
> >
> >> On Feb 14, 2016, at 5:03 PM, Ted Yu <[email protected]> wrote:
> >>
> >> There is currently no native support for renaming two tables in one
> >> atomic action.
> >>
> >> FYI
> >>
> >>> On Sun, Feb 14, 2016 at 4:18 PM, Pat Ferrel <[email protected]>
> >>> wrote:
> >>>
> >>> I use Spark to take an old table and clean it up to create an RDD of
> >>> cleaned data. What I’d like to do is write all of the data to a new
> >>> table in HBase, then rename the table to the old name. If possible it
> >>> could be done by changing an alias to point to the new table, as long
> >>> as all external code uses the alias, or by a 2-table rename operation.
> >>> But I don’t see how to do this for HBase. I am dealing with a lot of
> >>> data, so I don’t want to do table modifications with deletes and
> >>> upserts; this would be incredibly slow. Furthermore, I don’t want to
> >>> disable the table for more than a tiny span of time.
> >>>
> >>> Is it possible to have 2 tables and rename both in an atomic action, or
> >>> to change some alias to point to the new table in an atomic action? If
> >>> not, what is the quickest way to achieve this while minimizing the time
> >>> the table is disabled?
> >
>
