I think I can work out the algorithm if I knew precisely what a “snapshot”
does. From my reading it seems to be a lightweight, fast alias (for lack of a
better word), since it creates something that refers to the same physical
data. So I create a new table with cleaned data, call it table_new, then
drop table_old and “snapshot” table_new into table_old? Is this what is
suggested?

This leaves me with a short window in which there is no table_old, namely the
time between dropping table_old and recreating it from the snapshot. Is it
feasible to lock the DB for this time?
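
To check that I have the sequence right, here is a toy Python model of what I
think is being suggested. It is only a sketch: ToyCluster and swap are names I
made up and this is not the HBase client API, although the method names follow
the HBase shell commands snapshot, disable, drop and clone_snapshot. It assumes
a snapshot only records references to a table's files rather than copying them.

```python
class ToyCluster:
    """In-memory stand-in for a cluster; NOT the HBase client API."""

    def __init__(self):
        self.tables = {}       # table name -> list of store-file ids
        self.disabled = set()  # names of disabled tables
        self.snapshots = {}    # snapshot name -> tuple of store-file ids

    def create(self, table, files):
        self.tables[table] = list(files)

    def snapshot(self, table, snap):
        # Assumption: only references are recorded, no data is copied.
        self.snapshots[snap] = tuple(self.tables[table])

    def disable(self, table):
        if table not in self.tables:
            raise KeyError(table)
        self.disabled.add(table)

    def drop(self, table):
        if table not in self.disabled:
            raise RuntimeError("table must be disabled before drop")
        self.disabled.discard(table)
        del self.tables[table]

    def clone_snapshot(self, snap, table):
        if table in self.tables:
            raise RuntimeError("target table already exists")
        self.tables[table] = list(self.snapshots[snap])


def swap(cluster, new_table, old_table, snap):
    """Replace old_table with a clone of new_table.

    Returns whether old_table is visible in the middle of the swap.
    """
    cluster.snapshot(new_table, snap)
    cluster.disable(old_table)
    cluster.drop(old_table)
    visible_in_gap = old_table in cluster.tables  # the window I am worried about
    cluster.clone_snapshot(snap, old_table)
    return visible_in_gap


cluster = ToyCluster()
cluster.create("table_old", ["f1", "f2"])
cluster.create("table_new", ["f3"])
gap = swap(cluster, "table_new", "table_old", "snap_new")
print(gap, cluster.tables["table_old"])
```

In this model table_old ends up pointing at the same files as table_new, but
any reader arriving between the drop and the clone finds no table_old at all.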

> On Feb 15, 2016, at 7:13 PM, Ted Yu <[email protected]> wrote:
> 
> Keep in mind that if the writes to this table are not paused, there would
> be some data coming in between steps #1 and #2 which would not be in the
> snapshot.
> 
> Cheers
> 
> On Mon, Feb 15, 2016 at 6:21 PM, Anil Gupta <[email protected]> wrote:
> 
>> I don't think there are any atomic operations in HBase to support DDL
>> across 2 tables.
>> 
>> But, maybe you can use hbase snapshots.
>> 1. Create a hbase snapshot.
>> 2. Truncate the table.
>> 3. Write data to the table.
>> 4. Create a table from snapshot taken in step #1 as table_old.
>> 
>> Now you have two tables. One with current run data and other with last run
>> data.
>> I think the above process will suffice. But keep in mind that it is not
>> atomic.
>> 
>> HTH,
>> Anil
>> Sent from my iPhone
>> 
>>> On Feb 15, 2016, at 4:25 PM, Pat Ferrel <[email protected]> wrote:
>>> 
>>> Any other way to do what I was asking? With Spark it is a very normal
>>> thing to treat a table as immutable and create another to replace the old.
>>> 
>>> Can you lock two tables and rename them in 2 actions then unlock in a
>>> very short period of time?
>>> 
>>> Or an alias for table names?
>>> 
>>> Didn’t see these in any docs or Googling, any help is appreciated.
>>> Writing all this data back to the original table would be a huge load on a
>>> table being written to by external processes and therefore under large load
>>> to begin with.
>>> 
>>>> On Feb 14, 2016, at 5:03 PM, Ted Yu <[email protected]> wrote:
>>>> 
>>>> There is currently no native support for renaming two tables in one
>>>> atomic action.
>>>> 
>>>> FYI
>>>> 
>>>>> On Sun, Feb 14, 2016 at 4:18 PM, Pat Ferrel <[email protected]> wrote:
>>>>> 
>>>>> I use Spark to take an old table, clean it up to create an RDD of
>>>>> cleaned data. What I’d like to do is write all of the data to a new
>>>>> table in HBase, then rename the table to the old name. If possible it
>>>>> could be done by changing an alias to point to the new table as long as
>>>>> all external code uses the alias, or by a 2 table rename operation. But
>>>>> I don’t see how to do this for HBase. I am dealing with a lot of data so
>>>>> don’t want to do table modifications with deletes and upserts, this
>>>>> would be incredibly slow. Furthermore I don’t want to disable the table
>>>>> for more than a tiny span of time.
>>>>> 
>>>>> Is it possible to have 2 tables and rename both in an atomic action, or
>>>>> change some alias to point to the new table in an atomic action? If not,
>>>>> what is the quickest way to achieve this to minimize time disabled?
>>> 
>> 
