Re: swap data in Kudu table

Boris Tyukin Fri, 23 Feb 2018 14:16:24 -0800

You guys are awesome, thanks!

Todd, I like the ALTER TABLE TBLPROPERTIES idea and will test it next week.
Views might work as well, but for a number of reasons I want to keep them as
my last resort :)
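A minimal sketch of the two Impala-side options discussed in this thread, written as a Python helper that only builds the SQL strings and does not connect to Impala. The function names, the example table names, and the name-validation rules are hypothetical. The statement shapes follow Impala's `ALTER TABLE ... SET TBLPROPERTIES` and `ALTER VIEW ... AS` syntax, and the REFRESH step follows Todd's suggestion; the TBLPROPERTIES route assumes the production table is an external Impala table over Kudu.

```python
import re
from typing import List

# Assumption: Impala table/view names are plain identifiers, optionally
# qualified as db.name. Anything else is rejected rather than quoted.
_IMPALA_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")
# Assumption: a conservative allowlist for Kudu table names, wide enough for
# Impala-style names such as "impala::db.table" but excluding quotes.
_KUDU_NAME = re.compile(r"[A-Za-z0-9_.:\-]+")


def _checked(pattern, name: str) -> str:
    """Return name unchanged if it fully matches pattern, else raise."""
    if not pattern.fullmatch(name):
        raise ValueError(f"refusing unsafe name: {name!r}")
    return name


def repoint_external_table(impala_table: str, kudu_table: str) -> List[str]:
    """Remap an external Impala table onto a different Kudu table.

    The Impala table object itself is kept, so the permissions and stats
    attached to it stay in place; only the underlying Kudu table changes.
    """
    table = _checked(_IMPALA_NAME, impala_table)
    target = _checked(_KUDU_NAME, kudu_table)
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES ('kudu.table_name' = '{target}')",
        # Have the impalads load the new metadata before the next query.
        f"REFRESH {table}",
    ]


def repoint_view(view: str, new_table: str) -> List[str]:
    """Point an existing view (the object readers query) at a different table."""
    return [
        f"ALTER VIEW {_checked(_IMPALA_NAME, view)} AS "
        f"SELECT * FROM {_checked(_IMPALA_NAME, new_table)}"
    ]


if __name__ == "__main__":
    # Hypothetical names: prod.orders is the external table readers use,
    # impala::staging.orders_v2 is the freshly loaded Kudu table.
    for stmt in repoint_external_table("prod.orders", "impala::staging.orders_v2"):
        print(stmt + ";")
```

Rejecting unexpected characters instead of escaping them keeps the sketch simple; a real job would also keep the previous Kudu table name around so the swap can be reversed with the same two statements.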


On Fri, Feb 23, 2018 at 4:32 PM, Todd Lipcon <[email protected]> wrote:

> A couple other ideas from the Impala side:
>
> - could you use a view and alter the view to point to a different table?
> Then all readers would be pointed at the view, and security permissions
> could be on that view rather than the underlying tables?
>
> - I think if you use an external table in Impala you could use an ALTER
> TABLE TBLPROPERTIES ... statement to change kudu.table_name to point to a
> different table. Then issue a 'refresh' on the impalads so that they load
> the new metadata. Subsequent queries would hit the new underlying Kudu
> table, but permissions and stats would be unchanged.
>
> -Todd
>
> On Fri, Feb 23, 2018 at 1:16 PM, Mike Percy <[email protected]> wrote:
>
>> Hi Boris, those are good ideas. Currently Kudu does not have atomic bulk
>> load capabilities or staging abilities. Theoretically renaming a partition
>> atomically shouldn't be that hard to implement, since it's just a master
>> metadata operation which can be done atomically, but it's not yet
>> implemented.
>>
>> There is a JIRA to track a generic bulk load API here:
>> https://issues.apache.org/jira/browse/KUDU-1370
>>
>> Since I couldn't find anything tracking the specific features you
>> mentioned, I just filed the following improvement JIRAs so we can track them:
>>
>>    - KUDU-2326: Support atomic bulk load operation
>>    <https://issues.apache.org/jira/browse/KUDU-2326>
>>    - KUDU-2327: Support atomic swap of tables or partitions
>>    <https://issues.apache.org/jira/browse/KUDU-2327>
>>
>> Mike
>>
>> On Thu, Feb 22, 2018 at 6:39 AM, Boris Tyukin <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> I am trying to figure out the best and safest way to swap data in a
>>> production Kudu table with data from a staging table.
>>>
>>> Basically, once in a while (every few months) we need to perform a full
>>> reload of some tables. These tables are pretty large, with billions of
>>> rows, and we want to minimize the risk and downtime for users if something
>>> bad happens in the middle of that process.
>>>
>>> With Hive and Impala on HDFS, we can use a very handy command, LOAD DATA
>>> INPATH. We can prepare data for the reload in a staging table upfront, and
>>> this process might take many hours. Once the staging table is ready, we can
>>> issue a LOAD DATA INPATH command, which moves the underlying HDFS files into
>>> the production table. This operation is almost instant and is the very last
>>> step in our pipeline.
>>>
>>> Alternatively, we can swap partitions using the ALTER TABLE EXCHANGE
>>> PARTITION command.
>>>
>>> Now with Kudu, I cannot seem to find a good strategy. The only thing that
>>> came to my mind is to drop the production table and rename the staging
>>> table to the production table as the last step of the job, but in this case
>>> we are going to lose statistics and security permissions.
>>>
>>> Any other ideas?
>>>
>>> Thanks!
>>> Boris
>>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
