Re: swap data in Kudu table

Mike Percy Fri, 23 Feb 2018 13:17:20 -0800

Hi Boris, those are good ideas. Currently Kudu does not have atomic bulk
load or staging capabilities. In theory, atomically renaming a partition
shouldn't be that hard to implement, since it's just a master metadata
operation, but it's not yet implemented.


There is a JIRA to track a generic bulk load API here:
https://issues.apache.org/jira/browse/KUDU-1370

Since I couldn't find anything to track the specific features you
mentioned, I just filed the following improvement JIRAs so we can track them:

   - KUDU-2326: Support atomic bulk load operation
   <https://issues.apache.org/jira/browse/KUDU-2326>
   - KUDU-2327: Support atomic swap of tables or partitions
   <https://issues.apache.org/jira/browse/KUDU-2327>

Mike

On Thu, Feb 22, 2018 at 6:39 AM, Boris Tyukin wrote:

> Hello,
>
> I am trying to figure out the best and safest way to swap data in a
> production Kudu table with data from a staging table.
>
> Basically, every few months we need to perform a full reload of some
> tables. These tables are pretty large, with billions of rows, and we want
> to minimize the risk and downtime for users if something bad happens in
> the middle of that process.
>
> With Hive and Impala on HDFS, we can use a very handy command, LOAD DATA
> INPATH. We can prepare the data for reload in a staging table up front, and
> this process might take many hours. Once the staging table is ready, we can
> issue the LOAD DATA INPATH command, which moves the underlying HDFS files
> to the production table. This operation is almost instant and is the very
> last step in our pipeline.
>
> Alternatively, we can swap partitions using the ALTER TABLE EXCHANGE
> PARTITION command.
>
> Now with Kudu, I cannot seem to find a good strategy. The only thing that
> comes to mind is to drop the production table and rename the staging table
> to the production table name as the last step of the job, but in this case
> we are going to lose statistics and security permissions.
>
> Any other ideas?
>
> Thanks!
> Boris
>
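
For illustration, the drop-and-rename workaround described in the quoted
message could be sketched as below. This is only a rough sketch, assuming
the tables are managed through Impala and that permissions are granted to
roles; the helper name build_swap_statements, the table names, and the role
name are all hypothetical. It only builds the SQL strings and does not
connect to anything. Note that the sequence is not atomic: between the DROP
and the RENAME the production table does not exist.

```python
from typing import List, Sequence, Tuple


def build_swap_statements(
    prod: str,
    staging: str,
    grants: Sequence[Tuple[str, str]] = (),
) -> List[str]:
    """Return Impala SQL for a drop-and-rename swap, in execution order.

    grants is a sequence of (privilege, role) pairs to re-apply afterwards,
    on the assumption that permissions on the old table are lost.
    """
    if prod == staging:
        raise ValueError("production and staging tables must differ")
    statements = [
        # Not atomic: readers fail between these two statements.
        f"DROP TABLE IF EXISTS {prod}",
        f"ALTER TABLE {staging} RENAME TO {prod}",
        # Statistics are assumed lost after the swap, so recompute them.
        f"COMPUTE STATS {prod}",
    ]
    for privilege, role in grants:
        statements.append(f"GRANT {privilege} ON TABLE {prod} TO ROLE {role}")
    return statements


if __name__ == "__main__":
    # Hypothetical table and role names, for illustration only.
    for stmt in build_swap_statements(
        "db.events", "db.events_staging", [("SELECT", "analysts")]
    ):
        print(stmt + ";")
```

Whether the rename also renames the underlying Kudu table depends on the
Impala version and on whether the table is managed or external, so that
should be checked before relying on this in production.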
