You guys are awesome, thanks! Todd, I like the ALTER TABLE TBLPROPERTIES idea - will test it next week. Views might work as well, but for a number of reasons I want to keep them as my last resort :)
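
For reference, a minimal sketch of the two options Todd describes, written as Impala SQL. All database, table, view, and Kudu table names here (prod_db.events, prod_db.events_v, prod_db.events_reload, events_reload_20180223) are hypothetical. It assumes the production table is an external Impala table mapped to Kudu and that the reloaded Kudu table has a compatible schema. It is untested.

```sql
-- Hypothetical names throughout. prod_db.events is assumed to be an
-- EXTERNAL Impala table mapped to a Kudu table; events_reload_20180223
-- is assumed to be the freshly loaded Kudu table.

-- Option 1: repoint the external table at the new underlying Kudu table.
ALTER TABLE prod_db.events
  SET TBLPROPERTIES ('kudu.table_name' = 'events_reload_20180223');

-- Reload the table metadata so subsequent queries use the new mapping.
REFRESH prod_db.events;

-- Option 2 (the last resort): readers query a view, and the view is
-- altered to select from a different Impala table after the reload.
ALTER VIEW prod_db.events_v AS
  SELECT * FROM prod_db.events_reload;
```

With option 1, permissions and stats stay attached to prod_db.events, so the stats still describe the old data; rerunning COMPUTE STATS afterwards may be worthwhile if the reloaded data differs much.
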
On Fri, Feb 23, 2018 at 4:32 PM, Todd Lipcon <[email protected]> wrote:

> A couple other ideas from the Impala side:
>
> - could you use a view and alter the view to point to a different table?
> Then all readers would be pointed at the view, and security permissions
> could be on that view rather than the underlying tables?
>
> - I think if you use an external table in Impala you could use an ALTER
> TABLE TBLPROPERTIES ... statement to change kudu.table_name to point to a
> different table. Then issue a 'refresh' on the impalads so that they load
> the new metadata. Subsequent queries would hit the new underlying Kudu
> table, but permissions and stats would be unchanged.
>
> -Todd
>
> On Fri, Feb 23, 2018 at 1:16 PM, Mike Percy <[email protected]> wrote:
>
>> Hi Boris, those are good ideas. Currently Kudu does not have atomic bulk
>> load capabilities or staging abilities. Theoretically renaming a partition
>> atomically shouldn't be that hard to implement, since it's just a master
>> metadata operation which can be done atomically, but it's not yet
>> implemented.
>>
>> There is a JIRA to track a generic bulk load API here:
>> https://issues.apache.org/jira/browse/KUDU-1370
>>
>> Since I couldn't find anything to track the specific features you
>> mentioned, I just filed the following improvement JIRAs so we can track it:
>>
>> - KUDU-2326: Support atomic bulk load operation
>> <https://issues.apache.org/jira/browse/KUDU-2326>
>> - KUDU-2327: Support atomic swap of tables or partitions
>> <https://issues.apache.org/jira/browse/KUDU-2327>
>>
>> Mike
>>
>> On Thu, Feb 22, 2018 at 6:39 AM, Boris Tyukin <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> I am trying to figure out the best and safest way to swap data in a
>>> production Kudu table with data from a staging table.
>>>
>>> Basically, once in a while we need to perform a full reload of some
>>> tables (once in a few months). These tables are pretty large with billions
>>> of rows and we want to minimize the risk and downtime for users if
>>> something bad happens in the middle of that process.
>>>
>>> With Hive and Impala on HDFS, we can use a very cool handy command LOAD
>>> DATA INPATH. We can prepare data for reload in a staging table upfront and
>>> this process might take many hours. Once staging table is ready, we can
>>> issue LOAD DATA INPATH command which will move underlying HDFS files to a
>>> production table - this operation is almost instant and the very last step
>>> in our pipeline.
>>>
>>> Alternatively, we can swap partitions using ALTER TABLE EXCHANGE
>>> PARTITION command.
>>>
>>> Now with Kudu, I cannot seem to find a good strategy. The only thing
>>> came to my mind is to drop the production table and rename a staging table
>>> to production table as the last step of the job, but in this case we are
>>> going to lose statistics and security permissions.
>>>
>>> Any other ideas?
>>>
>>> Thanks!
>>> Boris
>>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
