A couple of other ideas from the Impala side:

- Could you use a view and alter the view to point to a different table? Then
  all readers would be pointed at the view, and security permissions could be
  on that view rather than on the underlying tables.
- I think if you use an external table in Impala you could use an
  ALTER TABLE ... SET TBLPROPERTIES statement to change kudu.table_name to
  point to a different table. Then issue a 'refresh' on the impalads so that
  they load the new metadata. Subsequent queries would hit the new underlying
  Kudu table, but permissions and stats would be unchanged.

-Todd

On Fri, Feb 23, 2018 at 1:16 PM, Mike Percy <mpe...@apache.org> wrote:

> Hi Boris, those are good ideas. Currently Kudu does not have atomic bulk
> load capabilities or staging abilities. Theoretically renaming a partition
> atomically shouldn't be that hard to implement, since it's just a master
> metadata operation which can be done atomically, but it's not yet
> implemented.
>
> There is a JIRA to track a generic bulk load API here:
> https://issues.apache.org/jira/browse/KUDU-1370
>
> Since I couldn't find anything to track the specific features you
> mentioned, I just filed the following improvement JIRAs so we can track it:
>
>    - KUDU-2326: Support atomic bulk load operation
>    <https://issues.apache.org/jira/browse/KUDU-2326>
>    - KUDU-2327: Support atomic swap of tables or partitions
>    <https://issues.apache.org/jira/browse/KUDU-2327>
>
> Mike
>
> On Thu, Feb 22, 2018 at 6:39 AM, Boris Tyukin <bo...@boristyukin.com>
> wrote:
>
>> Hello,
>>
>> I am trying to figure out the best and safest way to swap data in a
>> production Kudu table with data from a staging table.
>>
>> Basically, once in a while we need to perform a full reload of some
>> tables (once in a few months). These tables are pretty large with billions
>> of rows and we want to minimize the risk and downtime for users if
>> something bad happens in the middle of that process.
>>
>> With Hive and Impala on HDFS, we can use a very cool handy command LOAD
>> DATA INPATH. We can prepare data for reload in a staging table upfront and
>> this process might take many hours.
>> Once the staging table is ready, we can issue the LOAD DATA INPATH
>> command, which will move the underlying HDFS files to a production table -
>> this operation is almost instant and is the very last step in our pipeline.
>>
>> Alternatively, we can swap partitions using the ALTER TABLE EXCHANGE
>> PARTITION command.
>>
>> Now with Kudu, I cannot seem to find a good strategy. The only thing that
>> came to my mind is to drop the production table and rename a staging table
>> to the production table as the last step of the job, but in this case we
>> are going to lose statistics and security permissions.
>>
>> Any other ideas?
>>
>> Thanks!
>> Boris
>>
>
>

--
Todd Lipcon
Software Engineer, Cloudera
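
For reference, the two Impala-side suggestions at the top of this reply could look roughly like the sketch below. It only builds the statement strings and does not connect to Impala. The helper names and the table and view names (prod.events_v, prod.events, impala::staging.events_v2) are hypothetical, chosen for illustration. ALTER VIEW ... AS SELECT, ALTER TABLE ... SET TBLPROPERTIES and REFRESH are regular Impala statements, but whether permissions and stats carry over as hoped is the expectation stated in this thread, not something the sketch verifies.

```python
import re

# Accepts "table" or "db.table" made of plain identifier characters.
_IMPALA_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def _impala_name(name):
    """Return name unchanged if it looks like a plain Impala identifier."""
    if not _IMPALA_NAME.fullmatch(name):
        raise ValueError(f"unexpected Impala identifier: {name!r}")
    return name


def view_swap_sql(view, new_table):
    """Idea 1: readers query a view; repoint the view at the reloaded table."""
    return [
        f"ALTER VIEW {_impala_name(view)} AS "
        f"SELECT * FROM {_impala_name(new_table)}"
    ]


def external_table_swap_sql(impala_table, new_kudu_table):
    """Idea 2: remap an external Impala table to another Kudu table."""
    # Kudu names such as "impala::db.tbl" are not plain identifiers, so only
    # reject characters that would break the quoted string literal.
    if "'" in new_kudu_table or "\\" in new_kudu_table:
        raise ValueError(f"unexpected Kudu table name: {new_kudu_table!r}")
    table = _impala_name(impala_table)
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('kudu.table_name' = '{new_kudu_table}')",
        # Make the impalads reload metadata for the remapped table.
        f"REFRESH {table}",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for stmt in view_swap_sql("prod.events_v", "prod.events_v2"):
        print(stmt + ";")
    for stmt in external_table_swap_sql("prod.events", "impala::staging.events_v2"):
        print(stmt + ";")
```

The statements would be run as the very last step of the reload job, after the staging table has been fully loaded and checked, so readers only ever see the old table or the complete new one.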