Vihang Karajgaonkar created HIVE-15880:
------------------------------------------
Summary: Allow insert overwrite query to use auto.purge table property
Key: HIVE-15880
URL: https://issues.apache.org/jira/browse/HIVE-15880
Project: Hive
Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar

It seems inconsistent that the auto.purge property is not considered when we do an INSERT OVERWRITE, while it is when we do a DROP TABLE.

DROP TABLE doesn't move table data to Trash when auto.purge is set to true:

{noformat}
> create table temp(col1 string, col2 string);
No rows affected (0.064 seconds)
> alter table temp set tblproperties('auto.purge'='true');
No rows affected (0.083 seconds)
> insert into temp values ('test', 'test'), ('test2', 'test2');
No rows affected (25.473 seconds)

# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt 3 hive hive 22 2017-02-09 13:03 /user/hive/warehouse/temp/000000_0
#

> drop table temp;
No rows affected (0.242 seconds)

# hdfs dfs -ls /user/hive/warehouse/temp
ls: `/user/hive/warehouse/temp': No such file or directory
#
# sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
#
{noformat}

An INSERT OVERWRITE query moves the table data to Trash even when auto.purge is set to true:

{noformat}
> create table temp(col1 string, col2 string);
> alter table temp set tblproperties('auto.purge'='true');
> insert into temp values ('test', 'test'), ('test2', 'test2');

# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt 3 hive hive 22 2017-02-09 13:07 /user/hive/warehouse/temp/000000_0
#

> insert overwrite table temp select * from dummy;

# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt 3 hive hive 26 2017-02-09 13:08 /user/hive/warehouse/temp/000000_0
# sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
Found 1 items
drwx------ - hive hive 0 2017-02-09 13:08 /user/hive/.Trash/Current/user/hive/warehouse/temp
#
{noformat}

While move operations are not very costly on HDFS, they can add significant overhead on slow file systems like S3.
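The requested behavior can be illustrated with a rough sketch. This is not Hive's code and does not use any HDFS API: it models the table directory and the Trash as local directories, and the names should_purge and clear_old_data are hypothetical. It only shows the decision being asked for, namely that the cleanup step of an overwrite deletes the old files outright when auto.purge is true and moves them to Trash otherwise.

```python
import shutil
import tempfile
from pathlib import Path


def should_purge(tbl_properties):
    """Hypothetical helper: True only when auto.purge is the string 'true'."""
    value = str(tbl_properties.get("auto.purge", "false"))
    return value.strip().lower() == "true"


def clear_old_data(table_dir, trash_dir, tbl_properties):
    """Remove a table's existing files before an overwrite writes new ones.

    Local-filesystem stand-in for the cleanup step of INSERT OVERWRITE.
    Returns 'purged' if the files were deleted, 'trashed' if they were moved.
    """
    table_dir, trash_dir = Path(table_dir), Path(trash_dir)
    purge = should_purge(tbl_properties)
    if not purge:
        trash_dir.mkdir(parents=True, exist_ok=True)
    for entry in list(table_dir.iterdir()):
        if purge:
            # Skip Trash entirely: no move, just a delete.
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
        else:
            # Default behavior: keep the old data recoverable in Trash.
            shutil.move(str(entry), str(trash_dir / entry.name))
    return "purged" if purge else "trashed"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        table = Path(root) / "warehouse" / "temp"
        table.mkdir(parents=True)
        (table / "000000_0").write_text("test,test\n")
        trash = Path(root) / ".Trash"
        print(clear_old_data(table, trash, {"auto.purge": "true"}))
        print(trash.exists())
```

With auto.purge set to true the sketch never touches the Trash directory, which is the step that is expensive on a store where a move is implemented as copy plus delete.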
This could improve the performance of {{INSERT OVERWRITE TABLE}} queries, especially when there are a large number of partitions on tables located on S3, should the user wish to set the auto.purge property to true.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)