Vihang Karajgaonkar created HIVE-15880:
------------------------------------------

             Summary: Allow insert overwrite query to use auto.purge table property
                 Key: HIVE-15880
                 URL: https://issues.apache.org/jira/browse/HIVE-15880
             Project: Hive
          Issue Type: Improvement
            Reporter: Vihang Karajgaonkar
            Assignee: Vihang Karajgaonkar


It seems inconsistent that the auto.purge property is not considered when we do 
an INSERT OVERWRITE, while it is when we do a DROP TABLE.

DROP TABLE does not move table data to Trash when auto.purge is set to true:
{noformat}
> create table temp(col1 string, col2 string);
No rows affected (0.064 seconds)
> alter table temp set tblproperties('auto.purge'='true');
No rows affected (0.083 seconds)
> insert into temp values ('test', 'test'), ('test2', 'test2');
No rows affected (25.473 seconds)

# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt   3 hive hive         22 2017-02-09 13:03 /user/hive/warehouse/temp/000000_0
#

> drop table temp;
No rows affected (0.242 seconds)

# hdfs dfs -ls /user/hive/warehouse/temp
ls: `/user/hive/warehouse/temp': No such file or directory
#
# sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
#
{noformat}

An INSERT OVERWRITE query moves the table data to Trash even when auto.purge is 
set to true:
{noformat}
> create table temp(col1 string, col2 string);
> alter table temp set tblproperties('auto.purge'='true');
> insert into temp values ('test', 'test'), ('test2', 'test2');

# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt   3 hive hive         22 2017-02-09 13:07 /user/hive/warehouse/temp/000000_0
#

> insert overwrite table temp select * from dummy;

# hdfs dfs -ls /user/hive/warehouse/temp
Found 1 items
-rwxrwxrwt   3 hive hive         26 2017-02-09 13:08 /user/hive/warehouse/temp/000000_0
# sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
Found 1 items
drwx------   - hive hive          0 2017-02-09 13:08 /user/hive/.Trash/Current/user/hive/warehouse/temp
#
{noformat}

While move operations are not very costly on HDFS, they can add significant 
overhead on slow file systems like S3. Honoring auto.purge could improve the 
performance of {{INSERT OVERWRITE TABLE}} queries, especially when a table 
located on S3 has a large number of partitions, should the user choose to set 
the auto.purge property to true.
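
The requested behavior can be summarized in a small sketch. This is not Hive code: the function names should_purge and plan_overwrite_cleanup are hypothetical, and comparing the property value case-insensitively against "true" is an assumption made for illustration. It only shows the decision being asked for, namely that old data replaced by an INSERT OVERWRITE is deleted outright when auto.purge is true and moved to Trash otherwise.

```python
def should_purge(tbl_properties):
    """Return True when replaced data should bypass Trash.

    Assumption: the property is stored as a string and compared
    case-insensitively; a missing property means "do not purge".
    """
    value = tbl_properties.get("auto.purge", "false")
    return str(value).strip().lower() == "true"


def plan_overwrite_cleanup(tbl_properties, old_files):
    """Decide what to do with each file replaced by an overwrite.

    Hypothetical helper: returns (action, path) pairs, where action is
    "delete" (skip Trash) or "move_to_trash" (the behavior reported above).
    """
    action = "delete" if should_purge(tbl_properties) else "move_to_trash"
    return [(action, path) for path in old_files]
```

With auto.purge set to true, every replaced file would be planned as a direct delete, which avoids one move per file or partition on file systems where moves are expensive.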


