wangzhihao created HIVE-18927:
---------------------------------

             Summary: Hive "insert overwrite" doesn't replace the destination 
files if no partition in metastore for the files
                 Key: HIVE-18927
                 URL: https://issues.apache.org/jira/browse/HIVE-18927
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: wangzhihao


[This post|http://www.ericlin.me/2015/05/hive-insert-overwrite-does-not-remove-existing-data/] describes a way to reproduce this issue:
{noformat}
# Add a file to the file system without registering a partition in the metastore to track it.
hdfs dfs -put test.txt test/p=p1

# Insert overwrite the partition (p = 'p1').
DROP TABLE IF EXISTS partition_test;
CREATE EXTERNAL TABLE partition_test (a int) PARTITIONED BY (p string);
INSERT OVERWRITE TABLE partition_test PARTITION (p = 'p1') SELECT 123;

# Verify that test.txt is not removed.
hdfs dfs -ls test/p=p1
Found 2 items
-rwxr-xr-x   3 hdfs supergroup     194965 2015-05-05 00:15 test/p=p1/000000_0
-rw-r--r--   3 hdfs supergroup          8 2015-05-05 00:10 test/p=p1/test.txt
{noformat}
The reason is that 
[Hive.loadPartition|https://github.com/apache/hive/blob/9b36ffa92cc4e0f47ea03d8d167debe743342f5b/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1652]
 tries to {{replaceFiles}} only if {{oldPath}} exists. Because the metastore has 
no partition for these files, {{oldPath}} is null, so the existing files are 
never cleaned up. To fix the issue, we should also clean {{destf}} in the method 
[Hive.replaceFiles|https://github.com/apache/hive/blob/b362de3871764731d8371657b07140e37a3c5105/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3817].
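
To make the reasoning above concrete, here is a minimal, self-contained sketch of the guard and of one possible reading of the proposed fix. It is not Hive's code: the class {{OverwriteSketch}}, its methods, and the {{alsoCleanDestf}} flag are hypothetical names, and {{java.nio.file}} on a local temp directory stands in for HDFS. Only {{oldPath}} and {{destf}} are taken from the description above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, simplified model of the overwrite path; not Hive's actual code.
final class OverwriteSketch {

    // Lists the regular files directly under dir (empty if dir is null or missing).
    static List<Path> listFiles(Path dir) throws IOException {
        if (dir == null || !Files.isDirectory(dir)) {
            return List.of();
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    // Deletes the regular files directly under dir.
    static void clearFiles(Path dir) throws IOException {
        for (Path file : listFiles(dir)) {
            Files.delete(file);
        }
    }

    // oldPath is null when the metastore has no partition for the destination.
    // alsoCleanDestf models the proposed fix (an assumption about its shape).
    static void overwrite(Path oldPath, Path destf, String newFileName,
                          boolean alsoCleanDestf) throws IOException {
        if (oldPath != null) {
            clearFiles(oldPath);   // the only cleanup in the behaviour described above
        }
        if (alsoCleanDestf) {
            clearFiles(destf);     // proposed: clean the destination as well
        }
        Files.createDirectories(destf);
        Files.write(destf.resolve(newFileName), "123\n".getBytes(StandardCharsets.UTF_8));
    }

    // Recreates the scenario: a stale test.txt under p=p1 and no known partition.
    // Returns the number of files left in the destination after the overwrite.
    static int demo(boolean alsoCleanDestf) throws IOException {
        Path destf = Files.createTempDirectory("overwrite-sketch").resolve("p=p1");
        Files.createDirectories(destf);
        Files.write(destf.resolve("test.txt"), "stale\n".getBytes(StandardCharsets.UTF_8));
        overwrite(null, destf, "000000_0", alsoCleanDestf);
        return listFiles(destf).size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("oldPath-only cleanup: " + demo(false) + " files"); // 2 files
        System.out.println("with destf cleanup:   " + demo(true) + " files");  // 1 file
    }
}
```

With {{oldPath}} null, the first run leaves both {{test.txt}} and {{000000_0}} in the directory, matching the listing above; the second run leaves only the newly written file. Whether the real fix should clean {{destf}} unconditionally or only when {{oldPath}} is null is left open here.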



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
