Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/23165 )
Change subject: IMPALA-14189: Cleanup subdirectories in insert overwrite
......................................................................

IMPALA-14189: Cleanup subdirectories in insert overwrite

If an external table contains data files in subdirectories, and
recursive listing is enabled, Impala considers the files in the
subdirectories as part of the table. However, currently INSERT
OVERWRITE and TRUNCATE do not always delete these files, leading to
data corruption.

This change takes care of INSERT OVERWRITE.

Before this change, for unpartitioned external tables, only top-level
data files were deleted and data files in subdirectories (whether
hidden, ignored or normal) were kept.

After this change, directories are also deleted in addition to
(non-hidden) data files, with the exception of hidden and ignored
directories. (Note: for ignored directories, see
--ignored_dir_prefix_list.)

Note that for partitioned tables, INSERT OVERWRITE completely removes
the partition directories that are affected, and this change does not
alter that.

Testing:
 - extended the tests in test_recursive_listing.py::TestRecursiveListing

Change-Id: I1a40a22e18e6a384da982d300422ac8995ed0273
---
M be/src/runtime/dml-exec-state.cc
M tests/metadata/test_recursive_listing.py
M tests/query_test/test_insert_behaviour.py
3 files changed, 99 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/23165/2
--
To view, visit http://gerrit.cloudera.org:8080/23165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1a40a22e18e6a384da982d300422ac8995ed0273
Gerrit-Change-Number: 23165
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
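The deletion rule that IMPALA-14189 describes for the top-level directory of an unpartitioned external table can be sketched in C++ as follows. This is a minimal illustration, not the code in dml-exec-state.cc: the helper names (IsHiddenName, HasIgnoredPrefix, ShouldDeleteOnOverwrite) are hypothetical. The sketch assumes that "hidden" means a name beginning with '.' or '_' (Impala's actual check may cover more cases). It also models --ignored_dir_prefix_list as a plain list of name prefixes that applies only to directories.

```cpp
#include <string>
#include <vector>

// Hypothetical helper. ASSUMPTION: a file or directory is "hidden" if its
// name starts with '.' or '_'.
bool IsHiddenName(const std::string& name) {
  return !name.empty() && (name[0] == '.' || name[0] == '_');
}

// Hypothetical helper. ASSUMPTION: models --ignored_dir_prefix_list as a list
// of name prefixes; returns true if `name` starts with any non-empty prefix.
bool HasIgnoredPrefix(const std::string& name,
                      const std::vector<std::string>& ignored_prefixes) {
  for (const std::string& prefix : ignored_prefixes) {
    if (!prefix.empty() && name.compare(0, prefix.size(), prefix) == 0) {
      return true;
    }
  }
  return false;
}

// Hypothetical decision function for one top-level entry of the table
// directory, following the rule in the commit message: hidden files and
// directories are kept, ignored directories are kept, and every other data
// file or directory is deleted by INSERT OVERWRITE.
bool ShouldDeleteOnOverwrite(const std::string& name, bool is_directory,
                             const std::vector<std::string>& ignored_prefixes) {
  if (IsHiddenName(name)) return false;
  if (is_directory && HasIgnoredPrefix(name, ignored_prefixes)) return false;
  return true;
}
```

Under these assumptions, a plain subdirectory such as "subdir" is now deleted, while "_hidden_dir" or a directory matching a configured ignored prefix is kept. Before the change, no directory was deleted at all.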