Hello Jason Fehr, Michael Smith, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/20386
to look at the new patch set (#11).
Change subject: IMPALA-12389: Use -skipTrash to avoid accumulating trash
......................................................................
IMPALA-12389: Use -skipTrash to avoid accumulating trash
By default, deleting files on Hadoop moves them to a
trash folder. The trash folder can be aged out, but
Impala's developer environment configures the trash to
live a long time. As a result, the trash contents keep
accumulating and consuming disk space.
This combines multiple changes to avoid accumulating trash:
1. This changes HadoopFsCommandLineClient's delete_file_dir
to use -skipTrash so that deleted paths bypass the trash.
Because the WebHDFS client doesn't have an option to skip
trash, it also modifies DelegatingHdfsClient to use
HadoopFsCommandLineClient for delete_file_dir. To avoid the
overhead of invoking the commandline for a location that
doesn't exist, it first does a quick existence check using
WebHDFS.
2. This changes the unique_database fixture to delete the
database directory before dropping the database. Non-external
tables deleted as part of DROP DATABASE .. CASCADE are
moved to the trash. Deleting the database directory ourselves
avoids sending these files to the trash.
3. "hdfs dfs -expunge -immediate" can recover the disk space, but
it is very slow. This increases the dfs.block.invalidate.limit
to allow HDFS to delete more blocks in a single heartbeat.
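As a rough illustration of item 1, the delete path might look like the sketch below. It is not the code in tests/util/hdfs_util.py: the function names, the injected `exists` and `run` callables, and the return values are assumptions made to keep the example self-contained. Only the `hdfs dfs -rm` flags `-r` and `-skipTrash` are taken from the Hadoop commandline itself.

```python
import subprocess


def build_rm_command(path, recursive=False):
    """Build an 'hdfs dfs -rm' command that bypasses the trash (hypothetical helper)."""
    cmd = ["hdfs", "dfs", "-rm"]
    if recursive:
        cmd.append("-r")
    # -skipTrash deletes the path immediately instead of moving it to the trash.
    cmd.append("-skipTrash")
    cmd.append(path)
    return cmd


def delete_file_dir(path, exists, recursive=False, run=subprocess.call):
    """Delete 'path' without sending it to the trash.

    'exists' is an assumed stand-in for the cheap WebHDFS existence check.
    'run' executes the command and returns its exit code.
    Returns True if a delete was issued and succeeded, False otherwise.
    """
    if not exists(path):
        # Skip the cost of launching the commandline for a missing path.
        return False
    return run(build_rm_command(path, recursive)) == 0
```

Passing `exists` and `run` as parameters is only a convenience for exercising the ordering (check first, then delete) without a running HDFS cluster.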
To support this change, there were other test-only changes:
- TestHdfsEncryption and TestHdfsPermissions used WebHDFS-style
paths without the leading slash. This is incompatible with
using the HDFS commandline for delete_file_dir, so this switches
those tests to normal paths with a leading slash. This should be
safe, because we always use the delegating client, which removes
the leading slash when it uses the WebHDFS client.
- This relaxes the timing for TestRecursiveListing, because
deletes via the Hadoop commandline are slower than deletes
through the WebHDFS client.
- This updates a few tests that placed tables outside of the
unique_database. In particular, Iceberg tests using
create_iceberg_table_from_directory() were putting tables
outside the database.
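The path handling described in the first bullet can be sketched as follows. The helper names are hypothetical and are not taken from the change. The sketch assumes only what the bullet states: WebHDFS-style paths in these tests omit the leading slash, while the HDFS commandline needs normal paths.

```python
def to_webhdfs_path(path):
    """Drop leading slashes to get the WebHDFS-style form (hypothetical helper)."""
    return path.lstrip("/")


def to_cli_path(path):
    """Return the path with exactly one leading slash for the HDFS commandline."""
    return "/" + path.lstrip("/")
```

With a conversion like this inside the delegating client, tests can pass normal paths everywhere and still reach the WebHDFS client in the form it expects.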
Testing:
- Ran tests locally and examined the trash directory
Change-Id: I2d304113596aaf70a122202a33276fc7c3d599e8
---
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/common/file_utils.py
M tests/conftest.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_recursive_listing.py
M tests/query_test/test_scanners.py
M tests/query_test/test_udfs.py
M tests/util/hdfs_util.py
M tests/util/iceberg_metadata_util.py
10 files changed, 48 insertions(+), 31 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/86/20386/11
--
To view, visit http://gerrit.cloudera.org:8080/20386
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2d304113596aaf70a122202a33276fc7c3d599e8
Gerrit-Change-Number: 20386
Gerrit-PatchSet: 11
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>