Michael Smith has submitted this change and it was merged. (http://gerrit.cloudera.org:8080/20386)

Change subject: IMPALA-12389: Use -skipTrash to avoid accumulating trash
......................................................................

IMPALA-12389: Use -skipTrash to avoid accumulating trash

The default behavior when deleting files on Hadoop is to
move them to a trash folder. The trash folder can be
aged out, but Impala's developer environment configures a
long retention period for trash. This is a problem because
the trash contents continue to accumulate.

This combines multiple changes to avoid accumulating trash:
1. This changes HadoopFsCommandLineClient's delete_file_dir
   to pass -skipTrash, so deleted files are removed right away
   instead of accumulating in the trash. This helps on non-HDFS
   test environments.
2. This changes the unique_database fixture to delete the
   database directory before dropping the database. Files of
   non-external tables dropped as part of DROP DATABASE ... CASCADE
   are moved to the trash, so deleting the database directory
   ourselves keeps these files out of the trash.
3. "hdfs dfs -expunge -immediate" can reclaim the disk space, but
   it is very slow. This increases dfs.block.invalidate.limit so
   that HDFS can delete more blocks in a single heartbeat.
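A minimal sketch of the kind of command item 1 describes, assuming the client shells out to "hdfs dfs -rm". The helper names build_delete_command and run_delete are hypothetical and are not Impala's actual HadoopFsCommandLineClient code; only the -rm, -r and -skipTrash options come from the Hadoop shell.

```python
import subprocess
from typing import List


def build_delete_command(path: str, skip_trash: bool = True) -> List[str]:
  """Builds an argv that recursively deletes `path` (hypothetical helper)."""
  cmd = ["hdfs", "dfs", "-rm", "-r"]
  if skip_trash:
    # -skipTrash deletes the files immediately instead of moving them
    # into the user's .Trash directory.
    cmd.append("-skipTrash")
  cmd.append(path)
  return cmd


def run_delete(path: str) -> None:
  """Runs the delete; raises CalledProcessError if the command fails."""
  subprocess.run(build_delete_command(path), check=True)
```

For example, build_delete_command("/test-warehouse/tmp") yields ["hdfs", "dfs", "-rm", "-r", "-skipTrash", "/test-warehouse/tmp"]. Keeping command construction separate from execution makes the flag choice easy to test without a cluster.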
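The teardown ordering in item 2 can be sketched as a plan of steps. This assumes a Hive-style layout where a database created without LOCATION lives in <warehouse>/<db>.db, with a hypothetical /test-warehouse default; unique_database_cleanup_steps is an illustrative name, not the real fixture code.

```python
import posixpath
from typing import List, Tuple


def unique_database_cleanup_steps(
    db_name: str, warehouse_dir: str = "/test-warehouse") -> List[Tuple[str, str]]:
  """Returns ordered teardown steps for a test database (hypothetical)."""
  # Assumed layout: a database without LOCATION lives in <warehouse>/<db>.db.
  db_dir = posixpath.join(warehouse_dir, db_name + ".db")
  return [
      # 1. Delete the data directly (e.g. with -skipTrash) so nothing is trashed.
      ("delete_dir", db_dir),
      # 2. Drop the metadata; tables stored under db_dir have no files left.
      ("sql", "DROP DATABASE IF EXISTS `{0}` CASCADE".format(db_name)),
  ]
```

The point of the ordering is that by the time DROP DATABASE ... CASCADE runs, data for tables stored under the database directory is already gone, so there is nothing for it to move to the trash.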
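To see why raising dfs.block.invalidate.limit helps item 3, consider a deliberately simplified model: the NameNode sends each DataNode at most `invalidate_limit` block deletions per heartbeat, so clearing N blocks takes roughly ceil(N / limit) heartbeats. The defaults below assume stock Hadoop values (1000 blocks, 3-second heartbeat); the value Impala's template actually sets is not stated here, and real deletion is also paced by other NameNode-side throttling that this model ignores.

```python
import math


def estimated_deletion_seconds(blocks_per_datanode: int,
                               invalidate_limit: int = 1000,
                               heartbeat_interval_s: float = 3.0) -> float:
  """Rough time for one DataNode to receive all of its block deletions.

  Simplified model (an assumption): at most `invalidate_limit` block
  invalidations are sent to a DataNode per heartbeat.
  """
  heartbeats = math.ceil(blocks_per_datanode / invalidate_limit)
  return heartbeats * heartbeat_interval_s
```

Under this model, estimated_deletion_seconds(100000) gives 300.0, while estimated_deletion_seconds(100000, invalidate_limit=10000) gives 30.0: the wait scales inversely with the limit.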

To support this change, there are also some test-only changes:
 - This updates a few tests that placed tables outside of the
   unique_database directory. In particular, Iceberg tests using
   create_iceberg_table_from_directory() were creating tables
   outside the database directory.
 - TestHdfsEncryption and TestHdfsPermissions used WebHDFS-style
   paths without a leading slash. This was harmless, but this
   change cleans them up to use normal paths. This is safe because
   the delegating client converts them internally when using the
   WebHDFS client.
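Table data stored outside the database directory is not removed by deleting that directory, so a later DROP DATABASE ... CASCADE could still send those files to the trash. A test could guard against that with a containment check like the hypothetical helper below (plain POSIX path logic, not an Impala API).

```python
import posixpath


def is_inside_directory(path: str, directory: str) -> bool:
  """True if `path` is `directory` itself or nested under it (hypothetical)."""
  path = posixpath.normpath(path)
  directory = posixpath.normpath(directory)
  # Compare against "<directory>/" so that "/a/db.db_other" does not match "/a/db.db".
  return path == directory or path.startswith(directory.rstrip("/") + "/")
```

A plain startswith() check without the trailing slash would wrongly accept sibling directories that share a prefix, which is why the separator is appended.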
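The path cleanup in the last bullet relies on converting between normal paths and WebHDFS-style paths, which, per the description above, differ only in the leading slash. A minimal sketch of that conversion, assuming nothing else about the path changes; the function names are hypothetical and the real delegating client may handle more cases.

```python
def to_webhdfs_style(path: str) -> str:
  """Strips leading slashes: '/test-warehouse/x' -> 'test-warehouse/x'."""
  return path.lstrip("/")


def to_normal_path(path: str) -> str:
  """Adds a leading slash if missing: 'test-warehouse/x' -> '/test-warehouse/x'."""
  return path if path.startswith("/") else "/" + path
```

Because the conversion is lossless for ordinary absolute paths, tests can pass normal paths everywhere and leave the translation to the client.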

Testing:
 - Ran tests locally and examined the trash directory
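One way to examine the trash, assuming the standard HDFS layout where a user's trash lives under /user/<user>/.Trash (files deleted inside an encryption zone typically go to a .Trash directory within that zone instead). The helpers are hypothetical and only build argv lists; -du -s -h and -expunge -immediate are stock "hdfs dfs" options.

```python
import getpass
import posixpath
from typing import List, Optional


def trash_dir(user: Optional[str] = None) -> str:
  """Returns the assumed per-user trash root, /user/<user>/.Trash."""
  return posixpath.join("/user", user or getpass.getuser(), ".Trash")


def trash_usage_command(user: Optional[str] = None) -> List[str]:
  # -du -s -h prints a single human-readable total for the directory.
  return ["hdfs", "dfs", "-du", "-s", "-h", trash_dir(user)]


def expunge_command() -> List[str]:
  # Deletes all of the current user's trash immediately, ignoring
  # fs.trash.interval (slow on HDFS, as noted above).
  return ["hdfs", "dfs", "-expunge", "-immediate"]
```

Running trash_usage_command() before and after a test run shows whether trash is still accumulating.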

Change-Id: I2d304113596aaf70a122202a33276fc7c3d599e8
Reviewed-on: http://gerrit.cloudera.org:8080/20386
Tested-by: Impala Public Jenkins
Reviewed-by: Michael Smith
---
M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl
M tests/common/file_utils.py
M tests/conftest.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/query_test/test_scanners.py
M tests/query_test/test_udfs.py
M tests/util/hdfs_util.py
M tests/util/iceberg_metadata_util.py
9 files changed, 35 insertions(+), 25 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Michael Smith: Looks good to me, approved

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2d304113596aaf70a122202a33276fc7c3d599e8
Gerrit-Change-Number: 20386
Gerrit-PatchSet: 14
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jason Fehr
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Michael Smith
