Yida Wu has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/22378 )
Change subject: IMPALA-13677: Support remote scratch directory cleanup at Impala daemon startup
......................................................................

IMPALA-13677: Support remote scratch directory cleanup at Impala daemon startup

This patch introduces a new feature for cleaning up remote scratch
files during Impala daemon startup, ensuring that potential leftover
files from abnormal shutdowns are removed.

To allow efficient cleanup, this patch also refines the remote scratch
directory hierarchy by adding a host-level directory, changing it from:
  <base_dir>/<backend_id>_<query_id>/<file_name>
to:
  <base_dir>/<hostname>/<backend_id>_<query_id>/<file_name>
<base_dir> is <scratch_dir_config_path>/impala-scratch.

During startup, if the host-level directory exists, it will be removed
entirely. This design assumes one Impala daemon per host, and it also
assumes that multiple Impala clusters don't share the same scratch_dir
path on remote filesystem. Even if they share the same prefix, each
Impala cluster should have dedicated paths:
  --scratch_dirs=hdfs://remote_dir/scratch/impala1
  --scratch_dirs=hdfs://remote_dir/scratch/impala2

Also added one flag remote_scratch_cleanup_on_startup to control
whether the host-level directory is cleaned during Impala daemon
startup. By default, this feature is enabled. If multiple daemons on a
host or multiple clusters share the same remote scratch_dir path, we
can set this to false to prevent unintended cleanup.

Tests:
Passed exhaustive tests.
Adds testcase test_scratch_dirs_remote_spill_leftover_files_removal.
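The directory layout and startup cleanup described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the patch (the actual change is in the C++ files be/src/runtime/tmp-file-mgr.cc and related). The function names scratch_file_path and cleanup_host_dir_on_startup are hypothetical, and a local temporary directory stands in for the remote filesystem. The cleanup_enabled parameter is an assumed stand-in for the remote_scratch_cleanup_on_startup flag.

```python
import shutil
import tempfile
from pathlib import Path


def scratch_file_path(base_dir, hostname, backend_id, query_id, file_name):
    """Build <base_dir>/<hostname>/<backend_id>_<query_id>/<file_name>.

    Hypothetical helper that mirrors the new hierarchy from the commit message.
    """
    return Path(base_dir) / hostname / f"{backend_id}_{query_id}" / file_name


def cleanup_host_dir_on_startup(base_dir, hostname, cleanup_enabled=True):
    """Remove the whole host-level directory if it exists.

    cleanup_enabled is an assumed stand-in for remote_scratch_cleanup_on_startup.
    Returns True if a directory was removed, False otherwise.
    """
    host_dir = Path(base_dir) / hostname
    if cleanup_enabled and host_dir.is_dir():
        shutil.rmtree(host_dir)
        return True
    return False


if __name__ == "__main__":
    # A local temp dir stands in for <scratch_dir_config_path> (assumption).
    with tempfile.TemporaryDirectory() as config_path:
        base_dir = Path(config_path) / "impala-scratch"

        # Simulate leftover files from two hosts after an abnormal shutdown.
        for host in ("host-a", "host-b"):
            leftover = scratch_file_path(base_dir, host, "backend1", "query1", "spill.bin")
            leftover.parent.mkdir(parents=True)
            leftover.write_bytes(b"leftover")

        # The daemon on host-a starts up and removes only its own directory.
        cleanup_host_dir_on_startup(base_dir, "host-a")
        print(sorted(p.name for p in base_dir.iterdir()))
```

Because the cleanup removes everything under the host-level directory, the sketch also shows why the one-daemon-per-host assumption matters: a second daemon using the same hostname directory would have its live scratch files deleted unless the cleanup is disabled.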
Change-Id: Iadd49b7384d52bac5ddab4e86cd9f39dc2c88e1b
Reviewed-on: http://gerrit.cloudera.org:8080/22378
Reviewed-by: Abhishek Rawat <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M tests/custom_cluster/test_scratch_disk.py
5 files changed, 100 insertions(+), 15 deletions(-)

Approvals:
  Abhishek Rawat: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/22378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iadd49b7384d52bac5ddab4e86cd9f39dc2c88e1b
Gerrit-Change-Number: 22378
Gerrit-PatchSet: 7
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
