Yida Wu has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/22378 )

Change subject: IMPALA-13677: Support remote scratch directory cleanup at 
Impala daemon startup
......................................................................

IMPALA-13677: Support remote scratch directory cleanup at Impala daemon startup

This patch introduces a new feature for cleaning up remote scratch
files during Impala daemon startup, ensuring that potential leftover
files from abnormal shutdowns are removed.

To allow efficient cleanup, this patch also refines the remote
scratch directory hierarchy by adding a host-level directory,
changing it from:
<base_dir>/<backend_id>_<query_id>/<file_name>
to:
<base_dir>/<hostname>/<backend_id>_<query_id>/<file_name>
<base_dir> is <scratch_dir_config_path>/impala-scratch.
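
As an illustration only, the new layout can be sketched in Python. The function and parameter names below are hypothetical and do not come from the patch, which implements this in the C++ backend (tmp-file-mgr.cc):

```python
import posixpath


def remote_scratch_file_path(scratch_dir_config_path, hostname,
                             backend_id, query_id, file_name):
    """Build a path following the new hierarchy described above.

    Hypothetical helper for illustration; not the patch's actual code.
    """
    # <base_dir> is <scratch_dir_config_path>/impala-scratch
    base_dir = posixpath.join(scratch_dir_config_path, "impala-scratch")
    # <base_dir>/<hostname>/<backend_id>_<query_id>/<file_name>
    return posixpath.join(base_dir, hostname,
                          f"{backend_id}_{query_id}", file_name)
```

Because every file written by a daemon sits under <base_dir>/<hostname>, a single recursive delete of that directory is enough to remove everything the daemon left behind.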

During startup, if the host-level directory exists, it will be
removed entirely.

This design assumes one Impala daemon per host, and it also
assumes that multiple Impala clusters don't share the same
scratch_dir path on the remote filesystem. Even if they share
the same prefix, each Impala cluster should have a dedicated path:
--scratch_dirs=hdfs://remote_dir/scratch/impala1
--scratch_dirs=hdfs://remote_dir/scratch/impala2

This patch also adds a flag, remote_scratch_cleanup_on_startup, to
control whether the host-level directory is cleaned up during Impala
daemon startup. The feature is enabled by default. If multiple
daemons on a host, or multiple clusters, share the same remote
scratch_dir path, set the flag to false to prevent unintended cleanup.
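
The flag-gated startup cleanup can be sketched roughly as follows. This is a minimal illustration that uses a local directory as a stand-in for the remote filesystem; the function name and the cleanup_on_startup parameter are assumptions that mirror the flag, not the patch's actual C++ implementation:

```python
import os
import shutil


def cleanup_host_scratch_dir(base_dir, hostname, cleanup_on_startup=True):
    """Remove <base_dir>/<hostname> entirely if cleanup is enabled.

    Hypothetical sketch: a local directory stands in for the remote
    scratch location. Returns True if a directory was removed.
    """
    host_dir = os.path.join(base_dir, hostname)
    # Skip when the flag is off or there is nothing left over.
    if not cleanup_on_startup or not os.path.isdir(host_dir):
        return False
    # Delete the whole host-level directory, including leftover
    # <backend_id>_<query_id> subdirectories from earlier runs.
    shutil.rmtree(host_dir)
    return True
```

Directories belonging to other hostnames under the same base_dir are left untouched, which is why the scheme relies on one daemon per host and on clusters not sharing a scratch_dir path.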

Tests:
Passed exhaustive tests.
Adds testcase test_scratch_dirs_remote_spill_leftover_files_removal.

Change-Id: Iadd49b7384d52bac5ddab4e86cd9f39dc2c88e1b
Reviewed-on: http://gerrit.cloudera.org:8080/22378
Reviewed-by: Abhishek Rawat <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/tmp-file-mgr-test.cc
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M tests/custom_cluster/test_scratch_disk.py
5 files changed, 100 insertions(+), 15 deletions(-)

Approvals:
  Abhishek Rawat: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/22378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iadd49b7384d52bac5ddab4e86cd9f39dc2c88e1b
Gerrit-Change-Number: 22378
Gerrit-PatchSet: 7
Gerrit-Owner: Yida Wu <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
