Prabhu Joseph created FLINK-34142:
-------------------------------------

             Summary: TaskManager WorkingDirectory is not removed during shutdown
                 Key: FLINK-34142
                 URL: https://issues.apache.org/jira/browse/FLINK-34142
             Project: Flink
          Issue Type: Bug
          Components: Deployment / YARN
    Affects Versions: 1.17.1, 1.16.0
            Reporter: Prabhu Joseph

TaskManager WorkingDirectory is not removed during shutdown.

*Repro*
{code:java}
1. Execute a Flink batch job within a Flink on YARN Session

flink-yarn-session -d

flink run -d /usr/lib/flink/examples/batch/WordCount.jar --input s3://prabhuflinks3/INPUT --output s3://prabhuflinks3/OUT
{code}

The batch job completes successfully, but the taskmanager working directory is not being removed.
{code:java}
[root@ip-1-2-3-4 container_1705470896818_0017_01_000002]# ls -R -lrt /mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002
/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002:
total 0
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 tmp
drwxr-xr-x 4 yarn yarn 66 Jan 18 08:34 blobStorage
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 slotAllocationSnapshots
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 localState

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/tmp:
total 0

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/blobStorage:
total 0
drwxr-xr-x 2 yarn yarn 94 Jan 18 08:34 job_d11f7085314ef1fb04c4e12fe292185a
drwxr-xr-x 2 yarn yarn  6 Jan 18 08:34 incoming

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/blobStorage/job_d11f7085314ef1fb04c4e12fe292185a:
total 12
-rw-r--r-- 1 yarn yarn 10323 Jan 18 08:34 blob_p-cdd441a64b3ea6eed0058df02c6c10fd208c94a8-86d84864273dad1e8084d8ef0f5aad52

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/blobStorage/incoming:
total 0

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/slotAllocationSnapshots:
total 0

/mnt2/yarn/usercache/hadoop/appcache/application_1705470896818_0017/tm_container_1705470896818_0017_01_000002/localState:
total 0
{code}

*Analysis*

1. The TaskManagerRunner removes the working directory only when its 'close' method is called, which never happens.
{code:java}
public void close() throws Exception {
    try {
        closeAsync().get();
    } catch (ExecutionException e) {
        ExceptionUtils.rethrowException(ExceptionUtils.stripExecutionException(e));
    }
}

public CompletableFuture<Result> closeAsync() {
    return closeAsync(Result.SUCCESS);
}
{code}
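
To illustrate the pattern described in the analysis, here is a minimal, self-contained Java sketch. It is not Flink code: the class `WorkingDirectorySketch`, its `deleteRecursively` helper, and the temporary directory layout are all hypothetical and only mimic a component whose cleanup lives in `close()`. It shows that a directory owned by such a component survives when `close()` is never invoked, and is removed when it is.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical stand-in for a component that owns a working directory. Not Flink code. */
public class WorkingDirectorySketch implements AutoCloseable {

    private final Path workingDirectory;

    public WorkingDirectorySketch(Path workingDirectory) throws IOException {
        this.workingDirectory = workingDirectory;
        // Create one subdirectory, loosely modelled on the listing in the report.
        Files.createDirectories(workingDirectory.resolve("blobStorage"));
    }

    public Path getWorkingDirectory() {
        return workingDirectory;
    }

    /** Cleanup happens only here, so it runs only if a caller invokes close(). */
    @Override
    public void close() throws IOException {
        deleteRecursively(workingDirectory);
    }

    /** Deletes a directory tree; a no-op if the root does not exist. */
    public static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order so that children are deleted before their parents.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("tm_sketch");

        // Case 1: close() is never called, so the directory is left behind.
        WorkingDirectorySketch leaked = new WorkingDirectorySketch(base.resolve("leaked"));
        System.out.println("leaked exists: " + Files.exists(leaked.getWorkingDirectory()));

        // Case 2: try-with-resources guarantees that close() runs.
        Path closedDir = base.resolve("closed");
        try (WorkingDirectorySketch closed = new WorkingDirectorySketch(closedDir)) {
            // The component would do its work here.
        }
        System.out.println("closed exists: " + Files.exists(closedDir));

        // Tidy up the sketch's own temporary files.
        deleteRecursively(base);
    }
}
```

Running `main` should print `leaked exists: true` followed by `closed exists: false`. Whether the actual fix should call `close()` from the shutdown path or clean up elsewhere is left open here; the sketch only demonstrates why cleanup tied to `close()` is skipped when nothing calls it.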