stczwd commented on pull request #28280: URL: https://github.com/apache/spark/pull/28280#issuecomment-700496802
> Can you give more details about the use cases? There might be better ways to solve it.

@cloud-fan Thanks for your reply. Actually, this has already been discussed in #28129. Some temporary directories are always left behind after the application finishes. This happens when we run `InsertIntoHiveTable` or `InsertIntoHiveDirCommand` with `spark.speculation=true`. Slow responses from the executors to the driver cause some tasks to be retried, and some of the original attempts are still running after the job ends. These surviving tasks keep writing results into the temp dir, so the temp dir is left uncleaned after the application finishes. Here is the driver log for this:

```
2020-04-07 04:36:19 [dag-scheduler-event-loop] INFO [DAGScheduler]: ResultStage 16 (sql at NativeMethodAccessorImpl.java:0) finished in 2.222 s
2020-04-07 04:36:19 [pool-3-thread-1] INFO [DAGScheduler]: Job 2 finished: sql at NativeMethodAccessorImpl.java:0, took 23.883106 s
2020-04-07 04:36:19 [pool-3-thread-1] INFO [FileFormatWriter]: Job null committed.
2020-04-07 04:36:19 [pool-3-thread-1] INFO [FileFormatWriter]: Finished processing stats for job null.
2020-04-07 04:36:21 [task-result-getter-0] WARN [TaskSetManager]: Lost task 752.0 in stage 16.0 (executor 42): TaskKilled (another attempt succeeded)
2020-04-07 04:36:21 [task-result-getter-0] INFO [TaskSetManager]: Task 752.0 in stage 16.0 failed, but the task will not be re-executed (either because the task failed with a shuffle data fetch failure, so the previous stage needs to be re-run, or because a different copy of the task has already succeeded).
2020-04-07 04:36:21 [task-result-getter-3] WARN [TaskSetManager]: Lost task 543.0 in stage 16.0 (executor 146): TaskKilled (another attempt succeeded)
2020-04-07 04:36:21 [task-result-getter-3] INFO [TaskSetManager]: Task 543.0 in stage 16.0 failed, but the task will not be re-executed (either because the task failed with a shuffle data fetch failure, so the previous stage needs to be re-run, or because a different copy of the task has already succeeded).
2020-04-07 04:36:23 [task-result-getter-1] WARN [TaskSetManager]: Lost task 143.0 in stage 16.0 (executor 146): TaskKilled (another attempt succeeded)
2020-04-07 04:36:23 [task-result-getter-1] INFO [TaskSetManager]: Task 143.0 in stage 16.0 failed, but the task will not be re-executed (either because the task failed with a shuffle data fetch failure, so the previous stage needs to be re-run, or because a different copy of the task has already succeeded).
2020-04-07 04:36:28 [task-result-getter-1] INFO [YarnClusterScheduler]: Removed TaskSet 16.0, whose tasks have all completed, from pool default
2020-04-07 04:36:28 [RPC-Handler-7] INFO [SparkUI]: Stopped Spark web UI at http://192.168.1.125:48909
```
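
For context, here is a minimal sketch of the kind of job that hits this scenario. The app name and table names are placeholders, and this is only an illustration of the setup described above, not code from this PR:

```scala
import org.apache.spark.sql.SparkSession

// Enable speculation so slow tasks get duplicate attempts; when the job
// commits, a losing attempt may still be writing into the Hive temp/staging dir.
val spark = SparkSession.builder()
  .appName("speculation-temp-dir-repro") // placeholder app name
  .config("spark.speculation", "true")
  .enableHiveSupport()
  .getOrCreate()

// Any insert that goes through InsertIntoHiveTable / InsertIntoHiveDirCommand
// can leave the temp dir behind when a killed speculative attempt outlives the job.
spark.sql("INSERT OVERWRITE TABLE target_table SELECT * FROM source_table")
```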
