stczwd commented on pull request #28280: URL: https://github.com/apache/spark/pull/28280#issuecomment-700496802
> Can you give more details about the use cases? There might be better ways to solve it.

@cloud-fan Thanks for your reply. Actually, this has already been discussed in #28129. Some temporary directories are always left behind after the application finishes. This happens when we run `InsertIntoHiveTable` or `InsertIntoHiveDirCommand` with `spark.speculation=true`. Slow responses from the executors to the driver cause some tasks to be retried, and some of the original attempts are still running after the job ends. These surviving tasks keep writing results into the temp dir, so the temp dir is left uncleaned after the application finishes. Here is the driver log for this:

```
2020-04-07 04:36:19 [dag-scheduler-event-loop] INFO [DAGScheduler]: ResultStage 16 (sql at NativeMethodAccessorImpl.java:0) finished in 2.222 s
2020-04-07 04:36:19 [pool-3-thread-1] INFO [DAGScheduler]: Job 2 finished: sql at NativeMethodAccessorImpl.java:0, took 23.883106 s
2020-04-07 04:36:19 [pool-3-thread-1] INFO [FileFormatWriter]: Job null committed.
2020-04-07 04:36:19 [pool-3-thread-1] INFO [FileFormatWriter]: Finished processing stats for job null.
2020-04-07 04:36:21 [task-result-getter-0] WARN [TaskSetManager]: Lost task 752.0 in stage 16.0 (executor 42): TaskKilled (another attempt succeeded)
2020-04-07 04:36:21 [task-result-getter-0] INFO [TaskSetManager]: Task 752.0 in stage 16.0 failed, but the task will not be re-executed (either because the task failed with a shuffle data fetch failure, so the previous stage needs to be re-run, or because a different copy of the task has already succeeded).
2020-04-07 04:36:21 [task-result-getter-3] WARN [TaskSetManager]: Lost task 543.0 in stage 16.0 (executor 146): TaskKilled (another attempt succeeded)
2020-04-07 04:36:21 [task-result-getter-3] INFO [TaskSetManager]: Task 543.0 in stage 16.0 failed, but the task will not be re-executed (either because the task failed with a shuffle data fetch failure, so the previous stage needs to be re-run, or because a different copy of the task has already succeeded).
2020-04-07 04:36:23 [task-result-getter-1] WARN [TaskSetManager]: Lost task 143.0 in stage 16.0 (executor 146): TaskKilled (another attempt succeeded)
2020-04-07 04:36:23 [task-result-getter-1] INFO [TaskSetManager]: Task 143.0 in stage 16.0 failed, but the task will not be re-executed (either because the task failed with a shuffle data fetch failure, so the previous stage needs to be re-run, or because a different copy of the task has already succeeded).
2020-04-07 04:36:28 [task-result-getter-1] INFO [YarnClusterScheduler]: Removed TaskSet 16.0, whose tasks have all completed, from pool default
2020-04-07 04:36:28 [RPC-Handler-7] INFO [SparkUI]: Stopped Spark web UI at http://192.168.1.125:48909
```
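
For context, here is a minimal sketch of the kind of job that hits this scenario. The app name and table names are placeholders, and this is only an illustration of the setup described above, not code from this PR:

```scala
import org.apache.spark.sql.SparkSession

// Enable speculation so slow tasks get duplicate attempts; when the job
// commits, a losing attempt may still be writing into the Hive temp/staging dir.
val spark = SparkSession.builder()
  .appName("speculation-temp-dir-repro") // placeholder app name
  .config("spark.speculation", "true")
  .enableHiveSupport()
  .getOrCreate()

// Any insert that goes through InsertIntoHiveTable / InsertIntoHiveDirCommand
// can leave the temp dir behind when a killed speculative attempt outlives the job.
spark.sql("INSERT OVERWRITE TABLE target_table SELECT * FROM source_table")
```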
