[ https://issues.apache.org/jira/browse/SPARK-27210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-27210.
----------------------------------
       Resolution: Fixed
         Assignee: Jungtaek Lim
    Fix Version/s: 3.0.0

> Cleanup incomplete output files in ManifestFileCommitProtocol if task is 
> aborted
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27210
>                 URL: https://issues.apache.org/jira/browse/SPARK-27210
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> Unlike HadoopMapReduceCommitProtocol, ManifestFileCommitProtocol doesn't
> clean up incomplete output files in either case: when a task is aborted or
> when a job is aborted.
> HadoopMapReduceCommitProtocol writes intermediate files to a staging
> directory, so when a job is aborted it can simply delete the staging
> directory to clean up everything. It also makes an additional effort to
> clean up intermediate files on the task side when a task is aborted.
> ManifestFileCommitProtocol does no cleanup at all; it only maintains
> metadata listing the complete output files that were written. It would be
> better if ManifestFileCommitProtocol made a best effort to clean up: it is
> not clear whether it can do job-level cleanup, since it doesn't use a
> staging directory, but it can clearly still make a best effort at
> task-level cleanup.
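
The proposed task-level cleanup can be sketched as follows. This is a minimal Python illustration of the idea, not Spark's ManifestFileCommitProtocol code. The class `ManifestStyleCommitter` and its methods `setup_task`, `new_task_file`, `commit_task`, and `abort_task` are hypothetical names chosen for the sketch. It assumes files are written directly to their final location (no staging directory), so the committer records each path it hands out and, on abort, deletes those paths on a best-effort basis.

```python
import os
import tempfile
from typing import List


class ManifestStyleCommitter:
    """Hypothetical, simplified model of a manifest-based commit protocol.

    Files go straight to the output directory (no staging directory), so
    the only way to clean up an aborted task is to remember which files
    that task created.
    """

    def __init__(self, output_dir: str) -> None:
        self.output_dir = output_dir
        self.added_files: List[str] = []

    def setup_task(self) -> None:
        # Start each task attempt with an empty list of tracked files.
        self.added_files = []

    def new_task_file(self, name: str) -> str:
        # Record the path *before* the task writes to it, so a partially
        # written file is still tracked if the task fails mid-write.
        path = os.path.join(self.output_dir, name)
        self.added_files.append(path)
        return path

    def commit_task(self) -> List[str]:
        # On success, these complete files are what would be recorded in
        # the manifest (metadata log); nothing is deleted.
        committed, self.added_files = self.added_files, []
        return committed

    def abort_task(self) -> None:
        # Best-effort cleanup: delete every file this task created and
        # ignore failures (e.g. a file that was never actually written).
        for path in self.added_files:
            try:
                os.remove(path)
            except OSError:
                pass
        self.added_files = []


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        committer = ManifestStyleCommitter(out)
        committer.setup_task()
        partial = committer.new_task_file("part-00000")
        with open(partial, "w") as f:
            f.write("incomplete output")
        committer.abort_task()
        print(os.listdir(out))  # prints []
```

Tracking paths at creation time (rather than after a successful write) is the design choice that makes this work: an aborted task may never reach the point of reporting its files, yet its partial outputs are still known and can be removed. Job-level cleanup is not covered by this sketch, since without a staging directory there is no single location to delete.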



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
