[ https://issues.apache.org/jira/browse/SPARK-27210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shixiong Zhu resolved SPARK-27210.
----------------------------------
    Resolution: Fixed
      Assignee: Jungtaek Lim
 Fix Version/s: 3.0.0

> Cleanup incomplete output files in ManifestFileCommitProtocol if task is aborted
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27210
>                 URL: https://issues.apache.org/jira/browse/SPARK-27210
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> Unlike HadoopMapReduceCommitProtocol, ManifestFileCommitProtocol does not
> clean up incomplete output files in either case: when a task is aborted or
> when a job is aborted.
> HadoopMapReduceCommitProtocol writes intermediate files to a staging
> directory, so when a job is aborted it can simply delete the staging
> directory to clean up everything. It also makes an additional effort to
> clean up intermediate files on the task side when a task is aborted.
> ManifestFileCommitProtocol does no cleanup at all; it only maintains
> metadata listing the complete output files that were written. It would be
> better if ManifestFileCommitProtocol made a best effort to clean up: it is
> unclear whether it can do job-level cleanup, since it does not use a staging
> directory, but it can clearly still make a best effort at task-level cleanup.
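The task-level cleanup described in the issue can be illustrated with a minimal sketch. This is not the Spark implementation or the patch that resolved this issue: the class TaskFileTracker and its methods are hypothetical names chosen for illustration, and the sketch assumes a local filesystem instead of a Hadoop FileSystem. It only shows the idea that, without a staging directory, a task has to remember every file it opened so that it can delete those files if it is aborted.

```python
import os
import tempfile
import uuid


class TaskFileTracker:
    """Hypothetical helper: remembers the files one task attempt opened."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.added_files = []

    def new_task_file(self, ext=""):
        # Files go straight into the final output directory (no staging
        # directory), so every path must be tracked explicitly.
        path = os.path.join(self.output_dir, f"part-{uuid.uuid4()}{ext}")
        self.added_files.append(path)
        return path

    def commit_task(self):
        # Report only files that exist; a manifest would list these.
        return [p for p in self.added_files if os.path.exists(p)]

    def abort_task(self):
        # Best-effort cleanup: a failed delete is ignored so that it
        # does not mask the failure that caused the abort.
        for path in self.added_files:
            try:
                os.remove(path)
            except OSError:
                pass
        self.added_files.clear()


# Demo: a task writes a partial file, then is aborted.
with tempfile.TemporaryDirectory() as out_dir:
    tracker = TaskFileTracker(out_dir)
    partial = tracker.new_task_file(".parquet")
    with open(partial, "w") as f:
        f.write("incomplete data")
    tracker.abort_task()
    leftover = os.listdir(out_dir)
```

After abort_task() runs, the output directory holds no files from the aborted attempt. Job-level cleanup is deliberately left out of the sketch, since the issue itself notes that it is unclear whether it is feasible without a staging directory.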