[ https://issues.apache.org/jira/browse/SPARK-18024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-18024:
--------------------------------
    Summary: Introduce an internal commit protocol API along with OutputCommitter implementation (was: Introduce a commit protocol API along with OutputCommitter implementation)

> Introduce an internal commit protocol API along with OutputCommitter implementation
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-18024
>                 URL: https://issues.apache.org/jira/browse/SPARK-18024
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>             Fix For: 2.1.0
>
>
> This commit protocol API should wrap around Hadoop's output committer. Later
> we can expand the API to cover streaming commits.
> The existing Hadoop output committer API is insufficient for streaming use
> cases:
> 1. It has no way for tasks to pass information back to the driver.
> 2. It relies on the awkward Hadoop configuration hashmap to pass information
> from the driver to the executors, largely because Hadoop MapReduce has no
> support for language integration and serialization. Spark has more natural
> support for passing information through automatic closure serialization.
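The two limitations listed above suggest the shape of such a protocol: state set on the driver travels to tasks with the serialized protocol object itself, and each task commit returns a message that the driver aggregates at job commit. The sketch below illustrates that idea in Python. It is not Spark's API; `CommitProtocol`, `TaskCommitMessage`, `new_task_file` and `run_task` are hypothetical names, `pickle` stands in for Spark's closure serialization, and no real files are written.

```python
from dataclasses import dataclass
from typing import List
import pickle


@dataclass(frozen=True)
class TaskCommitMessage:
    # Hypothetical: what a task reports back to the driver after committing.
    task_id: int
    files: List[str]


@dataclass
class CommitProtocol:
    # Hypothetical protocol object, not Spark's actual API. Its fields are
    # set on the driver and reach tasks by serializing the object itself,
    # instead of through a string-keyed configuration map.
    job_id: str
    output_path: str

    def setup_job(self) -> None:
        # Driver: prepare the job (a real implementation might create a
        # staging directory here; this sketch does nothing).
        pass

    def new_task_file(self, task_id: int, ext: str) -> str:
        # Task: build a file name from driver-supplied state.
        return f"{self.output_path}/part-{task_id:05d}-{self.job_id}{ext}"

    def commit_task(self, task_id: int, files: List[str]) -> TaskCommitMessage:
        # Task: return information to the driver rather than only
        # signalling success.
        return TaskCommitMessage(task_id, list(files))

    def commit_job(self, messages: List[TaskCommitMessage]) -> List[str]:
        # Driver: aggregate every task's message into the final file list.
        return sorted(f for m in messages for f in m.files)


def run_task(protocol_bytes: bytes, task_id: int) -> TaskCommitMessage:
    # Simulates an executor: deserialize the protocol shipped from the driver.
    protocol = pickle.loads(protocol_bytes)
    path = protocol.new_task_file(task_id, ".parquet")
    # (A real task would write its data to `path` here.)
    return protocol.commit_task(task_id, [path])


protocol = CommitProtocol(job_id="job42", output_path="/tmp/out")
protocol.setup_job()
shipped = pickle.dumps(protocol)  # stands in for closure serialization
messages = [run_task(shipped, t) for t in range(2)]
print(protocol.commit_job(messages))
# prints ['/tmp/out/part-00000-job42.parquet', '/tmp/out/part-00001-job42.parquet']
```

Because the driver receives typed messages instead of inferring results from the output directory, a later streaming variant could carry additional per-task information (for example, offsets) through the same channel. Task and job abort paths are omitted for brevity.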