We (Databricks) use our own DirectOutputCommitter implementation, which is a few dozen lines of Scala code. The class would be almost entirely a no-op, except that we took some care to handle the _SUCCESS file properly.
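
For illustration, a committer of this kind might look roughly like the sketch below. It is not the Databricks implementation, only a guess at its shape, and it assumes the old `org.apache.hadoop.mapred` API, that tasks write straight to the final output path, and that the `_SUCCESS` marker is controlled by the `mapreduce.fileoutputcommitter.marksuccessfuljobs` setting as it is in FileOutputCommitter.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapred.{FileOutputFormat, JobContext, OutputCommitter, TaskAttemptContext}

// Illustrative sketch only: every task-level hook is a no-op because
// output is assumed to be written directly to its final location.
class DirectOutputCommitter extends OutputCommitter {
  override def setupJob(jobContext: JobContext): Unit = {}
  override def setupTask(taskContext: TaskAttemptContext): Unit = {}

  // Nothing to move at task commit, so report that no commit is needed.
  override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
  override def commitTask(taskContext: TaskAttemptContext): Unit = {}
  override def abortTask(taskContext: TaskAttemptContext): Unit = {}

  // The only real work: create an empty _SUCCESS marker in the output
  // directory when the job finishes, unless the marker is disabled.
  override def commitJob(jobContext: JobContext): Unit = {
    val conf = jobContext.getJobConf
    if (conf.getBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", true)) {
      val outputPath = FileOutputFormat.getOutputPath(conf)
      if (outputPath != null) {
        val fs = outputPath.getFileSystem(conf)
        fs.create(new Path(outputPath, "_SUCCESS")).close()
      }
    }
  }
}
```

With the old API, a class like this would typically be selected by setting `mapred.output.committer.class` to its fully qualified name in the Hadoop configuration. Because nothing is staged in a temporary directory, failed or speculative tasks can leave partial files in the output path, so this approach fits best when speculation is turned off.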
On Fri, Feb 20, 2015 at 3:52 PM, Mingyu Kim <m...@palantir.com> wrote:

> I didn’t get any response. It’d be really appreciated if anyone using a
> special OutputCommitter for S3 can comment on this!
>
> Thanks,
> Mingyu
>
> From: Mingyu Kim <m...@palantir.com>
> Date: Monday, February 16, 2015 at 1:15 AM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Which OutputCommitter to use for S3?
>
> Hi all,
>
> The default OutputCommitter used by RDD, which is FileOutputCommitter,
> seems to require moving files at the commit step, which is not a constant
> operation in S3, as discussed in
> http://mail-archives.apache.org/mod_mbox/spark-user/201410.mbox/%3c543e33fa.2000...@entropy.be%3E.
> People seem to develop their own NullOutputCommitter implementation or use
> DirectFileOutputCommitter (as mentioned in SPARK-3595,
> https://issues.apache.org/jira/browse/SPARK-3595),
> but I wanted to check if there is a de facto standard, publicly available
> OutputCommitter to use for S3 in conjunction with Spark.
>
> Thanks,
> Mingyu