I would not recommend using the direct output committer with HDFS. It's
intended only as an optimization for S3.
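For what it's worth, a minimal sketch of reverting to the default committer (assuming the Spark 1.x property name `spark.sql.parquet.output.committer.class`, which is how the direct committer is usually enabled in the first place):

```scala
// Hypothetical sketch: point Spark SQL back at the default Parquet
// committer, which writes to a temporary attempt directory and commits
// on success, so retried tasks on HDFS don't collide with leftover files.
sparkConf.set(
  "spark.sql.parquet.output.committer.class",
  "org.apache.parquet.hadoop.ParquetOutputCommitter")
```

The default committer is slower on S3 (the commit is a copy, not a rename) but is safe under task retries, which is the behavior you want on HDFS.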

On Fri, Mar 25, 2016 at 4:03 AM, Vinoth Chandar <vin...@uber.com> wrote:

> Hi,
>
> We are saving a dataframe to Parquet (using
> DirectParquetOutputCommitter) as follows.
>
> dfWriter.format("parquet")
>   .mode(SaveMode.Overwrite)
>   .save(outputPath)
>
> The problem is that even if an executor fails once while writing a file
> (say, due to some transient HDFS issue), when it's re-spawned it fails
> again because the file already exists, eventually failing the entire job.
>
> Is this a known issue? Any workarounds?
>
> Thanks
> Vinoth
>
