Hi,

We are saving a DataFrame to Parquet (using DirectParquetOutputCommitter) as follows:
dfWriter.format("parquet")
  .mode(SaveMode.Overwrite)
  .save(outputPath)
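For context, this is roughly how the direct committer is wired up on our side (a sketch; `sqlContext`, `df`, and `outputPath` stand in for our actual setup, and `dfWriter` above is just `df.write`):

```scala
// Spark 1.x setting that swaps in the direct committer for Parquet output.
// DirectParquetOutputCommitter writes straight to the final output location,
// skipping the _temporary staging directory and the commit/rename step --
// which is presumably why a retried task finds the file already present.
sqlContext.setConf(
  "spark.sql.parquet.output.committer.class",
  "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")

df.write
  .format("parquet")
  .mode(SaveMode.Overwrite)
  .save(outputPath)
```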
The problem is that if an executor fails even once while writing a file (say, due to some transient HDFS issue), the re-spawned task fails again because the file already exists, eventually failing the entire job.
Is this a known issue? Any workarounds?
Thanks
Vinoth
