I have a system where I'm saving Parquet files to S3 via Spark. They are partitioned a couple of ways: first by date, then by a partition key, and there are multiple Parquet files per combination over a long period of time. So the structure is like this:
s3://bucketname/date=2016-02-29/partionkey=2342/filename.parquet.gz

There's been disagreement about how the SaveMode should be used when saving out the data. If we keep the SaveMode as ErrorIfExists, does that mean additional partitions or Parquet files written out later under the same parts of the subpath can't be written successfully?

Also, does the SaveMode apply to tasks too? Say we are using the Direct Output Committer, and a failure in a task causes some of its files to be written and others not. Would the individual files automatically inherit the SaveMode, or does the SaveMode apply only to the output as a whole? (I've included a rough sketch of how we write the data at the end of this message.)

Peter Halliday
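P.S. For concreteness, here is roughly how our write looks. This is a minimal sketch against the Spark 1.6-era API; the input path, app name, and the exact committer class are assumptions on my part (the DirectParquetOutputCommitter package moved between 1.x releases, so check the version you're on). Only the bucket and partition column names come from the layout above.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.{SQLContext, SaveMode}

  object WritePartitionedParquet {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("write-partitioned-parquet")
      val sc = new SparkContext(conf)
      val sqlContext = new SQLContext(sc)

      // Gzip-compress the Parquet output.
      sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip")

      // Opt in to the direct committer, which writes straight to the
      // destination instead of committing via a _temporary directory.
      // NOTE: class/package is an assumption for Spark 1.6; it differs
      // in other 1.x releases.
      sqlContext.setConf(
        "spark.sql.parquet.output.committer.class",
        "org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter")

      // Placeholder input; assume df has "date" and "partionkey" columns.
      val df = sqlContext.read.parquet("hdfs:///staging/events")

      // Produces s3://bucketname/date=.../partionkey=.../<part files>;
      // the mode() here is the SaveMode the question is about.
      df.write
        .mode(SaveMode.ErrorIfExists)
        .partitionBy("date", "partionkey")
        .parquet("s3://bucketname/")
    }
  }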