As the subject suggests, I want to write Parquet output to S3. I know this was rather troublesome in the past because S3 has no rename, so the committer had to do a copy+delete. This issue has been discussed before, see: http://apache-spark-user-list.1001560.n3.nabble.com/Writing-files-to-s3-with-out-temporary-directory-tc28088.html
Now HADOOP-13786 <https://issues.apache.org/jira/browse/HADOOP-13786> fixes this problem in Hadoop 3.1.0 and later. How can I use that with Spark 2.3.3? I usually orchestrate my cluster on EC2 with flintrock <https://github.com/nchammas/flintrock>. Do I just set HDFS to 3.1.1 in the flintrock config and everything "just works"? Or do I also have to set a committer algorithm when I create my Spark context in pyspark, like this: .set('spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version', 'some_kind_of_Version')? Thanks for the help!
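For reference, here is roughly the kind of configuration I have in mind, based on the S3A committer settings described in the Hadoop documentation for HADOop-13786's committers. I am not sure whether the two spark.sql.* bindings below are available in a stock Spark 2.3.3 build (they come from the spark-hadoop-cloud module), so please treat this as a sketch, not a working setup:

```
# spark-defaults.conf (sketch, untested)

# Select the S3A "directory" staging committer instead of the rename-based one
spark.hadoop.fs.s3a.committer.name                         directory
spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a  org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory

# Bind Spark's Parquet writer to the new committer
# (these classes are in the spark-hadoop-cloud module; availability for 2.3.3 is my open question)
spark.sql.sources.commitProtocolClass     org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class  org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
```

As I understand it, the fileoutputcommitter.algorithm.version setting only switches between the classic v1/v2 rename-based algorithms, which is different from the new S3A committers above, so I'd like to know which of the two I actually need here.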