I encountered a similar problem when trying to:
ds.write().save("s3a://some-bucket/some/path/table");
which writes the content as a bunch of parquet files in the "folder" named
"table".
I am using a Flintrock cluster with the Spark 3.0 preview FWIW.
Anyway, I just used the AWS SDK to remove the existing directory before writing.
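For reference, removing an S3 "directory" means deleting every object under the prefix. A sketch with boto3 (the bucket and prefix names are hypothetical; the boto3 import is deferred so the pure helper can be exercised without AWS credentials):

```python
def keys_under_prefix(objects, prefix):
    """Pure helper: pick the keys from a list_objects_v2 page that
    fall under the given prefix."""
    return [o["Key"] for o in objects if o["Key"].startswith(prefix)]

def delete_prefix(bucket, prefix):
    """Delete every object under `prefix`, emulating 'rm -r' on S3."""
    import boto3  # deferred so the helper above stays importable without AWS
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = keys_under_prefix(page.get("Contents", []), prefix)
        if keys:
            # Each page holds at most 1000 keys, within delete_objects' limit.
            s3.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": k} for k in keys]})

# delete_prefix("some-bucket", "some/path/table/")  # hypothetical names
```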
Maybe set spark.hadoop.validateOutputSpecs=false?
From: Gautham Acharya
Sent: March 15, 2020, 3:23
To: user@spark.apache.org
Subject: [PySpark] How to write HFiles as an 'append' to the same directory?
I have a process in Apache Spark that attempts to write HFiles to S3 in a
batched process. I want the resulting HFiles in the same directory, as they are
in the same column family. However, I'm getting a 'directory already exists'
error when I try to run this on AWS EMR. How can I write HFiles as an 'append'
to the same directory?