Re: [PySpark] How to write HFiles as an 'append' to the same directory?

2020-03-16 Thread Stephen Coy
I encountered a similar problem when calling ds.write().save("s3a://some-bucket/some/path/table"), which writes the content as a bunch of Parquet files into the "folder" named "table". I am using a Flintrock cluster with the Spark 3.0 preview, FWIW. Anyway, I just used the AWS SDK to remove it.
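A minimal boto3 sketch of that cleanup, assuming the bucket and prefix from the example path above (both are placeholders):

    import boto3

    # Delete every object under the "folder" prefix so the next save
    # no longer collides with the existing directory.
    s3 = boto3.resource("s3")
    s3.Bucket("some-bucket").objects.filter(Prefix="some/path/table/").delete()

    # Afterwards the original write succeeds, e.g. in PySpark:
    # ds.write.save("s3a://some-bucket/some/path/table")

Note that Spark's own ds.write.mode("overwrite") achieves a similar end without an external SDK call, at the cost of letting Spark manage the deletion.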

Re: [PySpark] How to write HFiles as an 'append' to the same directory?

2020-03-16 Thread Zhang Victor
Maybe set spark.hadoop.validateOutputSpecs=false? From: Gautham Acharya Sent: March 15, 2020, 3:23 To: user@spark.apache.org Subject: [PySpark] How to write HFiles as an 'append' to the same directory? I have a process in Apache Spark that attempts to write HFiles…
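A minimal PySpark sketch of that suggestion. This flag relaxes the output-spec validation (including the existing-directory check) for the saveAsHadoopFile-style RDD writers; the app name below is a placeholder:

    from pyspark.sql import SparkSession

    # Disable Hadoop output-spec validation so saveAs*HadoopFile writes do
    # not fail when the target directory already exists. Use with care: the
    # check also protects against accidentally clobbering earlier output.
    spark = (
        SparkSession.builder
        .appName("hfile-append")
        .config("spark.hadoop.validateOutputSpecs", "false")
        .getOrCreate()
    )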

[PySpark] How to write HFiles as an 'append' to the same directory?

2020-03-14 Thread Gautham Acharya
I have a process in Apache Spark that attempts to write HFiles to S3 in a batched process. I want the resulting HFiles in the same directory, as they are in the same column family. However, I'm getting a 'directory already exists' error when I try to run this on AWS EMR. How can I write HFiles as an 'append' to the same directory?
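One common workaround, not confirmed anywhere in this thread, is to give every batch its own uniquely named output directory under the column-family path, since the Hadoop OutputFormat refuses to reuse an existing one. A rough PySpark sketch, where the bucket, path, and rdd (assumed to already hold sorted row-key/cell pairs, as HFileOutputFormat2 requires) are all illustrative assumptions:

    import uuid

    # Write each batch's HFiles into a fresh directory so no run collides
    # with an earlier one; a later HBase bulk load can pick up all of them.
    batch_dir = "s3://some-bucket/hfiles/cf1/batch-%s" % uuid.uuid4()

    rdd.saveAsNewAPIHadoopFile(
        batch_dir,
        "org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2",
        keyClass="org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        valueClass="org.apache.hadoop.hbase.KeyValue",
    )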