I'm just circling back to this now. Is the commit protocol an acceptable way
of making this configurable? I could make the temp path (currently
"_temporary") configurable if that is what you are referring to.
Michael Armbrust wrote
We didn't go this way initially because it doesn't work on storage systems
that have weaker guarantees than HDFS with respect to rename. That said,
I'm happy to look at other options if we want to make this configurable.
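For readers following the thread, the write-then-rename pattern being discussed can be sketched roughly as below. This is only a local-filesystem illustration in Python, not Spark's actual commit protocol code. The function name `write_then_commit` and the example file name are hypothetical; only the `_temporary` directory name comes from this thread.

```python
import os

# Directory name taken from the thread; everything else here is illustrative.
TEMP_DIR_NAME = "_temporary"


def write_then_commit(output_dir, file_name, data):
    """Write bytes under output_dir/_temporary, then rename into output_dir.

    Readers listing output_dir see the file only after the rename, so they
    never observe a half-written file, provided the rename is atomic.
    """
    temp_dir = os.path.join(output_dir, TEMP_DIR_NAME)
    os.makedirs(temp_dir, exist_ok=True)

    temp_path = os.path.join(temp_dir, file_name)
    final_path = os.path.join(output_dir, file_name)

    # Step 1: write the complete file to the temporary location.
    with open(temp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Step 2: "commit" by renaming. On a local POSIX filesystem this is
    # atomic when both paths are on the same filesystem.
    os.replace(temp_path, final_path)
    return final_path
```

The sketch relies entirely on step 2 being atomic. On a storage system where rename is not atomic (for example, one that implements it as a copy followed by a delete), readers could still observe partial output, which is the concern raised above.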
On Fri, Feb 9, 2018 at 2:53 PM, Dave Cameron wrote:
Hi
I have a Spark structured streaming job that reads from Kafka and writes
parquet files to Hive/HDFS. The files are not very large, but the Kafka
source is noisy, so each Spark job takes a long time to complete. There is a
significant window during which the parquet files are incomplete and othe