Re: [Structured Streaming] Commit protocol to move temp files to dest path only when complete, with code

2018-03-20 Thread dcam
I'm just circling back to this now. Is the commit protocol an acceptable way of making this configurable? I could make the temp path (currently "_temporary") configurable, if that is what you are referring to. Michael Armbrust wrote > We didn't go this way initially because it doesn't work on

Re: [Structured Streaming] Commit protocol to move temp files to dest path only when complete, with code

2018-02-09 Thread Michael Armbrust
We didn't go this way initially because it doesn't work on storage systems that have weaker guarantees than HDFS with respect to rename. That said, I'm happy to look at other options if we want to make this configurable. On Fri, Feb 9, 2018 at 2:53 PM, Dave Cameron
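
For readers of the archive, the idea under discussion (write under a temp path, then move files to the destination only when complete) can be sketched in a few lines. This is a minimal illustration on a local filesystem, not the code attached to the thread and not Spark's commit protocol API. Only the "_temporary" name comes from the thread; `write_then_commit` and `visible_files` are hypothetical helpers. It assumes the temp and destination paths are on the same filesystem, where rename is atomic. On object stores, where rename is typically implemented as copy plus delete, that assumption does not hold, which is the concern raised above.

```python
import os
import tempfile

# "_temporary" is the name mentioned in the thread; everything else is hypothetical.
TEMP_DIR_NAME = "_temporary"


def write_then_commit(dest_dir, file_name, chunks):
    """Write chunks under dest_dir/_temporary, then rename the file into dest_dir."""
    temp_dir = os.path.join(dest_dir, TEMP_DIR_NAME)
    os.makedirs(temp_dir, exist_ok=True)
    temp_path = os.path.join(temp_dir, file_name)
    final_path = os.path.join(dest_dir, file_name)

    # The slow part: while this runs, the file exists only under the temp path.
    with open(temp_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())

    # The commit step: a single rename. Assumed atomic here because both
    # paths are on the same local filesystem.
    os.replace(temp_path, final_path)
    return final_path


def visible_files(dest_dir):
    """List files a reader would see if it skips names starting with '_' or '.'."""
    return sorted(
        name
        for name in os.listdir(dest_dir)
        if not name.startswith(("_", "."))
        and os.path.isfile(os.path.join(dest_dir, name))
    )


if __name__ == "__main__":
    dest = tempfile.mkdtemp()
    write_then_commit(dest, "part-0.bin", [b"ab", b"cd"])
    print(visible_files(dest))
```

A reader that ignores the temp directory never observes a partially written file under this scheme, but only to the extent that the final rename is atomic on the underlying storage.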

[Structured Streaming] Commit protocol to move temp files to dest path only when complete, with code

2018-02-09 Thread Dave Cameron
Hi, I have a Spark Structured Streaming job that reads from Kafka and writes Parquet files to Hive/HDFS. The files are not very large, but the Kafka source is noisy, so each Spark job takes a long time to complete. There is a significant window during which the Parquet files are incomplete and