Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-10-02 Thread Jungtaek Lim
Thanks, Steve, for answering in detail. I had the same impression as Chandan about that line: it contradicted my understanding, since the rename operation itself is atomic in HDFS, and I didn't imagine it was meant to handle object stores. I learned a lot about object stores from your answer. Thanks again. Jungtaek
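To make the HDFS vs. object-store contrast concrete, here is a toy Python model. All names are hypothetical and are not Spark or Hadoop APIs. It models an object store that, like S3, has no real rename and has to emulate a directory rename by copying each object and then deleting the originals. Pausing the rename partway shows the intermediate state an observer can see. On HDFS the same rename is a single atomic metadata operation, so no such state is ever visible.

```python
from typing import Dict, Iterator, List


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store such as S3.

    There are no real directories, only keys. A "directory rename" is
    emulated by copying every object under the prefix and then deleting the
    originals. Each single put is atomic, but the rename as a whole is not.
    """

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def list_prefix(self, prefix: str) -> List[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))

    def rename_dir_steps(self, src: str, dst: str) -> Iterator[None]:
        """Emulate rename(src, dst) as copy-then-delete, yielding after each
        step so a concurrent observer can inspect the intermediate state."""
        keys = self.list_prefix(src)
        for key in keys:  # phase 1: copy objects one at a time
            self.objects[dst + key[len(src):]] = self.objects[key]
            yield
        for key in keys:  # phase 2: delete the originals
            del self.objects[key]
            yield


store = ToyObjectStore()
store.put("state/tmp/1.delta", b"a")
store.put("state/tmp/2.delta", b"b")

steps = store.rename_dir_steps("state/tmp/", "state/v1/")
next(steps)  # pause after the first copy: the rename is half done
print(store.list_prefix("state/v1/"))   # ['state/v1/1.delta']
print(store.list_prefix("state/tmp/"))  # ['state/tmp/1.delta', 'state/tmp/2.delta']
for _ in steps:  # let the rename run to completion
    pass
print(store.list_prefix("state/v1/"))   # ['state/v1/1.delta', 'state/v1/2.delta']
print(store.list_prefix("state/tmp/"))  # []
```

A crash at the paused point would leave both a partial destination and the full source behind. That is exactly the situation a checkpoint reader must never mistake for a committed version.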

Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-10-02 Thread chandan prakash
Thanks a lot, Steve and Jungtaek, for your answers. Steve, you explained it really well and in depth. I understood that the old implementation was not correct for object stores like S3, and that the new implementation will address that. And for better performance we should choose a Direct Write base

Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-10-01 Thread Steve Loughran
On 11 Aug 2018, at 17:33, chandan prakash <chandanbaran...@gmail.com> wrote: Hi All, I was going through this pull request about new CheckpointFileManager abstraction in structured streaming coming in 2.4 : https://issues.apache.org/jira/browse/SPARK-23966 https://github.com/apache/spar

Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-09-30 Thread Jungtaek Lim
Removing user@, since cross-posting to multiple mailing lists is considered bad practice. My knowledge is based on the codebase after SPARK-23966, so I'm reading SPARK-23966 back and trying to explain what I can see in the patch. Anyone, please correct me if I'm missing something here. You may want to not

Re: [Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-09-30 Thread chandan prakash
Can anyone clear up the doubts raised in the questions here? Regards, Chandan On Sat, Aug 11, 2018 at 10:03 PM chandan prakash wrote: > Hi All, > I was going through this pull request about new CheckpointFileManager > abstraction in structured streaming coming in 2.4 : > https://issues.apache.or

[Structured Streaming SPARK-23966] Why non-atomic rename is problem in State Store ?

2018-08-11 Thread chandan prakash
Hi All, I was going through this pull request about the new CheckpointFileManager abstraction in Structured Streaming coming in 2.4: https://issues.apache.org/jira/browse/SPARK-23966 https://github.com/apache/spark/pull/21048 I went through the code in detail and found it will introduce a very nice
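The thread revolves around the "write to a temporary file, then rename it into place" pattern, which is only safe when rename is atomic. Below is a minimal Python sketch of that pattern on a local filesystem. `AtomicCheckpointWriter` is a hypothetical name used for illustration only. It is not Spark's CheckpointFileManager API, which is a Scala abstraction over Hadoop filesystems. The sketch assumes the temporary file and the final path live on the same POSIX-style filesystem, where `os.replace` is atomic.

```python
import os
import tempfile


class AtomicCheckpointWriter:
    """Hypothetical sketch, not Spark's CheckpointFileManager API.

    Bytes go to a hidden temporary file in the same directory as the final
    path. close() publishes them with a single rename, and cancel() discards
    them. Readers see either no file or the complete file, provided rename
    is atomic (true for a local POSIX filesystem or HDFS, not for S3).
    """

    def __init__(self, final_path: str) -> None:
        self.final_path = final_path
        directory = os.path.dirname(os.path.abspath(final_path))
        # Same directory, so the rename never crosses a filesystem boundary.
        fd, self.tmp_path = tempfile.mkstemp(prefix=".tmp-", dir=directory)
        self._file = os.fdopen(fd, "wb")

    def write(self, data: bytes) -> None:
        self._file.write(data)

    def close(self) -> None:
        self._file.flush()
        os.fsync(self._file.fileno())  # make the bytes durable before publishing
        self._file.close()
        os.replace(self.tmp_path, self.final_path)  # atomic publish (overwrites)

    def cancel(self) -> None:
        self._file.close()
        os.remove(self.tmp_path)  # nothing was ever visible at final_path


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "1.delta")
    w = AtomicCheckpointWriter(target)
    w.write(b"state bytes")
    print(os.path.exists(target))  # False: only the temp file exists so far
    w.close()
    with open(target, "rb") as f:
        print(f.read())  # b'state bytes'
    print(os.listdir(d))  # ['1.delta']
```

On an object store the final `os.replace` step has no atomic equivalent. It becomes a copy followed by a delete, which costs time proportional to the data size and reopens the window this pattern is meant to close.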