Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19404
I think the sync is important, but you just need to handle the "FS doesn't support it" case.
Thinking about this a bit more, I didn't like my proposed patch. Better to:
* probe for the feature after `open()`, by checking whether the stream implements
`Syncable` and then calling `hflush()`. It's the lower-cost call, and if a
filesystem implements one it has to implement the other.
* if `hflush()` fails, don't use sync: set `syncable: Option[Syncable]` to
`None`
* when checkpointing, go `syncable.map(_.hsync())`, which is the core of
your current patch (see the sketch after this list)
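
A minimal Scala sketch of that probe, assuming Hadoop's `org.apache.hadoop.fs.Syncable` interface and that an unsupported `hflush()` surfaces as an `UnsupportedOperationException`; the `probeSyncable` helper name is made up for illustration:

```scala
import java.io.OutputStream
import org.apache.hadoop.fs.Syncable

object SyncProbe {
  // Hypothetical helper: after open(), check whether the stream implements
  // Syncable and confirm hflush() actually works. If either check fails,
  // return None so later checkpoints simply skip the sync.
  def probeSyncable(out: OutputStream): Option[Syncable] = out match {
    case s: Syncable =>
      try {
        s.hflush()       // the lower-cost call; a FS implementing one must implement the other
        Some(s)
      } catch {
        // assumption: an unsupported hflush() is reported this way
        case _: UnsupportedOperationException => None
      }
    case _ => None       // stream doesn't implement the interface at all
  }
}
```

At checkpoint time the existing `syncable.map(_.hsync())` is then all that's needed; when the probe returned `None`, the sync is skipped and behaviour stays as it is today.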
You will take a perf hit on the sync: on HDFS the call won't return until the
data has been written down the entire replication chain. But after that
you've got a guarantee of durability, which is what checkpoints tend to
expect...
(side topic: there are some JIRAs on Flink checkpointing to other stores, especially
[FLINK-9061](https://issues.apache.org/jira/browse/FLINK-9061))