Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19404
I think the sync is important, but you just need to handle the "FS doesn't support it" case.
Thinking about this a bit more, I didn't like my proposed patch. Better to:
* probe for the feature after `open()`, by checking whether the stream implements
`Syncable` and then calling `hflush()`. It's the lower-cost call, and if a
filesystem implements one it has to implement the other.
* if `hflush()` fails, don't use sync: set `syncable: Option[Syncable]` to
`None`
* when checkpointing, go `syncable.map(_.hsync())`, which is the core of
your current patch (see the sketch after this list)
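
A minimal Scala sketch of that probe, assuming Hadoop's `org.apache.hadoop.fs.Syncable` interface and that an unsupported `hflush()` surfaces as an `UnsupportedOperationException`; the `probeSyncable` helper name is made up for illustration:

```scala
import java.io.OutputStream
import org.apache.hadoop.fs.Syncable

object SyncProbe {
  // Hypothetical helper: after open(), check whether the stream implements
  // Syncable and confirm hflush() actually works. If either check fails,
  // return None so later checkpoints simply skip the sync.
  def probeSyncable(out: OutputStream): Option[Syncable] = out match {
    case s: Syncable =>
      try {
        s.hflush()       // the lower-cost call; a FS implementing one must implement the other
        Some(s)
      } catch {
        // assumption: an unsupported hflush() is reported this way
        case _: UnsupportedOperationException => None
      }
    case _ => None       // stream doesn't implement the interface at all
  }
}
```

At checkpoint time the existing `syncable.map(_.hsync())` is then all that's needed; when the probe returned `None`, the sync is skipped and behaviour stays as it is today.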
You will take a perf hit on the sync: on HDFS the call won't return until the
data has been written down the entire replication chain. But after that
you've got a guarantee of durability, which is what checkpoints tend to
expect...
(side topic: there are some JIRAs on Flink checkpointing to other stores, especially
[FLINK-9061](https://issues.apache.org/jira/browse/FLINK-9061))