GitHub user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19404
The problem here is that a stream which doesn't implement hflush/hsync is
required to throw an exception; that guarantees that if hsync/hflush
does complete, the data really has been flushed/synced - HBase &c utterly
depend on this.
The fact that `FSDataOutputStream` implements `Syncable` while the streams it
relays to may not is the whole reason for
[HDFS-11644](https://issues.apache.org/jira/browse/HDFS-11644) and the
`StreamCapabilities` probe. As Erasure Coding shows, even HDFS streams may not
support hflush/hsync.
This patch is at risk of raising an exception whenever it tries to call
`hflush()` on a non-HDFS store, or on HDFS with Erasure Coding enabled. If you
were targeting Hadoop 2.9+ you could just check `hasCapability("hsync")` and
use it if present. For Hadoop 2.6+ you'll have to call `out.hflush()` on the
first attempt; if any exception (IOE, `UnsupportedOperationException`, RTE) is
raised, catch it, swallow it, and never try to hflush again.
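The Hadoop 2.6+ catch-swallow-and-remember pattern might look roughly like the sketch below. `SyncableStream` is a hypothetical stand-in for the `Syncable` + `StreamCapabilities` surface of `FSDataOutputStream`, not the real `org.apache.hadoop.fs` API:

```java
import java.io.IOException;

// Hypothetical stand-in for FSDataOutputStream's Syncable surface plus the
// Hadoop 2.9+ StreamCapabilities probe; a sketch, not the real Hadoop API.
interface SyncableStream {
    void hflush() throws IOException;

    // Mirrors StreamCapabilities#hasCapability("hsync") on Hadoop 2.9+.
    boolean hasCapability(String capability);
}

class HedgedFlusher {
    private final SyncableStream out;
    // Hadoop 2.6+ fallback: assume hflush works until it throws once.
    private boolean flushable;

    HedgedFlusher(SyncableStream out) {
        this.out = out;
        this.flushable = true;
        // On Hadoop 2.9+ you could probe up front instead:
        //   this.flushable = out.hasCapability("hsync");
    }

    void tryFlush() {
        if (!flushable) {
            return;
        }
        try {
            out.hflush();
        } catch (IOException | RuntimeException e) {
            // Covers IOE, UnsupportedOperationException (a RuntimeException
            // subclass), and other RTEs: swallow, remember, never retry.
            flushable = false;
        }
    }
}
```

Note the trade-off: after the first failure every later `tryFlush()` silently becomes a no-op, so the caller loses any durability guarantee without being told.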
Sorry, it's messy: it's why I'd like that `hasCapability()` probe up for all
features which are only intermittently available. It can complicate caller code
if you want to know these things, but it stops you getting caught out when you
really need to know the durability semantics of the FS.
See also the WiP
[OutputStream spec](https://github.com/steveloughran/hadoop/blob/s3/HADOOP-13327-outputstream-trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md).
(Thanks for mentioning me, BTW; this is one of those things that would
probably work well in local tests but blow up in production somewhere.)