GitHub user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19404
The problem here is that a stream which doesn't implement hflush/hsync is
required to throw an exception; that guarantees that if hsync/hflush
does complete, the data really has been flushed/synced - HBase &c utterly
depend on this.
The fact that `FSDataOutputStream` implements `Syncable` while the streams it
relays to may not is the whole reason for
[HDFS-11644](https://issues.apache.org/jira/browse/HDFS-11644) and the
`StreamCapabilities` probe. As Erasure Coding shows, even HDFS streams may not
support hflush/hsync.
This patch is at risk of raising an exception whenever it tries to call
`hflush()` on a non-HDFS store, or on HDFS with Erasure Coding enabled. If you
were targeting Hadoop 2.9+ you could just check `hasCapability("hsync")` and
use it if present. For Hadoop 2.6+ you'll have to call `out.hflush()` on the
first attempt; if any exception (IOE, `UnsupportedOperationException`, RTE) is
raised, catch it, swallow it, and never try to hflush again.
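The Hadoop 2.6+ catch-swallow-and-remember pattern might look roughly like the sketch below. `SyncableStream` is a hypothetical stand-in for the `Syncable` + `StreamCapabilities` surface of `FSDataOutputStream`, not the real `org.apache.hadoop.fs` API:

```java
import java.io.IOException;

// Hypothetical stand-in for FSDataOutputStream's Syncable surface plus the
// Hadoop 2.9+ StreamCapabilities probe; a sketch, not the real Hadoop API.
interface SyncableStream {
    void hflush() throws IOException;

    // Mirrors StreamCapabilities#hasCapability("hsync") on Hadoop 2.9+.
    boolean hasCapability(String capability);
}

class HedgedFlusher {
    private final SyncableStream out;
    // Hadoop 2.6+ fallback: assume hflush works until it throws once.
    private boolean flushable;

    HedgedFlusher(SyncableStream out) {
        this.out = out;
        this.flushable = true;
        // On Hadoop 2.9+ you could probe up front instead:
        //   this.flushable = out.hasCapability("hsync");
    }

    void tryFlush() {
        if (!flushable) {
            return;
        }
        try {
            out.hflush();
        } catch (IOException | RuntimeException e) {
            // Covers IOE, UnsupportedOperationException (a RuntimeException
            // subclass), and other RTEs: swallow, remember, never retry.
            flushable = false;
        }
    }
}
```

Note the trade-off: after the first failure every later `tryFlush()` silently becomes a no-op, so the caller loses any durability guarantee without being told.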
Sorry, it's messy: it's why I'd like that `hasCapability()` probe up for all
features which are only intermittently available. It can complicate caller code
if you want to know these things, but it stops you getting caught out when you
really need to know the durability semantics of the FS.
See also the WiP
[OutputStream spec](https://github.com/steveloughran/hadoop/blob/s3/HADOOP-13327-outputstream-trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md).
(Thanks for mentioning me, BTW; this is one of those things that would
probably work well in local tests but blow up in production somewhere.)