[GitHub] spark issue #21066: [SPARK-23977][CLOUD][WIP] Add commit protocol binding to...

steveloughran Thu, 22 Nov 2018 05:20:02 -0800

Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/21066
  
    +1
    
    One thing to consider here is to be ruthless when parts of the HDFS 
APIs/libraries don't suit: rather than think "how do we work around this", 
think "what do we need to do to get this fixed". 
    
    This includes (based on the HBase & Hive experiences):
    * what's marked stable
    * serialization of classes
    * pulling up of operations from HDFS to the public FileSystem API (source 
of some contention there between myself and the HDFS team as to what 
constitutes acceptable specification and tests)
    * thread safety (HBase & encrypted IO)
    * various constants in HDFS interfaces tagged as private.
    etc.
    
    BTW, I'm thinking of retiring the MRv1 commit APIs, initially by marking 
them as deprecated. I'd match that with something to pre-emptively move Spark 
onto the V2 ones. After all, it's all bridged internally.



