Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/21066

+1

One thing to consider here is to be ruthless when bits of the HDFS APIs/libraries don't suit: rather than think "how do we work around this", think "what do we need to do to get this fixed". This includes (based on the HBase & Hive experiences):

* what's marked stable
* serialization of classes
* pulling up of operations from HDFS to the public FileSystem API (a source of some contention between myself and the HDFS team as to what constitutes acceptable specification and tests)
* thread safety (HBase & encrypted IO)
* various constants in HDFS interfaces tagged as private, etc.

BTW, I'm thinking of retiring the MRv1 commit APIs, initially by marking them as deprecated. I'd match that with something to pre-emptively move Spark onto the V2 one. After all, it's all bridged internally.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org