Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21756
I don't understand the purpose of this change. Can you explain exactly which
parts of `SparkHadoopUtil` you need to customize, and why those changes can't
just be made to the shared code?
The bug you filed even contains incorrect information. There is a single
implementation of `SparkHadoopUtil` used by all cluster managers: if you look
at `YarnSparkHadoopUtil.scala`, there is only an object there, with some
YARN-specific methods, and the YARN code reuses the shared `SparkHadoopUtil`.
I actually made that change, and the only reason `SparkHadoopUtil` didn't just
become an object itself is that it's a semi-public API.
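For reference, a rough sketch of the layout I'm describing (illustrative only, not the actual Spark sources):

```scala
// core/: the one shared implementation, used by every cluster manager.
class SparkHadoopUtil {
  // Hadoop-related helpers called from core/
}

// resource-managers/yarn/: only a helper object with YARN-specific methods;
// it does not subclass or replace the shared SparkHadoopUtil.
object YarnSparkHadoopUtil {
  // YARN-only helper methods
}
```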
Your code also breaks if you have more than one custom implementation on
the classpath. If your goal is to allow cluster managers to override
`SparkHadoopUtil`, that's a pretty big hole: the built-in cluster managers
can't each provide their own override, because more often than not they're all
on the classpath at the same time.
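To illustrate the problem (a hypothetical sketch, not the code in this PR; the `ServiceLoader` lookup and the trait name are my assumptions), any scheme that picks a single implementation off the classpath has no good answer once two modules each provide one:

```scala
import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Hypothetical pluggable interface; stand-in for an overridable SparkHadoopUtil.
trait PluggableSparkHadoopUtil

object PluggableSparkHadoopUtil {
  // Resolve the implementation from whatever is on the classpath.
  def load(): PluggableSparkHadoopUtil = {
    val impls = ServiceLoader.load(classOf[PluggableSparkHadoopUtil]).asScala.toSeq
    impls match {
      case Seq(only) => only
      case Seq()     => throw new IllegalStateException("No implementation found")
      case many      =>
        // With e.g. both the YARN and Kubernetes modules on the classpath,
        // there is no principled way to choose; any "first wins" rule is
        // just classpath-order dependent.
        throw new IllegalStateException(
          s"Ambiguous implementations: ${many.map(_.getClass.getName).mkString(", ")}")
    }
  }
}
```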
The only reason this class was overridden in YARN in the past is that Spark
used to support both hadoop1 and hadoop2, and code in `core/` could not call
hadoop2 APIs (but code in the yarn module could). That is no longer the case.