Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21756
I don't understand the purpose of this change. Can you explain exactly which
parts of `SparkHadoopUtil` you need to customize, and why those changes can't
just be made to the shared code?
The bug you filed even contains incorrect information. There is a single
implementation of `SparkHadoopUtil` used by all cluster managers: if you look
at `YarnSparkHadoopUtil.scala`, there is only an object there, with some
YARN-specific methods, and the YARN code reuses the shared `SparkHadoopUtil`.
I actually made that change, and the only reason `SparkHadoopUtil` didn't just
become an object itself is that it's a semi-public API.
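For reference, a rough sketch of the layout I'm describing (illustrative only, not the actual Spark sources):

```scala
// core/: the one shared implementation, used by every cluster manager.
class SparkHadoopUtil {
  // Hadoop-related helpers called from core/
}

// resource-managers/yarn/: only a helper object with YARN-specific methods;
// it does not subclass or replace the shared SparkHadoopUtil.
object YarnSparkHadoopUtil {
  // YARN-only helper methods
}
```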
Your code also breaks if you have more than one custom implementation on
the classpath. If your goal is to allow cluster managers to override
`SparkHadoopUtil`, that's a pretty big hole: the built-in cluster managers
can't each provide their own override, because more often than not they're all
on the classpath at the same time.
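To illustrate the problem (a hypothetical sketch, not the code in this PR; the `ServiceLoader` lookup and the trait name are my assumptions), any scheme that picks a single implementation off the classpath has no good answer once two modules each provide one:

```scala
import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Hypothetical pluggable interface; stand-in for an overridable SparkHadoopUtil.
trait PluggableSparkHadoopUtil

object PluggableSparkHadoopUtil {
  // Resolve the implementation from whatever is on the classpath.
  def load(): PluggableSparkHadoopUtil = {
    val impls = ServiceLoader.load(classOf[PluggableSparkHadoopUtil]).asScala.toSeq
    impls match {
      case Seq(only) => only
      case Seq()     => throw new IllegalStateException("No implementation found")
      case many      =>
        // With e.g. both the YARN and Kubernetes modules on the classpath,
        // there is no principled way to choose; any "first wins" rule is
        // just classpath-order dependent.
        throw new IllegalStateException(
          s"Ambiguous implementations: ${many.map(_.getClass.getName).mkString(", ")}")
    }
  }
}
```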
The only reason this class was overridden in YARN in the past is that Spark
used to support both hadoop1 and hadoop2, and code in `core/` could not call
hadoop2 APIs (but code in the yarn module could). That is no longer the case.