dbtsai opened a new pull request #28376: URL: https://github.com/apache/spark/pull/28376
### What changes were proposed in this pull request?

We are adding a new YARN configuration that prevents the Hadoop classpath from being populated from `yarn.application.classpath` and `mapreduce.application.classpath`.

### Why are the changes needed?

The Spark YARN client populates the Hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath`. However, for a Spark build with embedded Hadoop, this can cause jar conflicts because the Spark distribution may contain a different version of the Hadoop jars.

A typical situation is when a user takes an Apache Spark distribution with its own embedded Hadoop and submits a job to a Cloudera or Hortonworks YARN cluster. With two different, incompatible sets of Hadoop jars on the classpath, the job can run into errors.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

It's hard to add UTs for this configuration, since doing so requires different, incompatible versions of Hadoop. We tested this PR manually and were able to submit a Spark job, using a Spark distribution built with Apache Hadoop 2.10, to CDH 5.6 without populating the CDH classpath.
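To make the proposed behavior concrete, here is a minimal Python sketch of the gating logic, not the actual Spark implementation (which lives in the Scala YARN client). The function name `build_classpath` and the parameter `populate_hadoop_classpath` are hypothetical, since the description above does not name the new configuration key. The sketch assumes both Hadoop properties hold comma-separated classpath entries.

```python
from typing import Dict, List

# The two Hadoop properties named in the PR description.
HADOOP_CLASSPATH_KEYS = (
    "yarn.application.classpath",
    "mapreduce.application.classpath",
)


def build_classpath(
    spark_entries: List[str],
    hadoop_conf: Dict[str, str],
    populate_hadoop_classpath: bool,  # hypothetical flag name, for illustration only
) -> List[str]:
    """Return the container classpath entries.

    Spark's own entries always come first. The cluster's Hadoop entries are
    appended only when the flag is enabled, so a Spark distribution with
    embedded Hadoop can opt out and avoid mixing two Hadoop versions.
    """
    entries = list(spark_entries)
    if populate_hadoop_classpath:
        for key in HADOOP_CLASSPATH_KEYS:
            raw = hadoop_conf.get(key, "")
            # Assumed format: comma-separated entries; skip blanks.
            entries.extend(p.strip() for p in raw.split(",") if p.strip())
    return entries


if __name__ == "__main__":
    cluster_conf = {
        "yarn.application.classpath": "/etc/hadoop/conf, /opt/cdh/lib/hadoop/*",
        "mapreduce.application.classpath": "/opt/cdh/lib/hadoop-mapreduce/*",
    }
    # With the flag off, only the Spark distribution's own jars are used.
    print(build_classpath(["/opt/spark/jars/*"], cluster_conf, False))
```

With the flag disabled, the cluster-provided Hadoop jars never reach the classpath, which is the scenario the manual test against CDH 5.6 exercised.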
