[GitHub] [spark] dbtsai opened a new pull request #28376: [SPARK-31582] [Yarn] Able to not populate Hadoop classpath

GitBox Mon, 27 Apr 2020 16:00:55 -0700


dbtsai opened a new pull request #28376:
URL: https://github.com/apache/spark/pull/28376



   ### What changes were proposed in this pull request?
   We are adding a new YARN configuration option that lets Spark skip populating 
the Hadoop classpath from `yarn.application.classpath` and `mapreduce.application.classpath`.
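   To make the proposed behavior concrete, here is a minimal Python sketch of how a 
container classpath could be assembled with and without the Hadoop entries. It is not 
Spark's implementation (the real logic lives in the Scala YARN client). The function 
`build_container_classpath` and the flag `populate_hadoop_classpath` are hypothetical 
names, and the sketch assumes both Hadoop properties hold comma-separated entries.

```python
from typing import Dict, List

# Hadoop properties named in this PR; assumed to hold comma-separated classpath entries.
HADOOP_CLASSPATH_KEYS = ("yarn.application.classpath", "mapreduce.application.classpath")


def build_container_classpath(
    spark_entries: List[str],
    hadoop_conf: Dict[str, str],
    populate_hadoop_classpath: bool = True,  # hypothetical name for the new option
) -> List[str]:
    """Illustrative sketch: Spark's own entries first, then (optionally) the cluster's Hadoop entries."""
    classpath = list(spark_entries)
    if populate_hadoop_classpath:
        for key in HADOOP_CLASSPATH_KEYS:
            # Missing keys contribute nothing; blank entries are skipped.
            for entry in hadoop_conf.get(key, "").split(","):
                entry = entry.strip()
                if entry:
                    classpath.append(entry)
    return classpath
```

   With the flag off, only Spark's own entries remain, so containers load Hadoop classes 
solely from the jars bundled with the Spark distribution.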
   
   ### Why are the changes needed?
   By default, the Spark YARN client populates the Hadoop classpath from 
`yarn.application.classpath` and `mapreduce.application.classpath`.
   
   However, for a Spark build with embedded Hadoop, this can cause jar conflicts, 
because the Spark distribution may bundle a different version of the Hadoop jars 
than the one installed on the cluster.
   
   A typical situation is when a user runs an Apache Spark distribution with its own 
embedded Hadoop and submits a job to a Cloudera or Hortonworks YARN cluster: two 
different, incompatible sets of Hadoop jars end up on the classpath, and the job can 
fail with errors.
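   The conflict above is a clash between jars with the same artifact name but different 
versions. A minimal sketch, assuming Hadoop jars follow the `hadoop-<module>-<version>.jar` 
naming convention and that wildcard entries have already been expanded into file paths, 
that flags artifacts appearing with more than one version. The function name is 
hypothetical and this is not a Spark utility.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches names like "hadoop-common-2.10.0.jar" or "hadoop-common-2.6.0-cdh5.6.0.jar";
# group 1 is the artifact name, group 2 the version string.
HADOOP_JAR = re.compile(r"(?:^|/)(hadoop-[a-z0-9-]+?)-(\d+(?:\.\d+)+[^/]*?)\.jar$")


def conflicting_hadoop_artifacts(jar_paths: Iterable[str]) -> Dict[str, Set[str]]:
    """Return {artifact: versions} for Hadoop artifacts present in more than one version."""
    versions: Dict[str, Set[str]] = defaultdict(set)
    for path in jar_paths:
        match = HADOOP_JAR.search(path)
        if match:
            versions[match.group(1)].add(match.group(2))
    return {artifact: vs for artifact, vs in versions.items() if len(vs) > 1}
```

   For example, a classpath containing both a bundled `hadoop-common-2.10.0.jar` and a 
cluster-provided `hadoop-common-2.6.0-cdh5.6.0.jar` (illustrative file names) would be 
reported as a conflict on `hadoop-common`.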
   
   ### Does this PR introduce any user-facing change?
   No.
   
   ### How was this patch tested?
   It's hard to add unit tests for this configuration, since that requires using 
different, incompatible versions of Hadoop. We tested this PR manually and were able 
to submit a Spark job using a Spark distribution built with Apache Hadoop 2.10 to a 
CDH 5.6 cluster without populating the CDH classpath.

