HyukjinKwon opened a new pull request #30133:
URL: https://github.com/apache/spark/pull/30133


   ### What changes were proposed in this pull request?
   
   This PR proposes to exclude 
`org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:tests` from 
`hadoop-yarn-server-tests`.
   
   For some reason, after the SBT 1.3 upgrade in `SPARK-21708`, SBT started to pull the dependencies of `hadoop-yarn-server-tests` with the `tests` classifier:
   
   ```
   org/apache/hadoop/hadoop-common/2.7.4/hadoop-common-2.7.4-tests.jar
   org/apache/hadoop/hadoop-yarn-common/2.7.4/hadoop-yarn-common-2.7.4-tests.jar
   
org/apache/hadoop/hadoop-yarn-server-resourcemanager/2.7.4/hadoop-yarn-server-resourcemanager-2.7.4-tests.jar
   ```
   These jars were not pulled before the upgrade.
   
   This specific `hadoop-yarn-server-resourcemanager-2.7.4-tests.jar` causes 
the problem.
   
   1. The test case creates the Hadoop configuration here:
   
   
https://github.com/apache/spark/blob/cc06266ade5a4eb35089501a3b32736624208d4c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L122
   
   2. Such jars have higher precedence on the classpath than the custom `core-site.xml` specified here:
   
   
https://github.com/apache/spark/blob/e93b8f02cd706bedc47c9b55a73f632fe9e61ec3/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1375
   
   3. As a result, the `core-site.xml` inside the jar is picked up instead (see the sketch after this list):
   
   Before this fix:
   
   ```
   jar:file:/.../https/maven-central.storage-download.googleapis.com/maven2/org/apache/hadoop/
   hadoop-yarn-server-resourcemanager/2.7.4/hadoop-yarn-server-resourcemanager-2.7.4-tests.jar!/core-site.xml%
   ```
   
   After this fix:
   
   ```
   file:/.../spark/resource-managers/yarn/target/org.apache.spark.deploy.yarn.YarnClusterSuite/
   org.apache.spark.deploy.yarn.YarnClusterSuite-localDir-nm-0_0/
   usercache/.../filecache/10/__spark_conf__.zip/__hadoop_conf__/core-site.xml%
   ```
   
   4. The `core-site.xml` inside the jar, of course, does not contain the configuration set here:
   
   
https://github.com/apache/spark/blob/2cfd215dc4fb1ff6865644fec8284ba93dcddd5c/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala#L133-L141
   
   and the specific test fails.
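   
   To make the precedence issue concrete, here is a minimal sketch (not part of this PR; the object name `WhichCoreSite` is made up) that prints which `core-site.xml` Hadoop's `Configuration` resolves from the classpath, i.e. the kind of URL shown in the "Before"/"After" output above:
   
   ```scala
   import org.apache.hadoop.conf.Configuration
   
   // Illustration only: print which core-site.xml wins on the classpath.
   // If the tests jar precedes the generated __hadoop_conf__ directory,
   // its bundled core-site.xml shadows the custom one.
   object WhichCoreSite {
     def main(args: Array[String]): Unit = {
       val conf = new Configuration()
       // Configuration.getResource returns the first matching classpath entry.
       println(conf.getResource("core-site.xml"))
     }
   }
   ```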
   
   This PR takes a somewhat hacky approach: the artifact is excluded from `hadoop-yarn-server-tests` with the `tests` classifier, and then added back as a proper dependency (see the sketch below). This way, SBT no longer pulls `hadoop-yarn-server-resourcemanager` with the `tests` classifier.
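   
   As an illustration only, this is roughly what the exclude-and-re-add idea looks like expressed in sbt's DSL; it is not the actual change in this PR, and `hadoopVersion` and the `test` scope are placeholders:
   
   ```scala
   // Illustrative sketch: exclude the transitively pulled resourcemanager artifact
   // from the tests-classified hadoop-yarn-server-tests dependency, then add it
   // back as a regular dependency. hadoopVersion is a placeholder.
   val hadoopVersion = "2.7.4"
   
   libraryDependencies ++= Seq(
     ("org.apache.hadoop" % "hadoop-yarn-server-tests" % hadoopVersion % "test")
       .classifier("tests")
       .exclude("org.apache.hadoop", "hadoop-yarn-server-resourcemanager"),
     "org.apache.hadoop" % "hadoop-yarn-server-resourcemanager" % hadoopVersion % "test"
   )
   ```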
   
   Why this fails specifically with Hadoop 2 is unknown.
   
   ### Why are the changes needed?
   
   To make the build pass. This is a blocker.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, test-only.
   
   ### How was this patch tested?
   
   Manually tested and debugged:
   
   ```bash
   build/sbt clean "yarn/testOnly *.YarnClusterSuite -- -z SparkHadoopUtil" -Pyarn -Phadoop-2.7 -Phive -Phive-2.3
   ```
   

