HyukjinKwon opened a new pull request #30133: URL: https://github.com/apache/spark/pull/30133
### What changes were proposed in this pull request?

This PR proposes to exclude `org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:tests` from `hadoop-yarn-server-tests`.

For some reason, after the SBT 1.3 upgrade in `SPARK-21708`, SBT started to pull the dependencies of `hadoop-yarn-server-tests` with the `tests` classifier:

```
org/apache/hadoop/hadoop-common/2.7.4/hadoop-common-2.7.4-tests.jar
org/apache/hadoop/hadoop-yarn-common/2.7.4/hadoop-yarn-common-2.7.4-tests.jar
org/apache/hadoop/hadoop-yarn-server-resourcemanager/2.7.4/hadoop-yarn-server-resourcemanager-2.7.4-tests.jar
```

These were not pulled before the upgrade. Of these, `hadoop-yarn-server-resourcemanager-2.7.4-tests.jar` is the one that causes the problem:

1. The test case creates the Hadoop configuration here: https://github.com/apache/spark/blob/cc06266ade5a4eb35089501a3b32736624208d4c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L122

2. Such jars have higher precedence in the class path than the specified custom `core-site.xml`: https://github.com/apache/spark/blob/e93b8f02cd706bedc47c9b55a73f632fe9e61ec3/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1375

3. As a result, the `core-site.xml` in the jar is picked instead.

   Before this fix:

   ```
   jar:file:/.../https/maven-central.storage-download.googleapis.com/maven2/org/apache/hadoop/
   hadoop-yarn-server-resourcemanager/2.7.4/hadoop-yarn-server-resourcemanager-2.7.4-tests.jar!/core-site.xml%
   ```

   After this fix:

   ```
   file:/.../spark/resource-managers/yarn/target/org.apache.spark.deploy.yarn.YarnClusterSuite/
   org.apache.spark.deploy.yarn.YarnClusterSuite-localDir-nm-0_0/
   usercache/.../filecache/10/__spark_conf__.zip/__hadoop_conf__/core-site.xml%
   ```

4. The `core-site.xml` in the jar of course does not contain what the test sets here: https://github.com/apache/spark/blob/2cfd215dc4fb1ff6865644fec8284ba93dcddd5c/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala#L133-L141, so the specific test fails.

This PR uses a somewhat hacky approach: `hadoop-yarn-server-resourcemanager` is excluded from `hadoop-yarn-server-tests` with the `tests` classifier, and then added back as a proper dependency. This way, SBT no longer pulls `hadoop-yarn-server-resourcemanager` with the `tests` classifier.

It is unknown why the test fails specifically with Hadoop 2.

### Why are the changes needed?

To make the build pass. This is a blocker.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually tested and debugged:

```bash
build/sbt clean "yarn/testOnly *.YarnClusterSuite -- -z SparkHadoopUtil" -Pyarn -Phadoop-2.7 -Phive -Phive-2.3
```
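
The class-path precedence described in steps 2 and 3 can be reproduced in isolation. The following is a minimal sketch, not code from this PR: it assumes that `core-site.xml` is looked up as a class-path resource, and it uses two temporary directories as stand-ins for the `tests` jar and the custom configuration directory. The class name `ClasspathPrecedenceDemo`, the helper names, and the `bundled`/`custom` markers are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo: the first class-path entry that contains a resource wins.
final class ClasspathPrecedenceDemo {

    // Creates a temp directory holding a core-site.xml whose only content is `marker`.
    // The directory is left behind; cleanup is omitted to keep the sketch short.
    static URL dirWithCoreSite(String prefix, String marker) throws IOException {
        Path dir = Files.createTempDirectory(prefix);
        Files.write(dir.resolve("core-site.xml"), marker.getBytes(StandardCharsets.UTF_8));
        return dir.toUri().toURL(); // ends with "/", so it is treated as a directory entry
    }

    // Returns the marker of the core-site.xml that the class loader resolves.
    static String resolve(boolean bundledFirst) {
        try {
            URL bundled = dirWithCoreSite("tests-jar", "bundled"); // stand-in for the tests jar
            URL custom = dirWithCoreSite("hadoop-conf", "custom"); // stand-in for the custom conf dir
            URL[] classPath = bundledFirst
                    ? new URL[] {bundled, custom}
                    : new URL[] {custom, bundled};
            // A null parent keeps the application class path out of the lookup.
            try (URLClassLoader loader = new URLClassLoader(classPath, null)) {
                URL found = loader.getResource("core-site.xml");
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(found.openStream(), StandardCharsets.UTF_8))) {
                    return reader.readLine();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(true));  // prints "bundled"
        System.out.println(resolve(false)); // prints "custom"
    }
}
```

With the stand-in for the `tests` jar listed first, the bundled file shadows the custom one, which mirrors the "before this fix" lookup result; once that entry no longer precedes the custom directory, the custom file is found.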
