This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new e019e8720fb [SPARK-44600][INFRA] Make `repl` module to pass Maven daily testing

e019e8720fb is described below

commit e019e8720fb5495c990735976ed4b50c3a006804
Author: yangjie01 <yangji...@baidu.com>
AuthorDate: Fri Aug 4 17:18:24 2023 +0800

    [SPARK-44600][INFRA] Make `repl` module to pass Maven daily testing

### What changes were proposed in this pull request?

The Spark code base has the following structure:

1. The `repl` module depends on the `core` module, which in turn depends on the `common/network-common` module.
2. The `common/network-common` module performs a `shade + relocation` operation on its Guava dependency (dependency scope `compile`), while `core` only performs `relocation` on Guava (dependency scope `provided`).

So when we run Maven tests on the `repl` module, the test classpath must contain either the jars of both the `core` and `common/network-common` modules, or the `target/classes` directories of both. When the Maven test command instead mixes the `core` jar with the `target/classes` directory of `common/network-common`, test failures occur due to the `shade + relocation` mismatch.
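For readers less familiar with the mechanism: `shade + relocation` here refers to the standard `maven-shade-plugin` relocation feature. The sketch below is a simplified illustration of such a configuration, not the actual `common/network-common` pom (which carries considerably more settings); the relocation rule shown matches the `org.sparkproject.guava` package visible in the stack trace below.

```xml
<!-- Simplified sketch of a maven-shade-plugin relocation; illustrative only. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- Rewrites com.google.common.* bytecode references to
             org.sparkproject.guava.* inside the shaded jar. -->
        <pattern>com.google.common</pattern>
        <shadedPattern>org.sparkproject.guava</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```

Because the `shade` goal binds to the `package` phase, only the assembled jar contains relocated class names; the compiled classes under `target/classes` are left untouched. This is exactly why a classpath mixing a relocated jar with an unrelocated classes directory is broken.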
For example, if we run `build/mvn test -pl common/network-common,repl`, the test `Dependencies classpath` is

`/Users/yangjie01/.m2/repository/org/apache/spark/spark-core_2.12/4.0.0-SNAPSHOT/spark-core_2.12-4.0.0-SNAPSHOT.jar:.../Users/yangjie01/SourceCode/git/spark-mine-12/common/network-common/target/scala-2.12/classes:...`

and the `repl` module tests fail as follows:

```
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: org/sparkproject/guava/cache/CacheLoader
  at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:75)
  at org.apache.spark.SparkConf.<init>(SparkConf.scala:70)
  at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)
  at org.apache.spark.repl.Main$.<init>(Main.scala:37)
  at org.apache.spark.repl.Main$.<clinit>(Main.scala)
  at org.apache.spark.repl.ReplSuite.$anonfun$new$1(ReplSuite.scala:94)
  at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
  at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
  at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
  at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
  ...
  Cause: java.lang.ClassNotFoundException: org.sparkproject.guava.cache.CacheLoader
  at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
  at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:75)
  at org.apache.spark.SparkConf.<init>(SparkConf.scala:70)
  at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)
  at org.apache.spark.repl.Main$.<init>(Main.scala:37)
  at org.apache.spark.repl.Main$.<clinit>(Main.scala)
  at org.apache.spark.repl.ReplSuite.$anonfun$new$1(ReplSuite.scala:94)
```

The tests fail because `core.jar` has already relocated the Guava class paths, while the contents of `network-common/target/scala-2.12/classes` have not yet undergone `shade + relocation` for Guava; this is determined by the lifecycle phases that Maven executes.

But when we execute `build/mvn clean install -pl common/network-common,repl`, the test `Dependencies classpath` is

`/Users/yangjie01/.m2/repository/org/apache/spark/spark-core_2.12/4.0.0-SNAPSHOT/spark-core_2.12-4.0.0-SNAPSHOT.jar:.../Users/yangjie01/SourceCode/git/spark-mine-12/common/network-common/target/spark-network-common_2.12-4.0.0-SNAPSHOT.jar:...`

and all tests pass.

The failure of the `repl` module test in the Maven daily test has a similar cause: https://github.com/apache/spark/actions/runs/5751080986/job/15589117861

<img width="1479" alt="image" src="https://github.com/apache/spark/assets/1475305/be385810-2173-4b77-898d-0f452ce65088">

The possible solutions are as follows:

1. Force the use of `network-common.jar` as a test dependency in the above scenario during Maven testing. However, I have not found a solution that can be confirmed as viable (I also consulted GPT-4).
2. Make the `core` module also shade Guava, but this would increase the size of `core.jar` by more than 15% (14660516 bytes -> 16871122 bytes).
3. Move the `common/network-common` module to a separate group for testing to avoid similar problems, but this would waste some GitHub Actions resources.
4. Move the `repl` module to another group to avoid this issue; the remaining modules in the original group must still be verified to pass their tests.

Ultimately, this PR chose option 4, moving the `repl` module into the same group as `hive-thriftserver`, while verifying through GitHub Actions that both the `repl` module and the original group pass their tests.

### Why are the changes needed?

To make the `repl` module pass Maven daily testing. After this PR, only the `connector/connect/client/jvm` and `connector/connect/server` modules still have Maven test failures.

### Does this PR introduce _any_ user-facing change?

No, this change is test-only.

### How was this patch tested?

- Verified and passed using GitHub Actions: https://github.com/LuciferYang/spark/actions/runs/5745978299/job/15580477796

<img width="1369" alt="image" src="https://github.com/apache/spark/assets/1475305/4d5cdd71-2ec6-40ff-9996-0b64bc0e7b9e">

Closes #42291 from LuciferYang/maven-repl.
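To restate the root cause compactly, here is an illustrative sketch, not Spark build code: it mimics the relocation rule implied by the stack trace above (`com.google.common` -> `org.sparkproject.guava`, both taken from the failure output; everything else is hypothetical) and shows why a relocated `core.jar` cannot resolve classes from an unrelocated `target/classes` directory.

```shell
#!/usr/bin/env bash
# Illustrative sketch only (not Spark build code).

# A Guava class reference as it appears in core.jar after relocation:
original="com.google.common.cache.CacheLoader"
relocated="${original/com.google.common/org.sparkproject.guava}"
echo "core.jar references:     $relocated"

# network-common/target/classes still contains the original name, because
# shading/relocation only happens when the module is packaged into a jar:
echo "target/classes provides: $original"

# The names no longer line up, so class loading fails at runtime:
if [[ "$relocated" != "$original" ]]; then
  echo "=> NoClassDefFoundError: $(echo "$relocated" | tr . /)"
fi
```

Running `clean install` sidesteps this: every module is packaged and installed, so the test classpath resolves jars (already shaded and relocated) rather than raw `target/classes` directories.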
Authored-by: yangjie01 <yangji...@baidu.com>
Signed-off-by: yangjie01 <yangji...@baidu.com>
---
 .github/workflows/maven_test.yml | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/maven_test.yml b/.github/workflows/maven_test.yml
index 48a4d6b5ff9..618ab69ba59 100644
--- a/.github/workflows/maven_test.yml
+++ b/.github/workflows/maven_test.yml
@@ -57,11 +57,11 @@ jobs:
           - hive2.3
         modules:
           - >-
-            core,repl,launcher,common#unsafe,common#kvstore,common#network-common,common#network-shuffle,common#sketch
+            core,launcher,common#unsafe,common#kvstore,common#network-common,common#network-shuffle,common#sketch
           - >-
             graphx,streaming,mllib-local,mllib,hadoop-cloud
           - >-
-            sql#hive-thriftserver
+            repl,sql#hive-thriftserver
           - >-
             connector#kafka-0-10,connector#kafka-0-10-sql,connector#kafka-0-10-token-provider,connector#spark-ganglia-lgpl,connector#protobuf,connector#avro
           - >-
@@ -187,9 +187,9 @@ jobs:
           ./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} -Dtest.exclude.tags="$EXCLUDED_TAGS" test -fae
         elif [[ "$MODULES_TO_TEST" == "connect" ]]; then
           ./build/mvn $MAVEN_CLI_OPTS -Djava.version=${JAVA_VERSION/-ea} -pl connector/connect/client/jvm,connector/connect/common,connector/connect/server test -fae
-        elif [[ "$MODULES_TO_TEST" == "sql#hive-thriftserver" ]]; then
+        elif [[ "$MODULES_TO_TEST" == *"sql#hive-thriftserver"* ]]; then
           # To avoid a compilation loop, for the `sql/hive-thriftserver` module, run `clean install` instead
-          ./build/mvn $MAVEN_CLI_OPTS -pl sql/hive-thriftserver -Phive -Phive-thriftserver -Djava.version=${JAVA_VERSION/-ea} clean install -fae
+          ./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} clean install -fae
         else
           ./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Pspark-ganglia-lgpl -Phadoop-cloud -Djava.version=${JAVA_VERSION/-ea} test -fae
         fi

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org