This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new e019e8720fb [SPARK-44600][INFRA] Make `repl` module pass Maven daily testing
e019e8720fb is described below

commit e019e8720fb5495c990735976ed4b50c3a006804
Author: yangjie01 <yangji...@baidu.com>
AuthorDate: Fri Aug 4 17:18:24 2023 +0800

    [SPARK-44600][INFRA] Make `repl` module pass Maven daily testing
    
    ### What changes were proposed in this pull request?
    The following situation exists in the Spark code:
    1. The `repl` module depends on the `core` module, which in turn depends on the `common/network-common` module.
    2. The `common/network-common` module performs a `shade+relocation` operation on the Guava dependency (dependency scope `compile`), while `core` only performs a `relocation` operation on the Guava dependency (dependency scope `provided`).
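
    For reference, this kind of `shade+relocation` step is configured through the `maven-shade-plugin`. The following is an illustrative sketch only (not Spark's exact `pom.xml`); the relocated prefix matches the `org.sparkproject.guava` package seen in the stack trace below:

    ```xml
    <!-- Illustrative sketch of a maven-shade-plugin relocation rule for Guava.
         The actual Spark pom.xml configuration differs in detail. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
          <configuration>
            <relocations>
              <relocation>
                <pattern>com.google.common</pattern>
                <shadedPattern>org.sparkproject.guava</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
    ```

    Because shading runs in the `package` phase, a module's `target/classes` directory still contains the un-relocated classes; only the shaded jar contains `org.sparkproject.guava.*`.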
    
    So when we run Maven tests on the `repl` module, the test classpath must contain either the jar packages of both the `core` and `common/network-common` modules, or the `target/classes` directories of both. When the Maven test command instead mixes the `core` module's jar package with the `common/network-common` module's `target/classes` directory, tests fail due to the `shade+relocation` mismatch.
    
    For example:
    
    If we run `build/mvn test -pl common/network-common,repl`
    
    The test `Dependencies classpath` is 
`/Users/yangjie01/.m2/repository/org/apache/spark/spark-core_2.12/4.0.0-SNAPSHOT/spark-core_2.12-4.0.0-SNAPSHOT.jar:.../Users/yangjie01/SourceCode/git/spark-mine-12/common/network-common/target/scala-2.12/classes:...`
    
    The `repl` module tests fail as follows:
    
    ```
    *** RUN ABORTED ***
      java.lang.NoClassDefFoundError: org/sparkproject/guava/cache/CacheLoader
      at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:75)
      at org.apache.spark.SparkConf.<init>(SparkConf.scala:70)
      at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)
      at org.apache.spark.repl.Main$.<init>(Main.scala:37)
      at org.apache.spark.repl.Main$.<clinit>(Main.scala)
      at org.apache.spark.repl.ReplSuite.$anonfun$new$1(ReplSuite.scala:94)
      at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
      at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
      at org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231)
      at org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230)
      ...
      Cause: java.lang.ClassNotFoundException: org.sparkproject.guava.cache.CacheLoader
      at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
      at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:75)
      at org.apache.spark.SparkConf.<init>(SparkConf.scala:70)
      at org.apache.spark.SparkConf.<init>(SparkConf.scala:59)
      at org.apache.spark.repl.Main$.<init>(Main.scala:37)
      at org.apache.spark.repl.Main$.<clinit>(Main.scala)
      at org.apache.spark.repl.ReplSuite.$anonfun$new$1(ReplSuite.scala:94)
    ```
    
    The tests fail because `core.jar` has already relocated the Guava class path, while the content in `network-common/target/scala-2.12/classes` has not yet undergone `shade+relocation` for Guava; which artifact ends up on the classpath is determined by the lifecycle phases executed by Maven.
    
    But when we execute `build/mvn clean install -pl common/network-common,repl`
    
    The test `Dependencies classpath` is 
`/Users/yangjie01/.m2/repository/org/apache/spark/spark-core_2.12/4.0.0-SNAPSHOT/spark-core_2.12-4.0.0-SNAPSHOT.jar:.../Users/yangjie01/SourceCode/git/spark-mine-12/common/network-common/target/spark-network-common_2.12-4.0.0-SNAPSHOT.jar:...`
    
    And all tests passed.
    
    The failure of the `repl` module tests in the Maven daily test is due to a similar reason: https://github.com/apache/spark/actions/runs/5751080986/job/15589117861
    
    <img width="1479" alt="image" src="https://github.com/apache/spark/assets/1475305/be385810-2173-4b77-898d-0f452ce65088">
    
    The possible solutions are as follows:
    1. Force the use of `network-common.jar` as a test dependency in the above scenario during Maven testing, but I haven't found a solution that can be confirmed as viable (I also consulted GPT-4).
    2. Make the `core` module also perform shading on Guava, but this would increase the size of `core.jar` by more than 15% (14660516 bytes -> 16871122 bytes).
    3. Move the `common/network-common` module to a separate group for testing to avoid similar problems, but this would waste some GA resources.
    4. Move the `repl` module to another group to avoid this issue; the remaining modules in the original group must still be verified to pass the tests.
    
    Ultimately, this PR chose option 4, moving the `repl` module into the same group as `hive-thriftserver`, while also verifying through GA that both the `repl` module and the original group pass the tests.
    
    ### Why are the changes needed?
    Make the `repl` module pass Maven daily testing. After this PR, only the `connector/connect/client/jvm` and `connector/connect/server` modules will have Maven test failures.
    
    ### Does this PR introduce _any_ user-facing change?
    No, the changes are test-only.
    
    ### How was this patch tested?
    - Verified and passed using GitHub Action.
    
    https://github.com/LuciferYang/spark/actions/runs/5745978299/job/15580477796
    
    <img width="1369" alt="image" src="https://github.com/apache/spark/assets/1475305/4d5cdd71-2ec6-40ff-9996-0b64bc0e7b9e">
    
    Closes #42291 from LuciferYang/maven-repl.
    
    Authored-by: yangjie01 <yangji...@baidu.com>
    Signed-off-by: yangjie01 <yangji...@baidu.com>
---
 .github/workflows/maven_test.yml | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/maven_test.yml b/.github/workflows/maven_test.yml
index 48a4d6b5ff9..618ab69ba59 100644
--- a/.github/workflows/maven_test.yml
+++ b/.github/workflows/maven_test.yml
@@ -57,11 +57,11 @@ jobs:
           - hive2.3
         modules:
           - >-
-            core,repl,launcher,common#unsafe,common#kvstore,common#network-common,common#network-shuffle,common#sketch
+            core,launcher,common#unsafe,common#kvstore,common#network-common,common#network-shuffle,common#sketch
           - >-
             graphx,streaming,mllib-local,mllib,hadoop-cloud
           - >-
-            sql#hive-thriftserver
+            repl,sql#hive-thriftserver
           - >-
             connector#kafka-0-10,connector#kafka-0-10-sql,connector#kafka-0-10-token-provider,connector#spark-ganglia-lgpl,connector#protobuf,connector#avro
           - >-
@@ -187,9 +187,9 @@ jobs:
             ./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} -Dtest.exclude.tags="$EXCLUDED_TAGS" test -fae
           elif [[ "$MODULES_TO_TEST" == "connect" ]]; then
             ./build/mvn $MAVEN_CLI_OPTS -Djava.version=${JAVA_VERSION/-ea} -pl connector/connect/client/jvm,connector/connect/common,connector/connect/server test -fae
-          elif [[ "$MODULES_TO_TEST" == "sql#hive-thriftserver" ]]; then
+          elif [[ "$MODULES_TO_TEST" == *"sql#hive-thriftserver"* ]]; then
             # To avoid a compilation loop, for the `sql/hive-thriftserver` module, run `clean install` instead
-            ./build/mvn $MAVEN_CLI_OPTS -pl sql/hive-thriftserver -Phive -Phive-thriftserver -Djava.version=${JAVA_VERSION/-ea} clean install -fae
+            ./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pspark-ganglia-lgpl -Djava.version=${JAVA_VERSION/-ea} clean install -fae
           else
             ./build/mvn $MAVEN_CLI_OPTS -pl "$TEST_MODULES" -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Pspark-ganglia-lgpl -Phadoop-cloud -Djava.version=${JAVA_VERSION/-ea} test -fae
           fi

