[ https://issues.apache.org/jira/browse/SPARK-45762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mridul Muralidharan resolved SPARK-45762.
-----------------------------------------
    Resolution: Fixed

Issue resolved by pull request 43627
[https://github.com/apache/spark/pull/43627]

> Shuffle managers defined in user jars are not available for some launch modes
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-45762
>                 URL: https://issues.apache.org/jira/browse/SPARK-45762
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.0
>            Reporter: Alessandro Bellina
>            Assignee: Alessandro Bellina
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> Starting a Spark job in standalone mode with a custom `ShuffleManager` provided in a jar via `--jars` does not work. The same failure can be reproduced in local-cluster mode.
> The approach that works consistently is to copy the jar containing the custom `ShuffleManager` to a specific location on each node and then add it to `spark.driver.extraClassPath` and `spark.executor.extraClassPath`, but we would like to move away from setting extra configuration unnecessarily.
> Example:
> {code:java}
> $SPARK_HOME/bin/spark-shell \
>   --master spark://127.0.0.1:7077 \
>   --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
>   --jars user-code.jar
> {code}
> This yields `java.lang.ClassNotFoundException` in the executors.
> {code:java}
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1915)
> 	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
> 	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:436)
> 	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:425)
> 	at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.examples.TestShuffleManager
> 	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
> 	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
> 	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
> 	at java.base/java.lang.Class.forName0(Native Method)
> 	at java.base/java.lang.Class.forName(Class.java:467)
> 	at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41)
> 	at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36)
> 	at org.apache.spark.util.Utils$.classForName(Utils.scala:95)
> 	at org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2574)
> 	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366)
> 	at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:255)
> 	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:487)
> 	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
> 	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
> 	at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
> 	at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> 	... 4 more
> {code}
> We can change our command to use `extraClassPath`:
> {code:java}
> $SPARK_HOME/bin/spark-shell \
>   --master spark://127.0.0.1:7077 \
>   --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
>   --conf spark.driver.extraClassPath=user-code.jar \
>   --conf spark.executor.extraClassPath=user-code.jar
> {code}
> Success after adding the jar to `extraClassPath`:
> {code:java}
> 23/10/26 12:58:26 INFO TransportClientFactory: Successfully created connection to localhost/127.0.0.1:33053 after 7 ms (0 ms spent in bootstraps)
> 23/10/26 12:58:26 WARN TestShuffleManager: Instantiated TestShuffleManager!!
> 23/10/26 12:58:26 INFO DiskBlockManager: Created local directory at /tmp/spark-cb101b05-c4b7-4ba9-8b3d-5b23baa7cb46/executor-5d5335dd-c116-4211-9691-87d8566017fd/blockmgr-2fcb1ab2-d886-4444-8c7f-9dca2c880c2c
> {code}
> We would like to change the startup order so that the original command succeeds without specifying `extraClassPath`:
> {code:java}
> $SPARK_HOME/bin/spark-shell \
>   --master spark://127.0.0.1:7077 \
>   --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \
>   --jars user-code.jar
> {code}
> Proposed changes:
> Refactor the code so that the `ShuffleManager` is initialized later, after jars have been localized. This is especially necessary in the executor, where the initialization must be deferred until after the `replClassLoader` has been updated with the jars passed via `--jars`.
> Today the `ShuffleManager` is instantiated at `SparkEnv` creation. Instantiating it this early doesn't work because user jars have not yet been localized in all scenarios, so loading the `ShuffleManager` fails. We propose moving the `ShuffleManager` instantiation into `SparkContext` on the driver and into the `Executor` on the executors to address this issue.
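The class-loading behavior behind this failure can be illustrated without Spark. In the stack trace above, `Class.forName` consults the JDK's `AppClassLoader`, which only sees the launch classpath; jars shipped via `--jars` are added later to a separate child loader (Spark's `MutableURLClassLoader` / `replClassLoader`). The sketch below is a minimal, Spark-free reproduction of that ordering problem: the `UserShuffle` class name and on-the-fly compilation stand in for a user jar and are purely illustrative.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassLoaderDemo {
    // Compile a tiny class into a temp directory, standing in for a user jar
    // that is only localized after the process has started.
    static Path compileUserClass() throws Exception {
        Path dir = Files.createTempDirectory("userjar");
        Path src = dir.resolve("UserShuffle.java");
        Files.writeString(src, "public class UserShuffle {}");
        // Requires a JDK (getSystemJavaCompiler() returns null on a bare JRE).
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler.run(null, null, null, "-d", dir.toString(), src.toString()) != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return dir;
    }

    public static void main(String[] args) throws Exception {
        Path dir = compileUserClass();

        // 1) The application class loader cannot see the class: this is the
        //    same failure mode the executor hits when it instantiates the
        //    ShuffleManager before --jars are added to its class loader.
        try {
            Class.forName("UserShuffle");
            System.out.println("unexpected: app loader found the class");
        } catch (ClassNotFoundException e) {
            System.out.println("app loader: ClassNotFoundException");
        }

        // 2) Once the "jar" location is registered with a child loader
        //    (analogous to Spark updating its URL class loader after
        //    localizing --jars), the same lookup succeeds.
        try (URLClassLoader userLoader =
                 new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> cls = Class.forName("UserShuffle", true, userLoader);
            System.out.println("user-jar loader: loaded " + cls.getName());
        }
    }
}
```

This is why the proposal defers the `ShuffleManager` instantiation: the lookup itself is fine, it simply has to run against the class loader that already contains the localized jars.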
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org