abellina opened a new pull request, #43627:
URL: https://github.com/apache/spark/pull/43627

   ### What changes were proposed in this pull request?
   As reported in https://issues.apache.org/jira/browse/SPARK-45762, 
`ShuffleManager` instances defined in a user jar cannot be used in all cases 
unless the jar is also specified in the `extraClassPath`. We would like to avoid 
requiring this extra configuration when the implementation is already included 
in a jar passed via `--jars`.
   
   Proposed changes:
   
   Refactor the code so that we initialize the `ShuffleManager` later, after 
jars have been localized. This is especially necessary in the executor, where 
we need to defer this initialization until after the `replClassLoader` is 
updated with the jars passed in `--jars`.
   
   Before this change, the `ShuffleManager` is instantiated at `SparkEnv` 
creation. Instantiating the `ShuffleManager` this early doesn't work in all 
scenarios: user jars may not have been localized yet, so we fail to load a 
`ShuffleManager` defined in `--jars`. We propose moving the `ShuffleManager` 
instantiation to `SparkContext` on the driver and to `Executor` on the executors.
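
   The ordering problem described above can be illustrated with a small toy 
model. This is only a sketch in Python, not Spark code: `ToyClassLoader`, 
`UserShuffleManager`, `eager_startup`, and `deferred_startup` are hypothetical 
names invented for illustration, and the dictionary stands in for the classes a 
class loader can see once a jar has been localized.

```python
# Toy model only: none of these names are Spark APIs.

class ToyClassLoader:
    """Stands in for a class loader; it resolves only names it has been given."""

    def __init__(self):
        self._classes = {}

    def add_jar(self, classes):
        # Simulates localizing a jar: its classes become resolvable.
        self._classes.update(classes)

    def load(self, name):
        if name not in self._classes:
            raise LookupError(f"class not found: {name}")
        return self._classes[name]


class UserShuffleManager:
    """Hypothetical ShuffleManager implementation shipped in a user jar."""


# Hypothetical contents of a jar passed via --jars.
USER_JAR = {"com.example.UserShuffleManager": UserShuffleManager}


def eager_startup(loader, manager_name, jar):
    # Old order: instantiate at environment creation, then localize jars.
    manager = loader.load(manager_name)()  # raises: the jar is not visible yet
    loader.add_jar(jar)
    return manager


def deferred_startup(loader, manager_name, jar):
    # Proposed order: localize jars first, then instantiate.
    loader.add_jar(jar)
    return loader.load(manager_name)()
```

   In this model, `eager_startup` raises `LookupError` because the lookup 
happens before the jar is added, while `deferred_startup` succeeds with the 
same inputs. Only the order of the two steps differs.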
   
   ### Why are the changes needed?
   This is not a new API but a change of startup order. The changes are needed 
to improve the user experience by reducing the extra configuration that is 
otherwise required depending on how a Spark application is launched.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, but it's backwards compatible. Users no longer need to specify a 
`ShuffleManager` jar in `extraClassPath`, but they can still do so if they wish.
   
   ### How was this patch tested?
   Added a unit test showing that a test `ShuffleManager` is available when its 
jar is passed via `--jars`, but not otherwise (using local-cluster mode).
   
   Tested manually in standalone mode, local-cluster mode, YARN client and 
cluster modes, and on Kubernetes.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

