bsidhom commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks when executors are lost URL: https://github.com/apache/spark/pull/24462#issuecomment-487864690

@Ngone51 As @jealous mentions, shuffle implementations that maintain their own state outside of the Spark system do not need executors _or_ the built-in Spark shuffle service to serve shuffle data.

@gczsjdy By "outside of Spark" I mean that shuffle manager implementations do not need to live inside (a fork of) the Spark repo but can instead be built and shipped as "plugins"*. Building them separately makes them easier and faster to compile, test, and iterate on. That's how we currently build the HDFS/HCFS shuffle manager. You can then distribute the jar with a Spark deployment by dropping it under the `$SPARK_HOME/jars` directory; files in that directory (or a configured HDFS location) are staged and made available to all executors. Unfortunately, because the shuffle manager class is [instantiated at the time of `SparkEnv` creation](https://github.com/apache/spark/blob/25ee0474f47d9c30d6f553a7892d9549f91071cf/core/src/main/scala/org/apache/spark/SparkEnv.scala#L321), you can't simply depend on a shuffle manager library from client code and then use it while executing in distributed mode. A rough sketch of what such a plugin can look like is included below.

\* Note that because `ShuffleManager` and related classes are `spark`-private, your shuffle implementation does need to live under a subpackage of `org.apache.spark`, but it can still be compiled separately.
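To make the packaging point concrete, here is a minimal sketch of an externally built plugin. The package, class name, and method bodies are hypothetical; the method signatures follow the `private[spark]` `ShuffleManager` trait as of the Spark 2.4 line and differ in later versions, so treat this as an illustration of the structure rather than a drop-in implementation.

```scala
// Hypothetical plugin class. It lives under a subpackage of org.apache.spark so it
// can see the private[spark] ShuffleManager trait, but is compiled in its own repo
// and shipped as a separate jar dropped into $SPARK_HOME/jars.
package org.apache.spark.shuffle.remote

import org.apache.spark.{ShuffleDependency, SparkConf, TaskContext}
import org.apache.spark.shuffle._

// SparkEnv instantiates the configured manager reflectively (it looks for a
// (SparkConf, Boolean) or (SparkConf) constructor), so we take a SparkConf here.
private[spark] class RemoteShuffleManager(conf: SparkConf) extends ShuffleManager {

  override def registerShuffle[K, V, C](
      shuffleId: Int,
      numMaps: Int,
      dependency: ShuffleDependency[K, V, C]): ShuffleHandle = {
    // Record shuffle metadata in the external system (e.g. HDFS/HCFS paths).
    ???
  }

  override def getWriter[K, V](
      handle: ShuffleHandle,
      mapId: Int,
      context: TaskContext): ShuffleWriter[K, V] = {
    // Return a writer that streams map output to external storage instead of local disk.
    ???
  }

  override def getReader[K, C](
      handle: ShuffleHandle,
      startPartition: Int,
      endPartition: Int,
      context: TaskContext): ShuffleReader[K, C] = {
    // Return a reader that fetches blocks directly from external storage, so reducers
    // never need to contact (possibly lost) executors or the Spark shuffle service.
    ???
  }

  override def unregisterShuffle(shuffleId: Int): Boolean = ???

  override def shuffleBlockResolver: ShuffleBlockResolver = ???

  override def stop(): Unit = ()
}
```

With the jar staged under `$SPARK_HOME/jars`, the plugin is typically selected by pointing `spark.shuffle.manager` at the fully qualified class name (here, `org.apache.spark.shuffle.remote.RemoteShuffleManager`); that is the config `SparkEnv` reads when it instantiates the manager at the line linked above.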
