mridulm commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r624432860
##########
File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala
##########
@@ -87,6 +87,29 @@ case class ShufflePushBlockId(shuffleId: Int, mapIndex: Int,
reduceId: Int) exte
   override def name: String = "shufflePush_" + shuffleId + "_" + mapIndex + "_" + reduceId
}
+@DeveloperApi
+case class ShuffleMergedBlockId(appId: String, shuffleId: Int, reduceId: Int) extends BlockId {
+  override def name: String = "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".data"
+}
+
+@DeveloperApi
+case class ShuffleMergedIndexBlockId(
+ appId: String,
+ shuffleId: Int,
+ reduceId: Int) extends BlockId {
+ override def name: String =
+ "mergedShuffle_" + appId + "_" + shuffleId + "_" + reduceId + ".index"
Review comment:
Btw, I forgot to add my thoughts on a new RPC message.
I am referring to adding something like
`RegisterExecutorForPushBasedShuffle` (or some such) in addition to the existing
`RegisterExecutor`. That is, send `RegisterExecutorForPushBasedShuffle` after
`RegisterExecutor` succeeds.
A few thoughts:
* If ESS does not support the new RPC, how is the Spark application supposed
to behave?
  * Consider the case where `RegisterExecutor` succeeds while
`RegisterExecutorForPushBasedShuffle` fails with an
`IllegalArgumentException` at ESS due to an unrecognized msg id.
  * Currently, this would throw a `SparkException` and lead to executor
failure: do we change this behavior? Or do we simply fail the application due
to unsupported config?
* Adding the new RPC allows us to decouple executor registration from
whether the executor host should be a candidate for hosting mergers.
  * This will help with future evolution.
* If we are taking this path, it would be better for ESS to manage the
merger location entirely, and not have executors create/update it (as
discussed above). It will help ESS evolve independently.
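
To make the open question concrete, here is a rough, self-contained sketch of the two-step registration and one possible way to handle an ESS that does not recognize the second message. None of this is Spark's actual code: `ShuffleService`, `LegacyShuffleService`, `PushAwareShuffleService`, `Registration` and the result types are hypothetical stand-ins, the messages are simplified to two string fields, and "disable push-based shuffle for this executor" is only one of the options raised above (the other being to fail the executor or the application).

```scala
// Hypothetical sketch: these are NOT Spark's real classes, only simplified stand-ins.
sealed trait ShuffleServiceMessage
final case class RegisterExecutor(appId: String, execId: String)
  extends ShuffleServiceMessage
// The proposed second message (name taken from the comment; fields are assumed).
final case class RegisterExecutorForPushBasedShuffle(appId: String, execId: String)
  extends ShuffleServiceMessage

// Stand-in for an external shuffle service (ESS) endpoint.
trait ShuffleService {
  def handle(msg: ShuffleServiceMessage): Unit
}

// An older ESS: accepts RegisterExecutor, rejects anything else,
// mimicking the IllegalArgumentException for an unrecognized msg id.
object LegacyShuffleService extends ShuffleService {
  def handle(msg: ShuffleServiceMessage): Unit = msg match {
    case _: RegisterExecutor => ()
    case other => throw new IllegalArgumentException(s"Unknown message: $other")
  }
}

// A newer ESS: accepts both messages and owns the merger-candidate state itself,
// so executors never create or update it.
final class PushAwareShuffleService extends ShuffleService {
  private val mergerCandidates = scala.collection.mutable.Set.empty[String]

  def handle(msg: ShuffleServiceMessage): Unit = msg match {
    case _: RegisterExecutor => ()
    case RegisterExecutorForPushBasedShuffle(_, execId) => mergerCandidates += execId
  }

  def isMergerCandidate(execId: String): Boolean = mergerCandidates.contains(execId)
}

sealed trait RegistrationResult
case object PushBasedShuffleEnabled extends RegistrationResult
case object PushBasedShuffleUnsupported extends RegistrationResult

object Registration {
  def register(ess: ShuffleService, appId: String, execId: String): RegistrationResult = {
    // Step 1: plain executor registration; a failure here propagates unchanged.
    ess.handle(RegisterExecutor(appId, execId))
    // Step 2: sent only after step 1 succeeds. In this sketch an unsupported
    // message downgrades the executor instead of failing it (an assumption).
    try {
      ess.handle(RegisterExecutorForPushBasedShuffle(appId, execId))
      PushBasedShuffleEnabled
    } catch {
      case _: IllegalArgumentException => PushBasedShuffleUnsupported
    }
  }
}
```

With this shape, registering against `LegacyShuffleService` yields `PushBasedShuffleUnsupported` while the executor itself stays registered, and registering against a `PushAwareShuffleService` records the executor as a merger candidate on the service side only. In a real deployment the rejection would arrive as an RPC failure rather than a local exception, so the catch clause would need to match whatever the transport surfaces.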