Ngone51 commented on a change in pull request #31763:
URL: https://github.com/apache/spark/pull/31763#discussion_r590959025
##########
File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
##########
@@ -52,6 +52,13 @@ private[spark] sealed trait MapStatus {
* partitionId of the task or taskContext.taskAttemptId is used.
*/
def mapId: Long
+
+ /**
+ * Extra metadata for map status. This could be used by different ShuffleManager implementation
+ * to store information they need. For example, a Remote Shuffle Service ShuffleManager could
+ * store shuffle server information and let reducer task know where to fetch shuffle data.
+ */
+ def metadata: Option[Serializable]
Review comment:
I don't think we should develop this way. As you mentioned above,
SPARK-33114 can be considered a subtask of SPARK-25299. So how can we
consider this PR a first iteration when SPARK-25299 is still under
discussion and development, especially when people haven't reached agreement
on the solution and a possible alternative solution exists? Also,
I think custom shuffle managers aren't officially supported by Spark because
the `ShuffleManager` interface is private. So it doesn't make sense for Spark
to add an internal API for unofficial use cases without a strong reason.
SPARK-31801 is certainly big. But as I mentioned earlier, we can split it. Once
the solution is finalized, we can start by refactoring `MapStatus`. I
think that would be a much smaller task and would be enough for your case. After
that, we'll do the remaining work (e.g. using the new `MapStatus` wherever it is
referenced), which you don't need to care about.
I understand you have put a lot of effort into this work, and I'm sorry we
cannot get it in quickly. Unfortunately, I don't have permission to merge.
You could try to persuade the committers to merge the PR if you insist on it.
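For reference, the hook proposed in the diff above amounts to roughly the following. This is only a minimal standalone sketch, not Spark code: `SketchMapStatus`, `RemoteShuffleServerInfo`, `SketchRemoteMapStatus`, `SketchPlainMapStatus` and `fetchLocation` are hypothetical names made up for illustration, and the real `MapStatus` trait has more members than the two modeled here.

```scala
import java.io.Serializable

// Hypothetical stand-in for Spark's private MapStatus trait; only the two
// members relevant to this discussion are modeled.
trait SketchMapStatus {
  def mapId: Long
  def metadata: Option[Serializable]
}

// Hypothetical payload that a remote shuffle service might attach.
final case class RemoteShuffleServerInfo(host: String, port: Int)

// Map side: a custom shuffle manager attaches its server location.
final case class SketchRemoteMapStatus(mapId: Long, server: RemoteShuffleServerInfo)
    extends SketchMapStatus {
  override def metadata: Option[Serializable] = Some(server)
}

// Status without any extra metadata, as the default shuffle path would produce.
final case class SketchPlainMapStatus(mapId: Long) extends SketchMapStatus {
  override def metadata: Option[Serializable] = None
}

object MetadataSketch {
  // Reduce side: recover the fetch location if one was attached.
  def fetchLocation(status: SketchMapStatus): Option[String] =
    status.metadata.collect {
      case RemoteShuffleServerInfo(host, port) => s"$host:$port"
    }
}
```

Note that the reduce side has to pattern match on an untyped `Serializable`, so the contract between the map side and the reduce side lives entirely inside the custom shuffle manager.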