yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1321786141
> One thing that I know needs to be addressed: some merge data info is not saved on the driver because it is too small (controlled by `spark.shuffle.push.minShuffleSizeToWait`); please see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2295

@mridulm Sorry, in my previous implementation I needed to pass the reduceId to the external shuffle service, but I found a problem: the driver cannot record the complete set of merged reduceIds (see my comment for the reason). I have since changed my implementation, so this may no longer be a problem: we can save the merged reduceIds in the shuffle service, please see .

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
