[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1321786141 > One things that I know need to be addressed are: Some merge data infos are not saved on the driver because they are too small ( controlled by `spark.shuffle.push.minShuffleSizeToWait`) please see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2295 @mridulm sorry, in my previous implementation, I needed to pass the reduceid to the external shuffle service, but I found a problem, the driver cannot record the complete merged reduceId (see my comment for the reason)... But I had changed my implementation, so it may not be a problem (we can save merged reduceIds in shuffle service, please see ![codes](https://github.com/apache/spark/pull/38560/files#diff-d544fbb952b61283b3d18ca10a5027528efc4f2f65047130da015ae7754c117fR791). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1321687314 > @mridulm @wankunde @otterc I'm not sure if I missed any logic, please help review my code , thanks~ I will improve my code style later. Now I don't change my code in `BlockManagerMasterEndpoint` as #37922 do . Can it be split into two PRs, I implement the code of the shuffle service part first and @wankunde finish the rest part since he has done? Some thoughts on these changes I wrote here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1321685369 @mridulm I will speed up to finish the unfinished parts of the previous PR together in this PR. From your comments in the previous PR https://github.com/apache/spark/pull/37922#discussion_r990753031 and https://github.com/apache/spark/pull/37922#discussion_r990763769 . I had addressed these in my PR , please help review if it is suitable: https://github.com/apache/spark/pull/38560/files#diff-d544fbb952b61283b3d18ca10a5027528efc4f2f65047130da015ae7754c117fR421 and https://github.com/apache/spark/pull/38560/files#diff-d544fbb952b61283b3d18ca10a5027528efc4f2f65047130da015ae7754c117fR514 Yes , there are nontrivial overlap between https://github.com/apache/spark/pull/37922 , I can cherry pick some codes and fix the comments in this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1316389795 @mridulm as your comment said https://github.com/apache/spark/pull/37922#discussion_r990763769 , I want to Improve this part of the deletion logic -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1311193090 my latest implementation no longer passes reduceIds from driver, there are still some code style improvements, just some rough implementation for now -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1311176907 @mridulm Yes...These two issues are the similar. @wankunde Can I continue editing my PR in this Issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1310162375 > I am wondering whether the driver needs to pass the merged reduceId to the external shuffle service (but now the driver cannot fully record merged info), or the shuffle service records the merged reduceIds, and driver only need to pass the shuffleId and other information I decided to change the driver to not send reduceId, because only the shuffle service finally understands which shuffle data is stored, no matter how the driver processes or processes the message -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1308437041 I am wondering whether the driver needs to pass the merged reduceId to the external shuffle service, or the shuffle service records the merged reduceIds, and the subsequent driver nodes only need to pass the shuffleId and other information -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1307984001 The two things that I know need to be addressed are: 1. Some merge data blocks are not saved on the driver because they are too small ( controlled by `spark.shuffle.push.minShuffleSizeToWait`) please see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2295 2. The meaning of the `externalShuffleServiceRemoveShuffleEnabled` parameter is not consistent with my current function, but the name is very similar. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org