[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-21 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1321786141 > One thing that I know needs to be addressed: some merge data info is not saved on the driver because it is too small (controlled by `spark.shuffle.push.minShuffleSizeToWait`)

[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-21 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1321687314 > @mridulm @wankunde @otterc I'm not sure whether I missed any logic; please help review my code, thanks~ I will improve my code style later. For now I am not changing my code in

[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-21 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1321685369 @mridulm I will move quickly to finish the unfinished parts of the previous PR in this PR. From your comments in the previous PR

[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-15 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1316389795 @mridulm I want to improve this part of the deletion logic, as you said in your comment: https://github.com/apache/spark/pull/37922#discussion_r990763769 -- This is an automated message from

[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-10 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1311193090 My latest implementation no longer passes reduceIds from the driver. Some code style improvements are still pending; this is just a rough implementation for now.

[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-10 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1311176907 @mridulm Yes... These two issues are similar. @wankunde Can I continue editing my PR in this issue?

[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-10 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1310162375 > I am wondering whether the driver needs to pass the merged reduceId to the external shuffle service (but currently the driver cannot fully record merged info), or the shuffle service records

[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-09 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1308437041 I am wondering whether the driver needs to pass the merged reduceId to the external shuffle service, or the shuffle service records the merged reduceIds, and the subsequent driver nodes

[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-08 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1307984001 The two things that I know need to be addressed are: 1. Some merge data blocks are not saved on the driver because they are too small (controlled by