[GitHub] [spark] mridulm commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored

2023-03-07 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1459652457 Sure! Please go ahead :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

2023-03-07 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1459618430 @jerqi The basic issue here is that `getPreferredLocations` in `ShuffledRowRDD` should return `Nil` at the very beginning when `spark.shuffle.reduceLocality.enabled = false`. We
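
A minimal Java sketch of the early return described in this comment. It is not Spark's code: `PreferredLocationsSketch`, `preferredLocations`, and the `lookup` supplier are hypothetical names chosen for illustration, and the empty list stands in for Scala's `Nil`. The only point it illustrates is that the locality lookup is skipped entirely when the flag is off.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the behavior discussed above; not Spark's actual implementation.
public class PreferredLocationsSketch {

    /**
     * Returns no preferred locations when reduce locality is disabled,
     * checking the flag first so the lookup is never invoked in that case.
     *
     * @param reduceLocalityEnabled assumed to mirror spark.shuffle.reduceLocality.enabled
     * @param lookup                hypothetical, possibly expensive, locality computation
     */
    public static List<String> preferredLocations(boolean reduceLocalityEnabled,
                                                  Supplier<List<String>> lookup) {
        if (!reduceLocalityEnabled) {
            return Collections.emptyList(); // analogue of returning Nil
        }
        return lookup.get();
    }

    public static void main(String[] args) {
        Supplier<List<String>> lookup = () -> List.of("host-a", "host-b");
        System.out.println(preferredLocations(false, lookup)); // prints []
        System.out.println(preferredLocations(true, lookup));  // prints [host-a, host-b]
    }
}
```

Checking the flag before anything else means a disabled setting costs nothing beyond the boolean test, which seems to be the intent of "at the very beginning" in the comment.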

2023-03-07 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1458674205 @jerqi Agreed that we should have a way for disaggregated shuffle implementations to specify locality preferences to the Spark scheduler, so that shuffle tasks run closer to the data.

2023-03-06 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1457315803 The test failure is unrelated, so the existing tests work fine. I will work on checking specifically for the changes in this PR later today.

2023-03-06 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-145754 We are evaluating it currently @dongjoon-hyun :-)

2023-03-06 Thread via GitHub
mridulm commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1456844136 This is still WIP, but I want to get early feedback. +CC @Ngone51, @otterc, @waitinfuture