[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-20 Thread GitBox


mridulm commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1321529913

   I was of two minds about whether to fix this in 3.3 as well ...
   Yes, 3.3 is affected by it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-20 Thread GitBox


mridulm commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1321511612

   Merged to master.
   Thanks for fixing this @gaoyajun02 !
   Thanks for the review @otterc :-)





[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-16 Thread GitBox


mridulm commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1318228713

   The test failure looks unrelated; can you retrigger the tests, @gaoyajun02 ...





[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-15 Thread GitBox


mridulm commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1315695999

   Also, can you please update to the latest master, @gaoyajun02? I am not sure
   why we are seeing the linter failure in the build.





[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-15 Thread GitBox


mridulm commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1315692121

   There is a pending
   [comment](https://github.com/apache/spark/pull/38333/files#r1019735633); can
   you take a look at it, @gaoyajun02? Thanks.





[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-10 Thread GitBox


mridulm commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1311084796

   > I think it is more efficient to fall back and fetch map outputs instead of
   > failing the stage and regenerating the data of the partition. When the
   > corrupt blocks are merged shuffle blocks or chunks, we don't retry fetching
   > them anyway and fall back immediately.
   
   Sounds good.





[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-02 Thread GitBox


mridulm commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1301464492

   If hardware issues are causing the failures, it is better to add the nodes
   to the deny list and prevent them from being used; otherwise we will keep
   seeing more failures, including for vanilla shuffle.
   
   On the other hand, this can also be viewed as a data corruption issue.
   @otterc, what was the plan for supporting shuffle corruption diagnosis for
   push-based shuffle (SPARK-36206, etc.)? Is the expectation that we fall
   back, or that we diagnose and fail?





[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-01 Thread GitBox


mridulm commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1299631344

   For cases like this, might it actually be better to add the node to the deny
   list and fail the task so that the parent stage is recomputed?





[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-10-21 Thread GitBox


mridulm commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1287007388

   +CC @otterc 

