[GitHub] [spark] pan3793 commented on pull request #34934: [SPARK-37675][CORE][SHUFFLE] Return PushMergedRemoteMetaFailedFetchResult if no available push-merged block

GitBox Thu, 20 Jan 2022 18:34:10 -0800


pan3793 commented on pull request #34934:
URL: https://github.com/apache/spark/pull/34934#issuecomment-1018109938



   > I think that the partitions that the reducer is trying to read are not finalized by the shuffle server. This log statement would have shown that. I still don't know how the reducer is getting these merged blocks, but at least we can rule out the shuffle service.
   
   Is it possible that the shuffle service overwrites the finalized merged blocks?
   
   > There is also a possibility that push-based shuffle conflicts with AQE. Do you see the same issue when `spark.sql.adaptive.coalescePartitions.enabled` and `spark.sql.adaptive.fetchShuffleBlocksInBatch` are disabled?
   
   I think `fetchShuffleBlocksInBatch` should not affect merged blocks, but let me try anyway.
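   The suggested experiment amounts to rerunning the same job with the two AQE settings above switched off. A minimal sketch, assuming the job is launched through `spark-submit`; the helper name `conf_args` and the `AQE_OFF` mapping are hypothetical, and only the two config keys come from the discussion.

```python
# Hypothetical helper: build spark-submit "--conf key=value" arguments that
# disable the two AQE settings named in the review comment, so the job can be
# rerun to see whether the push-merged block failure still reproduces.
AQE_OFF = {
    "spark.sql.adaptive.coalescePartitions.enabled": "false",
    "spark.sql.adaptive.fetchShuffleBlocksInBatch": "false",
}


def conf_args(overrides):
    """Return a flat list of spark-submit arguments, one --conf per setting."""
    args = []
    for key, value in sorted(overrides.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags to append to the usual spark-submit command line.
    print(" ".join(["spark-submit"] + conf_args(AQE_OFF)))
```

   Keeping the overrides in one mapping makes it easy to flip each setting back individually and narrow down which one, if either, interacts with push-based shuffle.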
   
   BTW, I will lose access to this test cluster after 1.31, so I hope we can find the root cause before then.
   
   
https://github.com/apache/spark/blob/ef0055418ee065de924bc1e9b8c5b31587068dea/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala#L486-L536


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


