pan3793 commented on pull request #34934: URL: https://github.com/apache/spark/pull/34934#issuecomment-1018109938
> I think that the partitions that the reducer is trying to read are not finalized by the shuffle server. This log statement would have shown that. I still don't know how the reducer is getting these merged block but at least we can rule out the shuffle service.

Is it possible that the shuffle service overwrites the finalized merged blocks?

> There is also a possibility that push-based shuffle has a conflict with AQE. Do you see the same issue when `spark.sql.adaptive.coalescePartitions.enabled` and `spark.sql.adaptive.fetchShuffleBlocksInBatch` are disabled?

I think `fetchShuffleBlocksInBatch` should not affect merged blocks, but anyway, let me try. BTW, I will lose access to this test cluster after 1.31; I hope we can find the root cause before then.

https://github.com/apache/spark/blob/ef0055418ee065de924bc1e9b8c5b31587068dea/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala#L486-L536

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
