Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19788
The idea LGTM, but I think @JoshRosen has a valid concern. My 2 cents:
1. The concept of reading multiple reducer partitions in one shot was
introduced by `ShuffleManager.getReader` (see the sketch after this list).
Although it's currently only used for adaptive execution in Spark SQL, it is
still a feature provided by Spark Core.
2. We should not force users to upgrade the external shuffle service when
upgrading Spark, if they don't use adaptive execution.
3. We should not enable this batch shuffle fetching if the serializer and
compressor don't support it.
4. For better user experience, we should avoid adding more configs if
possible.
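
For context, the existing Spark Core entry point already takes a range of
reduce partitions. Roughly, it looks like this (paraphrased and abbreviated
from the current `ShuffleManager` trait, not something added by this PR):

```scala
import org.apache.spark.TaskContext
import org.apache.spark.shuffle.{ShuffleHandle, ShuffleReader}

private[spark] trait ShuffleManager {
  // Returns a reader for the reduce partitions in [startPartition, endPartition).
  // Today only adaptive execution in Spark SQL asks for more than one partition here.
  def getReader[K, C](
      handle: ShuffleHandle,
      startPartition: Int,
      endPartition: Int,
      context: TaskContext): ShuffleReader[K, C]

  // ... other members (registerShuffle, getWriter, etc.) omitted
}
```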
My proposal: we should enable this feature only if
1. We need to fetch multiple reducer partitions at once, which currently can
only happen when adaptive execution is enabled.
2. The serializer supports it, i.e.
`Serializer.supportsRelocationOfSerializedObjects` is true
3. The compressor supports it, i.e.
`CompressionCodec.supportsConcatenationOfSerializedStreams` is true
Thus we don't need an extra config, and batch fetching will be enabled
automatically when adaptive execution is enabled.
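
To make the intent concrete, a minimal sketch of the proposed check could look
like the following. The helper name `canUseBatchFetch` and its placement are
hypothetical, not part of this PR; the two capability flags are the existing
internal Spark APIs mentioned above:

```scala
import org.apache.spark.io.CompressionCodec
import org.apache.spark.serializer.Serializer

// Sketch only: would live inside Spark's shuffle read path, since both
// capability flags are private[spark].
def canUseBatchFetch(
    startPartition: Int,
    endPartition: Int,
    serializer: Serializer,
    codec: CompressionCodec): Boolean = {
  // More than one reduce partition is requested, which currently only
  // happens when adaptive execution is enabled.
  endPartition > startPartition + 1 &&
    // The serialized records can safely be read back from concatenated streams.
    serializer.supportsRelocationOfSerializedObjects &&
    // The compressed streams can be concatenated and decompressed as one.
    CompressionCodec.supportsConcatenationOfSerializedStreams(codec)
}
```

If any of the three conditions doesn't hold, we simply fall back to the
existing per-partition fetch, so non-AE users and older external shuffle
services are unaffected.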