Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19788

The idea LGTM, but I think @JoshRosen has a valid concern. My 2 cents:

1. The concept of reading multiple reducer partitions in one shot was introduced by `ShuffleManager.getReader`. Although it's currently only used for adaptive execution in Spark SQL, it is still a feature provided by Spark Core.
2. We should not force users to upgrade the external shuffle service when upgrading Spark if they don't use adaptive execution.
3. We should not enable this batch shuffle fetching if the serializer or the compression codec doesn't support it.
4. For a better user experience, we should avoid adding more configs where possible.

My proposal: enable this feature only when all of the following hold:

1. We need to fetch multiple reducer partitions at once, which currently can only happen when adaptive execution is enabled.
2. The serializer supports it, i.e. `Serializer.supportsRelocationOfSerializedObjects` is true.
3. The compression codec supports it, i.e. `CompressionCodec.supportsConcatenationOfSerializedStreams` is true.

Thus we don't need an extra config, and batch fetching is enabled automatically when adaptive execution is on (see the sketch below).
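To make the proposal concrete, here is a minimal Scala sketch of the gating check, not the actual implementation: `BatchFetchSupport` and `canUseBatchFetch` are hypothetical names, and since both support flags are `private[spark]`, the sketch assumes the code lives under Spark's own package.

```scala
package org.apache.spark.shuffle

import org.apache.spark.io.CompressionCodec
import org.apache.spark.serializer.Serializer

// Hypothetical helper illustrating the proposed gating logic; the real
// check would live inside Spark core. Placed under org.apache.spark so
// the private[spark] APIs below are visible.
object BatchFetchSupport {

  // Returns true only when batch fetching of multiple reducer partitions
  // is safe, per the three conditions in the proposal above.
  def canUseBatchFetch(
      serializer: Serializer,
      codec: CompressionCodec,
      fetchingMultiplePartitions: Boolean): Boolean = {
    // (1) A reader is asking for more than one reducer partition at once;
    //     today that only happens when adaptive execution is enabled.
    fetchingMultiplePartitions &&
      // (2) Serialized objects can be relocated, so concatenated
      //     partition bytes still deserialize correctly.
      serializer.supportsRelocationOfSerializedObjects &&
      // (3) The compression codec can decode concatenated streams.
      CompressionCodec.supportsConcatenationOfSerializedStreams(codec)
  }
}
```

With a check like this, no new user-facing config is needed: the condition degrades to the per-partition fetch path whenever any of the three requirements is not met.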