Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19788
The idea LGTM, but I think @JoshRosen has a valid concern. My 2 cents:
1. The concept of reading multiple reducer partitions in one shot was
introduced by `ShuffleManager.getReader` (see the sketch after this list).
Although it's currently only used for adaptive execution in Spark SQL, it is
still a feature provided by Spark Core.
2. We should not force users to upgrade the external shuffle service when
upgrading Spark, if they don't use adaptive execution.
3. We should not enable this batch shuffle fetching if the serializer and
compressor don't support it.
4. For better user experience, we should avoid adding more configs if
possible.
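
For context, the existing Spark Core entry point already takes a range of
reduce partitions. Roughly, it looks like this (paraphrased and abbreviated
from the current `ShuffleManager` trait, not something added by this PR):

```scala
import org.apache.spark.TaskContext
import org.apache.spark.shuffle.{ShuffleHandle, ShuffleReader}

private[spark] trait ShuffleManager {
  // Returns a reader for the reduce partitions in [startPartition, endPartition).
  // Today only adaptive execution in Spark SQL asks for more than one partition here.
  def getReader[K, C](
      handle: ShuffleHandle,
      startPartition: Int,
      endPartition: Int,
      context: TaskContext): ShuffleReader[K, C]

  // ... other members (registerShuffle, getWriter, etc.) omitted
}
```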
My proposal: we should enable this feature only if
1. We need to fetch multiple reducer partitions at once, which currently can
only happen when adaptive execution is enabled.
2. The serializer supports it, i.e.
`Serializer.supportsRelocationOfSerializedObjects` is true
3. The compressor supports it, i.e.
`CompressionCodec.supportsConcatenationOfSerializedStreams` is true
Thus we don't need an extra config, and batch fetching will be enabled
automatically when adaptive execution is enabled.
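
To make the intent concrete, a minimal sketch of the proposed check could look
like the following. The helper name `canUseBatchFetch` and its placement are
hypothetical, not part of this PR; the two capability flags are the existing
internal Spark APIs mentioned above:

```scala
import org.apache.spark.io.CompressionCodec
import org.apache.spark.serializer.Serializer

// Sketch only: would live inside Spark's shuffle read path, since both
// capability flags are private[spark].
def canUseBatchFetch(
    startPartition: Int,
    endPartition: Int,
    serializer: Serializer,
    codec: CompressionCodec): Boolean = {
  // More than one reduce partition is requested, which currently only
  // happens when adaptive execution is enabled.
  endPartition > startPartition + 1 &&
    // The serialized records can safely be read back from concatenated streams.
    serializer.supportsRelocationOfSerializedObjects &&
    // The compressed streams can be concatenated and decompressed as one.
    CompressionCodec.supportsConcatenationOfSerializedStreams(codec)
}
```

If any of the three conditions doesn't hold, we simply fall back to the
existing per-partition fetch, so non-AE users and older external shuffle
services are unaffected.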