GitHub user 10110346 opened a pull request:

    https://github.com/apache/spark/pull/23228

    [MINOR][DOC] The condition description of serialized shuffle is not very accurate

    ## What changes were proposed in this pull request?
    `1. The shuffle dependency specifies no aggregation or output ordering.`
    If the shuffle dependency specifies an aggregation that runs only on the reduce side, serialized shuffle can still be used; only map-side aggregation rules it out.
    `3. The shuffle produces fewer than 16777216 output partitions.`
    The bound is inclusive: even with exactly 16777216 output partitions, serialized shuffle can still be used. A sketch of the corrected conditions is shown below.
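    To make the corrected conditions concrete, here is a minimal, self-contained Scala sketch. It is not part of this patch and not the actual `SortShuffleManager` code; `ShuffleProps`, `MaxPartitionsForSerializedMode`, and the standalone `canUseSerializedShuffle` below are illustrative stand-ins for the real `ShuffleDependency` internals that this PR only re-documents.

    // Hypothetical model of the three conditions for serialized shuffle.
    object SerializedShuffleCheck {

      // 2^24 = 16777216; the limit is inclusive ("no more than"), not exclusive.
      val MaxPartitionsForSerializedMode: Int = 1 << 24

      // Illustrative stand-in for the relevant fields of a shuffle dependency.
      final case class ShuffleProps(
          serializerSupportsRelocation: Boolean, // e.g. Kryo-style serializers do
          mapSideCombine: Boolean,               // reduce-side-only aggregation is fine
          numPartitions: Int)

      // All three (corrected) conditions must hold.
      def canUseSerializedShuffle(p: ShuffleProps): Boolean =
        p.serializerSupportsRelocation &&
          !p.mapSideCombine &&                                // aggregation only blocks serialized shuffle on the map side
          p.numPartitions <= MaxPartitionsForSerializedMode   // exactly 16777216 partitions is still allowed

      def main(args: Array[String]): Unit = {
        // Reduce-side aggregation with exactly 16777216 partitions still qualifies.
        val props = ShuffleProps(
          serializerSupportsRelocation = true,
          mapSideCombine = false,
          numPartitions = 16777216)
        println(canUseSerializedShuffle(props)) // prints: true
      }
    }
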
    ## How was this patch tested?
    N/A


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/10110346/spark SerializedShuffle_doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23228
    
----
commit d5dadbf30d5429c36ec3d5c2845a71c2717fd6f3
Author: liuxian <liu.xian3@...>
Date:   2018-12-05T08:55:20Z

    fix

----


