GitHub user yucai commented on the issue:
    The one disk IO solution's performance does not seem as good as current PR19877's 
    spark.range(1, 5120000000L, 1, 1280).selectExpr("id as key", "id as 
    All code is based on the same recent master.
    Current AE:
    Server Side One Disk IO OPT:
    Rebase PR19877:
    **Deep Dive**
    In the one disk IO OPT solution, we still have two disadvantages:
    1. The shuffle blocks have to be sent one by one, and the client side has to process them one by one.
    PR19877 instead sends all of them at once (logically), and the client side processes them at once.
    2. No Netty zero copy.
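    To make the two points concrete, here is a minimal sketch. It is not the code from PR19877 or from my hacked client. It assumes the requested blocks are contiguous byte ranges in one shuffle data file. `BlockRange`, `sendOneByOne` and `sendMerged` are hypothetical names used only for illustration. `FileChannel.transferTo` is the JDK call that Netty's `DefaultFileRegion` relies on. It can skip the user-space copy when the target is a socket; the in-memory target below is only there to keep the sketch runnable.

```scala
import java.io.{ByteArrayOutputStream, IOException}
import java.nio.ByteBuffer
import java.nio.channels.{Channels, FileChannel, WritableByteChannel}
import java.nio.file.{Files, StandardOpenOption}

object OneNetSketch {
  // Hypothetical descriptor: a shuffle block is a byte range in one data file.
  final case class BlockRange(offset: Long, length: Long)

  // Per-block path: each block is read into a heap buffer and written on its own.
  // Returns the number of separate sends.
  def sendOneByOne(ch: FileChannel, blocks: Seq[BlockRange], out: WritableByteChannel): Int = {
    var sends = 0
    blocks.foreach { b =>
      val buf = ByteBuffer.allocate(b.length.toInt)
      var pos = b.offset
      while (buf.hasRemaining) {
        val n = ch.read(buf, pos)
        if (n < 0) throw new IOException("unexpected end of file")
        pos += n
      }
      buf.flip()
      while (buf.hasRemaining) out.write(buf)
      sends += 1
    }
    sends
  }

  // Merged path: contiguous blocks collapse into one range that is handed to
  // transferTo, so there is one logical send and no intermediate heap buffer.
  def sendMerged(ch: FileChannel, blocks: Seq[BlockRange], out: WritableByteChannel): Int = {
    require(blocks.nonEmpty, "no blocks to send")
    // Assumption of this sketch: every block starts where the previous one ends.
    require(blocks.sliding(2).forall {
      case Seq(a, b) => a.offset + a.length == b.offset
      case _ => true
    }, "blocks must be contiguous")
    val start = blocks.head.offset
    val total = blocks.map(_.length).sum
    var sent = 0L
    while (sent < total) {
      val n = ch.transferTo(start + sent, total - sent, out)
      // A non-blocking socket could legitimately return 0; this sketch treats it as an error.
      if (n <= 0) throw new IOException("transfer stalled")
      sent += n
    }
    1
  }

  def main(args: Array[String]): Unit = {
    val path = Files.createTempFile("shuffle", ".data")
    try {
      Files.write(path, Array.tabulate[Byte](1024)(i => (i % 127).toByte))
      val blocks = Seq(BlockRange(0, 256), BlockRange(256, 256), BlockRange(512, 512))
      val ch = FileChannel.open(path, StandardOpenOption.READ)
      try {
        val a = new ByteArrayOutputStream()
        val b = new ByteArrayOutputStream()
        val perBlock = sendOneByOne(ch, blocks, Channels.newChannel(a))
        val merged = sendMerged(ch, blocks, Channels.newChannel(b))
        // Both paths must deliver exactly the same bytes.
        assert(a.toByteArray.sameElements(b.toByteArray))
        println(s"per-block sends: $perBlock, merged sends: $merged")
      } finally ch.close()
    } finally Files.delete(path)
  }
}
```

    Both paths deliver the same bytes. The difference is how many sends the client has to handle and whether the data passes through a heap buffer on the way out.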
    So I ran two more experiments to verify my guess.
    1. One Disk IO One Net (I hacked some client-side code):
    2. One Disk IO One Net + Zero Copy (also needs the client hack):
    After optimizing to "one net", we got performance similar to PR19877.
    It looks like "one net" is also important, but it requires changes on the client side.
    @cloud-fan, I understand you may be very busy with 2.4, so feel free to ping me if you have any suggestions.

