GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/16706

    [SPARK-19365][Core]Optimize RequestMessage serialization

    ## What changes were proposed in this pull request?
    
    Currently, Netty RPC serializes `RequestMessage` using Java serialization,
    and a single message (e.g., `RequestMessage(..., "hello!")`) is about
    1 KB in size.
    
    This PR serializes `RequestMessage` manually instead, which reduces the
    size of the message above to 100+ bytes.
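    To illustrate where the savings come from, here is a minimal Scala sketch
    that compares Java serialization of a small message with a hand-written,
    field-by-field encoding. The types `Address` and `Message` and the object
    `ManualSerialization` are hypothetical stand-ins, not Spark's actual RPC
    classes, and the field layout is an assumption rather than the wire format
    used by this PR.

    ```scala
    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, ObjectOutputStream}

    object ManualSerialization {
      // Hypothetical, simplified message types (not Spark's real classes).
      case class Address(host: String, port: Int)
      case class Message(sender: Address, receiver: Address, endpointName: String, content: String)

      // Baseline: Java serialization of the whole object graph,
      // which also writes class descriptors (class and field names).
      def javaSerialize(m: Message): Array[Byte] = {
        val bytes = new ByteArrayOutputStream()
        val out = new ObjectOutputStream(bytes)
        out.writeObject(m)
        out.close()
        bytes.toByteArray
      }

      // Manual encoding: write only the field values, in a fixed order.
      def manualSerialize(m: Message): Array[Byte] = {
        val bytes = new ByteArrayOutputStream()
        val out = new DataOutputStream(bytes)
        writeAddress(out, m.sender)
        writeAddress(out, m.receiver)
        out.writeUTF(m.endpointName)
        out.writeUTF(m.content)
        out.close()
        bytes.toByteArray
      }

      // Read the fields back in the same order they were written
      // (Scala evaluates constructor arguments left to right).
      def manualDeserialize(data: Array[Byte]): Message = {
        val in = new DataInputStream(new ByteArrayInputStream(data))
        try Message(readAddress(in), readAddress(in), in.readUTF(), in.readUTF())
        finally in.close()
      }

      private def writeAddress(out: DataOutputStream, a: Address): Unit = {
        out.writeUTF(a.host)
        out.writeInt(a.port)
      }

      private def readAddress(in: DataInputStream): Address =
        Address(in.readUTF(), in.readInt())
    }
    ```

    Java serialization emits class metadata for every object in the graph,
    while the manual encoding writes only the values, so sender and receiver
    must agree on the layout in advance. A payload of arbitrary type that
    cannot be encoded this way could still fall back to Java serialization
    for just that field.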
    
    ## How was this patch tested?
    
    Jenkins
    
    I did a simple test to measure the improvement:
    
    Before:
    ```
    $ bin/spark-shell --master local-cluster[1,4,1024]
    ...
    scala> for (i <- 1 to 10) {
         |   val start = System.nanoTime
         |   val s = sc.parallelize(1 to 1000000, 10 * 1000).count()
         |   val end = System.nanoTime
         |   println(s"$i\t" + ((end - start)/1000/1000))
         | }
    1       6830
    2       4353
    3       3322
    4       3107
    5       3235
    6       3139
    7       3156
    8       3166
    9       3091
    10      3029
    ```
    After:
    ```
    $ bin/spark-shell --master local-cluster[1,4,1024]
    ...
    scala> for (i <- 1 to 10) {
         |   val start = System.nanoTime
         |   val s = sc.parallelize(1 to 1000000, 10 * 1000).count()
         |   val end = System.nanoTime
         |   println(s"$i\t" + ((end - start)/1000/1000))
         | }
    1       6431
    2       3643
    3       2913
    4       2679
    5       2760
    6       2710
    7       2747
    8       2793
    9       2679
    10      2651
    ```
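    As a rough summary of the two runs above, the sketch below averages the
    per-iteration times in milliseconds. Treating the first two iterations as
    warm-up and excluding them is an assumption made for this summary, not
    part of the original measurement; `BenchmarkSummary` is a hypothetical
    name.

    ```scala
    object BenchmarkSummary {
      // Per-iteration wall-clock times (ms) copied from the spark-shell output above.
      val before: Seq[Int] = Seq(6830, 4353, 3322, 3107, 3235, 3139, 3156, 3166, 3091, 3029)
      val after: Seq[Int]  = Seq(6431, 3643, 2913, 2679, 2760, 2710, 2747, 2793, 2679, 2651)

      // Assumption: the first two iterations are warm-up and are excluded.
      val warmUp = 2

      def mean(xs: Seq[Int]): Double = xs.sum.toDouble / xs.size

      val beforeMean: Double = mean(before.drop(warmUp))               // 3155.625
      val afterMean: Double  = mean(after.drop(warmUp))                // 2741.5
      val reduction: Double  = (beforeMean - afterMean) / beforeMean   // about 0.131

      def main(args: Array[String]): Unit =
        println(f"before: $beforeMean%.1f ms, after: $afterMean%.1f ms, reduction: ${reduction * 100}%.1f%%")
    }
    ```

    Under that assumption, the steady-state mean drops from about 3156 ms to
    about 2742 ms, a reduction of roughly 13%.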

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark rpc-opt

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16706.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16706
    
----
commit b373c103d623c985e03e5fc6e81d86a2c829bb0f
Author: Shixiong Zhu <[email protected]>
Date:   2017-01-25T23:47:01Z

    Optimize RequestMessage serialization

----

