[GitHub] spark issue #15713: [SPARK-18196] [CORE] Optimise CompactBuffer implementati...

a-roberts Thu, 01 Dec 2016 09:02:30 -0800

Github user a-roberts commented on the issue:

    https://github.com/apache/spark/pull/15713
  
    Performance results against the Spark master branch on a 48-core machine running PageRank with 500k pages follow.
    
    **Vanilla CompactBuffer, no changes (run time in seconds and throughput in bytes per second provided)**
    ```
    ScalaSparkPagerank 2016-12-01 13:16:09 259928115            47.933          5422738
    ScalaSparkPagerank 2016-12-01 13:22:41 259928115            45.551          5706309
    ScalaSparkPagerank 2016-12-01 13:26:31 259928115            46.745          5560554
    ScalaSparkPagerank 2016-12-01 13:28:58 259928115            51.699          5027720
    ScalaSparkPagerank 2016-12-01 13:33:26 259928115            48.415          5368751
    240.343s / 5 = 48.0686s avg
    ```
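The throughput column looks like the constant 259928115 value divided by the run time. A minimal Python sketch of that check follows, using the first vanilla row. Reading 259928115 as the input size in bytes is an assumption; the benchmark output does not label its columns, and `throughput` is a hypothetical helper name.

```python
# Assumption: the 259928115 column is the input size in bytes, and the
# final column is that size divided by the run time in seconds.
INPUT_BYTES = 259928115


def throughput(run_time_s: float) -> float:
    """Bytes processed per second for a single run."""
    return INPUT_BYTES / run_time_s


# First vanilla run: 47.933s, reported throughput 5422738 bytes per second.
reported = 5422738
computed = throughput(47.933)
print(f"computed={computed:.0f} reported={reported}")
```

If the assumption holds, throughput carries no information beyond run time, so comparing run times alone is enough.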
    
    **The commit here**
    ```
    ScalaSparkPagerank 2016-12-01 10:26:12 259928115            48.706          5336675
    ScalaSparkPagerank 2016-12-01 10:37:30 259928115            48.947          5310399
    ScalaSparkPagerank 2016-12-01 10:40:16 259928115            49.768          5222796
    ScalaSparkPagerank 2016-12-01 12:55:37 259928115            48.873          5318439
    ScalaSparkPagerank 2016-12-01 12:58:12 259928115            47.535          5468141
    243.829s / 5 = 48.7658s avg
    ```
    
    These two averages are way too similar, so I'm attributing the difference to benchmark noise. Without the 51s vanilla run, though, this commit would be a few percent slower than vanilla.
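To put numbers on the noise claim, here is a minimal Python sketch that compares the two averages with and without the 51s vanilla run. The run times are copied from the tables above; `pct_slower` is a hypothetical helper, and dropping a single run is only a rough sensitivity check, not a proper statistical test.

```python
from statistics import mean

# Run times in seconds, copied from the two result tables above.
vanilla = [47.933, 45.551, 46.745, 51.699, 48.415]
commit = [48.706, 48.947, 49.768, 48.873, 47.535]


def pct_slower(candidate_s: float, baseline_s: float) -> float:
    """How much slower the candidate is than the baseline, in percent."""
    return (candidate_s / baseline_s - 1.0) * 100.0


# Comparison using all five runs of each configuration.
all_runs = pct_slower(mean(commit), mean(vanilla))

# Same comparison with the slowest (51s) vanilla run left out.
vanilla_trimmed = [t for t in vanilla if t < 51.0]
without_outlier = pct_slower(mean(commit), mean(vanilla_trimmed))

print(f"all runs: {all_runs:.1f}% slower")
print(f"without the 51s vanilla run: {without_outlier:.1f}% slower")
```

With only five runs per configuration, a single slow run moves the comparison by a couple of percentage points, which is why the difference is hard to separate from noise.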
    
    **Use an ArrayBuffer (default initial capacity of 16) instead of CompactBuffer**
    ```
    ScalaSparkPagerank 2016-12-01 13:42:45 259928115            62.190          4179580
    ScalaSparkPagerank 2016-12-01 13:55:20 259928115            54.112          4803520
    ScalaSparkPagerank 2016-12-01 13:59:06 259928115            60.818          4273868
    ScalaSparkPagerank 2016-12-01 14:06:26 259928115            57.428          4526156
    ScalaSparkPagerank 2016-12-01 14:35:01 259928115            58.218          4464737
    292.766s / 5 = 58.5532s avg
    ```
    
    **Use an ArrayBuffer (initial capacity of 2) instead of CompactBuffer**
    ```
    ScalaSparkPagerank 2016-12-01 15:31:16 259928115            53.544          4854476
    ScalaSparkPagerank 2016-12-01 15:36:32 259928115            58.105          4473420
    ScalaSparkPagerank 2016-12-01 15:38:45 259928115            53.976          4815623
    ScalaSparkPagerank 2016-12-01 15:44:09 259928115            55.174          4711061
    ScalaSparkPagerank 2016-12-01 15:50:01 259928115            55.084          4718758
    275.883s / 5 = 55.1766s avg
    ```
    
    In my tests, using an ArrayBuffer is noticeably worse, so I'll continue to look into what's going on to see if we can improve performance here, as this is definitely a hot code path for this particular algorithm.
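For a sense of scale, the sketch below (Python, with run times copied from the tables above) expresses each ArrayBuffer variant's mean run time relative to vanilla CompactBuffer. The dictionary keys are just illustrative labels, and five runs per configuration is a small sample, so the percentages are rough.

```python
from statistics import mean

# Run times in seconds per configuration, copied from the tables above.
runs = {
    "vanilla CompactBuffer": [47.933, 45.551, 46.745, 51.699, 48.415],
    "ArrayBuffer (capacity 16)": [62.190, 54.112, 60.818, 57.428, 58.218],
    "ArrayBuffer (capacity 2)": [53.544, 58.105, 53.976, 55.174, 55.084],
}

baseline = mean(runs["vanilla CompactBuffer"])

# Percentage slowdown of each configuration's mean relative to vanilla.
slowdown = {
    name: (mean(times) / baseline - 1.0) * 100.0
    for name, times in runs.items()
}

for name, pct in slowdown.items():
    print(f"{name}: {mean(runs[name]):.3f}s avg, {pct:+.1f}% vs vanilla")
```

On these numbers the default-capacity ArrayBuffer is roughly 22% slower than vanilla and the capacity-2 ArrayBuffer roughly 15% slower, well outside the spread seen between the vanilla runs and this commit.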



