GitHub user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-52606422
I benchmarked as of 0d8ed5b, and this branch isn't conclusively faster than
`master`; the good news is that we've narrowed the gap I saw earlier
between `master` and v1.0.2 for small jobs.
Each bar here represents a test where I ran 100 back-to-back jobs, each
with 10 tasks, and varied the size of the task's closure (each bar is the
average of 10 runs, discarding the first run to allow for JIT warmup). The
closure sizes (x-axis) are empty (well, whatever the minimum size is), 1
megabyte, and 10 megabytes; the y-axis is time in seconds. This is running on 10
r3.2xlarge nodes in EC2. The test code is based on my modified version of
spark-perf
(https://github.com/JoshRosen/spark-perf/commit/0e768b2e03bfb3eeb421397e6e0fe93082879ef8)
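For reference, here is a minimal sketch of the kind of measurement loop I'm
describing. The names and structure are illustrative only; the actual
benchmark (including the 10-run averaging and warmup discard) lives in the
spark-perf commit linked above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClosureSizeBenchmarkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("closure-size-benchmark"))
    val numTasks = 10
    val numJobs = 100
    // Closure payload sizes: ~empty, 1 MB, 10 MB.
    for (payloadSize <- Seq(0, 1 << 20, 10 << 20)) {
      val payload = new Array[Byte](payloadSize) // captured by each task's closure
      val start = System.nanoTime()
      for (_ <- 1 to numJobs) {
        // Each count() is one job with `numTasks` tasks; `payload` gets
        // serialized into every task closure, which is what we're measuring.
        sc.parallelize(1 to numTasks, numTasks).map(i => i + payload.length).count()
      }
      val elapsedSec = (System.nanoTime() - start) / 1e9
      println(s"payload=$payloadSize bytes: $elapsedSec s for $numJobs jobs")
    }
    sc.stop()
  }
}
```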

Or, in tabular form, the means and standard deviations:
Keep in mind that each measurement covers 100 back-to-back jobs; for example,
v1.0.2 averaged about 9 ms per job (roughly 0.9 s total) for the small jobs.
I'll re-run these benchmarks tomorrow morning, when I'm less tired, to
make sure I haven't inadvertently misconfigured anything.