GitHub user a-roberts commented on the issue:
https://github.com/apache/spark/pull/15736
I've run a lot of performance tests and gathered .hcd files so I can
investigate this next week. So far it looks like either the first commit gives
the best performance, or my current configuration with this benchmark doesn't
let us tell whether our changes really make a difference.
Here is some of the raw data. Each record has the following fields: benchmark
name, date, time, data size in bytes (the same for every run), elapsed time in
seconds, and throughput in bytes per second.
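As a quick sanity check on the record format, the sketch below parses one record and confirms that throughput is simply data size divided by elapsed time. The `Run` type and the `parse_run` and `implied_throughput` helpers are hypothetical names used for illustration only. The sketch assumes each record sits on a single line with whitespace-separated fields and that elapsed time is in seconds, which the numbers themselves bear out.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """One benchmark record (hypothetical field names)."""
    benchmark: str
    date: str
    time: str
    size_bytes: int
    elapsed_s: float       # assumed to be seconds
    throughput_bps: int    # bytes per second


def parse_run(line: str) -> Run:
    """Split a whitespace-separated record into its six fields."""
    name, date, time, size, elapsed, throughput = line.split()
    return Run(name, date, time, int(size), float(elapsed), int(throughput))


def implied_throughput(run: Run) -> float:
    """Throughput recomputed as data size divided by elapsed time."""
    return run.size_bytes / run.elapsed_s


run = parse_run("ScalaSparkPagerank 2016-11-25 18:49:23 259928115 49.577 5242917")
print(run.throughput_bps, round(implied_throughput(run)))
```

The recomputed value agrees with the reported throughput to within rounding, so either column can be used to compare configurations.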
**With the above suggestions for `Partitioned*Buffer`**
```
ScalaSparkPagerank 2016-11-25 18:49:23 259928115 49.577 5242917
ScalaSparkPagerank 2016-11-25 18:56:55 259928115 49.946 5204182
ScalaSparkPagerank 2016-11-25 19:00:04 259928115 46.510 5588650
ScalaSparkPagerank 2016-11-25 19:02:23 259928115 49.018 5302707
ScalaSparkPagerank 2016-11-25 19:05:25 259928115 49.270 5275585
```
**Vanilla, no changes at all**
```
ScalaSparkPagerank 2016-11-25 19:08:45 259928115 48.068 5407508
ScalaSparkPagerank 2016-11-25 19:11:20 259928115 47.712 5447856
ScalaSparkPagerank 2016-11-25 19:13:50 259928115 44.517 5838850
ScalaSparkPagerank 2016-11-25 19:16:07 259928115 49.942 5204599
ScalaSparkPagerank 2016-11-25 19:19:08 259928115 48.521 5357023
```
**Original commit**
```
ScalaSparkPagerank 2016-11-25 19:47:59 259928115 45.486 5714464
ScalaSparkPagerank 2016-11-25 19:50:48 259928115 48.507 5358569
ScalaSparkPagerank 2016-11-25 19:53:09 259928115 47.063 5522982
ScalaSparkPagerank 2016-11-25 19:56:58 259928115 46.154 5631757
ScalaSparkPagerank 2016-11-25 20:00:01 259928115 48.935 5311701
```
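One rough way to judge whether these differences exceed run-to-run noise is to compare each configuration's mean elapsed time with its sample standard deviation. The sketch below does that using the elapsed times copied from the three runs above. The dictionary labels and the `summarise` helper are illustrative names only. Five runs per configuration is a small sample, so this is an indication rather than a proper significance test.

```python
from statistics import mean, stdev

# Elapsed times in seconds, copied from the three runs above.
elapsed = {
    "Partitioned*Buffer suggestions": [49.577, 49.946, 46.510, 49.018, 49.270],
    "Vanilla": [48.068, 47.712, 44.517, 49.942, 48.521],
    "Original commit": [45.486, 48.507, 47.063, 46.154, 48.935],
}


def summarise(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    return mean(samples), stdev(samples)


for label, samples in elapsed.items():
    m, s = summarise(samples)
    print(f"{label}: mean {m:.3f} s, stdev {s:.3f} s")
```

If the gaps between the means are about the same size as the standard deviations, as they are here, more runs or a quieter configuration would be needed before attributing a difference to the code changes.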
In Healthcenter I can see that these methods are still good candidates for
optimisation, as they are all called very frequently.
I'm open to more suggestions: I have exclusive access to plenty of hardware, can
easily produce more custom builds, and have lots of profiling software we can
use. I'll be committing code for SizeEstimator soon, as it's also a good
candidate for optimisation here.