GitHub user ooq opened a pull request:
https://github.com/apache/spark/pull/14266
[SPARK-16526] [SQL] Benchmarking Performance for Fast HashMap
Implementations and Set Knobs
## What changes were proposed in this pull request?
This is the third PR in a series to resolve SPARK-16523.
This patch adds benchmark tests for vectorized hashmap vs. row-based
hashmap (along with results in the comments). Those tests are ignored by
default because they take a long time to run.
We would also like to use the results to set the knob which switches
between vectorized and row-based hashmap.
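As a rough illustration of using benchmark results to set such a knob, the sketch below times two candidate implementations and picks the one with the lower median run time. It is not the benchmark code from this patch: `time_runs`, `pick_faster`, and the two stand-in aggregation functions are hypothetical names, and the stand-ins are plain Python counting loops, not Spark's hashmaps.

```python
import statistics
import time
from collections import Counter
from typing import Callable, Dict, List


def time_runs(fn: Callable[[], object], runs: int = 5) -> List[float]:
    """Return the wall-clock seconds taken by each of `runs` calls to fn."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def pick_faster(samples: Dict[str, List[float]]) -> str:
    """Return the name of the candidate with the lowest median run time."""
    return min(samples, key=lambda name: statistics.median(samples[name]))


# Hypothetical stand-ins for the two variants under test; both count keys.
def count_with_dict(keys: List[int]) -> Dict[int, int]:
    counts: Dict[int, int] = {}
    for k in keys:
        counts[k] = counts.get(k, 0) + 1
    return counts


def count_with_counter(keys: List[int]) -> Dict[int, int]:
    return dict(Counter(keys))


if __name__ == "__main__":
    keys = [i % 1000 for i in range(200_000)]
    results = {
        "dict_loop": time_runs(lambda: count_with_dict(keys)),
        "counter": time_runs(lambda: count_with_counter(keys)),
    }
    # The chosen name plays the role of the knob's default setting.
    print("faster candidate:", pick_faster(results))
```

Using the median, not the mean, keeps a single slow run (for example, a warm-up iteration) from deciding the outcome.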
## How was this patch tested?
This patch consists mostly of tests itself.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ooq/spark rowbasedfastaggmap-pr3
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14266.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14266
----
commit c87f26b318b5d673ac95454df5c1cb9a56c677eb
Author: Qifan Pu <[email protected]>
Date: 2016-07-13T07:35:06Z
add RowBatch and RowBasedHashMapGenerator
commit a3360e0ab1223dd43f891e755e648680a402b7df
Author: Qifan Pu <[email protected]>
Date: 2016-07-13T08:08:35Z
enable row based hashmap
commit 45641e5a7df341522518b19bf4a4662d14d64b48
Author: Qifan Pu <[email protected]>
Date: 2016-07-13T08:52:31Z
fix scale codestyle
commit b94fc6383f0727ce4249653550833fd3f0019a65
Author: Qifan Pu <[email protected]>
Date: 2016-07-13T08:53:11Z
merge fix
commit 9b0b294013239f4db744d7f5f5c1bdf838dd0559
Author: Qifan Pu <[email protected]>
Date: 2016-07-13T08:55:53Z
fix indent
commit 24248b190745bef13c567bd2681164d990d31cf3
Author: Qifan Pu <[email protected]>
Date: 2016-07-14T18:18:33Z
add SimpleRowBatch for performance
commit 9008725af8159ac186e0c7f81b08b85ddd7a0ec7
Author: Qifan Pu <[email protected]>
Date: 2016-07-14T18:19:36Z
a number of minor fixs
commit 4bdaeada70a20f89f6c593a4fc0298597e9a43cd
Author: Qifan Pu <[email protected]>
Date: 2016-07-14T18:58:08Z
Merge branch 'master' of github.com:apache/spark into rowbasedfastaggmap-pr2
commit 225b6619cd070ac9da3846a3bd02fa730e4ec835
Author: Qifan Pu <[email protected]>
Date: 2016-07-14T20:53:28Z
fix bug
commit bb4678856ebc1d729e530b9a1949ca9211c6a92e
Author: Qifan Pu <[email protected]>
Date: 2016-07-15T17:43:40Z
return row
commit a158125956627e502a8045fb077760063a3ca397
Author: Qifan Pu <[email protected]>
Date: 2016-07-15T17:45:16Z
simply fash hash map condition check
commit 22d8afd7dbd187b85e6f0c0d51544f0234d4beac
Author: Qifan Pu <[email protected]>
Date: 2016-07-15T17:52:36Z
update data structures to be consistent with what is used
commit ecff4ff3f30aefbaea89a12d2d5b3fda062b0f38
Author: Qifan Pu <[email protected]>
Date: 2016-07-17T23:39:19Z
Update simple row batch to improve performance & use SimpleRowBatch by
default
commit 33b2910fa412669b2460b99ba0b6232f462e7879
Author: Qifan Pu <[email protected]>
Date: 2016-07-17T23:57:41Z
add simplerowbatch
commit 2c1973a872e5b8d99a55234724ec24acbc5f70ff
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T07:58:14Z
Add tests for SimpleRowBatch
commit ce72d900004bfa720460126a3573642a8a97bc53
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T08:00:11Z
keep in sync with pr1
commit 43cf549c27451209fc3fe4c8bb726fcfb2d7501c
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T09:59:03Z
Add benchmarks for comparing hashmaps
commit 6515c3dc8b6f4084f66259f18af362fccb436157
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T18:21:01Z
simply free page in iterator
commit 8f538b177e36ccc5fb690a3b29eb03ca72d1a4b2
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T19:00:00Z
Clean logic in SimpleRowBatch that was supposedly to deal with multiple
pages
commit 461028e62c9d9821cf11abdb9d85e9a8edb58ba4
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T19:01:36Z
update with pr1
commit 774e088dc719cbd4d4ef97995656ec912b11878a
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T19:02:39Z
update with pr1
commit 251d3919ed1b7dccacccc9bee6e121954a698cdd
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T22:09:47Z
shrink findOrInsert() code size
commit 708f7bb3790556a596f6de51f127e99cd6f11662
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T22:12:29Z
update some benchmarking results
commit d9394888977c97fe95f1642ad9f613dcbee1e4fa
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T22:25:56Z
remove Rowbatch; renaming SimpleRowBatch to RowBasedKeyValueBatch
commit 02e4ab1c76cc777ef84cacf894f063505a19fffa
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T22:26:28Z
Merge branch 'rowbasedfastaggmap-pr1' of github.com:ooq/spark into
rowbasedfastaggmap-pr3
commit 60e78bd477a90892b8568c1da08d7b0e5fe3672a
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T22:42:10Z
update benchmark
commit 20baf3e24699589342e14b1e8f2c90fec85d183b
Author: Qifan Pu <[email protected]>
Date: 2016-07-18T23:13:29Z
update benchmark
commit 3b3c9ea6dfc17ba4ebd562c70608be02f22693f9
Author: Qifan Pu <[email protected]>
Date: 2016-07-19T16:15:02Z
Add benchmark results for vectorized vs. rowbased hashmap
----