GitHub user yhuai opened a pull request:
https://github.com/apache/spark/pull/7813
[SPARK-9240] [SQL] [WIP] Hybrid aggregate operator using unsafe row
This PR adds a base aggregation iterator, `AggregationIterator`, which is
used to create `SortBasedAggregationIterator` (for sort-based aggregation) and
`UnsafeHybridAggregationIterator` (which first tries hash-based aggregation and,
if the memory consumption of the map exceeds a certain threshold, falls back to
sort-based aggregation). With these two iterators, the existing iterators are
no longer needed, and I am removing them. Also, we can use a single physical
`Aggregate` operator that internally determines which iterator to use.
https://issues.apache.org/jira/browse/SPARK-9240
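To illustrate the fallback idea described above, here is a minimal sketch in Python. It is not the code in this PR: the function names, the `max_groups` parameter (a stand-in for the memory threshold, counting groups rather than bytes), and the choice of a sum aggregate are all assumptions made for illustration.

```python
from itertools import chain, groupby
from operator import itemgetter


def sort_based_sum(pairs):
    """Sort (key, value) pairs by key, then sum each run of equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def hybrid_sum(pairs, max_groups):
    """Sum values per key: hash-based first, sort-based past the threshold.

    max_groups is a hypothetical stand-in for the map's memory threshold.
    Returns the aggregated (key, sum) pairs and the strategy that finished.
    """
    pairs = iter(pairs)
    table = {}
    for key, value in pairs:
        if key not in table and len(table) >= max_groups:
            # Threshold reached: hand the partial sums, the current row, and
            # the unread input to the sort-based path. Sums merge correctly
            # because addition is associative.
            remaining = chain(table.items(), [(key, value)], pairs)
            return list(sort_based_sum(remaining)), "sort"
        table[key] = table.get(key, 0) + value
    return sorted(table.items()), "hash"
```

With a generous threshold the input is aggregated entirely in the hash table; with a small one, the partial results are carried into the sort-based path and merged with the rest of the input there.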
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yhuai/spark AggregateOperator
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/7813.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #7813
----
commit d95daf021e3d8fdc7407c11c0ab86309f3a9340c
Author: Yin Huai <[email protected]>
Date: 2015-07-31T01:46:49Z
Create a base iterator class for aggregation iterators and add the initial
version of the hybrid iterator.
commit 84ceb3ae5b9d96a12001698a8c46d19764734f44
Author: Yin Huai <[email protected]>
Date: 2015-07-31T01:57:52Z
First round cleanup.
----