Github user liancheng commented on the issue:
https://github.com/apache/spark/pull/15590
@hvanhovell That's a great point.
This was actually one of my pain points while writing this new operator.
The main problems are:
1. `HashAggregateExec` and `SortAggregateExec` have some similar code paths,
such as merging multiple external sorters, but they can't really share code.
1. While prototyping `ObjectHashAggregateExec`, I made the first version of the
operator behave more like `HashAggregateExec`, which keeps feeding the remaining
input rows into new hash maps after falling back to sort-based aggregation.
However, I found it quite hard to reuse any code paths besides
`AggregationIterator`, mostly because `HashAggregateExec` is highly specialized
for the unsafe row format.
The fallback logic of the current version is closer to `SortAggregateExec`:
once it falls back, it no longer builds any new hash maps. I think it's possible
to unify this part of the code path with `SortAggregateExec`, and I'd like to do
this in a follow-up PR.
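To make the difference between the two fallback styles concrete, here is a minimal Python sketch of the "no new hash maps after fallback" behavior. This is not Spark's code. The function name `aggregate_with_fallback`, the distinct-key `threshold`, and the use of sum as the aggregate function are all hypothetical. In-memory `sorted()` stands in for the external sorters a real engine would use. Once the hash map is full, its partial results become one sorted run and the remaining input rows another. The two runs are merged by key and aggregated group by group, the same merge-sorted-runs step that the sort-based path needs.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def aggregate_with_fallback(rows, threshold):
    """Sum values per key (hypothetical sketch).

    Hash-based aggregation is used until the map holds `threshold` distinct
    keys; after that, the rest of the input is aggregated sort-based and no
    new hash map is ever built.
    """
    rows = iter(rows)
    hash_map = {}
    pending = None
    for key, value in rows:
        if key not in hash_map and len(hash_map) >= threshold:
            # Hash map is "full": fall back, keeping the row that triggered it.
            pending = [(key, value)]
            break
        hash_map[key] = hash_map.get(key, 0) + value

    if pending is None:
        # Never fell back; sort only to make the output order deterministic.
        return sorted(hash_map.items())

    # Sorted run 1: partial aggregates already held in the hash map.
    spilled = sorted(hash_map.items(), key=itemgetter(0))
    # Sorted run 2: all remaining raw input rows (an external sorter in a real
    # engine). `rows` resumes right after the row that triggered the fallback.
    pending.extend(rows)
    remaining = sorted(pending, key=itemgetter(0))
    # Sort-based aggregation: merge the sorted runs and sum consecutive rows
    # that share a key. Summing works for both partial and raw values here
    # only because sum is associative.
    merged = heapq.merge(spilled, remaining, key=itemgetter(0))
    return [(key, sum(v for _, v in group))
            for key, group in groupby(merged, key=itemgetter(0))]
```

With `rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]`, `threshold=2` falls back on `("c", 4)` but still returns `[("a", 10), ("b", 7), ("c", 4)]`, the same result as a purely hash-based run with a large threshold. The `HashAggregateExec`-style alternative would spill the full map to a sorter and start a fresh hash map instead of sorting all remaining rows directly.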