[GitHub] spark issue #15590: [SPARK-17949][SQL] A JVM object based aggregate operator

liancheng Wed, 26 Oct 2016 10:55:56 -0700

Github user liancheng commented on the issue:

    https://github.com/apache/spark/pull/15590
  
    @hvanhovell That's a great point.
    
    This was actually one of my pain points while writing this new operator. 
The problems are:
    
    1. `HashAggregateExec` and `SortAggregateExec` have some similar code paths, 
like merging multiple external sorters, but can't really share the same code.
    1. While prototyping `ObjectHashAggregateExec`, the first version of the 
operator behaved more like `HashAggregateExec`, which keeps feeding the 
remaining input rows into new hash maps after falling back to sort-based 
aggregation. However, I found it quite hard to reuse any code paths besides 
`AggregationIterator`, mostly because `HashAggregateExec` is highly 
specialized for the unsafe format.
    
    The fallback logic of the current version is more consistent with 
`SortAggregateExec`: it no longer builds any further hash maps after falling 
back. I think it's possible to unify this part of the code path with 
`SortAggregateExec`. I'd like to do this in a follow-up PR.
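
    To illustrate the fallback behavior described above, here is a minimal 
sketch in plain Python rather than Spark code. The function name 
`aggregate_with_fallback`, the `fallback_threshold` parameter, and the use of a 
simple sum as the aggregate are all assumptions made for illustration; the real 
operator works on aggregation buffers and external sorters.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_with_fallback(rows, fallback_threshold):
    """Sum values per key (hypothetical helper, not a Spark API).

    Starts with hash-based aggregation and switches to sort-based
    aggregation once the hash map holds `fallback_threshold` distinct keys.
    """
    buffers = {}
    it = iter(rows)
    for key, value in it:
        buffers[key] = buffers.get(key, 0) + value
        if len(buffers) >= fallback_threshold:
            break  # hash map is "full": fall back
    else:
        # Input exhausted without ever falling back.
        return dict(buffers)

    # Fallback: build no further hash maps. The partial buffers and the
    # remaining input rows are sorted by key together, then merged per key.
    spilled = list(buffers.items()) + list(it)
    spilled.sort(key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(spilled, key=itemgetter(0))
    }
```

    Both paths should produce the same result; only the strategy for the rows 
that arrive after the threshold is reached differs. A `HashAggregateExec`-style 
variant would instead start a fresh hash map at that point and merge the 
sorted runs at the end.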



