Github user nburoojy commented on the pull request:
https://github.com/apache/spark/pull/8592#issuecomment-154511621
Thanks for the review, @marmbrus!
I've sent https://github.com/apache/spark/pull/9526, which follows your
suggestion to alias the Hive UDAFs, and I'd like to include it in the 1.6 release.
Longer term (beyond 1.6), I'd like to solve the core issue.
For my particular use case I would like the ability to aggregate compound
types (struct and array), and it appears Hive 0.13.0 does not support this.
What kind of major changes would we have to make to support O(1) array
insertion? I was thinking that a strategy like the one
[CollectHashSet](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala)
uses would also work here; that is, I would implement a `CompactBufferUDT`
(backed by
[CompactBuffer](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/CompactBuffer.scala)),
and `updateExpressions` would append to the buffer in amortized `O(1)` time.
Would this strategy break assumptions in the new aggregation framework? Do
you think this change is larger than I expect?
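
To make the question concrete, here is a minimal sketch of the kind of append-only buffer I have in mind. It is not Spark's `CompactBuffer` and it does not plug into the aggregation framework: `GrowableBuffer`, `Rec`, and the method names are hypothetical, chosen only to illustrate amortized `O(1)` append through array doubling and the merging of two partial buffers.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for a CompactBuffer-backed aggregation buffer.
// This is NOT Spark's CompactBuffer; it only illustrates the idea.
final class GrowableBuffer[T: ClassTag](initialCapacity: Int = 4) {
  private var elems: Array[T] = new Array[T](math.max(1, initialCapacity))
  private var count: Int = 0

  def size: Int = count

  // Append one value. The backing array doubles when it is full, so the
  // copying cost averages out to O(1) per append.
  def +=(value: T): this.type = {
    if (count == elems.length) {
      val grown = new Array[T](elems.length * 2)
      Array.copy(elems, 0, grown, 0, count)
      elems = grown
    }
    elems(count) = value
    count += 1
    this
  }

  // Merge another partial buffer into this one, as a merge step would.
  def ++=(other: GrowableBuffer[T]): this.type = {
    val n = other.size // captured up front so merging a buffer into itself terminates
    var i = 0
    while (i < n) {
      this += other(i)
      i += 1
    }
    this
  }

  def apply(i: Int): T = {
    if (i < 0 || i >= count) throw new IndexOutOfBoundsException(i.toString)
    elems(i)
  }

  def toList: List[T] = elems.take(count).toList
}

// Hypothetical compound value (a struct containing an array).
final case class Rec(name: String, values: Seq[Int])

// "Update" step: each input row is appended to a per-group buffer.
val partialA = new GrowableBuffer[Rec]()
partialA += Rec("a", Seq(1, 2))
partialA += Rec("b", Seq(3))

// "Merge" step: buffers from two partitions are combined.
val partialB = new GrowableBuffer[Rec]()
partialB += Rec("c", Seq.empty)
partialA ++= partialB
```

The open question is whether a buffer like this can live inside the aggregation buffer row at all, which is what the UDT wrapper would be for.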