GitHub user hvanhovell commented on the pull request:
https://github.com/apache/spark/pull/12874#issuecomment-218726402
I am not sure there is enough support to add this to Spark. The problem is
that this is a potential source of OOMEs, and that it relies on the specific
behavior of the `SortBasedAggregate` code path. The same is true of Hive's
`collect*` functions, however, and this PR **is** an improvement over its
`Hive` counterparts. Do you have a very pressing use case for this?
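To make the OOME concern concrete, here is a minimal sketch in plain Python (not Spark code; `sort_based_collect_list` is a hypothetical name) of a `collect_list`-style aggregate evaluated over input sorted by key. Only the current group's buffer is live at any time, but that buffer must hold every value of the group, so memory grows with the size of the largest group rather than staying bounded.

```python
from itertools import groupby
from operator import itemgetter


def sort_based_collect_list(rows):
    """Hypothetical collect_list over (key, value) rows, sort-based style.

    Rows are sorted by key (stable, so per-key input order is kept) and then
    streamed group by group. Each group's values are gathered into a single
    list, so the whole group has to fit in memory at once.
    """
    result = []
    for key, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        buffer = [value for _, value in group]  # every value of this group in memory
        result.append((key, buffer))
    return result
```

A single very large group is enough to exhaust memory here, no matter how many groups the input has in total.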
A more fruitful approach would be to implement a dedicated operator for
this. That would eliminate the reliance on `SortBasedAggregate`, but it
could still cause OOMEs (we could spill the elements to disk, but the
resulting row still has to fit into main memory).
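The limitation of spilling can be sketched in plain Python as well. This is a hypothetical `collect_with_spill` helper, not a Spark operator, and the `spill_threshold` parameter is an illustrative assumption. It bounds memory while consuming input by writing full chunks to a temporary file, but because the aggregate's output is one list, every spilled element has to be read back into memory when the result row is produced.

```python
import pickle
import tempfile


def collect_with_spill(values, spill_threshold=2):
    """Hypothetical collect operator that spills while consuming its input.

    At most spill_threshold values are buffered in memory during the scan;
    full chunks are pickled to a temporary file. Producing the output still
    loads every value back into one in-memory list (the "result row").
    """
    in_memory = []
    spilled_chunks = 0
    with tempfile.TemporaryFile() as spill_file:
        for v in values:
            in_memory.append(v)
            if len(in_memory) >= spill_threshold:
                pickle.dump(in_memory, spill_file)  # spill this chunk to disk
                spilled_chunks += 1
                in_memory = []
        # Materialize the result row: read all spilled chunks back, in order.
        spill_file.seek(0)
        result = []
        for _ in range(spilled_chunks):
            result.extend(pickle.load(spill_file))
        result.extend(in_memory)  # values never spilled
    return result
```

Spilling only moves the peak from the scan to the final step; the returned list is as large as the whole group either way.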
@rxin what is your take on this?