[GitHub] spark pull request: [SPARK-10605][SQL] Create native collect_list/...

hvanhovell Thu, 12 May 2016 04:05:43 -0700

Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/12874#issuecomment-218726402
  
    I am not sure there is enough support to add this to Spark. The thing is 
that this is a potential source of OOMEs and that it banks on the specific 
behavior of the `SortBasedAggregate` code path; this is, however, also true of 
Hive's `collect*` functions, and this PR **is** an improvement over its `Hive` 
counterparts. Do you have a very pressing use case for this?
    
    A more fruitful approach would be to implement a dedicated operator for 
this. This would eliminate the reliance on `SortBasedAggregate`, but it 
would still be capable of causing OOMEs (we could spill the elements to disk, 
but the resulting row still has to fit into main memory).
    
    @rxin what is your take on this?




