Holden Karau created SPARK-48362:
------------------------------------

             Summary: Add CollectSetWithLimit
                 Key: SPARK-48362
                 URL: https://issues.apache.org/jira/browse/SPARK-48362
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Holden Karau


See https://stackoverflow.com/questions/38730912/how-to-limit-functions-collect-set-in-spark-sql

Some users want to collect a set of bounded size. Today they have to collect 
the full set and then trim it, and if the number of distinct elements is too 
large this can fail with a "Cannot grow BufferHolder" error.

 

We should offer a collect_set variant that stops adding elements once the 
requested limit is reached, so that memory use stays bounded by the limit.
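
As a rough illustration of the requested behaviour (not Spark's implementation, 
and not a proposed API), the sketch below models a per-partition aggregation 
buffer that refuses to grow past a limit, both when updating with a new value 
and when merging partial buffers. The names BoundedSetBuffer and 
collect_set_with_limit are hypothetical, and plain Python lists stand in for 
partitions. Skipping None mirrors how collect_set ignores nulls.

```python
from typing import Hashable, Iterable, Set


class BoundedSetBuffer:
    """Hypothetical aggregation buffer holding at most `limit` distinct values."""

    def __init__(self, limit: int) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.values: Set[Hashable] = set()

    def update(self, value: Hashable) -> None:
        # Ignore nulls, and stop adding once the cap is reached so the
        # buffer never holds more than `limit` elements.
        if value is None or len(self.values) >= self.limit:
            return
        self.values.add(value)

    def merge(self, other: "BoundedSetBuffer") -> None:
        # Combining partial results must respect the cap as well.
        for value in other.values:
            if len(self.values) >= self.limit:
                break
            self.values.add(value)


def collect_set_with_limit(
    partitions: Iterable[Iterable[Hashable]], limit: int
) -> Set[Hashable]:
    """Aggregate each partition into a bounded buffer, then merge the buffers."""
    merged = BoundedSetBuffer(limit)
    for partition in partitions:
        partial = BoundedSetBuffer(limit)
        for value in partition:
            partial.update(value)
        merged.merge(partial)
    return merged.values
```

Which elements survive is arbitrary in this sketch: it keeps whichever distinct 
values it sees first, so the result is only guaranteed to be a subset of the 
full set with at most `limit` elements.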



