Liang-Chi Hsieh created SPARK-18487:
---------------------------------------

             Summary: Consume all elements for Dataset.show/take to avoid memory leak
                 Key: SPARK-18487
                 URL: https://issues.apache.org/jira/browse/SPARK-18487
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Liang-Chi Hsieh


Methods such as Dataset.show and take use Limit (CollectLimitExec), which 
leverages SparkPlan.executeTake to efficiently collect only the required number 
of elements back to the driver.
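As a rough illustration of this incremental collection, the Python sketch below 
models partitions as plain iterables. It is a simplified model, not Spark's code: 
the function name `execute_take` and the fixed `scale_up_factor` growth are 
assumptions made for illustration, and Spark's actual heuristic for choosing how 
many partitions to scan next is more involved. The point it shows is that each 
partition's iterator is pulled only for as many rows as are still needed.

```python
from itertools import islice
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def execute_take(partitions: Sequence[Iterable[T]], n: int,
                 scale_up_factor: int = 4) -> List[T]:
    """Hypothetical, simplified model of collecting the first n rows."""
    buf: List[T] = []
    scanned = 0
    batch = 1  # start by scanning a single partition
    while len(buf) < n and scanned < len(partitions):
        for part in partitions[scanned:scanned + batch]:
            left = n - len(buf)
            if left <= 0:
                break
            # Take at most `left` rows; the rest of this partition's
            # iterator is never pulled (i.e. never fully consumed).
            buf.extend(islice(iter(part), left))
        scanned += batch
        batch *= scale_up_factor  # try more partitions in the next round
    return buf[:n]


print(execute_take([[1, 2], [3], [4, 5, 6], [7]], 4))  # [1, 2, 3, 4]
```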

However, under whole-stage codegen, operators such as HashAggregate usually 
release their resources only after all elements have been consumed. Because 
executeTake stops pulling elements once it has enough, those resources are not 
released, which causes a memory leak with Dataset.show, for example.
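The failure mode can be modeled with a small Python iterator that frees its 
(simulated) memory only when it is exhausted. The class name 
`ResourceTrackingIterator` and its `released` flag are hypothetical stand-ins for 
a generated operator such as a hash aggregate; they are not Spark APIs.

```python
from itertools import islice
from typing import Iterator, List


class ResourceTrackingIterator:
    """Hypothetical operator that frees its buffer only after the last row."""

    def __init__(self, rows: List[int]) -> None:
        self._rows: Iterator[int] = iter(rows)
        self.released = False  # becomes True once the backing memory is freed

    def __iter__(self) -> "ResourceTrackingIterator":
        return self

    def __next__(self) -> int:
        try:
            return next(self._rows)
        except StopIteration:
            # Cleanup runs only when the consumer reaches the end.
            self.released = True
            raise


it = ResourceTrackingIterator([1, 2, 3, 4, 5])
first_two = list(islice(it, 2))  # like show/take: stop after the needed rows
print(first_two, it.released)    # [1, 2] False: the buffer was never freed
```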

We should consume all remaining elements in the iterator to avoid the memory leak.
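A minimal Python sketch of the proposed approach, under the same modeling 
assumptions: take the needed rows, then drain the rest of the iterator so that 
end-of-iteration cleanup runs. The helper `take_and_drain` and the 
`ResourceTrackingIterator` class are hypothetical illustrations, not Spark code.

```python
from collections import deque
from itertools import islice
from typing import Iterator, List


class ResourceTrackingIterator:
    """Hypothetical operator that frees its buffer only after the last row."""

    def __init__(self, rows: List[int]) -> None:
        self._rows: Iterator[int] = iter(rows)
        self.released = False

    def __iter__(self) -> "ResourceTrackingIterator":
        return self

    def __next__(self) -> int:
        try:
            return next(self._rows)
        except StopIteration:
            self.released = True  # cleanup on exhaustion
            raise


def take_and_drain(it: Iterator[int], n: int) -> List[int]:
    taken = list(islice(it, n))
    # Consume the remaining rows without storing them, so cleanup runs.
    deque(it, maxlen=0)
    return taken


it = ResourceTrackingIterator([1, 2, 3, 4, 5])
print(take_and_drain(it, 2), it.released)  # [1, 2] True
```

Draining still computes rows that are then discarded, so its cost grows with the 
number of unconsumed elements; an explicit cleanup hook would avoid that work, at 
the price of every operator having to implement it.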




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)