[ https://issues.apache.org/jira/browse/SPARK-27041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-27041.
-------------------------------
    Resolution: Fixed
 Fix Version/s: 3.0.0

Issue resolved by pull request 23954
[https://github.com/apache/spark/pull/23954]

> large partition data cause pyspark with python2.x oom
> -----------------------------------------------------
>
>                 Key: SPARK-27041
>                 URL: https://issues.apache.org/jira/browse/SPARK-27041
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: David Yang
>            Assignee: David Yang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> With a large partition, PySpark may exceed the executor memory limit and
> trigger an out-of-memory error on Python 2.7.
> This happens because map() is used: unlike in Python 3.x, map() in Python 2.7
> builds a full list, so all partition data must be read into memory at once.
> The proposed fix uses imap on Python 2.7 instead, and it has been verified.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
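The pattern behind the fix can be sketched as follows. This is a minimal illustration of the idea (replace the eager Python 2 map() with the lazy itertools.imap), not the actual code from pull request 23954; the names lazy_map and serialize_partition are hypothetical.

```python
import sys

# On Python 2, the built-in map() materializes its entire result as a list,
# so a large partition is held in memory all at once. itertools.imap yields
# elements one at a time instead. On Python 3, map() is already lazy.
if sys.version_info[0] < 3:
    from itertools import imap as lazy_map  # Python 2: lazy map variant
else:
    lazy_map = map  # Python 3: map() returns a lazy iterator

def serialize_partition(records):
    # Hypothetical stand-in for per-record processing: returns an iterator,
    # so records are transformed on demand and the full partition is never
    # built up as one in-memory list.
    return lazy_map(lambda r: r * 2, records)

stream = serialize_partition(range(5))
# 'stream' is an iterator, not a list; consuming it yields 0, 2, 4, 6, 8
```

On Python 2.7, swapping map for imap this way keeps memory usage proportional to a single record rather than to the whole partition, which is what avoids the executor OOM described above.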