TigerYang414 opened a new pull request #23954: [SPARK-27041][PySpark] Use 
imap() for python 2.x to resolve oom issue
URL: https://github.com/apache/spark/pull/23954
 
 
   ## What changes were proposed in this pull request?
   
   With large partitions, PySpark may exceed the executor memory limit and 
run out of memory on Python 2.7.
   This is because map() is used. Unlike in Python 3.x, map() in Python 2.7 
builds a full list, so all of the data must be read into memory at once.
   
   The proposed fix uses itertools.imap() on Python 2.7, which evaluates 
lazily like the Python 3.x map(); it has been verified manually.
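
A minimal sketch of the idea, not the actual patch: on Python 2 the lazy `itertools.imap` is selected, and on Python 3 the built-in `map` (already lazy) is used. The names `lazy_map`, `records`, and `first_doubled` are hypothetical and exist only for illustration; the unbounded generator stands in for a partition too large to hold in memory.

```python
import sys
from itertools import islice

if sys.version_info[0] < 3:
    # Python 2: the built-in map() builds a full list, so use the lazy variant.
    from itertools import imap as lazy_map
else:
    # Python 3: the built-in map() already returns a lazy iterator.
    lazy_map = map


def records():
    """Hypothetical stand-in for a very large partition: an endless stream."""
    n = 0
    while True:
        yield n
        n += 1


def first_doubled(count):
    """Double each record lazily and return only the first `count` results."""
    doubled = lazy_map(lambda x: x * 2, records())
    # Only `count` records are ever pulled from the stream.
    return list(islice(doubled, count))


if __name__ == "__main__":
    print(first_doubled(3))
```

With an eager Python 2 `map()` in place of `lazy_map`, the call above would try to consume the whole stream before returning anything, which is the memory behaviour described in this pull request.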
   
   ## How was this patch tested?
   Manual test.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
