Using the pyspark CLI on Spark 2.1.1, I'm running into out-of-memory errors when 
running a UDF on a recordset of only 10 rows, where the UDF maps every row to the 
same value (arbitrary, for testing purposes). This is on Amazon EMR release label 
5.6.0 with the following hardware specs:

m4.4xlarge
32 vCPU, 64 GiB memory, EBS only storage
EBS storage: 100 GiB

Help?

