Using pyspark cli on spark 2.1.1 I’m getting out of memory issues when running the udf function on a recordset count of 10 with a mapping of the same value (arbirtrary for testing purposes). This is on amazon EMR release label 5.6.0 with the following hardware specs
m4.4xlarge 32 vCPU, 64 GiB memory, EBS only storage EBS Storage:100 GiB Help? This message is confidential, intended only for the named recipient(s) and may contain information that is privileged or exempt from disclosure under applicable law. If you are not the intended recipient(s), you are notified that the dissemination, distribution, or copying of this message is strictly prohibited. If you receive this message in error or are not the named recipient(s), please notify the sender by return email and delete this message. Thank you.