You can actually look at the code base:
https://github.com/apache/spark/blob/f85aa06464a10f5d1563302fd76465dded475a12/python/pyspark/rdd.py#L1825
The _memory_limit function returns the amount of memory you set with
spark.python.worker.memory, and it is used by groupBy and similar operations.
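For what it's worth, here is a minimal sketch (the app name and values are
placeholders, not anything from your setup) of where that setting comes in:

from pyspark import SparkConf, SparkContext

# Sketch: spark.python.worker.memory is the per-Python-worker limit that
# _memory_limit reads back; shuffle-heavy ops spill to disk past it.
conf = (SparkConf()
        .setAppName("memory-limit-demo")
        .set("spark.python.worker.memory", "512m"))  # 512m is the default
sc = SparkContext(conf=conf)

rdd = sc.parallelize(range(1000)).map(lambda x: (x % 10, x))
# groupByKey is one of the operations whose external merge honors the
# per-worker memory limit before spilling.
grouped = rdd.groupByKey().mapValues(list).collect()
sc.stop()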
Thanks
Hi all,
I am running a simple word count job on a cluster of 4 nodes (24 cores per
node). I am varying two parameters in the configuration:
spark.python.worker.memory and the number of partitions in the RDD. My job
is written in Python; a minimal sketch of it follows.
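Here is roughly what the job looks like (the input/output paths and the
exact parameter values are placeholders; 96 is just 4 nodes x 24 cores):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("wordcount")
        .set("spark.python.worker.memory", "1g"))  # first varied parameter
sc = SparkContext(conf=conf)

num_partitions = 96  # second varied parameter: partitions in the RDD
counts = (sc.textFile("hdfs:///path/to/input", minPartitions=num_partitions)
            .flatMap(lambda line: line.split())
            .map(lambda w: (w, 1))
            .reduceByKey(lambda a, b: a + b, numPartitions=num_partitions))
counts.saveAsTextFile("hdfs:///path/to/output")
sc.stop()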
I am observing a discontinuity in the run time of the job when…