Re: spark.python.worker.memory Discontinuity

2015-11-01 Thread Akhil Das
You can actually look at the code here: https://github.com/apache/spark/blob/f85aa06464a10f5d1563302fd76465dded475a12/python/pyspark/rdd.py#L1825. The _memory_limit function returns the amount of memory that you set with spark.python.worker.memory, and it is used for groupBy and similar operations. Thanks
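
For reference, a minimal sketch of what that helper does, modeled on _memory_limit in pyspark/rdd.py. The 512m default and the unit parsing shown here are assumptions based on the documented default for spark.python.worker.memory, not a verbatim copy of the linked source:

```python
# Simplified sketch of how PySpark resolves the per-worker memory limit
# used by groupBy-style operators before spilling to disk.
# Assumption: the default is "512m" and values use Java-style units (k/m/g/t).

def _parse_memory(s):
    """Convert a memory string such as '512m' or '2g' into MiB."""
    units = {'g': 1024, 'm': 1, 't': 1 << 20, 'k': 1.0 / 1024}
    if s[-1].lower() not in units:
        raise ValueError("invalid format: " + s)
    return int(float(s[:-1]) * units[s[-1].lower()])

def _memory_limit(sc):
    """Return the limit (in MiB) set via spark.python.worker.memory."""
    return _parse_memory(sc._conf.get("spark.python.worker.memory", "512m"))
```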

spark.python.worker.memory Discontinuity

2015-10-23 Thread Connor Zanin
Hi all, I am running a simple word count job on a cluster of 4 nodes (24 cores per node). I am varying two parameters in the configuration: spark.python.worker.memory and the number of partitions in the RDD. My job is written in Python. I am observing a discontinuity in the run time of the job when …
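
For context, a PySpark word count of roughly the shape described here might look like the following. The input path, the memory value, and the partition count are illustrative placeholders; the actual job and cluster settings are Connor's and are not shown in this message:

```python
# Hypothetical reconstruction of the kind of job described above.
# Placeholders: the HDFS paths, the "512m" setting, and num_partitions.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("wordcount-memory-sweep")
        .set("spark.python.worker.memory", "512m"))  # first parameter being varied
sc = SparkContext(conf=conf)

num_partitions = 96  # second parameter being varied (e.g. 4 nodes x 24 cores)

counts = (sc.textFile("hdfs:///path/to/input.txt", minPartitions=num_partitions)
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b, numPartitions=num_partitions))

counts.saveAsTextFile("hdfs:///path/to/output")
sc.stop()
```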