Hi all, I need some help with a PySpark problem that is causing us a major issue.
Recently I've noticed that the behaviour of the pyspark.daemon processes on the worker nodes has changed for compute-intensive tasks: instead of using all of the available cores, they now use only a single core. On each worker node, 8 pyspark.daemon processes exist, but they all appear to run on one core while the remaining 7 cores sit idle.

Our hardware consists of 9 hosts (1x driver node and 8x worker nodes), each with 8 cores and 64 GB RAM. We are running Spark 1.2.0-SNAPSHOT (Python) in standalone mode via Cloudera 5.3.2.

To give a better picture of the problem, I have made a quick script based on one of the provided examples that replicates it (calculating_pies.py, included at the end of this mail). When I run it with "spark-submit calculating_pies.py", this is what I typically see on all of my worker nodes:

  1 [|||||||||||||||||||||||||||||||100.0%]   5 [|||                              3.9%]
  2 [                                 0.0%]   6 [|||                              2.6%]
  3 [||                               1.3%]   7 [|||||                            8.4%]
  4 [||                               1.3%]   8 [||||||                           8.7%]

    PID USER   PRI  NI  VIRT  RES   SHR  S  CPU%  MEM%  TIME+    Command
  30672 spark   20   0  225M  112M  1156 R  13.0   0.2  0:03.10  python -m pyspark.daemon
  30681 spark   20   0  225M  112M  1152 R  13.0   0.2  0:03.10  python -m pyspark.daemon
  30687 spark   20   0  225M  112M  1152 R  13.0   0.2  0:03.10  python -m pyspark.daemon
  30678 spark   20   0  225M  112M  1152 R  12.0   0.2  0:03.10  python -m pyspark.daemon
  30693 spark   20   0  225M  112M  1152 R  12.0   0.2  0:03.08  python -m pyspark.daemon
  30674 spark   20   0  225M  112M  1152 R  12.0   0.2  0:03.10  python -m pyspark.daemon
  30688 spark   20   0  225M  112M  1152 R  12.0   0.2  0:03.08  python -m pyspark.daemon
  30684 spark   20   0  225M  112M  1152 R  12.0   0.2  0:03.10  python -m pyspark.daemon

Through the Spark UI I do see 8 executor ids with 8 active tasks on each. I also see the same behaviour if I pass --total-executor-cores 64 to spark-submit.

Strangely, if I run the same script in local mode, everything runs fine and all cores are used. This is what I see:

  1 [||||||||||||||||||||||||||||||100.0%]   5 [||||||||||||||||||||||||||||||100.0%]
  2 [||||||||||||||||||||||||||||||100.0%]   6 [||||||||||||||||||||||||||||||100.0%]
  3 [||||||||||||||||||||||||||||||100.0%]   7 [||||||||||||||||||||||||||||||100.0%]
  4 [|||||||||||||||||||||||||||||  99.3%]   8 [|||||||||||||||||||||||||||||  99.4%]

    PID USER   PRI  NI  VIRT  RES   SHR  S  CPU%  MEM%  TIME+    Command
  22519 data    20   0  225M  106M  1368 R  99.0   0.2  0:10.97  python -m pyspark.daemon
  22508 data    20   0  225M  106M  1368 R  99.0   0.2  0:10.92  python -m pyspark.daemon
  22513 data    20   0  225M  106M  1368 R  99.0   0.2  0:11.02  python -m pyspark.daemon
  22526 data    20   0  225M  106M  1368 R  99.0   0.2  0:10.84  python -m pyspark.daemon
  22522 data    20   0  225M  106M  1368 R  98.0   0.2  0:10.95  python -m pyspark.daemon
  22523 data    20   0  225M  106M  1368 R  97.0   0.2  0:10.92  python -m pyspark.daemon
  22507 data    20   0  225M  106M  1368 R  97.0   0.2  0:10.83  python -m pyspark.daemon
  22516 data    20   0  225M  106M  1368 R  93.0   0.2  0:10.88  python -m pyspark.daemon
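So the same code uses all 8 cores locally but only one core per node on the standalone cluster. In case it helps anyone narrow this down, one thing that could be checked (I haven't confirmed this is the cause, so treat it as a rough diagnostic sketch) is whether the Python workers on the cluster are inheriting a restricted CPU affinity mask, which would explain all eight daemons stacking on a single core. Something like the following, run against the same SparkContext as in the script below, should report each worker's affinity on Linux by reading /proc/self/status:

def cpus_allowed(_):
    # Read this worker process's allowed-CPU list from /proc (Linux only)
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('Cpus_allowed_list'):
                return line.strip()
    return 'unknown'

# One small task per core; distinct() collapses identical masks
print(sc.parallelize(range(64), 64).map(cpus_allowed).distinct().collect())

If that prints something like ['Cpus_allowed_list:\t0'] on the standalone cluster but 'Cpus_allowed_list:\t0-7' in local mode, the workers are being pinned to one core rather than merely being scheduled onto it.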
============== calculating_pies.py ==================
#!/usr/bin/pyspark
import random

from pyspark.conf import SparkConf
from pyspark.context import SparkContext
from pyspark.storagelevel import StorageLevel


def pi(NUM_SAMPLE=1000000):
    # Monte Carlo estimate of pi: fraction of random points inside the unit quarter-circle
    count = 0.0
    for i in xrange(NUM_SAMPLE):
        x, y = random.random(), random.random()
        if (x * x) + (y * y) < 1:
            count += 1
    return 4.0 * (count / NUM_SAMPLE)


if __name__ == "__main__":
    sconf = (SparkConf()
             .set('spark.default.parallelism', '256')
             .set('spark.app.name', 'Calculating PI'))

    # local
    # sc = SparkContext(conf=sconf)
    # standalone
    sc = SparkContext("spark://<driver_host>:7077", conf=sconf)
    # yarn
    # sc = SparkContext("yarn-client", conf=sconf)

    # 10000 independent pi estimates spread over 1000 partitions
    rdd_pies = sc.parallelize(range(10000), 1000)
    rdd_pies.map(lambda x: pi()).collect()
    sc.stop()
=====================================================

Does anyone have any suggestions, or know of any configuration we should be looking at, that could solve this problem? Does anyone else see the same behaviour?

Any help is appreciated.

Thanks.
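P.S. For anyone who wants to reproduce this without passing flags to spark-submit: as far as I understand it, the --total-executor-cores 64 flag mentioned above is equivalent to setting spark.cores.max in the conf, so the SparkConf in the script could also be written as:

sconf = (SparkConf()
         .set('spark.default.parallelism', '256')
         .set('spark.cores.max', '64')   # same effect, I believe, as --total-executor-cores 64
         .set('spark.app.name', 'Calculating PI'))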