I'm running a simple program on a 16-core machine. I launch it with the
following command:
spark-submit --master local[*] pi.py
The code of the program is given below. When I use top to watch CPU
consumption, only 1 core is being utilized. Why is that? Secondly, the Spark
documentation says that the default parallelism is held in the property
spark.default.parallelism. How can I read this property from within my
Python program?
#"""pi.py"""
from pyspark import SparkContext
import random
NUM_SAMPLES = 12500000
def sample(p):
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0
sc = SparkContext("local", "Test App")
count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(
    lambda a, b: a + b)
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
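
For reference, the sampling logic can be checked on its own, without Spark.
Below is a minimal single-process sketch of the same Monte Carlo estimate,
written in Python 3 syntax. The names estimate_pi and seed are made up for
illustration and are not part of pi.py. It only confirms that the estimate
itself is sound; it says nothing about how Spark schedules the work.

```python
import random


def sample(_):
    """Return 1 if a random point in the unit square falls inside the
    quarter circle of radius 1, else 0 (same test as in pi.py)."""
    x, y = random.random(), random.random()
    return 1 if x * x + y * y < 1 else 0


def estimate_pi(num_samples, seed=None):
    """Hypothetical helper (not in pi.py): Spark-free estimate of pi.

    The generator plus sum() stands in for the map(...).reduce(...) chain.
    """
    if seed is not None:
        random.seed(seed)  # optional, makes a run repeatable
    count = sum(sample(i) for i in range(num_samples))
    return 4.0 * count / num_samples


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(1000000))
```

This runs in a single process by design, so it will also keep just one core
busy.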
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-this-siimple-spark-program-uses-only-one-core-tp18434.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.