Re: Why does this simple Spark program use only one core?
Call getNumPartitions() on your RDD to make sure it has the right number of partitions. You can also specify it when calling parallelize, e.g.

    rdd = sc.parallelize(xrange(1000), 10)

This should run in parallel if you have multiple partitions and cores, but it might be that during part of the process only one node (e.g. the master process) is doing anything.

Matei

> On Nov 9, 2014, at 9:27 AM, Akhil Das wrote:
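As a minimal, self-contained sketch of the getNumPartitions() check Matei describes (the partition count of 10 and the app name are illustrative, not recommendations):

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "Partition check")   # local[*] = use all local cores
    rdd = sc.parallelize(xrange(1000), 10)             # ask for 10 partitions explicitly
    print rdd.getNumPartitions()                       # should print 10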
Re: Why does this simple Spark program use only one core?
You can set the following entry inside the conf/spark-defaults.conf file:

    spark.cores.max 16

If you want to read the default value, you can use the following API call:

    sc.defaultParallelism

where sc is your SparkContext object.

Thanks
Best Regards

> On Sun, Nov 9, 2014 at 6:48 PM, ReticulatedPython wrote:
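As a short, self-contained sketch of reading these values from a PySpark program, following Akhil's reply (the app name is illustrative, and spark.cores.max is only visible if it was actually set, e.g. in conf/spark-defaults.conf and launched through spark-submit):

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("Parallelism check")   # illustrative app name
    sc = SparkContext(conf=conf)

    # Default number of partitions Spark uses when none is given explicitly.
    print sc.defaultParallelism

    # Configuration entries such as spark.cores.max can be read back from the conf;
    # the second argument is the fallback when the key was never set.
    print conf.get("spark.cores.max", "not set")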
Why does this simple Spark program use only one core?
So, I'm running this simple program on a 16-core system, and I launch it by issuing the following:

    spark-submit --master local[*] pi.py

The code of the program is below. When I use top to watch CPU consumption, only one core is being utilized. Why is that? Secondly, the Spark documentation says that the default parallelism is held in the property spark.default.parallelism. How can I read this property from within my Python program?

    """pi.py"""
    from pyspark import SparkContext
    import random

    NUM_SAMPLES = 1250

    def sample(p):
        x, y = random.random(), random.random()
        return 1 if x*x + y*y < 1 else 0

    sc = SparkContext("local", "Test App")
    count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b)
    print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
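Putting the replies together, a sketch of a revised pi.py (not a verified fix: local[*] mirrors the spark-submit flag above, the explicit partition count follows Matei's suggestion, and sc.defaultParallelism is read as Akhil describes):

    """pi.py, revised along the lines of the replies above."""
    from pyspark import SparkContext
    import random

    NUM_SAMPLES = 1250

    def sample(p):
        x, y = random.random(), random.random()
        return 1 if x*x + y*y < 1 else 0

    # "local[*]" asks for all local cores, matching the --master flag used above.
    sc = SparkContext("local[*]", "Test App")
    print "defaultParallelism:", sc.defaultParallelism

    # Give parallelize an explicit partition count so the work can spread across cores.
    count = (sc.parallelize(xrange(0, NUM_SAMPLES), sc.defaultParallelism)
               .map(sample)
               .reduce(lambda a, b: a + b))
    print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)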