Re: Why does this simple Spark program use only one core?

2014-11-09 Thread Matei Zaharia
Call getNumPartitions() on your RDD to make sure it has the right number of
partitions. You can also specify the number of partitions when calling
parallelize, e.g.

rdd = sc.parallelize(xrange(1000), 10)

This should run in parallel if you have multiple partitions and cores, but it
might be that during part of the process only one node (e.g. the master
process) is doing anything.
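
A minimal sketch of the above, assuming PySpark 1.x on Python 2 (the app name
and the partition count of 10 are arbitrary):

from pyspark import SparkContext

sc = SparkContext("local[*]", "Partition check")  # local[*] uses all local cores

# Ask for 10 partitions explicitly; Spark can run one task per partition in parallel.
rdd = sc.parallelize(xrange(1000), 10)
print rdd.getNumPartitions()  # should print 10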

Matei





Re: Why does this simple Spark program use only one core?

2014-11-09 Thread Akhil Das
You can set the following entry inside the conf/spark-defaults.conf file:

spark.cores.max 16

If you want to read the default value, then you can use the following API
call:

sc.defaultParallelism

where sc is your SparkContext object.
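
A short sketch of reading these values from a Python program, assuming PySpark
1.x on Python 2 (the app name is arbitrary, and spark.default.parallelism only
shows a value if it was actually set somewhere):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("Parallelism check")
sc = SparkContext(conf=conf)

# Effective parallelism used for operations like reduce when no partition
# count is given; in local mode it defaults to the number of cores.
print sc.defaultParallelism

# The raw property, if set in spark-defaults.conf or via --conf; otherwise
# the fallback string is printed.
print conf.get("spark.default.parallelism", "not set")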


Thanks
Best Regards



Why does this simple Spark program use only one core?

2014-11-09 Thread ReticulatedPython
I'm running this simple program on a 16-core system, launching it with the
following command:

spark-submit --master local[*] pi.py

The code of the program is below. When I use top to check CPU consumption,
only one core is being utilized. Why is that? Secondly, the Spark
documentation says that the default parallelism is held in the property
spark.default.parallelism. How can I read this property from within my
Python program?

#"""pi.py"""
from pyspark import SparkContext
import random

NUM_SAMPLES = 1250

def sample(p):
x, y = random.random(), random.random()
return 1 if x*x + y*y < 1 else 0

sc = SparkContext("local", "Test App")
count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a,
b: a + b)
print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-this-siimple-spark-program-uses-only-one-core-tp18434.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org