Reg. Difference in Performance

2015-02-28 Thread Deep Pradhan
Hi,
I am running Spark applications on GCE. I set up clusters with the number of
nodes varying from 1 to 7; the machines are single-core. For each cluster, I
set spark.default.parallelism to the number of nodes. I then ran four of the
applications from Spark Examples (SparkTC, SparkALS, SparkLR, and SparkPi) on
each configuration.
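For reference, setting the property in the driver looks roughly like this (a
minimal sketch, not the exact code I ran; the same value can also be passed to
spark-submit as --conf spark.default.parallelism=N, and the app name below is
just a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // One partition per single-core node, e.g. 4 on the 4-node cluster.
    val conf = new SparkConf()
      .setAppName("ExampleRun") // hypothetical name, not from this thread
      .set("spark.default.parallelism", "4")
    val sc = new SparkContext(conf)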
What I notice is the following:
For SparkTC and SparkALS, the time to complete the job increases as the
number of nodes in the cluster grows, whereas for SparkLR and SparkPi the
time to complete the job remains the same across all configurations.
Could anyone explain this to me?

Thank You
Regards,
Deep


Re: Reg. Difference in Performance

2015-02-28 Thread Joseph Bradley
Hi Deep,

Compute times may not be very meaningful for small examples like these. If
you increase the sizes of the examples, you may start to observe more
meaningful trends and speedups.
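
Most of the bundled examples take a size or slice count on the command line
(for instance, something like bin/run-example SparkPi 100; check each
example's usage line for the exact arguments). The core of SparkPi is roughly
the sketch below, paraphrased rather than copied from the shipped code:
growing the slice count also grows the sample count n, so each node gets
enough work for the parallelism to matter.

    import scala.math.random
    import org.apache.spark.{SparkConf, SparkContext}

    object BiggerPi {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("BiggerPi"))
        val slices = if (args.length > 0) args(0).toInt else 2
        val n = 100000 * slices // more slices => proportionally more samples
        val count = sc.parallelize(1 to n, slices).map { _ =>
          // Sample a point uniformly in the square [-1, 1] x [-1, 1].
          val x = random * 2 - 1
          val y = random * 2 - 1
          if (x * x + y * y < 1) 1 else 0
        }.reduce(_ + _)
        println("Pi is roughly " + 4.0 * count / n)
        sc.stop()
      }
    }

With so little work per task on the default sizes, scheduling and startup
overhead can easily dominate, which is why small runs say little about scaling.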

Joseph


Re: Reg. Difference in Performance

2015-02-28 Thread Deep Pradhan
Do you mean the size of the input data?

Thank You
Regards,
Deep
