Reg. Difference in Performance
Hi,

I am running Spark applications on GCE. I set up clusters with the number of nodes varying from 1 to 7; the machines are single-core. For each cluster, I set spark.default.parallelism to the number of nodes in that cluster. I ran four of the applications available in the Spark examples, SparkTC, SparkALS, SparkLR, and SparkPi, on each configuration.

What I notice is the following: for SparkTC and SparkALS, the time to complete the job increases as the number of nodes in the cluster increases, whereas for SparkLR and SparkPi, the time to complete the job remains the same across all configurations. Could anyone explain this to me?

Thank You

Regards,
Deep
Re: Reg. Difference in Performance
Hi Deep,

Compute times may not be very meaningful for small examples like those. If you increase the sizes of the examples, then you may start to observe more meaningful trends and speedups.

Joseph

On Sat, Feb 28, 2015 at 7:26 AM, Deep Pradhan pradhandeep1...@gmail.com wrote:
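Joseph's point can be illustrated with a toy cost model: a job's wall-clock time is roughly the parallel compute time plus per-node overheads (task scheduling, shuffle, coordination). For tiny workloads the overhead term dominates, so adding nodes makes the job slower; for large workloads the compute term dominates, so adding nodes helps. All constants below are illustrative assumptions, not measured Spark values, and `job_time` is a hypothetical helper, not a Spark API.

```python
# Toy cost model, NOT a Spark benchmark: total time = parallel compute + per-node overhead.
# The overhead constants are made-up illustrative values.

def job_time(nodes, work_units, overhead_per_node=0.5, shuffle_cost_per_node=0.3):
    # Work is split evenly across single-core nodes.
    compute = work_units / nodes
    # Scheduling/launch overhead grows with the number of nodes, and
    # shuffle-heavy jobs (like SparkTC/SparkALS) pay an extra per-node cost.
    overhead = (overhead_per_node + shuffle_cost_per_node) * nodes
    return compute + overhead

# Tiny job (like the stock examples): adding nodes makes it SLOWER.
tiny = [round(job_time(n, work_units=1), 2) for n in range(1, 8)]

# Large job: adding nodes makes it FASTER, because compute dominates.
large = [round(job_time(n, work_units=700), 2) for n in range(1, 8)]

print(tiny)   # monotonically increasing with node count
print(large)  # monotonically decreasing with node count
```

Under this model, the SparkTC/SparkALS behavior Deep observed (slower with more nodes) is what you'd expect when the work per node is small relative to the coordination cost.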
Re: Reg. Difference in Performance
You mean the size of the input data that we use?

Thank You

Regards,
Deep

On Sun, Mar 1, 2015 at 6:04 AM, Joseph Bradley jos...@databricks.com wrote:
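Yes, the problem size passed to the example. For instance, SparkPi takes the number of slices as a command-line argument, so the workload can be scaled up until compute dominates the per-node overhead. A rough sketch of how this could be invoked (the master URL and jar path below are placeholders for your own install; adjust them for your Spark version):

```shell
# Scale up SparkPi by passing more slices, so compute dominates scheduling overhead.
./bin/run-example SparkPi 1000

# Or via spark-submit, pinning parallelism explicitly.
# <master-host> and the jar path are placeholders for your GCE cluster and install.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://<master-host>:7077 \
  --conf spark.default.parallelism=7 \
  /path/to/spark-examples.jar 1000
```

With larger inputs, comparing job times across 1 to 7 nodes should show the expected speedup trend rather than the flat or worsening times seen with the defaults.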