So, if I increase the degree of parallelism along with the number of worker instances, will it make any difference? And I can use this model the other way round too, right? That is, I can always predict how an app's performance deteriorates as the number of worker instances increases, right?
Thank You

On Sat, Feb 21, 2015 at 8:52 PM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:

> Yes, I have decreased the executor memory.
> But if I have to do this, then I have to tweak the code
> corresponding to each configuration, right?
>
> On Sat, Feb 21, 2015 at 8:47 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> "Workers" has a specific meaning in Spark. You are running many on one
>> machine? That's possible but not usual.
>>
>> Each worker's executors have access to a fraction of your machine's
>> resources then. If you're not increasing parallelism, maybe you're not
>> actually using additional workers, so are using less resource for your
>> problem.
>>
>> Or because the resulting executors are smaller, maybe you're hitting
>> GC thrashing in these executors with smaller heaps.
>>
>> Or if you're not actually configuring the executors to use less
>> memory, maybe you're over-committing your RAM and swapping?
>>
>> Bottom line, you wouldn't use multiple workers on one small standalone
>> node. This isn't a good way to estimate performance on a distributed
>> cluster either.
>>
>> On Sat, Feb 21, 2015 at 3:11 PM, Deep Pradhan <pradhandeep1...@gmail.com>
>> wrote:
>> > No, I just have a single node standalone cluster.
>> >
>> > I am not tweaking around with the code to increase parallelism. I am just
>> > running SparkKMeans that is there in Spark-1.0.0.
>> > I just wanted to know if this behavior is natural. And if so, what causes
>> > this?
>> >
>> > Thank you
>> >
>> > On Sat, Feb 21, 2015 at 8:32 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> What's your storage like? Are you adding worker machines that are
>> >> remote from where the data lives? I wonder if it just means you are
>> >> spending more and more time sending the data over the network as you
>> >> try to ship more of it to more remote workers.
>> >>
>> >> To answer your question, no, in general more workers means more
>> >> parallelism and therefore faster execution. But that depends on a lot
>> >> of things. For example, if your process isn't parallelized to use all
>> >> available execution slots, adding more slots doesn't do anything.
>> >>
>> >> On Sat, Feb 21, 2015 at 2:51 PM, Deep Pradhan <pradhandeep1...@gmail.com>
>> >> wrote:
>> >> > Yes, I am talking about standalone single node cluster.
>> >> >
>> >> > No, I am not increasing parallelism. I just wanted to know if it is
>> >> > natural.
>> >> > Does message passing across the workers account for the happening?
>> >> >
>> >> > I am running SparkKMeans, just to validate one prediction model. I am
>> >> > using several data sets. I have a standalone mode. I am varying the
>> >> > workers from 1 to 16.
>> >> >
>> >> > On Sat, Feb 21, 2015 at 8:14 PM, Sean Owen <so...@cloudera.com> wrote:
>> >> >>
>> >> >> I can imagine a few reasons. Adding workers might cause fewer tasks to
>> >> >> execute locally (?) So you may be executing more remotely.
>> >> >>
>> >> >> Are you increasing parallelism? For trivial jobs, chopping them up
>> >> >> further may cause you to pay more overhead of managing so many small
>> >> >> tasks, for no speed up in execution time.
>> >> >>
>> >> >> Can you provide any more specifics though? You haven't said what
>> >> >> you're running, what mode, how many workers, how long it takes, etc.
>> >> >>
>> >> >> On Sat, Feb 21, 2015 at 2:37 PM, Deep Pradhan
>> >> >> <pradhandeep1...@gmail.com> wrote:
>> >> >> > Hi,
>> >> >> > I have been running some jobs in my local single node standalone
>> >> >> > cluster. I am varying the worker instances for the same job, and the
>> >> >> > time taken for the job to complete increases with increase in the
>> >> >> > number of workers. I repeated some experiments varying the number of
>> >> >> > nodes in a cluster too and the same behavior is seen.
>> >> >> > Can the idea of worker instances be extrapolated to the nodes in a
>> >> >> > cluster?
>> >> >> >
>> >> >> > Thank You
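
For what it's worth, two of the points made in the thread (extra execution slots do nothing unless the job is split into enough tasks, and chopping a trivial job too finely just adds per-task overhead) can be sketched with a toy model. This is plain Python, not Spark code. The function name, its parameters, and the numbers are all hypothetical and chosen only for illustration. It assumes tasks run in "waves" of at most one task per slot and that every task pays a fixed overhead, and it ignores GC pressure, swapping, and data locality, which were also raised as possible causes.

```python
import math


def estimated_stage_time(num_tasks, total_slots, work_seconds, per_task_overhead):
    """Toy estimate of stage time (hypothetical model, not a Spark API).

    Assumes the work is split evenly across `num_tasks` tasks, that at most
    `total_slots` tasks run at once, and that each task pays a fixed
    `per_task_overhead` in seconds.
    """
    if num_tasks < 1 or total_slots < 1:
        raise ValueError("num_tasks and total_slots must be positive")
    # Number of rounds needed to run all tasks on the available slots.
    waves = math.ceil(num_tasks / total_slots)
    # Time for one task: its share of the work plus the fixed overhead.
    task_seconds = work_seconds / num_tasks + per_task_overhead
    return waves * task_seconds


if __name__ == "__main__":
    # Same 2 tasks on 2 slots or on 16 slots: the extra slots sit idle.
    print(estimated_stage_time(2, 2, 128, 0.5))      # 64.5
    print(estimated_stage_time(2, 16, 128, 0.5))     # 64.5
    # Raising the task count to match the slots is what helps.
    print(estimated_stage_time(16, 16, 128, 0.5))    # 8.5
    # Over-chopping: the fixed overhead starts to dominate.
    print(estimated_stage_time(1024, 16, 128, 0.5))  # 40.0
```

Under these assumptions, going from 2 to 16 slots changes nothing until the number of tasks goes up as well, and very small tasks make things slower again. It does not by itself explain a slowdown from adding workers on a single node; that would need the memory and GC effects mentioned in the thread, which this sketch leaves out.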