No, I just have a single-node standalone cluster. I am not modifying the code to increase parallelism; I am just running the SparkKMeans example that comes with Spark-1.0.0. I just wanted to know if this behavior is natural, and if so, what causes it?
Thank you

On Sat, Feb 21, 2015 at 8:32 PM, Sean Owen <so...@cloudera.com> wrote:
> What's your storage like? Are you adding worker machines that are
> remote from where the data lives? I wonder if it just means you are
> spending more and more time sending the data over the network as you
> try to ship more of it to more remote workers.
>
> To answer your question, no; in general, more workers means more
> parallelism and therefore faster execution. But that depends on a lot
> of things. For example, if your process isn't parallelized to use all
> available execution slots, adding more slots doesn't do anything.
>
> On Sat, Feb 21, 2015 at 2:51 PM, Deep Pradhan <pradhandeep1...@gmail.com>
> wrote:
> > Yes, I am talking about a standalone single-node cluster.
> >
> > No, I am not increasing parallelism. I just wanted to know if it is
> > natural. Does message passing across the workers account for what is
> > happening?
> >
> > I am running SparkKMeans, just to validate one prediction model. I am
> > using several data sets. I am using standalone mode. I am varying the
> > workers from 1 to 16.
> >
> > On Sat, Feb 21, 2015 at 8:14 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> I can imagine a few reasons. Adding workers might cause fewer tasks to
> >> execute locally (?), so you may be executing more of them remotely.
> >>
> >> Are you increasing parallelism? For trivial jobs, chopping them up
> >> further may cause you to pay more overhead for managing so many small
> >> tasks, for no speed-up in execution time.
> >>
> >> Can you provide any more specifics, though? You haven't said what
> >> you're running, what mode, how many workers, how long it takes, etc.
> >>
> >> On Sat, Feb 21, 2015 at 2:37 PM, Deep Pradhan <pradhandeep1...@gmail.com>
> >> wrote:
> >> > Hi,
> >> > I have been running some jobs in my local single-node standalone
> >> > cluster. I am varying the worker instances for the same job, and the
> >> > time taken for the job to complete increases with the number of
> >> > workers. I repeated some experiments varying the number of nodes in a
> >> > cluster too, and the same behavior is seen.
> >> > Can the idea of worker instances be extrapolated to the nodes in a
> >> > cluster?
> >> >
> >> > Thank You
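As a rough illustration of the overhead argument in this thread, here is a hypothetical toy cost model in Python. It is not Spark code and not measured Spark behavior. It assumes that all worker instances share the same physical cores on one machine, that the job's task count stays fixed (as when parallelism is not increased), and that each extra worker process adds a fixed cost. The function name `toy_runtime`, its parameters, and the default overhead constants are all made up for illustration.

```python
import math


def toy_runtime(total_work_s, physical_cores, workers, num_tasks,
                per_task_overhead_s=0.05, per_worker_overhead_s=0.5):
    """Hypothetical cost model for one job on a single machine.

    Not Spark code: the overhead constants are invented for illustration.
    total_work_s   -- CPU-seconds of actual computation in the job
    physical_cores -- cores on the one machine, shared by every worker
    workers        -- number of worker instances started on that machine
    num_tasks      -- number of tasks (partitions) the job is split into
    """
    # All workers sit on the same machine, so the number of tasks that can
    # truly run at once is capped by the cores and by the task count,
    # not by how many worker processes exist.
    parallelism = min(num_tasks, physical_cores)
    # Tasks run in sequential "waves" of `parallelism` tasks each.
    waves = math.ceil(num_tasks / parallelism)
    compute_s = waves * (total_work_s / num_tasks)
    # Assumed fixed costs: scheduling each task, plus per-worker-process
    # costs (startup, heartbeats, moving data between processes).
    overhead_s = num_tasks * per_task_overhead_s + workers * per_worker_overhead_s
    return compute_s + overhead_s


if __name__ == "__main__":
    # Same job (fixed task count) on a hypothetical 4-core machine,
    # varying the worker instances from 1 to 16 as in the thread.
    for w in (1, 2, 4, 8, 16):
        print(w, round(toy_runtime(100.0, 4, w, num_tasks=8), 2))
```

In this toy model the compute term never shrinks as worker instances are added, because the extra workers bring no extra cores and no extra tasks, so only the overhead term grows. Adding real machines would correspond to raising `physical_cores`, which is the case where more workers can help.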