Yes, I have decreased the executor memory. But if I have to do this, then I have to tweak the code for each configuration, right?
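As an aside: a setting like executor memory does not have to be hard-coded for each configuration; it is an ordinary Spark property. Below is a minimal sketch, assuming the Spark 1.x Scala API; the object name, app name, and 512m value are illustrative, not from this thread.

    import org.apache.spark.{SparkConf, SparkContext}

    object MemoryConfigSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("MemoryConfigSketch")
          // Per-executor heap size. The same property can instead be
          // supplied at submit time (e.g. --executor-memory 512m on
          // spark-submit), so no code change is needed per configuration.
          .set("spark.executor.memory", "512m")
        val sc = new SparkContext(conf)
        // ... job logic ...
        sc.stop()
      }
    }

Because the property can be passed on the spark-submit command line, the same jar can be reused across every memory configuration in an experiment.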
On Sat, Feb 21, 2015 at 8:47 PM, Sean Owen <so...@cloudera.com> wrote:
> "Workers" has a specific meaning in Spark. You are running many on one
> machine? That's possible but not usual.
>
> Each worker's executors have access to a fraction of your machine's
> resources then. If you're not increasing parallelism, maybe you're not
> actually using the additional workers, so you are using less resource
> for your problem.
>
> Or, because the resulting executors are smaller, maybe you're hitting
> GC thrashing in these executors with smaller heaps.
>
> Or, if you're not actually configuring the executors to use less
> memory, maybe you're over-committing your RAM and swapping?
>
> Bottom line: you wouldn't use multiple workers on one small standalone
> node. This isn't a good way to estimate performance on a distributed
> cluster either.
>
> On Sat, Feb 21, 2015 at 3:11 PM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
> > No, I just have a single-node standalone cluster.
> >
> > I am not tweaking the code to increase parallelism. I am just running
> > the SparkKMeans example that ships with Spark 1.0.0.
> > I just wanted to know if this behavior is natural, and if so, what
> > causes it?
> >
> > Thank you
> >
> > On Sat, Feb 21, 2015 at 8:32 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> What's your storage like? Are you adding worker machines that are
> >> remote from where the data lives? I wonder if it just means you are
> >> spending more and more time sending the data over the network as you
> >> try to ship more of it to more remote workers.
> >>
> >> To answer your question: no, in general more workers means more
> >> parallelism and therefore faster execution. But that depends on a lot
> >> of things. For example, if your process isn't parallelized to use all
> >> available execution slots, adding more slots doesn't do anything.
> >>
> >> On Sat, Feb 21, 2015 at 2:51 PM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
> >> > Yes, I am talking about a standalone single-node cluster.
> >> >
> >> > No, I am not increasing parallelism. I just wanted to know if this
> >> > is natural.
> >> > Does message passing across the workers account for what is
> >> > happening?
> >> >
> >> > I am running SparkKMeans, just to validate one prediction model. I
> >> > am using several data sets. I am in standalone mode, and I am
> >> > varying the workers from 1 to 16.
> >> >
> >> > On Sat, Feb 21, 2015 at 8:14 PM, Sean Owen <so...@cloudera.com> wrote:
> >> >>
> >> >> I can imagine a few reasons. Adding workers might cause fewer
> >> >> tasks to execute locally (?), so you may be executing more
> >> >> remotely.
> >> >>
> >> >> Are you increasing parallelism? For trivial jobs, chopping them up
> >> >> further may cause you to pay more overhead managing so many small
> >> >> tasks, for no speed-up in execution time.
> >> >>
> >> >> Can you provide any more specifics, though? You haven't said what
> >> >> you're running, in what mode, with how many workers, how long it
> >> >> takes, etc.
> >> >>
> >> >> On Sat, Feb 21, 2015 at 2:37 PM, Deep Pradhan
> >> >> <pradhandeep1...@gmail.com> wrote:
> >> >> > Hi,
> >> >> > I have been running some jobs on my local single-node standalone
> >> >> > cluster. I am varying the number of worker instances for the
> >> >> > same job, and the time taken for the job to complete increases
> >> >> > with the number of workers. I repeated some experiments varying
> >> >> > the number of nodes in a cluster too, and the same behavior is
> >> >> > seen.
> >> >> > Can the idea of worker instances be extrapolated to the nodes in
> >> >> > a cluster?
> >> >> >
> >> >> > Thank You
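To make Sean's parallelism point concrete: adding workers helps only if the job is split into enough tasks to occupy them. A minimal sketch of raising the partition count explicitly, assuming the Spark 1.x Scala API; the input path and the count of 16 are illustrative, not from this thread.

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelismSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ParallelismSketch"))

        // Ask for more input partitions so there can be one task per
        // available execution slot; with too few partitions, extra
        // workers simply sit idle.
        val lines = sc.textFile("kmeans_data.txt", minPartitions = 16)

        // Alternatively, reshuffle an existing RDD to a chosen count.
        val spread = lines.repartition(16)

        println(spread.count())
        sc.stop()
      }
    }

For shuffle stages, the spark.default.parallelism property serves a similar purpose and, like executor memory, can be set at submit time without touching the code.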