Yes, I have decreased the executor memory.
But if I have to do this, then I have to tweak the code for each
configuration, right?

On Sat, Feb 21, 2015 at 8:47 PM, Sean Owen <so...@cloudera.com> wrote:

> "Workers" has a specific meaning in Spark. You are running many on one
> machine? that's possible but not usual.
>
> Each worker's executors then have access to only a fraction of your
> machine's resources. If you're not increasing parallelism, maybe you're
> not actually using the additional workers, and so are using fewer
> resources for your problem.
>
> Or because the resulting executors are smaller, maybe you're hitting
> GC thrashing in these executors with smaller heaps.
>
> Or if you're not actually configuring the executors to use less
> memory, maybe you're over-committing your RAM and swapping?
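>
> For instance, a rough sketch of keeping the total within physical RAM on
> a standalone node (the values here are illustrative only, not
> recommendations):
>
>     # conf/spark-env.sh
>     SPARK_WORKER_INSTANCES=4   # workers started on this machine
>     SPARK_WORKER_MEMORY=2g     # memory each worker can grant to executors
>
>     # and per application, cap each executor's heap to match:
>     ./bin/spark-submit --conf spark.executor.memory=2g ...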
>
> Bottom line, you wouldn't use multiple workers on one small standalone
> node. This isn't a good way to estimate performance on a distributed
> cluster either.
>
> On Sat, Feb 21, 2015 at 3:11 PM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
> > No, I just have a single-node standalone cluster.
> >
> > I am not tweaking the code to increase parallelism; I am just running
> > the SparkKMeans example that ships with Spark 1.0.0.
> > I just wanted to know if this behavior is natural, and if so, what
> > causes it?
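> >
> > (For reference, the example is invoked roughly as
> >     ./bin/run-example SparkKMeans <file> <k> <convergeDist>
> > against the standalone master.)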
> >
> > Thank you
> >
> > On Sat, Feb 21, 2015 at 8:32 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> What's your storage like? Are you adding worker machines that are
> >> remote from where the data lives? I wonder if it just means you are
> >> spending more and more time sending the data over the network as you
> >> try to ship more of it to more remote workers.
> >>
> >> To answer your question: no, in general more workers means more
> >> parallelism and therefore faster execution. But that depends on a lot
> >> of things. For example, if your process isn't parallelized to use all
> >> available execution slots, adding more slots doesn't do anything.
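> >>
> >> For example, given an existing SparkContext sc, a minimal sketch of
> >> raising the partition count (the path and the count of 16 are made-up
> >> values):
> >>
> >>     // ask for 16 partitions when reading the input
> >>     val data = sc.textFile("hdfs://.../points.txt", 16)
> >>     // or re-split an existing RDD across more tasks
> >>     val wider = data.repartition(16)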
> >>
> >> On Sat, Feb 21, 2015 at 2:51 PM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
> >> > Yes, I am talking about a standalone single-node cluster.
> >> >
> >> > No, I am not increasing parallelism. I just wanted to know if this is
> >> > natural. Does message passing across the workers account for what is
> >> > happening?
> >> >
> >> > I am running SparkKMeans just to validate a prediction model, using
> >> > several data sets, in standalone mode, varying the workers from 1 to
> >> > 16.
> >> >
> >> > On Sat, Feb 21, 2015 at 8:14 PM, Sean Owen <so...@cloudera.com> wrote:
> >> >>
> >> >> I can imagine a few reasons. Adding workers might cause fewer tasks
> >> >> to execute locally (?), so you may be executing more remotely.
> >> >>
> >> >> Are you increasing parallelism? For trivial jobs, chopping them up
> >> >> further may cause you to pay more overhead managing so many small
> >> >> tasks, for no speed-up in execution time.
> >> >>
> >> >> Can you provide any more specifics, though? You haven't said what
> >> >> you're running, what mode, how many workers, how long it takes, etc.
> >> >>
> >> >> On Sat, Feb 21, 2015 at 2:37 PM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
> >> >> > Hi,
> >> >> > I have been running some jobs on my local single-node standalone
> >> >> > cluster. I am varying the number of worker instances for the same
> >> >> > job, and the time taken for the job to complete increases as the
> >> >> > number of workers increases. I repeated some experiments varying
> >> >> > the number of nodes in a cluster too, and the same behavior is
> >> >> > seen.
> >> >> > Can the idea of worker instances be extrapolated to the nodes in a
> >> >> > cluster?
> >> >> >
> >> >> > Thank You
> >> >
> >> >
> >
> >
>
