No, I just have a single node standalone cluster.

I am not changing the code to increase parallelism. I am just
running the SparkKMeans that comes with Spark-1.0.0.
I just wanted to know if this behavior is natural and, if so, what causes
it.

Thank you

On Sat, Feb 21, 2015 at 8:32 PM, Sean Owen <so...@cloudera.com> wrote:

> What's your storage like? Are you adding worker machines that are
> remote from where the data lives? I wonder if it just means you are
> spending more and more time sending the data over the network as you
> try to ship more of it to more remote workers.
>
> To answer your question: no, in general more workers means more
> parallelism and therefore faster execution. But that depends on a lot
> of things. For example, if your process isn't parallelized to use all
> available execution slots, adding more slots doesn't do anything.
>
> On Sat, Feb 21, 2015 at 2:51 PM, Deep Pradhan <pradhandeep1...@gmail.com>
> wrote:
> > Yes, I am talking about a standalone single node cluster.
> >
> > No, I am not increasing parallelism. I just wanted to know if it is
> > natural. Does message passing across the workers account for this
> > behavior?
> >
> > I am running SparkKMeans, just to validate one prediction model. I am
> > using several data sets. I am running in standalone mode. I am varying
> > the workers from 1 to 16.
> >
> > On Sat, Feb 21, 2015 at 8:14 PM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> I can imagine a few reasons. Adding workers might cause fewer tasks to
> >> execute locally (?), so you may be executing more of them remotely.
> >>
> >> Are you increasing parallelism? For trivial jobs, chopping them up
> >> further may cause you to pay more overhead for managing so many small
> >> tasks, with no speedup in execution time.
> >>
> >> Can you provide any more specifics though? You haven't said what
> >> you're running, what mode, how many workers, how long it takes, etc.
> >>
> >> On Sat, Feb 21, 2015 at 2:37 PM, Deep Pradhan <pradhandeep1...@gmail.com>
> >> wrote:
> >> > Hi,
> >> > I have been running some jobs in my local single node standalone
> >> > cluster. I am varying the worker instances for the same job, and the
> >> > time taken for the job to complete increases as the number of workers
> >> > increases. I repeated some experiments varying the number of nodes in
> >> > a cluster too, and the same behavior is seen.
> >> > Can the idea of worker instances be extrapolated to the nodes in a
> >> > cluster?
> >> >
> >> > Thank You
> >
> >
>
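
The point made in the thread, that extra execution slots do nothing once a job has fewer tasks than slots, and that managing more workers has its own cost, can be illustrated with a toy model. This is only a sketch under invented assumptions: the function name, the linear per-worker overhead, and all the numbers are hypothetical and are not measurements of Spark or of SparkKMeans.

```python
import math


def estimated_job_time(num_tasks, num_slots, task_seconds, overhead_per_worker_seconds):
    """Toy estimate of job time (hypothetical model, not Spark's scheduler).

    Tasks are assumed to run in "waves": each wave runs at most num_slots
    tasks at once. A fixed coordination cost per worker is added on top.
    """
    waves = math.ceil(num_tasks / num_slots)
    return waves * task_seconds + num_slots * overhead_per_worker_seconds


# Assumed workload: only 2 tasks (partitions), 10 s each, 0.5 s overhead per worker.
for workers in (1, 2, 4, 8, 16):
    seconds = estimated_job_time(
        num_tasks=2,
        num_slots=workers,
        task_seconds=10.0,
        overhead_per_worker_seconds=0.5,
    )
    print(f"{workers:2d} workers -> {seconds:.1f} s")
```

In this model the estimate drops from 1 to 2 workers and then rises again, because the third and later workers have no task to run but still add overhead. Whether that is what happens in the experiments above would have to be checked against the actual number of partitions and the available cores on the machine.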
