I think I understood that.

But in my example:

one machine in the cluster, with this basic topology and 1 worker in the conf:

builder.setBolt("fetcher", new Fetch()).setNumTasks(2).shuffleGrouping("spout");

builder.setBolt("extract", new Extract()).setNumTasks(2).shuffleGrouping("fetcher");

builder.setBolt("indexer", new Indexer()).setNumTasks(2).shuffleGrouping("extract");

Storm will spawn 3 executor threads with 6 tasks on 1 worker. Am I right?

Then, if I rebalance to 2 workers, I will have 6 threads for the tasks.

Am I still right?
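If it helps, here is a small Python sketch (not Storm code; the component names mirror the topology above, and the arithmetic follows Storm's documented rule that parallelism_hint sets executors, defaulting to 1, while setNumTasks fixes tasks at submit time) of where those numbers come from:

```python
# Hypothetical sketch of Storm's documented arithmetic:
# executors per component = parallelism_hint (defaults to 1),
# tasks per component     = setNumTasks value (fixed at submit time).

components = {
    # name: (parallelism_hint, num_tasks)
    "fetcher": (1, 2),
    "extract": (1, 2),
    "indexer": (1, 2),
}

total_executors = sum(hint for hint, _ in components.values())
total_tasks = sum(tasks for _, tasks in components.values())

print(total_executors)  # 3 executor threads for the three bolts
print(total_tasks)      # 6 tasks, i.e. 2 per executor
```

With a single worker in the conf, all three bolt executors (and their six tasks) run inside that one worker process; the spout adds its own executor on top.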

My problem is: to scale up, I understood that I need to set numTasks to a
bigger value, but it will spawn more tasks than I want... I only want one
task when I have one machine, two when I have two machines, and so on.
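The usual workaround, as suggested earlier in the thread, is to over-provision tasks at submit time and then scale only executors with rebalance, since executors can be moved anywhere between 1 and the task count. A minimal sketch, assuming a made-up ceiling of 10 machines and an illustrative helper name:

```python
# Sketch: tasks are fixed at submit time, so set them to the largest
# cluster size you ever expect, then rebalance executors to match the
# machines you actually have. MAX_MACHINES is an assumed ceiling.

MAX_MACHINES = 10          # e.g. setNumTasks(10) on each bolt
NUM_TASKS = MAX_MACHINES

def executors_for(machines):
    # Storm allows rebalancing executors anywhere in [1, NUM_TASKS];
    # one executor per machine until the task count caps it.
    return min(machines, NUM_TASKS)

print(executors_for(1))    # 1 executor when you have one machine
print(executors_for(2))    # 2 executors after adding a machine
print(executors_for(50))   # capped at 10: executors cannot exceed tasks
```

Extra tasks on a small cluster just get multiplexed onto fewer executors, so the over-provisioning costs little until you actually grow.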

Hope I'm clear


2016-06-09 16:27 GMT+02:00 Matthias J. Sax <[email protected]>:

> See here:
>
>
> https://stackoverflow.com/questions/31932573/rebalancing-executors-in-apache-storm/31941796#31941796
>
>
> https://stackoverflow.com/questions/20371073/how-to-tune-the-parallelism-hint-in-storm
>
>
> http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
>
>
> -Matthias
>
>
> On 06/09/2016 03:41 PM, Nathan Leung wrote:
> > At that point you have to think about what makes sense for your system
> > right now.  For example, maybe it makes sense to have # tasks = 4 times
> > what you need right now, and then reload the topology when you outgrow
> that.
> >
> > Alternatively, you can consider bringing up a larger replacement
> > topology, and then killing the older one.  In this case you will have to
> > be more careful with names, and possibly things like resource (worker)
> > allocation.
> >
> > On Thu, Jun 9, 2016 at 9:30 AM, Adrien Carreira <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     So let's say one day I would like to have 100 machines,
> >
> >     I should set 100 on setNumTasks ?
> >
> >     2016-06-09 15:20 GMT+02:00 Nathan Leung <[email protected]
> >     <mailto:[email protected]>>:
> >
> >         You can create your topology with more tasks than executors,
> >         then when the rebalance happens you can add executors.  However
> >         at the moment you cannot add more tasks to a running topology.
> >
> >         On Thu, Jun 9, 2016 at 8:58 AM, Adrien Carreira
> >         <[email protected] <mailto:[email protected]>> wrote:
> >
> >             I've just created a topology like this:
> >
> >             builder.setBolt("fetcher", new Fetch())
> >             .shuffleGrouping("spout");
> >
> >             builder.setBolt("extract", new Extract())
> >             .shuffleGrouping("fetcher");
> >
> >             builder.setBolt("indexer", new Indexer())
> >             .shuffleGrouping("extract");
> >
> >
> >             That means I have three bolts with one worker and a
> >             parallelism_hint of 1.
> >
> >             Now, let's say that I have another machine available, or that
> >             I have too many tuples to process and I need another machine.
> >
> >
> >             I've executed this command:
> >
> >             storm rebalance kairos-who -n 2 -e indexer=2 -e fetcher=2 -e
> >             extract=2
> >
> >
> >             But what I have is two workers with:
> >
> >             worker 1 => Spout + extract
> >
> >             worker 2 => fetcher + indexer
> >
> >
> >             What I would love:
> >
> >             Worker 1 => Spout + fetcher + extract + indexer
> >
> >             Worker 2 => Same...
> >
> >
> >             I hope I'm clear...
> >
> >
> >
> >
> >
> >
> >
> >             2016-06-09 14:47 GMT+02:00 Andrew Xor
> >             <[email protected]
> >             <mailto:[email protected]>>:
> >
> >                 Hello,
> >
> >                   I am sorry, but I don't know why you cannot emulate
> >                 those scale-up factors by using rebalance; after all, it
> >                 spawns the requested number of workers (in the topology)
> >                 and executors (in spouts/bolts), bounded only by
> >                 topology_max_task_parallelism. Have you read the article
> >                 to understand how parallelism works in Storm?
> >
> >                 Regards.
> >
> >                 On Thu, Jun 9, 2016 at 3:34 PM, Adrien Carreira
> >                 <[email protected] <mailto:[email protected]>>
> wrote:
> >
> >                     Yes,
> >
> >                     But the rebalance command doesn't do what I would
> >                     like.
> >
> >
> >                     Let's suppose that I have:
> >
> >                     SPOUT A (1) => BOLT 1 (1) => BOLT2 (1) => BOLT3 (3)
> >
> >                     (number is the parallelism hint)
> >                     It means that if I scale to n workers I would like:
> >
> >                     SPOUT A (1*n) => BOLT 1 (1*n) => BOLT2 (1*n) =>
> >                     BOLT3 (3*n)
> >
> >
> >                     But the storm rebalance keeps the parallelism_hint :/
> >
> >
> >
> >                     2016-06-09 14:29 GMT+02:00 Andrew Xor
> >                     <[email protected]
> >                     <mailto:[email protected]>>:
> >
> >                         Hello,
> >
> >                          Why not use the rebalance command? It's well
> >                         documented here:
> >                         http://storm.apache.org/releases/current/Understanding-the-parallelism-of-a-Storm-topology.html
> >
> >                         Regards.
> >
> >                         On Thu, Jun 9, 2016 at 3:22 PM, Adrien Carreira
> >                         <[email protected]
> >                         <mailto:[email protected]>> wrote:
> >
> >                             Hi,
> >
> >                             After a month building a topology on Storm,
> >                             I have one question about parallelism that I
> >                             can't answer.
> >
> >                             I've developed my topology and tested it on a
> >                             cluster with two nodes.
> >
> >                             My parallelism_hints are OK; everything is
> >                             fine.
> >
> >                             My question is: if I need to scale the
> >                             number of workers in the topology, to have
> >                             more workers doing the same thing, how can I
> >                             achieve that without killing/restarting the
> >                             topology?
> >
> >                             Thanks for your reply
> >
> >
> >
> >
> >
> >
> >
> >
>
>