If you rebalance to 2 workers via $ storm rebalance topology-name -n
new-num-workers, the number of threads/executors is not changed. Thus,
one worker will execute one thread while the other will execute two.
Each thread still executes 2 tasks.
You can also change the number of executors using the "-e" flag of the
rebalance command (syntax: -e component=parallelism).
However, the behavior you want (one task per component on one machine,
two tasks per component after rebalancing to two machines) is not
possible. The number of tasks cannot be changed dynamically; you would
need to kill and redeploy your topology to get this behavior.
That said, having more tasks does not result in measurable overhead. So
why should it be a problem to have more tasks?
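
For illustration, a minimal sketch of that pattern (assuming your
existing spout is registered as "spout", as in your snippet; the
numbers are placeholders, and on Storm 0.x the package is
backtype.storm.topology): declare more tasks than executors up front,
then grow the executors later via rebalance.

import org.apache.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
// start the fetcher with 1 executor but 4 tasks, so the executor
// count can later be raised up to 4 without redeploying
builder.setBolt("fetcher", new Fetch(), 1)
       .setNumTasks(4)
       .shuffleGrouping("spout");

// later, scale workers and executors (but never tasks) via:
//   storm rebalance topology-name -n 2 -e fetcher=2
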
-Matthias
On 06/15/2016 09:34 AM, Adrien Carreira wrote:
> I think I understood that.
>
> But, in my example:
>
> 1 machine in the cluster with this basic topology and with 1 worker in the conf
>
> builder.setBolt("fetcher", new
> Fetch()).setNumTasks(2).shuffleGrouping("spout");
>
> builder.setBolt("extract", new
> Extract()).setNumTasks(2).shuffleGrouping("fetcher");
>
> builder.setBolt("indexer", new
> Indexer()).setNumTasks(2).shuffleGrouping("extract");
>
> Storm will spawn 3 threads with 6 tasks on 1 worker. Am I right?
>
> Then, if I rebalance to 2 workers, I will have 6 threads for the tasks.
>
> Am I still right?
>
> My problem is: to scale up, I understood that I need to set numTasks
> to a bigger value, but it will spawn more tasks than I want... I only
> want one task when I have one machine, two when I have two machines, and so on.
>
> Hope I'm clear
>
>
> 2016-06-09 16:27 GMT+02:00 Matthias J. Sax <[email protected]>:
>
> See here:
>
>
> https://stackoverflow.com/questions/31932573/rebalancing-executors-in-apache-storm/31941796#31941796
>
>
> https://stackoverflow.com/questions/20371073/how-to-tune-the-parallelism-hint-in-storm
>
>
> http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
>
>
> -Matthias
>
>
> On 06/09/2016 03:41 PM, Nathan Leung wrote:
> > At that point you have to think about what makes sense for your system
> > right now. For example, maybe it makes sense to have # tasks = 4 times
> > what you need right now, and then reload the topology when you outgrow that.
> >
> > Alternatively, you can consider bringing up a larger replacement
> > topology, and then killing the older one. In this case you will have to
> > be more careful with names, and possibly things like resource (worker)
> > allocation.
> >
> > On Thu, Jun 9, 2016 at 9:30 AM, Adrien Carreira <[email protected]> wrote:
> >
> > So let's say one day I would like to have 100 machines,
> >
> > should I set 100 in setNumTasks?
> >
> > 2016-06-09 15:20 GMT+02:00 Nathan Leung <[email protected]>:
> >
> > You can create your topology with more tasks than executors,
> > then when the rebalance happens you can add executors. However
> > at the moment you cannot add more tasks to a running topology.
> >
> > On Thu, Jun 9, 2016 at 8:58 AM, Adrien Carreira <[email protected]> wrote:
> >
> > I've just created a topology like this:
> >
> > builder.setBolt("fetcher", new Fetch())
> > .shuffleGrouping("spout");
> >
> > builder.setBolt("extract", new Extract())
> > .shuffleGrouping("fetcher");
> >
> > builder.setBolt("indexer", new Indexer())
> > .shuffleGrouping("extract");
> >
> >
> > That means I have three bolts, with one worker and a
> > parallelism_hint of 1.
> >
> > Now, let's say that I have another machine available, or that
> > I have too many tuples to process and I need another machine.
> >
> >
> > I've executed this command:
> >
> > storm rebalance kairos-who -n 2 -e indexer=2 -e fetcher=2 -e extract=2
> >
> >
> > But what I have is two workers with:
> >
> > worker 1 => Spout + extract
> >
> > worker 2 => fetcher + indexer
> >
> >
> > What I would love:
> >
> > Worker 1 => Spout + fetcher + extract + indexer
> >
> > Worker 2 => Same...
> >
> >
> > I hope I'm clear...
> >
> > 2016-06-09 14:47 GMT+02:00 Andrew Xor <[email protected]>:
> >
> > Hello,
> >
> > I am sorry, but I don't know why you cannot emulate those scale-up
> > factors by using rebalance; after all, it spawns the requested number
> > of workers (in the topology) and executors (in the spouts/bolts),
> > bounded only by topology_max_task_parallelism. Have you read the
> > article to understand how parallelism works in Storm?
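> >
> > For reference, a rough sketch of how that cap could be set when the
> > topology is submitted (the value 10 is just a placeholder; on Storm
> > 0.x the package is backtype.storm):
> >
> > import org.apache.storm.Config;
> >
> > Config conf = new Config();
> > // topology.max.task.parallelism: a ceiling on how many executors
> > // any single component can be given, e.g. via rebalance -e
> > conf.setMaxTaskParallelism(10);
> > conf.setNumWorkers(1);
> > // StormSubmitter.submitTopology("topology-name", conf, builder.createTopology());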
> >
> > Regards.
> >
> > On Thu, Jun 9, 2016 at 3:34 PM, Adrien Carreira <[email protected]> wrote:
> >
> > Yes,
> >
> > But the rebalance command doesn't do what I would like.
> >
> >
> > Let's suppose that I have:
> >
> > SPOUT A (1) => BOLT 1 (1) => BOLT2 (1) => BOLT3 (3)
> >
> > (the number is the parallelism hint)
> > It means that if I scale to n workers, I would like:
> >
> > SPOUT A (1*n) => BOLT 1 (1*n) => BOLT2 (1*n) => BOLT3 (3*n)
> >
> >
> > But storm rebalance keeps the parallelism_hint :/
> >
> >
> >
> > 2016-06-09 14:29 GMT+02:00 Andrew Xor <[email protected]>:
> >
> > Hello,
> >
> > Why not use the rebalance command? It's well documented here:
> > <http://storm.apache.org/releases/current/Understanding-the-parallelism-of-a-Storm-topology.html>
> >
> > Regards.
> >
> > On Thu, Jun 9, 2016 at 3:22 PM, Adrien Carreira <[email protected]> wrote:
> >
> > Hi,
> >
> > After a month building a topology on Storm, I have one question
> > about parallelism that I can't answer.
> >
> > I've developed my topology and tested it on a cluster with two nodes.
> >
> > My parallelism_hints are OK, and everything is fine.
> >
> > My question is: if I need to scale the number of workers in the
> > topology, to have more workers doing the same thing, how can I
> > achieve that without killing/restarting the topology?
> >
> > Thanks for your reply
> >