I think I understood that.
But in my example: 1 machine in the cluster, this basic topology, and 1 worker in the conf:
builder.setBolt("fetcher", new Fetch()).setNumTasks(2).shuffleGrouping("spout");
builder.setBolt("extract", new Extract()).setNumTasks(2).shuffleGrouping("fetcher");
builder.setBolt("indexer", new Indexer()).setNumTasks(2).shuffleGrouping("extract");
So, in 1 worker, Storm will spawn 3 threads (executors) running 6 tasks. Am I right?
Then, if I rebalance to 2 workers, I will have 6 threads for the tasks.
Am I still right?
My problem is: to scale up, I understood that I need to set numTasks to
a bigger value, but then it will spawn more tasks than I want... I only want one
task when I have one machine, two when I have two machines, and so on.
Hope I'm clear.
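A minimal sketch of the counting in the question above, written as plain Java rather than against Storm's API. The Component record and its helper methods are hypothetical; the model assumes only the documented defaults that a bolt declared without a parallelism hint gets one executor, and that setNumTasks(2) gives it two tasks. Spout and acker executors are left out.

```java
import java.util.List;

// Hypothetical model (not Storm's API): each component has an executor
// count (the parallelism hint, default 1) and a task count (setNumTasks,
// which defaults to the hint when not set).
public class ParallelismCount {

    record Component(String name, int executors, int tasks) {}

    static int totalExecutors(List<Component> components) {
        return components.stream().mapToInt(Component::executors).sum();
    }

    static int totalTasks(List<Component> components) {
        return components.stream().mapToInt(Component::tasks).sum();
    }

    public static void main(String[] args) {
        // The three bolts from the example: no hint given (so 1 executor
        // each) and setNumTasks(2). Spout and acker executors are excluded.
        List<Component> bolts = List.of(
                new Component("fetcher", 1, 2),
                new Component("extract", 1, 2),
                new Component("indexer", 1, 2));

        System.out.println("executors = " + totalExecutors(bolts)); // 3
        System.out.println("tasks     = " + totalTasks(bolts));     // 6
        // "storm rebalance -n 2" on its own changes the number of worker
        // processes, not these per-bolt counts; only -e changes executors.
    }
}
```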
2016-06-09 16:27 GMT+02:00 Matthias J. Sax:
> See here:
>
> https://stackoverflow.com/questions/31932573/rebalancing-executors-in-apache-storm/31941796#31941796
> https://stackoverflow.com/questions/20371073/how-to-tune-the-parallelism-hint-in-storm
> http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
>
> -Matthias
>
> On 06/09/2016 03:41 PM, Nathan Leung wrote:
> > At that point you have to think about what makes sense for your system
> > right now. For example, maybe it makes sense to have # tasks = 4 times
> > what you need right now, and then reload the topology when you outgrow
> > that.
> >
> > Alternatively, you can consider bringing up a larger replacement
> > topology, and then killing the older one. In this case you will have to
> > be more careful with names, and possibly things like resource (worker)
> > allocation.
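Nathan's over-provisioning suggestion can be sketched as plain arithmetic. The class and method names are hypothetical (this is not a Storm API), and the headroom factor of 4 is just the example figure from his message. It assumes numTasks is fixed when the topology is submitted, while executors can later grow up to numTasks.

```java
// Hypothetical sizing helper, not a Storm API. It applies the
// "over-provision tasks" idea: declare numTasks = hint * headroom at
// submit time, because tasks cannot be added to a running topology.
public class TaskHeadroom {

    /** Task count to pass to setNumTasks() for one component. */
    static int numTasksFor(int currentHint, int headroomFactor) {
        return Math.multiplyExact(currentHint, headroomFactor);
    }

    /** Largest scale factor n for which hint * n executors still fit in numTasks. */
    static int maxScaleFactor(int currentHint, int numTasks) {
        return numTasks / currentHint; // integer division rounds down
    }

    public static void main(String[] args) {
        int hint = 3;                      // e.g. BOLT3 in this thread
        int tasks = numTasksFor(hint, 4);  // 12 tasks declared up front
        System.out.println(tasks);                       // 12
        System.out.println(maxScaleFactor(hint, tasks)); // 4
    }
}
```

The cost of a large factor (such as 100 for 100 machines) is that every task is a separate bolt instance; while the cluster is small, several tasks share one executor thread, which runs them one after another.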
> >
> > On Thu, Jun 9, 2016 at 9:30 AM, Adrien Carreira wrote:
> >
> > So let's say one day I would like to have 100 machines.
> >
> > Should I set setNumTasks to 100?
> >
> > 2016-06-09 15:20 GMT+02:00 Nathan Leung:
> >
> > You can create your topology with more tasks than executors,
> > and then add executors when you rebalance. However, at the
> > moment you cannot add more tasks to a running topology.
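The rule described here can be modelled in a few lines. This is a hypothetical helper, not Storm code; it assumes, as the parallelism articles linked in this thread explain, that a component can never run more executors than it has tasks. Under that assumption it would also explain why "-e indexer=2" in the command below could not add a second executor: without setNumTasks, numTasks defaulted to the parallelism hint of 1.

```java
// Simplified model, not Storm's scheduler code: a component never runs
// with more executors than it has tasks, and its task count is fixed
// when the topology is submitted.
public class RebalanceLimit {

    /** Executors a component can actually get after "rebalance -e name=requested". */
    static int effectiveExecutors(int requested, int numTasks) {
        if (requested < 1 || numTasks < 1) {
            throw new IllegalArgumentException("counts must be positive");
        }
        return Math.min(requested, numTasks);
    }

    public static void main(String[] args) {
        // No setNumTasks: tasks default to the hint (1), so asking for 2
        // executors still leaves 1.
        System.out.println(effectiveExecutors(2, 1)); // 1
        // Submitted with setNumTasks(4): room to grow to 2, then to 4.
        System.out.println(effectiveExecutors(2, 4)); // 2
        System.out.println(effectiveExecutors(8, 4)); // 4
    }
}
```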
> >
> > On Thu, Jun 9, 2016 at 8:58 AM, Adrien Carreira wrote:
> >
> > I've just created a topology like this:
> >
> > builder.setBolt("fetcher", new Fetch())
> > .shuffleGrouping("spout");
> >
> > builder.setBolt("extract", new Extract())
> > .shuffleGrouping("fetcher");
> >
> > builder.setBolt("indexer", new Indexer())
> > .shuffleGrouping("extract");
> >
> >
> > This means I have three bolts, one worker, and a
> > parallelism_hint of 1.
> >
> > Now, let's say that I have another machine available, or that
> > I have too many tuples to process and need another machine.
> >
> >
> > I executed this command:
> >
> > storm rebalance kairos-who -n 2 -e indexer=2 -e fetcher=2 -e extract=2
> >
> >
> > But what I got is two workers with:
> >
> > worker 1 => Spout + extract
> >
> > worker 2 => fetcher + indexer
> >
> >
> > What I would like:
> >
> > Worker 1 => Spout + fetcher + extract + indexer
> >
> > Worker 2 => Same...
> >
> >
> > I hope I'm clear...
> >
> > 2016-06-09 14:47 GMT+02:00 Andrew Xor:
> >
> > Hello,
> >
> > I'm sorry, but I don't see why you cannot emulate
> > those scale-up factors with rebalance; after all, it
> > spawns the requested number of workers (for the topology) and
> > executors (for spouts/bolts), bounded only by
> > topology_max_task_parallelism. Have you read the article
> > on how parallelism works in Storm?
> >
> > Regards.
> >
> > On Thu, Jun 9, 2016 at 3:34 PM, Adrien Carreira wrote:
> >
> > Yes,
> >
> > but the rebalance command doesn't do what I would like.
> >
> >
> > Let's suppose that I have:
> >
> > SPOUT A (1) => BOLT 1 (1) => BOLT2 (1) => BOLT3 (3)
> >
> > (the number is the parallelism hint)
> > This means that if I scale to n workers, I would like:
> >
> > SPOUT A (1*n) => BOLT 1 (1*n) => BOLT2 (1*n) => BOLT3 (3*n)
> >
> >
> > But storm rebalance keeps the parallelism_hint :/
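Scaling every component by the same factor n can be scripted on top of the -n and -e flags used in this thread. A minimal sketch, assuming the component IDs spoutA, bolt1, bolt2 and bolt3 (hypothetical stand-ins for SPOUT A and BOLT 1 to 3) and reusing the topology name kairos-who from the command earlier in the thread. It only builds the command string; each -e request can take effect only if that component was submitted with numTasks of at least hint * n.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a "storm rebalance" command line that
// multiplies every component's base parallelism hint by n.
public class ScaleCommand {

    static String rebalanceCommand(String topology, int n, Map<String, Integer> baseHints) {
        StringJoiner cmd = new StringJoiner(" ");
        cmd.add("storm rebalance").add(topology).add("-n " + n);
        // One -e flag per component, in the map's iteration order.
        baseHints.forEach((component, hint) ->
                cmd.add("-e " + component + "=" + (hint * n)));
        return cmd.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> hints = new LinkedHashMap<>(); // keeps insertion order
        hints.put("spoutA", 1);
        hints.put("bolt1", 1);
        hints.put("bolt2", 1);
        hints.put("bolt3", 3);
        // Prints: storm rebalance kairos-who -n 2 -e spoutA=2 -e bolt1=2 -e bolt2=2 -e bolt3=6
        System.out.println(rebalanceCommand("kairos-who", 2, hints));
    }
}
```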
> >
> >
> >
> > 2016-06-09 14:29 GMT+02:00 Andrew Xor:
> >
> > Hello,
> >
> > Why not use the rebalance command? It's well documented here:
> > http://storm.apache.org/releases/current/Understanding-the-parallelism-of-a-Storm-topology.html
> >
> > Regards.
> >
> > On Thu, Jun 9, 2016 at 3:22 PM, Adrien Carreira wrote:
> >
> > Hi,
> >
> > After a month of building a topology on Storm,
> > I have one question about parallelism that I
> > can't answer.
> >
> > I've developed my topology and tested it on a
> > cluster with two nodes.
> >
> > My parallelism_hint values are OK; everything is fine.
> >
> > My question is: if I need to scale the number of workers
> > in the topology, so that more workers do the same thing,
> > how can I achieve that without killing and restarting
> > the topology?
> >
> > Thanks for your reply