The issue with this is the routing of tuples. Suppose I want a keyed grouping,
where a tuple with "foo" in it will always go to the same instance of a bolt.
I don't see how it is possible to go from a situation where I have one bolt
instance that has seen all of the tuples up to that point, and has some
arbitrary state computed from them, to 2 instances of the bolt. If I do
that, I either have to throw all of the state away for both bolts, which is
what redeploying your topology does, or I have to provide a way to checkpoint,
split, and combine the state of these bolts. That is an incredibly difficult
problem to solve, especially if the routing is user-pluggable. Instead we ask
you ahead of time what maximum amount of state partitioning you want
for each bolt, and then let you potentially run each of these partitions in
parallel.
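
To make the routing problem concrete, here is a minimal sketch. It assumes a keyed grouping that hashes the key and takes the result modulo the number of tasks; the class and method names are hypothetical and this is not Storm's actual code, only a model of the idea.

```java
import java.util.Arrays;
import java.util.List;

class KeyedRoutingSketch {
    // Hypothetical stand-in for a keyed grouping: hash the key, then take it
    // modulo the number of tasks. floorMod keeps the result non-negative.
    static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("foo", "bar", "baz", "qux");
        for (String key : keys) {
            int before = taskFor(key, 3); // partitioning with 3 tasks
            int after = taskFor(key, 4);  // partitioning with 4 tasks
            String note = (before != after) ? " (its state would have to move)" : "";
            System.out.println(key + ": task " + before + " of 3, task " + after + " of 4" + note);
        }
    }
}
```

Under this assumption "foo" lands on task 0 when there are 3 tasks and on task 2 when there are 4, so any state the first task built up for "foo" would have to be split out and shipped somewhere else.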
I guess we could do something like S4, where every key got a new bolt instance,
but they had a lot of issues with checkpointing all of these bolt
instances and swapping them out. They also didn't allow for pluggable
groupings. Everything was a keyed grouping.
- Bobby
On Friday, June 19, 2015 6:35 AM, Matthias J. Sax
<[email protected]> wrote:
Yes. The number of tasks is the maximum parallelism. However, you can
have less parallelism than the number of tasks. If you know the maximum number
of distinct keys in your data set, you can set the number of tasks
accordingly. (More parallelism than the number of distinct keys is not
possible anyway.)
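
As a rough illustration of why the task count caps the parallelism, here is a small sketch. It assumes a simple round-robin assignment of a fixed set of task ids to executors; the names are hypothetical and the real scheduler may distribute tasks differently.

```java
import java.util.ArrayList;
import java.util.List;

class RebalanceSketch {
    // Hypothetical model: spread a fixed set of task ids over executors.
    // The set of tasks never changes; only the task-to-executor mapping does.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numTasks < 1 || numExecutors < 1) {
            throw new IllegalArgumentException("need at least one task and one executor");
        }
        // An executor without a task has nothing to run, so cap at numTasks.
        int used = Math.min(numTasks, numExecutors);
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < used; e++) {
            executors.add(new ArrayList<>());
        }
        for (int task = 0; task < numTasks; task++) {
            executors.get(task % used).add(task); // round-robin (an assumption)
        }
        return executors;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {1, 2, 3, 6}) {
            System.out.println(requested + " executor(s) requested -> " + assign(3, requested));
        }
    }
}
```

With 3 tasks, asking for 6 executors still yields only 3 busy ones in this model. The key-to-task routing is untouched by the change, which is why the per-task state can stay where it is.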
-Matthias
On 06/19/2015 01:01 PM, Harshit Gupta wrote:
> That's the point. I want to have an arbitrary degree of parallelism. I don't
> wish to hard-code it. The current release doesn't allow that, does it?
>
> On 19/06/2015 8:55 pm, "Matthias J. Sax" <[email protected]> wrote:
>
> If the number of tasks is 3, you can have a maximum degree of parallelism (dop) of 3.
>
> -> #executors <= #tasks
>
> Have a look here:
>
>
> https://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
>
> -Matthias
>
> On 06/19/2015 12:31 PM, Harshit Gupta wrote:
> > Hi Matthias,
> >
> > Thanks for your reply.
> >
> > Consider this: say the max number of tasks for a bolt B is set to 3. But
> > at some point in time, I want to deploy B on 6 different machines. How
> > would I do that?
> >
> > I am new to Storm and your answer will improve my understanding of the
> > platform.
> >
> > Thanks a lot.
> >
> > On 19/06/2015 6:59 pm, "Matthias J. Sax" <[email protected]> wrote:
> >
> > Just want to clarify: the number of tasks is not the number of
> > parallel running bolt instances (called executors, which are
> > threads). So I don't understand why you don't want to start with
> > the maximum number of tasks? There should be almost no overhead
> > if you have more tasks than executors (executors can process
> > multiple tasks, and switching between tasks is lightweight).
> > Adjusting the number of executors during runtime can be done
> > without redeploying (-> "rebalance"), giving you the flexibility
> > you need.
> >
> > -Matthias
> >
> > On 06/19/2015 10:09 AM, Nilesh Chhapru wrote:
> > > Hi Harshit,
> > >
> > >
> > >
> > > No, there isn’t any way you can achieve this without redeploying your
> > > topology. You may get this feature in an upcoming release of Storm, as
> > > it is on their roadmap.
> > >
> > >
> > >
> > >
> > >
> > > *Regards*,
> > >
> > > *Nilesh Chhapru.*
> > >
> > >
> > >
> > > *From:* Harshit Gupta [mailto:[email protected]]
> > > *Sent:* 19 June 2015 11:43 AM
> > > *To:* [email protected]
> > > *Subject:* Fwd: DYNAMIC ADJUSTMENT OF NUMBER OF TASKS
> > >
> > >
> > >
> > > Hello,
> > >
> > > I am working on extending the Storm platform and would like to know the
> > > scope for dynamically adjusting the number of tasks for a topology.
> > >
> > > I don't want to work with a worst-case ceiling on the number of tasks.
> > >
> > > Please let me know if there is/isn't a method for dynamically changing
> > > the number of tasks without restarting the topology.
> > >
> > > Thanks.
> > >
> > > --
> > >
> > > /With regards,/
> > >
> > > *HARSHIT GUPTA*
> > >
> > > Fourth Year Undergraduate Student,
> > >
> > > Department Of Computer Science And Engineering,
> > >
> > > Indian Institute Of Technology, Kharagpur.
> > >
> >
>