Thanks Bobby.

The issue of a bolt losing its state looks pretty valid. However, what I
actually wanted to ask is this: what if I don't want to specify the number of
tasks in the topology? Say I have some logic that figures out how many
instances of each component to run, and that can be done once the topology
has been submitted. Is there a way of doing that?

On Tue, Jun 23, 2015 at 5:47 AM, Bobby Evans <[email protected]> wrote:

> The issue with this is the routing of tuples.  Say I want a keyed grouping
> where a tuple with "foo" in it will always go to the same instance of a
> bolt.  I don't see how it is possible to go from a situation where I have
> one bolt instance that has seen all of the tuples up to that point, and has
> some arbitrary state computed from them, to 2 instances of the
> bolt.  If I do that, I either have to throw all of the state away for both
> bolts, which is what redeploying your topology does, or I have to provide a
> way to checkpoint, split, and combine the state of these bolts. That is an
> incredibly difficult problem to solve, especially if the routing is user
> pluggable.  Instead we ask you ahead of time for the maximum amount of
> state partitioning you want for each bolt, and then let you
> potentially run each of these partitions in parallel.
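
To make the routing argument above concrete, here is a minimal Java sketch. It assumes a simple hash-modulo scheme, which is a simplification and not necessarily how Storm's own grouping code works; the names `KeyedRoutingSketch`, `taskFor`, and `movedKeys` are hypothetical. It shows that changing the task count can send an existing key to a different task, which would leave that key's accumulated state behind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeyedRoutingSketch {
    // Assumed stand-in for a keyed grouping: hash the key and take it
    // modulo the number of tasks. Not Storm's actual implementation.
    static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Returns the keys whose target task differs between the two task counts.
    // Each such key would arrive at a task that has none of its earlier state.
    static List<String> movedKeys(List<String> keys, int oldTasks, int newTasks) {
        List<String> moved = new ArrayList<>();
        for (String key : keys) {
            if (taskFor(key, oldTasks) != taskFor(key, newTasks)) {
                moved.add(key);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("foo", "bar", "baz", "qux");
        // Going from one task to two: some keys now land on the new task.
        System.out.println("1 -> 2 tasks, moved keys: " + movedKeys(keys, 1, 2));
        // Same task count: nothing moves, so state stays where it is.
        System.out.println("3 -> 3 tasks, moved keys: " + movedKeys(keys, 3, 3));
    }
}
```

With a fixed task count the mapping from key to task never changes, which is why the task count is asked for up front and only the number of threads running those tasks is varied later.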
>
> I guess we could do something like S4 where every key got a new bolt
> instance, but then they had a lot of issues with check-pointing all of
> these bolt instances and swapping them out.  They also didn't allow for
> pluggable groupings.  Everything was keyed grouping.
>
> - Bobby
>
>
>
>   On Friday, June 19, 2015 6:35 AM, Matthias J. Sax <[email protected]> wrote:
>
>
> Yes. The number of tasks is the maximum parallelism. However, you can
> have less parallelism than the number of tasks. If you know the maximum
> number of distinct keys in your data set, you can set the number of tasks
> accordingly (more parallelism than the number of distinct keys is not
> possible anyway).
>
> -Matthias
>
>
> On 06/19/2015 01:01 PM, Harshit Gupta wrote:
> > That's just it. I want to have an arbitrary degree of parallelism. I don't
> > wish to hard-code it. The current release doesn't allow that, does it?
> >
> > On 19/06/2015 8:55 pm, "Matthias J. Sax" <[email protected]> wrote:
> >
> >    If the number of tasks is 3, you can have a maximum degree of
> >    parallelism (dop) of 3.
> >
> >    ->  #executors <= #tasks
> >
> >    Have a look here:
> >
> >
> https://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
> >
> >    -Matthias
> >
> >    On 06/19/2015 12:31 PM, Harshit Gupta wrote:
> >    > Hi Matthias,
> >    >
> >    > Thanks for your reply.
> >    >
> >    > Consider this: say the max number of tasks for a bolt B is set to 3.
> >    > But at some point in time, I want to deploy B on 6 different machines.
> >    > How would I do that?
> >    >
> >    > I am new to Storm and your answer will improve my understanding of
> >    > the platform.
> >    >
> >    > Thanks a lot.
> >    >
> >    > On 19/06/2015 6:59 pm, "Matthias J. Sax" <[email protected]> wrote:
> >    >
> >    >    Just to clarify: the number of tasks is not the number of bolt
> >    >    instances running in parallel (those are called executors, which
> >    >    are threads). So I don't understand why you don't want to start
> >    >    with the maximum number of tasks. There should be almost no
> >    >    overhead if you have more tasks than executors (an executor can
> >    >    process multiple tasks, and switching between tasks is
> >    >    lightweight). The number of executors can be adjusted at runtime
> >    >    without redeploying (-> "rebalance"), giving you the flexibility
> >    >    you need.
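
The relationship between tasks and executors described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names (`ExecutorAssignmentSketch`, `assign`) are hypothetical, and the contiguous-range split is an assumed policy, not taken from Storm's scheduler. The task count stays fixed, while the executor count is whatever was requested, capped at the number of tasks.

```java
import java.util.ArrayList;
import java.util.List;

class ExecutorAssignmentSketch {
    // Splits task ids 0..numTasks-1 into contiguous groups, one group per executor.
    // An executor without a task would have nothing to run, so the executor
    // count is capped at numTasks (#executors <= #tasks).
    static List<List<Integer>> assign(int numTasks, int requestedExecutors) {
        if (numTasks < 1 || requestedExecutors < 1) {
            throw new IllegalArgumentException("counts must be positive");
        }
        int executors = Math.min(requestedExecutors, numTasks);
        int base = numTasks / executors;   // every executor gets at least this many tasks
        int extra = numTasks % executors;  // the first 'extra' executors get one more
        List<List<Integer>> groups = new ArrayList<>();
        int next = 0;
        for (int e = 0; e < executors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> group = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                group.add(next++);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Six tasks declared up front; only the executor count varies.
        System.out.println("2 executors: " + assign(6, 2));
        System.out.println("6 executors: " + assign(6, 6));
        // Asking for more executors than tasks is capped at the task count.
        System.out.println("3 tasks, 6 requested: " + assign(3, 6));
    }
}
```

In this model a rebalance amounts to calling `assign` again with a different executor count: the set of tasks, and therefore the key-to-task routing, stays the same.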
> >    >
> >    >    -Matthias
> >    >
> >    >    On 06/19/2015 10:09 AM, Nilesh Chhapru wrote:
> >    >    > Hi Harshit,
> >    >    >
> >    >    >
> >    >    >
> >    >    > No, there isn’t any way to achieve this without redeploying your
> >    >    > topology. You may get this feature in an upcoming release of
> >    >    > Storm, as it is on their roadmap.
> >    >    >
> >    >    >
> >    >    >
> >    >    >
> >    >    >
> >    >    > *Regards*,
> >    >    >
> >    >    > *Nilesh Chhapru.*
> >    >    >
> >    >    >
> >    >    >
> >    >    > *From:* Harshit Gupta [mailto:[email protected]]
> >    >    > *Sent:* 19 June 2015 11:43 AM
> >    >    > *To:* [email protected]
> >    >    > *Subject:* Fwd: DYNAMIC ADJUSTMENT OF NUMBER OF TASKS
> >    >    >
> >    >    >
> >    >    >
> >    >    > Hello,
> >    >    >
> >    >    > I am working on extending the Storm platform and would like to
> >    >    > know the scope for dynamically adjusting the number of tasks for
> >    >    > a topology.
> >    >    >
> >    >    > I don't want to work with a worst-case ceiling on the number of
> >    >    > tasks.
> >    >    >
> >    >    > Please let me know whether there is a method for dynamically
> >    >    > changing the number of tasks without restarting the topology.
> >    >    >
> >    >    > Thanks.
> >    >    >
> >    >    > --
> >    >    >
> >    >    > /With regards,/
> >    >    >
> >    >    > * *
> >    >    >
> >    >    > *HARSHIT GUPTA*
> >    >    >
> >    >    > Fourth Year Undergraduate Student,
> >    >    >
> >    >    > Department Of Computer Science And Engineering,
> >    >    >
> >    >    > Indian Institute Of Technology, Kharagpur.
> >    >    >
> >    >
> >
>
>
>


-- 
*With regards,*

*HARSHIT GUPTA*
Fourth Year Undergraduate Student,
Department Of Computer Science And Engineering,
Indian Institute Of Technology, Kharagpur.
