Re: Basic questions about Strom

anshu shukla Fri, 23 Sep 2016 08:40:14 -0700

Thanks for the nice answer, Jungtaek.

Q: What happens to in-flight tuples while Storm rebalances a topology?
A: When you rebalance a topology, you are encouraged to specify a
wait time as well.
The topology is deactivated immediately, so the Spout stops calling
nextTuple(); only the Bolts keep running to process in-flight tuples during
the wait time.
Any tuples still in flight after that will not be acked. So if the Spout's
data source is RabbitMQ in ack mode, Kafka, or similar, the Spout will
read them from the data source again.


Just one more query: if acking is disabled, can a rebalance lose the
messages that are still queued after the wait time? I ask because what I
found in the Storm UI is that rebalancing a particular bolt/worker mapping
resets the fields for that worker only, while the others remain the same.
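
To make the question concrete, here is a minimal sketch of the behaviour as I understand it from the answer above. It is a toy model in plain Java, not Storm code: RebalanceSketch, lostAfterRebalance, and drainCapacity are names I made up, and the wait time (the -w option of "storm rebalance") is reduced to "how many tuples the bolts finish before the workers restart". The assumption is that with acking enabled the data source keeps un-acked messages and redelivers them, while with acking disabled nothing is retained for replay.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Toy model of a rebalance; none of these names are Storm APIs. */
public class RebalanceSketch {

    /**
     * Returns the messages that are lost for good after one simulated rebalance.
     *
     * @param inFlight      messages already emitted by the spout and not yet processed
     * @param drainCapacity how many of them the bolts manage to finish within the wait time
     * @param ackingEnabled whether the source keeps un-acked messages for redelivery
     */
    public static List<String> lostAfterRebalance(List<String> inFlight,
                                                  int drainCapacity,
                                                  boolean ackingEnabled) {
        Deque<String> queue = new ArrayDeque<>(inFlight);

        // The spout is deactivated, so nothing new arrives; bolts drain what they can.
        for (int i = 0; i < drainCapacity && !queue.isEmpty(); i++) {
            queue.poll();
        }

        // Whatever is still queued is discarded when the workers restart.
        List<String> leftover = new ArrayList<>(queue);

        // With acking, the leftovers were never acked, so the source replays them:
        // nothing is lost. Without acking, there is nothing to trigger a replay.
        return ackingEnabled ? new ArrayList<String>() : leftover;
    }

    public static void main(String[] args) {
        List<String> inFlight = Arrays.asList("m1", "m2", "m3", "m4", "m5");
        System.out.println("acking on:  lost = " + lostAfterRebalance(inFlight, 3, true));
        System.out.println("acking off: lost = " + lostAfterRebalance(inFlight, 3, false));
    }
}
```

In this model a longer wait time (a larger drainCapacity) shrinks the loss window when acking is off, but only acking removes it.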

On Sat, Jun 11, 2016 at 3:30 AM, Jungtaek Lim <[email protected]> wrote:

> Hi Junguk,
>
> 1. In declareOutputFields, you declare the schema of this component's
> output stream. The first value of a tuple is matched to "word", and the
> second value to "count". You can access a value by field name or by
> index.
>
> Btw, declare() declares the default stream; there are other methods that
> declare named (non-default) streams.
>
> 2. When you rebalance a topology, you are encouraged to specify a wait
> time as well.
> The topology is deactivated immediately, so the Spout stops calling
> nextTuple(); only the Bolts keep running to process in-flight tuples during
> the wait time.
> Any tuples still in flight after that will not be acked. So if the
> Spout's data source is RabbitMQ in ack mode, Kafka, or similar, the Spout
> will read them from the data source again.
>
> 3. Right. To catch serialization issues early, there is an option,
> "topology.testing.always.try.serialize", intended for debugging. Note that
> it affects performance, so it should stay disabled ("false" by default) in
> production environments.
>
> Hope this helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> On Sat, Jun 11, 2016 at 3:27 AM, Junguk Cho <[email protected]> wrote:
>
>> Hi, I have some basic questions.
>>
>> 1. About tuples.
>> We declare a tuple's fields in declareOutputFields.
>> For example, declarer.declare(new Fields("word", "count"));
>>
>> Are "word" and "count" forwarded to the next node along with the actual
>> data? What roles do "word" and "count" play internally?
>>
>>
>> 2. About rebalancing
>> (http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html)
>>
>> Storm has a rebalancing capability.
>> What happens to in-flight tuples while Storm rebalances a topology?
>> Are they dropped and replayed?
>>
>> 3. Serialization.
>> As far as I know, Storm does not serialize tuples for inter-thread
>> communication, but serialization is required for inter-process and
>> inter-node communication.
>> Is that right?
>>
>> Thanks,
>> Junguk
>>
>>
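
On point 1 of the quoted reply (access by field name or by index), a minimal sketch of how such a lookup can work. This is a toy model in plain Java, assuming the schema is just an ordered list of names; FieldLookupSketch is a hypothetical class, not Storm's Tuple. In Storm itself the corresponding calls are Tuple.getValueByField and Tuple.getValue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy pairing of a declared schema with one tuple's values; not Storm's classes. */
public class FieldLookupSketch {
    private final List<String> fields; // declared once, e.g. "word", "count"
    private final List<Object> values; // the data carried by one tuple

    public FieldLookupSketch(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("schema and values differ in length");
        }
        this.fields = new ArrayList<>(fields);
        this.values = new ArrayList<>(values);
    }

    /** Positional access: index 0 is the first declared field. */
    public Object getValue(int index) {
        return values.get(index);
    }

    /** Named access: resolve the name to its position, then read positionally. */
    public Object getValueByField(String name) {
        int index = fields.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        FieldLookupSketch tuple = new FieldLookupSketch(
                Arrays.asList("word", "count"),
                Arrays.<Object>asList("storm", 3));
        System.out.println(tuple.getValueByField("word") + " " + tuple.getValue(1));
    }
}
```

In this sketch the names live only in the schema list and the value list holds nothing but data, which is the sense in which a value is "matched" to a field name.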


-- 
Thanks & Regards,
Anshu Shukla
