Re: Basic questions about Storm

Jungtaek Lim Tue, 14 Jun 2016 18:14:09 -0700

Sorry Junguk, I missed this thread.

1. You can get the value of a tuple field either by field name or by index.
Please refer to
https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/tuple/ITuple.java
In that case, String word = tuple.getStringByField("word") is equivalent to
String word = tuple.getString(0)
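
A rough way to picture the name/index equivalence is the plain-Java sketch
below. It is only an illustration under stated assumptions, not Storm's own
code: SketchFields and SketchTuple are hypothetical stand-ins, and the only
idea taken from this thread is that the first declared name maps to the first
value and the second name to the second value.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in classes for illustration; NOT Storm's Fields/Tuple.
public class FieldIndexSketch {

    /** Declared field names, kept as a name -> position lookup. */
    static final class SketchFields {
        private final Map<String, Integer> index = new HashMap<>();

        SketchFields(String... names) {
            for (int i = 0; i < names.length; i++) {
                index.put(names[i], i);
            }
        }

        int fieldIndex(String name) {
            Integer i = index.get(name);
            if (i == null) {
                throw new IllegalArgumentException(name + " does not exist");
            }
            return i;
        }
    }

    /** Values are stored by position; the schema is used only for name lookups. */
    static final class SketchTuple {
        private final SketchFields schema;
        private final List<Object> values;

        SketchTuple(SketchFields schema, Object... values) {
            this.schema = schema;
            this.values = Arrays.asList(values);
        }

        String getString(int i) {
            return (String) values.get(i);
        }

        String getStringByField(String name) {
            // A lookup by name is resolved to a position first.
            return getString(schema.fieldIndex(name));
        }
    }

    public static void main(String[] args) {
        SketchFields schema = new SketchFields("word", "count");
        SketchTuple tuple = new SketchTuple(schema, "storm", 3);

        String byName = tuple.getStringByField("word");
        String byIndex = tuple.getString(0);
        System.out.println("same value: " + byName.equals(byIndex));
    }
}
```

In this model the names act as a schema that turns a name into a position,
while the values themselves are just an ordered list.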


2. The "stream id" is available on a core tuple (via getSourceStreamId() and
getSourceGlobalStreamId()) but not on a Trident tuple.

3. Nope. A tuple must be serialized in order to be sent to another process.
Turning the option off only disables serialization of tuples that are sent to
tasks within the same process.


On Sat, Jun 11, 2016 at 10:57 AM, Junguk Cho <[email protected]> wrote:

> Hi, Jungtaek.
>
> Thank you for the reply.
>
> I have the following questions.
>
> 1. If we look at the example (WordCountTopology), the WordCount class
> uses String word = tuple.getString(0); to get the string (word).
> So I don't understand the exact roles of "word" and "count". Are they
> used internally for a Map-like structure?
> To be clear, does each bolt exchange data in the format "word" :
> <data>?
>
> 2. About default and non-default streams: do all tuples include a stream id
> whenever they are sent?
>
>
> 3. To be clear, if we set it to "false", Storm does not use serialization for
> inter-process and inter-node communication?
>
> Thanks in advance.
> - Junguk
>
>
>
>
> 2016-06-10 18:00 GMT-04:00 Jungtaek Lim <[email protected]>:
>
>> Hi Junguk,
>>
>> 1. In declareOutputFields, you declare the schema of this component's
>> output stream. The first value of the tuple is matched to "word", and the
>> second value is matched to "count". You can access a value by field
>> name or by index.
>>
>> Btw, declare() declares the default stream, and there are other methods that
>> declare named (non-default) streams.
>>
>> 2. When you rebalance a topology, you're encouraged to specify a
>> wait-time, too.
>> The topology is deactivated immediately, so the Spout's nextTuple() is no
>> longer called; only Bolts keep running to handle in-flight tuples during the
>> wait-time.
>> Any in-flight tuples still left after that will not be acked. So if the
>> Spout's datasource is RabbitMQ in ack mode, Kafka, or similar, the Spout will
>> read them from the datasource again.
>>
>> 3. Right. To catch serialization issues early, there's the option
>> "topology.testing.always.try.serialize" for debugging purposes. Note that it
>> affects performance, so it should be disabled ("false" by default) in
>> production environments.
>>
>> Hope this helps.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>>
>> On Sat, Jun 11, 2016 at 3:27 AM, Junguk Cho <[email protected]> wrote:
>>
>>> Hi, I have some basic questions.
>>>
>>> 1. About Tuple.
>>> We declare the tuple's fields in declareOutputFields.
>>> For example, declarer.declare(new Fields("word", "count"));
>>>
>>> Are "word" and "count" forwarded to the next node along with the actual data?
>>> What roles do "word" and "count" play internally?
>>>
>>>
>>> 2. About rebalancing (
>>> http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html
>>> )
>>>
>>> Storm has a rebalancing capability.
>>> What happens to in-flight tuples while Storm rebalances a topology?
>>> Are they dropped and replayed?
>>>
>>> 3. Serialization.
>>> In Storm, as far as I know, serialization does not happen for
>>> inter-thread communication. For inter-process and inter-node communication,
>>> serialization is required.
>>> Is that right?
>>>
>>> Thanks,
>>> Junguk
>>>
>>>
>