@Junguk: In any ordinary method you write in Java, you declare the parameters
like this:
public void someFunction(Integer firstValue, Float secondValue) {}
This way, Java knows that the first parameter is an Integer named
firstValue and the second parameter is a Float named secondValue.
In the same way, when you write declarer.declare(new Fields("word", "count"));
you are just telling Storm that in each tuple this component emits (and the
next bolt receives), the first field is some object named "word" and the
second field is some object named "count". Instead of "word" and "count" you
could just as well have named them like this, and it would make no difference:
declarer.declare(new Fields("firstValue", "secondValue"));
Now, in your code, when you extract the values from the tuple, you have to
know the datatypes of "firstValue" and "secondValue":
String w = (String) tuple.getValue(0); // firstValue
MyCountingClass mcc = (MyCountingClass) tuple.getValue(1); // secondValue
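To make the point about types concrete, here is a minimal stdlib-only sketch. It is not Storm code: it assumes a tuple's values can be modeled as a plain List<Object> (which is effectively what getValue(i) hands back), and getAs is a hypothetical helper, not part of Storm's API. If the receiving side assumes the wrong type for a position, the cast fails at runtime with a ClassCastException.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only, not Storm's API: a tuple's values modeled as a List<Object>,
// mirroring how tuple.getValue(i) returns an untyped Object.
public class TypedExtraction {

    // Hypothetical helper: fetch the value at a position and cast it to the
    // type the receiver expects. Class.cast throws ClassCastException if the
    // emitting component put a different type at that position.
    static <T> T getAs(List<Object> values, int index, Class<T> type) {
        return type.cast(values.get(index));
    }

    public static void main(String[] args) {
        // Values in declaration order: ("word", "count")
        List<Object> values = Arrays.<Object>asList("storm", 3);

        String word = getAs(values, 0, String.class);    // "word"
        Integer count = getAs(values, 1, Integer.class); // "count"
        System.out.println(word + " -> " + count);

        try {
            // Wrong assumption about the type stored at index 1
            String wrong = getAs(values, 1, String.class);
            System.out.println(wrong);
        } catch (ClassCastException e) {
            System.out.println("type mismatch at index 1: " + e.getMessage());
        }
    }
}
```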
I agree the Storm tutorials are a bit confusing in that respect. Please see
whether the tutorial I wrote is clearer:
http://nrecursions.blogspot.in/2016/04/a-simple-apache-storm-tutorial.html
On Sat, Jun 11, 2016 at 7:27 AM, Junguk Cho <[email protected]> wrote:
> Hi, Jungtaek.
>
> Thank you for reply.
>
> I have the following questions.
>
> 1. If we look at the example (WordCountTopology), the WordCount class
> uses String word = tuple.getString(0); to get the string (word).
> So I don't understand the exact roles of "word" and "count". Are they used
> internally for a Map-like structure?
> To be clear, does each bolt exchange data in the format "word" :
> <data>?
>
> About default and non-default streams: does every tuple include a stream ID
> whenever it is sent?
>
>
> 3. To be clear, if we set it to "false", Storm does not use serialization
> for inter-process and inter-node communication?
>
> Thanks in advance.
> - Junguk
>
>
>
>
> 2016-06-10 18:00 GMT-04:00 Jungtaek Lim <[email protected]>:
>
>> Hi Junguk,
>>
>> 1. In declareOutputFields, you're declaring schema of output stream of
>> this component. First value of tuple will be matched to "word", and second
>> value of tuple will be matched to "count". You can access value as field
>> name or index.
>>
>> Btw, declare() declares the default stream; there are other methods (such
>> as declareStream()) that declare named (non-default) streams.
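A toy model of the two points above, written in plain Java rather than against Storm's classes: the declared field names form a per-stream schema, a tuple carries its stream id plus positional values, and looking a value up by name is just finding its index in that schema. The method names (declare, declareStream, getValue, getValueByField) echo Storm's, but these classes are hypothetical, and the sketch assumes names are resolved against the declared schema instead of traveling with every tuple.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Storm's classes: a declarer records one field-name list per
// stream id; tuples carry only a stream id and their values in order.
public class SchemaSketch {

    // The stream id Storm uses for the stream that declare() sets up.
    static final String DEFAULT_STREAM = "default";

    // streamId -> declared field names, filled in once at declaration time
    static final Map<String, List<String>> schemas = new HashMap<>();

    static void declare(List<String> fields) {
        declareStream(DEFAULT_STREAM, fields);
    }

    static void declareStream(String streamId, List<String> fields) {
        schemas.put(streamId, fields);
    }

    // A tuple: which stream it belongs to, plus its positional values.
    static final class ToyTuple {
        final String streamId;
        final List<Object> values;

        ToyTuple(String streamId, List<Object> values) {
            this.streamId = streamId;
            this.values = values;
        }

        Object getValue(int index) {
            return values.get(index);
        }

        // Name lookup goes through the stream's declared schema.
        Object getValueByField(String field) {
            int index = schemas.get(streamId).indexOf(field);
            if (index < 0) {
                throw new IllegalArgumentException("unknown field: " + field);
            }
            return values.get(index);
        }
    }

    public static void main(String[] args) {
        declare(Arrays.asList("word", "count"));
        declareStream("errors", Arrays.asList("message"));

        ToyTuple t = new ToyTuple(DEFAULT_STREAM, Arrays.<Object>asList("storm", 3));
        System.out.println(t.getValue(0) + " " + t.getValueByField("count"));

        ToyTuple e = new ToyTuple("errors", Arrays.<Object>asList("bad input"));
        System.out.println(e.streamId + ": " + e.getValueByField("message"));
    }
}
```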
>>
>> 2. When you rebalance a topology, you're encouraged to specify a wait time,
>> too.
>> The topology is deactivated immediately, so nextTuple() is no longer called
>> on the spouts; only the bolts keep running, handling in-flight tuples
>> during the wait time.
>> Any tuples still in flight after that will not be acked. So if the spout's
>> data source is RabbitMQ in ack mode, Kafka, or similar, the spout will read
>> them from the data source again.
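The rebalance behavior described above can be simulated in a few lines of plain Java (on the command line, the wait time is what the -w option of storm rebalance sets). This is a hypothetical toy, not Storm's implementation: the source stands in for a replayable queue such as Kafka, the wait time is counted in processing steps rather than seconds, and Storm's timeout/fail path for un-acked tuples is collapsed into one step that puts them back on the source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy simulation, not Storm code: tuples still in flight when the rebalance
// wait time ends are not acked, so they are read again from the source.
public class RebalanceSketch {

    final Deque<String> source = new ArrayDeque<>();    // e.g. a Kafka partition
    final Set<String> inFlight = new LinkedHashSet<>(); // emitted, not yet acked
    final List<String> acked = new ArrayList<>();
    boolean active = true;

    // Spout side: emits only while the topology is active.
    void nextTuple() {
        if (active && !source.isEmpty()) {
            inFlight.add(source.pollFirst());
        }
    }

    // Bolt side: finish the oldest in-flight tuple and ack it.
    void processOne() {
        if (!inFlight.isEmpty()) {
            String msg = inFlight.iterator().next();
            inFlight.remove(msg);
            acked.add(msg);
        }
    }

    // Deactivate, let bolts run for waitSteps, then return anything still
    // un-acked to the front of the source (in order) so it is replayed.
    void rebalance(int waitSteps) {
        active = false;
        for (int i = 0; i < waitSteps; i++) {
            processOne();
        }
        List<String> replay = new ArrayList<>(inFlight);
        for (int i = replay.size() - 1; i >= 0; i--) {
            source.addFirst(replay.get(i));
        }
        inFlight.clear();
        active = true;
    }

    public static void main(String[] args) {
        RebalanceSketch topo = new RebalanceSketch();
        for (int i = 1; i <= 5; i++) topo.source.addLast("m" + i);
        for (int i = 0; i < 4; i++) topo.nextTuple(); // m1..m4 in flight
        topo.rebalance(2);                            // only two finish in time
        System.out.println("acked:  " + topo.acked);
        System.out.println("source: " + topo.source);
    }
}
```

In this run, m1 and m2 are acked during the wait, while m3 and m4, still in flight when it ends, go back to the front of the source ahead of m5.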
>>
>> 3. Right. To catch serialization issues earlier, there's a debug option,
>> "topology.testing.always.try.serialize". Note that it affects performance,
>> so it should be disabled (it is "false" by default) in production
>> environments.
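The following sketch illustrates why forcing serialization helps during testing. It is an illustration only: Storm serializes tuples with Kryo, whereas this stdlib-only version uses Java serialization, and the alwaysTrySerialize flag merely mirrors the idea of "topology.testing.always.try.serialize" rather than reading the real setting. A same-JVM hand-off returns the very same object, so a non-serializable value slips through unnoticed; forcing the round trip surfaces the problem immediately.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Illustration only, not Storm code (Storm uses Kryo, not Java serialization).
public class SerializeCheckSketch {

    // Inter-thread hand-off in one JVM: the receiver gets the same object.
    static Object passInProcess(Object value) {
        return value;
    }

    // Inter-process hand-off: the value must survive a serialize/deserialize
    // round trip, and the receiver gets a copy.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // Hypothetical flag: when true, even a same-JVM transfer is serialized,
    // so problems show up in local testing rather than after deployment.
    static Object transfer(Object value, boolean alwaysTrySerialize)
            throws IOException, ClassNotFoundException {
        return alwaysTrySerialize ? roundTrip(value) : passInProcess(value);
    }

    // Deliberately does not implement Serializable.
    static final class Counter {
        int n;
    }

    public static void main(String[] args) throws Exception {
        Counter c = new Counter();
        System.out.println("in-process, same object: " + (transfer(c, false) == c));
        try {
            transfer(c, true);
        } catch (NotSerializableException e) {
            System.out.println("caught early: " + e.getMessage());
        }
    }
}
```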
>>
>> Hope this helps.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>>
>> On Sat, Jun 11, 2016 at 3:27 AM, Junguk Cho <[email protected]> wrote:
>>
>>> Hi, I have some basic questions.
>>>
>>> 1. About Tuple.
>>> We declare the tuple fields in declareOutputFields.
>>> For example, declarer.declare(new Fields("word", "count"));
>>>
>>> Are "word" and "count" forwarded to the next node along with the actual
>>> data? What are the internal roles of "word" and "count" here?
>>>
>>>
>>> 2. About rebalancing
>>> (http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html)
>>>
>>> Storm has a rebalancing capability.
>>> What happens to in-flight tuples while Storm rebalances a topology?
>>> Are they dropped and replayed?
>>>
>>> 3. Serialization.
>>> In Storm, as far as I know, serialization does not happen for inter-thread
>>> communication, whereas it is required for inter-process and inter-node
>>> communication.
>>> Is that right?
>>>
>>> Thanks,
>>> Junguk
>>>
>>>
>
--
Regards,
Navin