Hi Junguk,
1. In declareOutputFields, you're declaring the schema of this component's
output stream. The first value of each tuple will be matched to "word", and
the second value will be matched to "count". You can access a value by field
name or by index.
Btw, declare() declares the default stream; there are other methods
(declareStream) which declare named (non-default) streams.
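For example, here is a minimal sketch of how the declared names are used by a
downstream component (the bolt class and its logic are just for illustration):

    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Illustrative bolt: the field names declared upstream let you read
    // values either by name or by position.
    public class WordCountBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String word = input.getStringByField("word"); // by field name
            long count = input.getLong(1);                // by index (second value)
            collector.emit(new Values(word, count));      // order must match declared fields
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Default stream: first value -> "word", second value -> "count"
            declarer.declare(new Fields("word", "count"));
            // A named stream would use declarer.declareStream("someStream", new Fields(...))
        }
    }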
2. When you're rebalancing a topology, you're encouraged to specify a wait
time, too.
The topology will be deactivated immediately so that the Spout will not call
nextTuple(); only Bolts will keep running to handle on-going tuples during
the wait time.
If there are still on-going tuples left, they will not be acked. So if the
Spout's data source is RabbitMQ with ack mode, Kafka, or the like, the Spout
will read them from the data source again.
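For example (topology name and numbers are just illustrative), the wait time
can be passed with -w on the storm CLI:

    storm rebalance my-topology -w 30 -n 4

During those 30 seconds the Spout stays deactivated while the Bolts drain
whatever tuples are already in flight.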
3. Right. To catch serialization issues earlier, there's the option
"topology.testing.always.try.serialize" for debugging purposes. Note that it
affects performance, so it should stay disabled ("false" by default) in
production environments.
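For example, a local-test sketch (the class name and topology wiring are just
placeholders):

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.topology.TopologyBuilder;

    public class SerializationCheckExample {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // ... register your spouts/bolts on the builder here ...

            Config conf = new Config();
            // Debug-only: force serialization of every tuple, even for intra-worker
            // (inter-thread) transfers, so serialization problems surface in local
            // tests. It's "false" by default; keep it disabled in production.
            conf.put("topology.testing.always.try.serialize", true);

            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("serialization-check", conf, builder.createTopology());
        }
    }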
Hope this helps.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Sat, Jun 11, 2016 at 3:27 AM, Junguk Cho <[email protected]> wrote:
> Hi, I have some basic questions.
>
> 1. About Tuple.
> We declare a tuple's fields in declareOutputFields.
> For example, declarer.declare(new Fields("word", "count"));
>
> Are "word" and "count" forwarded to next node with actual data?
> What are the roles of "word" and "count" here internally?
>
>
> 2. About rebalancing (
> http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html
> )
>
> In Storm, there is a rebalancing capability.
> What happens to on-going tuples while Storm rebalances a topology?
> Does it drop and replay them?
>
> 3. Serialization.
> In Storm, as far as I know, serialization does not happen for inter-thread
> communication. For inter-process and inter-node communication,
> serialization is required.
> Is it right?
>
> Thanks,
> Junguk
>
>