Hi Jungtaek,

Thank you for your reply.
I have the following questions.

1. In the example (WordCountTopology), the WordCount class uses
String word = tuple.getString(0); to get the string (word). So I don't
understand the exact roles of "word" and "count". Are they used internally
for a Map-like structure? To be clear, does each bolt exchange data in the
format "word" : <data>?

About default and non-default streams: does every tuple include a stream id
whenever it is sent?

3. To be clear, if we set it to "false", Storm does not use serialization for
inter-process and inter-node communication?

Thanks in advance.

- Junguk

2016-06-10 18:00 GMT-04:00 Jungtaek Lim <[email protected]>:

> Hi Junguk,
>
> 1. In declareOutputFields, you're declaring the schema of the output stream
> of this component. The first value of the tuple will be matched to "word",
> and the second value of the tuple will be matched to "count". You can access
> a value by field name or by index.
>
> Btw, declare() declares the default stream, and there are other methods
> which declare a named (non-default) stream.
>
> 2. When you're rebalancing a topology, you're encouraged to input a
> wait-time, too.
> The topology will be deactivated immediately so that the Spout will not call
> nextTuple(); only Bolts will be running to handle on-going tuples during the
> wait-time.
> If there are still on-going tuples left, they will not be acked. So if the
> datasource of the Spout is RabbitMQ with ack mode, or Kafka, and so on, the
> Spout will read them from the datasource again.
>
> 3. Right. In order to catch serialization issues earlier, there's the option
> "topology.testing.always.try.serialize" for debugging purposes. Note that it
> affects performance, so it should be disabled ("false" by default) in a
> production environment.
>
> Hope this helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> On Sat, Jun 11, 2016 at 3:27 AM, Junguk Cho <[email protected]> wrote:
>
>> Hi, I have some basic questions.
>>
>> 1. About Tuple.
>> We declare tuple in declareOutputFields.
>> For example, declarer.declare(new Fields("word", "count"));
>>
>> Are "word" and "count" forwarded to the next node with the actual data?
>> What are the roles of "word" and "count" here internally?
>>
>>
>> 2. About rebalancing (
>> http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html
>> )
>>
>> In Storm, there is a rebalancing capability.
>> What happens to on-going tuples while Storm rebalances a topology?
>> Does it drop and replay them?
>>
>> 3. Serialization.
>> In Storm, as far as I know, serialization does not happen for inter-thread
>> communication. For inter-process and inter-node communication,
>> serialization is required.
>> Is that right?
>>
>> Thanks,
>> Junguk
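
To illustrate point 1 of the quoted reply (the first value is matched to "word", the second to "count", and a value can be read by field name or by index), here is a minimal plain-Java sketch. It is not Storm code: SchemaSketch, fieldIndex, and getByField are hypothetical names, and the sketch only assumes that declared field names act as a name-to-position lookup over the tuple's list of values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a declared output schema; not a Storm class.
class SchemaSketch {
    private final Map<String, Integer> indexByName = new HashMap<>();

    // Mirrors the idea of declarer.declare(new Fields("word", "count")):
    // each name is bound to its position in the tuple's value list.
    SchemaSketch(String... fieldNames) {
        for (int i = 0; i < fieldNames.length; i++) {
            indexByName.put(fieldNames[i], i);
        }
    }

    // Position of a declared field; throws if the name was never declared.
    int fieldIndex(String name) {
        Integer index = indexByName.get(name);
        if (index == null) {
            throw new IllegalArgumentException("undeclared field: " + name);
        }
        return index;
    }

    // Access by name: look up the position, then read the value at that index.
    Object getByField(List<Object> values, String name) {
        return values.get(fieldIndex(name));
    }

    public static void main(String[] args) {
        SchemaSketch schema = new SchemaSketch("word", "count");
        List<Object> values = Arrays.<Object>asList("storm", 3);

        System.out.println(values.get(0));                      // access by index
        System.out.println(schema.getByField(values, "word"));  // access by field name
        System.out.println(schema.getByField(values, "count"));
    }
}
```

Under this model, tuple.getString(0) in the WordCount bolt and a by-name lookup of "word" would read the same value; the names describe positions rather than acting as keys stored next to each value.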
