Hi,

Both replies are really helpful.

Thanks,
Junguk

2016-06-15 2:06 GMT-04:00 Navin Ipe <[email protected]>:

> @Junguk: In any normal function you create in Java, you say something like
> this:
> public void someFunction(Integer firstValue, Float secondValue) {}
>
> This way, Java understands that the first parameter is an Integer named
> firstValue and the second parameter is a Float named secondValue.
>
> In the same way, when you say declarer.declare(new Fields("word", "count"));
> you are just telling Storm that when you receive a tuple, the first field
> of the tuple will be some object named "word" and the second object in the
> tuple will be some object named "count". Instead of "word" and "count" you
> could have also named them like this and it would make no difference:
> declarer.declare(new Fields("firstValue", "secondValue"));
>
> Now in your code, when you extract the values from the tuple, you have to
> know the datatypes of "firstValue" and "secondValue".
>
> String w = (String) tuple.getValue(0); // firstValue
> MyCountingClass mcc = (MyCountingClass) tuple.getValue(1); // secondValue
>
> I agree the Storm tutorials are a bit confusing that way. Please see if
> the tutorial I wrote is clearer:
> http://nrecursions.blogspot.in/2016/04/a-simple-apache-storm-tutorial.html
>
> On Sat, Jun 11, 2016 at 7:27 AM, Junguk Cho <[email protected]> wrote:
>
>> Hi, Jungtaek.
>>
>> Thank you for the reply.
>>
>> I have the following questions.
>>
>> 1. If we look at the example (WordCountTopology), in the WordCount class, it
>> uses String word = tuple.getString(0); to get the string (word).
>> So, I don't understand the exact roles of "word" and "count". Internally,
>> are they used for a Map-like structure?
>> To be clear, does each bolt exchange data in this format: "word" :
>> <data> ?
>>
>> About default and non-default streams, do all tuples include the stream id
>> whenever they are sent?
>>
>>
>> 3. To be clear, if we set "false", Storm does not use serialization for
>> inter-process and inter-node?
>>
>> Thanks in advance.
>> - Junguk
>>
>> 2016-06-10 18:00 GMT-04:00 Jungtaek Lim <[email protected]>:
>>
>>> Hi Junguk,
>>>
>>> 1. In declareOutputFields, you're declaring the schema of this component's
>>> output stream. The first value of the tuple is matched to "word", and the
>>> second value is matched to "count". You can access a value by field
>>> name or by index.
>>>
>>> Btw, declare() declares the default stream, and there are other methods which
>>> declare named (non-default) streams.
>>>
>>> 2. When you're rebalancing a topology, you're encouraged to specify a
>>> wait time, too.
>>> The topology is deactivated immediately, so the Spout's nextTuple() is no
>>> longer called; only Bolts keep running to handle on-going tuples during
>>> the wait time.
>>> If on-going tuples are still left after that, they will not be acked. So if
>>> the Spout's datasource is RabbitMQ with ack mode, Kafka, or the like, the
>>> Spout will read them from the datasource again.
>>>
>>> 3. Right. To catch serialization issues early, there's the option
>>> "topology.testing.always.try.serialize" for debugging purposes. Note that it
>>> affects performance, so it should be disabled ("false" by default) in
>>> production environments.
>>>
>>> Hope this helps.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>>
>>> On Sat, Jun 11, 2016 at 3:27 AM, Junguk Cho <[email protected]> wrote:
>>>
>>>> Hi, I have some basic questions.
>>>>
>>>> 1. About Tuple.
>>>> We declare the tuple in declareOutputFields.
>>>> For example, declarer.declare(new Fields("word", "count"));
>>>>
>>>> Are "word" and "count" forwarded to the next node with the actual data?
>>>> What are the roles of "word" and "count" here internally?
>>>>
>>>>
>>>> 2. About rebalancing (
>>>> http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html
>>>> )
>>>>
>>>> In Storm, there is a rebalancing capability.
>>>> What happens to on-going tuples while Storm rebalances a topology?
>>>> Does it drop and replay them?
>>>>
>>>> 3. Serialization.
>>>> In Storm, as far as I know, serialization does not happen for
>>>> inter-thread communication. For inter-process and inter-node
>>>> communication, serialization is required.
>>>> Is that right?
>>>>
>>>> Thanks,
>>>> Junguk
>>>>
>>
>
> --
> Regards,
> Navin
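The point made in the replies above, that the names passed to Fields only label tuple positions and that values can be read by index or by name, can be sketched roughly as follows. This is a toy model under stated assumptions, not Storm's implementation: SimpleFields and SimpleTuple are hypothetical stand-ins written for illustration, and only the method names fieldIndex, getValue and getValueByField are borrowed from Storm's API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: SimpleFields and SimpleTuple are hypothetical stand-ins,
// not Storm's Fields and Tuple classes.
class FieldsSketch {
    public static void main(String[] args) {
        // Comparable in spirit to declarer.declare(new Fields("word", "count"))
        SimpleFields schema = new SimpleFields("word", "count");
        SimpleTuple tuple = new SimpleTuple(schema, "storm", 3);

        // The caller still has to know the datatype of each value.
        String word = (String) tuple.getValue(0);                 // by index
        Integer count = (Integer) tuple.getValueByField("count"); // by name

        System.out.println(word + " -> " + count); // prints "storm -> 3"
    }
}

// Maps each declared field name to its position in the tuple.
class SimpleFields {
    private final Map<String, Integer> indexByName = new HashMap<>();

    SimpleFields(String... names) {
        for (int i = 0; i < names.length; i++) {
            indexByName.put(names[i], i);
        }
    }

    int fieldIndex(String name) {
        Integer index = indexByName.get(name);
        if (index == null) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        return index;
    }
}

// Holds only the values; the names live in the shared schema object.
class SimpleTuple {
    private final SimpleFields schema;
    private final List<Object> values;

    SimpleTuple(SimpleFields schema, Object... values) {
        this.schema = schema;
        this.values = Arrays.asList(values);
    }

    Object getValue(int index) {
        return values.get(index);
    }

    Object getValueByField(String name) {
        return values.get(schema.fieldIndex(name));
    }
}
```

In this sketch, renaming the fields to "firstValue" and "secondValue" changes nothing about the stored values; only the lookup by name has to use the new labels.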
