I've also written a tutorial specifically about fields grouping. Other tutorials on the internet are hard to follow because they don't explain the strings used as field names. http://nrecursions.blogspot.in/2016/09/understanding-fields-grouping-in-apache.html
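To make the role of those strings concrete, here is a minimal plain-Java sketch of the idea behind fields grouping. It is a simplified model and not Storm's actual code: the class FieldsGroupingSketch and its chooseTask method are hypothetical names made up for illustration, and it assumes that routing boils down to hashing the values of the grouping fields and taking the result modulo the number of tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of fields grouping. NOT Storm's implementation;
// all names here are hypothetical and exist only for illustration.
final class FieldsGroupingSketch {

    // Declared field names, in tuple order, like new Fields("word", "count").
    private final List<String> declaredFields;

    FieldsGroupingSketch(String... declaredFields) {
        this.declaredFields = Arrays.asList(declaredFields);
    }

    // Picks a task index using only the values of the grouping fields.
    int chooseTask(List<Object> tupleValues, int numTasks, String... groupingFields) {
        List<Object> keys = new ArrayList<>();
        for (String name : groupingFields) {
            // A field name is just a label for a position in the tuple.
            int index = declaredFields.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("Undeclared field: " + name);
            }
            keys.add(tupleValues.get(index));
        }
        // Assumption: hash of the key values modulo the task count.
        return Math.floorMod(keys.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        FieldsGroupingSketch stream = new FieldsGroupingSketch("word", "count");
        int a = stream.chooseTask(Arrays.<Object>asList("storm", 1), 4, "word");
        int b = stream.chooseTask(Arrays.<Object>asList("storm", 7), 4, "word");
        System.out.println(a == b); // prints true: same "word" value, same task
    }
}
```

Because only the "word" value feeds the hash, both tuples land on the same task even though their counts differ. Renaming the fields to "firstValue" and "secondValue" everywhere would change nothing, as long as the declaration and the grouping use the same names.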

On Thu, Jun 16, 2016 at 11:57 PM, Junguk Cho <[email protected]> wrote:

> Hi,
>
> Both replies are really helpful.
>
> Thanks,
> Junguk
>
> 2016-06-15 2:06 GMT-04:00 Navin Ipe <[email protected]>:
>
>> @Junguk: In any normal function you create in Java, you say something
>> like this:
>> public void someFunction(Integer firstValue, Float secondValue) {}
>>
>> This way, Java understands that the first parameter is an Integer named
>> firstValue and the second parameter is a Float named secondValue.
>>
>> Same way, when you say declarer.declare(new Fields("word", "count"));
>> You are just telling Storm that when you receive a tuple, the first field
>> of the tuple will be some object named "word" and the second object in the
>> tuple will be some object named "count". Instead of "word" and "count" you
>> could have also named them like this and it would make no difference:
>> declarer.declare(new Fields("firstValue", "secondValue"));
>>
>> Now in your code when you extract the values from the tuple, you have to
>> know the datatypes of the "firstValue" and "secondValue".
>>
>> String w = (String) tuple.getValue(0);//firstValue
>> MyCountingClass mcc = (MyCountingClass) tuple.getValue(1);//secondValue
>>
>> I agree the storm tutorials are a bit confusing that way. Please see if
>> the tutorial I wrote is clearer:
>> http://nrecursions.blogspot.in/2016/04/a-simple-apache-storm-tutorial.html
>>
>> On Sat, Jun 11, 2016 at 7:27 AM, Junguk Cho <[email protected]> wrote:
>>
>>> Hi, Jungtaek.
>>>
>>> Thank you for reply.
>>>
>>> I have following questions.
>>>
>>> 1. If we look at the example (WordCountTopology), in WordCount class, it
>>> uses String word = tuple.getString(0); to get string (word).
>>> So, I don't understand exact roles of "word" and "count". Internally,
>>> they use them for Map-like structure?
>>> To be clear, does each bolt exchange data with this format "word" :
>>> <data> ?
>>>
>>> 2. About default and non-default stream, do all tuples include stream id
>>> whenever they send?
>>>
>>> 3. To be clear, if we set "false", storm does not use serialization for
>>> inter-process and inter-node?
>>>
>>> Thanks in advance.
>>> - Junguk
>>>
>>> 2016-06-10 18:00 GMT-04:00 Jungtaek Lim <[email protected]>:
>>>
>>>> Hi Junguk,
>>>>
>>>> 1. In declareOutputFields, you're declaring schema of output stream of
>>>> this component. First value of tuple will be matched to "word", and second
>>>> value of tuple will be matched to "count". You can access value as field
>>>> name or index.
>>>>
>>>> Btw, declare() declares default stream, and there're other methods
>>>> which declare named (non-default) stream.
>>>>
>>>> 2. When you're rebalancing topology, you're encouraged to input
>>>> wait-time, too.
>>>> Topology will be deactivated immediately so that Spout will not call
>>>> nextTuple(), only Bolts will be running to handle on-going tuples while
>>>> wait-time.
>>>> If there're still on-going tuples left, they will not be acked. So if
>>>> datasource of Spout is RabbitMQ with ack mode or Kafka or so on, Spout will
>>>> read them from datasource again.
>>>>
>>>> 3. Right. In order to check serialization issue earlier, there's option
>>>> "topology.testing.always.try.serialize" as debug purpose. Note that it
>>>> affects performance so it should be disabled ("false" by default) for
>>>> production environment.
>>>>
>>>> Hope this helps.
>>>>
>>>> Thanks,
>>>> Jungtaek Lim (HeartSaVioR)
>>>>
>>>> On Sat, Jun 11, 2016 at 3:27 AM, Junguk Cho <[email protected]> wrote:
>>>>
>>>>> Hi, I have some basic questions.
>>>>>
>>>>> 1. About Tuple.
>>>>> We declare tuple in declareOutputFields.
>>>>> For example, declarer.declare(new Fields("word", "count"));
>>>>>
>>>>> Are "word" and "count" forwarded to next node with actual data?
>>>>> What are the roles of "word" and "count" here internally?
>>>>>
>>>>> 2. About rebalancing
>>>>> (http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html)
>>>>>
>>>>> In storm, there is rebalancing capability.
>>>>> What happened on-going tuples while storm rebalances topology?
>>>>> Does it drop and replay?
>>>>>
>>>>> 3. Serialization.
>>>>> In storm, as far as I know for inter-thread communication,
>>>>> serialization does not happen. For inter-process and inter-node
>>>>> communication, serialization is required.
>>>>> Is it right?
>>>>>
>>>>> Thanks,
>>>>> Junguk
>>
>> --
>> Regards,
>> Navin

--
Regards,
Navin
