Hi Shuo,

Seeing a lot of "group error" messages in the log file is expected. The description of fields grouping at http://storm.apache.org/documentation/Concepts.html says:
1. Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.

This means that tuples with the same values for fields A and B will always go to the same task, but it does not mean that tuples with other values for fields A and B cannot go to that same task. For example, suppose your input data has the following tuples:

1, 2, 3, 4
1, 2, 5, 6
3, 4, 5, 6

In this scenario the first two tuples are guaranteed to go to the same task, but the third tuple can also go to that task, especially when the parallelism hint for BoltY is set to 1, since then there is no other task.

Think of it like the hashCode method in Java: equal objects always have the same hash code, but two different objects can also have the same hash code.

From: Shuo Chen
Reply-To: "[email protected]"
Date: Tuesday, November 3, 2015 at 6:46 PM
To: "[email protected]"
Subject: multiple fields grouping in storm

I have two Bolt classes, BoltX and BoltY. BoltY receives tuples from BoltX. BoltX declares its output with multiple fields; each tuple contains 4 strings:

    class BoltX implements IBasicBolt {
        ...
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("A", "B", "C", "D"));
        }
    }

In BoltY:

    class BoltY implements IBasicBolt {
        boolean hasReceive = false;
        String A = null;
        String B = null;
        ...
        public void execute(Tuple input, BasicOutputCollector collector) {
            // Remember the (A, B) pair of the first tuple this task sees.
            if (!hasReceive) {
                hasReceive = true;
                A = input.getString(0);
                B = input.getString(1);
            }
            // Any later tuple with a different (A, B) pair is logged as an error.
            if (!input.getString(0).equals(A) || !input.getString(1).equals(B)) {
                LOG.error("group error");
                return;
            }
            ...
        }
        ...
    }

In Topology:

    ...
    builder.setBolt("x", new BoltX(), 3);
    builder.setBolt("y", new BoltY(), 3).fieldsGrouping("x", new Fields("A", "B"));
    ...
I think that the outputs from x with the same values for fields "A" and "B" will go to the same task of BoltY. However, the topology log shows lots of "group error" messages. So how can I group outputs with the same "A" and "B" fields to the same task of BoltY?

The question is also asked at http://stackoverflow.com/questions/33512554/multiple-fields-grouping-in-storm

--
Shuo Chen
[email protected]
[email protected]
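
To make the hashing analogy in the reply concrete, here is a minimal plain-Java sketch with no Storm dependency. It assumes a simple hash-modulo routing rule as a stand-in for fields grouping; this is an illustration, not Storm's actual implementation. The names FieldsGroupingSketch, taskFor, TaskState and run are all hypothetical. It shows that a given (A, B) pair always lands on one task, while a single task can still receive several different pairs, so a task probably wants to keep its state per key rather than in one pair of member variables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hash-mod routing stands in for a fields grouping.
// This is NOT Storm's implementation; every name here is hypothetical.
class FieldsGroupingSketch {

    // Picks a task index from the grouping fields (A, B) alone,
    // so equal (A, B) pairs always yield the same index.
    static int taskFor(int numTasks, String a, String b) {
        return Math.floorMod(Arrays.asList(a, b).hashCode(), numTasks);
    }

    // What each BoltY task could keep instead of a single (A, B) pair:
    // one entry per distinct key that the task has seen.
    static class TaskState {
        final Map<List<String>, Integer> countsByKey = new HashMap<>();

        void execute(String a, String b) {
            countsByKey.merge(Arrays.asList(a, b), 1, Integer::sum);
        }
    }

    // Routes every tuple to a task and lets that task update its keyed state.
    static List<TaskState> run(int numTasks, String[][] tuples) {
        List<TaskState> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new TaskState());
        }
        for (String[] t : tuples) {
            tasks.get(taskFor(numTasks, t[0], t[1])).execute(t[0], t[1]);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // The three example tuples from the reply above.
        String[][] tuples = {
            {"1", "2", "3", "4"},
            {"1", "2", "5", "6"},
            {"3", "4", "5", "6"},
        };
        // With one task, both distinct (A, B) keys end up in the same place.
        List<TaskState> tasks = run(1, tuples);
        System.out.println(tasks.get(0).countsByKey);
    }
}
```

With a parallelism of 1 the single task sees both the ("1", "2") and the ("3", "4") keys, which is the situation that triggers "group error" in the original BoltY. Keeping a map keyed by (A, B) is one possible way to handle that; whether it fits depends on what BoltY does in the elided parts of its code.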
