Hi Shuo,

Seeing a lot of "group error" messages in the log file is expected. The description of
Fields grouping at http://storm.apache.org/documentation/Concepts.html says:


  1.  Fields grouping: The stream is partitioned by the fields specified in the 
grouping. For example, if the stream is grouped by the "user-id" field, tuples 
with the same "user-id" will always go to the same task, but tuples with 
different "user-id"'s may go to different tasks.

It means that tuples with the same values for fields A and B will always go to the
same task, but it does not mean that tuples with other values for A and B
cannot go to that task as well. For example, if your input data contains the following tuples:

1, 2, 3, 4
1, 2, 5, 6
3, 4, 5, 6

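In the above scenario, the first two tuples are guaranteed to go to the same
task, but the third tuple can also go to that task, especially when the
parallelism hint for BoltY is set to 1, because then there is no other task.
Because BoltY remembers only the first (A, B) pair it receives, every other pair
routed to the same task is logged as a "group error", even though the grouping
works as documented. Think of it like the hashCode method in Java: equal objects
always have the same hash code, but two different objects can also have the same
hash code.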
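To make the routing argument concrete, here is a small standalone Java model (Java 9+, no Storm dependency). It assumes fields grouping picks the target task from the hash of the grouping-field values modulo the number of tasks, which is how it is commonly described. `FieldsGroupingSketch`, `chooseTask` and `route` are hypothetical names, and this is an illustrative sketch, not Storm's actual code.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model of fields grouping: the target task is picked
// from the hash of the grouping-field values. This is not Storm's source code.
public class FieldsGroupingSketch {

    // Picks a task index for the given grouping-field values (here: A and B).
    static int chooseTask(List<String> groupingValues, int numTasks) {
        // List.hashCode() is defined by its elements, so equal keys always hash
        // the same; floorMod keeps the index non-negative for negative hashes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    // Routes each tuple by its first two fields ("A", "B") and records which
    // distinct (A, B) keys each task receives.
    static Map<Integer, Set<List<String>>> route(List<List<String>> tuples, int numTasks) {
        Map<Integer, Set<List<String>>> keysPerTask = new TreeMap<>();
        for (List<String> tuple : tuples) {
            List<String> key = List.of(tuple.get(0), tuple.get(1));
            int task = chooseTask(key, numTasks);
            keysPerTask.computeIfAbsent(task, t -> new LinkedHashSet<>()).add(key);
        }
        return keysPerTask;
    }

    public static void main(String[] args) {
        List<List<String>> tuples = List.of(
                List.of("1", "2", "3", "4"),
                List.of("1", "2", "5", "6"),
                List.of("3", "4", "5", "6"));

        // One BoltY task: every tuple lands on task 0, so that single task sees
        // two different (A, B) keys, which is what triggers "group error".
        System.out.println("1 task:  " + route(tuples, 1)); // prints "1 task:  {0=[[1, 2], [3, 4]]}"

        // Three tasks: (1, 2) still maps to exactly one task; whether (3, 4)
        // shares it depends only on the hash values.
        System.out.println("3 tasks: " + route(tuples, 3));
    }
}
```

The practical takeaway for BoltY: one task can legitimately receive several (A, B) groups, so any per-group state should be kept per key (for example in a `Map` keyed by the (A, B) pair) rather than in a single `A`/`B` pair. A real grouping problem would only show up as the same (A, B) pair arriving at two different tasks.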

From: Shuo Chen
Reply-To: "[email protected]"
Date: Tuesday, November 3, 2015 at 6:46 PM
To: "[email protected]"
Subject: multiple fields grouping in storm


I have two bolt classes, BoltX and BoltY. BoltY receives tuples from BoltX. BoltX
declares its output with multiple fields; each tuple contains 4 strings:

class BoltX implements IBasicBolt {
    ...
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("A","B","C","D"));
    }
}

In BoltY:

class BoltY implements IBasicBolt {
    boolean hasReceive = false;
    String A = null;
    String B = null;
    ...
    public void execute(Tuple input, BasicOutputCollector collector) {
        if (!hasReceive) {
            hasReceive = true;
            A = input.getString(0);
            B = input.getString(1);
        }

        if (!input.getString(0).equals(A) || !input.getString(1).equals(B)) {
            LOG.error("group error");
            return;
        }
        ...
    }
    ...
}

In Topology:

...
builder.setBolt("x", new BoltX(), 3);
builder.setBolt("y", new BoltY(), 3).fieldsGrouping("x", new Fields("A", "B"));
...

I expected that outputs from x with the same values for fields "A" and "B" would go
to the same task of BoltY.

However, the topology's log shows lots of "group error" messages.

So how can I make outputs with the same fields "A" and "B" go to the same task of BoltY?

The question is also asked at
http://stackoverflow.com/questions/33512554/multiple-fields-grouping-in-storm

--
Shuo Chen
[email protected]
[email protected]
