> can i do this like this?
> builder.setBolt("bolt", new PBolt(),n*4)
> .fieldsGrouping("A", new Fields("field1") )
> .fieldsGrouping("B",new Fields("field1"))
> .fieldsGrouping("C", new Fields("field1","field2"));
>
> Is one tuple from A or B going to be processed by all 4 excecuters
> where tuples from C with field1 are processed? (this is what i would
> want). if not how could i do this?
Hi magda
One tuple form bolt A or B will just be processed by ONLY one task of bolt “C”
if you use fieldsGrouping.
To meet your requirement, you can implement custom grouping as follows:
1. customGrouping1 for bolt A and B: let m = hashcode(“field”) % n, then the
out tasks is the tasks which taskid % m == 0. So one tuple from bolt A or B
will be processed by a set of tasks of bolt “bolt"
2. customGrouping2 for bolt C: first get the same out tasks as customGrouping1,
the use hashcode(“field2”) to choose only one out task out of them. So one
tuple from C will be processed by one task of bolt “bolt” and this task will
also receive the tuples from A and B with the same “field1”.
--
Best Regards!
肖康(Kang Xiao,<[email protected] (mailto:[email protected])>)
Distributed Software Engineer
在 2014年3月7日 星期五,2:37,magda 写道:
> hello,
>
> i'm having trouble getting the right idea how to parallelize my topology.
>
> I have a bolt subscribing to 3 streams A,B,C. All of the streams have a
> field1 which is between 0-n. so it makes sense to fieldgroup on field1
> with a prallism.hint of n.
>
> Stream A and B are hashmaps which are emitted every minute or five.
>
> But Stream C is the actual stream to process (the others are
> parameters which are needed for my calculation) is huge and can/should
> be partitioned by field2 (an id which are longs between 1 and n*1000000)
>
> up until now because n was just 1. i just made an allgrouping for stream
> A and B
>
> builder.setBolt("bolt", new PBolt(),4)
> .allGrouping("A").allGrouping("B")
> .fieldsGrouping("C", new Fields("field2") );
>
> but now i'm trying to scale n, since all streams can be partitioned by
> field1. i could just take group by field1 but i want to achieve a
> higher parallelism then n.
>
> can i do this like this?
> builder.setBolt("bolt", new PBolt(),n*4)
> .fieldsGrouping("A", new Fields("field1") )
> .fieldsGrouping("B",new Fields("field1"))
> .fieldsGrouping("C", new Fields("field1","field2"));
>
> Is one tuple from A or B going to be processed by all 4 excecuters
> where tuples from C with field1 are processed? (this is what i would
> want). if not how could i do this?
>
> thanks for any help,
> aky
>
>