Re: can I partition streams on multiple fields? and what is the result?

Kang Xiao Sun, 09 Mar 2014 07:52:25 -0700

> can i do this like this?
> builder.setBolt("bolt", new PBolt(),n*4)
> .fieldsGrouping("A", new Fields("field1") )
> .fieldsGrouping("B",new Fields("field1"))
> .fieldsGrouping("C", new Fields("field1","field2"));
>  
> Is one tuple from A or B going to be processed by all 4 executors
> where tuples from C with field1 are processed? (this is what i would
> want). if not how could i do this?



Hi magda

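One tuple from A or B will be processed by ONLY one task of bolt "bolt" 
if you use fieldsGrouping. Also, fieldsGrouping on ("field1", "field2") hashes 
both fields together, so the C tuples with the same field1 are not kept on a 
fixed set of 4 tasks.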

To meet your requirement, you can implement custom groupings as follows:
1. customGrouping1 for A and B: let m = hashcode("field1") % n, then the 
out tasks are the tasks with taskid % n == m. So one tuple from A or B 
will be processed by a set of tasks of bolt "bolt" (4 of them when the bolt 
has n*4 tasks).
2. customGrouping2 for C: first get the same out tasks as in customGrouping1, 
then use hashcode("field2") to choose only one out task among them. So one 
tuple from C will be processed by one task of bolt "bolt", and this task will 
also receive the tuples from A and B with the same "field1".
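
The task selection behind the two groupings can be sketched in plain Java as 
below. This is only a rough sketch, not tested against a real topology: the 
class and method names (Field1Partitioner, tasksForAB, tasksForC) are made up 
for illustration, the rule "position in the target task list % n == m" is my 
reading of step 1, and it assumes the bolt has at least n tasks. In Storm 
this logic would sit inside two CustomStreamGrouping implementations, where 
prepare() hands you the list of target tasks and chooseTasks() returns the 
chosen ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper holding the selection logic of both custom groupings. */
final class Field1Partitioner {
    private Field1Partitioner() {}

    /** Partition m in [0, n) for a field1 value; floorMod avoids negative results. */
    static int partition(Object field1, int n) {
        return Math.floorMod(field1.hashCode(), n);
    }

    /** customGrouping1 (A and B): all target tasks whose list position i has i % n == m. */
    static List<Integer> tasksForAB(List<Integer> targetTasks, Object field1, int n) {
        int m = partition(field1, n);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < targetTasks.size(); i++) {
            if (i % n == m) {
                out.add(targetTasks.get(i));
            }
        }
        return out;
    }

    /** customGrouping2 (C): exactly one task from the customGrouping1 set, picked by field2. */
    static List<Integer> tasksForC(List<Integer> targetTasks, Object field1,
                                   Object field2, int n) {
        // Assumption: targetTasks.size() >= n, so the candidate set is never empty.
        List<Integer> candidates = tasksForAB(targetTasks, field1, n);
        int pick = Math.floorMod(field2.hashCode(), candidates.size());
        return Collections.singletonList(candidates.get(pick));
    }
}
```

With 8 target tasks and n = 2, each field1 partition maps to 4 tasks. A or B 
tuples go to all 4 of them, and each C tuple goes to one of the same 4, so 
the parameters are always present on the task that processes the C tuple. 
Inside chooseTasks() you would read field1 and field2 from the values list 
by their position in the emitted tuple.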

--  
Best Regards!

肖康 (Kang Xiao, <[email protected]>)
Distributed Software Engineer


On Friday, 7 March 2014, at 2:37, magda wrote:

> hello,
>  
> i'm having trouble getting the right idea how to parallelize my topology.
>  
> I have a bolt subscribing to 3 streams A,B,C. All of the streams have a
> field1 which is between 0-n. so it makes sense to use fieldsGrouping on
> field1 with a parallelism hint of n.
>  
> Stream A and B are hashmaps which are emitted every minute or five.
>  
> But Stream C, the actual stream to process (the others are
> parameters which are needed for my calculation), is huge and can/should
> be partitioned by field2 (an id which is a long between 1 and n*1000000)
>  
> up until now because n was just 1. i just made an allgrouping for stream
> A and B
>  
> builder.setBolt("bolt", new PBolt(),4)
> .allGrouping("A").allGrouping("B")
> .fieldsGrouping("C", new Fields("field2") );
>  
> but now i'm trying to scale n, since all streams can be partitioned by
> field1. i could just group by field1 but i want to achieve a
> higher parallelism than n.
>  
> can i do this like this?
> builder.setBolt("bolt", new PBolt(),n*4)
> .fieldsGrouping("A", new Fields("field1") )
> .fieldsGrouping("B",new Fields("field1"))
> .fieldsGrouping("C", new Fields("field1","field2"));
>  
> Is one tuple from A or B going to be processed by all 4 executors
> where tuples from C with field1 are processed? (this is what i would
> want). if not how could i do this?
>  
> thanks for any help,
> aky
>  
>
