Re: Re: can I partition streams on multiple fields? and what is the result?

magda Mon, 10 Mar 2014 01:19:23 -0700

thanks for the response!


On 09.03.14 15:51, Kang Xiao wrote:
>> Can I do it like this?
>> builder.setBolt("bolt", new PBolt(), n * 4)
>>     .fieldsGrouping("A", new Fields("field1"))
>>     .fieldsGrouping("B", new Fields("field1"))
>>     .fieldsGrouping("C", new Fields("field1", "field2"));
>>
>> Is one tuple from A or B going to be processed by all 4 executors
>> where tuples from C with the same field1 are processed? (This is what I
>> would want.) If not, how could I do this?
> Hi magda
> 
> One tuple from bolt A or B will be processed by ONLY one task of
> bolt "bolt" if you use fieldsGrouping.
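
To illustrate why, here is a minimal Java sketch of how a fields grouping picks its target task. It assumes, as a simplification, that the target is the hash of the list of grouping values modulo the number of tasks. `FieldsGroupingSketch` and `pickTask` are made-up names for illustration, not Storm API. Because C is grouped on (field1, field2) while A and B are grouped on field1 alone, the two hashes are unrelated, so C tuples with a given field1 need not reach the single task that holds the A or B tuple for that field1.

```java
import java.util.Arrays;

final class FieldsGroupingSketch {

    // Assumed simplification: task index = hash(list of grouping values) mod numTasks.
    static int pickTask(int numTasks, Object... groupingValues) {
        return Math.floorMod(Arrays.asList(groupingValues).hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 8; // e.g. n = 2 with a parallelism hint of n * 4

        // Stream A, grouped on field1 only: exactly one target task.
        System.out.println("A, field1=1 -> task index " + pickTask(numTasks, 1));

        // Stream C, grouped on (field1, field2): the target depends on both values.
        for (long field2 = 1; field2 <= 5; field2++) {
            System.out.println("C, field1=1, field2=" + field2
                    + " -> task index " + pickTask(numTasks, 1, field2));
        }
    }
}
```

In this run the A tuple lands on one task index while the C tuples with the same field1 are scattered over other indices, which is why plain fieldsGrouping does not give the behaviour asked for.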
> 
> To meet your requirement, you can implement custom groupings as follows:
> 1. customGrouping1 for bolt A and B: let m = hashcode(field1) % n; the
> out tasks are then all tasks with taskid % n == m. So one tuple from bolt
> A or B will be processed by a set of tasks of bolt "bolt".
> 2. customGrouping2 for bolt C: first get the same out tasks as
> customGrouping1, then use hashcode(field2) to choose only one out task
> from them. So one tuple from C will be processed by one task of bolt
> "bolt", and this task will also receive the tuples from A and B with the
> same field1.
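
The task selection in the two custom groupings above could be sketched in plain Java roughly as follows. This is only a sketch under stated assumptions: it reads the rule as "position of the task in the target list, modulo n, equals hash(field1) modulo n", it uses list positions rather than raw task ids (which may not start at 0), and it assumes the bolt has at least n target tasks. `Field1Groupings`, `tasksForField1` and `taskForField1AndField2` are hypothetical names. In a real topology this logic would sit inside an implementation of Storm's CustomStreamGrouping interface and be wired in with customGrouping.

```java
import java.util.ArrayList;
import java.util.List;

final class Field1Groupings {

    // customGrouping1 (streams A and B): every task whose position in the
    // target list falls into the bucket hash(field1) mod n.
    static List<Integer> tasksForField1(Object field1, int n, List<Integer> targetTasks) {
        int bucket = Math.floorMod(field1.hashCode(), n);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < targetTasks.size(); i++) {
            if (i % n == bucket) {
                out.add(targetTasks.get(i));
            }
        }
        return out;
    }

    // customGrouping2 (stream C): take the same candidate tasks, then use
    // hash(field2) to pick exactly one of them.
    static int taskForField1AndField2(Object field1, Object field2, int n,
                                      List<Integer> targetTasks) {
        List<Integer> candidates = tasksForField1(field1, n, targetTasks);
        return candidates.get(Math.floorMod(field2.hashCode(), candidates.size()));
    }
}
```

With n = 3 and 12 target tasks, each field1 bucket owns 4 tasks: a tuple from A or B goes to all 4, and a tuple from C goes to exactly one of those 4, chosen by field2.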
> 
> -- 
> Best Regards!
> 
> 肖康 (Kang Xiao, <[email protected]>)
> Distributed Software Engineer
> 
> On Friday, 7 March 2014, at 2:37, magda wrote:
> 
>> hello,
>>
>> I'm having trouble working out how to parallelize my topology.
>>
>> I have a bolt subscribing to 3 streams A, B, C. All of the streams have a
>> field1 whose value is between 0 and n, so it makes sense to use a fields
>> grouping on field1 with a parallelism hint of n.
>>
>> Streams A and B carry hashmaps which are emitted every minute or five.
>>
>> But stream C, the actual stream to process (the others are
>> parameters which are needed for my calculation), is huge and can/should
>> be partitioned by field2 (an id which is a long between 1 and n*1000000).
>>
>> Up until now n was just 1, so I just used an allGrouping for streams
>> A and B:
>>
>> builder.setBolt("bolt", new PBolt(), 4)
>>     .allGrouping("A").allGrouping("B")
>>     .fieldsGrouping("C", new Fields("field2"));
>>
>> But now I'm trying to scale n, since all streams can be partitioned by
>> field1. I could just group by field1, but I want to achieve a
>> higher parallelism than n.
>>
>> Can I do it like this?
>> builder.setBolt("bolt", new PBolt(), n * 4)
>>     .fieldsGrouping("A", new Fields("field1"))
>>     .fieldsGrouping("B", new Fields("field1"))
>>     .fieldsGrouping("C", new Fields("field1", "field2"));
>>
>> Is one tuple from A or B going to be processed by all 4 executors
>> where tuples from C with the same field1 are processed? (This is what I
>> would want.) If not, how could I do this?
>>
>> thanks for any help,
>> aky
>
