In your case, messages with the same field value will be sent to exactly one bolt task (and therefore one executor) in the whole topology, regardless of which worker the emitting spout runs in.
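To illustrate, here is a rough Python sketch of the idea. It is a simplified model, not Storm's actual code: the task and worker names come from your example, `zlib.crc32` is only a stand-in for whatever hash Storm applies internally, and the `route` helper is hypothetical. The point it shows is that the target is chosen from the field value and the global list of bolt tasks, so the worker of the emitting spout plays no part.

```python
import zlib

# Hypothetical layout from the question: four bolt tasks spread over two workers.
TASK_TO_WORKER = {"b1": "w1", "b2": "w1", "b3": "w2", "b4": "w2"}

# The grouping sees one global task list; it is not partitioned by worker.
BOLT_TASKS = sorted(TASK_TO_WORKER)


def fields_grouping_target(field_value: str) -> str:
    """Pick the target bolt task from the field value alone.

    zlib.crc32 is an assumed stand-in for the hash used by the real system.
    """
    index = zlib.crc32(field_value.encode("utf-8")) % len(BOLT_TASKS)
    return BOLT_TASKS[index]


def route(field_value: str, emitting_spout: str) -> str:
    # The emitting spout (and hence its worker) is deliberately ignored:
    # only the field value decides the destination.
    return fields_grouping_target(field_value)


# m1 arrives via s1 (worker w1), m2 via s2 (worker w2); same field value.
m1_target = route("user-42", emitting_spout="s1")
m2_target = route("user-42", emitting_spout="s2")
print(m1_target, m2_target, TASK_TO_WORKER[m1_target])
```

Both messages end up at the same task, whichever worker that task happens to live in. In a real topology this behaviour is what you get when the bolt is wired with `fieldsGrouping` on the relevant field.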

Best regards,
Dmytro Dragan

On Jun 8, 2015 08:31, "Seungtack Baek" wrote:

> Thanks a lot for such a timely response.
>
> So, even if the bolt tasks reside in different workers (different servers
> in our use case), the messages go to all 32 tasks, right?
>
> Also, this leads me to another question (I think the answer is yes).
> Given that fields grouping guarantees that messages with the same "field
> value" go to the same task, does "the same task" mean across all workers,
> or within the same worker?
>
> For example, let's say there are two Kafka partitions 0 and 1, spout
> tasks s1 and s2, and bolt tasks b1, b2, b3 and b4 distributed across two
> workers w1 and w2. So it looks like:
>
> w1
>  - partition_0 -> s1 -> b1 & b2
> w2
>  - partition_1 -> s2 -> b3 & b4
>
> When two messages with the same field value, m1 and m2, are produced to
> Kafka partitions 0 and 1 respectively, do both m1 and m2 go to the same
> bolt, say b3? Or do they get sent to the same bolt within each worker
> (say b1 in w1 and b3 in w2)?
>
> Simply put, does fields grouping group messages across the whole
> topology, or only within a single worker?
>
> Thanks,
> Baek
>
> On Mon, Jun 8, 2015 at 12:17 AM, Dima Dragan wrote:
>
>> Hi, Seungtack!
>>
>> The distribution of messages depends only on the grouping. In the case
>> of "shuffle grouping", tuples are randomly distributed across all of the
>> bolt's tasks in such a way that each bolt task is guaranteed to get an
>> equal number of tuples.
>>
>> Best regards,
>> Dmytro Dragan
>>
>> On Jun 8, 2015 07:12, "Seungtack Baek" wrote:
>>
>>> Hi,
>>>
>>> I have read in the documentation that if you have more spout tasks
>>> than Kafka partitions, the excess tasks will remain idle for the entire
>>> lifecycle of the topology.
>>>
>>> Now, let's consider 4 spout tasks and 32 bolt tasks (of one class) in 4
>>> workers (on 4 nodes), and 2 partitions in Kafka. Then 2 tasks will be
>>> assigned to the partitions in Kafka and the other 2 will remain idle.
>>> However, does that mean that only the bolts within the same worker will
>>> get the messages (assuming shuffle grouping)? Or do the messages get
>>> emitted to whatever bolt tasks are available, regardless of worker?
>>>
>>> Thanks,
>>> Baek
