Hey Javier,

Thanks a bunch for your thoughts and insights--much appreciated! Regarding local vs. remote messaging, hopefully the local path will be taken whenever possible. With what I am seeing scheduler-wise (one Bolt 2 executor and four Bolt 1 executors per worker), I am really hopeful I am getting local messaging. I am going to profile it and compare LMAX Disruptor vs. Netty communication events to confirm.
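In case it's useful, here is roughly how I'm planning to surface those numbers--just a sketch against the 0.9.x backtype.storm Config API, using the stock LoggingMetricsConsumer; the 60-second bucket size is an arbitrary choice on my part:

import backtype.storm.Config;
import backtype.storm.metric.LoggingMetricsConsumer;

Config conf = new Config();

// Publish the built-in per-executor metrics (emit/transfer counts, send
// and receive queue stats) on a 60-second bucket.
conf.put(Config.TOPOLOGY_BUILTIN_METRICS_BUCKET_SIZE_SECS, 60);

// Route those metrics to the workers' metrics log so the counts can be
// compared per executor.
conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);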
Thanks again, Javier!

--John

On Mon, Oct 5, 2015 at 11:38 AM, Javier Gonzalez <[email protected]> wrote:

> If you get one bolt2 per worker, it should work as you say. Though I'm not
> completely sure it's *guaranteed* that every message will go local.
>
> Regards,
> Javier
>
> On Oct 5, 2015 10:01 AM, "John Yost" <[email protected]> wrote:
>
>> Hi Javier,
>>
>> I apologize, I don't think I am making myself clear. I am attempting to
>> get all the tuples for a given key sent to the same Bolt 2 executor
>> instance. I previously followed the pattern of using fieldsGrouping on
>> Bolt 1, as this is a well-established pattern. However, there are roughly
>> four times as many Bolt 1 executors as Bolt 2 executors, and I was finding
>> the throughput was very low between Bolts 1 and 2. Once I switched to
>> localOrShuffleGrouping between Bolt 1 and Bolt 2, the throughput tripled. I
>> did this based upon advice from this board to use localOrShuffleGrouping
>> for large fan-in patterns like this (great advice, definitely worked great!).
>>
>> Unfortunately, this also means there is no guarantee that all tuples
>> for a given key will be sent to the same Bolt 2. To hopefully get the best
>> of both worlds, I am thinking I can do the fieldsGrouping between the
>> KafkaSpout and Bolt 1, and thereby get the same effect of all tuples
>> for a given key going to the same Bolt 2. Of course, the key (pun intended)
>> is that there is one Bolt 2 per worker, which will ensure all tuples for
>> the same key go to the same Bolt 1, which will then forward them to Bolt 2.
>>
>> Please confirm whether this seems logical and should work. I think it
>> should, but I may be missing something.
>>
>> Thanks! :)
>>
>> --John
>>
>> On Mon, Oct 5, 2015 at 9:20 AM, Javier Gonzalez <[email protected]>
>> wrote:
>>
>>> If I'm reading this correctly, I think you're not getting the result you
>>> want - having all tuples with a given key processed in the same Bolt 2
>>> instance.
>>>
>>> If you want all messages for a given key to be processed in the same
>>> Bolt 2, you need to do fields grouping from Bolt 1 to Bolt 2. By doing
>>> fields grouping in the spout-bolt1 hop and shuffle/local in the bolt1-bolt2
>>> hop, you're ensuring that Bolt 1 instances always see the same key, but is
>>> there any guarantee that the Bolt 2 you want is the nearest/only local bolt
>>> available to any given instance of Bolt 1?
>>>
>>> Regards,
>>> Javier
>>>
>>> On Oct 5, 2015 7:33 AM, "John Yost" <[email protected]> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I am currently prototyping fieldsGrouping at the KafkaSpout vs. bolt
>>>> level. I am curious whether anyone else has tried this and, if so, how
>>>> well it worked.
>>>>
>>>> The reason I am attempting to do fieldsGrouping in the KafkaSpout is
>>>> that I moved from fieldsGrouping to localOrShuffleGrouping between Bolt 1
>>>> and Bolt 2 in my topology due to a 4-to-1 fan-in from Bolt 1 to Bolt 2
>>>> (for example, 200 Bolt 1 executors and 50 Bolt 2 executors), which was
>>>> dramatically slowing throughput. It is still highly preferable to do
>>>> fieldsGrouping one way or another so that all values for a given key go
>>>> to the same Bolt 2 executor, which is the impetus for attempting to do
>>>> fieldsGrouping in the KafkaSpout.
>>>>
>>>> If anyone has any thoughts on this approach, I'd very much like to
>>>> hear them.
>>>>
>>>> Thanks
>>>>
>>>> --John
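For reference, the wiring discussed above--fieldsGrouping from the KafkaSpout to Bolt 1, localOrShuffleGrouping from Bolt 1 to Bolt 2, and one Bolt 2 executor per worker--would look roughly like the sketch below. It targets the 0.9.x backtype.storm API; Bolt1, Bolt2, and buildKafkaSpout() are placeholders, the spout is assumed to declare a "key" output field (e.g. via a custom Scheme), and the 200/50 parallelism figures are the ones mentioned in the thread.

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class KeyedTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Placeholder: a KafkaSpout configured with a scheme that declares
        // a "key" output field so it can be fields-grouped on downstream.
        builder.setSpout("kafka-spout", buildKafkaSpout());

        // Every tuple for a given key lands on the same Bolt 1 executor.
        builder.setBolt("bolt1", new Bolt1(), 200)
               .fieldsGrouping("kafka-spout", new Fields("key"));

        // Bolt 1 -> Bolt 2 stays in-process when a Bolt 2 executor runs in
        // the same worker; otherwise it falls back to a shuffle.
        builder.setBolt("bolt2", new Bolt2(), 50)
               .localOrShuffleGrouping("bolt1");

        Config conf = new Config();
        // 50 workers for 50 Bolt 2 executors: the intent is one Bolt 2 (and
        // roughly four Bolt 1 executors) per worker, though the scheduler
        // does not strictly guarantee that placement.
        conf.setNumWorkers(50);

        StormSubmitter.submitTopology("keyed-topology", conf,
                builder.createTopology());
    }
}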
