Thank you guys. I was looking for a benchmark in order to add it to an "official" document as a reference. Somebody asked me to do this, and I agree that the choice of the grouping policy depends 90% on the business logic. I was just wondering if there was any "public result" that I wasn't able to find!
Thank you again.

On Wed, Mar 5, 2014 at 7:22 PM, Michael Rose <[email protected]> wrote:

> +1, localOrShuffle will be a winner, as long as it's evenly distributing
> work. If 1 tuple could say produce a variable 1-100 resultant tuples (and
> these results were expensive enough to process, e.g. IO), it might well be
> worth shuffling vs. localShuffling.
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> [email protected]
>
>
> On Wed, Mar 5, 2014 at 11:19 AM, Nathan Leung <[email protected]> wrote:
>
>> In my experience on a 1 Gb network localOrShuffleGrouping was a clear
>> winner in terms of performance. But I haven't tested with 10 Gb, and if
>> you have substantial business logic then that becomes a bigger factor than
>> serializing/transferring data on the network. I think the performance of
>> any given grouping is too dependent on your business logic; it will be
>> difficult to quantify how well it performs in a canned benchmark. And
>> sometimes your business logic will define a grouping for you (e.g. fields
>> grouping) whether it's the best performer or not.
>>
>>
>> On Wed, Mar 5, 2014 at 1:05 PM, Roberto Coluccio <[email protected]> wrote:
>>
>>> Hello Michael, thanks for your feedback.
>>>
>>> I'm looking for a performance comparison. I know that not all the
>>> policies are "really comparable", but even obvious comparisons all listed
>>> together could be a useful reference.
>>>
>>> Roberto
>>>
>>>
>>> On Wed, Mar 5, 2014 at 6:58 PM, Michael Rose <[email protected]> wrote:
>>>
>>>> What kind of comparisons are you looking for? How they functionally
>>>> work?
>>>>
>>>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>>>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>>>> [email protected]
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 9:52 AM, Roberto Coluccio <[email protected]> wrote:
>>>>
>>>>> Hello folks,
>>>>>
>>>>> I was unable to find any complete example (or, better, related work in
>>>>> the scientific literature) in which (almost) all the *stream grouping
>>>>> policies* have been used and compared. Do you have any reference you
>>>>> could please share with me?
>>>>>
>>>>> Thank you and best regards,
>>>>>
>>>>> Roberto Coluccio

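For readers of this thread, the trade-off discussed above (local delivery avoids serialization and network transfer, while plain shuffling spreads work across all tasks) can be illustrated with a small toy model. This is only a sketch and not Storm code: the class GroupingSketch, the workerOfTask array and the three methods are hypothetical names, and the selection rules only approximate what shuffleGrouping, localOrShuffleGrouping and fieldsGrouping do. It is not a benchmark and says nothing about real throughput.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Random;

// Toy model (assumption, not Storm's implementation): downstream tasks are
// numbered 0..n-1 and workerOfTask[i] is the worker process hosting task i.
public class GroupingSketch {

    // Shuffle-style choice: any downstream task, picked uniformly at random.
    public static int shuffle(int taskCount, Random rnd) {
        return rnd.nextInt(taskCount);
    }

    // Local-or-shuffle-style choice: prefer tasks hosted in the sender's
    // worker; fall back to a plain shuffle when there are none.
    public static int localOrShuffle(int[] workerOfTask, int sourceWorker, Random rnd) {
        List<Integer> local = new ArrayList<>();
        for (int task = 0; task < workerOfTask.length; task++) {
            if (workerOfTask[task] == sourceWorker) {
                local.add(task);
            }
        }
        if (local.isEmpty()) {
            return shuffle(workerOfTask.length, rnd);
        }
        return local.get(rnd.nextInt(local.size()));
    }

    // Fields-style choice: the same key always maps to the same task.
    public static int fields(int taskCount, Object key) {
        return Math.floorMod(Objects.hashCode(key), taskCount);
    }

    public static void main(String[] args) {
        int[] workerOfTask = {0, 0, 1, 1}; // 4 tasks spread over 2 workers
        int sourceWorker = 0;              // the emitting task lives in worker 0
        Random rnd = new Random(42);
        int tuples = 1000;

        int remoteShuffle = 0;
        int remoteLocal = 0;
        for (int i = 0; i < tuples; i++) {
            if (workerOfTask[shuffle(workerOfTask.length, rnd)] != sourceWorker) {
                remoteShuffle++;
            }
            if (workerOfTask[localOrShuffle(workerOfTask, sourceWorker, rnd)] != sourceWorker) {
                remoteLocal++;
            }
        }
        // Tuples that leave the sender's worker are the ones that would have
        // to be serialized and sent over the network.
        System.out.println("shuffle: tuples sent to another worker = " + remoteShuffle);
        System.out.println("localOrShuffle: tuples sent to another worker = " + remoteLocal);
        System.out.println("fields: key \"user-1\" goes to task " + fields(workerOfTask.length, "user-1"));
    }
}
```

In this model localOrShuffle never leaves the sender's worker as long as a local task exists, which is the effect Nathan describes on a 1 Gb network. It also shows Michael's caveat: only the local tasks receive work, so an uneven or expensive workload may still be better served by a plain shuffle.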