No, it’s just that the latency is not always predictable :) But yes, Bagel uses hashing operations like groupByKey and reduceByKey (depending on whether you have a combiner). You could implement the same thing by hand. The Bagel source code is only about 200 lines actually.
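
For illustration only, here is a rough sketch of what "by hand" could look like. It is plain single-process Python, not Spark and not Bagel's actual code, and every name in it (`group_by_key`, `reduce_by_key`, `superstep`, `max_compute`) is hypothetical. It only aims to show that grouping messages is one hash pass with no sort, and that a superstep is little more than that grouping plus a per-vertex compute call.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Hash-based grouping: a single pass over the pairs, keys are never sorted."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_by_key(pairs, combine):
    """Hash-based grouping with a combiner: values are merged as they arrive."""
    acc = {}
    for key, value in pairs:
        acc[key] = combine(acc[key], value) if key in acc else value
    return acc


def superstep(vertices, messages, compute, combine=None):
    """One Pregel-style iteration (hypothetical helper, not Bagel's API).

    vertices: dict of vertex_id -> state
    messages: iterable of (target_vertex_id, payload)
    compute:  (vertex_id, state, inbox) -> (new_state, list of outgoing messages)
    """
    # The only "shuffle" work is this hash grouping of messages by target vertex.
    if combine is not None:
        inbox = reduce_by_key(messages, combine)
    else:
        inbox = group_by_key(messages)

    new_vertices, outgoing = {}, []
    for vid, state in vertices.items():
        # Vertices that received nothing get inbox=None.
        new_state, out = compute(vid, state, inbox.get(vid))
        new_vertices[vid] = new_state
        outgoing.extend(out)
    return new_vertices, outgoing


def max_compute(vid, state, inbox):
    """Toy vertex program: adopt the largest value seen, forward it if it grew."""
    value, neighbors = state
    best = value if inbox is None else max(value, inbox)
    out = [(n, best) for n in neighbors] if best > value else []
    return (best, neighbors), out


# A three-vertex ring 1 -> 2 -> 3 -> 1; each state is (value, out-neighbors).
vertices = {1: (3, [2]), 2: (7, [3]), 3: (5, [1])}
# Seed messages: every vertex sends its initial value to its neighbors.
messages = [(n, value) for value, neighbors in vertices.values() for n in neighbors]

# Iterate until no vertex has anything left to send.
while messages:
    vertices, messages = superstep(vertices, messages, max_compute, combine=max)

print({vid: state[0] for vid, state in vertices.items()})
```

Passing `combine` corresponds to the reduceByKey-style path, where each vertex sees one merged value; leaving it out corresponds to the groupByKey-style path, where `compute` would receive the full list of messages instead.
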
Matei

On Dec 19, 2013, at 11:46 AM, Dmitriy Lyubimov <[email protected]> wrote:

> I guess Bagel-related questions are ignored, possibly because Bagel is slated
> for retirement?
>
>
> On Tue, Dec 17, 2013 at 10:35 AM, Dmitriy Lyubimov <[email protected]> wrote:
> Hello,
>
> i have a quick question:
>
> It just recently occurred to me that in Spark group-by is not
> shuffle-and-sort but rather "shuffle-and-hash", i.e. there's no sorting
> phase. Right?
>
> In that light, a single bagel iteration should really cost just as much as
> message grouping with the regular "group by key" thing.
>
> Right?
>
> Thank you in advance for the clarification.
> -Dmitriy
