No, it’s just that the latency is not always predictable :)

But yes, Bagel uses hash-based operations like groupByKey and reduceByKey 
(depending on whether you have a combiner), so you could implement the same 
thing by hand. The Bagel source code is actually only about 200 lines.
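To make "implement the same thing by hand" concrete, here is a minimal pure-Python sketch. It is not Spark or Bagel code. It simulates a hash-based shuffle: records are routed to buckets by hash(key) and grouped with a hash map, with no sort phase. The helpers group_by_key, reduce_by_key and superstep are hypothetical. They only mimic the semantics of Spark's groupByKey/reduceByKey and of one Pregel-style Bagel superstep. The map-side pre-combine in reduce_by_key is an assumption standing in for the "you have a combiner" case.

```python
from collections import defaultdict


def hash_partition(pairs, num_partitions):
    """Shuffle step: route each (key, value) to a bucket by hash(key); nothing is sorted."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        buckets[hash(key) % num_partitions].append((key, value))
    return buckets


def group_by_key(pairs, num_partitions=4):
    """groupByKey-like (hypothetical): shuffle every value, then build per-key lists in a hash map."""
    result = {}
    for bucket in hash_partition(pairs, num_partitions):
        groups = defaultdict(list)
        for key, value in bucket:
            groups[key].append(value)
        result.update(groups)
    return result


def reduce_by_key(pairs, combine, num_partitions=4, num_map_tasks=2):
    """reduceByKey-like (hypothetical): pre-combine inside each simulated map task, then shuffle."""
    map_outputs = []
    for t in range(num_map_tasks):
        local = {}
        for key, value in pairs[t::num_map_tasks]:
            local[key] = combine(local[key], value) if key in local else value
        map_outputs.extend(local.items())
    # Only the already-combined values cross the simulated shuffle.
    result = {}
    for bucket in hash_partition(map_outputs, num_partitions):
        for key, value in bucket:
            result[key] = combine(result[key], value) if key in result else value
    return result


def superstep(vertices, messages, compute, combine=None):
    """One Pregel-style superstep built from the helpers above (hypothetical, not Bagel's API).

    messages are (destination_vertex_id, message) pairs; compute(vid, state, inbox)
    returns (new_state, outgoing_messages).
    """
    if combine is None:
        inbox = group_by_key(messages)  # no combiner: ship every message
    else:
        inbox = {k: [v] for k, v in reduce_by_key(messages, combine).items()}
    new_vertices, out_messages = {}, []
    for vid, state in vertices.items():
        new_state, sent = compute(vid, state, inbox.get(vid, []))
        new_vertices[vid] = new_state
        out_messages.extend(sent)
    return new_vertices, out_messages


if __name__ == "__main__":
    vertices = {1: 0, 2: 0}
    messages = [(1, 5), (1, 7), (2, 1)]

    def add_inbox(vid, state, inbox):
        # Sum the incoming messages into the vertex state and send nothing.
        return state + sum(inbox), []

    print(superstep(vertices, messages, add_inbox))                          # ({1: 12, 2: 1}, [])
    print(superstep(vertices, messages, add_inbox, lambda a, b: a + b))      # ({1: 12, 2: 1}, [])
```

Both calls produce the same vertex states. The difference is how much data crosses the simulated shuffle: without a combiner every message is moved, and with one only a single pre-combined value per key per map task is moved. In neither case is there a sort, which is the point the original question is asking about.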

Matei

On Dec 19, 2013, at 11:46 AM, Dmitriy Lyubimov <[email protected]> wrote:

> I guess Bagel-related questions are ignored, possibly because Bagel is slated 
> for retirement?
> 
> 
> On Tue, Dec 17, 2013 at 10:35 AM, Dmitriy Lyubimov <[email protected]> wrote:
> Hello, 
> 
> I have a quick question: 
> 
> It just recently occurred to me that in Spark group-by is not 
> shuffle-and-sort but rather "shuffle-and-hash", i.e. there's no sorting 
> phase. Right?
> 
> In that light, a single Bagel iteration should really cost just as much as 
> message grouping with the regular groupByKey. 
> 
> Right?
> 
> Thank you in advance for the clarification.
> -Dmitriy
> 
