Bagel message processing vs. group-by operational efficiency

Dmitriy Lyubimov Tue, 17 Dec 2013 13:24:05 -0800

Hello,

I have a quick question:


It recently occurred to me that in Spark, group-by is not
shuffle-and-sort but rather "shuffle-and-hash", i.e. there is no sorting
phase. Right?
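
To make the distinction concrete, here is a minimal plain-Python sketch of
the two grouping strategies. It is not Spark code; the function names
(group_by_hash, group_by_sort) and the sample messages are hypothetical,
and it only illustrates the idea that hash-based grouping needs a single
pass with no ordering of keys, while sort-based grouping pays for a sort
first.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_hash(pairs):
    """Group (key, value) pairs with a hash table: one pass, no sorting."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def group_by_sort(pairs):
    """Group (key, value) pairs by sorting on the key, then scanning runs."""
    ordered = sorted(pairs, key=itemgetter(0))  # the extra sorting phase
    return {
        key: [value for _, value in run]
        for key, run in groupby(ordered, key=itemgetter(0))
    }


# Hypothetical (vertex id, message) pairs, as they might look after a shuffle.
messages = [("v2", "a"), ("v1", "b"), ("v2", "c"), ("v3", "d"), ("v1", "e")]

hashed = group_by_hash(messages)
sorted_groups = group_by_sort(messages)
```

Both functions produce the same groups; the difference is only in the work
done to get there, which is the cost the question is about.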

In that light, a single Bagel iteration should cost about as much as
grouping the messages with a regular "group by key" operation.

Right?

Thank you in advance for the clarification.
-Dmitriy
