Hello,

I have a quick question:

It just recently occurred to me that in Spark, group-by is not
shuffle-and-sort but rather "shuffle-and-hash", i.e. there's no sorting
phase. Right?

In that light, a single Bagel iteration should really cost just as much as
grouping the messages with the regular "group by key" operation.

Right?
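
To make the question concrete, here is a minimal sketch of what I mean by
"shuffle-and-hash". It is plain Python, not Spark code, and the function name
and the sample messages are made up for illustration. It only models the idea
(route by key hash, then group in a hash table, with no sort anywhere) and
makes no claim about how Spark actually implements it.

```python
from collections import defaultdict


def shuffle_and_hash_group(records, num_partitions):
    """Group (key, value) pairs by key using hashing only; nothing is sorted."""
    # "Shuffle": route each record to a partition chosen by hashing its key.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))

    # "Hash": inside each partition, collect the values per key in a hash table.
    grouped = []
    for part in partitions:
        table = defaultdict(list)
        for key, value in part:
            table[key].append(value)
        grouped.append(dict(table))
    return grouped


# Hypothetical messages addressed to vertex ids, as in one Bagel iteration.
messages = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
result = shuffle_and_hash_group(messages, num_partitions=2)
```

If this model is right, the work per record is one hash to pick a partition
and one hash-table insert, so grouping the messages is the whole cost of the
step.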

Thank you in advance for the clarification.
-Dmitriy
