Re: Bagel message processing vs. group-by operational efficiency

Dmitriy Lyubimov Thu, 19 Dec 2013 18:08:11 -0800

thanks. that's what i thought.

The biggest thing that might make it really good for distributed matrices
is message broadcast to several nodes (i think original Pregel has that).
Oh well i guess i will wait for GraphX -- that one should have it, i
suppose :)



On Thu, Dec 19, 2013 at 2:51 PM, Matei Zaharia <[email protected]> wrote:

> No, it’s just that the latency is not always predictable :)
>
> But yes, Bagel uses hashing operations like groupByKey and reduceByKey
> (depending on whether you have a combiner). You could implement the same
> thing by hand. The Bagel source code is only about 200 lines actually.
>
> Matei
>
> On Dec 19, 2013, at 11:46 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> I guess Bagel-related questions are ignored, possibly because Bagel is
> slated for retirement?
>
>
> On Tue, Dec 17, 2013 at 10:35 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> Hello,
>>
>> i have a quick question:
>>
>> It just recently occurred to me that in Spark group-by is not
>> shuffle-and-sort but rather "shuffle-and-hash", i.e. there's no sorting
>> phase. Right?
>>
>> In that light, a single bagel iteration should really cost just as much
>> as message grouping with the regular "group by key" thing.
>>
>> Right?
>>
>> Thank you in advance for the clarification.
>> -Dmitriy
>>
>
>
>
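
For readers following the thread, here is a minimal sketch of the "shuffle-and-hash" idea discussed above: messages are routed to partitions by hashing the target vertex id, then grouped in a hash map with no sort phase, either collecting all payloads (the group-by-key case) or merging them as they arrive (the combiner / reduce-by-key case). This is plain Python, not Spark or Bagel code; the names hash_partition, group_by_key, reduce_by_key and superstep are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def hash_partition(messages, num_partitions):
    """Shuffle step: route each (target_id, payload) pair by hash of target_id."""
    partitions = [[] for _ in range(num_partitions)]
    for target, payload in messages:
        partitions[hash(target) % num_partitions].append((target, payload))
    return partitions


def group_by_key(partition):
    """No combiner: collect every payload per target in a hash map (no sorting)."""
    grouped = defaultdict(list)
    for target, payload in partition:
        grouped[target].append(payload)
    return dict(grouped)


def reduce_by_key(partition, combine):
    """With a combiner: merge payloads per target as they arrive."""
    merged = {}
    for target, payload in partition:
        if target in merged:
            merged[target] = combine(merged[target], payload)
        else:
            merged[target] = payload
    return merged


def superstep(vertices, messages, compute, num_partitions=2, combine=None):
    """One iteration: group messages by target vertex, then run compute per vertex.

    compute(vertex_id, state, incoming) must return (new_state, outgoing_messages),
    where incoming is None when the vertex received nothing.
    """
    inbox = {}
    for part in hash_partition(messages, num_partitions):
        if combine is not None:
            inbox.update(reduce_by_key(part, combine))
        else:
            inbox.update(group_by_key(part))

    new_vertices, outgoing = {}, []
    for vid, state in vertices.items():
        new_state, out = compute(vid, state, inbox.get(vid))
        new_vertices[vid] = new_state
        outgoing.extend(out)
    return new_vertices, outgoing
```

In this sketch the per-iteration cost is one hash-based grouping pass over the messages plus one pass over the vertices, which is the intuition behind the question in the original post. A real distributed implementation also pays for serialization and network transfer, which the sketch does not model.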
