Hi Ankur,

Given enough memory and proper caching, I don't understand why this would be
the case:

GraphX may actually be slower when Spark is configured to launch many tasks
per machine, because shuffle communication between Spark tasks on the same
machine still occurs by reading and writing from disk, while GraphLab uses
shared memory for same-machine communication.

Could you please elaborate on this?

Thanks.
Deb



On Mon, Mar 24, 2014 at 1:01 PM, Ankur Dave <ankurd...@gmail.com> wrote:

> Hi Niko,
>
> The GraphX team recently wrote a longer paper with more benchmarks and
> optimizations: http://arxiv.org/abs/1402.2394
>
> Regarding the performance of GraphX vs. GraphLab, I believe GraphX
> currently outperforms GraphLab only in end-to-end benchmarks of pipelines
> involving both graph-parallel operations (e.g. PageRank) and data-parallel
> operations (e.g. ETL and data cleaning). This is due to the overhead of
> moving data between GraphLab and a data-parallel system like Spark. There's
> an example of a pipeline in Section 5.2 in the linked paper, and the
> results are in Figure 10 on page 11.
>
> GraphX has a very similar architecture to GraphLab, so I wouldn't expect
> it to have better performance on pure graph algorithms. GraphX may actually
> be slower when Spark is configured to launch many tasks per machine,
> because shuffle communication between Spark tasks on the same machine still
> occurs by reading and writing from disk, while GraphLab uses shared memory
> for same-machine communication.
>
> I've CC'd Joey and Reynold as well.
>
> Ankur <http://www.ankurdave.com/>
>
> On Mar 24, 2014 11:00 AM, "Niko Stahl" <r.niko.st...@gmail.com> wrote:
>
>> I'm interested in extending the comparison between GraphX and GraphLab
>> presented in Xin et al. (2013). The evaluation presented there is rather
>> limited as it only compares the frameworks for one algorithm (PageRank) on
>> a cluster with a fixed number of nodes. Are there any graph algorithms
>> where one might expect GraphX to perform better than GraphLab? Do you
>> expect the scaling properties (i.e. performance as a function of # of
>> worker nodes) to differ?
>>
>
