Re: Comparing GraphX and GraphLab

2014-04-15 Thread Qi Song
want to know the default allocation of computing resources, as run-example may not allow me to allocate them by myself. Regards~ Qi Song -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Comparing-GraphX-and-GraphLab-tp3112p4265.html Sent from the Apache Spark

Re: Comparing GraphX and GraphLab

2014-03-24 Thread Niko Stahl
Hi Ankur, hi Deb, Thanks for the information and for the reference to the recent paper. I understand that GraphLab is highly optimized for graph algorithms and consistently outperforms GraphX for graph-related tasks. I'd like to further evaluate the cost of moving data between Spark and some other

Re: Comparing GraphX and GraphLab

2014-03-24 Thread Debasish Das
Hi Ankur, Given enough memory and proper caching, I don't understand why this is the case. GraphX may actually be slower when Spark is configured to launch many tasks per machine, because shuffle communication between Spark tasks on the same machine still occurs by reading and writing from disk,

Re: Comparing GraphX and GraphLab

2014-03-24 Thread Ankur Dave
Hi Niko, The GraphX team recently wrote a longer paper with more benchmarks and optimizations: http://arxiv.org/abs/1402.2394 Regarding the performance of GraphX vs. GraphLab, I believe GraphX currently outperforms GraphLab only in end-to-end benchmarks of pipelines involving both graph-parallel

Re: Comparing GraphX and GraphLab

2014-03-24 Thread Debasish Das
Niko, Comparing some other components will be very useful as well: svd++ from graphx vs. the same algorithm in graphlab, and also mllib.recommendation.als implicit/explicit compared to the collaborative filtering toolkit in graphlab... To stress test, what's the biggest sparse dataset that you have

Comparing GraphX and GraphLab

2014-03-24 Thread Niko Stahl
Hello, I'm interested in extending the comparison between GraphX and GraphLab presented in Xin et al. (2013). The evaluation presented there is rather limited as it only compares the frameworks for one algorithm (PageRank) on a cluster with a fixed number of nodes. Are there any graph algorithms w