At 2014-11-24 19:02:08 -0800, Harihar Nahak <hna...@wynyardgroup.com> wrote:
> According to documentation GraphX runs 10x faster than normal Spark. So I
> run Page Rank algorithm in both the applications:
> [...]
> Local Mode (Machine : 8 Core; 16 GB memory; 2.80 GHz Intel i7; Executor
> Memory: 4Gb, No. of Partition: 50; No. of Iterations: 2);   ==>
>
> *Spark Page Rank took -> 21.29 mins
> GraphX Page Rank took -> 42.01 mins *
>
> Cluster Mode (Ubuntu 12.4; spark 1.1/hadoop 2.4 cluster ; 3 workers , 1
> driver , 8 cores, 30 gb memory) (Executor memory 4gb; No. of edge partitions
> : 50, random vertex cut ; no. of iteration : 2) =>
>
> *Spark Page Rank took -> 10.54 mins
> GraphX Page Rank took -> 7.54 mins *
>
> Could you please help me to determine when to use Spark and when GraphX? If
> GraphX takes the same amount of time as Spark then it's better to use Spark
> because Spark has a variety of operators to deal with any type of RDD.

If your problem is naturally expressible as a graph computation, I think it 
makes sense to use GraphX. Besides incorporating optimizations that you would 
otherwise have to implement manually, GraphX offers a programming model that is 
likely a better fit. And even if you start off with pure Spark, you can still 
use GraphX for other parts of the problem, since it's part of the same system.

To address the benchmark results you got:

1. GraphX takes more time than Spark to load the graph, because it has to index 
it, but subsequent iterations should be faster. We benchmarked with 20 
iterations to show this effect, but you only used 2 iterations, which doesn't 
give much time to amortize the loading cost.

2. The benchmarks in the GraphX OSDI paper are against a naive implementation 
of PageRank in Spark, while the version you benchmarked against has some of the 
same optimizations as GraphX does. I believe we found that the optimized Spark 
PageRank was only 3x slower than GraphX.

3. When running those benchmarks, we used an experimental version of Spark with 
in-memory shuffle, which disproportionately benefits GraphX since its shuffle 
files are smaller due to specialized compression.

4. We haven't optimized GraphX for local mode, so it's not surprising that it's 
slower there.
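
To make the amortization argument in point 1 concrete, here is a rough sketch 
of the cost model in Python. It assumes total run time is a one-time load cost 
plus a constant cost per iteration. The function names and all the numbers are 
made up for illustration (arbitrary time units); they are not measurements 
from GraphX or Spark.

```python
import math


def total_time(load_cost, per_iter_cost, iterations):
    """One-time load/index cost plus a constant cost per iteration."""
    return load_cost + per_iter_cost * iterations


def break_even_iterations(load_a, iter_a, load_b, iter_b):
    """Smallest iteration count at which system B is strictly faster than A.

    Returns None if B's iterations are not cheaper than A's, because then
    B can never make up a higher load cost.
    """
    if iter_b >= iter_a:
        return None
    # B wins when load_b + n * iter_b < load_a + n * iter_a,
    # i.e. n > (load_b - load_a) / (iter_a - iter_b).
    threshold = (load_b - load_a) / (iter_a - iter_b)
    return max(0, math.floor(threshold) + 1)


# Hypothetical costs, NOT benchmark results:
# system A loads quickly but iterates slowly,
# system B pays for indexing up front but iterates faster.
LOAD_A, ITER_A = 1.0, 2.0
LOAD_B, ITER_B = 5.0, 1.0

for n in (2, 20):
    a = total_time(LOAD_A, ITER_A, n)
    b = total_time(LOAD_B, ITER_B, n)
    print(f"{n} iterations: A = {a}, B = {b}")

print("B becomes faster from iteration count:",
      break_even_iterations(LOAD_A, ITER_A, LOAD_B, ITER_B))
```

With these made-up costs, the system with the expensive load is slower at 2 
iterations and faster at 20, which is the shape of the effect described in 
point 1. Where the crossover actually falls for a given graph and cluster has 
to be measured.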

Ankur
