At 2015-02-13 12:19:46 -0800, Matthew Bucci <mrbucci...@gmail.com> wrote:
> 1) How do you actually run programs in GraphX? At the moment I've been doing
> everything live through the shell, but I'd obviously like to be able to work
> on it by writing and running scripts.

You can create your own projects that build against Spark and GraphX through a 
Maven dependency [1], then run those applications using the bin/spark-submit 
script included with Spark [2].

These guides assume you already know how to do this using your preferred build 
tool (SBT or Maven). In short, here's how to do it with SBT:

1. Install SBT locally (`brew install sbt` on OS X).

2. Inside your project directory, create a build.sbt file listing Spark and 
GraphX as dependencies, as in [3] (a minimal sketch also appears after the 
links below).

3. Run `sbt package` in a shell.

4. Pass the JAR in your_project_dir/target/scala-2.10/ to bin/spark-submit.

[1] 
http://spark.apache.org/docs/latest/programming-guide.html#linking-with-spark
[2] http://spark.apache.org/docs/latest/submitting-applications.html
[3] https://gist.github.com/ankurdave/1fb7234d8affb3a2e4f4
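
For reference, here's a minimal build.sbt sketch along the lines of [3]. The 
project name and the Spark version (1.2.1 below) are placeholders; use 
whatever release you're building against:

    // Minimal build.sbt for a Spark + GraphX application (names/versions are placeholders)
    name := "my-graphx-app"

    version := "0.1"

    scalaVersion := "2.10.4"

    // "provided" keeps Spark out of the packaged JAR, since spark-submit supplies it at runtime
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"   % "1.2.1" % "provided",
      "org.apache.spark" %% "spark-graphx" % "1.2.1" % "provided"
    )

After `sbt package`, you'd run something like 
`bin/spark-submit --class com.example.MyApp target/scala-2.10/my-graphx-app_2.10-0.1.jar`, 
where the class and JAR names are whatever your build produces.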

> 2) Is there a way to check the status of the partitions of a graph? For
> example, I want to determine for starters if the number of partitions
> requested are always made, like if I ask for 8 partitions but only have 4
> cores what happens?

You can look at `graph.vertices` and `graph.edges`: both are RDDs, so you can 
inspect their partitioning directly, for example with `graph.vertices.partitions`.
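
For example, in the spark-shell (the toy graph below is just an illustration; 
note that the partition count you request is independent of the core count, so 
asking for 8 partitions on a 4-core machine still yields 8, they simply won't 
all run at once):

    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.rdd.RDD

    // Build a small graph from an edge RDD with 8 partitions requested up front
    val edges: RDD[Edge[Int]] =
      sc.parallelize((1L to 100L).map(i => Edge(i, i % 10, 1)), numSlices = 8)
    val graph = Graph.fromEdges(edges, defaultValue = 0)

    println(graph.edges.partitions.length)     // edge partition count
    println(graph.vertices.partitions.length)  // vertex partition count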

> 3) Would I be able to partition by vertex instead of edges, even if I had to
> write it myself? I know partitioning by edges is favored in a majority of
> the cases, but for the sake of research I'd like to be able to do both.

If you pass PartitionStrategy.EdgePartition1D to Graph.partitionBy, edges are 
partitioned by their source vertices, so all edges with the same source are 
co-partitioned and the communication pattern is similar to vertex-partitioned 
(edge-cut) systems like Giraph.
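
For example (assuming an existing Graph named `graph`, as above):

    import org.apache.spark.graphx.PartitionStrategy

    // Repartition edges so that all edges sharing a source vertex are co-located,
    // approximating an edge-cut (vertex-partitioned) layout
    val bySource = graph.partitionBy(PartitionStrategy.EdgePartition1D)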

> 4) Is there a better way to time processes outside of using built-in unix
> timing through the logs or something?

I think the options are Unix timing, log file timestamp parsing, looking at the 
web UI, or writing timing code within your program (System.currentTimeMillis 
and System.nanoTime).
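
If you go the in-program route, a minimal sketch of that last option (the 
action being timed is just an example; transformations are lazy, so you need 
an action such as count() to force the work you want to measure):

    // Time a Spark action with System.nanoTime
    val start = System.nanoTime()
    val numDegrees = graph.degrees.count()   // action forces evaluation
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(s"degrees.count() took $elapsedMs ms over $numDegrees vertices")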

Ankur
