On May 17, 2014 at 2:59pm, Hari wrote:

> a) Is there a way to get the total time taken for the execution from start to finish?

Assuming you're running the benchmark as a standalone program, such as by invoking the Analytics driver <https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/Analytics.scala>, you could wrap the driver invocation using time:

    /usr/bin/time -p ./bin/spark-submit ...

If you're using spark-shell, you could use System.currentTimeMillis.

> b) log4j properties need to be modified to turn off logging, but its not clear how to.

Create conf/log4j.properties <http://spark.apache.org/docs/0.9.1/configuration.html#configuring-logging> by copying conf/log4j.properties.template and changing the first line to

    log4j.rootCategory=WARN, console

> c) how can this be extended to a cluster?

It should work just to invoke the driver on the cluster using spark-submit. If you aren't using the Analytics driver, make sure to set the same Spark properties <http://spark.apache.org/docs/0.9.1/configuration.html#spark-properties> as it does (spark.serializer, spark.kryo.registrator, and spark.locality.wait).

> d) also how to quantify memory overhead if i added more functionality to the execution?

You can see how much memory each cached RDD is taking up by looking at the web UI <http://spark.apache.org/docs/0.9.1/monitoring.html#web-interfaces>.

> e) any scripts? reports generated?

We don't have well-supported benchmark scripts for GraphX yet. Dan Crankshaw has some personal-use scripts <https://github.com/dcrankshaw/graphx-utils> for setting up GraphX and competing graph systems on a cluster and running some benchmarks. You could look at those for some ideas.

There are benchmarks from earlier this year in the GraphX arXiv paper <http://arxiv.org/abs/1402.2394>. These are on the soc-LiveJournal <http://snap.stanford.edu/data/soc-LiveJournal1.html> and twitter-2010 <http://law.di.unimi.it/webdata/twitter-2010/> datasets.
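For (b), the copy-and-edit step could be scripted roughly as follows. This is only a sketch: the function name quiet_spark_logging and its argument (the Spark installation directory) are made up for illustration, and it assumes the template contains a line beginning with log4j.rootCategory=. It rewrites that line wherever it appears, rather than relying on it being the very first line of the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): generate conf/log4j.properties
# from the shipped template with the root log level lowered to WARN.
quiet_spark_logging() {
    conf_dir="$1/conf"    # $1: Spark installation directory (assumed layout)
    template="$conf_dir/log4j.properties.template"

    # Fail early if the template is not where we expect it.
    [ -f "$template" ] || return 1

    # Copy the template, replacing only the rootCategory line;
    # every other line is passed through unchanged.
    sed 's/^log4j\.rootCategory=.*/log4j.rootCategory=WARN, console/' \
        "$template" > "$conf_dir/log4j.properties"
}
```

After sourcing the file, it would be called from the top of the Spark checkout as: quiet_spark_logging . Writing to a new file instead of editing in place keeps the template untouched, so the default settings can be restored by deleting conf/log4j.properties.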
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Benchmarking-Graphx-tp5965p6061.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.