Hey Christian, I posted an example to my local github repo (word count, of course) of running Spark 0.9.0 on a cluster, but it's pre-yarn:
https://github.com/jwills/crunch-demo/tree/spark Use the spark-run.sh script to run it; you need to set -Dspark.master at the commandline to point at the spark master on the cluster. It would be cool to integrate it with the instructions here for running Spark under YARN and see how it came out: http://spark.apache.org/docs/latest/running-on-yarn.html Of course, we'd need to commit that patch to upgrade Crunch to Spark 1.0.0: https://issues.apache.org/jira/browse/CRUNCH-410 J On Tue, Jun 17, 2014 at 7:47 AM, Christian Tzolov < [email protected]> wrote: > Is there an example of Crunch Spark pipeline for hadoop2/yarn cluster > manager? > -- Director of Data Science Cloudera <http://www.cloudera.com> Twitter: @josh_wills <http://twitter.com/josh_wills>
