Yes please-- thanks Christian!
On Sun, Jun 22, 2014 at 4:35 PM, Christian Tzolov <[email protected]> wrote:

> Hi Josh,
>
> After applying the https://issues.apache.org/jira/browse/CRUNCH-410 patch
> I've managed to submit a Crunch-Spark pipeline to a Hadoop 2.2.0 cluster
> using the YARN manager :)
>
> The run configuration looks like this:
>
> export HADOOP_CONF_DIR=<your hadoop conf dir>
> export SPARK_SUBMIT_CLASSPATH=./commons-codec-1.4.jar:<your spark installation folder>/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:./<your application>-jar-with-dependencies.jar
> <your spark installation folder>/bin/spark-submit --num-executors 10 --master yarn-client --class <crunch pipeline main class> ./<your application>-jar-with-dependencies.jar <your application arguments>
>
> (Note the commons-codec in the Spark classpath!)
>
> I've noticed that the Crunch-Spark runtime doesn't implement the
> Converter#applyPTypeTransforms logic, so I've put together a patch
> (attached) to make my custom sources work.
>
> Shall I open a ticket and try to provide a complete patch for
> Converter#applyPTypeTransforms?
>
> Cheers,
> Christian
>
> On Wed, Jun 18, 2014 at 1:32 PM, Christian Tzolov <[email protected]> wrote:
>
>> Hi Josh,
>>
>> Thanks for the references. I've applied the patch and started
>> experimenting with crunch-spark on YARN, playing around with the
>> yarn-client and yarn-cluster master configurations. Not there yet.
>>
>> Cheers,
>> Christian
>>
>> On Tue, Jun 17, 2014 at 5:09 PM, Josh Wills <[email protected]> wrote:
>>
>>> Hey Christian,
>>>
>>> I posted an example to my local github repo (word count, of course) of
>>> running Spark 0.9.0 on a cluster, but it's pre-YARN:
>>>
>>> https://github.com/jwills/crunch-demo/tree/spark
>>>
>>> Use the spark-run.sh script to run it; you need to set -Dspark.master at
>>> the command line to point at the Spark master on the cluster. It would be
>>> cool to integrate it with the instructions here for running Spark under
>>> YARN and see how it came out:
>>>
>>> http://spark.apache.org/docs/latest/running-on-yarn.html
>>>
>>> Of course, we'd need to commit that patch to upgrade Crunch to Spark
>>> 1.0.0: https://issues.apache.org/jira/browse/CRUNCH-410
>>>
>>> J
>>>
>>> On Tue, Jun 17, 2014 at 7:47 AM, Christian Tzolov <[email protected]> wrote:
>>>
>>>> Is there an example of a Crunch Spark pipeline for the hadoop2/YARN
>>>> cluster manager?
>>>
>>> --
>>> Director of Data Science
>>> Cloudera <http://www.cloudera.com>
>>> Twitter: @josh_wills <http://twitter.com/josh_wills>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
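
For anyone reusing the run configuration quoted in this thread, here is a rough sketch of it as a small dry-run script. It only assembles and prints the classpath and the spark-submit command line; it does not launch anything. The variable names SPARK_HOME, APP_JAR and MAIN_CLASS, their default values, and the two helper functions are hypothetical stand-ins for the "<your ...>" placeholders in Christian's message, not something taken from Crunch or Spark.

```shell
#!/bin/sh
# Dry-run sketch of the run configuration from the thread.
# SPARK_HOME, APP_JAR and MAIN_CLASS are hypothetical names for the
# "<your ...>" placeholders; the defaults below are made-up examples.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
APP_JAR="${APP_JAR:-./my-app-jar-with-dependencies.jar}"
MAIN_CLASS="${MAIN_CLASS:-com.example.MyPipeline}"

# Classpath in the order given in the thread:
# commons-codec first, then the Spark assembly, then the application jar.
spark_submit_classpath() {
  printf '%s:%s:%s\n' \
    "./commons-codec-1.4.jar" \
    "$SPARK_HOME/lib/spark-assembly-1.0.0-hadoop2.2.0.jar" \
    "$APP_JAR"
}

# Print (do not execute) the spark-submit command line, followed by any
# application arguments passed to this function.
spark_submit_command() {
  printf '%s --num-executors 10 --master yarn-client --class %s %s' \
    "$SPARK_HOME/bin/spark-submit" "$MAIN_CLASS" "$APP_JAR"
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# HADOOP_CONF_DIR must point at the cluster configuration; only warn here.
if [ -z "${HADOOP_CONF_DIR:-}" ]; then
  echo "warning: HADOOP_CONF_DIR is not set" >&2
fi

SPARK_SUBMIT_CLASSPATH="$(spark_submit_classpath)"
export SPARK_SUBMIT_CLASSPATH
echo "SPARK_SUBMIT_CLASSPATH=$SPARK_SUBMIT_CLASSPATH"
spark_submit_command "$@"
```

Printing the command first makes it easy to check the classpath order (commons-codec ahead of the Spark assembly) before actually submitting to the cluster.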
