Re: Crunch Spark with YARN cluster Manager

Christian Tzolov Sun, 22 Jun 2014 16:36:07 -0700

Hi Josh,

After applying the https://issues.apache.org/jira/browse/CRUNCH-410 patch
I've managed to submit a Crunch-Spark pipeline to a Hadoop 2.2.0 cluster using
the YARN manager :)


The run configuration looks like this.

export HADOOP_CONF_DIR=<your hadoop conf dir>
export SPARK_SUBMIT_CLASSPATH=./commons-codec-1.4.jar:<your spark installation folder>/lib/spark-assembly-1.0.0-hadoop2.2.0.jar:./<your application>-jar-with-dependencies.jar
<your spark installation folder>/bin/spark-submit --num-executors 10 \
  --master yarn-client --class <crunch pipeline main class> \
  ./<your application>-jar-with-dependencies.jar <your application arguments>

(Note the commons-codec jar on the Spark classpath!)
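
A minimal sketch of the same configuration wrapped in two shell helpers, in case it is easier to reuse that way. The function names and their arguments (spark_home, app_jar, main_class) are hypothetical and only stand in for the placeholders above; the jar names, --num-executors 10 and --master yarn-client are taken from the configuration as posted. The second helper only prints the command, so nothing is submitted, and arguments containing spaces would lose their quoting in the printed line.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and parameters are illustrative assumptions,
# not part of Crunch or Spark.

# Join the three classpath entries from the mail with ':'.
build_classpath() {
  local spark_home="$1" app_jar="$2"
  printf '%s:%s:%s\n' \
    "./commons-codec-1.4.jar" \
    "${spark_home}/lib/spark-assembly-1.0.0-hadoop2.2.0.jar" \
    "${app_jar}"
}

# Print (do not run) the spark-submit command line for a pipeline.
# Any arguments after the main class are appended as application arguments.
build_submit_command() {
  local spark_home="$1" app_jar="$2" main_class="$3"
  shift 3
  printf '%s' "${spark_home}/bin/spark-submit --num-executors 10 --master yarn-client --class ${main_class} ${app_jar}"
  if [ "$#" -gt 0 ]; then
    printf ' %s' "$*"
  fi
  printf '\n'
}
```

To use it, export HADOOP_CONF_DIR as before, export SPARK_SUBMIT_CLASSPATH with the output of build_classpath, and check the line printed by build_submit_command before running it by hand.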

I've noticed that the Crunch-Spark runtime doesn't implement the
Converter#applyPTypeTransforms logic, so I've put together a patch
(attached) to make my custom sources work.

Shall I open a ticket and try to provide a complete patch for
Converter#applyPTypeTransforms?

Cheers,
Christian



On Wed, Jun 18, 2014 at 1:32 PM, Christian Tzolov <
[email protected]> wrote:

> Hi Josh,
>
> Thanks for the references. I've applied the patch and started
> experimenting with crunch-spark on YARN, playing around with the
> yarn-client and yarn-cluster master configurations. Not there yet.
>
> Cheers,
> Christian
>
>
>
>
> On Tue, Jun 17, 2014 at 5:09 PM, Josh Wills <[email protected]> wrote:
>
>> Hey Christian,
>>
>> I posted an example to my local github repo (word count, of course) of
>> running Spark 0.9.0 on a cluster, but it's pre-yarn:
>>
>> https://github.com/jwills/crunch-demo/tree/spark
>>
>> Use the spark-run.sh script to run it; you need to set -Dspark.master on
>> the command line to point at the Spark master on the cluster. It would be
>> cool to integrate it with the instructions here for running Spark under
>> YARN and see how it comes out:
>>
>> http://spark.apache.org/docs/latest/running-on-yarn.html
>>
>> Of course, we'd need to commit that patch to upgrade Crunch to Spark
>> 1.0.0: https://issues.apache.org/jira/browse/CRUNCH-410
>>
>> J
>>
>>
>> On Tue, Jun 17, 2014 at 7:47 AM, Christian Tzolov <
>> [email protected]> wrote:
>>
>>> Is there an example of a Crunch Spark pipeline for the Hadoop 2 / YARN
>>> cluster manager?
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>
>

0001-Implement-the-Converter-applyPTypeTransforms-semanti.patch
Description: Binary data
