Re: deployment options for Spark and YARN w/ many app jar library dependencies

2015-06-18 Thread Sweeney, Matt
Thank you, Sandy! I'll investigate use of the extraClassPath variable. Both 
options are helpful.

Thanks,

Matt

On Jun 17, 2015, at 8:01 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

Hi Matt,

If you place your jars on HDFS in a public location, YARN will cache them on 
each node after the first download.  You can also use the 
spark.executor.extraClassPath config to point to them.

-Sandy

On Wed, Jun 17, 2015 at 4:47 PM, Sweeney, Matt mswee...@fourv.com wrote:
Hi folks,

I'm looking to deploy Spark on YARN and I have read through the docs 
(https://spark.apache.org/docs/latest/running-on-yarn.html). One question that 
I still have is whether there is an alternate means of including your own app 
jars, as opposed to the process in the "Adding Other Jars" section of the docs. 
The app jars and dependencies that I need to include are significant in size 
(hundreds of MB), and I'd rather deploy them in advance onto the cluster nodes' 
disks so that I don't incur that network overhead for each spark-submit that 
is executed.

Thanks in advance for your help!

Matt



Re: deployment options for Spark and YARN w/ many app jar library dependencies

2015-06-17 Thread Sandy Ryza
Hi Matt,

If you place your jars on HDFS in a public location, YARN will cache them
on each node after the first download.  You can also use the
spark.executor.extraClassPath config to point to them.

-Sandy
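
As a rough illustration of the two options above, here is a minimal Python
sketch that only assembles a spark-submit command line; it does not run Spark.
The helper name build_spark_submit, the HDFS paths, the local directory
/opt/myapp/lib, and the class name are all hypothetical. It assumes the jars
were either uploaded to a world-readable HDFS directory or copied to the same
local path on every node beforehand. The exact --master spelling depends on
the Spark version in use.

```python
import shlex


def build_spark_submit(app_jar, main_class, hdfs_jars=(), executor_classpath=None):
    """Assemble a spark-submit argument list (hypothetical helper, not a Spark API)."""
    # "--master yarn" is an assumption; older releases also used yarn-client/yarn-cluster.
    args = ["spark-submit", "--master", "yarn", "--class", main_class]
    if hdfs_jars:
        # Option 1: jars already on HDFS in a public location, passed as a
        # comma-separated list so they are not uploaded from the client each time.
        args += ["--jars", ",".join(hdfs_jars)]
    if executor_classpath:
        # Option 2: jars pre-deployed on each node's local disk and added to
        # the executor classpath via spark.executor.extraClassPath.
        args += ["--conf", "spark.executor.extraClassPath=" + executor_classpath]
    args.append(app_jar)
    return args


if __name__ == "__main__":
    cmd = build_spark_submit(
        "myapp.jar",                      # hypothetical application jar
        "com.example.Main",               # hypothetical main class
        hdfs_jars=["hdfs:///libs/a.jar", "hdfs:///libs/b.jar"],
        executor_classpath="/opt/myapp/lib/*",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

The two options are independent: pass only hdfs_jars to rely on YARN's
per-node caching, or only executor_classpath when the jars already sit on
every node.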

On Wed, Jun 17, 2015 at 4:47 PM, Sweeney, Matt mswee...@fourv.com wrote:

  Hi folks,

  I’m looking to deploy Spark on YARN and I have read through the docs
 (https://spark.apache.org/docs/latest/running-on-yarn.html). One question
 that I still have is whether there is an alternate means of including your
 own app jars, as opposed to the process in the “Adding Other Jars” section
 of the docs. The app jars and dependencies that I need to include are
 significant in size (hundreds of MB), and I’d rather deploy them in advance
 onto the cluster nodes’ disks so that I don’t incur that network overhead
 for each spark-submit that is executed.

  Thanks in advance for your help!

  Matt



deployment options for Spark and YARN w/ many app jar library dependencies

2015-06-17 Thread Sweeney, Matt
Hi folks,

I’m looking to deploy Spark on YARN and I have read through the docs 
(https://spark.apache.org/docs/latest/running-on-yarn.html). One question that 
I still have is whether there is an alternate means of including your own app 
jars, as opposed to the process in the “Adding Other Jars” section of the docs. 
The app jars and dependencies that I need to include are significant in size 
(hundreds of MB), and I’d rather deploy them in advance onto the cluster nodes’ 
disks so that I don’t incur that network overhead for each spark-submit that 
is executed.

Thanks in advance for your help!

Matt