Re: deployment options for Spark and YARN w/ many app jar library dependencies
Thank you, Sandy! I'll investigate use of the spark.executor.extraClassPath setting. Both options are helpful.

Thanks,
Matt

On Jun 17, 2015, at 8:01 PM, Sandy Ryza sandy.r...@cloudera.com wrote:

Hi Matt,

If you place your jars on HDFS in a public location, YARN will cache them on each node after the first download. You can also use the spark.executor.extraClassPath config to point to them.

-Sandy

On Wed, Jun 17, 2015 at 4:47 PM, Sweeney, Matt mswee...@fourv.com wrote:

Hi folks,

I'm looking to deploy Spark on YARN and I have read through the docs (https://spark.apache.org/docs/latest/running-on-yarn.html). One question I still have is whether there is an alternative way to include your own app jars, as opposed to the process in the Adding Other Jars section of the docs. The app jars and dependencies that I need to include are large (hundreds of MB), and I'd rather deploy them onto the cluster nodes' disks in advance so that I don't incur that network overhead for each spark-submit that is executed.

Thanks in advance for your help!

Matt
Re: deployment options for Spark and YARN w/ many app jar library dependencies
Hi Matt,

If you place your jars on HDFS in a public location, YARN will cache them on each node after the first download. You can also use the spark.executor.extraClassPath config to point to them.

-Sandy

On Wed, Jun 17, 2015 at 4:47 PM, Sweeney, Matt mswee...@fourv.com wrote:

Hi folks,

I’m looking to deploy Spark on YARN and I have read through the docs (https://spark.apache.org/docs/latest/running-on-yarn.html). One question I still have is whether there is an alternative way to include your own app jars, as opposed to the process in the “Adding Other Jars” section of the docs. The app jars and dependencies that I need to include are large (hundreds of MB), and I’d rather deploy them onto the cluster nodes’ disks in advance so that I don’t incur that network overhead for each spark-submit that is executed.

Thanks in advance for your help!

Matt
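
For anyone reading this thread later, the two suggestions above could be wired into a spark-submit invocation roughly as follows. This is only a sketch under assumptions: the helper build_spark_submit, the directory /opt/myapp/lib, the HDFS path hdfs:///apps/myapp/lib/, the class com.example.MyApp and the jar names are all hypothetical and do not come from the thread. Only spark.executor.extraClassPath, --jars, --class and --master are actual Spark names. It also assumes the jars have already been copied to the same local directory on every node.

```python
import shlex


def build_spark_submit(app_jar, main_class, local_lib_dir=None, hdfs_jars=()):
    """Assemble a spark-submit argument list for a YARN deployment.

    local_lib_dir: hypothetical directory holding the dependency jars,
                   assumed to exist at the same path on every node.
    hdfs_jars:     hypothetical jar URLs in a publicly readable HDFS location.
    """
    cmd = ["spark-submit", "--master", "yarn", "--class", main_class]

    if local_lib_dir:
        # Option 1: jars pre-deployed on local disk. The trailing "/*" is
        # the JVM classpath wildcard that matches every jar in the directory.
        classpath = local_lib_dir.rstrip("/") + "/*"
        cmd += ["--conf", "spark.executor.extraClassPath=" + classpath]

    if hdfs_jars:
        # Option 2: jars referenced by HDFS URL, so they are not uploaded
        # from the submitting machine on each run.
        cmd += ["--jars", ",".join(hdfs_jars)]

    cmd.append(app_jar)
    return cmd


if __name__ == "__main__":
    cmd = build_spark_submit(
        app_jar="myapp.jar",
        main_class="com.example.MyApp",
        local_lib_dir="/opt/myapp/lib",
        hdfs_jars=["hdfs:///apps/myapp/lib/dep-a.jar"],
    )
    # shlex.quote keeps the shell from expanding the "*" in the classpath.
    print(" ".join(shlex.quote(part) for part in cmd))
```

In practice you would likely pick one of the two options rather than both. The driver may need the same classpath entries through spark.driver.extraClassPath, which goes beyond what was discussed here.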
deployment options for Spark and YARN w/ many app jar library dependencies
Hi folks,

I’m looking to deploy Spark on YARN and I have read through the docs (https://spark.apache.org/docs/latest/running-on-yarn.html). One question I still have is whether there is an alternative way to include your own app jars, as opposed to the process in the “Adding Other Jars” section of the docs. The app jars and dependencies that I need to include are large (hundreds of MB), and I’d rather deploy them onto the cluster nodes’ disks in advance so that I don’t incur that network overhead for each spark-submit that is executed.

Thanks in advance for your help!

Matt