Maybe try --driver-memory if you are using spark-submit?

Thanks,
Nishkam
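For instance, something along these lines (the jar, class name, and
memory value here are placeholders):

    spark-submit --master yarn-cluster \
        --driver-memory 2g \
        --class com.example.MyApp \
        my-app.jar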
On Mon, Sep 22, 2014 at 1:41 PM, Greg Hill <greg.h...@rackspace.com> wrote:

> Ah, I see. It turns out that my problem is that the comparison is
> ignoring SPARK_DRIVER_MEMORY and comparing against the default of 512m.
> Is that a bug that's since been fixed? I'm on 1.0.1 and using
> 'yarn-cluster' as the master. 'yarn-client' seems to pick up the values
> and works fine.
>
> Greg
>
> From: Nishkam Ravi <nr...@cloudera.com>
> Date: Monday, September 22, 2014 3:30 PM
> To: Greg <greg.h...@rackspace.com>
> Cc: Andrew Or <and...@databricks.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> Subject: Re: clarification for some spark on yarn configuration options
>
> Greg, if you look carefully, the code is enforcing that the
> memoryOverhead be lower (and not higher) than spark.driver.memory.
>
> Thanks,
> Nishkam
>
> On Mon, Sep 22, 2014 at 1:26 PM, Greg Hill <greg.h...@rackspace.com>
> wrote:
>
>> I thought I had this all figured out, but I'm getting some weird
>> errors now that I'm attempting to deploy this on production-size
>> servers. It's complaining that I'm not allocating enough memory to the
>> memoryOverhead values. I tracked it down to this code:
>>
>> https://github.com/apache/spark/blob/ed1980ffa9ccb87d76694ba910ef22df034bca49/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala#L70
>>
>> Unless I'm reading it wrong, those checks enforce that you set
>> spark.yarn.driver.memoryOverhead higher than spark.driver.memory, but
>> that makes no sense to me, since that memory is just supposed to be
>> what YARN needs on top of what you're allocating for Spark. My
>> understanding was that the overhead values should be quite a bit lower
>> (and by default they are).
>>
>> Also, why must the executor be allocated less memory than the driver's
>> memory overhead value?
>>
>> What am I misunderstanding here?
>>
>> Greg
>>
>> From: Andrew Or <and...@databricks.com>
>> Date: Tuesday, September 9, 2014 5:49 PM
>> To: Greg <greg.h...@rackspace.com>
>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>> Subject: Re: clarification for some spark on yarn configuration options
>>
>> Hi Greg,
>>
>> SPARK_EXECUTOR_INSTANCES is the total number of workers in the
>> cluster. The equivalent "spark.executor.instances" is just another way
>> to set the same thing in your spark-defaults.conf. Maybe this should
>> be documented. :)
>>
>> "spark.yarn.executor.memoryOverhead" is just an additional margin
>> added on top of "spark.executor.memory" for the container. In addition
>> to the executor's memory, the container in which the executor is
>> launched needs some extra memory for system processes, and this is
>> what the "overhead" (somewhat of a misnomer) is for. The same goes for
>> the driver equivalent.
>>
>> "spark.driver.memory" behaves differently depending on which version
>> of Spark you are using. If you are using Spark 1.1+ (released very
>> recently), you can directly set "spark.driver.memory" and it will take
>> effect. Otherwise, setting it doesn't actually do anything in client
>> deploy mode, and you have two alternatives: (1) set the equivalent
>> environment variable SPARK_DRIVER_MEMORY in spark-env.sh, or (2) if
>> you are using Spark submit (or bin/spark-shell or bin/pyspark, both of
>> which go through bin/spark-submit), pass the "--driver-memory" command
>> line argument.
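>> For example, either of the following should work pre-1.1 (the 2g
>> value is arbitrary):
>>
>>     # alternative (1): in spark-env.sh
>>     export SPARK_DRIVER_MEMORY=2g
>>
>>     # alternative (2): on the spark-submit command line
>>     spark-submit --driver-memory 2g ...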
>> If you want your PySpark application (driver) to pick up an extra
>> class path, you can pass "--driver-class-path" to Spark submit. If you
>> are using Spark 1.1+, you may set "spark.driver.extraClassPath" in
>> your spark-defaults.conf. There is also an environment variable you
>> could set (SPARK_CLASSPATH), though this is now deprecated.
>>
>> Let me know if you have more questions about these options,
>> -Andrew
>>
>>
>> 2014-09-08 6:59 GMT-07:00 Greg Hill <greg.h...@rackspace.com>:
>>
>>> Is SPARK_EXECUTOR_INSTANCES the total number of workers in the
>>> cluster, or the number of workers per slave node?
>>>
>>> Is spark.executor.instances an actual config option? I found it in a
>>> commit, but it's not in the docs.
>>>
>>> What is the difference between spark.yarn.executor.memoryOverhead and
>>> spark.executor.memory? Same question for the 'driver' variant, but I
>>> assume the answer is the same.
>>>
>>> Is there a spark.driver.memory option that's undocumented, or do you
>>> have to use the environment variable SPARK_DRIVER_MEMORY?
>>>
>>> What config option or environment variable do I need to set to get
>>> interactive pyspark to pick up the YARN class path? The ones that
>>> work for spark-shell and spark-submit don't seem to work for pyspark.
>>>
>>> Thanks in advance.
>>>
>>> Greg
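For reference, a minimal sketch of the class path settings Andrew
describes above (the paths are placeholders for your environment):

    # spark-defaults.conf (Spark 1.1+)
    spark.driver.extraClassPath  /etc/hadoop/conf:/opt/extra/jars/*

    # or per invocation; this works for pyspark too, since it goes
    # through spark-submit
    pyspark --driver-class-path "/etc/hadoop/conf:/opt/extra/jars/*"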