Maybe try --driver-memory if you are using spark-submit?

Thanks,
Nishkam
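For instance, something along these lines (the jar, class name, and
memory value here are placeholders):

    spark-submit --master yarn-cluster \
        --driver-memory 2g \
        --class com.example.MyApp \
        my-app.jar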
On Mon, Sep 22, 2014 at 1:41 PM, Greg Hill <greg.h...@rackspace.com> wrote:

> Ah, I see. It turns out that my problem is that the comparison is
> ignoring SPARK_DRIVER_MEMORY and comparing against the default of 512m.
> Is that a bug that's since been fixed? I'm on 1.0.1 and using
> 'yarn-cluster' as the master. 'yarn-client' seems to pick up the values
> and works fine.
>
> Greg
>
> From: Nishkam Ravi <nr...@cloudera.com>
> Date: Monday, September 22, 2014 3:30 PM
> To: Greg <greg.h...@rackspace.com>
> Cc: Andrew Or <and...@databricks.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> Subject: Re: clarification for some spark on yarn configuration options
>
> Greg, if you look carefully, the code is enforcing that the
> memoryOverhead be lower (and not higher) than spark.driver.memory.
>
> Thanks,
> Nishkam
>
> On Mon, Sep 22, 2014 at 1:26 PM, Greg Hill <greg.h...@rackspace.com>
> wrote:
>
>> I thought I had this all figured out, but I'm getting some weird
>> errors now that I'm attempting to deploy this on production-size
>> servers. It's complaining that I'm not allocating enough memory to the
>> memoryOverhead values. I tracked it down to this code:
>>
>> https://github.com/apache/spark/blob/ed1980ffa9ccb87d76694ba910ef22df034bca49/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala#L70
>>
>> Unless I'm reading it wrong, those checks enforce that you set
>> spark.yarn.driver.memoryOverhead higher than spark.driver.memory, but
>> that makes no sense to me, since that memory is just supposed to be
>> what YARN needs on top of what you're allocating for Spark. My
>> understanding was that the overhead values should be quite a bit lower
>> (and by default they are).
>>
>> Also, why must the executor be allocated less memory than the driver's
>> memory overhead value?
>>
>> What am I misunderstanding here?
>>
>> Greg
>>
>> From: Andrew Or <and...@databricks.com>
>> Date: Tuesday, September 9, 2014 5:49 PM
>> To: Greg <greg.h...@rackspace.com>
>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>> Subject: Re: clarification for some spark on yarn configuration options
>>
>> Hi Greg,
>>
>> SPARK_EXECUTOR_INSTANCES is the total number of workers in the
>> cluster. The equivalent "spark.executor.instances" is just another way
>> to set the same thing in your spark-defaults.conf. Maybe this should
>> be documented. :)
>>
>> "spark.yarn.executor.memoryOverhead" is just an additional margin
>> added on top of "spark.executor.memory" for the container. In addition
>> to the executor's memory, the container in which the executor is
>> launched needs some extra memory for system processes, and this is
>> what the "overhead" (somewhat of a misnomer) is for. The same goes for
>> the driver equivalent.
>>
>> "spark.driver.memory" behaves differently depending on which version
>> of Spark you are using. If you are using Spark 1.1+ (released very
>> recently), you can directly set "spark.driver.memory" and it will take
>> effect. Otherwise, setting it doesn't actually do anything in client
>> deploy mode, and you have two alternatives: (1) set the equivalent
>> environment variable SPARK_DRIVER_MEMORY in spark-env.sh, or (2) if
>> you are using Spark submit (or bin/spark-shell or bin/pyspark, both of
>> which go through bin/spark-submit), pass the "--driver-memory" command
>> line argument.
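>> For example, either of the following should work pre-1.1 (the 2g
>> value is arbitrary):
>>
>>     # alternative (1): in spark-env.sh
>>     export SPARK_DRIVER_MEMORY=2g
>>
>>     # alternative (2): on the spark-submit command line
>>     spark-submit --driver-memory 2g ...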
>> If you want your PySpark application (driver) to pick up an extra
>> class path, you can pass "--driver-class-path" to Spark submit. If you
>> are using Spark 1.1+, you may set "spark.driver.extraClassPath" in
>> your spark-defaults.conf. There is also an environment variable you
>> could set (SPARK_CLASSPATH), though this is now deprecated.
>>
>> Let me know if you have more questions about these options,
>> -Andrew
>>
>>
>> 2014-09-08 6:59 GMT-07:00 Greg Hill <greg.h...@rackspace.com>:
>>
>>> Is SPARK_EXECUTOR_INSTANCES the total number of workers in the
>>> cluster, or the number of workers per slave node?
>>>
>>> Is spark.executor.instances an actual config option? I found it in a
>>> commit, but it's not in the docs.
>>>
>>> What is the difference between spark.yarn.executor.memoryOverhead and
>>> spark.executor.memory? Same question for the 'driver' variant, but I
>>> assume the answer is the same.
>>>
>>> Is there a spark.driver.memory option that's undocumented, or do you
>>> have to use the environment variable SPARK_DRIVER_MEMORY?
>>>
>>> What config option or environment variable do I need to set to get
>>> interactive pyspark to pick up the YARN class path? The ones that
>>> work for spark-shell and spark-submit don't seem to work for pyspark.
>>>
>>> Thanks in advance.
>>>
>>> Greg
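For reference, a minimal sketch of the class path settings Andrew
describes above (the paths are placeholders for your environment):

    # spark-defaults.conf (Spark 1.1+)
    spark.driver.extraClassPath  /etc/hadoop/conf:/opt/extra/jars/*

    # or per invocation; this works for pyspark too, since it goes
    # through spark-submit
    pyspark --driver-class-path "/etc/hadoop/conf:/opt/extra/jars/*"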