Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Kelly, Jonathan
Would there be any problem in having spark.executor.instances (or 
--num-executors) be completely ignored (i.e., even for non-zero values) if 
spark.dynamicAllocation.enabled is true (i.e., rather than throwing an 
exception)?

I can see how the exception would be helpful if, say, you tried to pass both 
-c spark.executor.instances (or --num-executors) *and* -c 
spark.dynamicAllocation.enabled=true to spark-submit on the command line (as 
opposed to having one of them in spark-defaults.conf and one of them in the 
spark-submit args), but currently there doesn't seem to be any way to 
distinguish between arguments that were actually passed to spark-submit and 
settings that simply came from spark-defaults.conf.

If there were a way to distinguish them, I think the ideal situation would be 
for the validation exception to be thrown only if spark.executor.instances and 
spark.dynamicAllocation.enabled=true were both passed via spark-submit args or 
were both present in spark-defaults.conf, but passing 
spark.dynamicAllocation.enabled=true to spark-submit would take precedence over 
spark.executor.instances configured in spark-defaults.conf, and vice versa.
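
For illustration, here's a rough Scala sketch of the kind of precedence I mean. The helper and its parameter names are made up, and spark-submit doesn't actually track where a setting came from today, so this is just a sketch of the idea, not a patch:

// Hypothetical illustration only -- not actual Spark code. It assumes we could
// tell settings passed on the spark-submit command line (cmdLineConf) apart
// from settings that came from spark-defaults.conf (defaultsConf).
def resolveExecutorSettings(
    cmdLineConf: Map[String, String],
    defaultsConf: Map[String, String]): Map[String, String] = {
  val merged = defaultsConf ++ cmdLineConf  // command-line values win on conflicts
  def dynEnabled(c: Map[String, String]) =
    c.get("spark.dynamicAllocation.enabled").exists(_ == "true")
  def numExecSet(c: Map[String, String]) = c.contains("spark.executor.instances")

  if ((dynEnabled(cmdLineConf) && numExecSet(cmdLineConf)) ||
      (dynEnabled(defaultsConf) && numExecSet(defaultsConf))) {
    // Both settings came from the same source: a real conflict, so keep the exception.
    throw new IllegalArgumentException(
      "Explicitly setting the number of executors is not compatible with " +
        "spark.dynamicAllocation.enabled!")
  } else if (dynEnabled(cmdLineConf) && numExecSet(defaultsConf)) {
    // Dynamic allocation on the command line overrides the default executor count.
    merged - "spark.executor.instances"
  } else if (numExecSet(cmdLineConf) && dynEnabled(defaultsConf)) {
    // An explicit executor count on the command line overrides the default setting.
    merged - "spark.dynamicAllocation.enabled"
  } else {
    merged
  }
}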

Jonathan Kelly
Elastic MapReduce - SDE
Blackfoot (SEA33) 06.850.F0

From: Jonathan Kelly jonat...@amazon.com
Date: Tuesday, July 14, 2015 at 4:23 PM
To: user@spark.apache.org user@spark.apache.org
Subject: Unable to use dynamicAllocation if spark.executor.instances is set in 
spark-defaults.conf

I've set up my cluster with a pre-calculated value for spark.executor.instances 
in spark-defaults.conf such that I can run a job and have it maximize the 
utilization of the cluster resources by default. However, if I want to run a 
job with dynamicAllocation (by passing -c spark.dynamicAllocation.enabled=true 
to spark-submit), I get this exception:

Exception in thread "main" java.lang.IllegalArgumentException: Explicitly 
setting the number of executors is not compatible with 
spark.dynamicAllocation.enabled!
at 
org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:192)
at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:59)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:54)
...

The exception makes sense, of course, but ideally I would like it to ignore 
what I've put in spark-defaults.conf for spark.executor.instances if I've 
enabled dynamicAllocation. The most annoying thing about this is that if I have 
spark.executor.instances present in spark-defaults.conf, I cannot figure out 
any way to spark-submit a job with spark.dynamicAllocation.enabled=true without 
getting this error. That is, even if I pass -c spark.executor.instances=0 -c 
spark.dynamicAllocation.enabled=true, I still get this error because the 
validation in ClientArguments.parseArgs() that's checking for this condition 
simply checks for the presence of spark.executor.instances rather than whether 
or not its value is > 0.
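
To make the distinction concrete, here's a rough Scala sketch of the difference between the check as it behaves today and a value-based one. The method names are placeholders and this is not the actual ClientArguments code, just an illustration:

import org.apache.spark.SparkConf

// Rough illustration only -- placeholder method names, not the actual
// ClientArguments code.
def presenceBasedCheck(conf: SparkConf): Unit = {
  val dynEnabled   = conf.getBoolean("spark.dynamicAllocation.enabled", false)
  val numExecutors = conf.getOption("spark.executor.instances").map(_.toInt)
  // What I'm seeing today: the setting merely being present is enough to fail,
  // so even spark.executor.instances=0 triggers the exception.
  if (dynEnabled && numExecutors.isDefined) {
    throw new IllegalArgumentException(
      "Explicitly setting the number of executors is not compatible with " +
        "spark.dynamicAllocation.enabled!")
  }
}

def valueBasedCheck(conf: SparkConf): Unit = {
  val dynEnabled   = conf.getBoolean("spark.dynamicAllocation.enabled", false)
  val numExecutors = conf.getOption("spark.executor.instances").map(_.toInt)
  // Alternative: only a positive executor count conflicts, so
  // spark.executor.instances=0 could coexist with dynamic allocation.
  if (dynEnabled && numExecutors.exists(_ > 0)) {
    throw new IllegalArgumentException(
      "Setting a positive number of executors is not compatible with " +
        "spark.dynamicAllocation.enabled!")
  }
}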

Should the check be changed to allow spark.executor.instances to be set to 0 if 
spark.dynamicAllocation.enabled is true? That would be an OK compromise, but 
I'd really prefer to be able to enable dynamicAllocation simply by setting 
spark.dynamicAllocation.enabled=true rather than by also having to set 
spark.executor.instances to 0.

Thanks,
Jonathan


Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Kelly, Jonathan
bump

From: Jonathan Kelly jonat...@amazon.com
Date: Tuesday, July 14, 2015 at 4:23 PM
To: user@spark.apache.org user@spark.apache.org
Subject: Unable to use dynamicAllocation if spark.executor.instances is set in 
spark-defaults.conf

I've set up my cluster with a pre-calculated value for spark.executor.instances 
in spark-defaults.conf such that I can run a job and have it maximize the 
utilization of the cluster resources by default. However, if I want to run a 
job with dynamicAllocation (by passing -c spark.dynamicAllocation.enabled=true 
to spark-submit), I get this exception:

Exception in thread "main" java.lang.IllegalArgumentException: Explicitly 
setting the number of executors is not compatible with 
spark.dynamicAllocation.enabled!
at 
org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:192)
at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:59)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:54)
...

The exception makes sense, of course, but ideally I would like it to ignore 
what I've put in spark-defaults.conf for spark.executor.instances if I've 
enabled dynamicAllocation. The most annoying thing about this is that if I have 
spark.executor.instances present in spark-defaults.conf, I cannot figure out 
any way to spark-submit a job with spark.dynamicAllocation.enabled=true without 
getting this error. That is, even if I pass -c spark.executor.instances=0 -c 
spark.dynamicAllocation.enabled=true, I still get this error because the 
validation in ClientArguments.parseArgs() that's checking for this condition 
simply checks for the presence of spark.executor.instances rather than whether 
or not its value is > 0.

Should the check be changed to allow spark.executor.instances to be set to 0 if 
spark.dynamicAllocation.enabled is true? That would be an OK compromise, but 
I'd really prefer to be able to enable dynamicAllocation simply by setting 
spark.dynamicAllocation.enabled=true rather than by also having to set 
spark.executor.instances to 0.

Thanks,
Jonathan


Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Sandy Ryza
Hi Jonathan,

This is a problem that has come up for us as well, because we'd like
dynamic allocation to be turned on by default in some setups, but not break
existing users with these properties.  I'm hoping to figure out a way to
reconcile these by Spark 1.5.

-Sandy

On Wed, Jul 15, 2015 at 3:18 PM, Kelly, Jonathan jonat...@amazon.com
wrote:

   Would there be any problem in having spark.executor.instances (or
 --num-executors) be completely ignored (i.e., even for non-zero values) if
 spark.dynamicAllocation.enabled is true (i.e., rather than throwing an
 exception)?

  I can see how the exception would be helpful if, say, you tried to pass
 both -c spark.executor.instances (or --num-executors) *and* -c
 spark.dynamicAllocation.enabled=true to spark-submit on the command line
 (as opposed to having one of them in spark-defaults.conf and one of them in
 the spark-submit args), but currently there doesn't seem to be any way to
 distinguish between arguments that were actually passed to spark-submit and
 settings that simply came from spark-defaults.conf.

  If there were a way to distinguish them, I think the ideal situation
 would be for the validation exception to be thrown only if
 spark.executor.instances and spark.dynamicAllocation.enabled=true were both
 passed via spark-submit args or were both present in spark-defaults.conf,
 but passing spark.dynamicAllocation.enabled=true to spark-submit would take
 precedence over spark.executor.instances configured in spark-defaults.conf,
 and vice versa.


  Jonathan Kelly

 Elastic MapReduce - SDE

 Blackfoot (SEA33) 06.850.F0

   From: Jonathan Kelly jonat...@amazon.com
 Date: Tuesday, July 14, 2015 at 4:23 PM
 To: user@spark.apache.org user@spark.apache.org
 Subject: Unable to use dynamicAllocation if spark.executor.instances is
 set in spark-defaults.conf

   I've set up my cluster with a pre-calculated value for
 spark.executor.instances in spark-defaults.conf such that I can run a job
 and have it maximize the utilization of the cluster resources by default.
 However, if I want to run a job with dynamicAllocation (by passing -c
 spark.dynamicAllocation.enabled=true to spark-submit), I get this exception:

  Exception in thread "main" java.lang.IllegalArgumentException:
 Explicitly setting the number of executors is not compatible with
 spark.dynamicAllocation.enabled!
 at
 org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:192)
 at
 org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:59)
 at
 org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:54)
  …

  The exception makes sense, of course, but ideally I would like it to
 ignore what I've put in spark-defaults.conf for spark.executor.instances if
 I've enabled dynamicAllocation. The most annoying thing about this is that
 if I have spark.executor.instances present in spark-defaults.conf, I cannot
 figure out any way to spark-submit a job with
 spark.dynamicAllocation.enabled=true without getting this error. That is,
 even if I pass -c spark.executor.instances=0 -c
 spark.dynamicAllocation.enabled=true, I still get this error because the
 validation in ClientArguments.parseArgs() that's checking for this
 condition simply checks for the presence of spark.executor.instances rather
 than whether or not its value is > 0.

  Should the check be changed to allow spark.executor.instances to be set
 to 0 if spark.dynamicAllocation.enabled is true? That would be an OK
 compromise, but I'd really prefer to be able to enable dynamicAllocation
 simply by setting spark.dynamicAllocation.enabled=true rather than by also
 having to set spark.executor.instances to 0.


  Thanks,

 Jonathan



Re: Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-15 Thread Andrew Or
Yeah, we could make it log a warning instead.
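
Something along these lines (just a sketch, not the actual code path):

import org.apache.spark.SparkConf

// Sketch only, not the actual Spark code path: warn and drop the explicit
// executor count instead of throwing when dynamic allocation is enabled.
def reconcile(conf: SparkConf): SparkConf = {
  if (conf.getBoolean("spark.dynamicAllocation.enabled", false) &&
      conf.contains("spark.executor.instances")) {
    System.err.println("WARN: ignoring spark.executor.instances because " +
      "spark.dynamicAllocation.enabled is true")
    conf.remove("spark.executor.instances")
  }
  conf
}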

2015-07-15 14:29 GMT-07:00 Kelly, Jonathan jonat...@amazon.com:

  Thanks! Is there an existing JIRA I should watch?


  ~ Jonathan

   From: Sandy Ryza sandy.r...@cloudera.com
 Date: Wednesday, July 15, 2015 at 2:27 PM
 To: Jonathan Kelly jonat...@amazon.com
 Cc: user@spark.apache.org user@spark.apache.org
 Subject: Re: Unable to use dynamicAllocation if spark.executor.instances
 is set in spark-defaults.conf

   Hi Jonathan,

  This is a problem that has come up for us as well, because we'd like
 dynamic allocation to be turned on by default in some setups, but not break
 existing users with these properties.  I'm hoping to figure out a way to
 reconcile these by Spark 1.5.

  -Sandy

 On Wed, Jul 15, 2015 at 3:18 PM, Kelly, Jonathan jonat...@amazon.com
 wrote:

   Would there be any problem in having spark.executor.instances (or
 --num-executors) be completely ignored (i.e., even for non-zero values) if
 spark.dynamicAllocation.enabled is true (i.e., rather than throwing an
 exception)?

  I can see how the exception would be helpful if, say, you tried to pass
 both -c spark.executor.instances (or --num-executors) *and* -c
 spark.dynamicAllocation.enabled=true to spark-submit on the command line
 (as opposed to having one of them in spark-defaults.conf and one of them in
 the spark-submit args), but currently there doesn't seem to be any way to
 distinguish between arguments that were actually passed to spark-submit and
 settings that simply came from spark-defaults.conf.

  If there were a way to distinguish them, I think the ideal situation
 would be for the validation exception to be thrown only if
 spark.executor.instances and spark.dynamicAllocation.enabled=true were both
 passed via spark-submit args or were both present in spark-defaults.conf,
 but passing spark.dynamicAllocation.enabled=true to spark-submit would take
 precedence over spark.executor.instances configured in spark-defaults.conf,
 and vice versa.


  Jonathan Kelly

 Elastic MapReduce - SDE

 Blackfoot (SEA33) 06.850.F0

   From: Jonathan Kelly jonat...@amazon.com
 Date: Tuesday, July 14, 2015 at 4:23 PM
 To: user@spark.apache.org user@spark.apache.org
 Subject: Unable to use dynamicAllocation if spark.executor.instances is
 set in spark-defaults.conf

I've set up my cluster with a pre-calculated value for
 spark.executor.instances in spark-defaults.conf such that I can run a job
 and have it maximize the utilization of the cluster resources by default.
 However, if I want to run a job with dynamicAllocation (by passing -c
 spark.dynamicAllocation.enabled=true to spark-submit), I get this exception:

  Exception in thread "main" java.lang.IllegalArgumentException:
 Explicitly setting the number of executors is not compatible with
 spark.dynamicAllocation.enabled!
 at
 org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:192)
 at
 org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:59)
 at
 org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:54)
  …

  The exception makes sense, of course, but ideally I would like it to
 ignore what I've put in spark-defaults.conf for spark.executor.instances if
 I've enabled dynamicAllocation. The most annoying thing about this is that
 if I have spark.executor.instances present in spark-defaults.conf, I cannot
 figure out any way to spark-submit a job with
 spark.dynamicAllocation.enabled=true without getting this error. That is,
 even if I pass -c spark.executor.instances=0 -c
 spark.dynamicAllocation.enabled=true, I still get this error because the
 validation in ClientArguments.parseArgs() that's checking for this
 condition simply checks for the presence of spark.executor.instances rather
 than whether or not its value is > 0.

  Should the check be changed to allow spark.executor.instances to be set
 to 0 if spark.dynamicAllocation.enabled is true? That would be an OK
 compromise, but I'd really prefer to be able to enable dynamicAllocation
 simply by setting spark.dynamicAllocation.enabled=true rather than by also
 having to set spark.executor.instances to 0.


  Thanks,

 Jonathan





Unable to use dynamicAllocation if spark.executor.instances is set in spark-defaults.conf

2015-07-14 Thread Kelly, Jonathan
I've set up my cluster with a pre-calculated value for spark.executor.instances 
in spark-defaults.conf such that I can run a job and have it maximize the 
utilization of the cluster resources by default. However, if I want to run a 
job with dynamicAllocation (by passing -c spark.dynamicAllocation.enabled=true 
to spark-submit), I get this exception:

Exception in thread "main" java.lang.IllegalArgumentException: Explicitly 
setting the number of executors is not compatible with 
spark.dynamicAllocation.enabled!
at 
org.apache.spark.deploy.yarn.ClientArguments.parseArgs(ClientArguments.scala:192)
at org.apache.spark.deploy.yarn.ClientArguments.<init>(ClientArguments.scala:59)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:54)
...

The exception makes sense, of course, but ideally I would like it to ignore 
what I've put in spark-defaults.conf for spark.executor.instances if I've 
enabled dynamicAllocation. The most annoying thing about this is that if I have 
spark.executor.instances present in spark-defaults.conf, I cannot figure out 
any way to spark-submit a job with spark.dynamicAllocation.enabled=true without 
getting this error. That is, even if I pass -c spark.executor.instances=0 -c 
spark.dynamicAllocation.enabled=true, I still get this error because the 
validation in ClientArguments.parseArgs() that's checking for this condition 
simply checks for the presence of spark.executor.instances rather than whether 
or not its value is > 0.

Should the check be changed to allow spark.executor.instances to be set to 0 if 
spark.dynamicAllocation.enabled is true? That would be an OK compromise, but 
I'd really prefer to be able to enable dynamicAllocation simply by setting 
spark.dynamicAllocation.enabled=true rather than by also having to set 
spark.executor.instances to 0.

Thanks,
Jonathan