[ https://issues.apache.org/jira/browse/SPARK-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072925#comment-14072925 ]

Sandy Ryza edited comment on SPARK-2664 at 7/24/14 7:18 AM:
------------------------------------------------------------

I think the right behavior here is worth a little thought.  What's the mental 
model we expect the user to have about the relationship between properties 
specified through --conf and properties that get their own flag?  My first 
thought is: if we're OK with taking properties like master through --conf, is 
there a point (beyond compatibility) in having flags for these properties at 
all?

Flags that aren't Spark confs are there because they impact what happens before 
the SparkContext is created.  These fall into a couple of categories:
1. Flags that have no Spark conf equivalent, like --executor-cores 
2. Flags that have a direct Spark conf equivalent, like --executor-memory 
(spark.executor.memory)
3. Flags that impact a Spark conf, like --deploy-mode (which can mean we set 
spark.master to yarn-cluster); this kind of resolution is sketched below
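
To make category 3 concrete, here's a minimal Scala sketch of the kind of 
resolution --deploy-mode forces before the SparkContext exists.  This is not 
the actual SparkSubmit code; the object and method names are made up for 
illustration.

{code:scala}
// Hypothetical sketch: a category-3 flag (--deploy-mode) rewriting the
// effective spark.master before the SparkContext is created.
// Not Spark's real implementation; names are illustrative only.
object ResolveMasterSketch {
  def resolveMaster(master: Option[String], deployMode: Option[String]): Option[String] =
    (master, deployMode) match {
      // "--master yarn --deploy-mode cluster" ends up meaning yarn-cluster
      case (Some("yarn"), Some("cluster")) => Some("yarn-cluster")
      case (Some("yarn"), Some("client"))  => Some("yarn-client")
      // Other masters are left untouched by the flag
      case (m, _)                          => m
    }

  def main(args: Array[String]): Unit = {
    println(resolveMaster(Some("yarn"), Some("cluster")))  // Some(yarn-cluster)
    println(resolveMaster(Some("local[2]"), None))         // Some(local[2])
  }
}
{code}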

I think the two ways to look at it are (a rough sketch of both follows the 
list):
1. We're OK with taking properties that have related flags.  In the case of a 
property in the 2nd category, we have a policy on which one takes precedence.  
In the case of a property in the 3rd category, we have some (possibly complex) 
resolution logic.  This approach would be the most accepting, but requires the 
user to have a model of how these conflicts get resolved.
2. We're not OK with taking properties that have related flags.  --conf 
specifies a property that gets passed to the SparkContext and has no effect on 
anything that happens before it's created.  To save users from themselves, if 
someone passes spark.master or spark.app.name through --conf, we ignore it or 
throw an error.
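
As a rough illustration of the difference, here's a hedged Scala sketch; it is 
not Spark's real code, and handledByFlags, approach1, and approach2 are 
invented names.  Approach 1 resolves a conflict by precedence, while approach 2 
rejects --conf values for properties that already have a flag:

{code:scala}
// Hypothetical sketch contrasting the two approaches; not the actual
// spark-submit logic, and the property list is only an illustrative subset.
object ConflictPolicySketch {
  // Properties that also have a dedicated spark-submit flag (assumed subset).
  val handledByFlags = Set("spark.master", "spark.app.name", "spark.executor.memory")

  // Approach 1: accept the --conf value, but let the explicit flag win when both are set.
  def approach1(flagValue: Option[String], confValue: Option[String]): Option[String] =
    flagValue.orElse(confValue)

  // Approach 2: --conf may not set properties that have their own flag at all.
  def approach2(confs: Map[String, String]): Map[String, String] = {
    val conflicts = confs.keySet.intersect(handledByFlags)
    require(conflicts.isEmpty, s"Set these via their dedicated flags, not --conf: $conflicts")
    confs
  }

  def main(args: Array[String]): Unit = {
    println(approach1(Some("yarn-cluster"), Some("local[4]")))   // Some(yarn-cluster): flag wins
    println(approach2(Map("spark.eventLog.enabled" -> "true")))  // accepted: no dedicated flag
    // approach2(Map("spark.master" -> "local[4]"))              // would throw under approach 2
  }
}
{code}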

I'm a little more partial to approach 2 because I think the mental model is a 
little simpler.

Either way, we should probably enforce the same behavior when a config comes 
from the defaults file.

Lastly, how do we allow setting a default for one of these special flags?  
E.g., make it so that all jobs run on YARN or Mesos by default.  With approach 
1, this is relatively straightforward: we use the same logic we'd use on a 
property that comes in through --conf for making defaults take effect (see the 
layering sketch below).  We might need to add Spark properties for flags that 
don't already have them, like --executor-cores.  With approach 2, we'd need to 
add support in the defaults file or somewhere else for specifying flag defaults.
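
For the approach-1 route, here's a small sketch of the layering I have in mind 
(again hypothetical; effectiveValue and the precedence order are assumptions, 
not existing spark-submit behavior):

{code:scala}
// Hypothetical layering for a property like spark.master under approach 1:
// explicit flag > --conf > defaults file.  Not existing behavior.
object DefaultsLayeringSketch {
  def effectiveValue(fromDefaultsFile: Option[String],
                     fromConf: Option[String],
                     fromFlag: Option[String]): Option[String] =
    fromFlag.orElse(fromConf).orElse(fromDefaultsFile)

  def main(args: Array[String]): Unit = {
    // Only the defaults file sets spark.master, so all jobs run on YARN by default.
    println(effectiveValue(Some("yarn-cluster"), None, None))  // Some(yarn-cluster)
    // An explicit --master overrides both the defaults file and --conf.
    println(effectiveValue(Some("yarn-cluster"), Some("mesos://host:5050"), Some("local[4]")))
  }
}
{code}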



> Deal with `--conf` options in spark-submit that relate to flags
> ---------------------------------------------------------------
>
>                 Key: SPARK-2664
>                 URL: https://issues.apache.org/jira/browse/SPARK-2664
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Patrick Wendell
>            Assignee: Sandy Ryza
>            Priority: Blocker
>
> If someone sets a Spark conf that relates to an existing flag such as 
> `--master`, we should set it correctly, like we do with the defaults file. 
> Otherwise it can 
> have confusing semantics. I noticed this after merging it, otherwise I would 
> have mentioned it in the review.
> I think it's as simple as modifying loadDefaults to check the user-supplied 
> options also. We might change it to loadUserProperties since it's no longer 
> just the defaults file.


