[jira] [Commented] (SPARK-2652) Turning default configurations for PySpark
[ https://issues.apache.org/jira/browse/SPARK-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072909#comment-14072909 ] Apache Spark commented on SPARK-2652: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/1568 > Turning default configurations for PySpark > -- > > Key: SPARK-2652 > URL: https://issues.apache.org/jira/browse/SPARK-2652 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 1.0.0 >Reporter: Davies Liu >Assignee: Davies Liu > Labels: Configuration, Python > Fix For: 1.1.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > Some default configuration values do not make sense for PySpark; change > them to reasonable ones, such as spark.serializer and > spark.kryo.referenceTracking -- This message was sent by Atlassian JIRA (v6.2#6252)
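For context, these properties can already be overridden per application, for example in conf/spark-defaults.conf; the values below are illustrative only and are not necessarily the defaults chosen by the pull request:
{code}
# Illustrative overrides only; the actual PySpark-friendly defaults may differ.
spark.serializer                org.apache.spark.serializer.KryoSerializer
spark.kryo.referenceTracking    false
{code}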
[jira] [Resolved] (SPARK-2661) Unpersist last RDD in bagel iteration
[ https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2661. -- Resolution: Fixed > Unpersist last RDD in bagel iteration > - > > Key: SPARK-2661 > URL: https://issues.apache.org/jira/browse/SPARK-2661 > Project: Spark > Issue Type: Improvement >Reporter: Adrian Wang >Assignee: Adrian Wang > Fix For: 1.1.0 > > > In bagel iteration, we only depend on RDD[n] to get RDD[n+1], so we can > unpersist RDD[n-1] after we get RDD[n]. -- This message was sent by Atlassian JIRA (v6.2#6252)
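A minimal Scala sketch of the idea (not the actual Bagel change; it assumes a SparkContext named sc, e.g. in spark-shell): once the current iteration's RDD has been materialized, the RDD from an earlier iteration is never read again and can be unpersisted.
{code}
// Sketch only: unpersist the RDD from the previous iteration once the new one
// has been materialized, since each iteration depends only on its predecessor.
var prev: org.apache.spark.rdd.RDD[(Int, Int)] = null
var curr = sc.parallelize(1 to 1000).map(i => (i, 0)).cache()

for (i <- 1 to 10) {
  val next = curr.map { case (k, v) => (k, v + 1) }.cache()
  next.count()                        // force materialization of RDD[n+1]
  if (prev != null) prev.unpersist()  // RDD[n-1] is no longer needed
  prev = curr
  curr = next
}
{code}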
[jira] [Commented] (SPARK-2664) Deal with `--conf` options in spark-submit that relate to flags
[ https://issues.apache.org/jira/browse/SPARK-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072925#comment-14072925 ] Sandy Ryza commented on SPARK-2664: --- I think the right behavior here is worth a little thought. What's the mental model we expect the user to have about the relationship between properties specified through --conf and properties that get their own flag? My first thought is - if we're OK with taking properties like master through --conf, is there a point (beyond compatibility) in having flags for these properties at all? Flags that aren't Spark confs are there because they impact what happens before the SparkContext is created. These fall into a couple of categories: 1. Flags that have no Spark conf equivalent, like --executor-cores 2. Flags that have a direct Spark conf equivalent, like --executor-memory (spark.executor.memory) 3. Flags that impact a Spark conf, like --deploy-mode (which can mean we set spark.master to yarn-cluster) I think the two ways to look at it are: 1. We're OK with taking properties that have related flags. For a property in the 2nd category, we have a policy over which takes precedence. For a property in the 3rd category, we have some (possibly complex) resolution logic. This approach would be the most accepting, but requires the user to have a model of how these conflicts get resolved. 2. We're not OK with taking properties that have related flags. --conf specifies a property that gets passed to the SparkContext and has no effect on anything that happens before it's created. To save users from themselves, if someone passes spark.master or spark.app.name through --conf, we ignore it or throw an error. I'm a little more partial to approach 2 because I think the mental model is a little simpler. Either way, we should probably enforce the same behavior when a config comes from the defaults file. Lastly, how do we allow setting a default for one of these special flags? E.g. make it so that all jobs run on YARN or Mesos by default. With approach 1, this is relatively straightforward - we use the same logic we'd use on a property that comes in through --conf for making defaults take effect. We might need to add Spark properties for flags that don't have them already, like --executor-cores. With approach 2, we'd need to add support in the defaults file or somewhere else for specifying flag defaults. > Deal with `--conf` options in spark-submit that relate to flags > --- > > Key: SPARK-2664 > URL: https://issues.apache.org/jira/browse/SPARK-2664 > Project: Spark > Issue Type: Bug >Reporter: Patrick Wendell >Assignee: Sandy Ryza >Priority: Blocker > > If someone sets a Spark conf that relates to an existing flag (e.g. `--master`), we > should set it correctly like we do with the defaults file. Otherwise it can > have confusing semantics. I noticed this after merging it, otherwise I would > have mentioned it in the review. > I think it's as simple as modifying loadDefaults to check the user-supplied > options also. We might rename it to loadUserProperties since it's no longer > just the defaults file. -- This message was sent by Atlassian JIRA (v6.2#6252)
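A minimal Scala sketch of the precedence policy described in approach 1, with hypothetical helper names (this is not the actual SparkSubmit code): an explicit flag wins over --conf, which wins over the defaults file.
{code}
// Sketch only: resolve an effective value for a setting that can arrive via a
// dedicated flag, a --conf property, or the defaults file. Names are illustrative.
def resolve(flagValue: Option[String],
            confValue: Option[String],
            defaultsFileValue: Option[String]): Option[String] = {
  flagValue.orElse(confValue).orElse(defaultsFileValue)
}

// e.g. --master beats --conf spark.master=..., which beats spark-defaults.conf
val effectiveMaster = resolve(Some("yarn-cluster"), Some("local[4]"), Some("spark://host:7077"))
// effectiveMaster == Some("yarn-cluster")
{code}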
[jira] [Comment Edited] (SPARK-2664) Deal with `--conf` options in spark-submit that relate to flags
[ https://issues.apache.org/jira/browse/SPARK-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072925#comment-14072925 ] Sandy Ryza edited comment on SPARK-2664 at 7/24/14 7:18 AM: I think the right behavior here is worth a little thought. What's the mental model we expect the user to have about the relationship between properties specified through --conf and properties that get their own flag? My first thought is - if we're ok with taking properties like master through --conf, is there a point (beyond compatibility) in having flags for these properties at all? Flags that aren't Spark confs are there because they impact what happens before the SparkContext is created. These fall into a couple categories: 1. Flags that have no property Spark conf equivalent like --executor-cores 2. Flags that have a direct Spark conf equivalent like --executor-cores (spark.executor.memory) 3. Flags that impact a Spark conf like --deploy-mode (which can mean we set spark.master to yarn-cluster) I think the two ways to look at it are: 1. We're OK with taking properties that have related flags. In the case of a property in the 2nd category, we have a policy over which takes precedence. In the case of a property in the 3rd category, we have some (possibly complex) resolution logic. This approach would be the most accepting, but requires the user to have a model of how these conflicts get resolved. 2. We're not OK with taking properties that have related flags. --conf specifies property that gets passed to the SparkContext and has no effect on anything that happens before it's created. To save users from themselves, if someone passes spark.master or spark.app.name through --conf, we ignore it or throw an error. I'm a little more partial to approach 2 because I think the mental model is a little simpler. Either way, we should probably enforce the same behavior when a config comes from the defaults file. Lastly, how do we allow setting a default for one of these special flags? E.g. make it so that all jobs run on YARN or Mesos by default. With approach 1, this is relatively straightforward - we use the same logic we'd use on a property that comes in through --conf for making defaults take effect. We might need to add spark properties for flags that don't have them already like --executor-cores. With approach 2, we'd need to add support in the defaults file or somewhere else for specifying flag defaults. was (Author: sandyr): I think the right behavior here is worth a little thought. What's the mental model we expect the user to have about the relationship between properties specified through --conf and properties that get their own flag? My first thought is - if we're ok with taking properties like master through --conf, is there a point (beyond compatibility) in having flags for these properties at all? Flags that aren't conf are there because they impact what happens before the SparkContext is created. These fall into a couple categories: 1. Flags that have no property Spark conf equivalent like --executor-cores 2. Flags that have a direct Spark conf equivalent like --executor-cores (spark.executor.memory) 3. Flags that impact a Spark conf like --deploy-mode (which can mean we set spark.master to yarn-cluster) I think the two ways to look at it are: 1. We're OK with taking properties that have related flags. In the case of a property in the 2nd category, we have a policy over which takes precedence. 
In the case of a property in the 3rd category, we have some (possibly complex) resolution logic. This approach would be the most accepting, but requires the user to have a model of how these conflicts get resolved. 2. We're not OK with taking properties that have related flags. --conf specifies property that gets passed to the SparkContext and has no effect on anything that happens before it's created. To save users from themselves, if someone passes spark.master or spark.app.name through --conf, we ignore it or throw an error. I'm a little more partial to approach 2 because I think the mental model is a little simpler. Either way, we should probably enforce the same behavior when a config comes from the defaults file. Lastly, how do we allow setting a default for one of these special flags? E.g. make it so that all jobs run on YARN or Mesos by default. With approach 1, this is relatively straightforward - we use the same logic we'd use on a property that comes in through --conf for making defaults take effect. We might need to add spark properties for flags that don't have them already like --executor-cores. With approach 2, we'd need to add support in the defaults file or somewhere else for specifying flag defaults. > Deal with `--conf` options in spark-submit that relate to flags >
[jira] [Created] (SPARK-2665) Add EqualNS support for HiveQL
Cheng Hao created SPARK-2665: Summary: Add EqualNS support for HiveQL Key: SPARK-2665 URL: https://issues.apache.org/jira/browse/SPARK-2665 Project: Spark Issue Type: New Feature Components: SQL Reporter: Cheng Hao Hive supports the operator "<=>", which returns the same result as the EQUAL (=) operator for non-null operands, but returns TRUE if both operands are NULL and FALSE if only one of them is NULL. -- This message was sent by Atlassian JIRA (v6.2#6252)
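A small Scala sketch of the null-safe equality semantics described above (illustrative only, not the Catalyst implementation):
{code}
// Null-safe equality ("<=>"): equal when both values are present and equal, or both are null.
def equalNullSafe(a: Option[Any], b: Option[Any]): Boolean = (a, b) match {
  case (None, None)          => true   // both NULL => TRUE
  case (None, _) | (_, None) => false  // exactly one NULL => FALSE
  case (Some(x), Some(y))    => x == y // neither NULL => behaves like "="
}

// equalNullSafe(Some(1), Some(1)) == true
// equalNullSafe(None, None)       == true
// equalNullSafe(Some(1), None)    == false
{code}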
[jira] [Commented] (SPARK-2665) Add EqualNS support for HiveQL
[ https://issues.apache.org/jira/browse/SPARK-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072977#comment-14072977 ] Apache Spark commented on SPARK-2665: - User 'chenghao-intel' has created a pull request for this issue: https://github.com/apache/spark/pull/1570 > Add EqualNS support for HiveQL > -- > > Key: SPARK-2665 > URL: https://issues.apache.org/jira/browse/SPARK-2665 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Cheng Hao > > Hive supports the operator "<=>", which returns the same result as the EQUAL (=) > operator for non-null operands, but returns TRUE if both operands are NULL and FALSE if > only one of them is NULL. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2414) Remove jquery
[ https://issues.apache.org/jira/browse/SPARK-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2414: --- Assignee: (was: Reynold Xin) > Remove jquery > - > > Key: SPARK-2414 > URL: https://issues.apache.org/jira/browse/SPARK-2414 > Project: Spark > Issue Type: Improvement > Components: Web UI >Reporter: Reynold Xin >Priority: Minor > Labels: starter > > SPARK-2384 introduces jquery for tooltip display. We can probably just create > a very simple javascript for tooltip instead of pulling in jquery. > https://github.com/apache/spark/pull/1314 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2414) Remove jquery
[ https://issues.apache.org/jira/browse/SPARK-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-2414: --- Labels: starter (was: ) > Remove jquery > - > > Key: SPARK-2414 > URL: https://issues.apache.org/jira/browse/SPARK-2414 > Project: Spark > Issue Type: Improvement > Components: Web UI >Reporter: Reynold Xin >Priority: Minor > Labels: starter > > SPARK-2384 introduces jquery for tooltip display. We can probably just create > a very simple javascript for tooltip instead of pulling in jquery. > https://github.com/apache/spark/pull/1314 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1405) parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
[ https://issues.apache.org/jira/browse/SPARK-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073013#comment-14073013 ] lukovnikov commented on SPARK-1405: --- @Isaac, I think it's at https://github.com/yinxusen/spark/blob/lda/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala > parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib > - > > Key: SPARK-1405 > URL: https://issues.apache.org/jira/browse/SPARK-1405 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.1.0 >Reporter: Xusen Yin >Assignee: Xusen Yin > Labels: features > Fix For: 0.9.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > Latent Dirichlet Allocation (a.k.a. LDA) is a topic model which extracts > topics from text corpus. Different with current machine learning algorithms > in MLlib, instead of using optimization algorithms such as gradient desent, > LDA uses expectation algorithms such as Gibbs sampling. > In this PR, I prepare a LDA implementation based on Gibbs sampling, with a > wholeTextFiles API (solved yet), a word segmentation (import from Lucene), > and a Gibbs sampling core. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-1405) parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
[ https://issues.apache.org/jira/browse/SPARK-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073013#comment-14073013 ] lukovnikov edited comment on SPARK-1405 at 7/24/14 9:10 AM: @Isaac, I think it's at https://github.com/yinxusen/spark/blob/lda/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala and here (https://github.com/apache/spark/pull/476/files) for the other changed files as well was (Author: lukovnikov): @Isaac, I think it's at https://github.com/yinxusen/spark/blob/lda/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala > parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib > - > > Key: SPARK-1405 > URL: https://issues.apache.org/jira/browse/SPARK-1405 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.1.0 >Reporter: Xusen Yin >Assignee: Xusen Yin > Labels: features > Fix For: 0.9.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > Latent Dirichlet Allocation (a.k.a. LDA) is a topic model which extracts > topics from text corpus. Different with current machine learning algorithms > in MLlib, instead of using optimization algorithms such as gradient desent, > LDA uses expectation algorithms such as Gibbs sampling. > In this PR, I prepare a LDA implementation based on Gibbs sampling, with a > wholeTextFiles API (solved yet), a word segmentation (import from Lucene), > and a Gibbs sampling core. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1405) parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib
[ https://issues.apache.org/jira/browse/SPARK-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073020#comment-14073020 ] lukovnikov commented on SPARK-1405: --- btw, could this please be merged with the main? there are some conflicts > parallel Latent Dirichlet Allocation (LDA) atop of spark in MLlib > - > > Key: SPARK-1405 > URL: https://issues.apache.org/jira/browse/SPARK-1405 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.1.0 >Reporter: Xusen Yin >Assignee: Xusen Yin > Labels: features > Fix For: 0.9.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > Latent Dirichlet Allocation (a.k.a. LDA) is a topic model which extracts > topics from text corpus. Different with current machine learning algorithms > in MLlib, instead of using optimization algorithms such as gradient desent, > LDA uses expectation algorithms such as Gibbs sampling. > In this PR, I prepare a LDA implementation based on Gibbs sampling, with a > wholeTextFiles API (solved yet), a word segmentation (import from Lucene), > and a Gibbs sampling core. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073085#comment-14073085 ] Apache Spark commented on SPARK-2604: - User 'twinkle-sachdeva' has created a pull request for this issue: https://github.com/apache/spark/pull/1571 > Spark Application hangs on yarn in edge case scenario of executor memory > requirement > > > Key: SPARK-2604 > URL: https://issues.apache.org/jira/browse/SPARK-2604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Twinkle Sachdeva > > In a YARN environment, let's say: > MaxAM = maximum allocatable memory > ExecMem = executor memory > if (MaxAM > ExecMem && (MaxAM - ExecMem) > 384m) > then maximum resource validation fails w.r.t. executor memory, and the > application master gets launched, but when resources are allocated and validated again, > they are returned and the application appears to hang. > A typical use case is to ask for executor memory = the maximum allowed memory as > per the YARN config -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073086#comment-14073086 ] Twinkle Sachdeva commented on SPARK-2604: - Please review the pull request: https://github.com/apache/spark/pull/1571 > Spark Application hangs on yarn in edge case scenario of executor memory > requirement > > > Key: SPARK-2604 > URL: https://issues.apache.org/jira/browse/SPARK-2604 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Twinkle Sachdeva > > In a YARN environment, let's say: > MaxAM = maximum allocatable memory > ExecMem = executor memory > if (MaxAM > ExecMem && (MaxAM - ExecMem) > 384m) > then maximum resource validation fails w.r.t. executor memory, and the > application master gets launched, but when resources are allocated and validated again, > they are returned and the application appears to hang. > A typical use case is to ask for executor memory = the maximum allowed memory as > per the YARN config -- This message was sent by Atlassian JIRA (v6.2#6252)
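A rough Scala sketch of the kind of up-front check that would avoid the hang (illustrative only; 384 MB is assumed here as the default YARN memory overhead, and the actual fix is in the pull request above):
{code}
// Sketch only: validate requested executor memory against YARN's maximum
// allocatable container memory *including* the memory overhead, and fail fast
// instead of hanging while waiting for containers that can never be granted.
val memoryOverheadMb = 384  // assumed default overhead
def validateExecutorMemory(requestedMb: Int, yarnMaxAllocMb: Int): Unit = {
  val totalMb = requestedMb + memoryOverheadMb
  require(totalMb <= yarnMaxAllocMb,
    s"Executor memory ($requestedMb MB) + overhead ($memoryOverheadMb MB) = $totalMb MB " +
      s"exceeds the maximum YARN container allocation ($yarnMaxAllocMb MB)")
}
{code}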
[jira] [Commented] (SPARK-2575) SVMWithSGD throwing Input Validation failed
[ https://issues.apache.org/jira/browse/SPARK-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073111#comment-14073111 ] navanee commented on SPARK-2575: spark SVM supports multinomial or binomial classification? > SVMWithSGD throwing Input Validation failed > > > Key: SPARK-2575 > URL: https://issues.apache.org/jira/browse/SPARK-2575 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.0.1 >Reporter: navanee > > SVMWithSGD throwing Input Validation failed while using Sparse Array as > Input. Though SVMWihtSGD accepts LibSVM format. > Exception trace : > org.apache.spark.SparkException: Input validation failed. > at > org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:145) > at > org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:124) > at > org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:154) > at > org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:188) > at org.apache.spark.mllib.classification.SVMWithSGD.train(SVM.scala) > at com.xurmo.ai.hades.classification.algo.Svm.train(Svm.java:143) > at > com.xurmo.ai.hades.classification.algo.SimpleSVMTest.generateModelFile(SimpleSVMTest.java:172) > at > com.xurmo.ai.hades.classification.algo.SimpleSVMTest.trainSampleDataTest(SimpleSVMTest.java:65) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at > org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) > at org.junit.runners.ParentRunner.run(ParentRunner.java:236) > at > org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) > at > org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) -- This message was sent by Atlassian JIRA (v6.2#6252)
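For reference, a minimal Scala sketch of training SVMWithSGD on LibSVM-format data, assuming a SparkContext named sc; note that MLlib's binary classifiers validate that labels are 0.0 or 1.0, which is a common cause of "Input validation failed" (path and iteration count below are placeholders):
{code}
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.util.MLUtils

// Labels in the input must be 0.0 or 1.0 for SVMWithSGD, otherwise input
// validation fails. Path and iteration count are placeholders.
val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")
val model = SVMWithSGD.train(data, 100) // 100 iterations
{code}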
[jira] [Created] (SPARK-2666) when task is FetchFailed cancel running tasks of failedStage
Lianhui Wang created SPARK-2666: --- Summary: when task is FetchFailed cancel running tasks of failedStage Key: SPARK-2666 URL: https://issues.apache.org/jira/browse/SPARK-2666 Project: Spark Issue Type: Bug Reporter: Lianhui Wang In DAGScheduler's handleTaskCompletion, when the reason for a failed task is FetchFailed, cancel the running tasks of the failed stage before adding failedStage to the failedStages queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2576) slave node throws NoClassDefFoundError $line11.$read$ when executing a Spark QL query on HDFS CSV file
[ https://issues.apache.org/jira/browse/SPARK-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073114#comment-14073114 ] Prashant Sharma commented on SPARK-2576: Looking at it. > slave node throws NoClassDefFoundError $line11.$read$ when executing a Spark > QL query on HDFS CSV file > -- > > Key: SPARK-2576 > URL: https://issues.apache.org/jira/browse/SPARK-2576 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.0.1 > Environment: One Mesos 0.19 master without zookeeper and 4 mesos > slaves. > JDK 1.7.51 and Scala 2.10.4 on all nodes. > HDFS from CDH5.0.3 > Spark version: I tried both with the pre-built CDH5 spark package available > from http://spark.apache.org/downloads.html and by packaging spark with sbt > 0.13.2, JDK 1.7.51 and scala 2.10.4 as explained here > http://mesosphere.io/learn/run-spark-on-mesos/ > All nodes are running Debian 3.2.51-1 x86_64 GNU/Linux and have >Reporter: Svend Vanderveken >Assignee: Yin Huai >Priority: Blocker > > Execution of SQL query against HDFS systematically throws a class not found > exception on slave nodes when executing . > (this was originally reported on the user list: > http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-spark-sql-error-java-lang-NoClassDefFoundError-Could-not-initialize-class-line11-read-tc10135.html) > Sample code (ran from spark-shell): > {code} > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.createSchemaRDD > case class Car(timestamp: Long, objectid: String, isGreen: Boolean) > // I get the same error when pointing to the folder > "hdfs://vm28:8020/test/cardata" > val data = sc.textFile("hdfs://vm28:8020/test/cardata/part-0") > val cars = data.map(_.split(",")).map ( ar => Car(ar(0).toLong, ar(1), > ar(2).toBoolean)) > cars.registerAsTable("mcars") > val allgreens = sqlContext.sql("SELECT objectid from mcars where isGreen = > true") > allgreens.collect.take(10).foreach(println) > {code} > Stack trace on the slave nodes: > {code} > I0716 13:01:16.215158 13631 exec.cpp:131] Version: 0.19.0 > I0716 13:01:16.219285 13656 exec.cpp:205] Executor registered on slave > 20140714-142853-485682442-5050-25487-2 > 14/07/16 13:01:16 INFO MesosExecutorBackend: Registered with Mesos as > executor ID 20140714-142853-485682442-5050-25487-2 > 14/07/16 13:01:16 INFO SecurityManager: Changing view acls to: > mesos,mnubohadoop > 14/07/16 13:01:16 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(mesos, > mnubohadoop) > 14/07/16 13:01:17 INFO Slf4jLogger: Slf4jLogger started > 14/07/16 13:01:17 INFO Remoting: Starting remoting > 14/07/16 13:01:17 INFO Remoting: Remoting started; listening on addresses > :[akka.tcp://spark@vm23:38230] > 14/07/16 13:01:17 INFO Remoting: Remoting now listens on addresses: > [akka.tcp://spark@vm23:38230] > 14/07/16 13:01:17 INFO SparkEnv: Connecting to MapOutputTracker: > akka.tcp://spark@vm28:41632/user/MapOutputTracker > 14/07/16 13:01:17 INFO SparkEnv: Connecting to BlockManagerMaster: > akka.tcp://spark@vm28:41632/user/BlockManagerMaster > 14/07/16 13:01:17 INFO DiskBlockManager: Created local directory at > /tmp/spark-local-20140716130117-8ea0 > 14/07/16 13:01:17 INFO MemoryStore: MemoryStore started with capacity 294.9 > MB. 
> 14/07/16 13:01:17 INFO ConnectionManager: Bound socket to port 44501 with id > = ConnectionManagerId(vm23-hulk-priv.mtl.mnubo.com,44501) > 14/07/16 13:01:17 INFO BlockManagerMaster: Trying to register BlockManager > 14/07/16 13:01:17 INFO BlockManagerMaster: Registered BlockManager > 14/07/16 13:01:17 INFO HttpFileServer: HTTP File server directory is > /tmp/spark-ccf6f36c-2541-4a25-8fe4-bb4ba00ee633 > 14/07/16 13:01:17 INFO HttpServer: Starting HTTP Server > 14/07/16 13:01:18 INFO Executor: Using REPL class URI: http://vm28:33973 > 14/07/16 13:01:18 INFO Executor: Running task ID 2 > 14/07/16 13:01:18 INFO HttpBroadcast: Started reading broadcast variable 0 > 14/07/16 13:01:18 INFO MemoryStore: ensureFreeSpace(125590) called with > curMem=0, maxMem=309225062 > 14/07/16 13:01:18 INFO MemoryStore: Block broadcast_0 stored as values to > memory (estimated size 122.6 KB, free 294.8 MB) > 14/07/16 13:01:18 INFO HttpBroadcast: Reading broadcast variable 0 took > 0.294602722 s > 14/07/16 13:01:19 INFO HadoopRDD: Input split: > hdfs://vm28:8020/test/cardata/part-0:23960450+23960451 > I0716 13:01:19.905113 13657 exec.cpp:378] Executor asked to shutdown > 14/07/16 13:01:20 ERROR Executor: Exception in task ID 2 > java.lang.NoClassDefFoundError: $line11/$read$ > at $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(:19) > at
[jira] [Commented] (SPARK-2666) when task is FetchFailed cancel running tasks of failedStage
[ https://issues.apache.org/jira/browse/SPARK-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073116#comment-14073116 ] Apache Spark commented on SPARK-2666: - User 'lianhuiwang' has created a pull request for this issue: https://github.com/apache/spark/pull/1572 > when task is FetchFailed cancel running tasks of failedStage > > > Key: SPARK-2666 > URL: https://issues.apache.org/jira/browse/SPARK-2666 > Project: Spark > Issue Type: Bug >Reporter: Lianhui Wang > > in DAGScheduler's handleTaskCompletion,when reason of failed task is > FetchFailed, cancel running tasks of failedStage before add failedStage to > failedStages queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2456) Scheduler refactoring
[ https://issues.apache.org/jira/browse/SPARK-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073122#comment-14073122 ] Nan Zhu commented on SPARK-2456: maybe it's also related: https://github.com/apache/spark/pull/637 > Scheduler refactoring > - > > Key: SPARK-2456 > URL: https://issues.apache.org/jira/browse/SPARK-2456 > Project: Spark > Issue Type: Improvement >Reporter: Reynold Xin >Assignee: Reynold Xin > > This is an umbrella ticket to track scheduler refactoring. We want to clearly > define semantics and responsibilities of each component, and define explicit > public interfaces for them so it is easier to understand and to contribute > (also less buggy). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2667) getCallSiteInfo doesn't take into account that graphx is part of spark.
[ https://issues.apache.org/jira/browse/SPARK-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrian Budau updated SPARK-2667: Description: getCallSiteInfo from org.apache.spark.util.Utils uses a regex pattern to match when a function is part of spark or not. At the moment this does not include GraphX > getCallSiteInfo doesn't take into account that graphx is part of spark. > --- > > Key: SPARK-2667 > URL: https://issues.apache.org/jira/browse/SPARK-2667 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 1.0.0 > Environment: Mac Os X, although its on all versions >Reporter: Adrian Budau >Priority: Trivial > > getCallSiteInfo from org.apache.spark.util.Utils uses a regex pattern to > match when a function is part of spark or not. At the moment this does not > include GraphX -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2667) getCallSiteInfo doesn't take into account that graphx is part of spark.
Adrian Budau created SPARK-2667: --- Summary: getCallSiteInfo doesn't take into account that graphx is part of spark. Key: SPARK-2667 URL: https://issues.apache.org/jira/browse/SPARK-2667 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.0.0 Environment: Mac Os X, although its on all versions Reporter: Adrian Budau Priority: Trivial -- This message was sent by Atlassian JIRA (v6.2#6252)
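For illustration, the check in Utils is a regex of roughly the following shape (this is a sketch, not the exact pattern in Utils.scala); the fix amounts to making it also match org.apache.spark.graphx classes:
{code}
// Sketch only: a Spark-internal-class matcher that also accepts GraphX classes,
// so frames inside graphx are skipped when computing the user call site.
val SPARK_CLASS_REGEX = """^org\.apache\.spark(\.api\.java)?(\.util)?(\.rdd)?(\.graphx)?\.[A-Z]""".r

def isSparkClass(className: String): Boolean =
  SPARK_CLASS_REGEX.findFirstIn(className).isDefined

// isSparkClass("org.apache.spark.graphx.Graph") == true
// isSparkClass("com.example.MyApp")             == false
{code}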
[jira] [Created] (SPARK-2668) Support log4j log to yarn container log directory
Peng Zhang created SPARK-2668: - Summary: Support log4j log to yarn container log directory Key: SPARK-2668 URL: https://issues.apache.org/jira/browse/SPARK-2668 Project: Spark Issue Type: Improvement Components: YARN Reporter: Peng Zhang Fix For: 1.0.0 Assign the value of the YARN container log directory to the Java opt "spark.yarn.log.dir", so that a user-defined log4j.properties can reference this value and write logs to the YARN container log directory. Otherwise, a user-defined file appender will log to the CWD, and the files will neither be displayed in the YARN UI nor aggregated to the HDFS log directory after the job finishes. Example reference in a user-defined log4j.properties: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073199#comment-14073199 ] Apache Spark commented on SPARK-2668: - User 'renozhang' has created a pull request for this issue: https://github.com/apache/spark/pull/1573 > Support log4j log to yarn container log directory > - > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Peng Zhang > Fix For: 1.0.0 > > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file append will log to CWD, and files will not be > displayed on YARN UI,and either cannot be aggregated to HDFS log directory > after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2150) Provide direct link to finished application UI in yarn resource manager UI
[ https://issues.apache.org/jira/browse/SPARK-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-2150: - Assignee: Rahul Singhal > Provide direct link to finished application UI in yarn resource manager UI > -- > > Key: SPARK-2150 > URL: https://issues.apache.org/jira/browse/SPARK-2150 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.0.0 >Reporter: Rahul Singhal >Assignee: Rahul Singhal >Priority: Minor > Fix For: 1.1.0 > > > Currently the link provided as the tracking URL for a finished > application in the YARN resource manager UI points to the Spark history server home > page. We should provide a direct link to the application UI so that the user > does not have to figure out the correspondence between the YARN application ID > and the link on the Spark history server home page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2150) Provide direct link to finished application UI in yarn resource manager UI
[ https://issues.apache.org/jira/browse/SPARK-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-2150. -- Resolution: Fixed Fix Version/s: 1.1.0 Target Version/s: 1.1.0 > Provide direct link to finished application UI in yarn resource manager UI > -- > > Key: SPARK-2150 > URL: https://issues.apache.org/jira/browse/SPARK-2150 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.0.0 >Reporter: Rahul Singhal >Priority: Minor > Fix For: 1.1.0 > > > Currently the link provided as the tracking URL for a finished > application in the YARN resource manager UI points to the Spark history server home > page. We should provide a direct link to the application UI so that the user > does not have to figure out the correspondence between the YARN application ID > and the link on the Spark history server home page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1112) When spark.akka.frameSize > 10, task results bigger than 10MiB block execution
[ https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073244#comment-14073244 ] DjvuLee commented on SPARK-1112: Has anyone tested this on version 0.9.2? I found it also fails there, while v1.0.1 and v1.1.0 are OK. > When spark.akka.frameSize > 10, task results bigger than 10MiB block execution > -- > > Key: SPARK-1112 > URL: https://issues.apache.org/jira/browse/SPARK-1112 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 0.9.0, 1.0.0 >Reporter: Guillaume Pitel >Assignee: Xiangrui Meng >Priority: Blocker > Fix For: 0.9.2 > > > When I set the spark.akka.frameSize to something over 10, the messages sent > from the executors to the driver completely block the execution if the > message is bigger than 10MiB and smaller than the frameSize (if it's above > the frameSize, it's ok) > Workaround is to set the spark.akka.frameSize to 10. In this case, since > 0.8.1, the blockManager deals with the data to be sent. It seems slower than > akka direct message though. > The configuration seems to be correctly read (see actorSystemConfig.txt), so > I don't see where the 10MiB could come from -- This message was sent by Atlassian JIRA (v6.2#6252)
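For reference, the workaround mentioned in the description can be applied programmatically (a sketch only; the same property can also be set in spark-defaults.conf):
{code}
import org.apache.spark.SparkConf

// Keep the Akka frame size at the 10 MB default so large task results go
// through the block manager instead of a single oversized Akka message.
val conf = new SparkConf().set("spark.akka.frameSize", "10")
{code}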
[jira] [Commented] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073246#comment-14073246 ] Thomas Graves commented on SPARK-2668: -- Sorry I don't follow what you are saying here. spark on yarn uses the yarn approved logging directories and aggregation works fine. Perhaps YARN is misconfigured? > Support log4j log to yarn container log directory > - > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Peng Zhang > Fix For: 1.0.0 > > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file append will log to CWD, and files will not be > displayed on YARN UI,and either cannot be aggregated to HDFS log directory > after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073278#comment-14073278 ] Peng Zhang commented on SPARK-2668: --- [~tgraves] Original log works fine, and log will be written to yarn container log directory and named as "stderr". But when I want to define my own log4j configuration, for example using RollingAppender to avoid log file too big, especially for spark Streaming(7 x 24 hours), I should can't specify the base directory for log. So adding "spark.yarn.log.dir" will help for reference in log4j.properties, like the example in description. Otherwise, log files will be located in container's working directory. > Support log4j log to yarn container log directory > - > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Peng Zhang > Fix For: 1.0.0 > > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file append will log to CWD, and files will not be > displayed on YARN UI,and either cannot be aggregated to HDFS log directory > after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073278#comment-14073278 ] Peng Zhang edited comment on SPARK-2668 at 7/24/14 3:12 PM: [~tgraves] Original log works fine, and log will be written to yarn container log directory and named as "stderr". But when I want to define my own log4j configuration, for example using RollingAppender to avoid log file too big, especially for spark Streaming(7 x 24 hours), I should specify the base directory for log. So adding "spark.yarn.log.dir" will help for reference in log4j.properties, like the example in description. Otherwise, log files will be located in container's working directory. was (Author: peng.zhang): [~tgraves] Original log works fine, and log will be written to yarn container log directory and named as "stderr". But when I want to define my own log4j configuration, for example using RollingAppender to avoid log file too big, especially for spark Streaming(7 x 24 hours), I should can't specify the base directory for log. So adding "spark.yarn.log.dir" will help for reference in log4j.properties, like the example in description. Otherwise, log files will be located in container's working directory. > Support log4j log to yarn container log directory > - > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Peng Zhang > Fix For: 1.0.0 > > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file append will log to CWD, and files will not be > displayed on YARN UI,and either cannot be aggregated to HDFS log directory > after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073310#comment-14073310 ] Thomas Graves commented on SPARK-2668: -- Oh, I see you just want a variable to reference from the log4j config. I understand the use case and really YARN should solve this for you. There are jira out there to support long running tasks on yarn, the one for logs is: https://issues.apache.org/jira/browse/YARN-1104 This might be ok for short term workaround for that since its just reading and not allowing user to set it. I need to look at it a bit closer. > Support log4j log to yarn container log directory > - > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Peng Zhang > Fix For: 1.0.0 > > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file append will log to CWD, and files will not be > displayed on YARN UI,and either cannot be aggregated to HDFS log directory > after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2575) SVMWithSGD throwing Input Validation failed
[ https://issues.apache.org/jira/browse/SPARK-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073329#comment-14073329 ] Xiangrui Meng commented on SPARK-2575: -- [~dbtsai] sent a PR for multinomial logistic regression: https://github.com/apache/spark/pull/1379 Btw, is your problem solved? > SVMWithSGD throwing Input Validation failed > > > Key: SPARK-2575 > URL: https://issues.apache.org/jira/browse/SPARK-2575 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.0.1 >Reporter: navanee > > SVMWithSGD throwing Input Validation failed while using Sparse Array as > Input. Though SVMWihtSGD accepts LibSVM format. > Exception trace : > org.apache.spark.SparkException: Input validation failed. > at > org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:145) > at > org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:124) > at > org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:154) > at > org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:188) > at org.apache.spark.mllib.classification.SVMWithSGD.train(SVM.scala) > at com.xurmo.ai.hades.classification.algo.Svm.train(Svm.java:143) > at > com.xurmo.ai.hades.classification.algo.SimpleSVMTest.generateModelFile(SimpleSVMTest.java:172) > at > com.xurmo.ai.hades.classification.algo.SimpleSVMTest.trainSampleDataTest(SimpleSVMTest.java:65) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) > at > org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) > at org.junit.runners.ParentRunner.run(ParentRunner.java:236) > at > org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) > at > org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390) > at > org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2669) Hadoop configuration is not localised when submitting job in yarn-cluster mode
Maxim Ivanov created SPARK-2669: --- Summary: Hadoop configuration is not localised when submitting job in yarn-cluster mode Key: SPARK-2669 URL: https://issues.apache.org/jira/browse/SPARK-2669 Project: Spark Issue Type: Bug Reporter: Maxim Ivanov I'd like to propose a fix for a problem where the Hadoop configuration is not localized when a job is submitted in yarn-cluster mode. Here is a description from the GitHub pull request https://github.com/apache/spark/pull/1574 This patch fixes a problem where, when the Spark driver is run in a container managed by the YARN ResourceManager, it inherits its configuration from the NodeManager process, which can differ from the Hadoop configuration present on the client (submitting machine). The problem is most visible when the fs.defaultFS property differs between the two. Hadoop MR solves this by serializing the client's Hadoop configuration into job.xml in the application staging directory and then making the Application Master use it. That guarantees that, regardless of the execution nodes' configurations, all application containers use the same config, identical to the one on the client side. This patch uses a similar approach. YARN ClientBase serializes the configuration and adds it to the ClientDistributedCacheManager under the "job.xml" link name. ClientDistributedCacheManager then uses the Hadoop localizer to deliver it to whatever container is started by this application, including the one running the Spark driver. YARN ClientBase also adds a "SPARK_LOCAL_HADOOPCONF" env variable to the AM container request, which is then used by SparkHadoopUtil.newConfiguration to trigger the new behavior where the machine-wide Hadoop configuration is merged with the application-specific job.xml (exactly as it is done in Hadoop MR). SparkContext then follows the same approach, adding the SPARK_LOCAL_HADOOPCONF env variable to all spawned containers to make them use the client-side Hadoop configuration. Also, all references to "new Configuration()" that might be executed on the YARN cluster side are changed to use SparkHadoopUtil.get.conf. Please note that it fixes only core Spark, the part which I am comfortable testing and verifying. I didn't descend into the streaming/shark directories, so things might need to be changed there too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2669) Hadoop configuration is not localised when submitting job in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073338#comment-14073338 ] Apache Spark commented on SPARK-2669: - User 'redbaron' has created a pull request for this issue: https://github.com/apache/spark/pull/1574 > Hadoop configuration is not localised when submitting job in yarn-cluster mode > -- > > Key: SPARK-2669 > URL: https://issues.apache.org/jira/browse/SPARK-2669 > Project: Spark > Issue Type: Bug >Reporter: Maxim Ivanov > > I'd like to propose a fix for a problem when Hadoop configuration is not > localized when job is submitted in yarn-cluster mode. Here is a description > from github pull request https://github.com/apache/spark/pull/1574 > This patch fixes a problem when Spark driver is run in the container > managed by YARN ResourceManager it inherits configuration from a > NodeManager process, which can be different from the Hadoop > configuration present on the client (submitting machine). Problem is > most vivid when fs.defaultFS property differs between these two. > Hadoop MR solves it by serializing client's Hadoop configuration into > job.xml in application staging directory and then making Application > Master to use it. That guarantees that regardless of execution nodes > configurations all application containers use same config identical to > one on the client side. > This patch uses similar approach. YARN ClientBase serializes > configuration and adds it to ClientDistributedCacheManager under > "job.xml" link name. ClientDistributedCacheManager is then utilizes > Hadoop localizer to deliver it to whatever container is started by this > application, including the one running Spark driver. > YARN ClientBase also adds "SPARK_LOCAL_HADOOPCONF" env variable to AM > container request which is then used by SparkHadoopUtil.newConfiguration > to trigger new behavior when machine-wide hadoop configuration is merged > with application specific job.xml (exactly how it is done in Hadoop MR). > SparkContext is then follows same approach, adding > SPARK_LOCAL_HADOOPCONF env to all spawned containers to make them use > client-side Hadopo configuration. > Also all the references to "new Configuration()" which might be executed > on YARN cluster side are changed to use SparkHadoopUtil.get.conf > Please note that it fixes only core Spark, the part which I am > comfortable to test and verify the result. I didn't descend into > steaming/shark directories, so things might need to be changed there too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073369#comment-14073369 ] Peng Zhang commented on SPARK-2668: --- Yes, this is a common issue for long-running tasks on YARN. Our solution for Spark Streaming is to use a RollingAppender to keep only the latest 10 x 100M log files on disk. This helps when viewing logs through the YARN UI (a single file is not too big) and also avoids filling up the disk. Besides the file appender, we also send all log messages to a Scribe service which writes them to HDFS (using a log4j appender for Scribe). This helps with analysing all the logs generated during a run. > Support log4j log to yarn container log directory > - > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Peng Zhang > Fix For: 1.0.0 > > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file append will log to CWD, and files will not be > displayed on YARN UI,and either cannot be aggregated to HDFS log directory > after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
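A log4j.properties sketch of that setup, combining the ${spark.yarn.log.dir} reference proposed in this issue with a size-capped rolling appender (the appender name and size limits are illustrative):
{code}
# Illustrative only: roll the log inside the YARN container log directory,
# keeping at most 10 files of 100MB each.
log4j.rootLogger=INFO, rolling_file
log4j.appender.rolling_file=org.apache.log4j.RollingFileAppender
log4j.appender.rolling_file.File=${spark.yarn.log.dir}/spark.log
log4j.appender.rolling_file.MaxFileSize=100MB
log4j.appender.rolling_file.MaxBackupIndex=10
log4j.appender.rolling_file.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling_file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
{code}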
[jira] [Updated] (SPARK-1264) Documentation for setting heap sizes across all configurations
[ https://issues.apache.org/jira/browse/SPARK-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Davidson updated SPARK-1264: -- Assignee: (was: Aaron Davidson) > Documentation for setting heap sizes across all configurations > -- > > Key: SPARK-1264 > URL: https://issues.apache.org/jira/browse/SPARK-1264 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.0.0 >Reporter: Andrew Ash > > As a user, there are lots of places to configure heap sizes, and it takes a > bit of trial and error to figure out how to configure what you want. > We need some more clear documentation on how set these for the cross product > of Spark components (master, worker, executor, driver, shell) and deployment > modes (Standalone, YARN, Mesos, EC2?). > I'm happy to do the authoring if someone can help pull together the relevant > details. > Here's the best I've got so far: > {noformat} > # Standalone cluster > Master - SPARK_DAEMON_MEMORY - default: 512mb > Worker - SPARK_DAEMON_MEMORY vs SPARK_WORKER_MEMORY? - default: ? See > WorkerArguments.inferDefaultMemory() > Executor - spark.executor.memory > Driver - SPARK_DRIVER_MEMORY - default: 512mb > Shell - A pre-built driver so SPARK_DRIVER_MEMORY - default: 512mb > # EC2 cluster > Master - ? > Worker - ? > Executor - ? > Driver - ? > Shell - ? > # Mesos cluster > Master - SPARK_DAEMON_MEMORY > Worker - SPARK_DAEMON_MEMORY > Executor - SPARK_EXECUTOR_MEMORY > Driver - SPARK_DRIVER_MEMORY > Shell - A pre-built driver so SPARK_DRIVER_MEMORY > # YARN cluster > Master - SPARK_MASTER_MEMORY ? > Worker - SPARK_WORKER_MEMORY ? > Executor - SPARK_EXECUTOR_MEMORY > Driver - SPARK_DRIVER_MEMORY > Shell - A pre-built driver so SPARK_DRIVER_MEMORY > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2583) ConnectionManager cannot distinguish whether error occurred or not
[ https://issues.apache.org/jira/browse/SPARK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073419#comment-14073419 ] Kousuke Saruta commented on SPARK-2583: --- I have added some test cases to my PR for this issue. > ConnectionManager cannot distinguish whether error occurred or not > -- > > Key: SPARK-2583 > URL: https://issues.apache.org/jira/browse/SPARK-2583 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Critical > > ConnectionManager#handleMessage sent empty messages to another peer if some > error occurred or not in onReceiveCalback. > {code} > val ackMessage = if (onReceiveCallback != null) { > logDebug("Calling back") > onReceiveCallback(bufferMessage, connectionManagerId) > } else { > logDebug("Not calling back as callback is null") > None > } > if (ackMessage.isDefined) { > if (!ackMessage.get.isInstanceOf[BufferMessage]) { > logDebug("Response to " + bufferMessage + " is not a buffer > message, it is of type " > + ackMessage.get.getClass) > } else if (!ackMessage.get.asInstanceOf[BufferMessage].hasAckId) { > logDebug("Response to " + bufferMessage + " does not have ack > id set") > ackMessage.get.asInstanceOf[BufferMessage].ackId = > bufferMessage.id > } > } > // We have no way to tell peer whether error occurred or not > sendMessage(connectionManagerId, ackMessage.getOrElse { > Message.createBufferMessage(bufferMessage.id) > }) > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2479) Comparing floating-point numbers using relative error in UnitTests
[ https://issues.apache.org/jira/browse/SPARK-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073430#comment-14073430 ] Apache Spark commented on SPARK-2479: - User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/1576 > Comparing floating-point numbers using relative error in UnitTests > -- > > Key: SPARK-2479 > URL: https://issues.apache.org/jira/browse/SPARK-2479 > Project: Spark > Issue Type: Improvement >Reporter: DB Tsai >Assignee: DB Tsai > > Floating point math is not exact, and most floating-point numbers end up > being slightly imprecise due to rounding errors. Simple values like 0.1 > cannot be precisely represented using binary floating point numbers, and the > limited precision of floating point numbers means that slight changes in the > order of operations or the precision of intermediates can change the result. > That means that comparing two floats to see if they are equal is usually not > what we want. As long as this imprecision stays small, it can usually be > ignored. > See the following famous article for detail. > http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/ > For example: > float a = 0.15 + 0.15 > float b = 0.1 + 0.2 > if(a == b) // can be false! > if(a >= b) // can also be false! -- This message was sent by Atlassian JIRA (v6.2#6252)
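A minimal Scala sketch of the relative-error comparison described above (the helper name and epsilon are illustrative, not the final test API):
{code}
// Compare two doubles using relative error, falling back to absolute error near zero.
def approxEqual(a: Double, b: Double, eps: Double = 1e-8): Boolean = {
  val diff = math.abs(a - b)
  if (a == b) true
  else if (math.max(math.abs(a), math.abs(b)) < eps) diff < eps
  else diff / math.max(math.abs(a), math.abs(b)) < eps
}

assert(approxEqual(0.15 + 0.15, 0.1 + 0.2))   // passes even though == may not
{code}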
[jira] [Updated] (SPARK-2538) External aggregation in Python
[ https://issues.apache.org/jira/browse/SPARK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2538: - Priority: Critical (was: Major) > External aggregation in Python > -- > > Key: SPARK-2538 > URL: https://issues.apache.org/jira/browse/SPARK-2538 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 1.0.0, 1.0.1 >Reporter: Davies Liu >Assignee: Davies Liu >Priority: Critical > Labels: pyspark > Fix For: 1.0.0, 1.0.1 > > Original Estimate: 72h > Remaining Estimate: 72h > > For huge reduce tasks, users will get an out-of-memory exception when all the > data cannot fit in memory. > It should spill some of the data to disk and then merge it back, just > like what we do in Scala. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2670) FetchFailedException should be thrown when local fetch has failed
Kousuke Saruta created SPARK-2670: - Summary: FetchFailedException should be thrown when local fetch has failed Key: SPARK-2670 URL: https://issues.apache.org/jira/browse/SPARK-2670 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Kousuke Saruta In BasicBlockFetchIterator, when a remote fetch has failed, a FetchResult whose size is -1 is put into results. {code} case None => { logError("Could not get block(s) from " + cmId) for ((blockId, size) <- req.blocks) { results.put(new FetchResult(blockId, -1, null)) } {code} The size -1 means the fetch failed, and BlockStoreShuffleFetcher#unpackBlock throws FetchFailedException so that we can retry. But when a local fetch fails, no failed FetchResult is put into results, so we cannot retry for that block. -- This message was sent by Atlassian JIRA (v6.2#6252)
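A hedged sketch of the fix direction described above: mark a failed local fetch the same way a failed remote fetch is marked, so unpackBlock can raise FetchFailedException. Method names below are approximations of the 1.0 code, not the actual patch:
{code}
// Inside getLocalBlocks() (names approximate): record local failures with size -1,
// matching the remote-failure path, so downstream code can trigger a retry.
for (id <- localBlockIds) {
  try {
    val iter = getLocalFromDisk(id, serializer).get
    results.put(new FetchResult(id, 0, () => iter))
  } catch {
    case e: Exception =>
      logError("Could not get block(s) locally", e)
      results.put(new FetchResult(id, -1, null))   // size -1 => FetchFailedException upstream
  }
}
{code}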
[jira] [Updated] (SPARK-2619) Configurable file-mode for spark/bin folder in the .deb package.
[ https://issues.apache.org/jira/browse/SPARK-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2619: --- Assignee: Christian Tzolov > Configurable file-mode for spark/bin folder in the .deb package. > - > > Key: SPARK-2619 > URL: https://issues.apache.org/jira/browse/SPARK-2619 > Project: Spark > Issue Type: Improvement > Components: Build, Deploy >Reporter: Christian Tzolov >Assignee: Christian Tzolov > > Currently the /bin folder in the .deb package is hardcoded to 744. So only > the root user (deb.user defaults to root) can run Spark jobs. > If we make the /bin file mode a configurable Maven property, then we can easily generate > a package with less restrictive execution rights. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2619) Configurable file-mode for spark/bin folder in the .deb package.
[ https://issues.apache.org/jira/browse/SPARK-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2619. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1531 [https://github.com/apache/spark/pull/1531] > Configurable file-mode for spark/bin folder in the .deb package. > - > > Key: SPARK-2619 > URL: https://issues.apache.org/jira/browse/SPARK-2619 > Project: Spark > Issue Type: Improvement > Components: Build, Deploy >Reporter: Christian Tzolov >Assignee: Christian Tzolov > Fix For: 1.1.0 > > > Currently the /bin folder in the .dep package is hardcoded to 744. So only > the Root user (deb.user defaults to root) can run Spark jobs. > If we make /bin filemode a configural maven property then we easily generate > a package with less restrictive execution rights. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2671) BlockObjectWriter should create parent directory when the directory doesn't exist
Kousuke Saruta created SPARK-2671: - Summary: BlockObjectWriter should create parent directory when the directory doesn't exist Key: SPARK-2671 URL: https://issues.apache.org/jira/browse/SPARK-2671 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Kousuke Saruta Priority: Minor BlockObjectWriter#open expects the parent directory to be present. {code} override def open(): BlockObjectWriter = { fos = new FileOutputStream(file, true) ts = new TimeTrackingOutputStream(fos) channel = fos.getChannel() lastValidPosition = initialPosition bs = compressStream(new BufferedOutputStream(ts, bufferSize)) objOut = serializer.newInstance().serializeStream(bs) initialized = true this } {code} Normally, the parent directory is created by DiskBlockManager#createLocalDirs, but just in case, BlockObjectWriter#open should check whether the directory exists and create it if it does not. A recoverable error should be recovered from. -- This message was sent by Atlassian JIRA (v6.2#6252)
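A minimal sketch of the proposed guard (not the actual patch): ensure the parent directory exists before opening the stream, otherwise fail with a clear error.
{code}
// Sketch only: create the parent directory if it is missing before opening the file.
override def open(): BlockObjectWriter = {
  val parent = file.getParentFile
  if (parent != null && !parent.exists() && !parent.mkdirs() && !parent.exists()) {
    throw new java.io.IOException("Failed to create directory " + parent)
  }
  fos = new FileOutputStream(file, true)
  ts = new TimeTrackingOutputStream(fos)
  channel = fos.getChannel()
  lastValidPosition = initialPosition
  bs = compressStream(new BufferedOutputStream(ts, bufferSize))
  objOut = serializer.newInstance().serializeStream(bs)
  initialized = true
  this
}
{code}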
[jira] [Updated] (SPARK-2603) Remove unnecessary toMap and toList in converting Java collections to Scala collections JsonRDD.scala
[ https://issues.apache.org/jira/browse/SPARK-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2603: Fix Version/s: 1.0.2 1.1.0 > Remove unnecessary toMap and toList in converting Java collections to Scala > collections JsonRDD.scala > - > > Key: SPARK-2603 > URL: https://issues.apache.org/jira/browse/SPARK-2603 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Minor > Fix For: 1.1.0, 1.0.2 > > > In JsonRDD.scalafy, we are using toMap/toList to convert a Java Map/List to a > Scala one. These two operations are pretty expensive because they read > elements from a Java Map/List and then load to a Scala Map/List. We can use > Scala wrappers to wrap those Java collections instead of using toMap/toList. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2603) Remove unnecessary toMap and toList in converting Java collections to Scala collections JsonRDD.scala
[ https://issues.apache.org/jira/browse/SPARK-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2603. - Resolution: Fixed > Remove unnecessary toMap and toList in converting Java collections to Scala > collections JsonRDD.scala > - > > Key: SPARK-2603 > URL: https://issues.apache.org/jira/browse/SPARK-2603 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Minor > > In JsonRDD.scalafy, we are using toMap/toList to convert a Java Map/List to a > Scala one. These two operations are pretty expensive because they read > elements from a Java Map/List and then load to a Scala Map/List. We can use > Scala wrappers to wrap those Java collections instead of using toMap/toList. -- This message was sent by Atlassian JIRA (v6.2#6252)
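For reference, the wrapper approach described in SPARK-2603 can be sketched like this (standalone Scala example, not the JsonRDD code itself):
{code}
import scala.collection.JavaConverters._

val javaMap: java.util.Map[String, Object] = new java.util.HashMap()
javaMap.put("key", "value")

// asScala.toMap copies every entry into a new immutable Scala Map:
val copied: Map[String, Object] = javaMap.asScala.toMap

// asScala alone just wraps the Java map, with no per-element copy:
val wrapped: scala.collection.mutable.Map[String, Object] = javaMap.asScala
{code}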
[jira] [Created] (SPARK-2672) support compressed file in wholeFile()
Davies Liu created SPARK-2672: - Summary: support compressed file in wholeFile() Key: SPARK-2672 URL: https://issues.apache.org/jira/browse/SPARK-2672 Project: Spark Issue Type: Improvement Components: PySpark, Spark Core Affects Versions: 1.0.0, 1.0.1 Reporter: Davies Liu Fix For: 1.1.0 wholeFile() cannot read compressed files; it should be able to, just like textFile(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2673) Improve Spark so that we can attach Debugger to Executors easily
Kousuke Saruta created SPARK-2673: - Summary: Improve Spark so that we can attach Debugger to Executors easily Key: SPARK-2673 URL: https://issues.apache.org/jira/browse/SPARK-2673 Project: Spark Issue Type: Improvement Reporter: Kousuke Saruta In the current implementation, it is difficult to attach a debugger to each Executor in the cluster, for the following reasons. 1) It's difficult for Executors running on the same machine to open a debug port, because we can only pass the same JVM options to all executors. 2) Even if each Executor on a machine could open a unique debug port, it's a bother to check the debug port of every executor. To solve these problems, I think the following 2 improvements are needed. 1) Enable each executor to open a unique debug port on a machine. 2) Extend the WebUI to show the debug port open in each executor. -- This message was sent by Atlassian JIRA (v6.2#6252)
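As a point of comparison, the only knob available today is a single JVM option string shared by all executors. A hedged sketch (whether address=0 actually yields a free ephemeral port depends on the JVM in use):
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("DebugExecutors")
  // One shared option string for every executor; a fixed port here collides when
  // two executors land on the same machine. address=0 asks the JVM for a free
  // port, but the chosen port then has to be dug out of each executor's stderr.
  .set("spark.executor.extraJavaOptions",
       "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=0")
{code}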
[jira] [Commented] (SPARK-2464) Twitter Receiver does not stop correctly when streamingContext.stop is called
[ https://issues.apache.org/jira/browse/SPARK-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073518#comment-14073518 ] Apache Spark commented on SPARK-2464: - User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/1577 > Twitter Receiver does not stop correctly when streamingContext.stop is called > - > > Key: SPARK-2464 > URL: https://issues.apache.org/jira/browse/SPARK-2464 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.0.0, 1.0.1 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1154) Spark fills up disk with app-* folders
[ https://issues.apache.org/jira/browse/SPARK-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073521#comment-14073521 ] Andrew Ash commented on SPARK-1154: --- For the record, this is Evan's PR that closed this ticket: https://github.com/apache/spark/pull/288 > Spark fills up disk with app-* folders > -- > > Key: SPARK-1154 > URL: https://issues.apache.org/jira/browse/SPARK-1154 > Project: Spark > Issue Type: Improvement > Components: Deploy >Reporter: Evan Chan >Assignee: Mingyu Kim >Priority: Critical > Labels: starter > Fix For: 1.0.0 > > > Current version of Spark fills up the disk with many app-* folders: > $ ls /var/lib/spark > app-20140210022347-0597 app-20140212173327-0627 app-20140218154110-0657 > app-20140225232537-0017 app-20140225233548-0047 > app-20140210022407-0598 app-20140212173347-0628 app-20140218154130-0658 > app-20140225232551-0018 app-20140225233556-0048 > app-20140210022427-0599 app-20140212173754-0629 app-20140218164232-0659 > app-20140225232611-0019 app-20140225233603-0049 > app-20140210022447-0600 app-20140212182235-0630 app-20140218165133-0660 > app-20140225232802-0020 app-20140225233610-0050 > app-20140210022508-0601 app-20140212182256-0631 app-20140218165148-0661 > app-20140225232822-0021 app-20140225233617-0051 > app-20140210022528-0602 app-2014021314-0632 app-20140218165225-0662 > app-20140225232940-0022 app-20140225233624-0052 > app-20140211024356-0603 app-20140213002026-0633 app-20140218165249-0663 > app-20140225233002-0023 app-20140225233631-0053 > app-20140211024417-0604 app-20140213154948-0634 app-20140218172030-0664 > app-20140225233056-0024 app-20140225233725-0054 > app-20140211024437-0605 app-20140213171810-0635 app-20140218193853-0665 > app-20140225233108-0025 app-20140225233731-0055 > app-20140211024457-0606 app-20140213193637-0636 app-20140218194442-0666 > app-20140225233124-0026 app-20140225233733-0056 > app-20140211024517-0607 app-20140214011513-0637 app-20140218194746-0667 > app-20140225233133-0027 app-20140225233734-0057 > app-20140211024538-0608 app-20140214012151-0638 app-20140218194822-0668 > app-20140225233147-0028 app-20140225233749-0058 > app-20140211193443-0609 app-20140214013134-0639 app-20140218212317-0669 > app-20140225233208-0029 app-20140225233759-0059 > app-20140211195210-0610 app-20140214013332-0640 app-20140225180142- > app-20140225233215-0030 app-20140225233809-0060 > app-20140211213935-0611 app-20140214013642-0641 app-20140225180411-0001 > app-20140225233224-0031 app-20140225233828-0061 > app-20140211214227-0612 app-20140214014246-0642 app-20140225180431-0002 > app-20140225233232-0032 app-20140225234719-0062 > app-20140211215317-0613 app-20140214014607-0643 app-20140225180452-0003 > app-20140225233239-0033 app-20140226032845-0063 > app-20140211224601-0614 app-20140214184943-0644 app-20140225180512-0004 > app-20140225233320-0034 app-20140226033004-0064 > app-20140212022206-0615 app-20140214185118-0645 app-20140225180533-0005 > app-20140225233328-0035 app-20140226033119-0065 > app-2014021206-0616 app-20140214185851-0646 app-20140225180553-0006 > app-20140225233354-0036 app-2014022604-0066 > app-20140212022246-0617 app-20140214222856-0647 app-20140225181115-0007 > app-20140225233402-0037 app-20140226033354-0067 > app-20140212043704-0618 app-20140214231312-0648 app-20140225181244-0008 > app-20140225233409-0038 app-20140226033538-0068 > app-20140212043724-0619 app-20140214231434-0649 app-20140225182051-0009 > app-20140225233416-0039 app-20140226033826-0069 > app-20140212043745-0620 
app-20140214231542-0650 app-20140225183009-0010 > app-20140225233426-0040 app-20140226034002-0070 > app-20140212044016-0621 app-20140214231616-0651 app-20140225184133-0011 > app-20140225233432-0041 app-20140226034053-0071 > app-20140212044203-0622 app-20140214233016-0652 app-20140225184318-0012 > app-20140225233439-0042 app-20140226034234-0072 > app-20140212044224-0623 app-20140214233037-0653 app-20140225184709-0013 > app-20140225233447-0043 app-20140226034426-0073 > app-20140212045034-0624 app-20140218153242-0654 app-20140225184844-0014 > app-20140225233526-0044 app-20140226034447-0074 > app-20140212045119-0625 app-20140218153341-0655 app-20140225190051-0015 > app-20140225233534-0045 > app-20140212173310-0626 app-20140218153442-0656 app-20140225232516-0016 > app-20140225233540-0046 > This problem is particularly bad if you have a whole bunch of fast jobs. > Also what makes the problem worse is that any jars for jobs is downloaded > into the app-* folder, so that fills up the disk particularly fast. > I would like to propose two things: > 1) Spark should have a cl
[jira] [Commented] (SPARK-2670) FetchFailedException should be thrown when local fetch has failed
[ https://issues.apache.org/jira/browse/SPARK-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073539#comment-14073539 ] Apache Spark commented on SPARK-2670: - User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/1578 > FetchFailedException should be thrown when local fetch has failed > - > > Key: SPARK-2670 > URL: https://issues.apache.org/jira/browse/SPARK-2670 > Project: Spark > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Kousuke Saruta > > In BasicBlockFetchIterator, when remote fetch has failed, then FetchResult > which size is -1 is set to results. > {code} >case None => { > logError("Could not get block(s) from " + cmId) > for ((blockId, size) <- req.blocks) { > results.put(new FetchResult(blockId, -1, null)) > } > {code} > The size -1 means fetch fail and BlockStoreShuffleFetcher#unpackBlock throws > FetchFailedException so that we can retry. > But, when local fetch has failed, the failed FetchResult is not set. > So, we cannot retry for the FetchResult. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2674) Add date and time types to inferSchema
Hossein Falaki created SPARK-2674: - Summary: Add date and time types to inferSchema Key: SPARK-2674 URL: https://issues.apache.org/jira/browse/SPARK-2674 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 1.0.0 Reporter: Hossein Falaki When I try inferSchema in PySpark on an RDD of dictionaries that contain a datetime.datetime object, I get the following exception: {code} Object of type java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=false,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Etc/UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=?,YEAR=2014,MONTH=3,WEEK_OF_YEAR=?,WEEK_OF_MONTH=?,DAY_OF_MONTH=22,DAY_OF_YEAR=?,DAY_OF_WEEK=?,DAY_OF_WEEK_IN_MONTH=?,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=4,ZONE_OFFSET=?,DST_OFFSET=?] cannot be used {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2675) LiveListenerBus should set higher capacity for its event queue
Zongheng Yang created SPARK-2675: Summary: LiveListenerBus should set higher capacity for its event queue Key: SPARK-2675 URL: https://issues.apache.org/jira/browse/SPARK-2675 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0, 1.0.1 Reporter: Zongheng Yang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2676) CLONE - LiveListenerBus should set higher capacity for its event queue
Zongheng Yang created SPARK-2676: Summary: CLONE - LiveListenerBus should set higher capacity for its event queue Key: SPARK-2676 URL: https://issues.apache.org/jira/browse/SPARK-2676 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.0.0, 1.0.1 Reporter: Zongheng Yang Assignee: Zongheng Yang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Closed] (SPARK-2676) CLONE - LiveListenerBus should set higher capacity for its event queue
[ https://issues.apache.org/jira/browse/SPARK-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zongheng Yang closed SPARK-2676. Resolution: Duplicate > CLONE - LiveListenerBus should set higher capacity for its event queue > --- > > Key: SPARK-2676 > URL: https://issues.apache.org/jira/browse/SPARK-2676 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0, 1.0.1 >Reporter: Zongheng Yang >Assignee: Zongheng Yang > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2675) LiveListenerBus should set higher capacity for its event queue
[ https://issues.apache.org/jira/browse/SPARK-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073546#comment-14073546 ] Apache Spark commented on SPARK-2675: - User 'concretevitamin' has created a pull request for this issue: https://github.com/apache/spark/pull/1579 > LiveListenerBus should set higher capacity for its event queue > --- > > Key: SPARK-2675 > URL: https://issues.apache.org/jira/browse/SPARK-2675 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0, 1.0.1 >Reporter: Zongheng Yang >Assignee: Zongheng Yang > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2671) BlockObjectWriter should create parent directory when the directory doesn't exist
[ https://issues.apache.org/jira/browse/SPARK-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073548#comment-14073548 ] Apache Spark commented on SPARK-2671: - User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/1580 > BlockObjectWriter should create parent directory when the directory doesn't > exist > - > > Key: SPARK-2671 > URL: https://issues.apache.org/jira/browse/SPARK-2671 > Project: Spark > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Kousuke Saruta >Priority: Minor > > BlockObjectWriter#open expects parent directory is present. > {code} > override def open(): BlockObjectWriter = { > fos = new FileOutputStream(file, true) > ts = new TimeTrackingOutputStream(fos) > channel = fos.getChannel() > lastValidPosition = initialPosition > bs = compressStream(new BufferedOutputStream(ts, bufferSize)) > objOut = serializer.newInstance().serializeStream(bs) > initialized = true > this > } > {code} > Normally, the parent directory is created by DiskBlockManager#createLocalDirs > but, just in case, BlockObjectWriter#open should check the existence of the > directory and create the directory if the directory does not exist. > I think, recoverable error should be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2037) yarn client mode doesn't support spark.yarn.max.executor.failures
[ https://issues.apache.org/jira/browse/SPARK-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-2037. -- Resolution: Fixed Fix Version/s: 1.1.0 > yarn client mode doesn't support spark.yarn.max.executor.failures > - > > Key: SPARK-2037 > URL: https://issues.apache.org/jira/browse/SPARK-2037 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.0.0 >Reporter: Thomas Graves >Assignee: Guoqiang Li > Fix For: 1.1.0 > > > yarn client mode doesn't support the config spark.yarn.max.executor.failures. > We should investigate if we need it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2674) Add date and time types to inferSchema
[ https://issues.apache.org/jira/browse/SPARK-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2674: Assignee: Davies Liu (was: Michael Armbrust) > Add date and time types to inferSchema > -- > > Key: SPARK-2674 > URL: https://issues.apache.org/jira/browse/SPARK-2674 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.0.0 >Reporter: Hossein Falaki >Assignee: Davies Liu > > When I try inferSchema in PySpark on an RDD of dictionary that contains a > datatime.datetime object, I get the following exception: > {code} > Object of type > java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=false,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Etc/UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=?,YEAR=2014,MONTH=3,WEEK_OF_YEAR=?,WEEK_OF_MONTH=?,DAY_OF_MONTH=22,DAY_OF_YEAR=?,DAY_OF_WEEK=?,DAY_OF_WEEK_IN_MONTH=?,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=4,ZONE_OFFSET=?,DST_OFFSET=?] > cannot be used > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (SPARK-2674) Add date and time types to inferSchema
[ https://issues.apache.org/jira/browse/SPARK-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reassigned SPARK-2674: --- Assignee: Michael Armbrust > Add date and time types to inferSchema > -- > > Key: SPARK-2674 > URL: https://issues.apache.org/jira/browse/SPARK-2674 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.0.0 >Reporter: Hossein Falaki >Assignee: Michael Armbrust > > When I try inferSchema in PySpark on an RDD of dictionary that contains a > datatime.datetime object, I get the following exception: > {code} > Object of type > java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=false,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Etc/UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=?,YEAR=2014,MONTH=3,WEEK_OF_YEAR=?,WEEK_OF_MONTH=?,DAY_OF_MONTH=22,DAY_OF_YEAR=?,DAY_OF_WEEK=?,DAY_OF_WEEK_IN_MONTH=?,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=4,ZONE_OFFSET=?,DST_OFFSET=?] > cannot be used > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2674) Add date and time types to inferSchema
[ https://issues.apache.org/jira/browse/SPARK-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2674: Target Version/s: 1.1.0 > Add date and time types to inferSchema > -- > > Key: SPARK-2674 > URL: https://issues.apache.org/jira/browse/SPARK-2674 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.0.0 >Reporter: Hossein Falaki >Assignee: Davies Liu > > When I try inferSchema in PySpark on an RDD of dictionary that contains a > datatime.datetime object, I get the following exception: > {code} > Object of type > java.util.GregorianCalendar[time=?,areFieldsSet=false,areAllFieldsSet=false,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Etc/UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=?,YEAR=2014,MONTH=3,WEEK_OF_YEAR=?,WEEK_OF_MONTH=?,DAY_OF_MONTH=22,DAY_OF_YEAR=?,DAY_OF_WEEK=?,DAY_OF_WEEK_IN_MONTH=?,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=4,ZONE_OFFSET=?,DST_OFFSET=?] > cannot be used > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2387) Remove the stage barrier for better resource utilization
[ https://issues.apache.org/jira/browse/SPARK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073597#comment-14073597 ] Kay Ousterhout commented on SPARK-2387: --- Have you done experiments to understand how much this improves performance? With Hadoop MapReduce, I've seen this behavior significantly worsen performance for a few reasons. Ultimately, the problem is that the "reduce" stage (the one the depends on the shuffle map stage) can't finish until all of the map tasks finish. So, if there is a long map straggler, the reduce tasks can't finish anyway -- and now many more slots are hogged by the early reducers, preventing other jobs from making progress. Even worse, if reduce tasks are launched before all map tasks have been launched, the early reducers keep map tasks from being launched, but can end up stopped waiting for input from mappers that haven't completed yet. (Although I didn't look closely at PR1328 so I'm not sure if the latter issue was explicitly prevented in your pull request.) As a result of the above issues, I've heard that many places (I think Facebook, for example) disable this behavior in Hadoop. So, we should make sure this will not hurt performance (and will significantly help!) before adding a lot of complexity to Spark in order to implement it. > Remove the stage barrier for better resource utilization > > > Key: SPARK-2387 > URL: https://issues.apache.org/jira/browse/SPARK-2387 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Reporter: Rui Li > > DAGScheduler divides a Spark job into multiple stages according to RDD > dependencies. Whenever there’s a shuffle dependency, DAGScheduler creates a > shuffle map stage on the map side, and another stage depending on that stage. > Currently, the downstream stage cannot start until all its depended stages > have finished. This barrier between stages leads to idle slots when waiting > for the last few upstream tasks to finish and thus wasting cluster resources. > Therefore we propose to remove the barrier and pre-start the reduce stage > once there're free slots. This can achieve better resource utilization and > improve the overall job performance, especially when there're lots of > executors granted to the application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (SPARK-2387) Remove the stage barrier for better resource utilization
[ https://issues.apache.org/jira/browse/SPARK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073597#comment-14073597 ] Kay Ousterhout edited comment on SPARK-2387 at 7/24/14 8:23 PM: Have you done experiments to understand how much this improves performance? With Hadoop MapReduce, I've seen this behavior significantly worsen performance for a few reasons. Ultimately, the problem is that the "reduce" stage (the one the depends on the shuffle map stage) can't finish until all of the map tasks finish. So, if there is a long map straggler, the reduce tasks can't finish anyway -- and now many more slots are hogged by the early reducers, preventing other jobs from making progress. Even worse, if reduce tasks are launched before all map tasks have been launched, the early reducers keep map tasks from being launched, but can end up stopped waiting for input from mappers that haven't completed yet. (Although it looks like your pull request is done in a way that tries to avoid the latter problem.) As a result of the above issues, I've heard that many places (I think Facebook, for example) disable this behavior in Hadoop. So, we should make sure this will not hurt performance (and will significantly help!) before adding a lot of complexity to Spark in order to implement it. was (Author: kayousterhout): Have you done experiments to understand how much this improves performance? With Hadoop MapReduce, I've seen this behavior significantly worsen performance for a few reasons. Ultimately, the problem is that the "reduce" stage (the one the depends on the shuffle map stage) can't finish until all of the map tasks finish. So, if there is a long map straggler, the reduce tasks can't finish anyway -- and now many more slots are hogged by the early reducers, preventing other jobs from making progress. Even worse, if reduce tasks are launched before all map tasks have been launched, the early reducers keep map tasks from being launched, but can end up stopped waiting for input from mappers that haven't completed yet. (Although I didn't look closely at PR1328 so I'm not sure if the latter issue was explicitly prevented in your pull request.) As a result of the above issues, I've heard that many places (I think Facebook, for example) disable this behavior in Hadoop. So, we should make sure this will not hurt performance (and will significantly help!) before adding a lot of complexity to Spark in order to implement it. > Remove the stage barrier for better resource utilization > > > Key: SPARK-2387 > URL: https://issues.apache.org/jira/browse/SPARK-2387 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Reporter: Rui Li > > DAGScheduler divides a Spark job into multiple stages according to RDD > dependencies. Whenever there’s a shuffle dependency, DAGScheduler creates a > shuffle map stage on the map side, and another stage depending on that stage. > Currently, the downstream stage cannot start until all its depended stages > have finished. This barrier between stages leads to idle slots when waiting > for the last few upstream tasks to finish and thus wasting cluster resources. > Therefore we propose to remove the barrier and pre-start the reduce stage > once there're free slots. This can achieve better resource utilization and > improve the overall job performance, especially when there're lots of > executors granted to the application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2250) show stage RDDs in UI
[ https://issues.apache.org/jira/browse/SPARK-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2250: --- Assignee: Neville Li > show stage RDDs in UI > - > > Key: SPARK-2250 > URL: https://issues.apache.org/jira/browse/SPARK-2250 > Project: Spark > Issue Type: New Feature > Components: Web UI >Reporter: Neville Li >Assignee: Neville Li >Priority: Minor > Fix For: 1.1.0 > > > RDDs of each stage can be accessed from StageInfo#rddInfos. It'd be nice to > show them in the UI. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2250) show stage RDDs in UI
[ https://issues.apache.org/jira/browse/SPARK-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2250. Resolution: Fixed Fix Version/s: 1.1.0 Issue resolved by pull request 1188 [https://github.com/apache/spark/pull/1188] > show stage RDDs in UI > - > > Key: SPARK-2250 > URL: https://issues.apache.org/jira/browse/SPARK-2250 > Project: Spark > Issue Type: New Feature > Components: Web UI >Reporter: Neville Li >Priority: Minor > Fix For: 1.1.0 > > > RDDs of each stage can be accessed from StageInfo#rddInfos. It'd be nice to > show them in the UI. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2677) BasicBlockFetchIterator#next can be wait forever
Kousuke Saruta created SPARK-2677: - Summary: BasicBlockFetchIterator#next can be wait forever Key: SPARK-2677 URL: https://issues.apache.org/jira/browse/SPARK-2677 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Kousuke Saruta Priority: Critical In BasicBlockFetchIterator#next, it waits for a fetch result on results.take(). {code} override def next(): (BlockId, Option[Iterator[Any]]) = { resultsGotten += 1 val startFetchWait = System.currentTimeMillis() val result = results.take() val stopFetchWait = System.currentTimeMillis() _fetchWaitTime += (stopFetchWait - startFetchWait) if (! result.failed) bytesInFlight -= result.size while (!fetchRequests.isEmpty && (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) { sendRequest(fetchRequests.dequeue()) } (result.blockId, if (result.failed) None else Some(result.deserialize())) } {code} But results is implemented as a LinkedBlockingQueue, so if a remote executor hangs up, the fetching Executor waits forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2677) BasicBlockFetchIterator#next can wait forever
[ https://issues.apache.org/jira/browse/SPARK-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-2677: -- Summary: BasicBlockFetchIterator#next can wait forever (was: BasicBlockFetchIterator#next can be wait forever) > BasicBlockFetchIterator#next can wait forever > - > > Key: SPARK-2677 > URL: https://issues.apache.org/jira/browse/SPARK-2677 > Project: Spark > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Kousuke Saruta >Priority: Critical > > In BasicBlockFetchIterator#next, it waits fetch result on result.take. > {code} > override def next(): (BlockId, Option[Iterator[Any]]) = { > resultsGotten += 1 > val startFetchWait = System.currentTimeMillis() > val result = results.take() > val stopFetchWait = System.currentTimeMillis() > _fetchWaitTime += (stopFetchWait - startFetchWait) > if (! result.failed) bytesInFlight -= result.size > while (!fetchRequests.isEmpty && > (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= > maxBytesInFlight)) { > sendRequest(fetchRequests.dequeue()) > } > (result.blockId, if (result.failed) None else > Some(result.deserialize())) > } > {code} > But, results is implemented as LinkedBlockingQueue so if remote executor hang > up, fetching Executor waits forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
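One hedged way to address the unbounded wait, sketched below, is to replace results.take() with a poll that times out and surfaces a failure. The timeout value and error handling are illustrative, not the actual fix:
{code}
import java.util.concurrent.TimeUnit

// Sketch: bound the wait instead of blocking forever on results.take().
val fetchTimeoutSeconds = 60L   // illustrative value; would come from a config in practice
val result = results.poll(fetchTimeoutSeconds, TimeUnit.SECONDS)
if (result == null) {
  throw new org.apache.spark.SparkException(
    "Timed out waiting for a fetch result after " + fetchTimeoutSeconds + " seconds")
}
{code}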
[jira] [Commented] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing
[ https://issues.apache.org/jira/browse/SPARK-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073784#comment-14073784 ] koert kuipers commented on SPARK-1855: -- i think this makes sense. we have iterative queries that should be very quick. in case of machine failure i am ok if query fails, we will simply repeat. so i do not care about checkpoint to disk in this situation. but i do care about checkpoint to memory to cut my dependencies, which means they get garbage collected and cached rdds get cleaned up. > Provide memory-and-local-disk RDD checkpointing > --- > > Key: SPARK-1855 > URL: https://issues.apache.org/jira/browse/SPARK-1855 > Project: Spark > Issue Type: New Feature > Components: MLlib, Spark Core >Affects Versions: 1.0.0 >Reporter: Xiangrui Meng > > Checkpointing is used to cut long lineage while maintaining fault tolerance. > The current implementation is HDFS-based. Using the BlockRDD we can create > in-memory-and-local-disk (with replication) checkpoints that are not as > reliable as HDFS-based solution but faster. > It can help applications that require many iterations. -- This message was sent by Atlassian JIRA (v6.2#6252)
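For contrast, the HDFS-based API the proposal would complement looks like this today (standard RDD API; the checkpoint directory and iteration counts are example values only):
{code}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("CheckpointExample"))
sc.setCheckpointDir("hdfs://namenode:8020/checkpoints")   // today's reliable-but-slow path

var rdd = sc.parallelize(1 to 1000000).map(_ * 2)
for (i <- 1 to 50) {
  rdd = rdd.map(_ + 1).persist()
  if (i % 10 == 0) {
    rdd.checkpoint()   // cuts lineage; the proposal would let this data stay in memory/local disk
    rdd.count()        // force materialization so the checkpoint actually happens
  }
}
{code}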
[jira] [Created] (SPARK-2678) `Spark-submit` overrides user application options
Cheng Lian created SPARK-2678: - Summary: `Spark-submit` overrides user application options Key: SPARK-2678 URL: https://issues.apache.org/jira/browse/SPARK-2678 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 1.0.1, 1.0.2 Reporter: Cheng Lian Priority: Minor Here is an example: {code} ./bin/spark-submit --class Foo some.jar --help {code} Since {{--help}} appears behind the primary resource (i.e. {{some.jar}}), it should be recognized as a user application option. But it's actually overridden by {{spark-submit}} and will show the {{spark-submit}} help message. When directly invoking {{spark-submit}}, the constraints here are: # Options before the primary resource should be recognized as {{spark-submit}} options # Options after the primary resource should be recognized as user application options The tricky part is how to handle scripts like {{spark-shell}} that delegate to {{spark-submit}}. These scripts allow users to specify both {{spark-submit}} options like {{--master}} and user-defined application options together. For example, say we'd like to write a new script {{start-thriftserver.sh}} to start the Hive Thrift server; basically we may do this: {code} $SPARK_HOME/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal $@ {code} Then users may call this script like: {code} ./sbin/start-thriftserver.sh --master spark://some-host:7077 --hiveconf key=value {code} Notice that all options are captured by {{$@}}. If we put it before {{spark-internal}}, they are all recognized as {{spark-submit}} options, thus {{--hiveconf}} won't be passed to {{HiveThriftServer2}}; if we put it after {{spark-internal}}, they *should* all be recognized as options of {{HiveThriftServer2}}, but because of this bug, {{--master}} is still recognized as a {{spark-submit}} option and happens to lead to the right behavior. Although currently all scripts using {{spark-submit}} work correctly, we should still fix this bug, because it causes option name collisions between {{spark-submit}} and user applications, and every time we add a new option to {{spark-submit}}, some existing user applications may break. However, solving this bug may cause some incompatible changes. The suggested solution here is using {{--}} as the separator between {{spark-submit}} options and user application options. For the Hive Thrift server example above, users should call it this way: {code} ./sbin/start-thriftserver.sh --master spark://some-host:7077 -- --hiveconf key=value {code} And {{SparkSubmitArguments}} should be responsible for splitting the two sets of options and passing them on correctly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2678) `Spark-submit` overrides user application options
[ https://issues.apache.org/jira/browse/SPARK-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-2678: -- Priority: Major (was: Minor) > `Spark-submit` overrides user application options > - > > Key: SPARK-2678 > URL: https://issues.apache.org/jira/browse/SPARK-2678 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.1, 1.0.2 >Reporter: Cheng Lian > > Here is an example: > {code} > ./bin/spark-submit --class Foo some.jar --help > {code} > SInce {{--help}} appears behind the primary resource (i.e. {{some.jar}}), it > should be recognized as a user application option. But it's actually > overriden by {{spark-submit}} and will show {{spark-submit}} help message. > When directly invoking {{spark-submit}}, the constraints here are: > # Options before primary resource should be recognized as {{spark-submit}} > options > # Options after primary resource should be recognized as user application > options > The tricky part is how to handle scripts like {{spark-shell}} that delegate > {{spark-submit}}. These scripts allow users specify both {{spark-submit}} > options like {{--master}} and user defined application options together. For > example, say we'd like to write a new script {{start-thriftserver.sh}} to > start the Hive Thrift server, basically we may do this: > {code} > $SPARK_HOME/bin/spark-submit --class > org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal $@ > {code} > Then user may call this script like: > {code} > ./sbin/start-thriftserver.sh --master spark://some-host:7077 --hiveconf > key=value > {code} > Notice that all options are captured by {{$@}}. If we put it before > {{spark-internal}}, they are all recognized as {{spark-submit}} options, thus > {{--hiveconf}} won't be passed to {{HiveThriftServer2}}; if we put it after > {{spark-internal}}, they *should* all be recognized as options of > {{HiveThriftServer2}}, but because of this bug, {{--master}} is still > recognized as {{spark-submit}} option and leads to the right behavior. > Although currently all scripts using {{spark-submit}} work correctly, we > still should fix this bug, because it causes option name collision between > {{spark-submit}} and user application, and every time we add a new option to > {{spark-submit}}, some existing user applications may break. However, solving > this bug may cause some incompatible changes. > The suggested solution here is using {{--}} as separator of {{spark-submit}} > options and user application options. For the Hive Thrift server example > above, user should call it in this way: > {code} > ./sbin/start-thriftserver.sh --master spark://some-host:7077 -- --hiveconf > key=value > {code} > And {{SparkSubmitArguments}} should be responsible for splitting two sets of > options and pass them correctly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2679) Ser/De for Double to enable calling Java API from python in MLlib
Doris Xin created SPARK-2679: Summary: Ser/De for Double to enable calling Java API from python in MLlib Key: SPARK-2679 URL: https://issues.apache.org/jira/browse/SPARK-2679 Project: Spark Issue Type: Sub-task Reporter: Doris Xin In order to enable Java/Scala APIs to be reused in the Python implementation of RandomRDD and Correlations, we need a set of ser/de for the type Double in _common.py and PythonMLLibAPI. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2679) Ser/De for Double to enable calling Java API from python in MLlib
[ https://issues.apache.org/jira/browse/SPARK-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073833#comment-14073833 ] Apache Spark commented on SPARK-2679: - User 'dorx' has created a pull request for this issue: https://github.com/apache/spark/pull/1581 > Ser/De for Double to enable calling Java API from python in MLlib > - > > Key: SPARK-2679 > URL: https://issues.apache.org/jira/browse/SPARK-2679 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Doris Xin > > In order to enable Java/Scala APIs to be reused in the Python implementation > of RandomRDD and Correlations, we need a set of ser/de for the type Double in > _common.py and PythonMLLibAPI. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2298) Show stage attempt in UI
[ https://issues.apache.org/jira/browse/SPARK-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2298: --- Priority: Critical (was: Major) > Show stage attempt in UI > > > Key: SPARK-2298 > URL: https://issues.apache.org/jira/browse/SPARK-2298 > Project: Spark > Issue Type: Improvement > Components: Web UI >Reporter: Reynold Xin >Assignee: Masayoshi TSUZUKI >Priority: Critical > Attachments: Screen Shot 2014-06-25 at 4.54.46 PM.png > > > We should add a column to the web ui to show stage attempt id. Then tasks > should be grouped by (stageId, stageAttempt) tuple. > When a stage is resubmitted (e.g. due to fetch failures), we should get a > different entry in the web ui and tasks for the resubmission go there. > See the attached screenshot for the confusing status quo. We currently show > the same stage entry twice, and then tasks appear in both. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2515) Hypothesis testing
[ https://issues.apache.org/jira/browse/SPARK-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073879#comment-14073879 ] Doris Xin commented on SPARK-2515: -- Here's the proposed API for chi-squared tests (lives in org.apache.spark.mllib.stat.Statistics): {code} def chiSquare(X: RDD[Vector], method: String = “pearson”): ChiSquareTestResult def chiSquare(x: RDD[Double], y: RDD[Double], method: String = “pearson”): ChiSquareTestResult {code} where ChiSquareTestResult <: TestResult looks like: {code} pValue: Double df: Array[Int] //normally a single but need to be more for anova statistic: Double ChiSquareSummary <: Summary {code} So a couple points of discussion: 1. Of the many variants of the chi-squared test, what methods in addition to "pearson" do we want to support (hopefully based on popular demand)? http://en.wikipedia.org/wiki/Chi-squared_test 2. What special fields should ChiSquareSummary have? > Hypothesis testing > -- > > Key: SPARK-2515 > URL: https://issues.apache.org/jira/browse/SPARK-2515 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Xiangrui Meng >Assignee: Doris Xin > > Support common statistical tests in Spark MLlib. -- This message was sent by Atlassian JIRA (v6.2#6252)
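Assuming the proposed signatures above, usage would look roughly like the following (hedged sketch only, since the API was still under discussion; sc is an existing SparkContext):
{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

val observations = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0, 3.0),
  Vectors.dense(4.0, 5.0, 6.0),
  Vectors.dense(7.0, 8.0, 9.0)))

// Proposed call; method name, default, and result fields as sketched in the comment above.
val result = Statistics.chiSquare(observations, method = "pearson")
println(s"p-value: ${result.pValue}, statistic: ${result.statistic}")
{code}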
[jira] [Resolved] (SPARK-2464) Twitter Receiver does not stop correctly when streamingContext.stop is called
[ https://issues.apache.org/jira/browse/SPARK-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-2464. -- Resolution: Fixed > Twitter Receiver does not stop correctly when streamingContext.stop is called > - > > Key: SPARK-2464 > URL: https://issues.apache.org/jira/browse/SPARK-2464 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.0.0, 1.0.1 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-2014) Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default
[ https://issues.apache.org/jira/browse/SPARK-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2014. -- Resolution: Fixed Fix Version/s: 1.1.0 > Make PySpark store RDDs in MEMORY_ONLY_SER with compression by default > -- > > Key: SPARK-2014 > URL: https://issues.apache.org/jira/browse/SPARK-2014 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: Matei Zaharia >Assignee: Prashant Sharma > Fix For: 1.1.0 > > > Since the data is serialized on the Python side, there's not much point in > keeping it as byte arrays in Java, or even in skipping compression. We should > make cache() in PySpark use MEMORY_ONLY_SER and turn on spark.rdd.compress > for it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1044) Default spark logs location in EC2 AMI leads to out-of-disk space pretty soon
[ https://issues.apache.org/jira/browse/SPARK-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073930#comment-14073930 ] Andrew Ash commented on SPARK-1044: --- Filling up the work dir could be alleviated by fixing https://issues.apache.org/jira/browse/SPARK-1860 so we could enable worker dir cleanup automatically. If we had automatic worker dir cleanup, would you still want to move the work directory to somewhere else? > Default spark logs location in EC2 AMI leads to out-of-disk space pretty soon > - > > Key: SPARK-1044 > URL: https://issues.apache.org/jira/browse/SPARK-1044 > Project: Spark > Issue Type: Improvement > Components: EC2 >Reporter: Tathagata Das >Priority: Minor > > The default log location is SPARK_HOME/work/ and this leads to disk space > running out pretty quickly. The spark-ec2 scripts should configure the > cluster to automatically set the logging directory to /mnt/spark-work/ or > something like that on the mounted disks. The SPARK_HOME/work may also be > symlinked to that directory to maintain the existing setup. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-786) Clean up old work directories in standalone worker
[ https://issues.apache.org/jira/browse/SPARK-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073932#comment-14073932 ] Andrew Ash commented on SPARK-786: -- Agreed. With SPARK-1860 we could re-enable the features from that PR by default and be good here (it was disabled after it had negative effects with long-running transactions). I think this ticket can be closed as a dupe of that one. > Clean up old work directories in standalone worker > -- > > Key: SPARK-786 > URL: https://issues.apache.org/jira/browse/SPARK-786 > Project: Spark > Issue Type: New Feature > Components: Deploy >Affects Versions: 0.7.2 >Reporter: Matei Zaharia > > We should add a setting to clean old work directories after X days. > Otherwise, the directory gets filled forever with shuffle files and logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-1044) Default spark logs location in EC2 AMI leads to out-of-disk space pretty soon
[ https://issues.apache.org/jira/browse/SPARK-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073938#comment-14073938 ] Allan Douglas R. de Oliveira commented on SPARK-1044: - I think it is still a good idea even if the automatic cleanup is implemented. One large job or many small jobs can fill many gigabytes before the cleanup can kick in. > Default spark logs location in EC2 AMI leads to out-of-disk space pretty soon > - > > Key: SPARK-1044 > URL: https://issues.apache.org/jira/browse/SPARK-1044 > Project: Spark > Issue Type: Improvement > Components: EC2 >Reporter: Tathagata Das >Priority: Minor > > The default log location is SPARK_HOME/work/ and this leads to disk space > running out pretty quickly. The spark-ec2 scripts should configure the > cluster to automatically set the logging directory to /mnt/spark-work/ or > something like that on the mounted disks. The SPARK_HOME/work may also be > symlinked to that directory to maintain the existing setup. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (SPARK-1030) unneeded file required when running pyspark program using yarn-client
[ https://issues.apache.org/jira/browse/SPARK-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-1030. --- Resolution: Fixed Fix Version/s: 1.0.0 Closing this now, since it was addressed as part of Spark 1.0's PySpark on YARN patches (including SPARK-1004). > unneeded file required when running pyspark program using yarn-client > - > > Key: SPARK-1030 > URL: https://issues.apache.org/jira/browse/SPARK-1030 > Project: Spark > Issue Type: Bug > Components: Deploy, PySpark, YARN >Affects Versions: 0.8.1 >Reporter: Diana Carroll >Assignee: Josh Rosen > Fix For: 1.0.0 > > > I can successfully run a pyspark program using the yarn-client master using > the following command: > {code} > SPARK_JAR=$SPARK_HOME/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop2.2.0.jar > \ > SPARK_YARN_APP_JAR=~/testdata.txt pyspark \ > test1.py > {code} > However, the SPARK_YARN_APP_JAR doesn't make any sense; it's a Python > program, and therefore there's no JAR. If I don't set the value, or if I set > the value to a non-existent files, Spark gives me an error message. > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling > None.org.apache.spark.api.java.JavaSparkContext. > : org.apache.spark.SparkException: env SPARK_YARN_APP_JAR is not set > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:46) > {code} > or > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling > None.org.apache.spark.api.java.JavaSparkContext. > : java.io.FileNotFoundException: File file:dummy.txt does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520) > {code} > My program is very simple: > {code} > from pyspark import SparkContext > def main(): > sc = SparkContext("yarn-client", "Simple App") > logData = > sc.textFile("hdfs://localhost/user/training/weblogs/2013-09-15.log") > numjpgs = logData.filter(lambda s: '.jpg' in s).count() > print "Number of JPG requests: " + str(numjpgs) > {code} > Although it reads the SPARK_YARN_APP_JAR file, it doesn't use the file at > all; I can point it at anything, as long as it's a valid, accessible file, > and it works the same. > Although there's an obvious workaround for this bug, it's high priority from > my perspective because I'm working on a course to teach people how to do > this, and it's really hard to explain why this variable is needed! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (SPARK-2680) Lower spark.shuffle.memoryFraction to 0.2 by default
Matei Zaharia created SPARK-2680: Summary: Lower spark.shuffle.memoryFraction to 0.2 by default Key: SPARK-2680 URL: https://issues.apache.org/jira/browse/SPARK-2680 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Matei Zaharia Priority: Minor Fix For: 1.1.0 Seems like it's good to be more conservative on this, in particular to try to fit all the data in the young gen. People can always increase it for performance. -- This message was sent by Atlassian JIRA (v6.2#6252)
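For users who want the previous behavior back after the default drops, it remains a one-line override (hedged example; 0.3 was the prior default this ticket proposes lowering):
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("ShuffleHeavyJob")
  .set("spark.shuffle.memoryFraction", "0.3")   // restore the old default for shuffle-bound jobs
{code}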
[jira] [Updated] (SPARK-2529) Clean the closure in foreach and foreachPartition
[ https://issues.apache.org/jira/browse/SPARK-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2529: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) > Clean the closure in foreach and foreachPartition > - > > Key: SPARK-2529 > URL: https://issues.apache.org/jira/browse/SPARK-2529 > Project: Spark > Issue Type: Bug >Reporter: Reynold Xin > > Somehow we didn't clean the closure for foreach and foreachPartition. Should > do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2531) Make BroadcastNestedLoopJoin take into account a BuildSide
[ https://issues.apache.org/jira/browse/SPARK-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2531: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) > Make BroadcastNestedLoopJoin take into account a BuildSide > -- > > Key: SPARK-2531 > URL: https://issues.apache.org/jira/browse/SPARK-2531 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.0.1 >Reporter: Zongheng Yang >Assignee: Zongheng Yang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2548) JavaRecoverableWordCount is missing
[ https://issues.apache.org/jira/browse/SPARK-2548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2548: - Target Version/s: 1.1.0, 0.9.3, 1.0.3 (was: 1.1.0, 1.0.2, 0.9.3) > JavaRecoverableWordCount is missing > --- > > Key: SPARK-2548 > URL: https://issues.apache.org/jira/browse/SPARK-2548 > Project: Spark > Issue Type: Bug > Components: Documentation, Streaming >Affects Versions: 0.9.2, 1.0.1 >Reporter: Xiangrui Meng >Priority: Minor > > JavaRecoverableWordCount was mentioned in the doc but not in the codebase. We > need to rewrite the example because the code was lost during the migration > from spark/spark-incubating to apache/spark. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2506) In yarn-cluster mode, ApplicationMaster does not clean up correctly at the end of the job if users call sc.stop manually
[ https://issues.apache.org/jira/browse/SPARK-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2506: - Target Version/s: 1.0.3 (was: 1.0.2) > In yarn-cluster mode, ApplicationMaster does not clean up correctly at the > end of the job if users call sc.stop manually > > > Key: SPARK-2506 > URL: https://issues.apache.org/jira/browse/SPARK-2506 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core, YARN >Affects Versions: 1.0.1 >Reporter: uncleGen >Priority: Minor > > when i call sc.stop manually, some strange ERRORs will appear: > 1. in driver log: > INFO [Thread-116] YarnAllocationHandler: Completed container > container_1400565786114_79510_01_41 (state: COMPLETE, exit status: 0) > WARN [Thread-4] BlockManagerMaster: Error sending message to > BlockManagerMaster in 3 attempts > akka.pattern.AskTimeoutException: > Recipient[Actor[akka://spark/user/BlockManagerMaster#1994513092]] had already > been terminated. > at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:134) > at > org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:236) > at > org.apache.spark.storage.BlockManagerMaster.tell(BlockManagerMaster.scala:216) > at > org.apache.spark.storage.BlockManagerMaster.stop(BlockManagerMaster.scala:208) > at org.apache.spark.SparkEnv.stop(SparkEnv.scala:86) > at org.apache.spark.SparkContext.stop(SparkContext.scala:993) > at TestWeibo$.main(TestWeibo.scala:46) > at TestWeibo.main(TestWeibo.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:192) > INFO [Thread-116] ApplicationMaster: Allocating 1 containers to make up for > (potentially) lost containers > INFO [Thread-116] YarnAllocationHandler: Will Allocate 1 executor containers, > each with 9600 memory > 2: in executor log: > WARN [Connection manager future execution context-13] BlockManagerMaster: > Error sending message to BlockManagerMaster in 1 attempts > java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at > org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:237) > at > org.apache.spark.storage.BlockManagerMaster.sendHeartBeat(BlockManagerMaster.scala:51) > at > org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$heartBeat(BlockManager.scala:113) > at > org.apache.spark.storage.BlockManager$$anonfun$initialize$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(BlockManager.scala:158) > at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:790) > at > org.apache.spark.storage.BlockManager$$anonfun$initialize$1.apply$mcV$sp(BlockManager.scala:158) > at akka.actor.Scheduler$$anon$9.run(Scheduler.scala:80) > at > akka.actor.LightArrayRevolverScheduler$$anon$3$$anon$2.run(Scheduler.scala:241) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > WARN [Connection manager future execution context-13] BlockManagerMaster: > Error sending message to BlockManagerMaster in 2 attempts > java.util.concurrent.TimeoutException: Futures timed out after [30 seconds] > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at > org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:237) > at > org.apache.spark.storage.BlockManagerMaster.sendHeartBeat(BlockManagerMaster.scala:51)
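For context, the failure mode above is triggered by a driver program that calls sc.stop() itself at the end of main() in yarn-cluster mode. A minimal sketch of that pattern follows; the object name and input path are placeholders for illustration, not taken from the original report:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver illustrating the reported scenario: the application stops the
// SparkContext itself, and the ApplicationMaster's own shutdown path then races
// against the already-terminated BlockManagerMaster, producing the WARN/ERROR
// messages shown in the logs above.
object ManualStopApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ManualStopApp"))
    val lines = sc.textFile(args(0)).count()
    println(s"line count: $lines")
    sc.stop() // explicit stop before main() returns, as in the report
  }
}
{code}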
[jira] [Updated] (SPARK-1667) Jobs never finish successfully once bucket file missing occurred
[ https://issues.apache.org/jira/browse/SPARK-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-1667: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) > Jobs never finish successfully once bucket file missing occurred > > > Key: SPARK-1667 > URL: https://issues.apache.org/jira/browse/SPARK-1667 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 1.0.0 >Reporter: Kousuke Saruta > > If a job executes a shuffle, bucket files are created in a temporary directory > (named like spark-local-*). > When those bucket files go missing, caused by a disk failure or any other reason, jobs > can no longer execute the shuffle that has the same shuffle id as the missing bucket files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2558) Mention --queue argument in YARN documentation
[ https://issues.apache.org/jira/browse/SPARK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2558: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) > Mention --queue argument in YARN documentation > --- > > Key: SPARK-2558 > URL: https://issues.apache.org/jira/browse/SPARK-2558 > Project: Spark > Issue Type: Documentation > Components: YARN >Reporter: Matei Zaharia >Priority: Trivial > Labels: Starter > > The docs about it went away when we updated the page to spark-submit. -- This message was sent by Atlassian JIRA (v6.2#6252)
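For reference, the flag being discussed is spark-submit's --queue option for YARN. A typical invocation might look like the following; the application class, jar, and queue name are placeholders, not part of the original issue:
{code}
# Submit to a specific YARN scheduler queue instead of the default queue.
./bin/spark-submit \
  --class com.example.MyApp \
  --master yarn-cluster \
  --queue production \
  my-app.jar
{code}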
[jira] [Updated] (SPARK-2576) slave node throws NoClassDefFoundError $line11.$read$ when executing a Spark QL query on HDFS CSV file
[ https://issues.apache.org/jira/browse/SPARK-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2576: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) > slave node throws NoClassDefFoundError $line11.$read$ when executing a Spark > QL query on HDFS CSV file > -- > > Key: SPARK-2576 > URL: https://issues.apache.org/jira/browse/SPARK-2576 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 1.0.1 > Environment: One Mesos 0.19 master without zookeeper and 4 mesos > slaves. > JDK 1.7.51 and Scala 2.10.4 on all nodes. > HDFS from CDH5.0.3 > Spark version: I tried both with the pre-built CDH5 spark package available > from http://spark.apache.org/downloads.html and by packaging spark with sbt > 0.13.2, JDK 1.7.51 and scala 2.10.4 as explained here > http://mesosphere.io/learn/run-spark-on-mesos/ > All nodes are running Debian 3.2.51-1 x86_64 GNU/Linux and have >Reporter: Svend Vanderveken >Assignee: Yin Huai >Priority: Blocker > > Execution of SQL query against HDFS systematically throws a class not found > exception on slave nodes when executing . > (this was originally reported on the user list: > http://apache-spark-user-list.1001560.n3.nabble.com/spark1-0-1-spark-sql-error-java-lang-NoClassDefFoundError-Could-not-initialize-class-line11-read-tc10135.html) > Sample code (ran from spark-shell): > {code} > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.createSchemaRDD > case class Car(timestamp: Long, objectid: String, isGreen: Boolean) > // I get the same error when pointing to the folder > "hdfs://vm28:8020/test/cardata" > val data = sc.textFile("hdfs://vm28:8020/test/cardata/part-0") > val cars = data.map(_.split(",")).map ( ar => Car(ar(0).toLong, ar(1), > ar(2).toBoolean)) > cars.registerAsTable("mcars") > val allgreens = sqlContext.sql("SELECT objectid from mcars where isGreen = > true") > allgreens.collect.take(10).foreach(println) > {code} > Stack trace on the slave nodes: > {code} > I0716 13:01:16.215158 13631 exec.cpp:131] Version: 0.19.0 > I0716 13:01:16.219285 13656 exec.cpp:205] Executor registered on slave > 20140714-142853-485682442-5050-25487-2 > 14/07/16 13:01:16 INFO MesosExecutorBackend: Registered with Mesos as > executor ID 20140714-142853-485682442-5050-25487-2 > 14/07/16 13:01:16 INFO SecurityManager: Changing view acls to: > mesos,mnubohadoop > 14/07/16 13:01:16 INFO SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(mesos, > mnubohadoop) > 14/07/16 13:01:17 INFO Slf4jLogger: Slf4jLogger started > 14/07/16 13:01:17 INFO Remoting: Starting remoting > 14/07/16 13:01:17 INFO Remoting: Remoting started; listening on addresses > :[akka.tcp://spark@vm23:38230] > 14/07/16 13:01:17 INFO Remoting: Remoting now listens on addresses: > [akka.tcp://spark@vm23:38230] > 14/07/16 13:01:17 INFO SparkEnv: Connecting to MapOutputTracker: > akka.tcp://spark@vm28:41632/user/MapOutputTracker > 14/07/16 13:01:17 INFO SparkEnv: Connecting to BlockManagerMaster: > akka.tcp://spark@vm28:41632/user/BlockManagerMaster > 14/07/16 13:01:17 INFO DiskBlockManager: Created local directory at > /tmp/spark-local-20140716130117-8ea0 > 14/07/16 13:01:17 INFO MemoryStore: MemoryStore started with capacity 294.9 > MB. 
> 14/07/16 13:01:17 INFO ConnectionManager: Bound socket to port 44501 with id > = ConnectionManagerId(vm23-hulk-priv.mtl.mnubo.com,44501) > 14/07/16 13:01:17 INFO BlockManagerMaster: Trying to register BlockManager > 14/07/16 13:01:17 INFO BlockManagerMaster: Registered BlockManager > 14/07/16 13:01:17 INFO HttpFileServer: HTTP File server directory is > /tmp/spark-ccf6f36c-2541-4a25-8fe4-bb4ba00ee633 > 14/07/16 13:01:17 INFO HttpServer: Starting HTTP Server > 14/07/16 13:01:18 INFO Executor: Using REPL class URI: http://vm28:33973 > 14/07/16 13:01:18 INFO Executor: Running task ID 2 > 14/07/16 13:01:18 INFO HttpBroadcast: Started reading broadcast variable 0 > 14/07/16 13:01:18 INFO MemoryStore: ensureFreeSpace(125590) called with > curMem=0, maxMem=309225062 > 14/07/16 13:01:18 INFO MemoryStore: Block broadcast_0 stored as values to > memory (estimated size 122.6 KB, free 294.8 MB) > 14/07/16 13:01:18 INFO HttpBroadcast: Reading broadcast variable 0 took > 0.294602722 s > 14/07/16 13:01:19 INFO HadoopRDD: Input split: > hdfs://vm28:8020/test/cardata/part-0:23960450+23960451 > I0716 13:01:19.905113 13657 exec.cpp:378] Executor asked to shutdown > 14/07/16 13:01:20 ERROR Executor: Exception in task ID 2 > java.lang.NoClassDefFoundError: $line11/$read$ > at $line12.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$2.apply(:19) > at $line12.$read$$iwC$$
[jira] [Updated] (SPARK-2425) Standalone Master is too aggressive in removing Applications
[ https://issues.apache.org/jira/browse/SPARK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2425: - Target Version/s: 1.0.3 (was: 1.0.2) > Standalone Master is too aggressive in removing Applications > > > Key: SPARK-2425 > URL: https://issues.apache.org/jira/browse/SPARK-2425 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Mark Hamstra >Assignee: Mark Hamstra > > When standalone Executors trying to run a particular Application fail a > cumulative ApplicationState.MAX_NUM_RETRY times, the Master will remove the > Application. This holds even if a number of Executors are still > successfully running the Application. This makes > long-running standalone-mode Applications in particular unnecessarily > vulnerable to limited failures in the cluster -- e.g., a single bad node on > which Executors repeatedly fail for any reason can prevent an Application > from starting, or can result in a running Application being removed even > though it could continue to run successfully (just not making use of all > potential Workers and Executors). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2541) Standalone mode can't access secure HDFS anymore
[ https://issues.apache.org/jira/browse/SPARK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-2541: - Target Version/s: 1.1.0, 1.0.3 (was: 1.1.0, 1.0.2) > Standalone mode can't access secure HDFS anymore > > > Key: SPARK-2541 > URL: https://issues.apache.org/jira/browse/SPARK-2541 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0, 1.0.1 >Reporter: Thomas Graves > > In Spark 0.9.x you could access secure HDFS from a standalone deployment; that > no longer works in 1.x. > It looks like the issue is in SparkHadoopUtil.runAsSparkUser. Previously it > wouldn't do the doAs if currentUser == user. Not sure how this behaves > when the daemons run as a superuser but SPARK_USER is set to someone else. -- This message was sent by Atlassian JIRA (v6.2#6252)
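To make the suspected regression concrete, here is a minimal sketch of the kind of guard the description refers to, written against Hadoop's UserGroupInformation API. This is illustrative only, under the assumption that skipping doAs preserves the already-authenticated Kerberos UGI; it is not the actual SparkHadoopUtil.runAsSparkUser code:
{code}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Sketch: only wrap the body in doAs when impersonating a *different* user.
def runAsUser(user: String)(body: => Unit): Unit = {
  val current = UserGroupInformation.getCurrentUser
  if (user == null || user == current.getShortUserName) {
    // Pre-1.x behaviour described above: keep the current (possibly
    // Kerberos-authenticated) UGI and its tokens intact.
    body
  } else {
    val proxy = UserGroupInformation.createRemoteUser(user)
    proxy.addCredentials(current.getCredentials) // carry over delegation tokens
    proxy.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = body
    })
  }
}
{code}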
[jira] [Commented] (SPARK-2529) Clean the closure in foreach and foreachPartition
[ https://issues.apache.org/jira/browse/SPARK-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073964#comment-14073964 ] Apache Spark commented on SPARK-2529: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/1583 > Clean the closure in foreach and foreachPartition > - > > Key: SPARK-2529 > URL: https://issues.apache.org/jira/browse/SPARK-2529 > Project: Spark > Issue Type: Bug >Reporter: Reynold Xin > > Somehow we didn't clean the closure for foreach and foreachPartition. Should > do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
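For readers following along, the change amounts to passing the user function through the closure cleaner before the job runs, the same way other RDD actions already do. A rough sketch of the two methods inside RDD[T] (not necessarily the exact patch in the linked PR):
{code}
// Inside org.apache.spark.rdd.RDD[T] -- sketch only, not the exact patch.
def foreach(f: T => Unit): Unit = {
  val cleanF = sc.clean(f) // remove unused outer references so the closure serializes
  sc.runJob(this, (iter: Iterator[T]) => iter.foreach(cleanF))
}

def foreachPartition(f: Iterator[T] => Unit): Unit = {
  val cleanF = sc.clean(f)
  sc.runJob(this, (iter: Iterator[T]) => cleanF(iter))
}
{code}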
[jira] [Updated] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Zhang updated SPARK-2668: -- Fix Version/s: (was: 1.0.0) > Support log4j log to yarn container log directory > - > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Peng Zhang > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file append will log to CWD, and files will not be > displayed on YARN UI,and either cannot be aggregated to HDFS log directory > after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2668) Support log4j log to yarn container log directory
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Zhang updated SPARK-2668: -- Affects Version/s: 1.0.0 > Support log4j log to yarn container log directory > - > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.0.0 >Reporter: Peng Zhang > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file append will log to CWD, and files will not be > displayed on YARN UI,and either cannot be aggregated to HDFS log directory > after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2668) Add variable of yarn log diectory to reference from the log4j configuration
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Zhang updated SPARK-2668: -- Summary: Add variable of yarn log diectory to reference from the log4j configuration (was: Support log4j log to yarn container log directory) > Add variable of yarn log diectory to reference from the log4j configuration > --- > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.0.0 >Reporter: Peng Zhang > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file append will log to CWD, and files will not be > displayed on YARN UI,and either cannot be aggregated to HDFS log directory > after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2668) Add variable of yarn log diectory to reference from the log4j configuration
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073982#comment-14073982 ] Peng Zhang commented on SPARK-2668: --- I changed the title to make it clearer. I also found that MapReduce added the variable "-Dyarn.app.mapreduce.container.log.dir" so that it can be referenced from the log4j configuration, so I think Spark on YARN should add an equivalent to solve the same issue. > Add variable of yarn log diectory to reference from the log4j configuration > --- > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.0.0 >Reporter: Peng Zhang > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file append will log to CWD, and files will not be > displayed on YARN UI,and either cannot be aggregated to HDFS log directory > after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
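Along the lines of the MapReduce precedent mentioned above, the idea is to expand YARN's per-container log directory into a system property when the container launch command is built. A hedged sketch follows; the property name spark.yarn.log.dir is the one proposed in this issue, and the code is illustrative rather than the merged patch:
{code}
import org.apache.hadoop.yarn.api.ApplicationConstants
import scala.collection.mutable.ListBuffer

// Sketch only: append a -D option so log4j.properties can use ${spark.yarn.log.dir}.
// ApplicationConstants.LOG_DIR_EXPANSION_VAR is the "<LOG_DIR>" token that the
// NodeManager expands to the container's own log directory at launch time.
val javaOpts = new ListBuffer[String]()
javaOpts += "-Dspark.yarn.log.dir=" + ApplicationConstants.LOG_DIR_EXPANSION_VAR
{code}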
[jira] [Updated] (SPARK-2668) Add variable of yarn log diectory to reference from the log4j configuration
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Zhang updated SPARK-2668: -- Description: Assign value of yarn container log directory to java opts "spark.yarn.log.dir", So user defined log4j.properties can reference this value and write log to YARN container directory. Otherwise, user defined file appender will only write to container's CWD, and log files in CWD will not be displayed on YARN UI,and either cannot be aggregated to HDFS log directory after job finished. User defined log4j.properties reference example: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} was: Assign value of yarn container log directory to java opts "spark.yarn.log.dir", So user defined log4j.properties can reference this value and write log to YARN container directory. Otherwise, user defined file appender will only write to container's CWD, and files will not be displayed on YARN UI,and either cannot be aggregated to HDFS log directory after job finished. User defined log4j.properties reference example: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} > Add variable of yarn log diectory to reference from the log4j configuration > --- > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.0.0 >Reporter: Peng Zhang > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file appender will only write to container's CWD, and > log files in CWD will not be displayed on YARN UI,and either cannot be > aggregated to HDFS log directory after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2668) Add variable of yarn log diectory to reference from the log4j configuration
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Zhang updated SPARK-2668: -- Description: Assign value of yarn container log directory to java opts "spark.yarn.log.dir", So user defined log4j.properties can reference this value and write log to YARN container directory. Otherwise, user defined file appender will only write to container's CWD, and files will not be displayed on YARN UI,and either cannot be aggregated to HDFS log directory after job finished. User defined log4j.properties reference example: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} was: Assign value of yarn container log directory to java opts "spark.yarn.log.dir", So user defined log4j.properties can reference this value and write log to YARN container directory. Otherwise, user defined file append will log to CWD, and files will not be displayed on YARN UI,and either cannot be aggregated to HDFS log directory after job finished. User defined log4j.properties reference example: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} > Add variable of yarn log diectory to reference from the log4j configuration > --- > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.0.0 >Reporter: Peng Zhang > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file appender will only write to container's CWD, and > files will not be displayed on YARN UI,and either cannot be aggregated to > HDFS log directory after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2668) Add variable of yarn log directory to reference from the log4j configuration
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Zhang updated SPARK-2668: -- Summary: Add variable of yarn log directory to reference from the log4j configuration (was: Add variable of yarn log diectory to reference from the log4j configuration) > Add variable of yarn log directory to reference from the log4j configuration > > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.0.0 >Reporter: Peng Zhang > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file appender will only write to container's CWD, and > log files in CWD will not be displayed on YARN UI,and either cannot be > aggregated to HDFS log directory after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2668) Add variable of yarn log directory for reference from the log4j configuration
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Zhang updated SPARK-2668: -- Summary: Add variable of yarn log directory for reference from the log4j configuration (was: Add variable of yarn log directory to reference from the log4j configuration) > Add variable of yarn log directory for reference from the log4j configuration > - > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.0.0 >Reporter: Peng Zhang > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container directory. > Otherwise, user defined file appender will only write to container's CWD, and > log files in CWD will not be displayed on YARN UI,and either cannot be > aggregated to HDFS log directory after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (SPARK-2668) Add variable of yarn log directory for reference from the log4j configuration
[ https://issues.apache.org/jira/browse/SPARK-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Zhang updated SPARK-2668: -- Description: Assign value of yarn container log directory to java opts "spark.yarn.log.dir", So user defined log4j.properties can reference this value and write log to YARN container's log directory. Otherwise, user defined file appender will only write to container's CWD, and log files in CWD will not be displayed on YARN UI,and either cannot be aggregated to HDFS log directory after job finished. User defined log4j.properties reference example: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} was: Assign value of yarn container log directory to java opts "spark.yarn.log.dir", So user defined log4j.properties can reference this value and write log to YARN container directory. Otherwise, user defined file appender will only write to container's CWD, and log files in CWD will not be displayed on YARN UI,and either cannot be aggregated to HDFS log directory after job finished. User defined log4j.properties reference example: {code} log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log {code} > Add variable of yarn log directory for reference from the log4j configuration > - > > Key: SPARK-2668 > URL: https://issues.apache.org/jira/browse/SPARK-2668 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.0.0 >Reporter: Peng Zhang > > Assign value of yarn container log directory to java opts > "spark.yarn.log.dir", So user defined log4j.properties can reference this > value and write log to YARN container's log directory. > Otherwise, user defined file appender will only write to container's CWD, and > log files in CWD will not be displayed on YARN UI,and either cannot be > aggregated to HDFS log directory after job finished. > User defined log4j.properties reference example: > {code} > log4j.appender.rolling_file.File = ${spark.yarn.log.dir}/spark.log > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2387) Remove the stage barrier for better resource utilization
[ https://issues.apache.org/jira/browse/SPARK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073989#comment-14073989 ] Rui Li commented on SPARK-2387: --- [~kayousterhout] Thanks for the review. I tested this PoC with graphx.SynthBenchmark; the test was done on a 7-node cluster, where each node runs an executor with 32 CPUs and 90GB of memory. For one case (-numEPart=112 -nverts=1000 -niter=3), it improves the job by about 10%. I did notice there is a regression for very small cases. I think this is because the shuffle map stage is quite short and the overlap is not as pronounced. Actually, I think a very long map straggler is the best use case for this PoC, in that all the pre-started reducers are waiting for output from just one map task. When that map task finishes, all the waiting reducers can finish at almost the same time. So in this case we can eliminate almost the whole execution time of the reduce stage, compared to the normal mode with stage barriers. I agree the early reducers can prevent other jobs from being launched, but I suppose that only happens if multiple jobs are submitted concurrently through one SparkContext. Not sure if this is a common case? The PoC may also provide flexibility for different shuffle implementations. For example, in a push-style shuffle, the pushed data wouldn't have to be stored to disk if the reducers have already started on the destination node. This PoC does have the potential to cause a kind of "deadlock", i.e. the pre-started reducers take up all the slots while there are still pending map tasks. I try to avoid this in the PR by checking free slots before launching the reduce stage and by giving the pre-started stage lower priority when scheduling. But that doesn't solve the issue perfectly because of delay scheduling, and unfortunately map tasks are more likely to be delayed because of their locality preferences. The deadlock may also occur with task failover, when a map task reports success but we fail to get the task result later (not common, because a map status is usually small enough to fit into a direct result). We talked about this problem a bit. It seems that to solve it we either have to feed some physical resource information (e.g. free/total slots) into the DAGScheduler, or push some dependency information down to the TaskScheduler. Either way seems to involve many modifications and somewhat violate the current design principles of the schedulers. So I left this open and want to see if you have any ideas on this... > Remove the stage barrier for better resource utilization > > > Key: SPARK-2387 > URL: https://issues.apache.org/jira/browse/SPARK-2387 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Reporter: Rui Li > > DAGScheduler divides a Spark job into multiple stages according to RDD > dependencies. Whenever there's a shuffle dependency, DAGScheduler creates a > shuffle map stage on the map side, and another stage depending on that stage. > Currently, the downstream stage cannot start until all the stages it depends on > have finished. This barrier between stages leads to idle slots while waiting > for the last few upstream tasks to finish, thus wasting cluster resources. > Therefore we propose to remove the barrier and pre-start the reduce stage > once there are free slots. This can achieve better resource utilization and > improve the overall job performance, especially when there are lots of > executors granted to the application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (SPARK-2515) Hypothesis testing
[ https://issues.apache.org/jira/browse/SPARK-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074003#comment-14074003 ] Hossein Falaki commented on SPARK-2515: --- If we really have to implement another chi-square test method, I think the likelihood-ratio test would be a good candidate. On the return type: * What is left for the Summary field? Why can't that just be the toString method? * I am not sure, but df may be too cryptic for non-experts. How about degreesOfFreedom? > Hypothesis testing > -- > > Key: SPARK-2515 > URL: https://issues.apache.org/jira/browse/SPARK-2515 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Xiangrui Meng >Assignee: Doris Xin > > Support common statistical tests in Spark MLlib. -- This message was sent by Atlassian JIRA (v6.2#6252)
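To make the API discussion concrete, here is a purely illustrative sketch of the kind of result object being debated. The class and field names (including degreesOfFreedom and the toString-as-summary idea) are assumptions drawn from the comment above, not the final MLlib API:
{code}
// Hypothetical test-result container: toString serves as the human-readable summary,
// and the degrees-of-freedom field is spelled out rather than abbreviated to "df".
class ChiSqTestResult(
    val method: String,
    val statistic: Double,
    val degreesOfFreedom: Int,
    val pValue: Double,
    val nullHypothesis: String) {

  override def toString: String =
    s"""Chi squared test summary
       |method: $method
       |degrees of freedom = $degreesOfFreedom
       |statistic = $statistic
       |pValue = $pValue
       |$nullHypothesis""".stripMargin
}
{code}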
[jira] [Commented] (SPARK-2529) Clean the closure in foreach and foreachPartition
[ https://issues.apache.org/jira/browse/SPARK-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074002#comment-14074002 ] Mark Hamstra commented on SPARK-2529: - Actually, we were cleaning those closures, but that was removed in https://github.com/apache/spark/commit/6b288b75d4c05f42ad3612813dc77ff824bb6203 -- not sure why. > Clean the closure in foreach and foreachPartition > - > > Key: SPARK-2529 > URL: https://issues.apache.org/jira/browse/SPARK-2529 > Project: Spark > Issue Type: Bug >Reporter: Reynold Xin > > Somehow we didn't clean the closure for foreach and foreachPartition. Should > do that. -- This message was sent by Atlassian JIRA (v6.2#6252)