[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...

2014-10-24 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/2935

[SPARK-4082] remove unnecessary broadcast for conf

We already broadcast the task (RDD and closure) itself, so some small data 
used in RDD or closure do not needed to be broadcasted explicitly any more.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark fix_conf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2935.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2935


commit b4cd73eb804da65f456af37b89fcda1acdb263f0
Author: Davies Liu dav...@databricks.com
Date:   2014-10-24T21:31:03Z

remove unnecessary broadcast for conf




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...

2014-10-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2935#issuecomment-60452576
  
  [Test build #22167 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22167/consoleFull)
 for   PR 2935 at commit 
[`b4cd73e`](https://github.com/apache/spark/commit/b4cd73eb804da65f456af37b89fcda1acdb263f0).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...

2014-10-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2935#issuecomment-60452971
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22167/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...

2014-10-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2935#issuecomment-60452968
  
  [Test build #22167 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22167/consoleFull)
 for   PR 2935 at commit 
[`b4cd73e`](https://github.com/apache/spark/commit/b4cd73eb804da65f456af37b89fcda1acdb263f0).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...

2014-10-24 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2935#issuecomment-60460198
  
Due to a Hadoop thread-safety issue in `Configuration`'s constructor, we 
need to hold a lock in any code that might call `new Configuration()` on the 
executor.  We don't want to hold the lock for the entire task deserialization 
because that won't allow tasks to be launched in parallel.  Instead, though, we 
could make our own wrapper that holds a Configuration and grabs 
CONFIGURATION_INSTANTIATION_LOCK in its  `readObject` or `readExternal` method.

For some context on this issue, see #2683 and 
https://issues.apache.org/jira/browse/SPARK-2585


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...

2014-10-24 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2935#issuecomment-60460256
  
BTW, maybe you meant to link to 
https://issues.apache.org/jira/browse/SPARK-4083?  I think this is a duplicate 
of https://issues.apache.org/jira/browse/SPARK-2585


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...

2014-10-24 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2935#issuecomment-60460447
  
Also, adding our own synchronizing wrapper will let us roll back some of 
the complexity introduced by #2684 for ensuring thread-safety, since each task 
will get its own deserialized copy of the configuration.

I suppose that this could have a small performance penalty because we'll 
always construct a new Configuration (which might be expensive), but I think it 
should be pretty minimal (we can try measuring it) and is probably offset by 
other performance improvements in 1.2.

By the way, CONFIGURATION_INSTANTIATION_LOCK should probably be moved to 
SparkHadoopUtil so that it's accessible from more places that might create 
Configurations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org