[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/2935 [SPARK-4082] remove unnecessary broadcast for conf We already broadcast the task (RDD and closure) itself, so some small data used in RDD or closure do not needed to be broadcasted explicitly any more. You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark fix_conf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2935.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2935 commit b4cd73eb804da65f456af37b89fcda1acdb263f0 Author: Davies Liu dav...@databricks.com Date: 2014-10-24T21:31:03Z remove unnecessary broadcast for conf --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2935#issuecomment-60452576 [Test build #22167 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22167/consoleFull) for PR 2935 at commit [`b4cd73e`](https://github.com/apache/spark/commit/b4cd73eb804da65f456af37b89fcda1acdb263f0). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2935#issuecomment-60452971 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22167/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2935#issuecomment-60452968 [Test build #22167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22167/consoleFull) for PR 2935 at commit [`b4cd73e`](https://github.com/apache/spark/commit/b4cd73eb804da65f456af37b89fcda1acdb263f0). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2935#issuecomment-60460198 Due to a Hadoop thread-safety issue in `Configuration`'s constructor, we need to hold a lock in any code that might call `new Configuration()` on the executor. We don't want to hold the lock for the entire task deserialization because that won't allow tasks to be launched in parallel. Instead, though, we could make our own wrapper that holds a Configuration and grabs CONFIGURATION_INSTANTIATION_LOCK in its `readObject` or `readExternal` method. For some context on this issue, see #2683 and https://issues.apache.org/jira/browse/SPARK-2585 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2935#issuecomment-60460256 BTW, maybe you meant to link to https://issues.apache.org/jira/browse/SPARK-4083? I think this is a duplicate of https://issues.apache.org/jira/browse/SPARK-2585 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4082] remove unnecessary broadcast for ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2935#issuecomment-60460447 Also, adding our own synchronizing wrapper will let us roll back some of the complexity introduced by #2684 for ensuring thread-safety, since each task will get its own deserialized copy of the configuration. I suppose that this could have a small performance penalty because we'll always construct a new Configuration (which might be expensive), but I think it should be pretty minimal (we can try measuring it) and is probably offset by other performance improvements in 1.2. By the way, CONFIGURATION_INSTANTIATION_LOCK should probably be moved to SparkHadoopUtil so that it's accessible from more places that might create Configurations. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org