[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85121921 [Test build #29007 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29007/consoleFull) for PR 5093 at commit [`b98f78c`](https://github.com/apache/spark/commit/b98f78c8f652b27e66d3fe554b9b972927017658). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85121933 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29007/ Test FAILed.
[GitHub] spark pull request: [SPARK-6471][SQL]: Metastore schema should onl...
GitHub user saucam opened a pull request: https://github.com/apache/spark/pull/5141 [SPARK-6471][SQL]: Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns Currently, in the parquet relation 2 implementation, an error is thrown if the merged schema is not exactly the same as the metastore schema. But to support cases like deletion of a column using the replace columns command, we can relax the restriction so that the query will work even if the metastore schema is only a subset of the merged parquet schema. You can merge this pull request into a Git repository by running: $ git pull https://github.com/saucam/spark replace_col Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5141.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5141 commit 5f2f4674084b4f6202c0eb884b798f0980659b4b Author: Yash Datta yash.da...@guavus.com Date: 2015-03-23T17:35:45Z SPARK-6471: Metastore schema should only be a subset of parquet schema to support dropping of columns using replace columns
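The relaxed check described in this PR can be sketched in plain Scala. Note that `Field` and `isSubsetOf` below are hypothetical stand-ins for illustration, not Spark's actual schema types:

```scala
// Hypothetical stand-in for a schema field; Spark's real types differ.
case class Field(name: String, dataType: String)

// The metastore schema is acceptable if every one of its columns appears,
// with a matching type, in the merged parquet schema (i.e. it is a subset),
// rather than requiring the two schemas to be exactly equal.
def isSubsetOf(metastore: Seq[Field], parquet: Seq[Field]): Boolean = {
  val parquetByName = parquet.map(f => f.name -> f.dataType).toMap
  metastore.forall(f => parquetByName.get(f.name).contains(f.dataType))
}

val parquetSchema = Seq(Field("a", "int"), Field("b", "string"), Field("c", "double"))
val afterReplaceColumns = Seq(Field("a", "int"), Field("c", "double")) // column b dropped

assert(isSubsetOf(afterReplaceColumns, parquetSchema))      // now allowed
assert(!isSubsetOf(Seq(Field("d", "int")), parquetSchema))  // unknown column still rejected
```

Under this rule, dropping a column via `ALTER TABLE ... REPLACE COLUMNS` leaves the metastore schema a strict subset of the merged parquet schema, and the query still works.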
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5124#discussion_r26961357 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -805,7 +806,7 @@ class DAGScheduler( } val properties = if (jobIdToActiveJob.contains(jobId)) { - jobIdToActiveJob(stage.jobId).properties + jobIdToActiveJob(stage.jobId).properties.orNull --- End diff -- I don't know if there's a good reason for this, but I don't think we can change it at this point without breaking binary compatibility. We could use annotations / comments to make those fields' nullability more apparent, though.
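The `.orNull` change quoted in the diff bridges an `Option` field to call sites that still expect a nullable reference. A minimal illustration (the `ActiveJob` stand-in below is generic, not Spark's actual class):

```scala
import java.util.Properties

// Hypothetical stand-in for a class whose field became an Option.
case class ActiveJob(properties: Option[Properties])

val withProps = ActiveJob(Some(new Properties))
val withoutProps = ActiveJob(None)

// Option.orNull unwraps Some(x) to x and None to null, so downstream
// null-checking code keeps working after the field becomes an Option,
// without changing the binary signature of the consuming method.
val p: Properties = withoutProps.properties.orNull
assert(p == null)
assert(withProps.properties.orNull != null)
```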
[GitHub] spark pull request: [SPARK-6463] [SQL]AttributeSet.equal should co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5133#issuecomment-85127620 [Test build #29010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29010/consoleFull) for PR 5133 at commit [`035ea67`](https://github.com/apache/spark/commit/035ea6726353cd14455fc2552fd8262cf3bffcf8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85127615 [Test build #29011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29011/consoleFull) for PR 4435 at commit [`99764e1`](https://github.com/apache/spark/commit/99764e1afc48608ad6f0a81778a6f03e1ca7a4f1). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5124#issuecomment-85132931 (Sorry, that should have been `SparkContext.localProperties.initialValue` above; I've revised my comment)
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85142892 [Test build #29018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29018/consoleFull) for PR 5142 at commit [`e661a8f`](https://github.com/apache/spark/commit/e661a8f3b146eef23aa668b2c321fecdc8fc). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26960440 --- Diff: core/src/test/scala/org/apache/spark/rpc/RpcEnvSuite.scala --- @@ -0,0 +1,526 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.rpc + +import java.util.concurrent.{TimeUnit, CountDownLatch, TimeoutException} + +import scala.collection.mutable + +import scala.concurrent.Await + +import scala.concurrent.duration._ + +import scala.language.postfixOps + +import org.scalatest.{BeforeAndAfterAll, FunSuite} +import org.scalatest.concurrent.Eventually._ + +import org.apache.spark.{SparkException, SparkConf} + +/** + * Common tests for an RpcEnv implementation.
+ */ +abstract class RpcEnvSuite extends FunSuite with BeforeAndAfterAll { + + var env: RpcEnv = _ + + override def beforeAll(): Unit = { +val conf = new SparkConf() +env = createRpcEnv(conf, "local", 12345) + } + + override def afterAll(): Unit = { +if (env != null) { + env.shutdown() +} + } + + def createRpcEnv(conf: SparkConf, name: String, port: Int): RpcEnv + + test("send a message locally") { +@volatile var message: String = null +val rpcEndpointRef = env.setupEndpoint("send-locally", new RpcEndpoint { + override val rpcEnv = env + + override def receive = { +case msg: String => message = msg + } +}) +rpcEndpointRef.send("hello") +eventually(timeout(5 seconds), interval(10 millis)) { + assert("hello" === message) +} + } + + test("send a message remotely") { +@volatile var message: String = null +// Set up a RpcEndpoint using env +env.setupEndpoint("send-remotely", new RpcEndpoint { + override val rpcEnv = env + + override def receive = { +case msg: String => message = msg + } +}) + +val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345) +// Use anotherEnv to find out the RpcEndpointRef +val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "send-remotely") +try { + rpcEndpointRef.send("hello") + eventually(timeout(5 seconds), interval(10 millis)) { +assert("hello" === message) + } +} finally { + anotherEnv.shutdown() + anotherEnv.awaitTermination() +} + } + + test("send a RpcEndpointRef") { +val endpoint = new RpcEndpoint { + override val rpcEnv = env + + override def receiveAndReply(context: RpcCallContext) = { +case "Hello" => context.reply(self) +case "Echo" => context.reply("Echo") + } +} +val rpcEndpointRef = env.setupEndpoint("send-ref", endpoint) + +val newRpcEndpointRef = rpcEndpointRef.askWithReply[RpcEndpointRef]("Hello") +val reply = newRpcEndpointRef.askWithReply[String]("Echo") +assert("Echo" === reply) + } + + test("ask a message locally") { +val rpcEndpointRef = env.setupEndpoint("ask-locally", new RpcEndpoint { + override val rpcEnv = env + + override def
receiveAndReply(context: RpcCallContext) = { +case msg: String => { + context.reply(msg) +} + } +}) +val reply = rpcEndpointRef.askWithReply[String]("hello") +assert("hello" === reply) + } + + test("ask a message remotely") { +env.setupEndpoint("ask-remotely", new RpcEndpoint { + override val rpcEnv = env + + override def receiveAndReply(context: RpcCallContext) = { +case msg: String => { + context.reply(msg) +} + } +}) + +val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345) +// Use anotherEnv to find out the RpcEndpointRef +val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "ask-remotely")
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85123710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29002/ Test FAILed.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85123697 [Test build #29002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29002/consoleFull) for PR 4435 at commit [`1f361c8`](https://github.com/apache/spark/commit/1f361c88d6170a2aae01257bacbc4eebc159202e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class AllJobsResource(uiRoot: UIRoot) ` * `class AllRDDResource(uiRoot: UIRoot) ` * `class AllStagesResource(uiRoot: UIRoot) ` * `class ApplicationListResource(uiRoot: UIRoot) ` * `class CustomObjectMapper extends ContextResolver[ObjectMapper]` * `class SparkEnumSerializer extends JsonSerializer[SparkEnum] ` * `class ExecutorListResource(uiRoot: UIRoot) ` * `class JsonRootResource extends UIRootFromServletContext ` * `trait UIRootFromServletContext ` * `class NotFoundException(msg: String) extends WebApplicationException(` * `class OneApplicationResource(uiRoot: UIRoot) ` * `class OneJobResource(uiRoot: UIRoot) ` * `class OneRDDResource(uiRoot: UIRoot) ` * `class OneStageAttemptResource(uiRoot: UIRoot) ` * `class OneStageResource(uiRoot: UIRoot) ` * `class SecurityFilter extends ContainerRequestFilter with UIRootFromServletContext ` * `class ApplicationInfo(` * `class ExecutorStageSummary(` * `class ExecutorSummary(` * `class JobData(` * `class RDDStorageInfo(` * `class RDDDataDistribution(` * `class RDDPartitionInfo(` * `class StageData(` * `class TaskData(` * `class TaskMetrics(` * `class InputMetrics(` * `class OutputMetrics(` * `class ShuffleReadMetrics(` * `class ShuffleWriteMetrics(` * `class AccumulableInfo (` * `throw new SparkException("It appears you are using SparkEnum in a class which does not +`
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5124#issuecomment-85124958 Jenkins, this is ok to test.
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5124#issuecomment-85125446 [Test build #29009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29009/consoleFull) for PR 5124 at commit [`687434c`](https://github.com/apache/spark/commit/687434c9ab65601dde095d3cf6bb2f0de2ea90e1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85129025 [Test build #29013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29013/consoleFull) for PR 4435 at commit [`51eaedb`](https://github.com/apache/spark/commit/51eaedbc864dc41aa5d803b8f3c19cc40bb3040e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85131070 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29014/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85131065 [Test build #29014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29014/consoleFull) for PR 5093 at commit [`f2abc8c`](https://github.com/apache/spark/commit/f2abc8c49490970f7b0bd5829a0696655beb4c09). * This patch **passes all tests**.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85131060 [Test build #29014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29014/consoleFull) for PR 5093 at commit [`f2abc8c`](https://github.com/apache/spark/commit/f2abc8c49490970f7b0bd5829a0696655beb4c09).
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5014#issuecomment-85131959 [Test build #28999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28999/consoleFull) for PR 5014 at commit [`5b611cb`](https://github.com/apache/spark/commit/5b611cb5b3cbdcd39ce08c15ead83921866d1c5d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-85132747 @sarutak it sounds like the plan is to significantly change the implementation; if that's the case, then yes, closing this PR and opening a new one when the new functionality is ready is the right strategy. FYI: there's been some effort towards implementing a much more restricted version of this that uses D3: https://issues.apache.org/jira/browse/SPARK-6418.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85135520 [Test build #29016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29016/consoleFull) for PR 5142 at commit [`6a61364`](https://github.com/apache/spark/commit/6a6136424ab2805148e141471fb2e22d37223d05). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r26965565 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitOptionParser.java --- @@ -57,6 +57,8 @@ protected final String REPOSITORIES = "--repositories"; protected final String STATUS = "--status"; protected final String TOTAL_EXECUTOR_CORES = "--total-executor-cores"; + protected final String PRINCIPAL = "--principal"; --- End diff -- nit: should probably be moved below with other YARN-only options. Also, sorting.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
GitHub user brennonyork opened a pull request: https://github.com/apache/spark/pull/5142 [SPARK-4086][GraphX]: Fold-style aggregation for VertexRDD Adds five new methods to the `VertexRDD` suite to allow for fold-style calling conventions. Those methods are: * `leftZipJoinWithFold` * `leftJoinWithFold` * `innerZipJoinWithFold` * `innerJoinWithFold` * `aggregateUsingIndexWithFold` Each of the above has a set of tests within the `VertexRDDSuite` to ensure proper functionality. You can merge this pull request into a Git repository by running: $ git pull https://github.com/brennonyork/spark SPARK-4086 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5142.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5142 commit c2ef961e1168bb2de57bc4e12d118d9d5883345b Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-18T21:32:48Z added leftJoin*WithFold commit 639046c6fcc0f9f82f5b1fa6fc4092efa2a6ecff Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-19T22:57:51Z added innerJoin with folds commit 1229f9fa3ddcadc39a17d1af0146275208f4c34e Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-23T18:07:19Z added aggregateUsingIndexWithFold commit 98197e743cf71d954fd45a456a88a7ae2ff47888 Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-23T18:10:46Z updated test to better demonstrate the aggregate fold-style values correctly being passed in commit 6a6136424ab2805148e141471fb2e22d37223d05 Author: Brennon York brennon.y...@capitalone.com Date: 2015-03-23T18:25:03Z added proper docstrings
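The fold-style calling convention the PR describes can be illustrated on plain Scala maps. This is a sketch of the convention only, with made-up names; the actual `VertexRDD` signatures in the patch may differ:

```scala
// A left join that, instead of yielding Option[B] for missing keys,
// folds in a default value, so the combiner never has to handle Option.
def leftJoinWithFold[K, A, B, C](left: Map[K, A], right: Map[K, B])
                                (default: B)(combine: (K, A, B) => C): Map[K, C] =
  left.map { case (k, a) => k -> combine(k, a, right.getOrElse(k, default)) }

val degrees = Map(1L -> 3, 2L -> 1)   // all vertices
val scores = Map(1L -> 0.5)           // vertex 2 has no score

// Missing vertices fold in 0.0 rather than surfacing None to the combiner.
val joined = leftJoinWithFold(degrees, scores)(0.0)((_, d, s) => d * s)
assert(joined == Map(1L -> 1.5, 2L -> 0.0))
```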
[GitHub] spark pull request: [SPARK-6369] [SQL] [WIP] Uses commit coordinat...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/5139#issuecomment-85141856 @aarondav if you have time, I'd appreciate your input here.
[GitHub] spark pull request: SPARK-6414: Spark driver failed with NPE on jo...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5124#issuecomment-85132588 It looks like this NPE bug has been around for a while, but it seems pretty hard to hit (which is probably why it hasn't been reported before). I think that we should be able to trigger / reproduce this by creating a new SparkContext, ensuring that the thread-local properties are null, launching a long-running job, then attempting to cancel all jobs in some non-existent job group. Can we add a regression test for this? It shouldn't be too hard if my hunch is right. It looks like we don't directly expose the Properties object to users, so if we wanted to we could go even further and convert all of the upstream nullable `Properties` into `Option[Properties]` as well. If you look at the call chain leading to this use of `properties`, it looks like it can only be null if no local properties have ever been set in the job-submitting thread, its parent thread, or any of its other ancestor threads. Therefore, maybe we can just eliminate the whole null / Option business entirely by ensuring that the thread-local has an `initialValue` instead of having it be `null` in some circumstances and not others. Here's my suggestion: - Add a regression test and confirm that it reproduces the original bug. - Override `SparkContext.initialValue` to return a new empty properties object (since this is [how we lazily initialize](https://github.com/hunglin/spark/blob/687434c9ab65601dde095d3cf6bb2f0de2ea90e1/core/src/main/scala/org/apache/spark/SparkContext.scala#L478) the properties in the existing code). Update the other parts of SparkContext that set this to account for this change. - Add a few `assert(properties != null)` checks so that we catch errors up-front. I'd add these checks at the entry points of the DAGScheduler, e.g. the `private[spark]` `submitJob` methods that are called from SparkContext. Your patch looks good overall, but I think we should just fix the underlying messiness if we can.
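The `initialValue` approach suggested above can be shown with plain JVM classes. This is a sketch using the JDK's `InheritableThreadLocal` directly, not Spark's actual field:

```scala
import java.util.Properties

// With initialValue overridden, the thread-local is never null: each
// thread lazily gets a fresh Properties on first access, and child
// threads inherit a copy of the parent's value via childValue.
val localProperties = new InheritableThreadLocal[Properties] {
  override def initialValue(): Properties = new Properties()
  override def childValue(parent: Properties): Properties =
    parent.clone().asInstanceOf[Properties]
}

val props = localProperties.get() // non-null even before any set()
props.setProperty("spark.jobGroup.id", "my-group")
assert(localProperties.get().getProperty("spark.jobGroup.id") == "my-group")
```

With this in place, code that previously had to tolerate a `null` thread-local (the source of the NPE) can simply assert non-nullness at its entry points.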
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26966209 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -986,7 +986,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli union(Seq(first) ++ rest) /** Get an RDD that has no partitions or elements. */ - def emptyRDD[T: ClassTag] = new EmptyRDD[T](this) + def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this) --- End diff -- shouldn't the return type here be `RDD[T]`, since `EmptyRDD` is `private[spark]` and just an implementation detail?
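The concern in this comment is that declaring (or inferring) an overly specific return type leaks a private implementation class into the public API. A generic illustration with made-up names, not Spark's actual classes:

```scala
// Hypothetical stand-in for a public abstraction like RDD.
trait Dataset[T] { def count: Int }

object Datasets {
  // Implementation detail, hidden from the API surface.
  private[this] class EmptyDataset[T] extends Dataset[T] { def count = 0 }

  // Declaring the public supertype as the return type keeps EmptyDataset
  // out of the API, so it can be renamed or removed without breaking callers.
  def empty[T]: Dataset[T] = new EmptyDataset[T]
}

assert(Datasets.empty[Int].count == 0)
```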
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4688#issuecomment-85142115 Looks OK to me. The code in `Client.scala` is getting pretty hard to follow, would probably benefit from some cleanup later on...
[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5140#issuecomment-85147160 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29005/ Test PASSed.
[GitHub] spark pull request: [SPARK-5775] BugFix: GenericRow cannot be cast...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4697#issuecomment-85147251 Thanks! Merged to branch-1.2
[GitHub] spark pull request: [Spark-4848] Stand-alone cluster: Allow differ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5140#issuecomment-85147094 [Test build #29005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29005/consoleFull) for PR 5140 at commit [`d739640`](https://github.com/apache/spark/commit/d739640308ca0884bf5cd678dbedf3cc85c3cec9).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26960403 --- Diff: core/src/test/scala/org/apache/spark/rpc/RpcEnvSuite.scala --- @@ -0,0 +1,526 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.rpc

import java.util.concurrent.{TimeUnit, CountDownLatch, TimeoutException}

import scala.collection.mutable
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.language.postfixOps

import org.scalatest.{BeforeAndAfterAll, FunSuite}
import org.scalatest.concurrent.Eventually._

import org.apache.spark.{SparkException, SparkConf}

/**
 * Common tests for an RpcEnv implementation.
 */
abstract class RpcEnvSuite extends FunSuite with BeforeAndAfterAll {

  var env: RpcEnv = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf()
    env = createRpcEnv(conf, "local", 12345)
  }

  override def afterAll(): Unit = {
    if (env != null) {
      env.shutdown()
    }
  }

  def createRpcEnv(conf: SparkConf, name: String, port: Int): RpcEnv

  test("send a message locally") {
    @volatile var message: String = null
    val rpcEndpointRef = env.setupEndpoint("send-locally", new RpcEndpoint {
      override val rpcEnv = env

      override def receive = {
        case msg: String => message = msg
      }
    })
    rpcEndpointRef.send("hello")
    eventually(timeout(5 seconds), interval(10 millis)) {
      assert("hello" === message)
    }
  }

  test("send a message remotely") {
    @volatile var message: String = null
    // Set up a RpcEndpoint using env
    env.setupEndpoint("send-remotely", new RpcEndpoint {
      override val rpcEnv = env

      override def receive = {
        case msg: String => message = msg
      }
    })

    val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345)
    // Use anotherEnv to find out the RpcEndpointRef
    val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "send-remotely")
    try {
      rpcEndpointRef.send("hello")
      eventually(timeout(5 seconds), interval(10 millis)) {
        assert("hello" === message)
      }
    } finally {
      anotherEnv.shutdown()
      anotherEnv.awaitTermination()
    }
  }

  test("send a RpcEndpointRef") {
    val endpoint = new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case "Hello" => context.reply(self)
        case "Echo" => context.reply("Echo")
      }
    }
    val rpcEndpointRef = env.setupEndpoint("send-ref", endpoint)

    val newRpcEndpointRef = rpcEndpointRef.askWithReply[RpcEndpointRef]("Hello")
    val reply = newRpcEndpointRef.askWithReply[String]("Echo")
    assert("Echo" === reply)
  }

  test("ask a message locally") {
    val rpcEndpointRef = env.setupEndpoint("ask-locally", new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case msg: String => {
          context.reply(msg)
        }
      }
    })
    val reply = rpcEndpointRef.askWithReply[String]("hello")
    assert("hello" === reply)
  }

  test("ask a message remotely") {
    env.setupEndpoint("ask-remotely", new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case msg: String => {
          context.reply(msg)
        }
      }
    })

    val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345)
    // Use anotherEnv to find out the RpcEndpointRef
    val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "ask-remotely")
```
[GitHub] spark pull request: [SPARK-6471][SQL]: Metastore schema should onl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5141#issuecomment-85122746 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/4337#issuecomment-85122775 Sorry, I had no time until last weekend, but I do now. I'll address that soon.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85123943 [Test build #29008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29008/consoleFull) for PR 5093 at commit [`6912584`](https://github.com/apache/spark/commit/69125849f4ca32d17e1db6fa47f61c9b992a9a94).
* This patch **passes all tests**.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85123947 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29008/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85123940 [Test build #29008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29008/consoleFull) for PR 5093 at commit [`6912584`](https://github.com/apache/spark/commit/69125849f4ca32d17e1db6fa47f61c9b992a9a94).
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85124453 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29003/ Test FAILed.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85124440 [Test build #29003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29003/consoleFull) for PR 4435 at commit [`a066055`](https://github.com/apache/spark/commit/a066055441f370598bdef7868ff3bd51b4f0136d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class AllStagesResource(uiRoot: UIRoot) `
* `class OneStageResource(uiRoot: UIRoot) `
* `class ApplicationInfo(`
* `class ExecutorStageSummary(`
* `class ExecutorSummary(`
* `class JobData(`
* `class RDDStorageInfo(`
* `class RDDDataDistribution(`
* `class RDDPartitionInfo(`
* `class StageData(`
* `class TaskData(`
* `class TaskMetrics(`
* `class InputMetrics(`
* `class OutputMetrics(`
* `class ShuffleReadMetrics(`
* `class ShuffleWriteMetrics(`
* `class AccumulableInfo (`
* `throw new SparkException(It appears you are using SparkEnum in a class which does not +`
[GitHub] spark pull request: [SPARK-6463] [SQL]AttributeSet.equal should co...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5133#issuecomment-85126028 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4337#issuecomment-85134053 [Test build #29015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29015/consoleFull) for PR 4337 at commit [`16f109f`](https://github.com/apache/spark/commit/16f109f13a90d28c3d187f47cb2d0dcd5fc782bc).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r26965498 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitOptionParser.java --- @@ -108,6 +110,8 @@ { REPOSITORIES }, { STATUS }, { TOTAL_EXECUTOR_CORES }, +{ PRINCIPAL}, +{ KEYTAB} --- End diff -- nit: can you add these in sorted order, and add a trailing `,` to the last one?
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26966555 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -986,7 +986,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli union(Seq(first) ++ rest) /** Get an RDD that has no partitions or elements. */ - def emptyRDD[T: ClassTag] = new EmptyRDD[T](this) + def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this) --- End diff -- it should - except then it broke binary compatibility :(
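A minimal sketch of the trade-off discussed in this thread, using hypothetical stand-in classes (`Rdd`, `EmptyRdd`, `Ctx` are not the real Spark types): making the inferred result type explicit keeps the method's erased signature unchanged, while widening it to the parent type would change the signature in bytecode and break callers compiled against the old jar.

```scala
// Stand-ins for RDD / EmptyRDD; in Spark, EmptyRDD is private[spark].
class Rdd[T]
class EmptyRdd[T] extends Rdd[T]

class Ctx {
  // The PR's change: spell out the type the compiler already inferred.
  // Bytecode signature is identical to the inferred version, so it is
  // binary compatible.
  def emptyRddExplicit[T]: EmptyRdd[T] = new EmptyRdd[T]

  // The cleaner API squito suggests: declare the abstract supertype.
  // This changes the erased return type from EmptyRdd to Rdd, which is a
  // binary-incompatible change for code linked against the old signature.
  def emptyRddWidened[T]: Rdd[T] = new EmptyRdd[T]
}
```

Spark's MiMa binary-compatibility checks would flag the widened variant, which is presumably why the concrete return type was kept.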
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85141005 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29011/ Test FAILed.
[GitHub] spark pull request: [SPARK-3533][Core][PySpark] Add saveAsTextFile...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4895#issuecomment-85119879 @srowen [SPARK-3533](https://issues.apache.org/jira/browse/SPARK-3533) has a lot of votes and watchers, and there are a few linked questions on Stack Overflow from there, the most popular one being [this question](http://stackoverflow.com/q/23995040/877069), which has 12 upvotes ATM and close to 4,000 views in about a year, as well as several linked questions asking about the same thing. From a user perspective, I can definitely say that this is a sought-after method.
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4588#discussion_r26960291 --- Diff: core/src/test/scala/org/apache/spark/rpc/RpcEnvSuite.scala --- @@ -0,0 +1,526 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.rpc

import java.util.concurrent.{TimeUnit, CountDownLatch, TimeoutException}

import scala.collection.mutable
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.language.postfixOps

import org.scalatest.{BeforeAndAfterAll, FunSuite}
import org.scalatest.concurrent.Eventually._

import org.apache.spark.{SparkException, SparkConf}

/**
 * Common tests for an RpcEnv implementation.
 */
abstract class RpcEnvSuite extends FunSuite with BeforeAndAfterAll {

  var env: RpcEnv = _

  override def beforeAll(): Unit = {
    val conf = new SparkConf()
    env = createRpcEnv(conf, "local", 12345)
  }

  override def afterAll(): Unit = {
    if (env != null) {
      env.shutdown()
    }
  }

  def createRpcEnv(conf: SparkConf, name: String, port: Int): RpcEnv

  test("send a message locally") {
    @volatile var message: String = null
    val rpcEndpointRef = env.setupEndpoint("send-locally", new RpcEndpoint {
      override val rpcEnv = env

      override def receive = {
        case msg: String => message = msg
      }
    })
    rpcEndpointRef.send("hello")
    eventually(timeout(5 seconds), interval(10 millis)) {
      assert("hello" === message)
    }
  }

  test("send a message remotely") {
    @volatile var message: String = null
    // Set up a RpcEndpoint using env
    env.setupEndpoint("send-remotely", new RpcEndpoint {
      override val rpcEnv = env

      override def receive = {
        case msg: String => message = msg
      }
    })

    val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345)
    // Use anotherEnv to find out the RpcEndpointRef
    val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "send-remotely")
    try {
      rpcEndpointRef.send("hello")
      eventually(timeout(5 seconds), interval(10 millis)) {
        assert("hello" === message)
      }
    } finally {
      anotherEnv.shutdown()
      anotherEnv.awaitTermination()
    }
  }

  test("send a RpcEndpointRef") {
    val endpoint = new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case "Hello" => context.reply(self)
        case "Echo" => context.reply("Echo")
      }
    }
    val rpcEndpointRef = env.setupEndpoint("send-ref", endpoint)

    val newRpcEndpointRef = rpcEndpointRef.askWithReply[RpcEndpointRef]("Hello")
    val reply = newRpcEndpointRef.askWithReply[String]("Echo")
    assert("Echo" === reply)
  }

  test("ask a message locally") {
    val rpcEndpointRef = env.setupEndpoint("ask-locally", new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case msg: String => {
          context.reply(msg)
        }
      }
    })
    val reply = rpcEndpointRef.askWithReply[String]("hello")
    assert("hello" === reply)
  }

  test("ask a message remotely") {
    env.setupEndpoint("ask-remotely", new RpcEndpoint {
      override val rpcEnv = env

      override def receiveAndReply(context: RpcCallContext) = {
        case msg: String => {
          context.reply(msg)
        }
      }
    })

    val anotherEnv = createRpcEnv(new SparkConf(), "remote", 13345)
    // Use anotherEnv to find out the RpcEndpointRef
    val rpcEndpointRef = anotherEnv.setupEndpointRef("local", env.address, "ask-remotely")
```
[GitHub] spark pull request: [SPARK-5124][Core] A standard RPC interface an...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4588#issuecomment-8519 Left mostly minor comments, otherwise looks good. We can iron out any kinks later. There's just some odd code in the test suite, where you're calling `stop` on a shared RPC env variable. That looks a little suspicious.
[GitHub] spark pull request: [SPARK-6124] Support jdbc connection propertie...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4859#issuecomment-85129177 [Test build #29012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29012/consoleFull) for PR 4859 at commit [`9f32724`](https://github.com/apache/spark/commit/9f327244eb3ca3ad3a483570ab82999869973150).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6124] Support jdbc connection propertie...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/4859#issuecomment-85131353 This LGTM, we'll merge this and later add a jdbc() version.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-85131296 I apologize for my late reply. I had no time until last weekend. Actually, I'm reconsidering what should be visualized and how, and I'm trying an implementation. I could show you a concrete implementation in a few weeks. Should I close this PR for now and reopen it later?
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5014#issuecomment-85131979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28999/ Test PASSed.
[GitHub] spark pull request: [SPARK-6345][STREAMING][MLLIB] Fix for trainin...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/5037#issuecomment-85133430 BTW, I really think we should merge this soon for 1.3.1
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85135353 [Test build #29017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29017/consoleFull) for PR 5093 at commit [`56f74a8`](https://github.com/apache/spark/commit/56f74a8a1e7f4c808827ba1f0b09f1f3b40db028).
* This patch **passes all tests**.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85135359 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29017/ Test FAILed.
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5093#issuecomment-85135350 [Test build #29017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29017/consoleFull) for PR 5093 at commit [`56f74a8`](https://github.com/apache/spark/commit/56f74a8a1e7f4c808827ba1f0b09f1f3b40db028).
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/5014#discussion_r26965253 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -142,7 +142,7 @@ case class CreateTableAsSelect[T]( tableName: String, child: LogicalPlan, allowExisting: Boolean, -desc: Option[T] = None) extends UnaryNode { +desc: T) extends UnaryNode { --- End diff -- Let's get rid of the type parameter and rename it to `CreateHiveTableAsSelect` (be a little bit more specific on what this one does). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
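A hedged sketch of the shape yhuai is suggesting. `LogicalPlan` and `HiveTable` here are hypothetical stand-ins for the real Catalyst and metastore types; the field names come from the quoted diff. Dropping the type parameter means the node always carries a concrete table description instead of an `Option[T]`:

```scala
// Stand-in for org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.
trait LogicalPlan

// Stand-in for the Hive metastore table description that T represented.
case class HiveTable(name: String)

// Suggested rename: type parameter removed, desc made concrete and mandatory.
case class CreateHiveTableAsSelect(
    tableName: String,
    child: LogicalPlan,
    allowExisting: Boolean,
    desc: HiveTable) // was: desc: Option[T] = None on CreateTableAsSelect[T]
```

The rename also makes the node's Hive-specific role explicit instead of hiding it behind a generic `T`.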
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85135767 [Test build #29016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29016/consoleFull) for PR 5142 at commit [`6a61364`](https://github.com/apache/spark/commit/6a6136424ab2805148e141471fb2e22d37223d05).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85135778 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29016/ Test FAILed.
[GitHub] spark pull request: [SPARK-6308] [MLlib] [Sql] Override TypeName i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5118#issuecomment-85136374 [Test build #29001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29001/consoleFull) for PR 5118 at commit [`6c8ffab`](https://github.com/apache/spark/commit/6c8ffab396d76e329100c9c33a609f1b993e1abb). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6308] [MLlib] [Sql] Override TypeName i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5118#issuecomment-85136390 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29001/ Test PASSed.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/4688#discussion_r26966346 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -540,6 +560,27 @@ private[spark] class Client( amContainer } + def setupCredentials(): Unit = { +if (args.principal != null) { + Preconditions.checkNotNull( --- End diff -- sorry for flip-flopping here. The methods in `Preconditions` work weirdly in Scala as you've noticed. Should probably use Scala's `require()` here (which throws IllegalArgumentException, which is also a little more correct).
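The `require()` suggestion can be sketched outside of Spark. The object and parameter names below are hypothetical stand-ins for the `Client` arguments being validated, not the actual Spark code:

```scala
// Sketch of the suggestion above: Scala's built-in require() replaces
// Guava's Preconditions.checkNotNull. The names here are illustrative only.
object CredentialsCheck {
  def setupCredentials(principal: String, keytab: String): Unit = {
    if (principal != null) {
      // require throws IllegalArgumentException with the given message,
      // unlike Preconditions.checkNotNull, which throws NullPointerException.
      require(keytab != null, "Keytab must be specified when principal is specified.")
    }
  }
}
```

The point of the suggestion is that `IllegalArgumentException` more accurately describes "caller passed an inconsistent combination of arguments" than `NullPointerException` does.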
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-85139282 @kayousterhout The basic idea has not changed, but I am trying to use vis.js instead of D3.js because vis.js makes it easy to build a rich timeline view. This is an under-development version of the new implementation. [Timeline view for an application] ![2015-03-23 11 38 50](https://cloud.githubusercontent.com/assets/4736016/6787702/da5b6a06-d151-11e4-89c5-d8d1ba68297f.png) [Timeline view for a stage] ![2015-03-23 11 40 09](https://cloud.githubusercontent.com/assets/4736016/6787735/f54c25e4-d151-11e4-8a7a-2f6d9b0325be.png) Actually, I talked with Matei in New York last week, showed him the implementation, and got some feedback. One piece of feedback was that each square, which represents a task, should show its proportion of duration, as you are suggesting in SPARK-6418.
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26966597 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -986,7 +986,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli union(Seq(first) ++ rest) /** Get an RDD that has no partitions or elements. */ - def emptyRDD[T: ClassTag] = new EmptyRDD[T](this) + def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this) --- End diff -- BTW, functions like this are why we should always declare types explicitly.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4435#issuecomment-85140966 [Test build #29011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29011/consoleFull) for PR 4435 at commit [`99764e1`](https://github.com/apache/spark/commit/99764e1afc48608ad6f0a81778a6f03e1ca7a4f1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class AllStagesResource(uiRoot: UIRoot) ` * `class OneStageResource(uiRoot: UIRoot) ` * `class ApplicationInfo(` * `class ExecutorStageSummary(` * `class ExecutorSummary(` * `class JobData(` * `class RDDStorageInfo(` * `class RDDDataDistribution(` * `class RDDPartitionInfo(` * `class StageData(` * `class TaskData(` * `class TaskMetrics(` * `class InputMetrics(` * `class OutputMetrics(` * `class ShuffleReadMetrics(` * `class ShuffleWriteMetrics(` * `class AccumulableInfo (` * `throw new SparkException(It appears you are using SparkEnum in a class which does not +`
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26967311 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -986,7 +986,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli union(Seq(first) ++ rest) /** Get an RDD that has no partitions or elements. */ - def emptyRDD[T: ClassTag] = new EmptyRDD[T](this) + def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this) --- End diff -- This is written up a bit more in https://issues.apache.org/jira/browse/SPARK-2331 which should be reopened for the new 2+ bucket in JIRA. This should be fixed when binary compatibility can be broken. Yes, big +1 to tightening up types like this.
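The problem behind the `emptyRDD` discussion can be illustrated with a toy example (the `Box`/`EmptyBox` classes below are hypothetical, not Spark's): with an inferred result type, the method's public signature silently becomes the narrow implementation type, and callers can come to depend on it, which is why the signature can no longer be relaxed to the supertype without breaking binary compatibility:

```scala
// Toy illustration of why public APIs should declare return types explicitly.
class Box[T]
class EmptyBox[T] extends Box[T] { def isEmpty: Boolean = true }

object Inferred  { def make[T] = new EmptyBox[T] }          // inferred type: EmptyBox[T]
object Explicit  { def make[T]: Box[T] = new EmptyBox[T] }  // declared type: Box[T]

// This compiles only because the inferred signature leaked the subclass;
// changing Inferred.make to return Box[T] would now break this caller.
val leaked: Boolean = Inferred.make[Int].isEmpty
// With the explicit signature, callers see only the intended interface.
val b: Box[Int] = Explicit.make[Int]
```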
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/4027#discussion_r26987588 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala --- @@ -63,20 +63,25 @@ private[spark] class CoarseMesosSchedulerBackend( // Maximum number of cores to acquire (TODO: we'll need more flexible controls here) val maxCores = conf.get("spark.cores.max", Int.MaxValue.toString).toInt + val maxExecutorsPerSlave = conf.getInt("spark.mesos.coarse.executors.max", 1) + val maxCpusPerExecutor = conf.getInt("spark.mesos.coarse.cores.max", Int.MaxValue) --- End diff -- It's quite hard to differentiate this from spark.cores.max, since spark.cores itself is already very vague: it's the configuration that sets the total number of cores a Spark app can schedule. spark.mesos.coarse.cores.max is the maximum number of cores a coarse-grained Spark executor can take, and the scheduler will schedule anywhere from 1 up to spark.mesos.coarse.cores.max cores. So calling it coresPerExecutor doesn't seem right, as it's not a hard value that the scheduler tries to schedule. How about spark.mesos.coarse.coresPerExecutor.max?
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85232216 @hellertime how about just adding the Apache license header at the top of the Dockerfile?
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade Tachyon client vers...
Github user calvinjia commented on the pull request: https://github.com/apache/spark/pull/4867#issuecomment-85234685 @JoshRosen Just from a quick glance at the output log, it seems to be a style issue (lines over 100 characters). I don't think this patch should have caused the issues, since the errors have been the same as the ones since build #1937.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26988014 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. /tr /table +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it's grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making shuffle a complex and +costly operation. + + Background + +To understand what happens during the shuffle we can consider the example of the +[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all +values for a single key are combined into a tuple - the key and the result of executing a reduce +function against all values associated with that key. The challenge is that not all values for a +single key necessarily reside on the same partition, or even the same machine, but they must be +co-located to present a single array per key. + +In Spark, data is generally not distributed across partitions to be in the necessary place for a +specific operation. During computations, a single task will operate on a single partition - thus, to +organize all the data for a single `reduceByKey` reduce task to execute, Spark needs to perform an +all-to-all operation. It must read from all partitions to find all the values for all keys, and then +organize those such that all values for any key lie within the same partition - this is called the +**shuffle**. + +Although the set of elements in each partition of newly shuffled data will be deterministic, the +ordering of these elements is not. If one desires predictably ordered data following shuffle +operations, [`mapPartitions`](#MapPartLink) can be used to sort each partition or `sortBy` can be +used to perform a global sort. A similar operation, +[`repartitionAndSortWithinPartitions`](#Repartition2Link) coupled with `mapPartitions`, +may be used to enact a Hadoop-style shuffle. + +Operations which can cause a shuffle include **repartition** operations like +[`repartition`](#RepartitionLink) and [`coalesce`](#CoalesceLink), **'byKey** operations +(except for counting) like [`groupByKey`](#GroupByLink) and [`reduceByKey`](#ReduceByLink), and +**join** operations like [`cogroup`](#CogroupLink) and [`join`](#JoinLink). + + Performance Impact +**Shuffle** is an expensive operation since it involves disk I/O, data serialization, and +network I/O. To organize data for the shuffle, Spark generates two sets of tasks - map tasks to +organize the data, and a set of reduce tasks to aggregate it. Internally, results from individual +map jobs are kept in memory until they can't fit. Then, these are sorted based on the target reduce +task and written to a single file. On the reduce side, tasks read the relevant sorted blocks. + +Certain shuffle operations can consume significant amounts of heap memory since they generate hash +tables in memory. Specifically, `reduceByKey` and `aggregateByKey` on the map-side and `'byKey` +operations on the reduce-side. When data does not fit in memory Spark will spill these tables to +disk, incurring the additional overhead of disk I/O and increased garbage collection. + +Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files +are not cleaned up from Spark's temporary storage until Spark is stopped, which means that +long-running Spark jobs may consume available disk space. This is done so the shuffle doesn't need +to be re-computed if the lineage is re-computed. The temporary storage directory is specified by the +`spark.local.dir` configuration parameter when configuring the Spark context. + +Shuffle behavior can be fine-tuned by adjusting a variety of configuration parameters. See the --- End diff -- fine-tuned → tuned. Also, can we link to the relevant tuning section?
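The all-to-all step described in the quoted docs can be sketched without Spark. The following is a toy model of hash partitioning plus per-partition reduction (illustrative only, not Spark's implementation): every map-side record is routed to a reduce partition by hashing its key, which is why all values for a key end up co-located where `reduceByKey` can combine them.

```scala
// Toy sketch of the shuffle's all-to-all step (not Spark's code).
object ShuffleSketch {
  // Route a key to a reduce-side partition; the double modulo keeps the
  // result non-negative even for negative hashCodes.
  def partitionFor(key: Any, numPartitions: Int): Int =
    ((key.hashCode % numPartitions) + numPartitions) % numPartitions

  // "Map side": group every record by its target partition (the all-to-all move).
  def shuffle[K, V](records: Seq[(K, V)], numPartitions: Int): Map[Int, Seq[(K, V)]] =
    records.groupBy { case (k, _) => partitionFor(k, numPartitions) }

  // "Reduce side": within one partition, combine all values for each key.
  def reduceByKey[K, V](partition: Seq[(K, V)], f: (V, V) => V): Map[K, V] =
    partition.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
}
```

Because routing depends only on the key, every occurrence of a key lands in the same partition, so each reduce task can produce the final value for its keys without seeing any other partition.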
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85256109 [Test build #29037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29037/consoleFull) for PR 3074 at commit [`a2856cd`](https://github.com/apache/spark/commit/a2856cdc99229d96f5b76a619bfbd21105513404). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85256117 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29037/ Test PASSed.
[GitHub] spark pull request: [SPARK-6325] [core,yarn] Do not change target ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5018#issuecomment-85262710 @vanzin @sryza Thanks for working on the fix. I was away for the past week and did not have the chance to review this before it went in. Regarding the code being overly complicated, the reason why the bookkeeping is done in each of the three places you pointed out is the following: - We need to do it in `ExecutorAllocationManager` so we don't keep requesting beyond the configured maximum - We need to do it in `CoarseGrainedExecutorBackend` because the user can bypass the dynamic scaling logic and explicitly request executors through `sc.requestTotalExecutors`. - We need to do it in `YarnAllocator` to ensure we don't over-allocate containers, regardless of whether `sc.requestTotalExecutors` or the dynamic scaling logic is used. My intent is also to simplify the code so as to minimize the possibility of this feature breaking again in a future release. If either of you have concrete suggestions on refactoring it for this purpose, I would love to hear them.
[GitHub] spark pull request: [SPARK-6209] Clean up connections in ExecutorC...
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4944#issuecomment-85264973 Yeah, it looks okay to me, but I would also feel more comfortable if a second core committer took a look.
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26995821 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala --- @@ -169,8 +177,8 @@ private[deploy] class DriverRunner( runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise) } - def runCommandWithRetry(command: ProcessBuilderLike, initialize: Process = Unit, -supervise: Boolean) { + def runCommandWithRetry( + command: ProcessBuilderLike, initialize: Process = Unit, supervise: Boolean) { --- End diff -- Should this have an explicit `: Unit = `?
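The `: Unit =` question refers to Scala's procedure syntax (`def f(...) { ... }` with no `=`), which always returns `Unit` implicitly and was later deprecated. A toy sketch (not the Spark method under review) of why the explicit annotation is preferred:

```scala
// Illustration of the ": Unit =" question above (hypothetical names, not Spark code).
object ProcedureSyntax {
  // Explicit and unambiguous: declared to return Unit.
  def logLength(s: String): Unit = {
    println(s"length=${s.length}")
  }

  // Without an annotation the result type is inferred from the body (here Int);
  // a stray "=" or a missing one can silently change a method's public type.
  def lengthOf(s: String) = s.length
}
```

Spelling out `: Unit =` makes the intended signature part of the source rather than an artifact of inference, which matters for the same binary-compatibility reasons discussed elsewhere in this PR.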
[GitHub] spark pull request: [SPARK-6477][Build]: Run MIMA tests before the...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5145#issuecomment-85267784 LGTM
[GitHub] spark pull request: [SPARK-6478] New RDD.pipeWithPartition method
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5147#issuecomment-85275593 I'm a little hesitant to add a new `withPartition` or `withSplit`-like method, since we've been deprecating those in favor of using things like TaskContext. Can you address your use case with `TaskContext.get` and `printPipeContext`? For example, how about this:

```scala
myRDD.pipe(
  command = ...,
  printPipeContext = p => p("PARTITION=" + TaskContext.get.partitionId()))
```

Is the problem that you want to be able to store the partition as part of the command or environment? If so, then maybe we could generalize this so that the function is invoked with a TaskContext instead of a Partition (in other words, change it to `pipeWithContext`).
[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85275422 [Test build #29042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29042/consoleFull) for PR 4259 at commit [`66a4dc3`](https://github.com/apache/spark/commit/66a4dc31aa56ced603cd1172719dc4510fcdbaa1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5750][SPARK-3441][SPARK-5836][CORE] Add...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5074#discussion_r26987728 --- Diff: docs/programming-guide.md --- @@ -1086,6 +1086,62 @@ for details. /tr /table +### Shuffle operations + +Certain operations within Spark trigger an event known as the shuffle. The shuffle is Spark's +mechanism for re-distributing data so that it's grouped differently across partitions. This typically +involves re-arranging and copying data across executors and machines, making shuffle a complex and +costly operation. + + Background + +To understand what happens during the shuffle we can consider the example of the +[`reduceByKey`](#ReduceByLink) operation. The `reduceByKey` operation generates a new RDD where all +values for a single key are combined into a tuple - the key and the result of executing a reduce +function against all values associated with that key. The challenge is that not all values for a +single key necessarily reside on the same partition, or even the same machine, but they must be +co-located to present a single array per key. + +In Spark, data is generally not distributed across partitions to be in the necessary place for a +specific operation. During computations, a single task will operate on a single partition - thus, to +organize all the data for a single `reduceByKey` reduce task to execute, Spark needs to perform an +all-to-all operation. It must read from all partitions to find all the values for all keys, and then +organize those such that all values for any key lie within the same partition - this is called the +**shuffle**. + +Although the set of elements in each partition of newly shuffled data will be deterministic, the +ordering of these elements is not. If one desires predictably ordered data following shuffle +operations, [`mapPartitions`](#MapPartLink) can be used to sort each partition or `sortBy` can be +used to perform a global sort. A similar operation, +[`repartitionAndSortWithinPartitions`](#Repartition2Link) coupled with `mapPartitions`, +may be used to enact a Hadoop-style shuffle. + +Operations which can cause a shuffle include **repartition** operations like +[`repartition`](#RepartitionLink) and [`coalesce`](#CoalesceLink), **'byKey** operations +(except for counting) like [`groupByKey`](#GroupByLink) and [`reduceByKey`](#ReduceByLink), and +**join** operations like [`cogroup`](#CogroupLink) and [`join`](#JoinLink). + + Performance Impact +**Shuffle** is an expensive operation since it involves disk I/O, data serialization, and +network I/O. To organize data for the shuffle, Spark generates two sets of tasks - map tasks to +organize the data, and a set of reduce tasks to aggregate it. Internally, results from individual +map jobs are kept in memory until they can't fit. Then, these are sorted based on the target reduce --- End diff -- map jobs → map tasks. Slightly more precise to say that they're sorted based on the target partition (because multiple partitions could end up in the same task if `coalesce` is called).
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85235696 retest this please
[GitHub] spark pull request: New RDD.pipeWithPartition method
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5147#issuecomment-85235660 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85235864 [Test build #29036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29036/consoleFull) for PR 3074 at commit [`a2856cd`](https://github.com/apache/spark/commit/a2856cdc99229d96f5b76a619bfbd21105513404). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6122][Core] Upgrade Tachyon client vers...
Github user calvinjia commented on the pull request: https://github.com/apache/spark/pull/4867#issuecomment-85237240 @srowen Oh I see the build still failed after the fix to that patch. Strange that there would be issues between now and the last test run for this patch, since there should not have been any dependency changes.
[GitHub] spark pull request: [SPARK-5338][MESOS] Add cluster mode support f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5144#issuecomment-85243242 [Test build #29038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29038/consoleFull) for PR 5144 at commit [`df925b7`](https://github.com/apache/spark/commit/df925b780348e72e3a6f592590f2e868e74cf8a3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-6406] Launcher backward compatibility i...
Github user nishkamravi2 commented on the pull request: https://github.com/apache/spark/pull/5085#issuecomment-85244894 Thanks for the comments @vanzin. Will address them soon.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85247503 [Test build #29033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29033/consoleFull) for PR 5142 at commit [`c6744b8`](https://github.com/apache/spark/commit/c6744b82776263889c7a5eb7664835419834d28b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4086][GraphX]: Fold-style aggregation f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5142#issuecomment-85247520 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29033/ Test PASSed.
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/5014#discussion_r26994637 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -142,7 +142,7 @@ case class CreateTableAsSelect[T]( tableName: String, child: LogicalPlan, allowExisting: Boolean, -desc: Option[T] = None) extends UnaryNode { +desc: T) extends UnaryNode { --- End diff -- `CreateTableAsSelect` is designed as a common logical plan node; that's why I made `desc` a `T`, and also an optional parameter. Otherwise, every SQL dialect would have to implement its own `CTAS` node (logical plan). Or is `CreateTableUsingAsSelect` a more generic interface for the same purpose?
[GitHub] spark pull request: [SPARK-4123][Project Infra][WIP]: Show new dep...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/5093#discussion_r26996263 --- Diff: dev/tests/pr_new_dependencies.sh --- @@ -0,0 +1,85 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This script follows the base format for testing pull requests against +# another branch and returning results to be published. More details can be +# found at dev/run-tests-jenkins. +# +# Arg1: The Github Pull Request Actual Commit +#+ known as `ghprbActualCommit` in `run-tests-jenkins` +# Arg2: The SHA1 hash +#+ known as `sha1` in `run-tests-jenkins` +# + +ghprbActualCommit=$1 +sha1=$2 + +MVN_BIN=`pwd`/build/mvn +CURR_CP_FILE=my-classpath.txt +MASTER_CP_FILE=master-classpath.txt + +${MVN_BIN} clean compile dependency:build-classpath 2>/dev/null | \ --- End diff -- If that's the case, we should probably gate this entire thing in a check as to whether any pom.xml files are modified. Then for most builds this will not add any time, since most builds do not modify dependencies.
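The gating check suggested above could be sketched roughly as follows, matching the bash of the script under review. This is a minimal illustration, assuming the changed-file list comes from `git diff` against master in the Jenkins environment; the function name and messages are assumptions, not part of the PR:

```shell
#!/usr/bin/env bash
# Sketch of a pom.xml gate: only run the (slow) Maven dependency diff
# when the PR actually touches a pom.xml file.

# Count how many changed paths (one per line on stdin) are pom.xml files.
count_pom_changes() {
  grep -c 'pom\.xml$' || true
}

# In Jenkins the list of changed paths would come from git, e.g.:
#   git diff --name-only master..."$ghprbActualCommit" | count_pom_changes
changed=$(printf 'core/pom.xml\nREADME.md\n' | count_pom_changes)

if [ "$changed" -eq 0 ]; then
  echo "No pom.xml changes; skipping dependency check."
else
  echo "Found $changed modified pom.xml file(s); running dependency check."
fi
```

With a gate like this, the common case (no build-file changes) exits immediately and adds essentially no time to a Jenkins run.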
[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85275731 [Test build #29042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29042/consoleFull) for PR 4259 at commit [`66a4dc3`](https://github.com/apache/spark/commit/66a4dc31aa56ced603cd1172719dc4510fcdbaa1). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85275887 @jkbradley and @mengxr I just rebased it. Will do a couple of optimizations to avoid the scaling on the datasets, which can be done in the optimization instead. You guys can start to give me feedback so we have ample time to address issues before 1.4. Thanks.
[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85275736 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29042/ Test FAILed.
[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85278373 [Test build #29043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29043/consoleFull) for PR 4259 at commit [`ea3e1dc`](https://github.com/apache/spark/commit/ea3e1dc55583d1fdd69c74a0201c5743a0baef2a). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user tnachen commented on a diff in the pull request: https://github.com/apache/spark/pull/4027#discussion_r26988123 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala --- @@ -204,35 +209,43 @@ private[spark] class CoarseMesosSchedulerBackend( for (offer <- offers) { val slaveId = offer.getSlaveId.toString -val mem = getResource(offer.getResourcesList, "mem") -val cpus = getResource(offer.getResourcesList, "cpus").toInt -if (totalCoresAcquired < maxCores && -mem >= MemoryUtils.calculateTotalMemory(sc) && -cpus >= 1 && +var totalMem = getResource(offer.getResourcesList, "mem") +var totalCpus = getResource(offer.getResourcesList, "cpus").toInt --- End diff -- I'm calling it remainingCores
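The capping behavior under discussion in this thread reduces to a small helper: from each offer, acquire only as many cores as still fit under the configured cap. A hedged sketch — the name `coresToAcquire` and the standalone-function shape are assumptions for illustration, not the PR's actual code:

```scala
// From a single Mesos offer, take only the cores that remain under the
// configured cap (what the review thread calls "remainingCores").
// Hypothetical helper for illustration.
def coresToAcquire(offerCpus: Int, totalCoresAcquired: Int, maxCores: Int): Int =
  math.max(0, math.min(offerCpus, maxCores - totalCoresAcquired))
```

For example, with `maxCores = 12` and 10 cores already acquired, an 8-core offer contributes only 2 more cores; once the cap is reached, further offers contribute none.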
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-85239748 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29034/ Test FAILed.
[GitHub] spark pull request: [SPARK-5095][MESOS] Support capping cores and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4027#issuecomment-85239717 [Test build #29034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29034/consoleFull) for PR 4027 at commit [`6d04da1`](https://github.com/apache/spark/commit/6d04da11e44d395416f208a20d250c17c672fcc9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-6480 [CORE] histogram() bucket function ...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/5148 SPARK-6480 [CORE] histogram() bucket function is wrong in some simple edge cases Fix fastBucketFunction for histogram() to handle edge conditions more correctly. Add a test, and fix existing one accordingly You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-6480 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5148.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5148 commit 23ec01e8276478f716ebd6307eb88d7d1581ef14 Author: Sean Owen so...@cloudera.com Date: 2015-03-23T23:21:25Z Fix fastBucketFunction for histogram() to handle edge conditions more correctly. Add a test, and fix existing one accordingly
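The kind of edge-condition fix this PR describes can be sketched as follows for evenly spaced buckets: derive the bucket index from the overall range and clamp the maximum value into the last bucket. This is an illustration of the approach, not necessarily the exact code that was merged:

```scala
// Map a value e into one of `count` evenly spaced buckets over [min, max].
// Edge cases handled: NaN and out-of-range values fall in no bucket, and
// e == max lands in the last bucket rather than one past the end.
// Illustrative sketch of the fix's approach.
def fastBucketFunction(min: Double, max: Double, count: Int)(e: Double): Option[Int] = {
  if (e.isNaN || e < min || e > max) {
    None
  } else {
    // Scale by the whole range rather than dividing by a per-bucket width,
    // which reduces floating-point drift near bucket boundaries.
    val bucketNumber = (((e - min) / (max - min)) * count).toInt
    Some(math.min(bucketNumber, count - 1))
  }
}
```

For instance, with 5 buckets over [0, 10], the value 10.0 would naively index bucket 5 (out of range); the clamp places it in bucket 4.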
[GitHub] spark pull request: [SPARK-5961][Streaming]Allow specific nodes in...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/5114#discussion_r26993189 --- Diff: external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala --- @@ -44,12 +44,14 @@ import org.jboss.netty.handler.codec.compression._ private[streaming] class FlumeInputDStream[T: ClassTag]( - @transient ssc_ : StreamingContext, - host: String, - port: Int, - storageLevel: StorageLevel, - enableDecompression: Boolean -) extends ReceiverInputDStream[SparkFlumeEvent](ssc_) { +@transient ssc_ : StreamingContext, --- End diff -- What has changed in these lines? Why are they in the diff?
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-85254916 [Test build #29036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29036/consoleFull) for PR 3074 at commit [`a2856cd`](https://github.com/apache/spark/commit/a2856cdc99229d96f5b76a619bfbd21105513404). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6420] Driver's Block Manager does not u...
Github user marsishandsome commented on the pull request: https://github.com/apache/spark/pull/5095#issuecomment-85257353 @tgravescs You are right. Maybe we should provide two choices: IP and hostname. Both would be figured out automatically by Spark.
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26995786 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala --- @@ -33,6 +33,8 @@ import org.apache.spark.deploy.master.DriverState import org.apache.spark.deploy.master.DriverState.DriverState import org.apache.spark.util.{Clock, SystemClock} +import scala.collection.mutable --- End diff -- Nit: this should be grouped with the other Scala imports.
[GitHub] spark pull request: [SPARK-6428] Added explicit types for all publ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5125#discussion_r26996110 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -70,7 +71,8 @@ class JdbcRDD[T: ClassTag]( }).toArray } - override def compute(thePart: Partition, context: TaskContext) = new NextIterator[T] { + override def compute(thePart: Partition, context: TaskContext): Iterator[T] = new NextIterator[T] + { --- End diff -- This brace on a line by itself looks a bit funny to me; maybe we could split at the argument list instead?
[GitHub] spark pull request: [SPARK-6322][SQL] CTAS should consider the cas...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/5014#discussion_r26997137 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -142,7 +142,7 @@ case class CreateTableAsSelect[T]( tableName: String, child: LogicalPlan, allowExisting: Boolean, -desc: Option[T] = None) extends UnaryNode { +desc: T) extends UnaryNode { --- End diff -- I think `CreateTableUsingAsSelect` is just for the data source API?