[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151808021 **[Test build #44521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44521/consoleFull)** for PR 9331 at commit [`90927e6`](https://github.com/apache/spark/commit/90927e6e4cd46a6752fe4cdd7d1214112d218278). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43246322

--- Diff: yarn/src/test/scala/org/apache/spark/scheduler/cluster/SimpleExtensionService.scala ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
--- End diff --

got it
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43246273

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -51,6 +51,38 @@ private[spark] abstract class YarnSchedulerBackend(
   private implicit val askTimeout = RpcUtils.askRpcTimeout(sc.conf)
+
+  /** Application ID. Must be set by a subclass before starting the service */
+  private var appId: ApplicationId = null
+
+  /** Attempt ID. This is unset for client-side schedulers */
+  private var attemptId: Option[ApplicationAttemptId] = None
+
+  /** Scheduler extension services */
+  private val services: SchedulerExtensionServices = new SchedulerExtensionServices()
+
+  /**
+   * Bind to YARN. This *must* be done before calling [[start()]].
+   *
+   * @param appId YARN application ID
+   * @param attemptId Optional YARN attempt ID
+   */
+  protected def bindToYARN(appId: ApplicationId, attemptId: Option[ApplicationAttemptId]): Unit = {
--- End diff --

OK
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-151827220 **[Test build #44525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44525/consoleFull)** for PR 8744 at commit [`e89959c`](https://github.com/apache/spark/commit/e89959cb7592e92b4306e357dc200d259ede814d).
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9326#issuecomment-151837034 ok to test
[GitHub] spark pull request: [SPARK-11188][SQL][WIP] Elide stacktraces in b...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9194#issuecomment-151842731 As a side note, I think the target user of `bin/spark-sql` is probably less advanced, so all the info logs probably aren't that useful. It doesn't have to be in this PR, but I'd be generally supportive of shipping a different default log4j config for this binary that logs to the console at WARN instead.
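A default config of the shape marmbrus describes could look like the following log4j 1.x properties sketch. This is illustrative only; the file name, packaging, and pattern layout here are assumptions, not something decided in this thread:

```
# Hypothetical default for bin/spark-sql: console logging at WARN instead of INFO
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```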
[GitHub] spark pull request: Header formatting fix
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/9312#issuecomment-151847282 Hm, OK. The thing is, unfortunately, a fair number of the copyright headers aren't strictly correctly formatted; many start with a javadoc-style opening, for example. Functionally it makes no difference, and the automatic copyright checker deals with it all, so it doesn't really seem worth fixing this up everywhere. I can maybe see fixing this when a file's header is being changed for other reasons anyway.
[GitHub] spark pull request: [spark-11252][network]ShuffleClient should rel...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9227#issuecomment-151819613 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44517/
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43246543

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A Service that can be started and stopped
+ *
+ * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]]
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId optional AttemptID.
+ */
+case class SchedulerExtensionServiceBinding(
+    sparkContext: SparkContext,
+    applicationId: ApplicationId,
+    attemptId: Option[ApplicationAttemptId] = None)
+
+/**
+ * Container for [[SchedulerExtensionService]] instances.
+ *
+ * Loads Extension Services from the configuration property
+ * `"spark.yarn.services"`, instantiates and starts them.
+ * When stopped, it stops all child entries.
+ *
+ * The order in which child extension services are started and stopped
+ * is undefined.
+ */
+private[spark] class SchedulerExtensionServices extends SchedulerExtensionService
+    with Logging {
+  private var services: List[SchedulerExtensionService] = Nil
+  private var sparkContext: SparkContext = _
+  private var appId: ApplicationId = _
+  private var attemptId: Option[ApplicationAttemptId] = _
+  private val started = new AtomicBoolean(false)
+  private var binding: SchedulerExtensionServiceBinding = _
+
+  /**
+   * Binding operation will load the named services and call bind on them too; the
+   * entire set of services are then ready for `init()` and `start()` calls
+   *
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit = {
+    if (started.getAndSet(true)) {
+      logWarning("Ignoring re-entrant start operation")
+      return
+    }
+    require(binding.sparkContext != null, "Null context parameter")
+    require(binding.applicationId != null, "Null appId parameter")
+    this.binding = binding
--- End diff --

OK, saving binding as a field; converting the others to local vars.
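The re-entrancy guard under discussion (an AtomicBoolean that turns repeated start() calls into no-ops, with stop() safe even if start() was never invoked) is self-contained enough to sketch outside Spark. A minimal Java analogue, with all class and method names illustrative rather than Spark's own:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal analogue of the start()/stop() contract discussed above:
// start() ignores re-entrant calls; stop() is idempotent and must
// succeed even if start() was never invoked.
class ExtensionServices {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private int startCount = 0; // visible effect, for demonstration only

    void start(String binding) {
        if (started.getAndSet(true)) {
            return; // ignore re-entrant start
        }
        if (binding == null) {
            throw new IllegalArgumentException("Null binding parameter");
        }
        startCount++; // stands in for "load and start child services"
    }

    void stop() {
        // getAndSet(false) makes repeated stop() calls harmless
        started.getAndSet(false);
    }

    int startCount() {
        return startCount;
    }
}
```

getAndSet gives the check-then-act a single atomic step, so two racing callers cannot both run the start body.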
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-151827062 **[Test build #44526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44526/consoleFull)** for PR 5423 at commit [`2c1db93`](https://github.com/apache/spark/commit/2c1db93bb1fe72a03e4b866741b6b803b30bb2b3).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-151827066 **[Test build #44524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44524/consoleFull)** for PR 9182 at commit [`a4358d5`](https://github.com/apache/spark/commit/a4358d5b23dd2d7db706574124e4c69d1171ffb4).
[GitHub] spark pull request: [SPARK-8835][Streaming] Provide pluggable Cong...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9200#issuecomment-151832701 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8835][Streaming] Provide pluggable Cong...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9200#issuecomment-151832617 **[Test build #44520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44520/consoleFull)** for PR 9200 at commit [`b2dd6b8`](https://github.com/apache/spark/commit/b2dd6b87865eed5519d8ad278e09ba17c1334c6c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class BernoulliSampler[T: ClassTag](fraction: Double,`
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9257#discussion_r43251769

--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -365,33 +366,37 @@ private[spark] class SecurityManager(sparkConf: SparkConf)
    * we throw an exception.
    */
   private def generateSecretKey(): String = {
-    if (!isAuthenticationEnabled) return null
-    // first check to see if the secret is already set, else generate a new one if on yarn
-    val sCookie = if (SparkHadoopUtil.get.isYarnMode) {
-      val secretKey = SparkHadoopUtil.get.getSecretKeyFromUserCredentials(sparkSecretLookupKey)
-      if (secretKey != null) {
-        logDebug("in yarn mode, getting secret from credentials")
-        return new Text(secretKey).toString
+    if (!isAuthenticationEnabled) {
+      null
+    } else if (SparkHadoopUtil.get.isYarnMode) {
+      // In YARN mode, the secure cookie will be created by the driver and stashed in the
+      // user's credentials, where executors can get it. The check for an array of size 0
+      // is because of the test code in YarnSparkHadoopUtilSuite.
+      val secretKey = SparkHadoopUtil.get.getSecretKeyFromUserCredentials(SECRET_LOOKUP_KEY)
+      if (secretKey == null || secretKey.length == 0) {
+        val rnd = new SecureRandom()
+        val length = sparkConf.getInt("spark.authenticate.secretBitLength", 256) / 8
+        val secret = new Array[Byte](length)
+        rnd.nextBytes(secret)
+
+        val cookie = HashCodes.fromBytes(secret).toString()
+        SparkHadoopUtil.get.addSecretKeyToUserCredentials(SECRET_LOOKUP_KEY, cookie)
+        cookie
       } else {
-        logDebug("getSecretKey: yarn mode, secret key from credentials is null")
--- End diff --

ok
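The new code path quoted in this diff draws `spark.authenticate.secretBitLength` random bits with SecureRandom and renders them as a hex cookie via Guava's HashCodes. A JDK-only Java sketch of the same idea, where manual hex encoding stands in for Guava and the names are illustrative:

```java
import java.security.SecureRandom;

class SecretKeyDemo {
    // Mirrors the diff: draw N random bits and render them as a hex cookie.
    static String generateSecretKey(int secretBitLength) {
        int length = secretBitLength / 8;
        byte[] secret = new byte[length];
        new SecureRandom().nextBytes(secret);
        StringBuilder cookie = new StringBuilder();
        for (byte b : secret) {
            cookie.append(String.format("%02x", b)); // two hex chars per byte
        }
        return cookie.toString();
    }
}
```

With the default of 256 bits this yields a 32-byte secret, i.e. a 64-character hex string.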
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9326#issuecomment-151839311 Merged build triggered.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43252081

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A Service that can be started and stopped
+ *
+ * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]]
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId optional AttemptID.
+ */
+case class SchedulerExtensionServiceBinding(
+    sparkContext: SparkContext,
+    applicationId: ApplicationId,
+    attemptId: Option[ApplicationAttemptId] = None)
--- End diff --

When will this be not set? I assume in client mode? Could you mention that in the scaladoc above?
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9326#issuecomment-151841140 **[Test build #44527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44527/consoleFull)** for PR 9326 at commit [`402d8e4`](https://github.com/apache/spark/commit/402d8e495d0fec01c3b7bb7fc8dcdf4efa56d1d2).
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151824901 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44521/
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151824900 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-11295 Add packages to JUnit output for P...
Github user gliptak commented on the pull request: https://github.com/apache/spark/pull/9263#issuecomment-151833067 This last run had a different failure than the previous run with the same code ...
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43253567

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A Service that can be started and stopped
+ *
+ * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]]
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId optional AttemptID.
+ */
+case class SchedulerExtensionServiceBinding(
+    sparkContext: SparkContext,
+    applicationId: ApplicationId,
+    attemptId: Option[ApplicationAttemptId] = None)
+
+/**
+ * Container for [[SchedulerExtensionService]] instances.
+ *
+ * Loads Extension Services from the configuration property
+ * `"spark.yarn.services"`, instantiates and starts them.
+ * When stopped, it stops all child entries.
+ *
+ * The order in which child extension services are started and stopped
+ * is undefined.
+ */
+private[spark] class SchedulerExtensionServices extends SchedulerExtensionService
+    with Logging {
+  private var services: List[SchedulerExtensionService] = Nil
+  private val started = new AtomicBoolean(false)
+  private var binding: SchedulerExtensionServiceBinding = _
+
+  /**
+   * Binding operation will load the named services and call bind on them too; the
+   * entire set of services are then ready for `init()` and `start()` calls
+   *
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit = {
+    if (started.getAndSet(true)) {
+      logWarning("Ignoring re-entrant start operation")
+      return
+    }
+    require(binding.sparkContext != null, "Null context parameter")
+    require(binding.applicationId != null, "Null appId parameter")
+    this.binding = binding
+    val sparkContext = binding.sparkContext
+    val appId = binding.applicationId
+    val attemptId = binding.attemptId
+    logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" +
+      s" and attemptId $attemptId")
+
+    services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES)
+      .map { s =>
+        s.split(",").map(_.trim()).filter(!_.isEmpty)
+          .map { sClass =>
+            val instance = Utils.classForName(sClass)
+              .newInstance()
+              .asInstanceOf[SchedulerExtensionService]
+            // bind this service
+            instance.start(binding)
+            logInfo(s"Service $sClass started")
+            instance
+          }
+      }.map(_.toList).getOrElse(Nil)
--- End diff --

minor: instead of another call to `map` you could add the `toList` call to the code inside the previous closure.
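The review suggestion is to build the list inside the single closure instead of mapping a second time over the Option's result. A JDK Optional/Stream analogue of the config parsing, with hypothetical names throughout, shows the shape being asked for:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class ServiceConfigDemo {
    // Parse a comma-separated service-class list; the List is built inside
    // the single map() call rather than by a second map over its result.
    static List<String> parseServices(Optional<String> conf) {
        return conf
            .map(s -> Arrays.stream(s.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toList()))
            .orElse(List.of());
    }
}
```

Folding the collection step into the closure removes one traversal and keeps the "absent config means empty list" default in a single `orElse`.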
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43254205

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -51,6 +51,38 @@ private[spark] abstract class YarnSchedulerBackend(
   private implicit val askTimeout = RpcUtils.askRpcTimeout(sc.conf)
+
+  /** Application ID. Must be set by a subclass before starting the service */
+  private var appId: ApplicationId = null
+
+  /** Attempt ID. This is unset for client-side schedulers */
+  private var attemptId: Option[ApplicationAttemptId] = None
+
+  /** Scheduler extension services */
+  private val services: SchedulerExtensionServices = new SchedulerExtensionServices()
+
+  /**
+   * Bind to YARN. This *must* be done before calling [[start()]].
+   *
+   * @param appId YARN application ID
+   * @param attemptId Optional YARN attempt ID
+   */
+  protected def bindToYarn(appId: ApplicationId, attemptId: Option[ApplicationAttemptId]): Unit = {
+    this.appId = appId
+    this.attemptId = attemptId
+  }
+
+  override def start() {
+    require(appId != null, "application ID unset")
+    val binding = SchedulerExtensionServiceBinding(sc, appId, attemptId)
+    services.start(binding)
+    super.start()
+  }
+
+  override def stop(): Unit = {
+    super.stop()
--- End diff --

super minor, but maybe do a try..finally here just in case `super.stop()` throws?
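The point of the suggested try..finally is that the extension services still get shut down even when `super.stop()` throws. A minimal Java sketch of that shape, with all names hypothetical:

```java
class BackendStopDemo {
    static boolean servicesStopped = false;

    // Stands in for a super.stop() that can fail mid-shutdown.
    static void stopSuper() {
        throw new IllegalStateException("super.stop() failed");
    }

    // Even when stopSuper() throws, the finally block still runs,
    // so the extension services are always stopped.
    static void stop() {
        try {
            stopSuper();
        } finally {
            servicesStopped = true; // stands in for services.stop()
        }
    }
}
```

The exception still propagates to the caller; the finally block only guarantees the cleanup runs first.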
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43254282 --- Diff: yarn/src/test/scala/org/apache/spark/scheduler/cluster/ExtensionServiceIntegrationSuite.scala --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import org.scalatest.BeforeAndAfter + +import org.apache.spark.{Logging, SparkConf, SparkContext, SparkFunSuite} + +/** + * Test the integration with [[SchedulerExtensionServices]] + */ +class ExtensionServiceIntegrationSuite extends SparkFunSuite + with BeforeAndAfter + with Logging { + + val applicationId = new StubApplicationId(0, L) + val attemptId = new StubApplicationAttemptId(applicationId, 1) + var sparkCtx: SparkContext = _ + + /* + * Setup phase creates the spark context + */ + before { +val sparkConf = new SparkConf() +sparkConf.set(SchedulerExtensionServices.SPARK_YARN_SERVICES, + "org.apache.spark.scheduler.cluster.SimpleExtensionService") --- End diff -- nit: `classOf[SimpleExtensionService].getName()` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
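The nit above — preferring `classOf[SimpleExtensionService].getName()` over a hand-typed class-name string — can be illustrated with a small, self-contained Scala sketch. The class below is a stand-in for Spark's test service, not the real one; the point is that the `classOf` form is compiler-checked and survives renames, while the string literal breaks silently:

```scala
// Stand-in for org.apache.spark.scheduler.cluster.SimpleExtensionService.
class SimpleExtensionService

object ClassNameConfig {
  // Compiler-verified: a rename or move of the class updates this value.
  val checked: String = classOf[SimpleExtensionService].getName

  // Hand-typed literal: nothing catches a typo or a later package move.
  val unchecked: String = "SimpleExtensionService"
}
```

Both values resolve to the same class name today; the difference only shows up when the class is refactored.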
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9330#issuecomment-151850104 **[Test build #44529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44529/consoleFull)** for PR 9330 at commit [`9282e48`](https://github.com/apache/spark/commit/9282e488d58859a473e1413a611719829846971a).
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151810105 **[Test build #44523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44523/consoleFull)** for PR 9196 at commit [`b52a98d`](https://github.com/apache/spark/commit/b52a98d75b340e0f8d290deae528057bb5d28738).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43246410 --- Diff: yarn/src/test/scala/org/apache/spark/scheduler/cluster/StubApplicationAttemptId.scala --- @@ -0,0 +1,50 @@ +/* --- End diff -- I'm using them more in the tests in the later patches. I can (and will) move them into the test helper, but be assured, there's a lot more tests to come.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9330#discussion_r43249575 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala --- @@ -96,46 +97,64 @@ class GroupedIterator private( ret } - def fetchNextGroupIterator(): Boolean = { -if (currentRow != null || input.hasNext) { - val inputIterator = new Iterator[InternalRow] { -// Return true if we have a row and it is in the current group, or if fetching a new row is -// successful. -def hasNext = { - (currentRow != null && keyOrdering.compare(currentGroup, currentRow) == 0) || -fetchNextRowInGroup() -} + private def fetchNextGroupIterator(): Boolean = { +assert(currentIterator eq null) + +if (currentRow.eq(null) && input.hasNext) { + currentRow = input.next() +} + +if (currentRow eq null) { + // These is no data left, return false. + false +} else { + // Skip to next group. + while (input.hasNext && keyOrdering.compare(currentGroup, currentRow) == 0) { +currentRow = input.next() + } + + if (keyOrdering.compare(currentGroup, currentRow) == 0) { +// These is no more group. return false. --- End diff -- nit: "there" or maybe more clearly "we are no longer in the current group, return false."
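The control flow under review — advancing a key-sorted iterator past the remaining rows of the current group so each group is visited exactly once — can be sketched without any Spark dependencies. The names below are illustrative, not the actual `GroupedIterator` API:

```scala
import scala.collection.mutable.ListBuffer

// Walk a key-sorted iterator; record each key once, skipping the
// within-group rows, in the spirit of GroupedIterator's group-skip loop.
object GroupSkipper {
  def groupKeys(sorted: Iterator[(Int, String)]): List[Int] = {
    val keys = ListBuffer.empty[Int]
    var current: Option[Int] = None
    while (sorted.hasNext) {
      val (k, _) = sorted.next()
      if (!current.contains(k)) { // key changed: a new group starts here
        keys += k
        current = Some(k)
      } // otherwise we are still inside the current group; skip the row
    }
    keys.toList
  }
}
```

The real code additionally exposes a lazy per-group iterator and compares rows with a `keyOrdering`; the sketch keeps only the skip-to-next-group shape.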
[GitHub] spark pull request: [SPARK-11313][SQL] implement cogroup on DataSe...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9324#issuecomment-151836209 Thanks! Merging to master.
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r43251011 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala --- @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging System.clearProperty("SPARK_YARN_MODE") } } + + test("Obtain tokens For HiveMetastore") { +val hadoopConf = new Configuration() +hadoopConf.set("hive.metastore.kerberos.principal", "bob") +// thrift picks up on port 0 and bails out, without trying to talk to endpoint +hadoopConf.set("hive.metastore.uris", "http://localhost:0") +val util = new YarnSparkHadoopUtil +val e = intercept[InvocationTargetException] { + util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice") +} +assertNestedHiveException(e) +// expect exception trapping code to unwind this hive-side exception +assertNestedHiveException(intercept[InvocationTargetException] { + util.obtainTokenForHiveMetastore(hadoopConf) +}) + } + + def assertNestedHiveException(e: InvocationTargetException): Throwable = { +val inner = e.getCause +if (inner == null) { + fail("No inner cause", e) +} +if (!inner.isInstanceOf[HiveException]) { + fail(s"Not a hive exception", inner) +} +inner + } + + test("handleTokenIntrospectionFailure") { +val util = new YarnSparkHadoopUtil +// downgraded exceptions +util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe")) --- End diff -- I think that because there's really only one exception that's currently interesting, you need more code to implement this "shared policy" approach than just catching the one interesting exception in each call site. It's true that if you need to modify the policy you'd need to duplicate code (or switch to your current approach), but then do you envision needing to do that? What if the policy for each service needs to be different?
Personally I think that the current approach is a little confusing for someone reading the code (and inconsistent; for example the current code catches `Exception` and then feeds it to a method that matches on `Throwable`), and because the policy is so simple, the sharing argument doesn't justify making the code harder to follow.
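The test under discussion asserts on exceptions nested inside `InvocationTargetException`, the wrapper that reflective calls use: the real failure sits in `getCause`, so assertions must unwrap it. A minimal, Spark-free sketch of that unwrapping, with a plain `RuntimeException` standing in for Hive's `HiveException`:

```scala
import java.lang.reflect.InvocationTargetException

// Reflective invocation wraps the callee's exception; dig out the cause
// before asserting on its type or message.
object CauseUnwrap {
  def innerCause(e: InvocationTargetException): Throwable = {
    val inner = e.getCause
    require(inner != null, "No inner cause")
    inner
  }
}
```

The suite's `assertNestedHiveException` does the same thing with an added `isInstanceOf[HiveException]` check, which this sketch omits to stay dependency-free.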
[GitHub] spark pull request: [SPARK-11313][SQL] implement cogroup on DataSe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9324
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/9257#discussion_r43251680 --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- @@ -365,33 +366,37 @@ private[spark] class SecurityManager(sparkConf: SparkConf) * we throw an exception. */ private def generateSecretKey(): String = { -if (!isAuthenticationEnabled) return null -// first check to see if the secret is already set, else generate a new one if on yarn -val sCookie = if (SparkHadoopUtil.get.isYarnMode) { - val secretKey = SparkHadoopUtil.get.getSecretKeyFromUserCredentials(sparkSecretLookupKey) - if (secretKey != null) { -logDebug("in yarn mode, getting secret from credentials") -return new Text(secretKey).toString +if (!isAuthenticationEnabled) { + null +} else if (SparkHadoopUtil.get.isYarnMode) { + // In YARN mode, the secure cookie will be created by the driver and stashed in the + // user's credentials, where executors can get it. The check for an array of size 0 + // is because of the test code in YarnSparkHadoopUtilSuite. + val secretKey = SparkHadoopUtil.get.getSecretKeyFromUserCredentials(SECRET_LOOKUP_KEY) + if (secretKey == null || secretKey.length == 0) { +val rnd = new SecureRandom() +val length = sparkConf.getInt("spark.authenticate.secretBitLength", 256) / 8 +val secret = new Array[Byte](length) +rnd.nextBytes(secret) + +val cookie = HashCodes.fromBytes(secret).toString() + SparkHadoopUtil.get.addSecretKeyToUserCredentials(SECRET_LOOKUP_KEY, cookie) +cookie } else { -logDebug("getSecretKey: yarn mode, secret key from credentials is null") --- End diff -- I'd prefer to see this one left. Otherwise there is no easy way to see what it's doing for the secret. In general I'm against removing debug stuff unless it's really noisy. This should only be printed once and can be useful for debugging user settings or issues with secrets.
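The code path in the diff derives the shared secret from `SecureRandom`: pick a configurable bit length, fill that many bytes, then encode them as a string. A dependency-free sketch of the same idea — the real code encodes with Guava's `HashCodes.fromBytes`, so the plain hex encoding here is an assumption, not Spark's exact output format:

```scala
import java.security.SecureRandom

// Generate an n-bit random secret and hex-encode it, mirroring the
// SecurityManager.generateSecretKey logic quoted above.
object SecretGen {
  def generate(bitLength: Int): String = {
    val secret = new Array[Byte](bitLength / 8) // e.g. 256 bits -> 32 bytes
    new SecureRandom().nextBytes(secret)
    secret.map(b => f"${b & 0xff}%02x").mkString // 2 hex chars per byte
  }
}
```

With the default of 256 bits this yields a 64-character hex string, which in the real code is then stashed in the user's Hadoop credentials for executors to read.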
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151806056 Merged build triggered.
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151806066 Merged build started.
[GitHub] spark pull request: [SPARK-8835][Streaming] Provide pluggable Cong...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9200#issuecomment-151804357 **[Test build #44520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44520/consoleFull)** for PR 9200 at commit [`b2dd6b8`](https://github.com/apache/spark/commit/b2dd6b87865eed5519d8ad278e09ba17c1334c6c).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43244486 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import java.util.concurrent.atomic.AtomicBoolean + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + +import org.apache.spark.util.Utils +import org.apache.spark.{Logging, SparkContext} + +/** + * An extension service that can be loaded into a Spark YARN scheduler. + * A Service that can be started and stopped + * + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. + */ +trait SchedulerExtensionService { + + /** + * Start the extension service. This should be a no-op if + * called more than once. + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit + + /** + * Stop the service + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. 
+ */ + def stop(): Unit +} + +/** + * Binding information for a [[SchedulerExtensionService]] + * @param sparkContext current spark context + * @param applicationId YARN application ID + * @param attemptId optional AttemptID. + */ +case class SchedulerExtensionServiceBinding( +sparkContext: SparkContext, +applicationId: ApplicationId, +attemptId: Option[ApplicationAttemptId] = None) + +/** + * Container for [[SchedulerExtensionService]] instances. + * + * Loads Extension Services from the configuration property + * `"spark.yarn.services"`, instantiates and starts them. + * When stopped, it stops all child entries. + * + * The order in which child extension services are started and stopped + * is undefined. + * + */ +private[spark] class SchedulerExtensionServices extends SchedulerExtensionService +with Logging { + private var services: List[SchedulerExtensionService] = Nil + private var sparkContext: SparkContext = _ + private var appId: ApplicationId = _ + private var attemptId: Option[ApplicationAttemptId] = _ + private val started = new AtomicBoolean(false) + private var binding: SchedulerExtensionServiceBinding = _ + + /** + * Binding operation will load the named services and call bind on them too; the + * entire set of services are then ready for `init()` and `start()` calls + + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit = { +if (started.getAndSet(true)) { + logWarning("Ignoring re-entrant start operation") + return +} +require(binding.sparkContext != null, "Null context parameter") +require(binding.applicationId != null, "Null appId parameter") +this.binding = binding +sparkContext = binding.sparkContext +appId = binding.applicationId +attemptId = binding.attemptId +logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" + +s" and attemptId $attemptId") --- End diff -- fixed, +lines directly below
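The `start()`/`stop()` contract spelled out in the diff — start is single-shot, and stop MUST be idempotent and succeed even if `start()` was never invoked — is enforced with an `AtomicBoolean` guard. A standalone sketch of that lifecycle pattern (class and field names here are illustrative, not Spark's):

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Lifecycle guard pattern from SchedulerExtensionServices: getAndSet makes
// each transition happen at most once, even under concurrent callers.
class OneShotService {
  private val started = new AtomicBoolean(false)
  private val stopped = new AtomicBoolean(false)
  var startCount = 0 // exposed only so the behavior is observable

  def start(): Unit = {
    if (started.getAndSet(true)) return // ignore re-entrant start
    startCount += 1                     // real code loads child services here
  }

  def stop(): Unit = {
    if (stopped.getAndSet(true)) return // idempotent; safe before start()
    // real code would stop child services here, exactly once
  }
}
```

Because `getAndSet` is atomic, two threads racing into `start()` cannot both pass the guard, which is why the diff uses it instead of a plain `var`.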
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9331#issuecomment-151824678 **[Test build #44521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44521/consoleFull)** for PR 9331 at commit [`90927e6`](https://github.com/apache/spark/commit/90927e6e4cd46a6752fe4cdd7d1214112d218278). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9326#issuecomment-151839373 Merged build started.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9330#issuecomment-151847794 Merged build triggered.
[GitHub] spark pull request: [SPARK-10185] [SQL] Feat sql comma separated p...
Github user edvald commented on the pull request: https://github.com/apache/spark/pull/8416#issuecomment-151805526 Hey all. Just ran into this bug when upgrading to 1.5.1, very glad it was resolved! That said, I may not be able to run the updated code in my scenario - is there a suggested workaround for 1.5.x for loading multiple files, instead of using comma separated paths?
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151808462 Merged build started.
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151808446 Merged build triggered.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9330#discussion_r43249637 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/GroupedIteratorSuite.scala --- @@ -0,0 +1,65 @@ +package org.apache.spark.sql.execution --- End diff -- Need to add the apache header
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9257#issuecomment-151841537 Merged build started.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43252317 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import java.util.concurrent.atomic.AtomicBoolean + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + +import org.apache.spark.util.Utils +import org.apache.spark.{Logging, SparkContext} + +/** + * An extension service that can be loaded into a Spark YARN scheduler. + * A Service that can be started and stopped + * + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. + */ +trait SchedulerExtensionService { + + /** + * Start the extension service. This should be a no-op if + * called more than once. + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit + + /** + * Stop the service + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. 
+ */ + def stop(): Unit +} + +/** + * Binding information for a [[SchedulerExtensionService]] + * @param sparkContext current spark context + * @param applicationId YARN application ID + * @param attemptId optional AttemptID. + */ +case class SchedulerExtensionServiceBinding( +sparkContext: SparkContext, +applicationId: ApplicationId, +attemptId: Option[ApplicationAttemptId] = None) + +/** + * Container for [[SchedulerExtensionService]] instances. + * + * Loads Extension Services from the configuration property + * `"spark.yarn.services"`, instantiates and starts them. + * When stopped, it stops all child entries. + * + * The order in which child extension services are started and stopped + * is undefined. + * + */ +private[spark] class SchedulerExtensionServices extends SchedulerExtensionService +with Logging { + private var services: List[SchedulerExtensionService] = Nil + private val started = new AtomicBoolean(false) + private var binding: SchedulerExtensionServiceBinding = _ + + /** + * Binding operation will load the named services and call bind on them too; the + * entire set of services are then ready for `init()` and `start()` calls + + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit = { +if (started.getAndSet(true)) { + logWarning("Ignoring re-entrant start operation") + return +} +require(binding.sparkContext != null, "Null context parameter") +require(binding.applicationId != null, "Null appId parameter") +this.binding = binding +val sparkContext = binding.sparkContext +val appId = binding.applicationId +val attemptId = binding.attemptId +logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" + + s" and attemptId $attemptId") + +services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES) + .map { s => + s.split(",").map(_.trim()).filter(!_.isEmpty) --- End diff -- nit: indentation here is weird. 
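The expression vanzin flags for indentation — `s.split(",").map(_.trim()).filter(!_.isEmpty)` — is the step that parses the comma-separated `spark.yarn.services` value into class names. A standalone sketch of just that parsing, isolated from the config plumbing:

```scala
// Split a comma-separated config value into class names, trimming
// whitespace and dropping empty entries, as the diff above does.
object ServiceListParser {
  def parse(value: String): Seq[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toSeq
}
```

In the real code each resulting name is then instantiated via `Utils.classForName` and started; the parsing itself is this one line.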
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9257#issuecomment-151841517 Merged build triggered.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43253907 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala --- @@ -51,6 +51,38 @@ private[spark] abstract class YarnSchedulerBackend( private implicit val askTimeout = RpcUtils.askRpcTimeout(sc.conf) + /** Application ID. Must be set by a subclass before starting the service */ + private var appId: ApplicationId = null + + /** Attempt ID. This is unset for client-side schedulers */ --- End diff -- nit: "client mode schedulers" is more in line with how the rest of code refers to things.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-151845427 Looks OK to me, mostly just style nits. Also, needs a rebase.
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151809495 Merged build triggered.
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151809518 Merged build started.
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9196#discussion_r43243744 --- Diff: R/pkg/R/functions.R --- @@ -2111,3 +2133,66 @@ setMethod("ntile", jc <- callJStatic("org.apache.spark.sql.functions", "ntile", as.integer(x)) column(jc) }) + +#' percentRank +#' +#' Window function: returns the relative rank (i.e. percentile) of rows within a window partition. +#' +#' This is computed by: +#' +#' (rank of row in its partition - 1) / (number of rows in the partition - 1) +#' +#' This is equivalent to the PERCENT_RANK function in SQL. +#' +#' @rdname percentRank +#' @name percentRank +#' @family window_funcs +#' @export +#' @examples \dontrun{percentRank()} +setMethod("percentRank", + signature(x = "missing"), + function() { +jc <- callJStatic("org.apache.spark.sql.functions", "percentRank") +column(jc) + }) + +#' rank +#' +#' Window function: returns the rank of rows within a window partition. +#' +#' The difference between rank and denseRank is that denseRank leaves no gaps in ranking +#' sequence when there are ties. That is, if you were ranking a competition using denseRank +#' and had three people tie for second place, you would say that all three were in second +#' place and that the next person came in third. +#' +#' This is equivalent to the RANK function in SQL. +#' +#' @rdname rank +#' @name rank +#' @family window_funcs +#' @export +#' @examples \dontrun{rank()} +setMethod("rank", --- End diff -- Since base::rank() has a different signature from this rank(), it is possible to expose both of them under the same name rank().
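The formula quoted in the docs above can be checked outside Spark. This is a plain-Python sketch (not SparkR's API; the function names here are illustrative) showing how rank, denseRank, and percentRank differ on ties:

```python
def rank(values):
    # Competition ranking, as in SQL RANK: ties share a rank,
    # and the next rank after a tie is skipped (1, 2, 2, 4).
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def dense_rank(values):
    # Dense ranking, as in SQL DENSE_RANK: ties share a rank,
    # with no gaps afterwards (1, 2, 2, 3).
    uniq = sorted(set(values))
    return [uniq.index(v) + 1 for v in values]

def percent_rank(values):
    # (rank of row in its partition - 1) / (rows in partition - 1),
    # matching the PERCENT_RANK formula quoted in the R docs above.
    n = len(values)
    return [(r - 1) / (n - 1) for r in rank(values)]

scores = [10, 20, 20, 30]
print(rank(scores))          # [1, 2, 2, 4]
print(dense_rank(scores))    # [1, 2, 2, 3]
print(percent_rank(scores))  # 0.0, 1/3, 1/3, 1.0
```

With three people tied for second place (as in the denseRank example above), rank would report 2, 2, 2, 5 while denseRank reports 2, 2, 2, 3.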
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151819415 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151819418 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44523/
[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151819173 **[Test build #44523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44523/consoleFull)** for PR 9196 at commit [`b52a98d`](https://github.com/apache/spark/commit/b52a98d75b340e0f8d290deae528057bb5d28738). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [spark-11252][network]ShuffleClient should rel...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9227#issuecomment-151819610 Merged build finished. Test FAILed.
[GitHub] spark pull request: [spark-11252][network]ShuffleClient should rel...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9227#issuecomment-151819457 **[Test build #44517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44517/consoleFull)** for PR 9227 at commit [`f6a2c01`](https://github.com/apache/spark/commit/f6a2c01bb06c03a31f5efce5d7bb634ad364d775). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-151825740 Build triggered.
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-151825768 Build started.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9330#discussion_r43249449 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala --- @@ -83,11 +83,12 @@ class GroupedIterator private( /** Holds a copy of an input row that is in the current group. */ var currentGroup = currentRow.copy() - var currentIterator: Iterator[InternalRow] = null + assert(keyOrdering.compare(currentGroup, currentRow) == 0) --- End diff -- This is the whole row, not just the key. This allows us to do the equality check on the key columns only (which might short circuit) instead of doing a full projection on each row to extract the key columns.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9330#discussion_r43249474 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala --- @@ -83,11 +83,12 @@ class GroupedIterator private( /** Holds a copy of an input row that is in the current group. */ var currentGroup = currentRow.copy() - var currentIterator: Iterator[InternalRow] = null + assert(keyOrdering.compare(currentGroup, currentRow) == 0) + var currentIterator = createGroupValuesIterator() // Return true if we already have the next iterator or fetching a new iterator is successful. - def hasNext: Boolean = currentIterator != null || fetchNextGroupIterator + def hasNext: Boolean = currentIterator.ne(null) || fetchNextGroupIterator --- End diff -- I think these are the same, and I prefer the idiomatic version.
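The behaviour under review — walking a key-sorted input and handing out one group of key-equal rows at a time, comparing only keys rather than whole rows — can be sketched in Python. This is a simplified analogue of what GroupedIterator does, not Spark's implementation:

```python
def grouped_iterator(rows, key):
    """Yield (group_key, rows_in_group) from input already sorted by key.

    Like the Scala GroupedIterator being discussed, each incoming row's key
    is compared against the current group's key; the first mismatch closes
    the current group and starts a new one.
    """
    it = iter(rows)
    try:
        first = next(it)
    except StopIteration:
        return  # empty input: no groups at all
    group = [first]
    for row in it:
        # Key comparison only, not full-row equality (the point made above).
        if key(row) == key(group[0]):
            group.append(row)
        else:
            yield key(group[0]), group
            group = [row]
    yield key(group[0]), group

rows = [("a", 1), ("a", 2), ("b", 3)]
for k, grp in grouped_iterator(rows, key=lambda r: r[0]):
    print(k, grp)
```

A Python generator sidesteps the `currentIterator != null` bookkeeping debated above, since exhaustion is signalled by `StopIteration` rather than a sentinel.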
[GitHub] spark pull request: SPARK-11295 Add packages to JUnit output for P...
Github user gliptak commented on a diff in the pull request: https://github.com/apache/spark/pull/9263#discussion_r43250312 --- Diff: python/pyspark/mllib/tests.py --- @@ -76,7 +76,8 @@ pass ser = PickleSerializer() -sc = SparkContext('local[4]', "MLlib tests") +conf = SparkConf().set("spark.driver.allowMultipleContexts", "true") --- End diff -- Reviewing the tests.py-s https://github.com/apache/spark/blob/master/python/pyspark/streaming/tests.py initiates SparkContext differently: ``` @classmethod def setUpClass(cls): class_name = cls.__name__ conf = SparkConf().set("spark.default.parallelism", 1) cls.sc = SparkContext(appName=class_name, conf=conf) cls.sc.setCheckpointDir("/tmp") @classmethod def tearDownClass(cls): cls.sc.stop() # Clean up in the JVM just in case there has been some issues in Python API try: jSparkContextOption = SparkContext._jvm.SparkContext.get() if jSparkContextOption.nonEmpty(): jSparkContextOption.get().stop() except: pass ``` Could this approach be retrofitted into https://github.com/apache/spark/blob/master/python/pyspark/mllib/tests.py to allow for concurrency?
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43251937 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import java.util.concurrent.atomic.AtomicBoolean + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + +import org.apache.spark.util.Utils +import org.apache.spark.{Logging, SparkContext} --- End diff -- nit: this goes before the previous import
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43252433 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import java.util.concurrent.atomic.AtomicBoolean + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + +import org.apache.spark.util.Utils +import org.apache.spark.{Logging, SparkContext} + +/** + * An extension service that can be loaded into a Spark YARN scheduler. + * A Service that can be started and stopped + * + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. + */ +trait SchedulerExtensionService { + + /** + * Start the extension service. This should be a no-op if + * called more than once. + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit + + /** + * Stop the service + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. 
+ */ + def stop(): Unit +} + +/** + * Binding information for a [[SchedulerExtensionService]] + * @param sparkContext current spark context + * @param applicationId YARN application ID + * @param attemptId optional AttemptID. + */ +case class SchedulerExtensionServiceBinding( +sparkContext: SparkContext, +applicationId: ApplicationId, +attemptId: Option[ApplicationAttemptId] = None) + +/** + * Container for [[SchedulerExtensionService]] instances. + * + * Loads Extension Services from the configuration property + * `"spark.yarn.services"`, instantiates and starts them. + * When stopped, it stops all child entries. + * + * The order in which child extension services are started and stopped + * is undefined. + * + */ +private[spark] class SchedulerExtensionServices extends SchedulerExtensionService +with Logging { + private var services: List[SchedulerExtensionService] = Nil + private val started = new AtomicBoolean(false) + private var binding: SchedulerExtensionServiceBinding = _ + + /** + * Binding operation will load the named services and call bind on them too; the + * entire set of services are then ready for `init()` and `start()` calls + + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit = { +if (started.getAndSet(true)) { + logWarning("Ignoring re-entrant start operation") + return +} +require(binding.sparkContext != null, "Null context parameter") +require(binding.applicationId != null, "Null appId parameter") +this.binding = binding +val sparkContext = binding.sparkContext +val appId = binding.applicationId +val attemptId = binding.attemptId +logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" + + s" and attemptId $attemptId") + +services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES) + .map { s => + s.split(",").map(_.trim()).filter(!_.isEmpty) +.map { sClass => + val instance = Utils.classForName(sClass) +.newInstance() 
--- End diff -- Hmmm... `SchedulerExtensionServices` should probably mention that implementations must have an empty constructor.
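The empty-constructor requirement flagged here falls out of the reflective loading in the diff above: classes named in `spark.yarn.services` are instantiated with `newInstance()`, which takes no arguments. A minimal Python analogue (class and registry names are hypothetical, for illustration only) makes the constraint visible:

```python
class SchedulerExtensionService:
    """Sketch of the trait: start/stop, with stop expected to be idempotent."""
    def start(self, binding): ...
    def stop(self): ...

class AuditService(SchedulerExtensionService):
    def __init__(self):           # must be a no-arg constructor:
        self.started = False      # the loader below calls cls() with no arguments
    def start(self, binding):
        self.started = True

# Stand-in for Utils.classForName; a hypothetical name -> class registry.
REGISTRY = {"AuditService": AuditService}

def load_services(conf_value, binding):
    """Parse the comma-separated service list, instantiate and start each entry,
    mirroring the split/trim/filter/instantiate chain in the quoted Scala."""
    names = [s.strip() for s in conf_value.split(",") if s.strip()]
    services = []
    for name in names:
        instance = REGISTRY[name]()  # fails here if __init__ required arguments
        instance.start(binding)
        services.append(instance)
    return services

svcs = load_services("AuditService, ", binding=None)
```

A service whose `__init__` takes mandatory arguments would raise a `TypeError` at the `REGISTRY[name]()` call, which is exactly why the review asks for the constraint to be documented.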
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43254504 --- Diff: yarn/src/test/scala/org/apache/spark/scheduler/cluster/StubApplicationAttemptId.scala --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + + --- End diff -- nit: nuke the extra empty lines.
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9330#issuecomment-151847822 Merged build started.
[GitHub] spark pull request: [SPARK-11369] [ML] [R] SparkR glm should suppo...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/9331 [SPARK-11369] [ML] [R] SparkR glm should support setting standardize SparkR glm currently supports: ```formula, family = c("gaussian", "binomial"), data, lambda = 0, alpha = 0``` We should also support setting standardize, which has been defined in the [design documentation](https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit) You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-11369 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9331.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9331 commit 90927e6e4cd46a6752fe4cdd7d1214112d218278 Author: Yanbo Liang Date: 2015-10-28T11:12:42Z SparkR glm should support setting standardize
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r43242989 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala --- @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging System.clearProperty("SPARK_YARN_MODE") } } + + test("Obtain tokens For HiveMetastore") { +val hadoopConf = new Configuration() +hadoopConf.set("hive.metastore.kerberos.principal", "bob") +// thrift picks up on port 0 and bails out, without trying to talk to endpoint +hadoopConf.set("hive.metastore.uris", "http://localhost:0") +val util = new YarnSparkHadoopUtil +val e = intercept[InvocationTargetException] { + util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice") +} +assertNestedHiveException(e) +// expect exception trapping code to unwind this hive-side exception +assertNestedHiveException(intercept[InvocationTargetException] { + util.obtainTokenForHiveMetastore(hadoopConf) +}) + } + + def assertNestedHiveException(e: InvocationTargetException): Throwable = { +val inner = e.getCause +if (inner == null) { + fail("No inner cause", e) +} +if (!inner.isInstanceOf[HiveException]) { + fail(s"Not a hive exception", inner) +} +inner + } + + test("handleTokenIntrospectionFailure") { +val util = new YarnSparkHadoopUtil +// downgraded exceptions +util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe")) --- End diff -- As soon as this patch is in I'll turn to [SPARK-11317](https://issues.apache.org/jira/browse/SPARK-11317), which is essentially "apply the same catching, filtering and reporting strategy for HBase tokens as for Hive ones". It's not as critical as this one (token retrieval is working), but as nothing gets logged except "InvocationTargetException" with no stack trace, trying to recognise the issue is a Kerberos auth problem, let alone trying to fix it, is a weekend's effort, rather than 20 minutes worth. 
Because the policy goes in both places, having it separate and re-usable makes it a zero-cut-and-paste reuse, with that single test for failures without having to mock up failures across two separate clauses. And future maintenance costs are kept down if someone ever decides to change the policy again. Would you be happier if I cleaned up the HBase code as part of this same patch? Because I can and it will make the benefits of the factored out behaviour clearer. It's just messy to fix two things in one patch, especially if someone ever needs to play cherry pick or reverting games.
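The shared "catch, filter, report" policy described above can be sketched: expected reflection failures (an optional class missing from the classpath) are downgraded to a log line, while anything else propagates. This is an illustrative Python analogue, not the Scala helper's actual signature:

```python
import logging

log = logging.getLogger("token")

def handle_token_introspection_failure(service, exc):
    """Downgrade benign introspection failures; re-raise real errors.

    Sketch of the reusable policy discussed above: if the optional service
    class (Hive, HBase, ...) is simply not present, log and swallow so token
    retrieval for other services continues; any other failure is re-raised
    with its context intact instead of vanishing inside a bare
    InvocationTargetException.
    """
    if isinstance(exc, (ImportError, AttributeError)):
        # The Python stand-ins for ClassNotFoundException / NoSuchMethodError.
        log.info("%s class not found: %s", service, exc)
        return None
    raise exc

# A missing optional dependency is merely logged:
handle_token_introspection_failure("hive", ImportError("no hive on classpath"))
```

Keeping the filter in one function is what lets a single unit test cover the failure policy for both the Hive and HBase code paths.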
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151809791 **[Test build #44522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44522/consoleFull)** for PR 9232 at commit [`217faba`](https://github.com/apache/spark/commit/217faba0d372ac66c57420372db62244e628da39).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43244882 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala --- @@ -0,0 +1,137 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler.cluster + +import java.util.concurrent.atomic.AtomicBoolean + +import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId} + +import org.apache.spark.util.Utils +import org.apache.spark.{Logging, SparkContext} + +/** + * An extension service that can be loaded into a Spark YARN scheduler. + * A Service that can be started and stopped + * + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. + */ +trait SchedulerExtensionService { + + /** + * Start the extension service. This should be a no-op if + * called more than once. + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit + + /** + * Stop the service + * The `stop()` operation MUST be idempotent, and succeed even if `start()` was + * never invoked. 
+ */ + def stop(): Unit +} + +/** + * Binding information for a [[SchedulerExtensionService]] + * @param sparkContext current spark context + * @param applicationId YARN application ID + * @param attemptId optional AttemptID. + */ +case class SchedulerExtensionServiceBinding( +sparkContext: SparkContext, +applicationId: ApplicationId, +attemptId: Option[ApplicationAttemptId] = None) + +/** + * Container for [[SchedulerExtensionService]] instances. + * + * Loads Extension Services from the configuration property + * `"spark.yarn.services"`, instantiates and starts them. + * When stopped, it stops all child entries. + * + * The order in which child extension services are started and stopped + * is undefined. + * + */ +private[spark] class SchedulerExtensionServices extends SchedulerExtensionService +with Logging { + private var services: List[SchedulerExtensionService] = Nil + private var sparkContext: SparkContext = _ + private var appId: ApplicationId = _ + private var attemptId: Option[ApplicationAttemptId] = _ + private val started = new AtomicBoolean(false) + private var binding: SchedulerExtensionServiceBinding = _ + + /** + * Binding operation will load the named services and call bind on them too; the + * entire set of services are then ready for `init()` and `start()` calls + + * @param binding binding to the spark application and YARN + */ + def start(binding: SchedulerExtensionServiceBinding): Unit = { +if (started.getAndSet(true)) { + logWarning("Ignoring re-entrant start operation") + return +} +require(binding.sparkContext != null, "Null context parameter") +require(binding.applicationId != null, "Null appId parameter") +this.binding = binding +sparkContext = binding.sparkContext +appId = binding.applicationId +attemptId = binding.attemptId +logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" + +s" and attemptId $attemptId") + +services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES) +.map { s 
=> + s.split(",").map(_.trim()).filter(!_.isEmpty) +.map { sClass => +val instance = Utils.classForName(sClass) +.newInstance() +.asInstanceOf[SchedulerExtensionService] +// bind this service +instance.start(binding) +logInfo(s"Service $sClass started") +instance + } +}.map(_.toList).getOrElse(Nil) + } + + /**
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151818616 **[Test build #44522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44522/consoleFull)** for PR 9232 at commit [`217faba`](https://github.com/apache/spark/commit/217faba0d372ac66c57420372db62244e628da39). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `logInfo(s"$service class not found $e")` * `logDebug("$service class not found", e)`
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151818863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44522/
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9232#issuecomment-151818859 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11349] [ML] Support transform string la...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/9302#issuecomment-151806442 Furthermore, I think we should provide a param named ```family``` for ```RFormula``` to indicate the estimator/model that will be applied to the DataFrame transformed by this ```RFormula``` transformer; then we can do a stricter label validation check.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43244977

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---

@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A service that can be started and stopped.
+ *
+ * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service.
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]].
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId optional AttemptID.
+ */
+case class SchedulerExtensionServiceBinding(
+    sparkContext: SparkContext,
+    applicationId: ApplicationId,
+    attemptId: Option[ApplicationAttemptId] = None)
+
+/**
+ * Container for [[SchedulerExtensionService]] instances.
+ *
+ * Loads extension services from the configuration property
+ * `"spark.yarn.services"`, instantiates and starts them.
+ * When stopped, it stops all child entries.
+ *
+ * The order in which child extension services are started and stopped
+ * is undefined.
+ */
+private[spark] class SchedulerExtensionServices extends SchedulerExtensionService
+    with Logging {
+  private var services: List[SchedulerExtensionService] = Nil
+  private var sparkContext: SparkContext = _
+  private var appId: ApplicationId = _
+  private var attemptId: Option[ApplicationAttemptId] = _
+  private val started = new AtomicBoolean(false)
+  private var binding: SchedulerExtensionServiceBinding = _
+
+  /**
+   * Binding operation will load the named services and call bind on them too; the
+   * entire set of services are then ready for `init()` and `start()` calls.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit = {
+    if (started.getAndSet(true)) {
+      logWarning("Ignoring re-entrant start operation")
+      return
+    }
+    require(binding.sparkContext != null, "Null context parameter")
+    require(binding.applicationId != null, "Null appId parameter")
+    this.binding = binding
+    sparkContext = binding.sparkContext
+    appId = binding.applicationId
+    attemptId = binding.attemptId
+    logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" +
+      s" and attemptId $attemptId")
+
+    services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES)
+      .map { s =>
+        s.split(",").map(_.trim()).filter(!_.isEmpty)
+          .map { sClass =>
+            val instance = Utils.classForName(sClass)
+              .newInstance()

--- End diff --

I thought about that, but consider this: when would you want failure to load your listed extension services to be something not to fail on? Do you want it to quietly downgrade, or to fail noisily? Maybe we could make it an option.
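The loading pattern under review (and the start-once guard around it) can be sketched as follows. This is a hedged, self-contained sketch: the names `ExtensionService` and `ExtensionServices` are illustrative, not Spark's actual API, and it uses plain `Class.forName` in place of Spark's `Utils.classForName`.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Illustrative stand-in for SchedulerExtensionService (not Spark's API).
trait ExtensionService {
  def start(): Unit
  def stop(): Unit
}

// Illustrative container: reads a comma-separated list of class names,
// instantiates each reflectively, starts it, and guards the whole operation
// with an AtomicBoolean so a re-entrant start is a no-op.
class ExtensionServices {
  private val started = new AtomicBoolean(false)
  private var services: List[ExtensionService] = Nil

  /** Instantiate and start every class named in `conf`; a bad name fails loudly. */
  def start(conf: Option[String]): List[ExtensionService] = {
    if (started.getAndSet(true)) {
      // ignore the re-entrant start; keep the services from the first call
      return services
    }
    services = conf.map { s =>
      s.split(",").map(_.trim).filter(_.nonEmpty).map { className =>
        val instance = Class.forName(className)
          .getDeclaredConstructor()
          .newInstance()
          .asInstanceOf[ExtensionService]
        instance.start()
        instance
      }.toList
    }.getOrElse(Nil)
    services
  }

  /** Idempotent: safe even if start() never ran. */
  def stop(): Unit = {
    services.foreach(_.stop())
    services = Nil
  }
}
```

The loud-vs-quiet question in the review lives entirely in the `Class.forName` call: wrapping it in a try/catch that logs and skips the entry would be the "quiet downgrade" option, while letting the exception propagate (as above) fails the application fast.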
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-151825736 Build triggered.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-151825730 Build triggered.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-151825769 Build started.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-151825741 Build started.
[GitHub] spark pull request: [SPARK-8835][Streaming] Provide pluggable Cong...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9200#issuecomment-151832704 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44520/ Test FAILed.
[GitHub] spark pull request: [SPARK-11303] [SQL] filter should not be pushe...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9294#issuecomment-151843230 I'm going to pick this into branch-1.5 too.
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9257#issuecomment-151843133 **[Test build #44528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44528/consoleFull)** for PR 9257 at commit [`b4a29bf`](https://github.com/apache/spark/commit/b4a29bf56dbee1e60d36df8d2272e7bfc8794f3b).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43253815

--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---

@@ -17,17 +17,17 @@
 package org.apache.spark.scheduler.cluster

-import scala.collection.mutable.ArrayBuffer
-import scala.concurrent.{Future, ExecutionContext}
+import scala.concurrent.{ExecutionContext, Future}
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}

-import org.apache.spark.{Logging, SparkContext}
 import org.apache.spark.rpc._
-import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
 import org.apache.spark.scheduler._
+import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
 import org.apache.spark.ui.JettyUtils
-import org.apache.spark.util.{ThreadUtils, RpcUtils}
-
-import scala.util.control.NonFatal
+import org.apache.spark.util.{RpcUtils, ThreadUtils}
+import org.apache.spark.{Logging, SparkContext}

--- End diff --

nit: move before previous import
[GitHub] spark pull request: [SPARK-11370][SQL] fix a bug in GroupedIterato...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/9330#discussion_r43255063

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala ---

@@ -83,11 +83,12 @@ class GroupedIterator private(
   /** Holds a copy of an input row that is in the current group. */
   var currentGroup = currentRow.copy()
-  var currentIterator: Iterator[InternalRow] = null
+  assert(keyOrdering.compare(currentGroup, currentRow) == 0)

--- End diff --

Ah, sorry I missed it
[GitHub] spark pull request: [SPARK-11376] [SQL] Removes duplicated `mutabl...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/9335 [SPARK-11376] [SQL] Removes duplicated `mutableRow` field This PR fixes a mistake in the code generated by `GenerateColumnAccessor`. Interestingly, although the code is illegal in Java (the class has two fields with the same name), Janino accepts it happily and accidentally works properly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark spark-11376.fix-generated-code Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9335.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9335 commit 06a56f293bf013f304aac1eee56c2fa3f2bf0f92 Author: Cheng Lian Date: 2015-10-28T14:12:07Z Removes duplicated `mutableRow` field
[GitHub] spark pull request: [SPARK-11376] [SQL] Removes duplicated `mutabl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9335#issuecomment-151863211 Merged build started.
[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-151863159 Merged build triggered.
[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-151863260 Merged build started.
[GitHub] spark pull request: [SPARK-11378][STREAMING] make StreamingContext...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9336#issuecomment-151870654 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9232#discussion_r43266858

--- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---

@@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
     System.clearProperty("SPARK_YARN_MODE")
   }
 }
+
+  test("Obtain tokens For HiveMetastore") {
+    val hadoopConf = new Configuration()
+    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
+    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
+    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
+    val util = new YarnSparkHadoopUtil
+    val e = intercept[InvocationTargetException] {
+      util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
+    }
+    assertNestedHiveException(e)
+    // expect exception trapping code to unwind this hive-side exception
+    assertNestedHiveException(intercept[InvocationTargetException] {
+      util.obtainTokenForHiveMetastore(hadoopConf)
+    })
+  }
+
+  def assertNestedHiveException(e: InvocationTargetException): Throwable = {
+    val inner = e.getCause
+    if (inner == null) {
+      fail("No inner cause", e)
+    }
+    if (!inner.isInstanceOf[HiveException]) {
+      fail(s"Not a hive exception", inner)
+    }
+    inner
+  }
+
+  test("handleTokenIntrospectionFailure") {
+    val util = new YarnSparkHadoopUtil
+    // downgraded exceptions
+    util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe"))

--- End diff --

BTW, if you really want to implement a shared policy, I'd recommend adding something like `scala.util.control.NonFatal`. That makes the exception handling cleaner; it would look more like this:

    try {
      // code that can throw
    } catch {
      case IgnorableException(e) => logDebug(...)
    }
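The extractor-based policy suggested above could be sketched as follows. This is an illustrative assumption, not Spark's actual code: the name `IgnorableException`, the exception types it matches, and the `TokenGuard.introspect` wrapper are all invented for the example, in the style of `scala.util.control.NonFatal`.

```scala
// A custom extractor (hypothetical name) that matches only the exception
// types a token-introspection failure may safely downgrade to a log message.
object IgnorableException {
  def unapply(t: Throwable): Option[Throwable] = t match {
    case _: ClassNotFoundException | _: NoSuchMethodException => Some(t)
    case _ => None
  }
}

object TokenGuard {
  /** Run `body`, downgrading ignorable failures instead of propagating them. */
  def introspect(body: => String): String =
    try {
      body
    } catch {
      // any exception not matched here still propagates to the caller
      case IgnorableException(e) => s"downgraded: ${e.getMessage}"
    }
}
```

Matching on `IgnorableException(e)` keeps the downgrade policy in one place, so every call site that reflects on optional classes shares the same list of ignorable types instead of repeating ad-hoc catch clauses.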
[GitHub] spark pull request: [SPARK-9552] Add force control for killExecuto...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7888#issuecomment-151875843 **[Test build #44534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44534/consoleFull)** for PR 7888 at commit [`c23f887`](https://github.com/apache/spark/commit/c23f887b62a75415bab74036e78d03b92b1a5541).
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-151876039 Build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-1537] [YARN] [WiP] Add history provider...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-151875771 **[Test build #44526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44526/consoleFull)** for PR 5423 at commit [`2c1db93`](https://github.com/apache/spark/commit/2c1db93bb1fe72a03e4b866741b6b803b30bb2b3). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-151877399 Build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11315] [YARN] WiP Add YARN extension se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8744#issuecomment-151878000 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44525/ Test PASSed.
[GitHub] spark pull request: [SPARK-11073] [core] [yarn] Remove akka depend...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9257#issuecomment-151880853 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-151883050 Merged build finished. Test PASSed.
[GitHub] spark pull request: SPARK-11371 Make "mean" an alias for "avg" ope...
GitHub user ted-yu opened a pull request: https://github.com/apache/spark/pull/9332 SPARK-11371 Make "mean" an alias for "avg" operator From Reynold in the thread 'Exception when using some aggregate operators' (http://search-hadoop.com/m/q3RTt0xFr22nXB4/): I don't think these are bugs. The SQL standard for average is "avg", not "mean". Similarly, a distinct count is supposed to be written as "count(distinct col)", not "countDistinct(col)". We can, however, make "mean" an alias for "avg" to improve compatibility between DataFrame and SQL. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ted-yu/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9332.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9332 commit f1447f0cba860a84ed60929b4871936198fe4150 Author: tedyu Date: 2015-10-28T14:12:12Z SPARK-11371 Make "mean" an alias for "avg" operator
[GitHub] spark pull request: [SPARK-11376] [SQL] Removes duplicated `mutabl...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/9335#issuecomment-151862392 cc @davies
[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-151866430 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44532/ Test FAILed.
[GitHub] spark pull request: [SPARK-8542][MLlib]PMML export for Decision Tr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7842#issuecomment-151866425 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11378][STREAMING] make StreamingContext...
GitHub user manygrams opened a pull request: https://github.com/apache/spark/pull/9336 [SPARK-11378][STREAMING] make StreamingContext.awaitTerminationOrTimeout return properly This adds a failing test checking that `awaitTerminationOrTimeout` returns the expected value, and then fixes that failing test with the addition of a `return`. @tdas @zsxwing You can merge this pull request into a Git repository by running: $ git pull https://github.com/manygrams/spark fix_await_termination_or_timeout Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9336.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9336 commit 7bd9a3fc8593c8d1dce07a9223683bbb8d39cf10 Author: Nick Evans Date: 2015-10-28T14:40:41Z make StreamingContext.awaitTerminationOrTimeout return properly