[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9253#issuecomment-152155372 Looks great! I look forward to getting this merged. Once you address the comments I will do so. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9253#issuecomment-152155531 Merged build started.
[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9282#issuecomment-152155500 Merged build triggered.
[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9282#issuecomment-152155517 Merged build started.
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9253#issuecomment-152155505 Merged build triggered.
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152173539 Merged build started.
[GitHub] spark pull request: [SPARK-11400][SQL] BroadcastNestedLoopJoin sho...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9351#issuecomment-152173452
**[Test build #44587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44587/consoleFull)** for PR 9351 at commit [`16c5b89`](https://github.com/apache/spark/commit/16c5b8914a49eb2a55e68fe5cf7022a5fcee34fc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152173505 Merged build triggered.
[GitHub] spark pull request: [SPARK-11400][SQL] BroadcastNestedLoopJoin sho...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9351#issuecomment-152173585 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44587/ Test PASSed.
[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9258#issuecomment-152178361 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/9339#issuecomment-152178217 That's a known flaky pyspark test. Change LGTM.
[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...
Github user dragos commented on the pull request: https://github.com/apache/spark/pull/9282#issuecomment-152180069 The serializer delegates to the context class loader to instantiate the classes it receives over the wire. When this class loader is missing (`null`), the JVM looks up the class in the *primordial* class loader, which usually contains only the JDK classes.
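The behaviour described in the comment above can be reproduced outside Spark. The following is only a minimal sketch, assuming the lookup is done with the three-argument `Class.forName`, which falls back to the bootstrap ("primordial") loader when given `null`. The names `ContextClassLoaderSketch`, `Marker`, `resolve` and `withContextClassLoader` are made up for illustration and are not part of the pull request.

```scala
// Sketch only: all names here are hypothetical and not taken from Spark.
object ContextClassLoaderSketch {

  // An application-level class, i.e. one that is not part of the JDK.
  class Marker

  // Resolve a class by name through the current thread's context class loader.
  // When that loader is null, Class.forName consults the bootstrap loader only.
  def resolve(className: String): Class[_] = {
    val contextLoader = Thread.currentThread().getContextClassLoader
    Class.forName(className, false, contextLoader)
  }

  // Run `body` with the given context class loader, restoring the old one afterwards.
  def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
    val thread = Thread.currentThread()
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try body finally thread.setContextClassLoader(previous)
  }
}
```

With a `null` context class loader, `resolve("java.lang.String")` still succeeds, while resolving the name of `Marker` is expected to fail with a `ClassNotFoundException`. Setting the loader that actually loaded the application classes makes the same lookup succeed.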
[GitHub] spark pull request: [SPARK-11393][SQL] CoGroupedIterator should re...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/9346#issuecomment-152157478 Maybe "not idempotent" is not the right term for this problem. `GroupedIterator` has a special [constraint](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala#L93-L95) that differs from a normal iterator, and `CoGroupedIterator` breaks this constraint under the condition described in the PR description.
[GitHub] spark pull request: [SPARK-11393][SQL] CoGroupedIterator should re...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/9346#issuecomment-152158690 btw, now that https://github.com/apache/spark/pull/9330 has been merged, the problem is not that an extra empty group is generated but that the last group becomes empty.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43377401
--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A Service that can be started and stopped.
+ *
+ * 1. For implementations to be loadable by [[SchedulerExtensionServices]],
+ * they must provide an empty constructor.
+ * 2. The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]].
+ *
+ * The attempt ID will be set if the service is started within a YARN application master;
+ * there is then a different attempt ID for every time that AM is restarted.
+ * When the service binding is instantiated on a client, there's no attempt ID, as it lacks
+ * this information.
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId YARN attemptID -if known.
+ */
+case class SchedulerExtensionServiceBinding(
+    sparkContext: SparkContext,
+    applicationId: ApplicationId,
+    attemptId: Option[ApplicationAttemptId] = None)
+
+/**
+ * Container for [[SchedulerExtensionService]] instances.
+ *
+ * Loads Extension Services from the configuration property
+ * `"spark.yarn.services"`, instantiates and starts them.
+ * When stopped, it stops all child entries.
+ *
+ * The order in which child extension services are started and stopped
+ * is undefined.
+ *
+ */
+private[spark] class SchedulerExtensionServices extends SchedulerExtensionService
+    with Logging {
+  private var services: List[SchedulerExtensionService] = Nil
+  private val started = new AtomicBoolean(false)
+  private var binding: SchedulerExtensionServiceBinding = _
+
+  /**
+   * Binding operation will load the named services and call bind on them too; the
+   * entire set of services are then ready for `init()` and `start()` calls.
+   *
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit = {
+    if (started.getAndSet(true)) {
+      logWarning("Ignoring re-entrant start operation")
+      return
+    }
+    require(binding.sparkContext != null, "Null context parameter")
+    require(binding.applicationId != null, "Null appId parameter")
+    this.binding = binding
+    val sparkContext = binding.sparkContext
+    val appId = binding.applicationId
+    val attemptId = binding.attemptId
+    logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" +
+      s" and attemptId $attemptId")
+
+    services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES)
+      .map { s =>
+        s.split(",").map(_.trim()).filter(!_.isEmpty)
+          .map { sClass =>
+            val instance = Utils.classForName(sClass)
+
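The Scaladoc in the diff above asks that `start()` be a no-op when called more than once and that `stop()` be idempotent and safe even if `start()` never ran. One way an implementation might meet that contract is sketched below. It is not code from the pull request: `ExtensionServiceSketch` and `CountingService` are hypothetical, and the binding argument is replaced by a plain `String` so the example is self-contained.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for the trait in the diff; the real start() takes a
// SchedulerExtensionServiceBinding rather than a String.
trait ExtensionServiceSketch {
  def start(appId: String): Unit
  def stop(): Unit
}

// Counts how often the real start/stop work runs, to make the contract observable.
class CountingService extends ExtensionServiceSketch {
  private val started = new AtomicBoolean(false)
  @volatile var startCount = 0
  @volatile var stopCount = 0

  override def start(appId: String): Unit = {
    // getAndSet returns the previous value, so only the first caller does the work.
    if (!started.getAndSet(true)) {
      startCount += 1
    }
  }

  override def stop(): Unit = {
    // Do the shutdown work only if start() actually ran; otherwise return quietly.
    if (started.getAndSet(false)) {
      stopCount += 1
    }
  }
}
```

Calling `stop()` before `start()` leaves both counters at zero, and repeated `start()` or `stop()` calls perform their work once. Resetting the flag in `stop()` also allows a later restart, which is a design choice of this sketch and not something the Scaladoc requires.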
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-152164064 **[Test build #44593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44593/consoleFull)** for PR 9182 at commit [`8a6a1f1`](https://github.com/apache/spark/commit/8a6a1f13235fd00dcc58c4106b0314098f961e67).
[GitHub] spark pull request: [SPARK-10849][SQL] Adding field metadata prope...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9352#issuecomment-152169589
**[Test build #44586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44586/consoleFull)** for PR 9352 at commit [`4048c2d`](https://github.com/apache/spark/commit/4048c2dc5626e926a04774bffecaf7c6a6ac4cf7).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10849][SQL] Adding field metadata prope...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9352#issuecomment-152169728 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44586/ Test FAILed.
[GitHub] spark pull request: [SPARK-11400][SQL] BroadcastNestedLoopJoin sho...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9351#issuecomment-152173584 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10641][SQL] Add Skewness and Kurtosis S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-152174392 Merged build started.
[GitHub] spark pull request: [SPARK-10641][SQL] Add Skewness and Kurtosis S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-152174382 Merged build triggered.
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152176171
**[Test build #44596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44596/consoleFull)** for PR 9287 at commit [`281d4f4`](https://github.com/apache/spark/commit/281d4f44f7c237a8a76db47ea61a4c981a28a409).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class ExternalShuffleService(`
   * ` case class BoundPortsResponse(`
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152176180 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152176182 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44596/ Test FAILed.
[GitHub] spark pull request: [SPARK-10749][MESOS] Support multiple roles wi...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/8872#issuecomment-152151363 @tnachen can you address the comments? I would like to get this merged. Also it's still failing style tests.
[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9291#issuecomment-152151242 ok to test
[GitHub] spark pull request: [SPARK-11361][Streaming] Show scopes of RDD op...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9315#issuecomment-152151178 @tdas This looks good. I just think the code can be simplified a little.
[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9282#issuecomment-152156815 **[Test build #44591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44591/consoleFull)** for PR 9282 at commit [`ec1c11b`](https://github.com/apache/spark/commit/ec1c11b6f599d950abb5c0496c2c85c5951f9fa7).
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-152173191 @JasmineGeorge, it would be great if you could add a test to the validator to ensure the exported XML file can be loaded in JPMML and scores the same results. Please use my latest branch: https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class I renamed the datasets to be generic so that we can use them for different algorithms; for example, iris can be used for both k-means and multiclass logistic regression.
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user jacek-lewandowski commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152174907 jenkins, test this please
[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9340#issuecomment-152177498 Merged build triggered.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43374435
--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -17,17 +17,17 @@
 package org.apache.spark.scheduler.cluster
-import scala.collection.mutable.ArrayBuffer
-import scala.concurrent.{Future, ExecutionContext}
+import scala.concurrent.{ExecutionContext, Future}
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
-import org.apache.spark.{Logging, SparkContext}
 import org.apache.spark.rpc._
-import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
 import org.apache.spark.scheduler._
+import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._
 import org.apache.spark.ui.JettyUtils
-import org.apache.spark.util.{ThreadUtils, RpcUtils}
-
-import scala.util.control.NonFatal
+import org.apache.spark.util.{RpcUtils, ThreadUtils}
+import org.apache.spark.{Logging, SparkContext}
--- End diff --
I know what's up. It's sorting alphabetically within a group, and `{` comes after the alphabet, so child packages come first. I'll review these things by hand and will have to do the same through the other patches. Something to call out in the Spark style guide, maybe: it does cover the IDEA import patterns, but not this quirk.
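The ordering quirk described above can be checked with a plain string sort. This is a rough sketch that assumes the import organiser compares import paths character by character, which the comment suggests but does not confirm. `ImportOrderSketch` is a made-up name, and the import strings are copied from the diff.

```scala
// Sketch only: demonstrates character-code ordering, not the actual IDE algorithm.
object ImportOrderSketch {

  // The org.apache.spark imports from the diff, deliberately listed out of order.
  val imports: Seq[String] = Seq(
    "org.apache.spark.{Logging, SparkContext}",
    "org.apache.spark.util.{RpcUtils, ThreadUtils}",
    "org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._",
    "org.apache.spark.rpc._",
    "org.apache.spark.ui.JettyUtils",
    "org.apache.spark.scheduler._"
  )

  // Default String ordering compares character codes, so '{' (0x7B) sorts after
  // every lowercase letter ('z' is 0x7A) and '_' (0x5F) sorts before them.
  def sortedImports: Seq[String] = imports.sorted

  def main(args: Array[String]): Unit =
    sortedImports.foreach(println)
}
```

Under this ordering the selector import `org.apache.spark.{Logging, SparkContext}` ends up last in its group, after all child packages, which matches the order on the `+` side of the diff.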
[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9291#issuecomment-152153722 **[Test build #44589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44589/consoleFull)** for PR 9291 at commit [`8bcb3dc`](https://github.com/apache/spark/commit/8bcb3dc16dd07916ef829bceced46f1d436d1b10).
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/9253#discussion_r43375184
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -706,6 +706,23 @@ abstract class RDD[T: ClassTag](
   }
   /**
+   * Spark's internal mapPartitions method which skips closure cleaning. To be used carefully
+   * only if we are sure that the RDD elements are serializable and don't require closure
+   * cleaning
+   *
+   * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
--- End diff --
just use `@param` here
[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9282#issuecomment-152155075 Looks like this is a regression from 1.5.1 so we should definitely fix it. Even though this change is only one line it could change a lot of things. Can we verify that it doesn't cause any new regressions? @dragos can you explain to us the root cause of the issue?
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9253#issuecomment-152155205 retest this please
[GitHub] spark pull request: [SPARK-9552] Add force control for killExecuto...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/7888#issuecomment-152151508 @vanzin can you have a look?
[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9258#issuecomment-152152043 Merged build started.
[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9258#issuecomment-152151993 Merged build triggered.
[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9291#issuecomment-152152042 Merged build started.
[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9291#issuecomment-152151986 Merged build triggered.
[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9291#issuecomment-152180565 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9291#issuecomment-152180570 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44589/ Test PASSed.
[GitHub] spark pull request: [SPARK-11403] Log something when killing execu...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/9355#discussion_r43390028
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -238,7 +238,7 @@ object YarnSparkHadoopUtil {
     if (Utils.isWindows) {
       escapeForShell("-XX:OnOutOfMemoryError=taskkill /F /PID p")
     } else {
-      "-XX:OnOutOfMemoryError='kill %p'"
+      "-XX:OnOutOfMemoryError='echo OnOutOfMemoryError; kill %p'"
--- End diff --
Does this require `bash` to interpret, and do we know the JVM would execute the command in a shell? If you've tested this and it works, OK.
[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9258#issuecomment-152151854 retest this please
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43373740
--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/SchedulerExtensionService.scala ---
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler.cluster
+
+import java.util.concurrent.atomic.AtomicBoolean
+
+import org.apache.hadoop.yarn.api.records.{ApplicationAttemptId, ApplicationId}
+
+import org.apache.spark.util.Utils
+import org.apache.spark.{Logging, SparkContext}
+
+/**
+ * An extension service that can be loaded into a Spark YARN scheduler.
+ * A Service that can be started and stopped.
+ *
+ * 1. For implementations to be loadable by [[SchedulerExtensionServices]],
+ * they must provide an empty constructor.
+ * 2. The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+ * never invoked.
+ */
+trait SchedulerExtensionService {
+
+  /**
+   * Start the extension service. This should be a no-op if
+   * called more than once.
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit
+
+  /**
+   * Stop the service
+   * The `stop()` operation MUST be idempotent, and succeed even if `start()` was
+   * never invoked.
+   */
+  def stop(): Unit
+}
+
+/**
+ * Binding information for a [[SchedulerExtensionService]].
+ *
+ * The attempt ID will be set if the service is started within a YARN application master;
+ * there is then a different attempt ID for every time that AM is restarted.
+ * When the service binding is instantiated on a client, there's no attempt ID, as it lacks
+ * this information.
+ * @param sparkContext current spark context
+ * @param applicationId YARN application ID
+ * @param attemptId YARN attemptID -if known.
+ */
+case class SchedulerExtensionServiceBinding(
+    sparkContext: SparkContext,
+    applicationId: ApplicationId,
+    attemptId: Option[ApplicationAttemptId] = None)
+
+/**
+ * Container for [[SchedulerExtensionService]] instances.
+ *
+ * Loads Extension Services from the configuration property
+ * `"spark.yarn.services"`, instantiates and starts them.
+ * When stopped, it stops all child entries.
+ *
+ * The order in which child extension services are started and stopped
+ * is undefined.
+ *
+ */
+private[spark] class SchedulerExtensionServices extends SchedulerExtensionService
+    with Logging {
+  private var services: List[SchedulerExtensionService] = Nil
+  private val started = new AtomicBoolean(false)
+  private var binding: SchedulerExtensionServiceBinding = _
+
+  /**
+   * Binding operation will load the named services and call bind on them too; the
+   * entire set of services are then ready for `init()` and `start()` calls.
+   *
+   * @param binding binding to the spark application and YARN
+   */
+  def start(binding: SchedulerExtensionServiceBinding): Unit = {
+    if (started.getAndSet(true)) {
+      logWarning("Ignoring re-entrant start operation")
+      return
+    }
+    require(binding.sparkContext != null, "Null context parameter")
+    require(binding.applicationId != null, "Null appId parameter")
+    this.binding = binding
+    val sparkContext = binding.sparkContext
+    val appId = binding.applicationId
+    val attemptId = binding.attemptId
+    logInfo(s"Starting Yarn extension services with app ${binding.applicationId}" +
+      s" and attemptId $attemptId")
+
+    services = sparkContext.getConf.getOption(SchedulerExtensionServices.SPARK_YARN_SERVICES)
+      .map { s =>
+        s.split(",").map(_.trim()).filter(!_.isEmpty)
+          .map { sClass =>
+            val instance = Utils.classForName(sClass)
+
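The lifecycle contract stated in the Scaladoc of the quoted diff (`start()` is a no-op when called again, `stop()` succeeds even if `start()` was never invoked) can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `ExtensionService`, `ExtensionServiceContainer` and the `String` application ID are hypothetical stand-ins, not Spark's or YARN's API. Unlike the quoted diff, the sketch validates its argument before flipping the `started` flag, so a rejected call does not leave the container marked as started.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the extension-service contract; not Spark's API. */
interface ExtensionService {
    void start(String applicationId);
    void stop();
}

/** Starts its children at most once and tolerates stop() before start(). */
class ExtensionServiceContainer implements ExtensionService {
    private final List<ExtensionService> children;
    private final AtomicBoolean started = new AtomicBoolean(false);

    ExtensionServiceContainer(List<ExtensionService> children) {
        this.children = new ArrayList<>(children);
    }

    @Override
    public void start(String applicationId) {
        // Validate first so a bad argument does not mark the container as started.
        if (applicationId == null) {
            throw new IllegalArgumentException("Null appId parameter");
        }
        // getAndSet returns the previous value, so only the first caller proceeds.
        if (started.getAndSet(true)) {
            return;
        }
        for (ExtensionService child : children) {
            child.start(applicationId);
        }
    }

    @Override
    public void stop() {
        // Children are stopped only if start() ran; repeated calls are no-ops.
        if (started.getAndSet(false)) {
            for (ExtensionService child : children) {
                child.stop();
            }
        }
    }
}

class IdempotentServiceDemo {
    public static void main(String[] args) {
        ExtensionService printer = new ExtensionService() {
            @Override public void start(String applicationId) {
                System.out.println("start " + applicationId);
            }
            @Override public void stop() {
                System.out.println("stop");
            }
        };
        ExtensionServiceContainer container =
            new ExtensionServiceContainer(Arrays.asList(printer));
        container.stop();            // safe: start() was never invoked
        container.start("app-1");
        container.start("app-1");    // ignored: already started
        container.stop();
    }
}
```

Using `AtomicBoolean.getAndSet` keeps the check-and-flip a single atomic step, which is why two racing `start()` calls cannot both reach the children.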
[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9282#issuecomment-152154833 retest this please
[GitHub] spark pull request: [SPARK-11383][Docs] Replaced example code in m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9353#issuecomment-152157846 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-152162385 Merged build started.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-152162358 Merged build triggered.
[GitHub] spark pull request: [SPARK-11338][WebUI] Prepend app links on Hist...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9291#issuecomment-152180307
**[Test build #44589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44589/consoleFull)** for PR 9291 at commit [`8bcb3dc`](https://github.com/apache/spark/commit/8bcb3dc16dd07916ef829bceced46f1d436d1b10).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-152188739 Merged build triggered.
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/9253#discussion_r43375231
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -706,6 +706,23 @@ abstract class RDD[T: ClassTag](
   }
   /**
+   * Spark's internal mapPartitions method which skips closure cleaning. To be used carefully
+   * only if we are sure that the RDD elements are serializable and don't require closure
+   * cleaning
--- End diff --
can you add that this is mainly for performance improvements? Also you're missing a period at the end.
[GitHub] spark pull request: [SPARK-10641][SQL] Add Skewness and Kurtosis S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9003#issuecomment-152175141 **[Test build #44595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44595/consoleFull)** for PR 9003 at commit [`ff363cc`](https://github.com/apache/spark/commit/ff363cca57e2b1c2bb28e281d014d33b930fd603).
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152175275 Merged build triggered.
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152175308 Merged build started.
[GitHub] spark pull request: SPARK-11402: Use ChildRunnerProvider to create...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9354#issuecomment-152182642 Merged build started.
[GitHub] spark pull request: SPARK-11402: Use ChildRunnerProvider to create...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9354#issuecomment-152182613 Merged build triggered.
[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9282#issuecomment-152187108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44591/ Test PASSed.
[GitHub] spark pull request: [SPARK-10986][Mesos] Set the context class loa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9282#issuecomment-152187102 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9253#issuecomment-152156215 **[Test build #44592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44592/consoleFull)** for PR 9253 at commit [`6a9f738`](https://github.com/apache/spark/commit/6a9f738bb3008cadc7ce855fd33115fbb29d1c0a).
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/9182#discussion_r43378157
--- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
@@ -51,6 +51,41 @@ private[spark] abstract class YarnSchedulerBackend(
   private implicit val askTimeout = RpcUtils.askRpcTimeout(sc.conf)
+  /** Application ID. Must be set by a subclass before starting the service */
+  private var appId: ApplicationId = null
+
+  /** Attempt ID. This is unset for client-mode schedulers */
+  private var attemptId: Option[ApplicationAttemptId] = None
+
+  /** Scheduler extension services */
+  private val services: SchedulerExtensionServices = new SchedulerExtensionServices()
+
+  /**
+    * Bind to YARN. This *must* be done before calling [[start()]].
+    *
+    * @param appId YARN application ID
+    * @param attemptId Optional YARN attempt ID
+    */
+  protected def bindToYarn(appId: ApplicationId, attemptId: Option[ApplicationAttemptId]): Unit = {
+    this.appId = appId
+    this.attemptId = attemptId
+  }
+
+  override def start() {
+    require(appId != null, "application ID unset")
+    val binding = SchedulerExtensionServiceBinding(sc, appId, attemptId)
+    services.start(binding)
--- End diff --
But do you need the parsed information? e.g. `ApplicationId` has a "cluster timestamp" and an id; I don't see much use in providing those separately to these services, the string id seems good enough in my view.
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152175077 **[Test build #44594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44594/consoleFull)** for PR 9287 at commit [`281d4f4`](https://github.com/apache/spark/commit/281d4f44f7c237a8a76db47ea61a4c981a28a409).
[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9340#issuecomment-152177512 Merged build started.
[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9340#issuecomment-152179766 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9340#issuecomment-152179768 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44597/ Test FAILed.
[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9340#issuecomment-152179761
**[Test build #44597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44597/consoleFull)** for PR 9340 at commit [`ab42465`](https://github.com/apache/spark/commit/ab42465d44393a869fef7a3d9f674f77f9155793).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaAssociationRulesExample`
   * `public class JavaPrefixSpanExample`
   * `public class JavaSimpleFPGrowth`
[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9340#issuecomment-152181074 Merged build triggered.
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Github user JasmineGeorge commented on the pull request: https://github.com/apache/spark/pull/9057#issuecomment-152181045 Sorry, I can't get to it until next Wednesday. Can someone else take over?
[GitHub] spark pull request: [SPARK-11235] [network] Add ability to stream ...
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9206#discussion_r43389586

    --- Diff: network/common/src/main/java/org/apache/spark/network/util/TransportFrameDecoder.java ---
    @@ -0,0 +1,146 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.network.util;
    +
    +import com.google.common.base.Preconditions;
    +import io.netty.buffer.ByteBuf;
    +import io.netty.buffer.CompositeByteBuf;
    +import io.netty.channel.ChannelHandlerContext;
    +import io.netty.channel.ChannelInboundHandlerAdapter;
    +
    +/**
    + * A customized frame decoder that allows intercepting raw data.
    + *
    + * This behaves like Netty's frame decoder (with harcoded parameters that match this library's
    + * needs), except it allows an interceptor to be installed to read data directly before it's
    + * framed.
    + *
    + * Unlike Netty's frame decoder, each frame is dispatched to child handlers as soon as it's
    + * decoded, instead of building as many frames as the current buffer allows and dispatching
    + * all of them. This allows a child handler to install an interceptor if needed.
    + *
    + * If an interceptor is installed, framing stops, and data is instead fed directly to the
    + * interceptor. When the interceptor indicates that it doesn't need to read any more data,
    + * framing resumes. Interceptors should not hold references to the data buffers provided
    + * to their handle() method.
    + */
    +public class TransportFrameDecoder extends ChannelInboundHandlerAdapter {
    +
    +  public static final String HANDLER_NAME = "frameDecoder";
    +  private static final int LENGTH_SIZE = 8;
    +  private static final int MAX_FRAME_SIZE = Integer.MAX_VALUE;
    +
    +  private CompositeByteBuf buffer;
    +  private volatile Interceptor interceptor;
    +
    +  @Override
    +  public void channelRead(ChannelHandlerContext ctx, Object data) throws Exception {
    +    ByteBuf in = (ByteBuf) data;
    +
    +    if (buffer == null) {
    +      buffer = in.alloc().compositeBuffer();
    +    }
    +
    +    buffer.writeBytes(in);
    +
    +    while (buffer.isReadable()) {
    +      feedInterceptor();
    +      if (interceptor != null) {
    +        continue;
    +      }
    +
    +      ByteBuf frame = decodeNext();
    +      if (frame != null) {
    +        ctx.fireChannelRead(frame);
    +      } else {
    +        break;
    +      }
    +    }
    +
    +    // We can't discard read sub-buffers if there are other references to the buffer (e.g.
    +    // through slices used for framing). This assumes that code that retains references
    +    // will call retain() from the thread that called "fireChannelRead()" above, otherwise
    +    // ref counting will go awry.
    +    if (buffer != null && buffer.refCnt() == 1) {
    +      buffer.discardReadComponents();
    +    }
    +  }
    +
    +  protected ByteBuf decodeNext() throws Exception {
    +    if (buffer.readableBytes() < LENGTH_SIZE) {
    +      return null;
    +    }
    +
    +    int frameLen = (int) buffer.readLong() - LENGTH_SIZE;
    --- End diff --

    doh, sorry I totally missed that, this is fine. I guess I have just seen it the other way in some examples.
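The last quoted line reads an 8-byte length and subtracts `LENGTH_SIZE`, which suggests that the length on the wire counts the prefix itself. A minimal sketch of that kind of length-prefixed framing follows, using `java.nio.ByteBuffer` rather than Netty's `ByteBuf`. The class `LengthPrefixedFrames` and its `encode` and `decodeNext` helpers are hypothetical names for illustration; they are not taken from the pull request, whose remaining code is cut off in the quote.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of Spark or the pull request.
final class LengthPrefixedFrames {
  // Size of the length prefix in bytes. The prefix value counts itself.
  static final int LENGTH_SIZE = 8;

  // Wraps a body in a frame: an 8-byte total length followed by the body bytes.
  static ByteBuffer encode(byte[] body) {
    ByteBuffer out = ByteBuffer.allocate(LENGTH_SIZE + body.length);
    out.putLong((long) LENGTH_SIZE + body.length);
    out.put(body);
    out.flip();
    return out;
  }

  // Returns the next frame body, or null (consuming nothing) when the buffer
  // does not yet hold a complete frame.
  static ByteBuffer decodeNext(ByteBuffer buf) {
    if (buf.remaining() < LENGTH_SIZE) {
      return null;
    }
    // Peek at the prefix with an absolute read so a partial frame stays untouched.
    long bodyLen = buf.getLong(buf.position()) - LENGTH_SIZE;
    if (bodyLen < 0 || bodyLen > Integer.MAX_VALUE) {
      throw new IllegalArgumentException("Bad frame length: " + bodyLen);
    }
    if (buf.remaining() - LENGTH_SIZE < bodyLen) {
      return null;
    }
    buf.position(buf.position() + LENGTH_SIZE);
    ByteBuffer frame = buf.slice();
    frame.limit((int) bodyLen);
    buf.position(buf.position() + (int) bodyLen);
    return frame;
  }
}
```

The sketch does the subtraction on the `long` value and range-checks it before narrowing to `int`; the quoted line instead narrows first and subtracts afterwards. Both give the same result for lengths that fit in an `int`.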
[GitHub] spark pull request: [SPARK-2750][WEB UI]Add Https support for Web ...
Github user jacek-lewandowski commented on the pull request: https://github.com/apache/spark/pull/5664#issuecomment-152194366 @WangTaoTheTonic can you rebase and squash?
[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9258#issuecomment-152152727 **[Test build #44590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44590/consoleFull)** for PR 9258 at commit [`824be91`](https://github.com/apache/spark/commit/824be9104bfb81b260912dc86a0dba7508d1d3f5).
[GitHub] spark pull request: [SPARK-11383][Docs] Replaced example code in m...
GitHub user rishabhbhardwaj opened a pull request:

    https://github.com/apache/spark/pull/9353

    [SPARK-11383][Docs] Replaced example code in mllib using include_example

    I have made the required changes in mllib-naive-bayes.md/mllib-isotonic-regression.md and also verified them. Kindly review it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rishabhbhardwaj/spark SPARK-11383

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9353.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9353

commit d152cb5ac855eeeac962a4b547f6f96522fd1223
Author: Rishabh Bhardwaj
Date: 2015-10-19T06:42:56Z

    [ SPARK-11180 ] [ SQL ] DataFrame.na.fill does not support Boolean Type

commit a53a20d756cfd26ca37acf9dbbd0b4e034f430d8
Author: Rishabh Bhardwaj
Date: 2015-10-20T09:50:28Z

    Merge remote-tracking branch 'upstream/master'

commit 870cbb384db84ffcc128114b38b495095e424ace
Author: Rishabh Bhardwaj
Date: 2015-10-26T09:58:48Z

    Merge remote-tracking branch 'upstream/master'

commit a21b0ed6d86811e5eedff0e4634da010062d225b
Author: Rishabh Bhardwaj
Date: 2015-10-29T08:57:54Z

    Merge remote-tracking branch 'upstream/master'

commit f40fcc182bb82d7d12aeb98b080b7362bd75ee4e
Author: Rishabh Bhardwaj
Date: 2015-10-29T11:54:55Z

    [SPARK-11383][Docs] Replace example code in mllib-naive-bayes.md/mllib-isotonic-regression.md using include_example
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9182#discussion_r43376459

    --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ---
    @@ -51,6 +51,41 @@ private[spark] abstract class YarnSchedulerBackend(
       private implicit val askTimeout = RpcUtils.askRpcTimeout(sc.conf)

    +  /** Application ID. Must be set by a subclass before starting the service */
    +  private var appId: ApplicationId = null
    +
    +  /** Attempt ID. This is unset for client-mode schedulers */
    +  private var attemptId: Option[ApplicationAttemptId] = None
    +
    +  /** Scheduler extension services */
    +  private val services: SchedulerExtensionServices = new SchedulerExtensionServices()
    +
    +  /**
    +    * Bind to YARN. This *must* be done before calling [[start()]].
    +    *
    +    * @param appId YARN application ID
    +    * @param attemptId Optional YARN attempt ID
    +    */
    +  protected def bindToYarn(appId: ApplicationId, attemptId: Option[ApplicationAttemptId]): Unit = {
    +    this.appId = appId
    +    this.attemptId = attemptId
    +  }
    +
    +  override def start() {
    +    require(appId != null, "application ID unset")
    +    val binding = SchedulerExtensionServiceBinding(sc, appId, attemptId)
    +    services.start(binding)
    --- End diff --

    string parsing has proven fairly brittle in the past; the move from single to multiple attempts broke all apps trying to do it across versions (i.e. a hadoop 2.2 parser in a 2.5 cluster). Unless you want to base-64 encode the protobuf representation, I'd avoid that.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-152164469 **[Test build #44593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44593/consoleFull)** for PR 9182 at commit [`8a6a1f1`](https://github.com/apache/spark/commit/8a6a1f13235fd00dcc58c4106b0314098f961e67).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-152164476 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44593/ Test FAILed.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-152164473 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10849][SQL] Adding field metadata prope...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9352#issuecomment-152169726 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...
Github user pravingadakh commented on the pull request: https://github.com/apache/spark/pull/9340#issuecomment-152176928 Jenkins test this please
[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9258#issuecomment-152178362 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44590/ Test PASSed.
[GitHub] spark pull request: SPARK-11402: Use ChildRunnerProvider to create...
GitHub user jacek-lewandowski opened a pull request:

    https://github.com/apache/spark/pull/9354

    SPARK-11402: Use ChildRunnerProvider to create ExecutorRunner and DriverRunner

    Abstracted ExecutorRunner and DriverRunner. The current implementations were renamed to ExecutorRunnerImpl and DriverRunnerImpl respectively. Added a way to provide a custom implementation of the runners by defining their factories.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jacek-lewandowski/spark SPARK-11402

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9354.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #9354

commit a2d2ec8a555d5a2adf20cb2cf29fc76b02e923a6
Author: Jacek Lewandowski
Date: 2015-10-15T15:08:21Z

    SPARK-11402: Use ChildRunnerProvider to create ExecutorRunner and DriverRunner

    Abstracted ExecutorRunner and DriverRunner. The current implementations were renamed to ExecutorRunnerImpl and DriverRunnerImpl respectively. Added a way to provide a custom implementation of the runners by defining their factories.
[GitHub] spark pull request: [SPARK-11314] [YARN] add service API and test ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9182#issuecomment-152188763 Merged build started.
[GitHub] spark pull request: [SPARK-10958] Use json4s 3.3.0. Formats is now...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/8992#issuecomment-152193338 Do you mind closing this PR?
[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/9258#issuecomment-152154426 @zsxwing I took a quick look and I have a high level question. Why not just do the checkpointing iterator? IIUC this approach involves reading the iterator back from disk to return the values. Wouldn't that be potentially expensive? Also, this doesn't fix it for local checkpointing. If we have a general checkpointing iterator, then RDD doesn't have to change much and we don't need to introduce another `CheckpointManager`, which I find a little clunky.
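The "checkpointing iterator" idea raised in this comment can be pictured as a wrapper that writes each element out while the caller consumes it, so the values never have to be read back from disk in order to be returned. A minimal sketch follows, assuming hypothetical names (`TeeIterator`, `sink`, `onComplete`) that are not Spark APIs and are not taken from the pull request.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical illustration only; not Spark's implementation.
final class TeeIterator<T> implements Iterator<T> {
  private final Iterator<T> source;
  private final Consumer<T> sink;     // e.g. appends the element to a checkpoint file
  private final Runnable onComplete;  // e.g. closes and commits the checkpoint file
  private boolean completed = false;

  TeeIterator(Iterator<T> source, Consumer<T> sink, Runnable onComplete) {
    this.source = source;
    this.sink = sink;
    this.onComplete = onComplete;
  }

  @Override
  public boolean hasNext() {
    boolean more = source.hasNext();
    // Run the completion hook exactly once, when the source is exhausted.
    if (!more && !completed) {
      completed = true;
      onComplete.run();
    }
    return more;
  }

  @Override
  public T next() {
    T value = source.next();
    // Hand the element to the sink before returning it to the caller.
    sink.accept(value);
    return value;
  }
}
```

One consequence of this design is that the written copy is complete only if the caller drains the iterator; a consumer that stops early leaves a partial result, which a real implementation would have to detect and discard.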
[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9339#issuecomment-152170766 **[Test build #44588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44588/consoleFull)** for PR 9339 at commit [`7d80528`](https://github.com/apache/spark/commit/7d8052830cdf6456e4a8e3233c943bccf595dc9d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9339#issuecomment-152171116 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44588/ Test FAILed.
[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9339#issuecomment-152171108 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152175508 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152175595 **[Test build #44596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44596/consoleFull)** for PR 9287 at commit [`281d4f4`](https://github.com/apache/spark/commit/281d4f44f7c237a8a76db47ea61a4c981a28a409).
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152175502 **[Test build #44594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44594/consoleFull)** for PR 9287 at commit [`281d4f4`](https://github.com/apache/spark/commit/281d4f44f7c237a8a76db47ea61a4c981a28a409).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class ExternalShuffleService(`
   * `case class BoundPortsResponse(`
[GitHub] spark pull request: SPARK-11326: Split networking in standalone mo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9287#issuecomment-152175511 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44594/ Test FAILed.
[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9258#issuecomment-152178180 **[Test build #44590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44590/consoleFull)** for PR 9258 at commit [`824be91`](https://github.com/apache/spark/commit/824be9104bfb81b260912dc86a0dba7508d1d3f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9340#issuecomment-152179279 **[Test build #44597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44597/consoleFull)** for PR 9340 at commit [`ab42465`](https://github.com/apache/spark/commit/ab42465d44393a869fef7a3d9f674f77f9155793).
[GitHub] spark pull request: [SPARK-11380][Docs] Replace example code in ml...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9340#issuecomment-152181139 Merged build started.
[GitHub] spark pull request: [SPARK-11388][Build]Fix self closing tags.
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/9339#issuecomment-152192964 merged to master
[GitHub] spark pull request: [SPARK-7970] Skip closure cleaning for SQL ope...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9253#issuecomment-152196314 **[Test build #44592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44592/consoleFull)** for PR 9253 at commit [`6a9f738`](https://github.com/apache/spark/commit/6a9f738bb3008cadc7ce855fd33115fbb29d1c0a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: SPARK-11348 Replace addOnCompleteCallback with...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9356#issuecomment-152201206 Merged build triggered.
[GitHub] spark pull request: SPARK-11348 Replace addOnCompleteCallback with...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9356#issuecomment-152201243 Merged build started.
[GitHub] spark pull request: [SPARK-9298][SQL] Add pearson correlation aggr...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8587#discussion_r43395879

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala ---
    @@ -524,6 +525,133 @@ case class Sum(child: Expression) extends DeclarativeAggregate {
       override val evaluateExpression = Cast(currentSum, resultType)
     }

    +/**
    + * Compute Pearson correlation between two expressions.
    + * When applied on empty data (i.e., count is zero), it returns NaN.
    + *
    + * Definition of Pearson correlation can be found at
    + * http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
    + *
    + * @param left one of the expressions to compute correlation with.
    + * @param right another expression to compute correlation with.
    + */
    +case class Corr(
    +    left: Expression,
    +    right: Expression,
    +    mutableAggBufferOffset: Int = 0,
    +    inputAggBufferOffset: Int = 0)
    +  extends ImperativeAggregate {
    +
    +  def children: Seq[Expression] = Seq(left, right)
    +
    +  def nullable: Boolean = false
    +
    +  def dataType: DataType = DoubleType
    +
    +  def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
    +
    +  def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
    +
    +  def inputAggBufferAttributes: Seq[AttributeReference] = aggBufferAttributes.map(_.newInstance())
    +
    +  val aggBufferAttributes: Seq[AttributeReference] = Seq(
    +    AttributeReference("xAvg", DoubleType)(),
    +    AttributeReference("yAvg", DoubleType)(),
    +    AttributeReference("Ck", DoubleType)(),
    +    AttributeReference("MkX", DoubleType)(),
    +    AttributeReference("MkY", DoubleType)(),
    +    AttributeReference("count", LongType)())
    +
    +  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): ImperativeAggregate =
    +    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
    +
    +  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): ImperativeAggregate =
    +    copy(inputAggBufferOffset = newInputAggBufferOffset)
    +
    +  override def initialize(buffer: MutableRow): Unit = {
    +    (0 until 5).map(idx => buffer.setDouble(mutableAggBufferOffset + idx, 0.0))
    +    buffer.setLong(mutableAggBufferOffset + 5, 0L)
    +  }
    +
    +  override def update(buffer: MutableRow, input: InternalRow): Unit = {
    +    val x = left.eval(input).asInstanceOf[Double]
    +    val y = right.eval(input).asInstanceOf[Double]
    +
    +    var xAvg = buffer.getDouble(mutableAggBufferOffset)
    +    var yAvg = buffer.getDouble(mutableAggBufferOffset + 1)
    +    var Ck = buffer.getDouble(mutableAggBufferOffset + 2)
    +    var MkX = buffer.getDouble(mutableAggBufferOffset + 3)
    +    var MkY = buffer.getDouble(mutableAggBufferOffset + 4)
    +    var count = buffer.getLong(mutableAggBufferOffset + 5)
    +
    +    val deltaX = x - xAvg
    +    val deltaY = y - yAvg
    +    count += 1
    +    xAvg += deltaX / count
    +    yAvg += deltaY / count
    +    Ck += deltaX * (y - yAvg)
    +    MkX += deltaX * (x - xAvg)
    +    MkY += deltaY * (y - yAvg)
    +
    +    buffer.setDouble(mutableAggBufferOffset, xAvg)
    +    buffer.setDouble(mutableAggBufferOffset + 1, yAvg)
    +    buffer.setDouble(mutableAggBufferOffset + 2, Ck)
    +    buffer.setDouble(mutableAggBufferOffset + 3, MkX)
    +    buffer.setDouble(mutableAggBufferOffset + 4, MkY)
    +    buffer.setLong(mutableAggBufferOffset + 5, count)
    +  }
    +
    +  // Merge counters from other partitions. Formula can be found at:
    +  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
    +  override def merge(buffer1: MutableRow, buffer2: InternalRow): Unit = {
    +    val count2 = buffer2.getLong(inputAggBufferOffset + 5)
    +
    +    if (count2 > 0) {
    --- End diff --

    We only need to consider the count in buffer2. I will add documentation for it.
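The `update` method in the quoted diff keeps running means and co-moments so that the correlation can be computed in a single pass, while the body of `merge` is cut off in the quote. The sketch below restates the update in plain Java and pairs it with the standard pairwise merge described on the Wikipedia page cited in the diff's comment. `OnlineCorr` and its method names are hypothetical, and the merge shown is an assumption about what such a step looks like, not the pull request's code.

```java
// Hypothetical illustration only; not the Spark Corr aggregate itself.
final class OnlineCorr {
  private long count;
  private double xAvg, yAvg; // running means
  private double ck;         // co-moment: sum of (x - xAvg) * (y - yAvg)
  private double mkX, mkY;   // second moments of x and y

  // One-pass update, following the same steps as the quoted update().
  void update(double x, double y) {
    double deltaX = x - xAvg;
    double deltaY = y - yAvg;
    count += 1;
    xAvg += deltaX / count;
    yAvg += deltaY / count;
    ck += deltaX * (y - yAvg);
    mkX += deltaX * (x - xAvg);
    mkY += deltaY * (y - yAvg);
  }

  // Pairwise merge of another partial result into this one.
  void merge(OnlineCorr other) {
    if (other.count == 0) {
      return; // nothing to add; also avoids dividing by zero when both sides are empty
    }
    long n = count + other.count;
    double deltaX = other.xAvg - xAvg;
    double deltaY = other.yAvg - yAvg;
    double weight = (double) count * other.count / n;
    ck += other.ck + deltaX * deltaY * weight;
    mkX += other.mkX + deltaX * deltaX * weight;
    mkY += other.mkY + deltaY * deltaY * weight;
    xAvg += deltaX * other.count / n;
    yAvg += deltaY * other.count / n;
    count = n;
  }

  // Pearson correlation; NaN on empty input.
  double result() {
    if (count == 0) {
      return Double.NaN;
    }
    return ck / Math.sqrt(mkX * mkY);
  }
}
```

In this sketch the merge formulas stay valid when the receiving side is empty (the weight becomes zero and the incoming values are copied over), so only the incoming count needs a guard. That is consistent with the `count2 > 0` check discussed in the comment.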
[GitHub] spark pull request: [SPARK-9298][SQL] Add pearson correlation aggr...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/8587#discussion_r43395797 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala --- @@ -556,6 +556,33 @@ abstract class AggregationQuerySuite extends QueryTest with SQLTestUtils with Te Row(0, null, 1, 1, null, 0) :: Nil) } + test("pearson correlation") { +val df = Seq.tabulate(10)(i => (1.0 * i, 2.0 * i, i * -1.0)).toDF("a", "b", "c") +val corr1 = df.repartition(2).groupBy().agg(corr("a", "b")).collect()(0).getDouble(0) +assert(math.abs(corr1 - 1.0) < 1e-12) +val corr2 = df.groupBy().agg(corr("a", "c")).collect()(0).getDouble(0) +assert(math.abs(corr2 + 1.0) < 1e-12) +// non-trivial example. To reproduce in python, use: +// >>> from scipy.stats import pearsonr +// >>> import numpy as np +// >>> a = np.array(range(20)) +// >>> b = np.array([x * x - 2 * x + 3.5 for x in range(20)]) +// >>> pearsonr(a, b) +// (0.95723391394758572, 3.8902121417802199e-11) +// In R, use: +// > a <- 0:19 +// > b <- mapply(function(x) x * x - 2 * x + 3.5, a) +// > cor(a, b) +// [1] 0.957233913947585835 +val df2 = Seq.tabulate(20)(x => (1.0 * x, x * x - 2 * x + 3.5)).toDF("a", "b") +val corr3 = df2.groupBy().agg(corr("a", "b")).collect()(0).getDouble(0) +assert(math.abs(corr3 - 0.95723391394758572) < 1e-12) + +val df3 = Seq.tabulate(0)(i => (1.0 * i, 2.0 * i)).toDF("a", "b") +val corr4 = df3.groupBy().agg(corr("a", "b")).collect()(0).getDouble(0) +assert(corr4.isNaN) + } --- End diff -- I will add ImplicitCastInputTypes to case class Corr, so that the other NumericType inputs are automatically cast to double.
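The expected values in the test above can also be cross-checked without Spark, SciPy, or R. The following is a small sketch under the assumption that a direct two-pass formula is precise enough for these inputs; `pearson` is a hypothetical helper, not a Spark function.

```scala
// Hypothetical helper, not part of Spark: two-pass Pearson correlation.
def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
  require(xs.length == ys.length, "inputs must have the same length")
  val n = xs.length
  // For empty input the means are NaN but are never used below.
  val xMean = xs.sum / n
  val yMean = ys.sum / n
  // Sum of cross-deviations and the two sums of squared deviations.
  val cov = xs.zip(ys).map { case (x, y) => (x - xMean) * (y - yMean) }.sum
  val xVar = xs.map(x => (x - xMean) * (x - xMean)).sum
  val yVar = ys.map(y => (y - yMean) * (y - yMean)).sum
  // Empty input gives 0.0 / 0.0, which is NaN, matching the behaviour
  // the test expects from corr on an empty DataFrame.
  cov / math.sqrt(xVar * yVar)
}

// Same data as df2 in the test above.
val a = Seq.tabulate(20)(x => 1.0 * x)
val b = Seq.tabulate(20)(x => x * x - 2 * x + 3.5)
val r = pearson(a, b)
```

Comparing `r` against the SciPy value quoted in the test comment, with the same `1e-12` tolerance, gives an independent check on the constant used in the assertion.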