[GitHub] spark pull request #21563: [SPARK-24557][ML] ClusteringEvaluator support arr...
GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/21563

    [SPARK-24557][ML] ClusteringEvaluator support array input

    ## What changes were proposed in this pull request?
    ClusteringEvaluator support array input

    ## How was this patch tested?
    added tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark clu_eval_support_array

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21563.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21563

commit b126bd4f410ab4a01bbe7a980042704ea7420c6f
Author: 郑瑞峰
Date:   2018-06-14T08:15:43Z

    init pr

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21563 **[Test build #91828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91828/testReport)** for PR 21563 at commit [`b126bd4`](https://github.com/apache/spark/commit/b126bd4f410ab4a01bbe7a980042704ea7420c6f).
[GitHub] spark issue #21564: [SPARK-24556][SQL] ReusedExchange should rewrite output ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21564 Can one of the admins verify this patch?
[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...
Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21366#discussion_r195350385

    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.scheduler.cluster.k8s
    +
    +import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}
    +
    +import io.fabric8.kubernetes.api.model.PodBuilder
    +import io.fabric8.kubernetes.client.KubernetesClient
    +import scala.collection.mutable
    +
    +import org.apache.spark.{SparkConf, SparkException}
    +import org.apache.spark.deploy.k8s.Config._
    +import org.apache.spark.deploy.k8s.Constants._
    +import org.apache.spark.deploy.k8s.KubernetesConf
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.util.{Clock, Utils}
    +
    +private[spark] class ExecutorPodsAllocator(
    +    conf: SparkConf,
    +    executorBuilder: KubernetesExecutorBuilder,
    +    kubernetesClient: KubernetesClient,
    +    snapshotsStore: ExecutorPodsSnapshotsStore,
    +    clock: Clock) extends Logging {
    +
    +  private val EXECUTOR_ID_COUNTER = new AtomicLong(0L)
    +
    +  private val totalExpectedExecutors = new AtomicInteger(0)
    +
    +  private val podAllocationSize = conf.get(KUBERNETES_ALLOCATION_BATCH_SIZE)
    +
    +  private val podAllocationDelay = conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
    +
    +  private val podCreationTimeout = math.max(podAllocationDelay * 5, 6)
    +
    +  private val kubernetesDriverPodName = conf
    +    .get(KUBERNETES_DRIVER_POD_NAME)
    +    .getOrElse(throw new SparkException("Must specify the driver pod name"))
    +
    +  private val driverPod = kubernetesClient.pods()
    +    .withName(kubernetesDriverPodName)
    +    .get()
    +
    +  // Executor IDs that have been requested from Kubernetes but have not been detected in any
    +  // snapshot yet. Mapped to the timestamp when they were created.
    +  private val newlyCreatedExecutors = mutable.Map.empty[Long, Long]
    +
    +  def start(applicationId: String): Unit = {
    +    snapshotsStore.addSubscriber(podAllocationDelay) {
    +      onNewSnapshots(applicationId, _)
    +    }
    +  }
    +
    +  def setTotalExpectedExecutors(total: Int): Unit = totalExpectedExecutors.set(total)
    +
    +  private def onNewSnapshots(applicationId: String, snapshots: Seq[ExecutorPodsSnapshot]): Unit = {
    +    newlyCreatedExecutors --= snapshots.flatMap(_.executorPods.keys)
    +    // For all executors we've created against the API but have not seen in a snapshot
    +    // yet - check the current time. If the current time has exceeded some threshold,
    +    // assume that the pod was either never created (the API server never properly
    +    // handled the creation request), or the API server created the pod but we missed
    +    // both the creation and deletion events. In either case, delete the missing pod
    +    // if possible, and mark such a pod to be rescheduled below.
    +    newlyCreatedExecutors.foreach { case (execId, timeCreated) =>
    +      if (clock.getTimeMillis() - timeCreated > podCreationTimeout) {
    +        logWarning(s"Executor with id $execId was not detected in the Kubernetes" +
    +          s" cluster after $podCreationTimeout milliseconds despite the fact that a" +
    +          " previous allocation attempt tried to create it. The executor may have been" +
    +          " deleted but the application missed the deletion event.")
    +        Utils.tryLogNonFatalError {
    +          kubernetesClient
    +            .pods()
    +            .withLabel(SPARK_EXECUTOR_ID_LABEL, execId.toString)
    +            .delete()
    --- End diff --

    Shouldn't deleteFromSpark be called here as well? Couldn't it be the case that the executor exists at a higher level but the K8s backend missed it?
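The timeout check under review can be modelled without Spark or Kubernetes. The following is a minimal sketch, not the PR's code: `PendingExecutors`, `timedOut`, and the parameter names are hypothetical, and plain `Long` timestamps stand in for the `Clock`. It only shows which executor IDs would be treated as missing after a snapshot round.

```scala
import scala.collection.mutable

// Hypothetical, Spark-free model of the allocator's bookkeeping.
object PendingExecutors {
  /**
   * Drops executor IDs that appeared in a snapshot, then returns the IDs
   * that are still unseen after `timeoutMs`, sorted for stable output.
   * `pending` maps executor ID -> creation timestamp in milliseconds.
   */
  def timedOut(
      pending: mutable.Map[Long, Long],
      seenInSnapshots: Set[Long],
      nowMs: Long,
      timeoutMs: Long): Seq[Long] = {
    // Anything observed in a snapshot is no longer "newly created".
    pending --= seenInSnapshots
    // Whatever remains and is older than the timeout is considered missing.
    pending.toSeq.collect {
      case (execId, createdMs) if nowMs - createdMs > timeoutMs => execId
    }.sorted
  }
}
```

The returned IDs mark the place where the pod deletion happens in the diff, and where a call such as the `deleteFromSpark` mentioned in the comment could be added if the authors decide it is needed.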
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
Github user yucai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21564#discussion_r195354829

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
    @@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
       override def outputPartitioning: Partitioning = {
         relation.cachedPlan.outputPartitioning match {
           case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
    +      case r: RangePartitioning =>
    +        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
    --- End diff --

    Good suggestion, thanks @mgaido91.

    @viirya Do we need to consider the following?
    `PartitioningCollection` in `InMemoryTableScanExec.outputPartitioning`, which is also an `Expression`?
    `PartitioningCollection` and `BroadcastPartitioning` in `ReusedExchangeExec.outputPartitioning`?
[GitHub] spark pull request #21566: [Python] Fix typo in serializer exception
GitHub user rberenguel opened a pull request:

    https://github.com/apache/spark/pull/21566

    [Python] Fix typo in serializer exception

    ## What changes were proposed in this pull request?
    Fix typo in exception raised in Python serializer

    ## How was this patch tested?
    No code changes

    Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rberenguel/spark fix_typo_pyspark_serializers

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21566.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21566

commit bd8627c879131a4af364ef667f4ac1209ea6909a
Author: Ruben Berenguel Montoro
Date:   2018-06-14T11:11:14Z

    Fix typo in serializer exception
[GitHub] spark pull request #21563: [SPARK-24557][ML] ClusteringEvaluator support arr...
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21563#discussion_r195389157

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala ---
    @@ -107,15 +106,18 @@ class ClusteringEvaluator @Since("2.3.0") (@Since("2.3.0") override val uid: Str
       @Since("2.3.0")
       override def evaluate(dataset: Dataset[_]): Double = {
    -    SchemaUtils.checkColumnType(dataset.schema, $(featuresCol), new VectorUDT)
    +    SchemaUtils.validateVectorCompatibleColumn(dataset.schema, $(featuresCol))
         SchemaUtils.checkNumericType(dataset.schema, $(predictionCol))
    +    val vectorCol = DatasetUtils.columnToVector(dataset, $(featuresCol))
    +    val df = dataset.select(col($(predictionCol)),
    --- End diff --

    I am not sure this is the right way. We will probably face the same issue everywhere we use `DatasetUtils.columnToVector`, so it is probably better to fix the problem there.
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21441 **[Test build #91839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91839/testReport)** for PR 21441 at commit [`c630f4a`](https://github.com/apache/spark/commit/c630f4a55ce38e8e05a575064215a4432e4e01ad).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91839/ Test FAILed.
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test FAILed.
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Merged build finished. Test PASSed.
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4031/ Test PASSed.
[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...
Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21366#discussion_r195401593

    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsPollingSnapshotSource.scala ---
    @@ -0,0 +1,65 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.scheduler.cluster.k8s
    +
    +import java.util.concurrent.{Future, ScheduledExecutorService, TimeUnit}
    +
    +import io.fabric8.kubernetes.client.KubernetesClient
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.deploy.k8s.Config._
    +import org.apache.spark.deploy.k8s.Constants._
    +import org.apache.spark.util.ThreadUtils
    +
    +private[spark] class ExecutorPodsPollingSnapshotSource(
    +    conf: SparkConf,
    +    kubernetesClient: KubernetesClient,
    +    snapshotsStore: ExecutorPodsSnapshotsStore,
    +    pollingExecutor: ScheduledExecutorService) {
    --- End diff --

    Could you add some debug logging here? In general, it would be good to be able to trace what is happening in debug mode when there is an issue. This applies to all classes introduced for both watching and polling.
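As an illustration of the kind of tracing requested above, here is a minimal sketch of a polling loop that logs before and after each poll. It is not the PR's implementation: `PollingSource`, `fetchPods`, and `logDebug` are hypothetical stand-ins (in Spark the logging would presumably come from the `Logging` trait), and pods are reduced to plain strings.

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}

// Hypothetical stand-in for a polling snapshot source.
// `fetchPods` and `logDebug` are illustrative parameters, not Spark or fabric8 APIs.
class PollingSource(fetchPods: () => Seq[String], logDebug: String => Unit) {
  private val executor = Executors.newSingleThreadScheduledExecutor()

  /** One polling round, with a debug trace before and after the fetch. */
  def pollOnce(): Seq[String] = {
    logDebug("Polling for executor pods")
    val pods = fetchPods()
    logDebug(s"Poll returned ${pods.size} pod(s): ${pods.mkString(", ")}")
    pods
  }

  /** Schedules `pollOnce` at a fixed delay; the result of each poll is discarded. */
  def start(intervalMs: Long): ScheduledFuture[_] = {
    val task = new Runnable {
      override def run(): Unit = { pollOnce(); () }
    }
    executor.scheduleWithFixedDelay(task, intervalMs, intervalMs, TimeUnit.MILLISECONDS)
  }

  def stop(): Unit = { executor.shutdownNow(); () }
}
```

Keeping the traced step in `pollOnce` means the logging can be exercised directly, without waiting on the scheduler.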
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Merged build finished. Test PASSed.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21561 **[Test build #91823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91823/testReport)** for PR 21561 at commit [`61b95a3`](https://github.com/apache/spark/commit/61b95a35ecea4ae21e95fb8370bc4b4525370435).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91823/ Test PASSed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4019/ Test PASSed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21547 **[Test build #91827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91827/testReport)** for PR 21547 at commit [`5b2150b`](https://github.com/apache/spark/commit/5b2150b7d8ffcd5f5893fd8a10e31a7c1fa79c52).
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Merged build finished. Test PASSed.
[GitHub] spark pull request #21321: [SPARK-24268][SQL] Use datatype.simpleString in e...
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21321#discussion_r195344259

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
    @@ -208,7 +208,8 @@ class FeatureHasher(@Since("2.3.0") override val uid: String) extends Transforme
           require(dataType.isInstanceOf[NumericType] ||
             dataType.isInstanceOf[StringType] ||
             dataType.isInstanceOf[BooleanType],
    -        s"FeatureHasher requires columns to be of NumericType, BooleanType or StringType. " +
    +        s"FeatureHasher requires columns to be of ${NumericType.simpleString}, " +
    --- End diff --

    I think this PR always rewrites the constant types that are referenced. I am not sure why you are saying it does not. If I missed some places, it was just because I haven't seen them.
[GitHub] spark issue #21550: [SPARK-24543][SQL] Support any type as DDL string for fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21550 **[Test build #91832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91832/testReport)** for PR 21550 at commit [`af946b8`](https://github.com/apache/spark/commit/af946b8ada5af91428e7ab44478e920308846a59).
[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16171 It is out of date, and I will close it
[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...
Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21366#discussion_r195379060

    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala ---
    @@ -56,17 +58,44 @@ private[spark] class KubernetesClusterManager extends ExternalClusterManager wit
           Some(new File(Config.KUBERNETES_SERVICE_ACCOUNT_TOKEN_PATH)),
           Some(new File(Config.KUBERNETES_SERVICE_ACCOUNT_CA_CRT_PATH)))
    -    val allocatorExecutor = ThreadUtils
    -      .newDaemonSingleThreadScheduledExecutor("kubernetes-pod-allocator")
         val requestExecutorsService = ThreadUtils.newDaemonCachedThreadPool(
           "kubernetes-executor-requests")
    +
    +    val bufferSnapshotsExecutor = ThreadUtils
    +      .newDaemonSingleThreadScheduledExecutor("kubernetes-executor-snapshots-buffer")
    +    val snapshotsStore = new ExecutorPodsSnapshotsStoreImpl(bufferSnapshotsExecutor)
    +    val removedExecutorsCache = CacheBuilder.newBuilder()
    +      .expireAfterWrite(3, TimeUnit.MINUTES)
    --- End diff --

    Why 3 minutes? Should this be configurable?
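One way the expiry could be made configurable is to read it from the configuration and fall back to the current value. The sketch below is only an illustration under stated assumptions: the key name is invented for this example and is not an existing Spark setting, and a plain `Map[String, String]` stands in for `SparkConf`.

```scala
import scala.concurrent.duration._

// Hypothetical helper: resolves the cache expiry from configuration.
object RemovedExecutorsCacheConf {
  // Invented key for illustration only; not an existing Spark configuration entry.
  val Key = "spark.kubernetes.executor.removedCacheTtlSeconds"

  // The value hard-coded in the diff under review.
  val Default: FiniteDuration = 3.minutes

  /** Returns the configured TTL in seconds, or the 3-minute default when unset. */
  def ttl(conf: Map[String, String]): FiniteDuration =
    conf.get(Key).map(_.trim.toLong.seconds).getOrElse(Default)
}
```

The result could then be passed to the cache builder as `expireAfterWrite(ttl.toSeconds, TimeUnit.SECONDS)`, which keeps today's behaviour when the key is absent.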
[GitHub] spark issue #21567: [SPARK-24560][SS][MESOS] Fix some getTimeAsMs as getTime...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21567 Can one of the admins verify this patch?
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test PASSed.
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/139/ Test PASSed.
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21529 **[Test build #91826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91826/testReport)** for PR 21529 at commit [`6ef4f0d`](https://github.com/apache/spark/commit/6ef4f0df7590f0da5aa900f29292ec0fe94658fb).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21441 **[Test build #91840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91840/testReport)** for PR 21441 at commit [`c630f4a`](https://github.com/apache/spark/commit/c630f4a55ce38e8e05a575064215a4432e4e01ad).
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test PASSed.
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4028/ Test PASSed.
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Merged build finished. Test FAILed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Merged build finished. Test PASSed.
[GitHub] spark issue #21547: [SPARK-24538][SQL] ByteArrayDecimalType support push dow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91827/ Test PASSed.
[GitHub] spark issue #21550: [SPARK-24543][SQL] Support any type as DDL string for fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21550 **[Test build #91844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91844/testReport)** for PR 21550 at commit [`af946b8`](https://github.com/apache/spark/commit/af946b8ada5af91428e7ab44478e920308846a59).
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Merged build finished. Test PASSed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91831/ Test PASSed.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Merged build finished. Test PASSed.
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4021/ Test PASSed.
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21564#discussion_r195349283

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
    @@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
       override def outputPartitioning: Partitioning = {
         relation.cachedPlan.outputPartitioning match {
           case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
    +      case r: RangePartitioning =>
    +        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
    --- End diff --

    Why not just `updateAttribute(r)`? Moreover, in order to avoid the same issue in the future with other cases, have you considered doing something like:
    ```
    updateAttribute(relation.cachedPlan.outputPartitioning)
    ```
    ?
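The idea behind this suggestion can be shown on a toy expression tree: a single recursive rewrite reaches attributes inside any partitioning, so no per-type `case` is needed. This is a sketch under assumptions, not Catalyst code: `Expr`, `Attr`, `HashPart`, `RangePart`, `PartCollection`, and `Rewrite` are simplified, hypothetical stand-ins, and it presumes every partitioning involved is itself an expression tree, which is the open question raised elsewhere in this thread.

```scala
// Toy model: simplified stand-ins, not Spark's Catalyst classes.
sealed trait Expr {
  /** Rebuilds this node with `f` applied to each direct child. */
  def mapChildren(f: Expr => Expr): Expr
}
case class Attr(name: String, id: Int) extends Expr {
  def mapChildren(f: Expr => Expr): Expr = this // leaf node
}
case class SortOrder(child: Expr) extends Expr {
  def mapChildren(f: Expr => Expr): Expr = SortOrder(f(child))
}
case class HashPart(exprs: Seq[Expr]) extends Expr {
  def mapChildren(f: Expr => Expr): Expr = HashPart(exprs.map(f))
}
case class RangePart(ordering: Seq[Expr]) extends Expr {
  def mapChildren(f: Expr => Expr): Expr = RangePart(ordering.map(f))
}
case class PartCollection(parts: Seq[Expr]) extends Expr {
  def mapChildren(f: Expr => Expr): Expr = PartCollection(parts.map(f))
}

object Rewrite {
  /** Replaces every attribute whose id is in `mapping`, at any depth. */
  def updateAttribute(e: Expr, mapping: Map[Int, Attr]): Expr = e match {
    case a: Attr => mapping.getOrElse(a.id, a)
    case other => other.mapChildren(updateAttribute(_, mapping))
  }
}
```

With this shape, adding a new partitioning type only requires it to expose its children; the rewrite itself stays unchanged.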
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21563 **[Test build #91830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91830/testReport)** for PR 21563 at commit [`d6e76e3`](https://github.com/apache/spark/commit/d6e76e3ecf02ea23d6d60aff58f1228f45ba0235).
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Merged build finished. Test PASSed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 retest this please.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91822/ Test FAILed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4022/ Test PASSed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/133/ Test PASSed.
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91833/ Test FAILed.
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #91833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91833/testReport)** for PR 20611 at commit [`02f1b3e`](https://github.com/apache/spark/commit/02f1b3ef0a38eb75098644eaa4d043c92d2eab84).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Merged build finished. Test FAILed.
[GitHub] spark issue #21566: [Python] Fix typo in serializer exception
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21566 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21567: [SPARK-24560][SS][MESOS] Fix some getTimeAsMs as getTime...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21567 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21379: [SPARK-24327][SQL] Verify and normalize a partition colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4026/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21379: [SPARK-24327][SQL] Verify and normalize a partition colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/141/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/142/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21550: [SPARK-24543][SQL] Support any type as DDL string for fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21550 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21550: [SPARK-24543][SQL] Support any type as DDL string for fr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21550 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91832/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21550: [SPARK-24543][SQL] Support any type as DDL string for fr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21550 **[Test build #91832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91832/testReport)** for PR 21550 at commit [`af946b8`](https://github.com/apache/spark/commit/af946b8ada5af91428e7ab44478e920308846a59). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21550: [SPARK-24543][SQL] Support any type as DDL string for fr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21550 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21462: [SPARK-24428][K8S] Fix unused code
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21462 @felixcheung gentle ping. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21566: [Python] Fix typo in serializer exception
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21566 @rberenguel, it's okay but mind taking another look around here and see if there are more typos while we are here? I am pretty sure there are more. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
GitHub user yucai opened a pull request: https://github.com/apache/spark/pull/21564

[SPARK-24556][SQL] ReusedExchange should rewrite output partitioning also when child's partitioning is RangePartitioning

## What changes were proposed in this pull request?

Currently, ReusedExchange rewrites the output partitioning if the child's partitioning is HashPartitioning, but it does not do the same when the child's partitioning is RangePartitioning. This can sometimes introduce an extra shuffle, for example:

```
val df = Seq(1 -> "a", 3 -> "c", 2 -> "b").toDF("i", "j")
val df1 = df.as("t1")
val df2 = df.as("t2")
val t = df1.orderBy("j").join(df2.orderBy("j"), $"t1.i" === $"t2.i", "right")
t.cache.orderBy($"t2.j").explain
```

Before:

```
== Physical Plan ==
*(1) Sort [j#14 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(j#14 ASC NULLS FIRST, 200)
   +- InMemoryTableScan [i#5, j#6, i#13, j#14]
         +- InMemoryRelation [i#5, j#6, i#13, j#14], CachedRDDBuilder...
               +- *(2) BroadcastHashJoin [i#5], [i#13], RightOuter, BuildLeft
                  :- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as...
                  :  +- *(1) Sort [j#6 ASC NULLS FIRST], true, 0
                  :     +- Exchange rangepartitioning(j#6 ASC NULLS FIRST, 200)
                  :        +- LocalTableScan [i#5, j#6]
                  +- *(2) Sort [j#14 ASC NULLS FIRST], true, 0
                     +- ReusedExchange [i#13, j#14], Exchange rangepartitioning(j#6 ASC NULLS FIRST, 200)
```

A better plan should avoid `Exchange rangepartitioning(j#14 ASC NULLS FIRST, 200)`, like:

```
== Physical Plan ==
*(1) Sort [j#14 ASC NULLS FIRST], true, 0
+- InMemoryTableScan [i#5, j#6, i#13, j#14]
      +- InMemoryRelation [i#5, j#6, i#13, j#14], CachedRDDBuilder...
            +- *(2) BroadcastHashJoin [i#5], [i#13], RightOuter, BuildLeft
               :- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
               :  +- *(1) Sort [j#6 ASC NULLS FIRST], true, 0
               :     +- Exchange rangepartitioning(j#6 ASC NULLS FIRST, 200)
               :        +- LocalTableScan [i#5, j#6]
               +- *(2) Sort [j#14 ASC NULLS FIRST], true, 0
                  +- ReusedExchange [i#13, j#14], Exchange rangepartitioning(j#6 ASC NULLS FIRST, 200)
```

## How was this patch tested?

Add new tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yucai/spark SPARK-24556

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21564.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21564
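The idea in this description can be illustrated with a small toy model. This is only a sketch: `Attr`, `SortOrder`, the partitioning case classes, and `RewritePartitioning.rewrite` are hypothetical stand-ins written for this example, not Spark's Catalyst classes or the code in the patch. It shows why rewriting the attributes inside a range partitioning's ordering (here `j#6` to `j#14`) lets a later sort on `j#14` recognize the existing partitioning.

```scala
// Toy model only: every type here is a hypothetical stand-in, not a Spark class.
final case class Attr(name: String, exprId: Int)
final case class SortOrder(child: Attr, ascending: Boolean)

sealed trait Partitioning
final case class HashPartitioning(keys: Seq[Attr], numPartitions: Int) extends Partitioning
final case class RangePartitioning(ordering: Seq[SortOrder], numPartitions: Int) extends Partitioning
case object UnknownPartitioning extends Partitioning

object RewritePartitioning {
  // Rewrites attribute references using `mapping` (child output -> reused output).
  // Attributes that are not in the mapping are left unchanged.
  def rewrite(p: Partitioning, mapping: Map[Attr, Attr]): Partitioning = {
    def update(a: Attr): Attr = mapping.getOrElse(a, a)
    p match {
      case h: HashPartitioning =>
        h.copy(keys = h.keys.map(update))
      case r: RangePartitioning =>
        // The case the description says is missing: rewrite each sort key as well.
        r.copy(ordering = r.ordering.map(o => o.copy(child = update(o.child))))
      case other =>
        other
    }
  }
}
```

With a mapping of `j#6 -> j#14`, a `RangePartitioning` on `j#6` becomes one on `j#14`, which is the partitioning the outer `Sort [j#14 ...]` asks for in the example plan.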
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/132/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21366: [SPARK-24248][K8S] Use level triggering and state reconc...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21366 @mccheah could you add a design doc for future reference and so that new contributors can understand better the rationale behind this. There is some description in the JIRA ticket but not enough to describe the final solution. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/21564#discussion_r195361725

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
   override def outputPartitioning: Partitioning = {
     relation.cachedPlan.outputPartitioning match {
       case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
+      case r: RangePartitioning =>
+        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
--- End diff --

`BroadcastPartitioning`'s `BroadcastMode` contains `Expression`?
[GitHub] spark pull request #21564: [SPARK-24556][SQL] ReusedExchange should rewrite ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21564#discussion_r19535

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -170,6 +170,8 @@ case class InMemoryTableScanExec(
   override def outputPartitioning: Partitioning = {
     relation.cachedPlan.outputPartitioning match {
       case h: HashPartitioning => updateAttribute(h).asInstanceOf[HashPartitioning]
+      case r: RangePartitioning =>
+        r.copy(ordering = r.ordering.map(updateAttribute(_).asInstanceOf[SortOrder]))
--- End diff --

Oh, like `HashedRelationBroadcastMode`.
[GitHub] spark pull request #21565: wrong Idle Timeout value is used in case of the c...
GitHub user sandeep-katta opened a pull request: https://github.com/apache/spark/pull/21565

wrong Idle Timeout value is used in case of the cacheBlock. It is corrected as per the configuration.

## What changes were proposed in this pull request?

The idle timeout value printed in the logs is now chosen according to whether the executor holds cached blocks: if it does, cachedExecutorIdleTimeoutS is used; otherwise executorIdleTimeoutS is used.

## How was this patch tested?

Manual Test

spark-sql> cache table sample;
2018-05-15 14:44:02 INFO DAGScheduler:54 - Submitting 3 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[8] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0, 1, 2))
2018-05-15 14:44:02 INFO YarnScheduler:54 - Adding task set 0.0 with 3 tasks
2018-05-15 14:44:03 INFO ExecutorAllocationManager:54 - Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
...
...
2018-05-15 14:46:10 INFO YarnClientSchedulerBackend:54 - Actual list of executor(s) to be killed is 1
2018-05-15 14:46:10 INFO **ExecutorAllocationManager:54 - Removing executor 1 because it has been idle for 120 seconds (new desired total will be 0)**
2018-05-15 14:46:11 INFO YarnSchedulerBackend$YarnDriverEndpoint:54 - Disabling executor 1.
2018-05-15 14:46:11 INFO DAGScheduler:54 - Executor lost: 1 (epoch 1)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sandeep-katta/spark loginfoBug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21565.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21565

commit 30fcef650ee2bd2873bf402448652acba055f989
Author: sandeep-katta
Date: 2018-06-14T09:56:59Z

    wrong Idle Timeout value is used in case of the cacheBlock. It is corrected as per the configuration.
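The selection described in this pull request can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: the object `IdleTimeoutLog`, the method names, and the parameter `hasCachedBlocks` are hypothetical, and the timeout values passed in any call are whatever the caller has configured.

```scala
// Sketch only: names are hypothetical and this does not reproduce Spark's ExecutorAllocationManager.
object IdleTimeoutLog {
  // Picks the timeout that actually applied to the executor, so the log line reports the right value.
  def idleTimeoutForLog(
      hasCachedBlocks: Boolean,
      executorIdleTimeoutS: Long,
      cachedExecutorIdleTimeoutS: Long): Long =
    if (hasCachedBlocks) cachedExecutorIdleTimeoutS else executorIdleTimeoutS

  // Builds a removal message in the style of the log line quoted in the pull request.
  def removalMessage(
      executorId: String,
      hasCachedBlocks: Boolean,
      executorIdleTimeoutS: Long,
      cachedExecutorIdleTimeoutS: Long): String = {
    val timeout = idleTimeoutForLog(hasCachedBlocks, executorIdleTimeoutS, cachedExecutorIdleTimeoutS)
    s"Removing executor $executorId because it has been idle for $timeout seconds"
  }
}
```

Keeping the choice in one small function means the log message and the removal decision can share the same value instead of each hard-coding one of the two settings.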
[GitHub] spark issue #21565: wrong Idle Timeout value is used in case of the cacheBlo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21565 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21109: [SPARK-24020][SQL] Sort-merge join inner range optimizat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21109 **[Test build #91824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91824/testReport)** for PR 21109 at commit [`82c194a`](https://github.com/apache/spark/commit/82c194a8a03b6cc028de303fbc07c68d6078cc2b). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21379: [SPARK-24327][SQL] Verify and normalize a partition colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/137/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21441 **[Test build #91837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91837/testReport)** for PR 21441 at commit [`cd3a0d1`](https://github.com/apache/spark/commit/cd3a0d1451c92f31c4f1d21d2225ed6cf330ecb5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21379: [SPARK-24327][SQL] Verify and normalize a partition colu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21379 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21379: [SPARK-24327][SQL] Verify and normalize a partition colu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21379 **[Test build #91838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91838/testReport)** for PR 21379 at commit [`d2ef95c`](https://github.com/apache/spark/commit/d2ef95c9c1ddb446a1ffda3935ea783e6e17b114). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21441 **[Test build #91841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91841/testReport)** for PR 21441 at commit [`ed6b9a0`](https://github.com/apache/spark/commit/ed6b9a0daebfbce81b49004befcf79b89b11d634). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/140/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21529 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21537 **[Test build #91831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91831/testReport)** for PR 21537 at commit [`b592e66`](https://github.com/apache/spark/commit/b592e66c030ba7c2d260c3be48c3b15139f40e5b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/129/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21529: [SPARK-24495][SQL] EnsureRequirement returns wrong plan ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21529 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/131/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4020/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21564: [SPARK-24556][SQL] ReusedExchange should rewrite output ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21564 **[Test build #91829 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91829/testReport)** for PR 21564 at commit [`f37139b`](https://github.com/apache/spark/commit/f37139b2d07497af9df1984e5fb7a50931efbf9a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21564: [SPARK-24556][SQL] ReusedExchange should rewrite output ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21564 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19084: [SPARK-20711][ML]MultivariateOnlineSummarizer/Summarizer...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19084 @srowen Could you please give a final review? Thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #91833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91833/testReport)** for PR 20611 at commit [`02f1b3e`](https://github.com/apache/spark/commit/02f1b3ef0a38eb75098644eaa4d043c92d2eab84). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16171: [SPARK-18739][ML][PYSPARK] Classification and reg...
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/16171 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195380019 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.scheduler.cluster.k8s + +import com.google.common.cache.Cache +import io.fabric8.kubernetes.api.model.Pod +import io.fabric8.kubernetes.client.KubernetesClient +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.scheduler.ExecutorExited +import org.apache.spark.util.Utils + +private[spark] class ExecutorPodsLifecycleManager( +conf: SparkConf, +executorBuilder: KubernetesExecutorBuilder, +kubernetesClient: KubernetesClient, +snapshotsStore: ExecutorPodsSnapshotsStore, +// Use a best-effort to track which executors have been removed already. It's not generally +// job-breaking if we remove executors more than once but it's ideal if we make an attempt +// to avoid doing so. 
Expire cache entries so that this data structure doesn't grow beyond +// bounds. +removedExecutorsCache: Cache[java.lang.Long, java.lang.Long]) { + + import ExecutorPodsLifecycleManager._ + + private val eventProcessingInterval = conf.get(KUBERNETES_EXECUTOR_EVENT_PROCESSING_INTERVAL) + + def start(schedulerBackend: KubernetesClusterSchedulerBackend): Unit = { +snapshotsStore.addSubscriber(eventProcessingInterval) { + onNewSnapshots(schedulerBackend, _) +} + } + + private def onNewSnapshots( + schedulerBackend: KubernetesClusterSchedulerBackend, + snapshots: Seq[ExecutorPodsSnapshot]): Unit = { +val execIdsRemovedInThisRound = mutable.HashSet.empty[Long] +snapshots.foreach { snapshot => + snapshot.executorPods.foreach { case (execId, state) => +state match { + case deleted@PodDeleted(pod) => +removeExecutorFromSpark(schedulerBackend, deleted, execId) +execIdsRemovedInThisRound += execId + case failed@PodFailed(pod) => +onFinalNonDeletedState(failed, execId, schedulerBackend, execIdsRemovedInThisRound) + case succeeded@PodSucceeded(pod) => --- End diff -- same as above. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21366: [SPARK-24248][K8S] Use level triggering and state...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21366#discussion_r195379975 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.scheduler.cluster.k8s + +import com.google.common.cache.Cache +import io.fabric8.kubernetes.api.model.Pod +import io.fabric8.kubernetes.client.KubernetesClient +import scala.collection.JavaConverters._ +import scala.collection.mutable + +import org.apache.spark.SparkConf +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.scheduler.ExecutorExited +import org.apache.spark.util.Utils + +private[spark] class ExecutorPodsLifecycleManager( +conf: SparkConf, +executorBuilder: KubernetesExecutorBuilder, +kubernetesClient: KubernetesClient, +snapshotsStore: ExecutorPodsSnapshotsStore, +// Use a best-effort to track which executors have been removed already. It's not generally +// job-breaking if we remove executors more than once but it's ideal if we make an attempt +// to avoid doing so. 
Expire cache entries so that this data structure doesn't grow beyond +// bounds. +removedExecutorsCache: Cache[java.lang.Long, java.lang.Long]) { + + import ExecutorPodsLifecycleManager._ + + private val eventProcessingInterval = conf.get(KUBERNETES_EXECUTOR_EVENT_PROCESSING_INTERVAL) + + def start(schedulerBackend: KubernetesClusterSchedulerBackend): Unit = { +snapshotsStore.addSubscriber(eventProcessingInterval) { + onNewSnapshots(schedulerBackend, _) +} + } + + private def onNewSnapshots( + schedulerBackend: KubernetesClusterSchedulerBackend, + snapshots: Seq[ExecutorPodsSnapshot]): Unit = { +val execIdsRemovedInThisRound = mutable.HashSet.empty[Long] +snapshots.foreach { snapshot => + snapshot.executorPods.foreach { case (execId, state) => +state match { + case deleted@PodDeleted(pod) => +removeExecutorFromSpark(schedulerBackend, deleted, execId) +execIdsRemovedInThisRound += execId + case failed@PodFailed(pod) => --- End diff -- same as above. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21567: [SPARK-24560][SS][MESOS] Fix some getTimeAsMs as ...
GitHub user xueyumusic opened a pull request: https://github.com/apache/spark/pull/21567

[SPARK-24560][SS][MESOS] Fix some getTimeAsMs as getTimeAsSeconds

## What changes were proposed in this pull request?

This PR replaces some "getTimeAsMs" calls with "getTimeAsSeconds". The former returns a wrong value when the user specifies a value without a time unit.

## How was this patch tested?

manual test

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xueyumusic/spark fixGetTimeAs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21567.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21567

commit 10bf41ec86c0af59a791fa02b5efaedc7a164a3c
Author: xueyu <278006819@...>
Date: 2018-06-14T11:01:29Z

    fix getTimeAs method
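Why a missing unit matters can be shown with a small stand-alone parser. This is a simplified sketch, not Spark's implementation: `TimeStringSketch`, `parse`, `timeAsMs`, and `timeAsSeconds` are hypothetical names, and the suffix table is deliberately short. The only behavior it assumes is the one the description relies on, namely that a millisecond-based getter treats a bare number as milliseconds while a second-based getter treats it as seconds.

```scala
import java.util.Locale
import java.util.concurrent.TimeUnit

// Sketch only: a simplified stand-in for a time-string parser, written for this example.
object TimeStringSketch {
  private val suffixes: Map[String, TimeUnit] = Map(
    "ms" -> TimeUnit.MILLISECONDS,
    "s" -> TimeUnit.SECONDS,
    "m" -> TimeUnit.MINUTES,
    "h" -> TimeUnit.HOURS)

  private val TimePattern = "(-?[0-9]+)([a-z]+)?".r

  // Parses `str`; a value without a suffix is interpreted in `defaultUnit`.
  def parse(str: String, defaultUnit: TimeUnit, resultUnit: TimeUnit): Long =
    str.trim.toLowerCase(Locale.ROOT) match {
      case TimePattern(num, suffix) =>
        // `suffix` is null when the optional group did not match.
        val unit = Option(suffix) match {
          case Some(s) =>
            suffixes.getOrElse(s, throw new NumberFormatException(s"Unknown time suffix: $s"))
          case None => defaultUnit
        }
        resultUnit.convert(num.toLong, unit)
      case _ =>
        throw new NumberFormatException(s"Cannot parse time string: $str")
    }

  def timeAsMs(str: String): Long = parse(str, TimeUnit.MILLISECONDS, TimeUnit.MILLISECONDS)
  def timeAsSeconds(str: String): Long = parse(str, TimeUnit.SECONDS, TimeUnit.SECONDS)
}
```

In this sketch `timeAsMs("120")` yields 120 milliseconds, while `timeAsSeconds("120")` yields 120 seconds. A setting that is documented in seconds but read through the millisecond getter would therefore be off by a factor of 1000 whenever the user omits the unit, whereas `"120s"` is read consistently by both.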
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21441: [DO-NOT-MERGE] Run tests against hadoop-3.1 to see the t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21441 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/4029/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21564: [SPARK-24556][SQL] ReusedExchange should rewrite output ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21564 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91829/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21564: [SPARK-24556][SQL] ReusedExchange should rewrite output ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21564 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org