[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...
Github user squito commented on the issue: https://github.com/apache/spark/pull/16690 Jenkins, ok to test
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605

**[Test build #71935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71935/testReport)** for PR 16605 at commit [`f20de2c`](https://github.com/apache/spark/commit/f20de2c126e691183399b323a1b8abd4e50812eb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16620#discussion_r97406162

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1218,7 +1225,9 @@ class DAGScheduler(
                   logInfo("Resubmitting " + shuffleStage + " (" + shuffleStage.name +
                     ") because some of its tasks had failed: " +
                     shuffleStage.findMissingPartitions().mkString(", "))
    -              submitStage(shuffleStage)
    +              if (noActiveTaskSetManager) {
    --- End diff --

    Shouldn't this condition go into the surrounding `if (!shuffleStage.isAvailable)`? The logInfo is very confusing in this case otherwise.
[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16620#discussion_r97586026

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1193,7 +1193,14 @@ class DAGScheduler(
                 }

                 if (runningStages.contains(shuffleStage) && shuffleStage.pendingPartitions.isEmpty) {
    -              markStageAsFinished(shuffleStage)
    +              val noActiveTaskSetManager =
    +                taskScheduler.rootPool == null ||
    +                  !taskScheduler.rootPool.getSortedTaskSetQueue.exists {
    +                    tsm => tsm.stageId == stageId && !tsm.isZombie
    +                  }
    +              if (shuffleStage.isAvailable || noActiveTaskSetManager) {
    +                markStageAsFinished(shuffleStage)
    +              }
    --- End diff --

    I have to admit, though this passes all the tests, this is really confusing to me. I only somewhat understand why your original version didn't work, and why this should be used instead. Perhaps some more commenting here would help?

    The condition under which you do `markStageAsFinished` seems very broad, so perhaps it's worth a comment on the case when you do *not* (and perhaps even a `logInfo` in an `else` branch).

    The discrepancy between pendingPartitions and availableOutputs is also surprising -- perhaps that is worth extra comments on `Stage`, on how the meanings of those two differ.
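To make the suggestion about an explanatory `else` branch concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: `TaskSetInfo`, `StageState`, and `shouldMarkFinished` are hypothetical stand-ins introduced only for illustration, and the log wording is an assumption rather than anything proposed in the PR.

```scala
// Hypothetical stand-ins for scheduler state; these are NOT Spark's real classes.
final case class TaskSetInfo(stageId: Int, isZombie: Boolean)
final case class StageState(id: Int, pendingPartitions: Set[Int], isAvailable: Boolean)

object StageFinishCheck {
  // True when no non-zombie task set is still running for the given stage.
  def noActiveTaskSetManager(stageId: Int, queue: Seq[TaskSetInfo]): Boolean =
    !queue.exists(tsm => tsm.stageId == stageId && !tsm.isZombie)

  // Right(()) means the stage may be marked finished. Left carries the message
  // that an `else` branch could pass to logInfo when the stage is left running.
  def shouldMarkFinished(stage: StageState, queue: Seq[TaskSetInfo]): Either[String, Unit] =
    if (stage.isAvailable || noActiveTaskSetManager(stage.id, queue)) {
      Right(())
    } else {
      Left(s"Stage ${stage.id} has no pending partitions but is missing outputs; " +
        "an active task set is still running, so it is not marked as finished yet")
    }
}
```

Returning the message instead of logging directly keeps the sketch testable; in the real scheduler the `Left` case would presumably correspond to the `logInfo` call the comment asks for.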
[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16620#discussion_r97417513

    --- Diff: core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala ---
    @@ -648,4 +648,69 @@ class BasicSchedulerIntegrationSuite extends SchedulerIntegrationSuite[SingleCor
         }
         assertDataStructuresEmpty(noFailure = false)
       }
    +
    +  testScheduler("[SPARK-19263] DAGScheduler shouldn't resubmit active taskSet.") {
    +    val a = new MockRDD(sc, 2, Nil)
    +    val b = shuffle(2, a)
    +    val shuffleId = b.shuffleDeps.head.shuffleId
    +
    +    def runBackend(): Unit = {
    +      val (taskDescription, task) = backend.beginTask()
    +      task.stageId match {
    +        // ShuffleMapTask
    +        case 0 =>
    +          val stageAttempt = task.stageAttemptId
    +          val partitionId = task.partitionId
    +          (stageAttempt, partitionId) match {
    +            case (0, 0) =>
    +              val fetchFailed = FetchFailed(
    +                DAGSchedulerSuite.makeBlockManagerId("hostA"), shuffleId, 0, 0, "ignored")
    +              backend.taskFailed(taskDescription, fetchFailed)
    +            case (0, 1) =>
    +              // Wait until stage resubmission caused by FetchFailed is finished.
    +              waitUntilConditionBecomeTrue(taskScheduler.runningTaskSets.size==2, 5000,
    +                "Wait until stage is resubmitted caused by fetch failed")
    +
    +              // Task(stageAttempt=0, partition=1) will be bogus, because both two
    +              // tasks(stageAttempt=0, partition=0, 1) run on hostA.
    +              // Pending partitions are (0, 1) after stage resubmission,
    +              // then change to be 0 after this bogus task.
    +              backend.taskSuccess(taskDescription, DAGSchedulerSuite.makeMapStatus("hostA", 2))
    +            case (1, 1) =>
    +              // Wait long enough until Success of task(stageAttempt=1 and partition=0)
    +              // is handled by DAGScheduler.
    +              Thread.sleep(5000)
    +              // Task(stageAttempt=1 and partition=0) will cause stage resubmission,
    +              // because shuffleStage.pendingPartitions.isEmpty,
    +              // but shuffleStage.isAvailable is false.
    +              backend.taskSuccess(taskDescription, DAGSchedulerSuite.makeMapStatus("hostB", 2))
    +            case _ =>
    +              backend.taskSuccess(taskDescription, DAGSchedulerSuite.makeMapStatus("hostB", 2))
    +          }
    +        // ResultTask
    +        case 1 => backend.taskSuccess(taskDescription, 10)
    +      }
    +    }
    +
    +    withBackend(runBackend _) {
    +      val jobFuture = submit(b, (0 until 2).toArray)
    +      val duration = Duration(15, SECONDS)
    +      awaitJobTermination(jobFuture, duration)
    +    }
    +    assert(results === (0 until 2).map { _ -> 10}.toMap)
    +  }
    +
    +  def waitUntilConditionBecomeTrue(condition: => Boolean, timeout: Long, msg: String): Unit = {
    --- End diff --

    nit: rename to `waitForCondition` (maybe irrelevant given other comments)
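The body of the helper is cut off in the quoted diff, so the following is only a sketch of what a polling helper with the suggested name `waitForCondition` might look like. The `intervalMs` parameter, the `WaitUtil` object, and the choice of `AssertionError` on timeout are assumptions made for illustration, not taken from the PR.

```scala
object WaitUtil {
  // Polls `condition` every `intervalMs` milliseconds until it holds.
  // Throws an AssertionError carrying `msg` once `timeoutMs` has elapsed.
  // (Hypothetical helper; the PR's actual implementation is not shown above.)
  def waitForCondition(
      condition: => Boolean,
      timeoutMs: Long,
      msg: String,
      intervalMs: Long = 10L): Unit = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L
    while (!condition) {
      if (System.nanoTime() >= deadline) {
        throw new AssertionError(s"Timed out after $timeoutMs ms: $msg")
      }
      Thread.sleep(intervalMs)
    }
  }
}
```

Compared with a fixed `Thread.sleep(5000)`, a polling wait returns as soon as the condition holds and fails with a message when it never does. ScalaTest's `Eventually.eventually` offers similar behaviour and may be a better fit where it is already on the test classpath.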
[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16620#discussion_r97417399

    --- Diff: core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala ---
    @@ -648,4 +648,69 @@ class BasicSchedulerIntegrationSuite extends SchedulerIntegrationSuite[SingleCor
         }
         assertDataStructuresEmpty(noFailure = false)
       }
    +
    +  testScheduler("[SPARK-19263] DAGScheduler shouldn't resubmit active taskSet.") {
    +    val a = new MockRDD(sc, 2, Nil)
    +    val b = shuffle(2, a)
    +    val shuffleId = b.shuffleDeps.head.shuffleId
    +
    +    def runBackend(): Unit = {
    +      val (taskDescription, task) = backend.beginTask()
    +      task.stageId match {
    +        // ShuffleMapTask
    +        case 0 =>
    +          val stageAttempt = task.stageAttemptId
    +          val partitionId = task.partitionId
    +          (stageAttempt, partitionId) match {
    +            case (0, 0) =>
    +              val fetchFailed = FetchFailed(
    +                DAGSchedulerSuite.makeBlockManagerId("hostA"), shuffleId, 0, 0, "ignored")
    +              backend.taskFailed(taskDescription, fetchFailed)
    +            case (0, 1) =>
    +              // Wait until stage resubmission caused by FetchFailed is finished.
    +              waitUntilConditionBecomeTrue(taskScheduler.runningTaskSets.size==2, 5000,
    +                "Wait until stage is resubmitted caused by fetch failed")
    +
    +              // Task(stageAttempt=0, partition=1) will be bogus, because both two
    +              // tasks(stageAttempt=0, partition=0, 1) run on hostA.
    +              // Pending partitions are (0, 1) after stage resubmission,
    +              // then change to be 0 after this bogus task.
    +              backend.taskSuccess(taskDescription, DAGSchedulerSuite.makeMapStatus("hostA", 2))
    +            case (1, 1) =>
    +              // Wait long enough until Success of task(stageAttempt=1 and partition=0)
    +              // is handled by DAGScheduler.
    +              Thread.sleep(5000)
    --- End diff --

    Hmm, this is a nuisance. I don't see any good way to get rid of this sleep ... but now that I think about it, why can't you do this in `DAGSchedulerSuite`? It seems like this can be entirely contained to the `DAGScheduler` and doesn't require tricky interactions with other parts of the scheduler.

    (I'm sorry I pointed you in the wrong direction earlier -- I thought perhaps you had tried to copy the examples of `DAGSchedulerSuite` but there was some reason you couldn't.)
[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16661 **[Test build #71937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71937/testReport)** for PR 16661 at commit [`5672d13`](https://github.com/apache/spark/commit/5672d1345f661665f521fd1dd4410313ef3ab554).
[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16693 Merged build finished. Test PASSed.
[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16693 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71933/ Test PASSed.
[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16693

**[Test build #71933 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71933/testReport)** for PR 16693 at commit [`db00cf9`](https://github.com/apache/spark/commit/db00cf9061b2ad4263671f5ca9252642a091ee45).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16329: [SPARK-16046][DOCS] Aggregations in the Spark SQL...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16329#discussion_r97580048

    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java ---
    @@ -0,0 +1,160 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.examples.sql;
    +
    +// $example on:typed_custom_aggregation$
    +import java.io.Serializable;
    +
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Encoder;
    +import org.apache.spark.sql.Encoders;
    +import org.apache.spark.sql.SparkSession;
    +import org.apache.spark.sql.TypedColumn;
    +import org.apache.spark.sql.expressions.Aggregator;
    +// $example off:typed_custom_aggregation$
    +
    +public class JavaUserDefinedTypedAggregation {
    +
    +  // $example on:typed_custom_aggregation$
    +  public static class Employee implements Serializable {
    +    private String name;
    +    private long salary;
    +
    +    // Constructors, getters, setters...
    +    // $example off:typed_custom_aggregation$
    +    public String getName() {
    +      return name;
    +    }
    +
    +    public void setName(String name) {
    +      this.name = name;
    +    }
    +
    +    public long getSalary() {
    +      return salary;
    +    }
    +
    +    public void setSalary(long salary) {
    +      this.salary = salary;
    +    }
    +    // $example on:typed_custom_aggregation$
    +  }
    +
    +  public static class Average implements Serializable {
    +    private long sum;
    +    private long count;
    +
    +    // Constructors, getters, setters...
    +    // $example off:typed_custom_aggregation$
    +    public Average() {
    +    }
    +
    +    public Average(long sum, long count) {
    +      this.sum = sum;
    +      this.count = count;
    +    }
    +
    +    public long getSum() {
    +      return sum;
    +    }
    +
    +    public void setSum(long sum) {
    +      this.sum = sum;
    +    }
    +
    +    public long getCount() {
    +      return count;
    +    }
    +
    +    public void setCount(long count) {
    +      this.count = count;
    +    }
    +    // $example on:typed_custom_aggregation$
    +  }
    +
    +  public static class MyAverage extends Aggregator {
    +    // A zero value for this aggregation. Should satisfy the property that any b + zero = b
    +    public Average zero() {
    --- End diff --

    Is this meant to be `MyAverage`?
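The property quoted in the diff comment ("any b + zero = b") can be illustrated without Spark. The sketch below uses a hypothetical `AvgBuffer` case class that mirrors the `Average` bean; the method names follow the shape of Spark's `Aggregator[IN, BUF, OUT]`, in which `zero` returns the buffer type (which would be `Average` here, assuming that is the buffer type parameter) rather than the aggregator class itself.

```scala
// Hypothetical buffer type mirroring the `Average` bean in the diff above.
final case class AvgBuffer(sum: Long, count: Long)

object AvgBuffer {
  // Identity element of `merge`: merging it with any buffer b gives back b.
  val zero: AvgBuffer = AvgBuffer(0L, 0L)

  // Combines two partial aggregates.
  def merge(a: AvgBuffer, b: AvgBuffer): AvgBuffer =
    AvgBuffer(a.sum + b.sum, a.count + b.count)

  // Folds one input value (a salary) into the buffer.
  def reduce(b: AvgBuffer, salary: Long): AvgBuffer =
    AvgBuffer(b.sum + salary, b.count + 1L)

  // Final result; only meaningful for a buffer with count > 0.
  def finish(b: AvgBuffer): Double = b.sum.toDouble / b.count
}
```

For example, folding the salaries 3000 and 4500 into `AvgBuffer.zero` and calling `finish` gives their mean, and merging any buffer with `zero` on either side leaves it unchanged.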
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user gmoehler commented on the issue: https://github.com/apache/spark/pull/16660 @viirya Which comment are you referring to? I thought I had included all of them ;-)
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71936/testReport)** for PR 16677 at commit [`9d4cadb`](https://github.com/apache/spark/commit/9d4cadb782afcba52b8081402f5dd89cb0a27ae5).
[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16269 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71932/ Test PASSed.
[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16269 Merged build finished. Test PASSed.
[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16269

**[Test build #71932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71932/testReport)** for PR 16269 at commit [`48535aa`](https://github.com/apache/spark/commit/48535aae6be613c28f900e408f073a5eb7ef76cb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class HiveFileFormat(fileSinkConf: FileSinkDesc)`
[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97571098

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
    @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
     @Since("2.0.0")
     object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {

    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    --- End diff --

    Nit: `Integer.MaxValue` is not a standard convention; it should be `Int.MaxValue` in Scala or `Integer.MAX_VALUE` in Java.
[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97570174

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
    @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
     @Since("2.0.0")
     object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {

    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --

    +1 @srowen It's better to use the real max.
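As a worked example of what "the real max" could mean under the constraint stated in the doc comment, the sketch below derives the largest feature count whose square still fits in an `Int`. The object and value names are hypothetical, and this is one possible way to compute the bound, not necessarily what the PR ended up using.

```scala
object FeatureLimit {
  // Largest n such that n * n < Int.MaxValue. Int.MaxValue is not a perfect
  // square, so truncating its square root gives exactly that bound.
  val maxNumFeatures: Int = math.sqrt(Int.MaxValue.toDouble).toInt

  // Checks the bound using Long arithmetic so the square cannot overflow.
  def fits(n: Int): Boolean = n.toLong * n.toLong < Int.MaxValue.toLong
}
```

With this definition the bound evaluates to 46340, slightly above the hard-coded 46000: 46340 squared is 2147395600, which is below `Int.MaxValue` (2147483647), while 46341 squared is 2147488281, which is not.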
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605

**[Test build #71934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71934/testReport)** for PR 16605 at commit [`c16b121`](https://github.com/apache/spark/commit/c16b121247394374fd6066309e1b7309b981eabb).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Merged build finished. Test FAILed.
[GitHub] spark pull request #11867: [SPARK-14049] [CORE] Add functionality in spark h...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11867
[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...
Github user squito commented on the issue: https://github.com/apache/spark/pull/11867 Thanks @paragpc, merged to master.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605 **[Test build #71935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71935/testReport)** for PR 16605 at commit [`f20de2c`](https://github.com/apache/spark/commit/f20de2c126e691183399b323a1b8abd4e50812eb).
[GitHub] spark issue #14918: [SPARK-17360][PYSPARK] Support generator in createDataFr...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14918 I do not agree with this change either, for the same reason given in https://github.com/apache/spark/pull/14918#issuecomment-250882422.
[GitHub] spark issue #16614: [SPARK-19260] Spaces or "%20" in path parameter are not ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16614

**[Test build #3549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3549/testReport)** for PR 16614 at commit [`23834a6`](https://github.com/apache/spark/commit/23834a6ec99ac7e8a8df9095ce523de0ede80aea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71934/ Test FAILed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605 **[Test build #71934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71934/testReport)** for PR 16605 at commit [`c16b121`](https://github.com/apache/spark/commit/c16b121247394374fd6066309e1b7309b981eabb).
[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16308#discussion_r97556081

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala ---
    @@ -475,6 +1164,45 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
           Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L)))
       }

    +  test("to_unix_timestamp with session local timezone") {
    --- End diff --

    The problem is that, apart from this suite, all the changes you made to the tests just fix existing tests to fit the timezone changes. You added all the new tests in this suite as end-to-end tests, which is not good. We should add the new tests in `DateTimeUtilsSuite` as unit tests.
[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r97554829 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala --- @@ -475,6 +1164,45 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext { Row(ts1.getTime / 1000L), Row(ts2.getTime / 1000L))) } + test("to_unix_timestamp with session local timezone") { --- End diff -- I don't think we need to add tests in this file at all. We should improve `DateTimeUtilsSuite` to make sure the newly added methods, e.g. `getHours`, `daysToMillis`, etc., work well with different timezones. Then make sure these timezone-aware expressions call the newly added methods in `DateTimeUtils` that take a timezone parameter (we can remove the old versions that don't take a timezone parameter after we finish handling partition values). This suite is an end-to-end test, and testing every changed expression here would be very tedious; we should write more low-level tests in `DateTimeUtilsSuite`.
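The low-level, per-timezone check suggested in the comment above can be sketched as follows. This is a minimal illustration that uses only `java.time`; `TimeZoneHourSketch` and `hourOfDay` are hypothetical names, not the `DateTimeUtils` methods under review, whose signatures are not shown in this thread.

```scala
import java.time.{Instant, ZoneId}

object TimeZoneHourSketch {
  // Hypothetical stand-in for a timezone-aware helper (not Spark's DateTimeUtils API):
  // returns the hour of day of an epoch-millisecond instant as seen in the given zone.
  def hourOfDay(epochMillis: Long, zone: ZoneId): Int =
    Instant.ofEpochMilli(epochMillis).atZone(zone).getHour

  // The same instant (the Unix epoch) paired with the hour expected in several zones.
  val expectedHours: Seq[(String, Int)] = Seq(
    "UTC" -> 0,
    "Asia/Tokyo" -> 9,           // UTC+9
    "America/Los_Angeles" -> 16  // UTC-8, so 16:00 on the previous day
  )

  // True when the helper agrees with every expected (zone, hour) pair.
  def allZonesMatch: Boolean =
    expectedHours.forall { case (zoneId, hour) =>
      hourOfDay(0L, ZoneId.of(zoneId)) == hour
    }
}
```

A unit test written this way exercises the helper directly, without building a DataFrame or running a query, which is the distinction the comment draws between `DateTimeUtilsSuite` and the end-to-end suite.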
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71931/ Test FAILed.
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71931/testReport)** for PR 16677 at commit [`4fb5e40`](https://github.com/apache/spark/commit/4fb5e40d6aa77dafc0eb715730f5048a74d461d6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class FakePartitioning(orgPartition: Partitioning, numPartitions: Int) extends Partitioning ` * `case class LocalLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode with CodegenSupport ` * `case class GlobalLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode `
[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16693 **[Test build #71933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71933/testReport)** for PR 16693 at commit [`db00cf9`](https://github.com/apache/spark/commit/db00cf9061b2ad4263671f5ca9252642a091ee45).
[GitHub] spark issue #16693: [SPARK-19152][SQL][followup] simplify CreateHiveTableAsS...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16693 cc @gatorsmile @windpiger
[GitHub] spark pull request #16693: [SPARK-19152][SQL][followup] simplify CreateHiveT...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/16693 [SPARK-19152][SQL][followup] simplify CreateHiveTableAsSelectCommand ## What changes were proposed in this pull request? After https://github.com/apache/spark/pull/16552 , `CreateHiveTableAsSelectCommand` becomes very similar to `CreateDataSourceTableAsSelectCommand`, and we can further simplify it by only creating the table in the table-not-exist branch. This PR also adds hive provider checking in DataStream reader/writer, which was missed in #16552. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark minor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16693.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16693 commit db00cf9061b2ad4263671f5ca9252642a091ee45 Author: Wenchen Fan Date: 2017-01-24T13:35:03Z simplify CreateHiveTableAsSelectCommand
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16660 LGTM except one minor comment.
[GitHub] spark pull request #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16660#discussion_r97545280 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala --- @@ -71,7 +73,104 @@ object UDT { } +// object and classes to test SPARK-19311 + +// Trait/Interface for base type +@SQLUserDefinedType(udt = classOf[ExampleBaseTypeUDT]) +sealed trait IExampleBaseType extends Serializable { + def field: Int +} + +// Trait/Interface for derived type +@SQLUserDefinedType(udt = classOf[ExampleSubTypeUDT]) +sealed trait IExampleSubType extends IExampleBaseType + +// a base class +class ExampleBaseClass(override val field: Int) extends IExampleBaseType { + override def toString: String = field.toString --- End diff -- @gmoehler I think we don't need `toString`?
[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16691 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71930/ Test FAILed.
[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16691 Merged build finished. Test FAILed.
[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16691 **[Test build #71930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71930/testReport)** for PR 16691 at commit [`0c24291`](https://github.com/apache/spark/commit/0c24291b2738d2c71b59decc60b9e33524b8f84d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16269: [SPARK-19080][SQL] simplify data source analysis
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16269 **[Test build #71932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71932/testReport)** for PR 16269 at commit [`48535aa`](https://github.com/apache/spark/commit/48535aae6be613c28f900e408f073a5eb7ef76cb).
[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16606
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16606 thanks, merging to master!
[GitHub] spark pull request #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable sup...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16552
[GitHub] spark issue #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema inference...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16680 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71928/ Test PASSed.
[GitHub] spark issue #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema inference...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16680 Merged build finished. Test PASSed.
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16552 thanks, merging to master!
[GitHub] spark pull request #16668: [SPARK-18788][SPARKR] Add API for getNumPartition...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16668#discussion_r97537116 --- Diff: R/pkg/R/DataFrame.R --- @@ -3406,3 +3406,28 @@ setMethod("randomSplit", } sapply(sdfs, dataFrame) }) + +#' getNumPartitions +#' +#' Return the number of partitions +#' Note: in order to compute the number of partition the SparkDataFrame has to be converted into a +#' RDD temporarily internally. +#' +#' @param x A SparkDataFrame +#' @family SparkDataFrame functions +#' @aliases getNumPartitions,SparkDataFrame-method +#' @rdname getNumPartitions +#' @name getNumPartitions +#' @export +#' @examples +#'\dontrun{ +#' sparkR.session() +#' df <- createDataFrame(cars, numPartitions = 2) +#' getNumPartitions(df) +#' } +#' @note getNumPartitions since 2.1.1 +setMethod("getNumPartitions", + signature(x = "SparkDataFrame"), + function(x) { +getNumPartitionsRDD(toRDD(x)) --- End diff -- maybe we can add this slow implementation to Spark 2.1, and improve it in Spark 2.2
[GitHub] spark issue #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema inference...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16680 **[Test build #71928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71928/testReport)** for PR 16680 at commit [`0f7b9b8`](https://github.com/apache/spark/commit/0f7b9b8b17f79c83c920682f000d7c4eb4cda291). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare atomic and string type column...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15880 I have updated the PR title and description, and added a release_notes label in the ticket.
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16660 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71929/ Test PASSed.
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16660 Merged build finished. Test PASSed.
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16660 **[Test build #71929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71929/testReport)** for PR 16660 at commit [`7aed9a4`](https://github.com/apache/spark/commit/7aed9a4fada263785ce1d81acb31073ef7a401fd). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ExampleBaseTypeUDT extends UserDefinedType[IExampleBaseType] `
[GitHub] spark pull request #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user gmoehler commented on a diff in the pull request: https://github.com/apache/spark/pull/16660#discussion_r97534883 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala --- @@ -194,4 +293,35 @@ class UserDefinedTypeSuite extends QueryTest with SharedSQLContext with ParquetT // call `collect` to make sure this query can pass analysis. pointsRDD.as[MyLabeledPoint].map(_.copy(label = 2.0)).collect() } + + test("SPARK-19311: UDFs disregard UDT type hierarchy") { +UDTRegistration.register(classOf[IExampleBaseType].getName, --- End diff -- ok
[GitHub] spark issue #15837: [SPARK-18395][SQL] Evaluate common subexpression like la...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15837 Closing this since the alternative, #16659, has been merged.
[GitHub] spark pull request #15837: [SPARK-18395][SQL] Evaluate common subexpression ...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/15837
[GitHub] spark issue #16677: [WIP][SQL] Use map output statistices to improve global ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #71931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71931/testReport)** for PR 16677 at commit [`4fb5e40`](https://github.com/apache/spark/commit/4fb5e40d6aa77dafc0eb715730f5048a74d461d6).
[GitHub] spark issue #16692: [SPARK-19335] Introduce UPSERT feature to SPARK
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16692 Can one of the admins verify this patch?
[GitHub] spark issue #16614: [SPARK-19260] Spaces or "%20" in path parameter are not ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16614 **[Test build #3549 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3549/testReport)** for PR 16614 at commit [`23834a6`](https://github.com/apache/spark/commit/23834a6ec99ac7e8a8df9095ce523de0ede80aea).
[GitHub] spark pull request #16692: [SPARK-19335] Introduce UPSERT feature to SPARK
GitHub user kevinyu98 opened a pull request: https://github.com/apache/spark/pull/16692 [SPARK-19335] Introduce UPSERT feature to SPARK ## What changes were proposed in this pull request? This PR proposes adding UPSERT support to SPARK through DataFrameWriter's JDBC data source options. For example: if mytable2 in the MySQL database has a unique constraint on column c1 and the user wants to save the dataframe into that table, the write will fail with a unique constraint violation. `val df = Seq((1,4)).toDF("c1","c2")` `val url = "jdbc:mysql://9.30.167.220:3306/mydb"` `df.write.mode(org.apache.spark.sql.SaveMode.Append) .option("user","kevin").option("password","kevin").jdbc(url,"mytable2",new java.util.Properties())` With this feature, the user can use the UPSERT options to write the dataframe into the MySQL table. `df.write.mode(org.apache.spark.sql.SaveMode.Append) .option("upsert",true).option("upsertUpdateColumn","c1").option("user","kevin").option("password","kevin").jdbc(url,"mytable2",new java.util.Properties())` Here is the design doc. [UPSERT DESIGN DOC](https://drive.google.com/open?id=1IoafDm78v7ATP-npKbTaw2_dTFiXplCd9NzEe8CBH6E) ## How was this patch tested? Local test: ran the test case from spark-shell, connected to MySQL and PostgreSQL databases. Test case: added test cases to the existing suites, including the docker integration suite. Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/kevinyu98/spark upsert2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16692.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16692 commit 3b44c5978bd44db986621d3e8511e9165b66926b Author: Kevin Yu Date: 2016-04-20T18:06:30Z adding testcase commit 18b4a31c687b264b50aa5f5a74455956911f738a Author: Kevin Yu Date: 2016-04-22T21:48:00Z Merge remote-tracking branch 'upstream/master' commit 4f4d1c8f2801b1e662304ab2b33351173e71b427 Author: Kevin Yu Date: 2016-04-23T16:50:19Z Merge remote-tracking branch 'upstream/master' get latest code from upstream commit f5f0cbed1eb5754c04c36933b374c3b3d2ae4f4e Author: Kevin Yu Date: 2016-04-23T22:20:53Z Merge remote-tracking branch 'upstream/master' adding trim characters support commit d8b2edbd13ee9a4f057bca7dcb0c0940e8e867b8 Author: Kevin Yu Date: 2016-04-25T20:24:33Z Merge remote-tracking branch 'upstream/master' get latest code for pr12646 commit 196b6c66b0d55232f427c860c0e7c6876c216a67 Author: Kevin Yu Date: 2016-04-25T23:45:57Z Merge remote-tracking branch 'upstream/master' merge latest code commit f37a01e005f3e27ae2be056462d6eb6730933ba5 Author: Kevin Yu Date: 2016-04-27T14:15:06Z Merge remote-tracking branch 'upstream/master' merge upstream/master commit bb5b01fd3abeea1b03315eccf26762fcc23f80c0 Author: Kevin Yu Date: 2016-04-30T23:49:31Z Merge remote-tracking branch 'upstream/master' commit bde5820a181cf84e0879038ad8c4cebac63c1e24 Author: Kevin Yu Date: 2016-05-04T03:52:31Z Merge remote-tracking branch 'upstream/master' commit 5f7cd96d495f065cd04e8e4cc58461843e45bc8d Author: Kevin Yu Date: 2016-05-10T21:14:50Z Merge remote-tracking branch 'upstream/master' commit 893a49af0bfd153ccb59ba50b63a232660e0eada Author: Kevin Yu Date: 2016-05-13T18:20:39Z Merge remote-tracking branch 
'upstream/master' commit 4bbe1fd4a3ebd50338ccbe07dc5887fe289cd53d Author: Kevin Yu Date: 2016-05-17T21:58:14Z Merge remote-tracking branch 'upstream/master' commit b2dd795e23c36cbbd022f07a10c0cf21c85eb421 Author: Kevin Yu Date: 2016-05-18T06:37:13Z Merge remote-tracking branch 'upstream/master' commit 8c3e5da458dbff397ed60fcb68f2a46d87ab7ba4 Author: Kevin Yu Date: 2016-05-18T16:18:16Z Merge remote-tracking branch 'upstream/master' commit a0eaa408e847fbdc3ac5b26348588ee0a1e276c7 Author: Kevin Yu Date: 2016-05-19T04:28:20Z Merge remote-tracking branch 'upstream/master' commit d03c940ed89795fa7fe1d1e9f511363b22cdf19d Author: Kevin Yu Date: 2016-05-19T21:24:33Z Merge remote-tracking branch 'upstream/master' commit d728d5e002082e571ac47292226eb8b2614f479f Author: Kevin Yu Date: 2016-05-24T20:32:57Z Merge remote-tracking branch 'upstream/master' commit ea104ddfbf7d180ed1bc53dd9a1005010264aa1f Author: Kevin Yu Date: 2016-05-25T22:52:57Z Merge remote-tracking branch 'upstream/master' commit 6ab1215b781ad0cccf1752f3a625b4e4e371c38e Author: Kevin Yu Date: 2016-05-27T17:18:46Z Merge remote-tracking branch 'upstream/master' commit 0c566533705331697eb
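For context on the UPSERT proposal above, one way a database expresses an upsert is MySQL's `INSERT ... ON DUPLICATE KEY UPDATE` statement. The sketch below only builds such a statement as a string with JDBC-style placeholders. `UpsertSqlSketch` and `mysqlUpsert` are hypothetical names, and this illustrates the general technique under that assumption; it is not the implementation in this PR, which is not shown in the thread.

```scala
object UpsertSqlSketch {
  // Hypothetical helper (not part of Spark): builds a MySQL-style upsert statement.
  // `columns` are all inserted columns; `updateColumns` are the ones overwritten
  // when the insert collides with an existing unique key.
  def mysqlUpsert(table: String, columns: Seq[String], updateColumns: Seq[String]): String = {
    val cols = columns.mkString(", ")
    val placeholders = columns.map(_ => "?").mkString(", ")
    // VALUES(col) refers to the value that the INSERT part attempted to write.
    val updates = updateColumns.map(c => s"$c = VALUES($c)").mkString(", ")
    s"INSERT INTO $table ($cols) VALUES ($placeholders) ON DUPLICATE KEY UPDATE $updates"
  }
}
```

With the table from the example, `UpsertSqlSketch.mysqlUpsert("mytable2", Seq("c1", "c2"), Seq("c2"))` yields a statement that inserts a new row or, when `c1` already exists, updates `c2` instead of failing on the unique constraint.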
[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/16661#discussion_r97530588 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala --- @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") ( @Since("2.0.0") object GaussianMixture extends DefaultParamsReadable[GaussianMixture] { + /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */ + private[clustering] val MAX_NUM_FEATURES = 46000 --- End diff -- Is floor(sqrt(2^31-1)) = 46340 more accurate? Or is there overhead that prevents this from being achievable? I know it's a corner case, but if 46000 is a number that's just "about" the real max, let's just use the real max.
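The bound mentioned in the comment above can be checked directly. This is a minimal sketch, assuming only that the limit comes from `numFeatures * numFeatures` having to fit in a signed 32-bit `Int`; the object and method names are illustrative and are not part of the PR.

```scala
object MaxFeaturesSketch {
  // Largest n such that n * n still fits in a signed 32-bit Int.
  val maxNumFeatures: Int = math.floor(math.sqrt(Int.MaxValue.toDouble)).toInt

  // Squares are computed in Long so the check itself cannot overflow.
  def squareFitsInInt(n: Int): Boolean = n.toLong * n.toLong <= Int.MaxValue.toLong

  def main(args: Array[String]): Unit = {
    println(maxNumFeatures)                       // 46340
    println(squareFitsInInt(maxNumFeatures))      // true
    println(squareFitsInInt(maxNumFeatures + 1))  // false
  }
}
```

Whether 46340 is actually reachable in `GaussianMixture` depends on any per-array overhead, which is the open question in the review comment; the sketch only confirms the arithmetic.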
[GitHub] spark issue #12380: [SPARK-14623][ML]add label binarizer
Github user srowen commented on the issue: https://github.com/apache/spark/pull/12380 Isn't this just one-hot encoding? Spark has had this for a long time.
[GitHub] spark issue #16652: [SPARK-19234][MLLib] AFTSurvivalRegression should fail f...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16652 This is looking OK to me, but it needs a rebase (and, optionally, a squash) now before it can be tested again.
[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16654 Sure, and classification metrics like AUC only make sense for classifiers that output more than just a label -- they have to output a probability or score of some kind. Not every metric necessarily makes sense for every model, and we can use class hierarchy or just argument checking to avoid applying metrics where nonsensical. WSSSE can't be used for k-medoids, yes. k-medoids is also not in Spark, AFAIK. It's still not an argument to not abstract this at all.
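One way to read the "class hierarchy" option in the comment above is to let the metric's parameter type state what the model must provide. The following is only a sketch under that assumption; the traits `ClusteringModelSketch` and `HasCenters` and the toy `TwoCenters` model are hypothetical and do not correspond to Spark's ML API.

```scala
// Hypothetical traits for illustration; they are not Spark's actual ML API.
trait ClusteringModelSketch {
  def predict(point: Array[Double]): Int
}

// Only centroid-based models expose cluster centers.
trait HasCenters {
  def centers: Array[Array[Double]]
}

object MetricSketch {
  private def squaredDistance(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // The compound parameter type makes WSSSE unavailable for models without
  // centers, so the mismatch is caught at compile time rather than at runtime.
  def wssse(model: ClusteringModelSketch with HasCenters,
            data: Seq[Array[Double]]): Double =
    data.map(p => squaredDistance(p, model.centers(model.predict(p)))).sum
}

// A toy one-dimensional model with two fixed centers.
object TwoCenters extends ClusteringModelSketch with HasCenters {
  val centers: Array[Array[Double]] = Array(Array(0.0), Array(10.0))
  def predict(point: Array[Double]): Int = if (point(0) < 5.0) 0 else 1
}
```

The argument-checking alternative would accept any model and fail at runtime when centers are missing; which of the two fits the PR is left open in the thread.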
[GitHub] spark pull request #16676: delete useless var “j”
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16676
[GitHub] spark pull request #16658: [DOCS] Fix typo in docs
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16658
[GitHub] spark issue #16676: delete useless var “j”
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16676 Merged to master. Please read http://spark.apache.org/contributing.html for next time.
[GitHub] spark issue #16658: [DOCS] Fix typo in docs
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16658 Merged to master
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16355 Done, and it synced now. Merged to master/2.1
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16355 It's an apache-github sync issue: https://github.com/apache/spark/commits/branch-2.1 is missing the latest commit from https://git-wip-us.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.1 I'll cherry-pick onto apache/branch-2.1 and push as that might also kick the sync to try again.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15945 Merged build finished. Test PASSed.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15945 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71926/ Test PASSed.
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15945 **[Test build #71926 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71926/testReport)** for PR 15945 at commit [`bea519f`](https://github.com/apache/spark/commit/bea519f2ba12312ec96884c3545f74b3bc28c4a2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16552 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71925/ Test PASSed.
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16552 Merged build finished. Test PASSed.
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16552 **[Test build #71925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71925/testReport)** for PR 16552 at commit [`59db8e4`](https://github.com/apache/spark/commit/59db8e41ec2f5a4e090af3964ce48a61936e2ef4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare atomic and string type column...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71924/ Test PASSed.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare atomic and string type column...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15880 Merged build finished. Test PASSed.
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16606 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71922/ Test PASSed.
[GitHub] spark issue #15880: [SPARK-17913][SQL] compare atomic and string type column...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15880 **[Test build #71924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71924/testReport)** for PR 15880 at commit [`a11f89b`](https://github.com/apache/spark/commit/a11f89bf5ed13b4061a29daf007a608314465a94). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16606 Merged build finished. Test PASSed.
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16606 **[Test build #71922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71922/testReport)** for PR 16606 at commit [`72164eb`](https://github.com/apache/spark/commit/72164eb02c1b7acd836a5038fddb8bcd8225a1c6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16552 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71923/ Test PASSed.
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16552 Merged build finished. Test PASSed.
[GitHub] spark pull request #16656: [SPARK-18116][DStream] Report stream input inform...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16656#discussion_r97519002 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala --- @@ -536,6 +539,7 @@ abstract class DStream[T: ClassTag] ( logDebug(s"${this.getClass().getSimpleName}.readObject used") ois.defaultReadObject() generatedRDDs = new HashMap[Time, RDD[T]]() +recoveredReports = new HashMap[Time, StreamInputInfo]() } --- End diff -- Use `recoveredReports` to hold the recovered report information. We cannot report to `inputInfoTracker` here, because `jobScheduler` is not yet initialized.
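The pattern in the diff above, re-creating a map inside `readObject` so that state can be buffered after deserialization, can be sketched in isolation. This is a minimal illustration, assuming plain Java serialization; `CheckpointedStreamSketch`, `ReadObjectSketch`, and the `Long`/`String` map types are hypothetical stand-ins and not Spark's `DStream`, `Time`, or `StreamInputInfo`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.HashMap

// Hypothetical stand-in for a checkpointed stream; not Spark's DStream.
class CheckpointedStreamSketch(val name: String) extends Serializable {
  // Transient fields are skipped by Java serialization and come back as null.
  @transient var recoveredReports: HashMap[Long, String] = new HashMap[Long, String]()

  // Invoked by Java serialization on restore; re-creates the transient map so
  // recovered entries can be buffered until a tracker is available.
  private def readObject(ois: ObjectInputStream): Unit = {
    ois.defaultReadObject()
    recoveredReports = new HashMap[Long, String]()
  }
}

object ReadObjectSketch {
  // Serializes and deserializes a stream in memory, as checkpoint recovery would.
  def roundTrip(stream: CheckpointedStreamSketch): CheckpointedStreamSketch = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(stream)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[CheckpointedStreamSketch] finally in.close()
  }
}
```

Without the assignment in `readObject`, the restored object's `recoveredReports` would be `null`, since field initializers do not run during deserialization.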
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16552 **[Test build #71923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71923/testReport)** for PR 16552 at commit [`7bf5b50`](https://github.com/apache/spark/commit/7bf5b50c5cfba1ecb02b95c2fa9bb1ae7830ca99). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16660#discussion_r97518422 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala --- @@ -194,4 +293,35 @@ class UserDefinedTypeSuite extends QueryTest with SharedSQLContext with ParquetT // call `collect` to make sure this query can pass analysis. pointsRDD.as[MyLabeledPoint].map(_.copy(label = 2.0)).collect() } + + test("SPARK-19311: UDFs disregard UDT type hierarchy") { +UDTRegistration.register(classOf[IExampleBaseType].getName, --- End diff -- Oh, if you are worried about that, we actually have `UDTRegistrationSuite` for test cases of `UDTRegistration`. I am fine with either `SQLUserDefinedType` or `UDTRegistration`.
[GitHub] spark issue #16691: [SPARK-19349][DStreams] Check resource ready to avoid mu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16691 **[Test build #71930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71930/testReport)** for PR 16691 at commit [`0c24291`](https://github.com/apache/spark/commit/0c24291b2738d2c71b59decc60b9e33524b8f84d).
[GitHub] spark pull request #16691: [SPARK-19349][DStreams] Check resource ready to a...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16691#discussion_r97518189 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -422,16 +423,36 @@ class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false } /** - * Run the dummy Spark job to ensure that all slaves have registered. This avoids all the - * receivers to be scheduled on the same node. + * Wait for executors register ready. This avoids multiple receivers to be scheduled + * on the same node. Here, we check whether all resource has been registered. If not, + * and the number of receiver is larger than the number of registered executors, we + * will give once more chance to wait for remaining executors to register for + * "spark.scheduler.maxRegisteredResourcesWaitingTime" times. * - * TODO Should poll the executor number and wait for executors according to - * "spark.scheduler.minRegisteredResourcesRatio" and - * "spark.scheduler.maxRegisteredResourcesWaitingTime" rather than running a dummy job. + * This only occurs when set too small "spark.scheduler.minRegisteredResourcesRatio". */ - private def runDummySparkJob(): Unit = { + private def checkResourceReady(): Unit = { +val pollTime = 100 +val checkingStarted = System.currentTimeMillis() +val onceMoreWaitingTimeMs = + ssc.sparkContext.conf.getTimeAsMs("spark.scheduler.maxRegisteredResourcesWaitingTime", "30s") + --- End diff -- This uses "spark.scheduler.maxRegisteredResourcesWaitingTime" here; do we need a specific config?
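The diff above replaces a dummy job with polling until enough executors have registered or a timeout expires. The general shape of such a wait can be sketched as follows. This is not the PR's implementation: `PollingSketch.waitUntil` and its parameters are hypothetical, and the readiness condition is passed in as a plain function rather than read from Spark's scheduler.

```scala
object PollingSketch {
  /**
   * Polls `ready` every `pollMs` milliseconds until it returns true or
   * `timeoutMs` has elapsed. Returns the last observed value of `ready`.
   */
  def waitUntil(timeoutMs: Long, pollMs: Long)(ready: () => Boolean): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs
    var ok = ready()
    while (!ok && System.currentTimeMillis() < deadline) {
      Thread.sleep(pollMs)
      ok = ready()
    }
    ok
  }
}
```

In the setting of the PR, `ready` would presumably compare the number of registered executors with the number of receivers, and `timeoutMs` would come from the configured waiting time; both are assumptions here.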
[GitHub] spark pull request #16691: [SPARK-19349][DStreams] Check resource ready to a...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/16691 [SPARK-19349][DStreams] Check resource ready to avoid multiple receivers to be scheduled on the same node. ## What changes were proposed in this pull request? Remove the related TODO. Currently, we can only ensure that the registered resources satisfy "spark.scheduler.minRegisteredResourcesRatio". But if "spark.scheduler.minRegisteredResourcesRatio" is set too small, receivers may still be scheduled onto only a few nodes. In fact, we can wait once more for sufficient resources so that receivers are scheduled evenly. ## How was this patch tested? Existing unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/uncleGen/spark SPARK-19349 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16691.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16691 commit 0c24291b2738d2c71b59decc60b9e33524b8f84d Author: uncleGen Date: 2017-01-24T10:37:15Z SPARK-19349: Check resource ready to avoid multiple receivers to be scheduled on the same node.
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16660 **[Test build #71929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71929/testReport)** for PR 16660 at commit [`7aed9a4`](https://github.com/apache/spark/commit/7aed9a4fada263785ce1d81acb31073ef7a401fd).
[GitHub] spark issue #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user gmoehler commented on the issue: https://github.com/apache/spark/pull/16660 Thanks for the valuable (and fast!) comments - I have worked them in.
[GitHub] spark pull request #16660: [SPARK-19311][SQL] fix UDT hierarchy issue
Github user gmoehler commented on a diff in the pull request: https://github.com/apache/spark/pull/16660#discussion_r97512560 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala --- @@ -194,4 +293,35 @@ class UserDefinedTypeSuite extends QueryTest with SharedSQLContext with ParquetT // call `collect` to make sure this query can pass analysis. pointsRDD.as[MyLabeledPoint].map(_.copy(label = 2.0)).collect() } + + test("SPARK-19311: UDFs disregard UDT type hierarchy") { +UDTRegistration.register(classOf[IExampleBaseType].getName, --- End diff -- I tend to leave them but remove the @SQLUserDefinedType, so that we have a test that uses UDTRegistration.
[GitHub] spark issue #16680: [WIP][SPARK-16101][SQL] Refactoring CSV schema inference...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16680 **[Test build #71928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71928/testReport)** for PR 16680 at commit [`0f7b9b8`](https://github.com/apache/spark/commit/0f7b9b8b17f79c83c920682f000d7c4eb4cda291).
[GitHub] spark issue #16676: delete useless var “j”
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16676 Merged build finished. Test PASSed.
[GitHub] spark issue #16676: delete useless var “j”
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16676 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71927/ Test PASSed.
[GitHub] spark issue #16676: delete useless var “j”
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16676 **[Test build #71927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71927/testReport)** for PR 16676 at commit [`cf8211a`](https://github.com/apache/spark/commit/cf8211a0057b5cc5652414eb96bb453c0e2618fa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16689: [SPARK-19342][SPARKR] bug fixed in collect method for co...
Github user titicaca commented on the issue: https://github.com/apache/spark/pull/16689 Sure. Shall I add the tests in pkg/inst/tests/testthat/test_sparkSQL.R?
[GitHub] spark pull request #16689: [SPARK-19342][SPARKR] bug fixed in collect method...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16689#discussion_r97508643
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1136,9 +1136,17 @@ setMethod("collect",
 # Note that "binary" columns behave like complex types.
 if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
-vec <- do.call(c, col)
+valueIndex <- which(!is.na(col))
+if (length(valueIndex) > 0 && valueIndex[1] > 1) {
+  colTail <- col[-(1 : (valueIndex[1] - 1))]
+  vec <- do.call(c, colTail)
+  classVal <- class(vec)
+  vec <- c(rep(NA, valueIndex[1] - 1), vec)
+  class(vec) <- classVal
--- End diff --
Hmm, what happened here? If you want to drop the NA and use the rest to infer the class, you can do `col[!is.na(col)]`.
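For illustration, the suggestion above could be sketched roughly as follows. This is a minimal sketch, not the code from the pull request: it assumes `col` is a list of length-one values (as in `collect`), and the helper name `combineWithNA` is hypothetical and not part of SparkR. It infers the class from the non-NA entries only and then puts the values back at their original positions.

```r
# Hypothetical helper, not part of SparkR: combine a list column that may
# contain NA entries into a vector while keeping the class of the values.
combineWithNA <- function(col) {
  naMask <- is.na(col)                 # TRUE where the list element is NA
  if (all(naMask)) {
    return(rep(NA, length(col)))       # no values to infer a class from
  }
  values <- do.call(c, col[!naMask])   # class comes from the non-NA values
  # Indexing with NA yields an all-NA vector that keeps the class of `values`.
  vec <- values[rep(NA_integer_, length(col))]
  vec[!naMask] <- values               # restore values at original positions
  vec
}

# Example input (assumed data): leading NAs followed by Date values.
col <- list(NA, NA, as.Date("2017-01-24"), as.Date("2017-01-25"))
vec <- combineWithNA(col)
print(class(vec))  # "Date"
```

Unlike the patch in the diff, this sketch does not depend on the NA entries being at the start of the column, since the mask handles NAs at any position.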