[GitHub] spark pull request #16774: [SPARK-19357][ML] Adding parallel model evaluatio...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/16774#discussion_r110979877 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/ValidatorParams.scala --- @@ -67,6 +71,39 @@ private[ml] trait ValidatorParams extends HasSeed with Params { /** @group getParam */ def getEvaluator: Evaluator = $(evaluator) + /** + * param to control the number of models evaluated in parallel + * Default: 1 + * + * @group param + */ + val numParallelEval: IntParam = new IntParam(this, "numParallelEval", +"max number of models to evaluate in parallel, 1 for serial evaluation", +ParamValidators.gtEq(1)) + + /** @group getParam */ + def getNumParallelEval: Int = $(numParallelEval) + + /** + * Creates a execution service to be used for validation, defaults to a thread-pool with + * size of `numParallelEval` + */ + protected var executorServiceFactory: (Int) => ExecutorService = { +(requestedMaxThreads: Int) => ThreadUtils.newDaemonCachedThreadPool( --- End diff -- So my thinking was that if the thread calling fit is terminated, it would have to be the JVM shutting down, which would exit without waiting for these daemon threads. We don't really care at what point the daemon threads stop, or whether they stop abruptly, since any unfinished work is useless. So I'm not sure that adding a shutdownHook would do anything different. On the other hand, if the SparkSession wanted to cancel the running threads with the JVM still running, I think it could do that if it provided its own ExecutorService. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
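The trade-off in the comment above can be sketched in plain Java, without Spark's `ThreadUtils`. This is only an illustration under assumptions: `ParallelEvalSketch`, `daemonPool`, and `evaluateAll` are hypothetical names that do not appear in the PR. It shows a daemon-thread pool sized by a `numParallelEval`-like value, plus a factory hook through which a caller could supply its own `ExecutorService`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

final class ParallelEvalSketch {
    // Hypothetical default factory: daemon threads do not keep the JVM alive,
    // so a JVM shutdown exits without waiting for unfinished evaluations.
    static ExecutorService daemonPool(int maxThreads) {
        return Executors.newFixedThreadPool(maxThreads, r -> {
            Thread t = new Thread(r, "eval-sketch");
            t.setDaemon(true);
            return t;
        });
    }

    // Runs every task on a pool built by the supplied factory and returns the
    // results in task order. The pool is owned by this call and shut down here.
    static List<Double> evaluateAll(List<Callable<Double>> tasks,
                                    int numParallelEval,
                                    IntFunction<ExecutorService> factory) throws Exception {
        ExecutorService pool = factory.apply(numParallelEval);
        try {
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Double>> tasks = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            final double metric = i * 0.5; // stand-in for a model's evaluation metric
            tasks.add(() -> metric);
        }
        System.out.println(evaluateAll(tasks, 2, ParallelEvalSketch::daemonPool));
    }
}
```

A caller that wants to cancel work while the JVM keeps running could pass a factory that returns a pool it holds a reference to, and call `shutdownNow()` on that pool itself.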
[GitHub] spark pull request #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) sho...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/17606#discussion_r110979361 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -571,6 +571,7 @@ object TypeCoercion { NaNvl(l, Cast(r, DoubleType)) case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType => NaNvl(Cast(l, DoubleType), r) + case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, l.dataType)) --- End diff -- Yeah, this PR prevents casting from `NaNvl(FloatType, NullType)` to `NaNvl(DoubleType, DoubleType)` since we want to minimize casting as much as possible. Also, if we want to replace `NaN` by `null`, we want to keep the output type the same as the input type. Whether `NaNvl(FloatType, DoubleType)` should be cast into `NaNvl(DoubleType, DoubleType)` is another story. I agree with you, we should downcast the replacement `DoubleType` into `FloatType`. In my opinion, this implicit casting is error-prone, and we should require users to cast explicitly instead. @gatorsmile maybe you can chime in and give feedback on whether we should cast `NaNvl(FloatType, DoubleType)` to `NaNvl(DoubleType, DoubleType)`.
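A small Java sketch may help show why silently widening a float column to double is visible to users. It does not use Spark; `FloatWideningSketch` and its `nanvl` helper are hypothetical names that only mimic the idea of replacing NaN while keeping the input type.

```java
final class FloatWideningSketch {
    // Hypothetical helper: replace NaN but keep the result a float,
    // so the output type matches the input type.
    static float nanvl(float value, float replacement) {
        return Float.isNaN(value) ? replacement : value;
    }

    // Widening is exact, but the widened value is the float's binary value,
    // which generally differs from the double literal a user would write.
    static boolean widenedEqualsDoubleLiteral() {
        float f = 0.1f;
        double widened = f;
        return widened == 0.1;
    }

    public static void main(String[] args) {
        System.out.println(nanvl(Float.NaN, 0.0f));       // NaN is replaced
        System.out.println(nanvl(1.5f, 0.0f));            // non-NaN passes through
        System.out.println(widenedEqualsDoubleLiteral()); // false
    }
}
```

Keeping the result in the input type avoids this kind of surprise in downstream comparisons, which seems to be the motivation for minimizing casts.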
[GitHub] spark issue #17330: [SPARK-19993][SQL] Caching logical plans containing subq...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17330 **[Test build #75710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75710/testReport)** for PR 17330 at commit [`362d62f`](https://github.com/apache/spark/commit/362d62ff393954d37d76ac55636d50ee0b4ffcb5).
[GitHub] spark pull request #17330: [SPARK-19993][SQL] Caching logical plans containi...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/17330#discussion_r110977254 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -670,4 +677,139 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext assert(spark.read.parquet(path).filter($"id" > 4).count() == 15) } } + + test("SPARK-19993 simple subquery caching") { +withTempView("t1", "t2") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(1).toDF("c1").createOrReplaceTempView("t2") --- End diff -- @cloud-fan sorry... actually i had some of these tests combined and when i split, i forgot to remove this. Will fix it.
[GitHub] spark pull request #17330: [SPARK-19993][SQL] Caching logical plans containi...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/17330#discussion_r110977330 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -670,4 +677,139 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext assert(spark.read.parquet(path).filter($"id" > 4).count() == 15) } } + + test("SPARK-19993 simple subquery caching") { +withTempView("t1", "t2") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(1).toDF("c1").createOrReplaceTempView("t2") + + sql( +""" + |SELECT * FROM t1 + |WHERE + |NOT EXISTS (SELECT * FROM t1) +""".stripMargin).cache() + + val cachedDs = +sql( + """ +|SELECT * FROM t1 +|WHERE +|NOT EXISTS (SELECT * FROM t1) + """.stripMargin) + assert(getNumInMemoryRelations(cachedDs) == 1) + + // Additional predicate in the subquery plan should cause a cache miss + val cachedMissDs = + sql( +""" + |SELECT * FROM t1 + |WHERE + |NOT EXISTS (SELECT * FROM t1 where c1 = 0) +""".stripMargin) + assert(getNumInMemoryRelations(cachedMissDs) == 0) +} + } + + test("SPARK-19993 subquery caching with correlated predicates") { +withTempView("t1", "t2") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(1).toDF("c1").createOrReplaceTempView("t2") + + // Simple correlated predicate in subquery + sql( +""" + |SELECT * FROM t1 + |WHERE + |t1.c1 in (SELECT t2.c1 FROM t2 where t1.c1 = t2.c1) +""".stripMargin).cache() + + val cachedDs = +sql( + """ +|SELECT * FROM t1 +|WHERE +|t1.c1 in (SELECT t2.c1 FROM t2 where t1.c1 = t2.c1) + """.stripMargin) + assert(getNumInMemoryRelations(cachedDs) == 1) +} + } + + test("SPARK-19993 subquery with cached underlying relation") { +withTempView("t1", "t2") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(1).toDF("c1").createOrReplaceTempView("t2") --- End diff -- @cloud-fan sorry... actually i had some of these tests combined and when i split, i forgot to remove this. Will fix it. 
[GitHub] spark issue #17609: [SPARK-20296][TRIVIAL][DOCS] Count distinct error messag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17609 Can one of the admins verify this patch?
[GitHub] spark pull request #17609: [SPARK-20296][TRIVIAL][DOCS] Count distinct error...
GitHub user jtoka opened a pull request: https://github.com/apache/spark/pull/17609 [SPARK-20296][TRIVIAL][DOCS] Count distinct error message for streaming ## What changes were proposed in this pull request? Update count distinct error message for streaming datasets/dataframes to match current behavior. These aggregations are not yet supported, regardless of whether the dataset/dataframe is aggregated. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jtoka/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17609.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17609 commit a4d34c5bcfe53ef05c56f8ce6838bbcda30c9f7e Author: jtoka Date: 2017-04-11T18:13:55Z Count distinct error message Update count distinct error message for streaming datasets/dataframes to match current behavior. These aggregations are not yet supported.
[GitHub] spark pull request #17330: [SPARK-19993][SQL] Caching logical plans containi...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/17330#discussion_r110976069 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -76,6 +76,13 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext sum } + private def getNumInMemoryTableScanExecs(plan: SparkPlan): Int = { --- End diff -- @cloud-fan So we are operating at the physical plan level in this method, whereas the other method, getNumInMemoryRelations, operates at the logical plan level. Here we are simply counting the InMemoryTableScanExec nodes in the plan. I have changed the function name to getNumInMemoryTablesRecursively. Does it look OK to you?
[GitHub] spark pull request #17604: [SPARK-20289][SQL] Use StaticInvoke to box primit...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17604
[GitHub] spark issue #17604: [SPARK-20289][SQL] Use StaticInvoke to box primitive typ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17604 Merging in master.
[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17295 > LGTM, cc @mallman to check the unmap part LGTM, too. Sorry for the late reply... I've been away the past two weeks.
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17436 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75709/ Test PASSed.
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17436 Merged build finished. Test PASSed.
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436 **[Test build #75709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75709/testReport)** for PR 17436 at commit [`6443f59`](https://github.com/apache/spark/commit/6443f59754fec2330fc81e201ae28c7709da9f65). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #7652: [SPARK-9312] [ML] Added max confidence factor to OneVsRes...
Github user AxenGitHub commented on the issue: https://github.com/apache/spark/pull/7652 Is there any news on this branch? We would benefit a lot from this feature.
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17527 The general idea is to leave alone any lower-casing that affects strings in the user program, so that it keeps using the locale-sensitive `toLowerCase()`. This is more conservative. All of the changes should only affect internal strings or API values, where there is no reason to be locale-specific. For example: checking a property value against a known list of enum string values in a case-insensitive way. This should address the underlying problem, where lower-casing an internal property gives the wrong result in the Turkish locale, without changing the results of a user program.
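The Turkish-locale problem mentioned above can be reproduced with the JDK alone. The following is a minimal sketch, assuming a hypothetical `normalizeOption` helper for internal option values; it is not code from the PR.

```java
import java.util.Locale;

final class LowerCaseSketch {
    // Hypothetical helper for internal strings and enum-like option values:
    // Locale.ROOT makes the result independent of the JVM's default locale.
    static String normalizeOption(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        String option = "IGNORE";
        // In the Turkish locale, 'I' lower-cases to dotless 'ı' (U+0131),
        // so a comparison against "ignore" fails.
        System.out.println(option.toLowerCase(turkish).equals("ignore"));
        // With Locale.ROOT the comparison behaves the same everywhere.
        System.out.println(normalizeOption(option).equals("ignore"));
    }
}
```

Strings that come from the user program are a different matter: there the locale-sensitive overload may be what the user expects, which is why the PR leaves those call sites alone.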
[GitHub] spark issue #17455: [Spark-20044][Web UI] Support Spark UI behind front-end ...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/17455 It seems @holdenk's ok didn't take. @vanzin, mind okaying this to test?
[GitHub] spark issue #17608: [SPARK-20293][WEB UI][History]In the page of 'jobs' or '...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/17608 @guoxiaolongzte This seems familiar, are you using the latest version of Knox with your Spark UI?
[GitHub] spark issue #17527: [SPARK-20156][CORE][SQL][STREAMING][MLLIB] Java String t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17527 I am wondering why some of the `toLowerCase` calls are changed while the others remain unchanged?
[GitHub] spark pull request #17598: [SPARK-20284][CORE] Make {Des,S}erializationStrea...
Github user superbobry commented on a diff in the pull request: https://github.com/apache/spark/pull/17598#discussion_r110950456 --- Diff: core/src/main/scala/org/apache/spark/serializer/Serializer.scala --- @@ -125,7 +125,7 @@ abstract class SerializerInstance { * A stream for writing serialized objects. */ @DeveloperApi -abstract class SerializationStream { +abstract class SerializationStream extends Closeable { --- End diff -- Sure, added that.
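One practical effect of extending `Closeable` is that callers can rely on try-with-resources and on generic close helpers. A minimal sketch in plain Java follows; `SketchStream` is a hypothetical stand-in, not Spark's `SerializationStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class CloseableStreamSketch {
    // Hypothetical stand-in for a serialization stream.
    static final class SketchStream implements Closeable {
        private final ByteArrayOutputStream out;
        private boolean closed = false;

        SketchStream(ByteArrayOutputStream out) {
            this.out = out;
        }

        SketchStream writeObject(String value) throws IOException {
            out.write(value.getBytes(StandardCharsets.UTF_8));
            return this;
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() throws IOException {
            closed = true;
            out.close();
        }
    }

    // try-with-resources calls close() even if writeObject throws.
    static SketchStream writeAndClose(ByteArrayOutputStream sink, String value) throws IOException {
        SketchStream ref;
        try (SketchStream stream = new SketchStream(sink)) {
            ref = stream.writeObject(value);
        }
        return ref;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        SketchStream stream = writeAndClose(sink, "abc");
        System.out.println(stream.isClosed() + " " + sink.size());
    }
}
```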
[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...
Github user johnc1231 commented on the issue: https://github.com/apache/spark/pull/17459 @viirya Do you have any more comments on this, or are you happy with it?
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9571 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75708/ Test PASSed.
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9571 Merged build finished. Test PASSed.
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9571 **[Test build #75708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75708/testReport)** for PR 9571 at commit [`8903dcf`](https://github.com/apache/spark/commit/8903dcfe2b927c8fc3fed9df3e9939670a016944). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17491: [SPARK-20175][SQL] Exists should not be evaluated in Joi...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17491 I think the current approach will have a LeftSemi join for this Exists subquery. Is it far from the optimal access plan you described?
[GitHub] spark issue #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "spark.m...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17436 **[Test build #75709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75709/testReport)** for PR 17436 at commit [`6443f59`](https://github.com/apache/spark/commit/6443f59754fec2330fc81e201ae28c7709da9f65).
[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17150 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75707/ Test PASSed.
[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/17436#discussion_r110923167 --- Diff: core/src/main/java/org/apache/spark/memory/MemoryConsumer.java --- @@ -41,7 +41,7 @@ protected MemoryConsumer(TaskMemoryManager taskMemoryManager, long pageSize, Mem } protected MemoryConsumer(TaskMemoryManager taskMemoryManager) { --- End diff -- [This test code](https://github.com/apache/spark/blob/master/core/src/test/java/org/apache/spark/memory/TestMemoryConsumer.java#L24) is the only case that specifies a memory mode different from `TaskMemoryManager.getTungstenMemoryMode()`. To simplify the code, I have just replaced `MemoryConsumer(taskMemoryManager, pageSize, taskMemoryManager.getTungstenMemoryMode())` with `MemoryConsumer(taskMemoryManager, pageSize)`.
[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17150 Merged build finished. Test PASSed.
[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17150 **[Test build #75707 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75707/testReport)** for PR 17150 at commit [`cccfbdf`](https://github.com/apache/spark/commit/cccfbdf5d0c762b13c65986ea6fa06a06cb394a4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17520: [WIP][SPARK-19712][SQL] Move PullupCorrelatedPredicates ...
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/17520 @cloud-fan: would you be interested in reviewing this PR, since I have not heard from @hvanhovell for a while? Note this is a WIP and I want to hear your feedback on the issues I put in the comments along with the code. The code, as it is, preserves the current behaviour, which is not necessarily the desired one.
[GitHub] spark issue #17491: [SPARK-20175][SQL] Exists should not be evaluated in Joi...
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/17491 @cloud-fan wrote: "How useful is this optimization? It only works when Exists has no condition, is that a common case?" One common case is an ACL check, where the application asks the database whether the user has the proper authority to access a certain set of data. Ex: select ... from controlled_table where exists (select 1 from acl_table where user = CURRENT_USER and role = ...) From a runtime perspective, an optimal access plan places ACL_TABLE as the outer of a nested-loop join, with the semantic of fetching only the first qualifying row: once such a row exists, processing continues with the inner table, CONTROLLED_TABLE; if no row from the outer qualifies, the inner is not accessed at all.
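The access plan described above can be sketched over in-memory lists in plain Java. This is not how Spark plans the query; `UncorrelatedExistsSketch` and `selectWhereExists` are hypothetical names, and the lists stand in for ACL_TABLE and CONTROLLED_TABLE.

```java
import java.util.List;
import java.util.function.Predicate;

final class UncorrelatedExistsSketch {
    // Because the EXISTS subquery does not reference the outer query, it can be
    // evaluated once. anyMatch stops at the first qualifying ACL row, and the
    // controlled rows are never touched when no such row exists.
    static <A, R> List<R> selectWhereExists(List<A> aclRows,
                                            Predicate<A> aclFilter,
                                            List<R> controlledRows) {
        boolean exists = aclRows.stream().anyMatch(aclFilter);
        return exists ? controlledRows : List.of();
    }

    public static void main(String[] args) {
        List<String> acl = List.of("alice:admin", "bob:reader");
        List<Integer> controlled = List.of(10, 20, 30);
        System.out.println(selectWhereExists(acl, row -> row.equals("alice:admin"), controlled));
        System.out.println(selectWhereExists(acl, row -> row.equals("carol:admin"), controlled));
    }
}
```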
[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...
Github user squito commented on the issue: https://github.com/apache/spark/pull/16781
@ueshin thanks for taking a look earlier, sorry it has taken me some time to update this. Things to note since last time:
1) Hive has since been updated in [HIVE-16231](https://issues.apache.org/jira/browse/HIVE-16231) to use the local timezone, not GMT, as the default for storing data. Really, this is the change that should have been in HIVE-12767 -- otherwise you lose backwards compatibility with old datasets.
2) This PR now uses the session time zone, rather than the local timezone. There are tests to confirm that a mix of session timezone X storage timezone works correctly.
3) Predicate pushdown is handled. I actually didn't need to change the behavior at all, since predicates are never pushed to int96 -- but there are some tests that confirm this.
I'm sure there is some minor cleanup that could be done, but overall I think this is ready now. I'd appreciate it if you could take another look and make any suggestions.
[GitHub] spark issue #17598: [SPARK-20284][CORE] Make {Des,S}erializationStream exten...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17598 **[Test build #3659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3659/testReport)** for PR 17598 at commit [`75ba026`](https://github.com/apache/spark/commit/75ba026db26171e0ed59d48d0ab2855f2a2af757). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class SerializationStream extends Closeable ` * `abstract class DeserializationStream extends Closeable `
[GitHub] spark pull request #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) sho...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17606#discussion_r110901338
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -656,14 +656,20 @@ class TypeCoercionSuite extends PlanTest {
   test("nanvl casts") {
     ruleTest(TypeCoercion.FunctionArgumentConversion,
-      NaNvl(Literal.create(1.0, FloatType), Literal.create(1.0, DoubleType)),
-      NaNvl(Cast(Literal.create(1.0, FloatType), DoubleType), Literal.create(1.0, DoubleType)))
+      NaNvl(Literal.create(1.0f, FloatType), Literal.create(1.0, DoubleType)),
+      NaNvl(Cast(Literal.create(1.0f, FloatType), DoubleType), Literal.create(1.0, DoubleType)))
     ruleTest(TypeCoercion.FunctionArgumentConversion,
-      NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, FloatType)),
-      NaNvl(Literal.create(1.0, DoubleType), Cast(Literal.create(1.0, FloatType), DoubleType)))
+      NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0f, FloatType)),
+      NaNvl(Literal.create(1.0, DoubleType), Cast(Literal.create(1.0f, FloatType), DoubleType)))
     ruleTest(TypeCoercion.FunctionArgumentConversion,
       NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType)),
       NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType)))
+    ruleTest(TypeCoercion.FunctionArgumentConversion,
+      NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, NullType)),
+      NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType)))
+    ruleTest(TypeCoercion.FunctionArgumentConversion,
+      NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, NullType)),
+      NaNvl(Literal.create(1.0, DoubleType), Literal.create(null, DoubleType)))
--- End diff --
Then this should be `Cast(Literal.create(null, NullType), DoubleType)`, I think.
[GitHub] spark pull request #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) sho...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17606#discussion_r110901088
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala ---
@@ -656,14 +656,20 @@ class TypeCoercionSuite extends PlanTest {
   test("nanvl casts") {
     ruleTest(TypeCoercion.FunctionArgumentConversion,
-      NaNvl(Literal.create(1.0, FloatType), Literal.create(1.0, DoubleType)),
-      NaNvl(Cast(Literal.create(1.0, FloatType), DoubleType), Literal.create(1.0, DoubleType)))
+      NaNvl(Literal.create(1.0f, FloatType), Literal.create(1.0, DoubleType)),
+      NaNvl(Cast(Literal.create(1.0f, FloatType), DoubleType), Literal.create(1.0, DoubleType)))
     ruleTest(TypeCoercion.FunctionArgumentConversion,
-      NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, FloatType)),
-      NaNvl(Literal.create(1.0, DoubleType), Cast(Literal.create(1.0, FloatType), DoubleType)))
+      NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0f, FloatType)),
+      NaNvl(Literal.create(1.0, DoubleType), Cast(Literal.create(1.0f, FloatType), DoubleType)))
     ruleTest(TypeCoercion.FunctionArgumentConversion,
       NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType)),
       NaNvl(Literal.create(1.0, DoubleType), Literal.create(1.0, DoubleType)))
+    ruleTest(TypeCoercion.FunctionArgumentConversion,
+      NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, NullType)),
+      NaNvl(Literal.create(1.0f, FloatType), Literal.create(null, FloatType)))
--- End diff --
Oh. `Literal.create(null, NullType)` should be `Cast(Literal.create(null, NullType), FloatType)`.
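The expectation in the two review comments above, that a `NullType` argument ends up wrapped in a `Cast` to the other argument's type rather than replaced by a typed null literal, can be illustrated with a toy model. The types below are stand-ins defined here for the sketch; they are not Catalyst's classes, and `coerceNaNvl` is a hypothetical function, not the actual `TypeCoercion.FunctionArgumentConversion` rule.

```scala
object NaNvlCoercionSketch {
  // Toy stand-ins for Catalyst types (assumptions for illustration only).
  sealed trait DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType
  case object NullType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class Literal(value: Any, dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr
  final case class NaNvl(left: Expr, right: Expr) extends Expr {
    def dataType: DataType = left.dataType
  }

  /** Hypothetical coercion: make both arguments agree by inserting casts. */
  def coerceNaNvl(e: NaNvl): NaNvl = e match {
    // A NullType argument is cast to the other side's type; the literal is kept.
    case NaNvl(l, r) if r.dataType == NullType =>
      NaNvl(l, Cast(r, l.dataType))
    // Float vs. Double: widen the Float side to Double.
    case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType =>
      NaNvl(Cast(l, DoubleType), r)
    case NaNvl(l, r) if l.dataType == DoubleType && r.dataType == FloatType =>
      NaNvl(l, Cast(r, DoubleType))
    // Already consistent: leave untouched.
    case other => other
  }
}
```

In this model the coerced tree still contains the original `Literal(null, NullType)` node under a `Cast`, which is the shape both comments ask the test expectations to have.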
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9571 **[Test build #75708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75708/testReport)** for PR 9571 at commit [`8903dcf`](https://github.com/apache/spark/commit/8903dcfe2b927c8fc3fed9df3e9939670a016944).
[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17436#discussion_r110891992
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala ---
@@ -351,11 +351,12 @@ class ParquetFileFormat
       if (pushed.isDefined) {
         ParquetInputFormat.setFilterPredicate(hadoopAttemptContext.getConfiguration, pushed.get)
       }
+      val taskContext = Option(TaskContext.get())
       val parquetReader = if (enableVectorizedReader) {
         val vectorizedReader = new VectorizedParquetRecordReader()
         vectorizedReader.initialize(split, hadoopAttemptContext)
         logDebug(s"Appending $partitionSchema ${file.partitionValues}")
-        vectorizedReader.initBatch(partitionSchema, file.partitionValues)
+        vectorizedReader.initBatch(partitionSchema, file.partitionValues, taskContext.isDefined)
--- End diff --
Does `taskContext.isDefined` mean that off-heap is enabled?
[GitHub] spark pull request #17491: [SPARK-20175][SQL] Exists should not be evaluated...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17491
[GitHub] spark issue #17491: [SPARK-20175][SQL] Exists should not be evaluated in Joi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17491 LGTM, merging to master!
[GitHub] spark pull request #17436: [SPARK-20101][SQL] Use OffHeapColumnVector when "...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17436#discussion_r110889293
--- Diff: core/src/test/scala/org/apache/spark/memory/StaticMemoryManagerSuite.scala ---
@@ -48,7 +48,10 @@ class StaticMemoryManagerSuite extends MemoryManagerSuite {
       conf.clone
         .set("spark.memory.fraction", "1")
         .set("spark.testing.memory", maxOnHeapExecutionMemory.toString)
-        .set("spark.memory.offHeap.size", maxOffHeapExecutionMemory.toString),
+        .set("spark.memory.offHeap.size",
+          if (maxOffHeapExecutionMemory != 0L) { maxOffHeapExecutionMemory.toString } else {
+            conf.get("spark.memory.offHeap.size", maxOffHeapExecutionMemory.toString)
--- End diff --
Why this change?
[GitHub] spark pull request #17587: [SPARK-20274][SQL] support compatible array eleme...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17587
[GitHub] spark issue #17589: [SPARK-16544][SQL] Support for conversion from numeric c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17589 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75705/ Test PASSed.
[GitHub] spark issue #17589: [SPARK-16544][SQL] Support for conversion from numeric c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17589 Merged build finished. Test PASSed.
[GitHub] spark issue #17589: [SPARK-16544][SQL] Support for conversion from numeric c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17589 **[Test build #75705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75705/testReport)** for PR 17589 at commit [`cbf8a22`](https://github.com/apache/spark/commit/cbf8a224e9cb5744fd340a4f835bdf07cfdf5543). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17491: [SPARK-20175][SQL] Exists should not be evaluated in Joi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17491 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75703/ Test PASSed.
[GitHub] spark issue #17491: [SPARK-20175][SQL] Exists should not be evaluated in Joi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17491 Merged build finished. Test PASSed.
[GitHub] spark issue #17587: [SPARK-20274][SQL] support compatible array element type...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17587 thanks for the review, merging to master!
[GitHub] spark issue #17491: [SPARK-20175][SQL] Exists should not be evaluated in Joi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17491 **[Test build #75703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75703/testReport)** for PR 17491 at commit [`24ae5ce`](https://github.com/apache/spark/commit/24ae5ce866f82641470ed9598fad9fece450313c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17330: [SPARK-19993][SQL] Caching logical plans containing subq...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17330 LGTM except for some minor comments about the tests
[GitHub] spark pull request #17330: [SPARK-19993][SQL] Caching logical plans containi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17330#discussion_r110885997
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -76,6 +76,13 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
     sum
   }
+  private def getNumInMemoryTableScanExecs(plan: SparkPlan): Int = {
--- End diff --
We need a better name: this actually gets in-memory tables recursively, which is different from `getNumInMemoryRelations`.
[GitHub] spark pull request #17330: [SPARK-19993][SQL] Caching logical plans containi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17330#discussion_r110885627
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -670,4 +677,139 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
       assert(spark.read.parquet(path).filter($"id" > 4).count() == 15)
     }
   }
+
+  test("SPARK-19993 simple subquery caching") {
+    withTempView("t1", "t2") {
+      Seq(1).toDF("c1").createOrReplaceTempView("t1")
+      Seq(1).toDF("c1").createOrReplaceTempView("t2")
+
+      sql(
+        """
+          |SELECT * FROM t1
+          |WHERE
+          |NOT EXISTS (SELECT * FROM t1)
+        """.stripMargin).cache()
+
+      val cachedDs =
+        sql(
+          """
+            |SELECT * FROM t1
+            |WHERE
+            |NOT EXISTS (SELECT * FROM t1)
+          """.stripMargin)
+      assert(getNumInMemoryRelations(cachedDs) == 1)
+
+      // Additional predicate in the subquery plan should cause a cache miss
+      val cachedMissDs =
+        sql(
+          """
+            |SELECT * FROM t1
+            |WHERE
+            |NOT EXISTS (SELECT * FROM t1 where c1 = 0)
+          """.stripMargin)
+      assert(getNumInMemoryRelations(cachedMissDs) == 0)
+    }
+  }
+
+  test("SPARK-19993 subquery caching with correlated predicates") {
+    withTempView("t1", "t2") {
+      Seq(1).toDF("c1").createOrReplaceTempView("t1")
+      Seq(1).toDF("c1").createOrReplaceTempView("t2")
+
+      // Simple correlated predicate in subquery
+      sql(
+        """
+          |SELECT * FROM t1
+          |WHERE
+          |t1.c1 in (SELECT t2.c1 FROM t2 where t1.c1 = t2.c1)
+        """.stripMargin).cache()
+
+      val cachedDs =
+        sql(
+          """
+            |SELECT * FROM t1
+            |WHERE
+            |t1.c1 in (SELECT t2.c1 FROM t2 where t1.c1 = t2.c1)
+          """.stripMargin)
+      assert(getNumInMemoryRelations(cachedDs) == 1)
+    }
+  }
+
+  test("SPARK-19993 subquery with cached underlying relation") {
+    withTempView("t1", "t2") {
+      Seq(1).toDF("c1").createOrReplaceTempView("t1")
+      Seq(1).toDF("c1").createOrReplaceTempView("t2")
--- End diff --
Where is `t2` used?
[GitHub] spark pull request #17330: [SPARK-19993][SQL] Caching logical plans containi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17330#discussion_r110885501
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -670,4 +677,139 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
       assert(spark.read.parquet(path).filter($"id" > 4).count() == 15)
     }
   }
+
+  test("SPARK-19993 simple subquery caching") {
+    withTempView("t1", "t2") {
+      Seq(1).toDF("c1").createOrReplaceTempView("t1")
+      Seq(1).toDF("c1").createOrReplaceTempView("t2")
--- End diff --
Where is `t2` used?
[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17150 **[Test build #75707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75707/testReport)** for PR 17150 at commit [`cccfbdf`](https://github.com/apache/spark/commit/cccfbdf5d0c762b13c65986ea6fa06a06cb394a4).
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75699/ Test PASSed.
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test PASSed.
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #75699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75699/testReport)** for PR 16677 at commit [`b8a2275`](https://github.com/apache/spark/commit/b8a22755bfdef8f1ab78016aea6914155ada67c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #12574: [SPARK-13857][ML][WIP] Add "recommend all" functi...
Github user MLnick closed the pull request at: https://github.com/apache/spark/pull/12574
[GitHub] spark issue #17598: [SPARK-20284][CORE] Make {Des,S}erializationStream exten...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17598 **[Test build #3659 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3659/testReport)** for PR 17598 at commit [`75ba026`](https://github.com/apache/spark/commit/75ba026db26171e0ed59d48d0ab2855f2a2af757).
[GitHub] spark issue #17608: [SPARK-20293][WEB UI][History]In the page of 'jobs' or '...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/17608 Yes, that is right. Encoding the URL ourselves is the only way to keep the browser from escaping our special characters, so that the page no longer reports an error.
[GitHub] spark issue #17587: [SPARK-20274][SQL] support compatible array element type...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17587 Merged build finished. Test PASSed.
[GitHub] spark issue #17587: [SPARK-20274][SQL] support compatible array element type...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75701/ Test PASSed.
[GitHub] spark issue #17608: [SPARK-20293][WEB UI][History]In the page of 'jobs' or '...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17608 Can one of the admins verify this patch?
[GitHub] spark issue #17587: [SPARK-20274][SQL] support compatible array element type...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17587 **[Test build #75701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75701/testReport)** for PR 17587 at commit [`17a308b`](https://github.com/apache/spark/commit/17a308b7aaee44a6c807c21dea4ebaf79d48f34f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17608: [SPARK-20293][WEB UI][History]In the page of 'jobs' or '...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17608 I don't quite understand: you say that the problem was URL-encoding the URL, but the solution here is to URL-encode it again. Is that right? Maybe you can show a more concrete example of the URL as generated by the UI, exactly what it is interpreted as, and the error page. This isn't very clear now.
[GitHub] spark pull request #17608: [SPARK-20293][WEB UI][History]In the page of 'job...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/17608

[SPARK-20293][WEB UI][History]In the page of 'jobs' or 'stages' of history server web ui,,click the 'Go' button, query paging data, the page error

## What changes were proposed in this pull request?

On the 'jobs' or 'stages' page of the history server web UI, clicking the 'Go' button to fetch a page of results produces an error, so paging cannot be used.

The cause is that the browser escapes '#' as %23. In `&completedStage.desc=true%23completed`, the value of the `desc` parameter becomes `true%23completed`, which makes the page report an error. The error is as follows:

HTTP ERROR 400 Problem accessing /history/app-20170411132432-0004/stages/. Reason: For input string: "true#completed" Powered by Jetty://

The fix: the URL is escaped before it is requested, so that the browser does not escape it. Please see the attachment on https://issues.apache.org/jira/browse/SPARK-20293.

## How was this patch tested?

Manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/guoxiaolongzte/spark SPARK-20293
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17608.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17608

commit d383efba12c66addb17006dea107bb0421d50bc3 Author: 郭小龙 10207633 Date: 2017-03-31T13:57:09Z [SPARK-20177]Document about compression way has some little detail changes.
commit 3059013e9d2aec76def14eb314b6761bea0e7ca0 Author: 郭小龙 10207633 Date: 2017-04-01T01:38:02Z [SPARK-20177] event log add a space
commit 555cef88fe09134ac98fd0ad056121c7df2539aa Author: guoxiaolongzte Date: 2017-04-02T00:16:08Z '/applications/[app-id]/jobs' in rest api,status should be [running|succeeded|failed|unknown]
commit 46bb1ad3ddd9fb55b5607ac4f20213a90186cfe9 Author: 郭小龙 10207633 Date: 2017-04-05T03:16:50Z Merge branch 'master' of https://github.com/apache/spark into SPARK-20177
commit 0efb0dd9e404229cce638fe3fb0c966276784df7 Author: 郭小龙 10207633 Date: 2017-04-05T03:47:53Z [SPARK-20218]'/applications/[app-id]/stages' in REST API,add description.
commit 0e37fdeee28e31fc97436dabd001d3c85c5a7794 Author: 郭小龙 10207633 Date: 2017-04-05T05:22:54Z [SPARK-20218] '/applications/[app-id]/stages/[stage-id]' in REST API,remove redundant description.
commit 52641bb01e55b48bd9e8579fea217439d14c7dc7 Author: 郭小龙 10207633 Date: 2017-04-07T06:24:58Z Merge branch 'SPARK-20218'
commit d3977c9cab0722d279e3fae7aacbd4eb944c22f6 Author: 郭小龙 10207633 Date: 2017-04-08T07:13:02Z Merge branch 'master' of https://github.com/apache/spark
commit 137b90e5a85cde7e9b904b3e5ea0bb52518c4716 Author: 郭小龙 10207633 Date: 2017-04-10T05:13:40Z Merge branch 'master' of https://github.com/apache/spark
commit 0fe5865b8022aeacdb2d194699b990d8467f7a0a Author: 郭小龙 10207633 Date: 2017-04-10T10:25:22Z Merge branch 'SPARK-20190' of https://github.com/guoxiaolongzte/spark
commit cf6f42ac84466960f2232c025b8faeb5d7378fe1 Author: 郭小龙 10207633 Date: 2017-04-10T10:26:27Z Merge branch 'master' of https://github.com/apache/spark
commit 9c1d634b9efe7cdd85e80d742e269aa69fd9994d Author: 郭小龙 10207633 Date: 2017-04-11T06:38:01Z Merge branch 'master' of https://github.com/apache/spark
commit 6c62262bebe5fc8d5473b7fcc2fdb2656e4f8cc0 Author: 郭小龙 10207633 Date: 2017-04-11T10:46:58Z Merge branch 'master' of https://github.com/apache/spark
commit 1b22cfb8e13918d52a498e8d46b3a0c5c236d121 Author: 郭小龙 10207633 Date: 2017-04-11T11:03:01Z [SPARK-20293]In the page of 'jobs' or 'stages' of history server web ui,,click the 'Go' button, query paging data, the page error
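The failure mode reported in this pull request can be reproduced outside Spark with plain JDK classes. The following is a minimal sketch, not the patch itself: it assumes the paging form produces a URL whose query ends in `completedStage.desc=true` followed by a `#completed` anchor, and the host `localhost:18080`, the object name `FragmentEscapeDemo`, and the helper `param` are placeholders introduced here for illustration.

```scala
import java.net.{URI, URLDecoder}
import scala.util.Try

object FragmentEscapeDemo {
  // Hypothetical URLs; host and port are placeholders.
  // `intended` keeps '#' as the fragment separator, `escaped` has it turned into %23.
  val intended = new URI(
    "http://localhost:18080/history/app-20170411132432-0004/stages/?completedStage.desc=true#completed")
  val escaped = new URI(
    "http://localhost:18080/history/app-20170411132432-0004/stages/?completedStage.desc=true%23completed")

  // Look up one query parameter in the raw query string and URL-decode its value.
  def param(uri: URI, name: String): Option[String] =
    Option(uri.getRawQuery).toSeq
      .flatMap(_.split("&"))
      .map(_.split("=", 2))
      .collectFirst { case Array(k, v) if k == name => URLDecoder.decode(v, "UTF-8") }

  def main(args: Array[String]): Unit = {
    // With a real '#', the anchor is a fragment and the parameter is a clean boolean.
    println(intended.getFragment)                   // completed
    println(param(intended, "completedStage.desc")) // Some(true)

    // With %23, there is no fragment and the anchor text leaks into the parameter value.
    println(escaped.getFragment)                    // null
    println(param(escaped, "completedStage.desc"))  // Some(true#completed)

    // Parsing that value as a boolean fails, which matches the "For input string" error.
    println(Try(param(escaped, "completedStage.desc").get.toBoolean).isFailure) // true
  }
}
```

Whatever form the fix takes, the value that reaches the server for `completedStage.desc` has to decode to plain `true` or `false`.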
[GitHub] spark issue #17607: [DOCS] Add docstrings to non-operator binary ops in pysp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17607 Merged build finished. Test PASSed.
[GitHub] spark issue #17607: [DOCS] Add docstrings to non-operator binary ops in pysp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17607 **[Test build #75706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75706/testReport)** for PR 17607 at commit [`3497d12`](https://github.com/apache/spark/commit/3497d12d75db86b6a21c1c1bc5e5b9802deb19a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17607: [DOCS] Add docstrings to non-operator binary ops in pysp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75706/ Test PASSed.
[GitHub] spark issue #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) should not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17606 Merged build finished. Test FAILed.
[GitHub] spark issue #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) should not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17606 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75702/ Test FAILed.
[GitHub] spark pull request #17607: [DOCS] Add docstrings to non-operator binary ops ...
Github user zero323 closed the pull request at: https://github.com/apache/spark/pull/17607
[GitHub] spark issue #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) should not...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17606 **[Test build #75702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75702/testReport)** for PR 17606 at commit [`fa5e1af`](https://github.com/apache/spark/commit/fa5e1aff1319a75e89da8baf48f06b223b17eb8c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17604: [SPARK-20289][SQL] Use StaticInvoke to box primitive typ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17604 Merged build finished. Test PASSed.
[GitHub] spark issue #17604: [SPARK-20289][SQL] Use StaticInvoke to box primitive typ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17604 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75698/ Test PASSed.
[GitHub] spark issue #17604: [SPARK-20289][SQL] Use StaticInvoke to box primitive typ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17604 **[Test build #75698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75698/testReport)** for PR 17604 at commit [`8cbc617`](https://github.com/apache/spark/commit/8cbc617ee528ab92a755995a03b9ebefc2eb03a4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17607: [DOCS] Add docstrings to non-operator binary ops in pysp...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17607 Or I believe either of the two PRs could handle all of them. Cc @map222.
[GitHub] spark issue #17607: [DOCS] Add docstrings to non-operator binary ops in pysp...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17607 Actually, there is a similar PR: https://github.com/apache/spark/pull/17469. How about covering only the non-duplicated ones here?
[GitHub] spark issue #17607: [DOCS] Add docstrings to non-operator binary ops in pysp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17607 **[Test build #75706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75706/testReport)** for PR 17607 at commit [`3497d12`](https://github.com/apache/spark/commit/3497d12d75db86b6a21c1c1bc5e5b9802deb19a9).
[GitHub] spark issue #17607: [DOCS] Add docstrings to non-operator binary ops in pysp...
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/17607 cc @holdenk
[GitHub] spark pull request #17607: [DOCS] Add docstrings to non-operator binary ops ...
GitHub user zero323 opened a pull request: https://github.com/apache/spark/pull/17607

[DOCS] Add docstrings to non-operator binary ops in pyspark.sql.Column

## What changes were proposed in this pull request?

Add docstrings to the following `pyspark.sql.Column` binary ops:
- `bitwiseOR`, `bitwiseAND`, `bitwiseXOR`.
- `contains`, `rlike`, `like`, `startswith`, `endswith`.

## How was this patch tested?

Manual tests, docs build.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zero323/spark BINARYOPS-DOCSTRINGS
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17607.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17607

commit 3497d12d75db86b6a21c1c1bc5e5b9802deb19a9 Author: zero323 Date: 2017-04-11T10:24:14Z Add docstrings to selected binary ops
[GitHub] spark issue #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) should not...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17606 LGTM, except for a question which might not be related to this issue.
[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17533 Merged build finished. Test PASSed.
[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17533 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75697/ Test PASSed.
[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17533 **[Test build #75697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75697/testReport)** for PR 17533 at commit [`e3a15c3`](https://github.com/apache/spark/commit/e3a15c3fffd699770738caff2e03f066bf0e149c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17589: [SPARK-16544][SQL] Support for conversion from numeric c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17589 **[Test build #75705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75705/testReport)** for PR 17589 at commit [`cbf8a22`](https://github.com/apache/spark/commit/cbf8a224e9cb5744fd340a4f835bdf07cfdf5543).
[GitHub] spark pull request #17606: [SPARK-20291][SQL] NaNvl(FloatType, NullType) sho...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17606#discussion_r110864960
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -571,6 +571,7 @@ object TypeCoercion {
         NaNvl(l, Cast(r, DoubleType))
       case NaNvl(l, r) if l.dataType == FloatType && r.dataType == DoubleType =>
         NaNvl(Cast(l, DoubleType), r)
+      case NaNvl(l, r) if r.dataType == NullType => NaNvl(l, Cast(r, l.dataType))
--- End diff --
One question I have: why should `NaNvl(FloatType, DoubleType)` be cast to `NaNvl(DoubleType, DoubleType)`, while `NaNvl(FloatType, NullType)` should not be cast to `NaNvl(DoubleType, DoubleType)`? Both change the input type from `FloatType` to `DoubleType`. Won't the first cast cause a mismatch as well?
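The asymmetry raised in this comment can be seen in a toy model of the coercion cases. This is a sketch under stated assumptions, not Catalyst code: the `DataType` objects below are stand-ins defined locally, `coerce` is a hypothetical helper that returns only the operand types after coercion, and the guard of the first case (left `DoubleType`, right `FloatType`) is assumed because the diff context cuts it off.

```scala
object NaNvlCoercionSketch {
  // Toy stand-ins for Catalyst data types; these are not the Spark classes.
  sealed trait DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType
  case object NullType extends DataType

  // Operand types of NaNvl(l, r) after coercion, mirroring the cases in the diff.
  def coerce(l: DataType, r: DataType): (DataType, DataType) = (l, r) match {
    case (DoubleType, FloatType) => (DoubleType, DoubleType) // assumed guard of the truncated first case
    case (FloatType, DoubleType) => (DoubleType, DoubleType) // left side is widened to double
    case (_, NullType)           => (l, l)                   // added case: null takes the left type
    case other                   => other                    // anything else is left unchanged
  }

  def main(args: Array[String]): Unit = {
    println(coerce(FloatType, DoubleType)) // (DoubleType,DoubleType)
    println(coerce(FloatType, NullType))   // (FloatType,FloatType)
    println(coerce(DoubleType, NullType))  // (DoubleType,DoubleType)
  }
}
```

In this model a float/double pair widens the float side, while a null right-hand side keeps whatever type the left side already has, which is the difference the question is about.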
[GitHub] spark issue #17603: [SPARK-20288] Avoid generating the MapStatus by stageId ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17603 Merged build finished. Test PASSed.
[GitHub] spark issue #17603: [SPARK-20288] Avoid generating the MapStatus by stageId ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17603 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75696/ Test PASSed.
[GitHub] spark issue #17603: [SPARK-20288] Avoid generating the MapStatus by stageId ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17603 **[Test build #75696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75696/testReport)** for PR 17603 at commit [`5a93693`](https://github.com/apache/spark/commit/5a93693debb733154cb9f5916d3b8ee1d2d2b2e5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9571 Merged build finished. Test FAILed.
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9571 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75704/ Test FAILed.
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9571 **[Test build #75704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75704/testReport)** for PR 9571 at commit [`ec1f2d7`](https://github.com/apache/spark/commit/ec1f2d7f8743ce6de3e83f2f9a82f1c940c8be52). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9571: [SPARK-11373] [CORE] Add metrics to the History Server an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9571 **[Test build #75704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75704/testReport)** for PR 9571 at commit [`ec1f2d7`](https://github.com/apache/spark/commit/ec1f2d7f8743ce6de3e83f2f9a82f1c940c8be52).
[GitHub] spark issue #17491: [SPARK-20175][SQL] Exists should not be evaluated in Joi...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17491 @cloud-fan The optimization rule has been removed. This patch now just makes `Exists` subqueries without correlated references work. Please take another look. Thanks.
[GitHub] spark issue #17491: [SPARK-20175][SQL] Exists should not be evaluated in Joi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17491 **[Test build #75703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75703/testReport)** for PR 17491 at commit [`24ae5ce`](https://github.com/apache/spark/commit/24ae5ce866f82641470ed9598fad9fece450313c).
[GitHub] spark pull request #17491: [SPARK-20175][SQL] Exists should not be evaluated...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17491#discussion_r110859934
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -498,3 +498,32 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] {
     }
   }
 }
+
+/**
+ * This rule rewrites a EXISTS predicate sub-queries into an Aggregate with count.
+ * So it doesn't be converted to a JOIN later.
+ */
+object RewriteEmptyExists extends Rule[LogicalPlan] with PredicateHelper {
+  private def containsAgg(plan: LogicalPlan): Boolean = {
+    plan.collect {
+      case a: Aggregate => a
+    }.nonEmpty
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Filter(condition, child) =>
+      val (withSubquery, withoutSubquery) =
+        splitConjunctivePredicates(condition).partition(SubqueryExpression.hasInOrExistsSubquery)
+      val newWithSubquery = withSubquery.map(_.transform {
+        case e @ Exists(sub, conditions, exprId) if conditions.isEmpty && !containsAgg(sub) =>
+          val countExpr = Alias(Count(Literal(1)).toAggregateExpression(), "count")()
+          val expr = Alias(GreaterThan(countExpr.toAttribute, Literal(0)), e.toString)()
+          ScalarSubquery(
+            Project(Seq(expr),
+              Aggregate(Nil, Seq(countExpr), LocalLimit(Literal(1), sub))),
--- End diff --
By the way, I am not sure this early-out benefits general usage, beyond this kind of special case.
[GitHub] spark pull request #17491: [SPARK-20175][SQL] Exists should not be evaluated...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17491#discussion_r110858379

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -498,3 +498,32 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] {
     }
   }
 }
+
+/**
+ * This rule rewrites a EXISTS predicate sub-queries into an Aggregate with count.
+ * So it doesn't be converted to a JOIN later.
+ */
+object RewriteEmptyExists extends Rule[LogicalPlan] with PredicateHelper {
+  private def containsAgg(plan: LogicalPlan): Boolean = {
+    plan.collect {
+      case a: Aggregate => a
+    }.nonEmpty
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Filter(condition, child) =>
+      val (withSubquery, withoutSubquery) =
+        splitConjunctivePredicates(condition).partition(SubqueryExpression.hasInOrExistsSubquery)
+      val newWithSubquery = withSubquery.map(_.transform {
+        case e @ Exists(sub, conditions, exprId) if conditions.isEmpty && !containsAgg(sub) =>
+          val countExpr = Alias(Count(Literal(1)).toAggregateExpression(), "count")()
+          val expr = Alias(GreaterThan(countExpr.toAttribute, Literal(0)), e.toString)()
+          ScalarSubquery(
+            Project(Seq(expr),
+              Aggregate(Nil, Seq(countExpr), LocalLimit(Literal(1), sub))),
--- End diff --

We can address the early-out in other work.
[GitHub] spark pull request #17491: [SPARK-20175][SQL] Exists should not be evaluated...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17491#discussion_r110858303

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -498,3 +498,32 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] {
     }
   }
 }
+
+/**
+ * This rule rewrites a EXISTS predicate sub-queries into an Aggregate with count.
+ * So it doesn't be converted to a JOIN later.
+ */
+object RewriteEmptyExists extends Rule[LogicalPlan] with PredicateHelper {
+  private def containsAgg(plan: LogicalPlan): Boolean = {
+    plan.collect {
+      case a: Aggregate => a
+    }.nonEmpty
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Filter(condition, child) =>
+      val (withSubquery, withoutSubquery) =
+        splitConjunctivePredicates(condition).partition(SubqueryExpression.hasInOrExistsSubquery)
+      val newWithSubquery = withSubquery.map(_.transform {
+        case e @ Exists(sub, conditions, exprId) if conditions.isEmpty && !containsAgg(sub) =>
+          val countExpr = Alias(Count(Literal(1)).toAggregateExpression(), "count")()
+          val expr = Alias(GreaterThan(countExpr.toAttribute, Literal(0)), e.toString)()
+          ScalarSubquery(
+            Project(Seq(expr),
+              Aggregate(Nil, Seq(countExpr), LocalLimit(Literal(1), sub))),
--- End diff --

I think it is a special case, so I will remove this optimization and minimize this PR's changes.
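For readers following the thread, here is a rough sketch of the plan shape the quoted `RewriteEmptyExists` hunk builds: an EXISTS over `sub` becomes a scalar `count(1) > 0` computed over `LocalLimit(1, sub)`, which is the "early-out" under discussion. The types below (`Relation`, `Limit`, `CountAll`, `CountGreaterThanZero`, `ToyRewriteEmptyExists`) are hypothetical toy stand-ins, not Catalyst classes, and the sketch ignores the `conditions.isEmpty` and `!containsAgg(sub)` guards from the diff.

```scala
// Toy stand-ins for logical plan nodes. These are NOT the real Spark classes.
sealed trait Plan
case class Relation(rows: Seq[Int]) extends Plan
case class Limit(n: Int, child: Plan) extends Plan   // models LocalLimit(Literal(n), child)
case class CountAll(child: Plan) extends Plan        // models Aggregate(Nil, Seq(count(1)), child)

// Toy stand-ins for predicate expressions.
sealed trait Expr
case class Exists(sub: Plan) extends Expr
case class CountGreaterThanZero(agg: Plan) extends Expr // models ScalarSubquery(Project(count > 0, agg))

object ToyRewriteEmptyExists {
  // Evaluate a toy plan to its output rows.
  def run(plan: Plan): Seq[Int] = plan match {
    case Relation(rows)  => rows
    case Limit(n, child) => run(child).take(n)
    case CountAll(child) => Seq(run(child).size) // a global aggregate always yields one row
  }

  // Evaluate a toy predicate.
  def eval(expr: Expr): Boolean = expr match {
    case Exists(sub)               => run(sub).nonEmpty
    case CountGreaterThanZero(agg) => run(agg).head > 0
  }

  // EXISTS(sub) becomes count(1) > 0 over LIMIT 1 of sub, so at most one row is counted.
  def rewrite(expr: Expr): Expr = expr match {
    case Exists(sub) => CountGreaterThanZero(CountAll(Limit(1, sub)))
    case other       => other
  }
}
```

In this toy model the rewritten predicate agrees with the original EXISTS for both empty and non-empty inputs. Whether the `Limit(1, ...)` step pays off in general is exactly the open question raised in the comments above.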
[GitHub] spark issue #17455: [Spark-20044][Web UI] Support Spark UI behind front-end ...
Github user okoethibm commented on the issue:

https://github.com/apache/spark/pull/17455

@ajbozarth Any other comments on this PR? Why is it not testing even though it has an "ok to test"?