[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20853 **[Test build #88430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88430/testReport)** for PR 20853 at commit [`5515da6`](https://github.com/apache/spark/commit/5515da6d1517ca016f1b67048f9976571731343c). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20853 **[Test build #88429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88429/testReport)** for PR 20853 at commit [`5515da6`](https://github.com/apache/spark/commit/5515da6d1517ca016f1b67048f9976571731343c).
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20853 retest this please
[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175854733

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
```
@@ -546,6 +546,10 @@ case class DataSource(
     case dataSource: CreatableRelationProvider =>
       SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
     case format: FileFormat =>
+      if (DataSource.isBuiltInFileBasedDataSource(format) && data.schema.size == 0) {
```
--- End diff --

We don't need this check. `FileFormat` is internal, so we don't need to distinguish between built-in file formats and external ones.
[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1647/ Test PASSed.
[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20866 **[Test build #88428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88428/testReport)** for PR 20866 at commit [`897ebd0`](https://github.com/apache/spark/commit/897ebd0035d7825e16b8dfec1f27d5b20b054f45).
[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20866 Merged build finished. Test PASSed.
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20851#discussion_r175854222

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
```
@@ -50,6 +52,12 @@ private[parquet] object ParquetFilters {
       (n: String, v: Any) => FilterApi.eq(
         binaryColumn(n),
         Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map { d =>
+          DateTimeUtils.fromJavaDate(d.asInstanceOf[java.sql.Date]).asInstanceOf[Integer]
```
--- End diff --

I think we should do `DateTimeUtils.fromJavaDate(d.asInstanceOf[java.sql.Date], SQLConf.get.sessionLocalTimeZone).asInstanceOf[Integer]`
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20851#discussion_r175853868

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
```
@@ -50,6 +52,12 @@ private[parquet] object ParquetFilters {
       (n: String, v: Any) => FilterApi.eq(
         binaryColumn(n),
         Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+    case DateType if SQLConf.get.parquetFilterPushDownDate =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map { d =>
+          DateTimeUtils.fromJavaDate(d.asInstanceOf[java.sql.Date]).asInstanceOf[Integer]
```
--- End diff --

shall we respect session local timezone here?
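The conversion under review maps a `java.sql.Date` to the days-since-epoch `Int` that Parquet stores for `DATE` columns. A minimal, self-contained sketch of that mapping (`DateToEpochDay` is an illustrative name, not Spark's actual `DateTimeUtils` API, and this sketch uses the JVM default time zone rather than the session-local time zone being discussed):

```scala
import java.sql.Date

// Hedged sketch: convert java.sql.Date to the epoch-day Int used by
// Parquet DATE columns. `DateToEpochDay` is a hypothetical helper, not
// Spark's DateTimeUtils.
object DateToEpochDay {
  // toLocalDate interprets the Date in the JVM default time zone, the
  // same zone Date.valueOf used to build it, so the round trip is
  // stable; toEpochDay then yields days since 1970-01-01.
  def fromJavaDate(d: Date): Int = d.toLocalDate.toEpochDay.toInt
}
```

The review comment's point is that Spark should instead resolve the date in the session-local time zone (`SQLConf.get.sessionLocalTimeZone`) so pushed-down predicates agree with how the values were written.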
[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20866 [SPARK-23749][SQL] Avoid Hive.get() to compatible with different Hive metastore

## What changes were proposed in this pull request?

Avoid `Hive.get()` to be compatible with different Hive metastores.

## How was this patch tested?

Existing unit tests and manual tests with a secured Hadoop cluster.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-23749

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20866.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20866
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user jcuquemelle commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r175852840

--- Diff: docs/configuration.md ---
```
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
   Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
+ 1
+
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
```
--- End diff --

done
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user jcuquemelle commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r175852906

--- Diff: docs/configuration.md ---
```
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
   Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
+ 1
+
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
+executor allocation overhead, as some executor might not even do any work.
+This setting allows to set a divisor that will be used to reduce the number of
+executors w.r.t. full parallelism
```
--- End diff --

done
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user jcuquemelle commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r175852881

--- Diff: docs/configuration.md ---
```
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
   Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
+ 1
+
+By default, the dynamic allocation will request enough executors to maximize the
+parallelism according to the number of tasks to process. While this minimizes the
+latency of the job, with small tasks this setting wastes a lot of resources due to
+executor allocation overhead, as some executor might not even do any work.
+This setting allows to set a divisor that will be used to reduce the number of
+executors w.r.t. full parallelism
+Defaults to 1.0
```
--- End diff --

Done
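The divisor semantics described in the documentation excerpt above can be sketched as a small pure function: full parallelism would request roughly `ceil(pendingTasks / tasksPerExecutor)` executors, and the divisor scales that number down. This is a hedged illustration of the proposal's arithmetic only; `ExecutorTarget` and `maxNeeded` are hypothetical names, not Spark's actual `ExecutorAllocationManager` internals.

```scala
// Illustrative sketch of the fullParallelismDivisor idea (names are
// hypothetical, not Spark's real allocation code).
object ExecutorTarget {
  // Executors requested = ceil(tasks / slots-per-executor / divisor).
  // divisor = 1.0 reproduces the default "full parallelism" behavior;
  // larger divisors trade job latency for fewer allocated executors.
  def maxNeeded(pendingTasks: Int, tasksPerExecutor: Int, divisor: Double): Int =
    math.ceil(pendingTasks.toDouble / tasksPerExecutor / divisor).toInt
}
```

For example, 100 pending tasks with 4 task slots per executor would request 25 executors at the default divisor of 1.0, but only 13 with a divisor of 2.0.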
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20851#discussion_r175851942

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
```
@@ -313,6 +315,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
   }
 }

+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
+      def d: Date = new Date(Date.valueOf("2018-03-01").getTime + 24 * 60 * 60 * 1000 * (int - 1))
+    }
+
+    withParquetDataFrame((1 to 4).map(i => Tuple1(i.d))) { implicit df =>
+      checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
+      checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]], (1 to 4).map(i => Row.apply(i.d)))
+
+      checkFilterPredicate('_1 === 1.d, classOf[Eq[_]], 1.d)
```
--- End diff --

`1.d` is weird, can we name it `1.date`?
[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20657 Merged build finished. Test PASSed.
[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20657 **[Test build #88427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88427/testReport)** for PR 20657 at commit [`ab60dda`](https://github.com/apache/spark/commit/ab60dda1feb5c68f9ce6e67e14b777dd657e99a7).
[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1646/ Test PASSed.
[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20851 **[Test build #88426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88426/testReport)** for PR 20851 at commit [`82c5a73`](https://github.com/apache/spark/commit/82c5a73b03bcab62a7fcbaf992ec0a9698e81d91).
[GitHub] spark issue #20852: [SPARK-23728][BRANCH-2.3] Fix ML tests with expected exc...
Github user attilapiros commented on the issue: https://github.com/apache/spark/pull/20852 Closing manually as it was merged to branch-2.3.
[GitHub] spark pull request #20852: [SPARK-23728][BRANCH-2.3] Fix ML tests with expec...
Github user attilapiros closed the pull request at: https://github.com/apache/spark/pull/20852
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/20851#discussion_r175839248

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
```
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
   }
 }

+  test("filter pushdown - date") {
+    implicit class IntToDate(int: Int) {
```
--- End diff --

Ok, do you mean this way? Looks like we need more words :).

```
implicit class StringToDate(s: String) {
  def d: Date = Date.valueOf(s)
}

withParquetDataFrame(
  Seq("2017-08-18", "2017-08-19", "2017-08-20", "2017-08-21").map(i => Tuple1(i.d))) { implicit df =>
  checkFilterPredicate('_1.isNull, classOf[Eq[_]], Seq.empty[Row])
  checkFilterPredicate('_1.isNotNull, classOf[NotEq[_]],
    Seq("2017-08-18", "2017-08-19", "2017-08-20", "2017-08-21").map(i => Row.apply(i.d)))

  checkFilterPredicate('_1 === "2017-08-18".d, classOf[Eq[_]], "2017-08-18".d)
  checkFilterPredicate('_1 =!= "2017-08-18".d, classOf[NotEq[_]],
    Seq("2017-08-19", "2017-08-20", "2017-08-21").map(i => Row.apply(i.d)))
```
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175836822

--- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala ---
```
@@ -126,7 +126,11 @@ private[spark] abstract class WebUI(
   def bind(): Unit = {
     assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!")
     try {
-      val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0")
+      val host = if (Utils.isClusterMode(conf)) {
```
--- End diff --

I'd rather not change it unless it's fixing a problem. I don't see a problem being fixed here. Also, we should avoid adding more and more cluster-specific logic.
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r175835570

--- Diff: docs/configuration.md ---
```
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
   Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
```
--- End diff --

Sorry I didn't get back to this earlier; I think `fullExecutorAllocationDivisor` would be fine.
[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20774 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1645/ Test PASSed.
[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20774 Merged build finished. Test PASSed.
[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20774 **[Test build #88425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88425/testReport)** for PR 20774 at commit [`4df6513`](https://github.com/apache/spark/commit/4df6513bba72ffe06047d96e365c0f5e198c0d18).
[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20774 Retest this please
[GitHub] spark pull request #19881: [SPARK-22683][CORE] Add tasksPerExecutorSlot para...
Github user jcuquemelle commented on a diff in the pull request: https://github.com/apache/spark/pull/19881#discussion_r175831139

--- Diff: docs/configuration.md ---
```
@@ -1795,6 +1796,19 @@ Apart from these, the following properties are also available, and may be useful
   Lower bound for the number of executors if dynamic allocation is enabled.
+
+ spark.dynamicAllocation.fullParallelismDivisor
```
--- End diff --

How about something like `fullAllocationDivisor`?
[GitHub] spark issue #20859: [SPARK-23702][SS] Forbid watermarks on both sides of sta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20859 **[Test build #88424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88424/testReport)** for PR 20859 at commit [`cd8c638`](https://github.com/apache/spark/commit/cd8c638bb4a278651c2d65579cb9acf909efb97e).
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20579 @gatorsmile When you get a chance, could you please see if the check for internal data sources looks reasonable?
[GitHub] spark pull request #20641: [SPARK-23464][MESOS] Fix mesos cluster scheduler ...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/20641#discussion_r175804550

--- Diff: resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala ---
```
@@ -199,6 +199,38 @@ class MesosClusterSchedulerSuite extends SparkFunSuite with LocalSparkContext wi
   })
 }

+  test("properly wraps and escapes parameters passed to driver command") {
```
--- End diff --

Does this test fail with the old code?
[GitHub] spark issue #20787: Documenting months_between direction
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20787 **[Test build #88423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88423/testReport)** for PR 20787 at commit [`682bc8c`](https://github.com/apache/spark/commit/682bc8cd766c405d241bf76af62ab03ac3cc26d2).
[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20861 Merged build finished. Test PASSed.
[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20861 **[Test build #88422 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88422/testReport)** for PR 20861 at commit [`37a7c8e`](https://github.com/apache/spark/commit/37a7c8e37d0aed753f3e93bfcc5953f208c3277a).
[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20861 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1644/ Test PASSed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/20640 @squito @IgorBerman let's move on with this.
[GitHub] spark issue #20787: Documenting months_between direction
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20787 Merged build finished. Test FAILed.
[GitHub] spark issue #20787: Documenting months_between direction
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88421/ Test FAILed.
[GitHub] spark issue #20787: Documenting months_between direction
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20787 **[Test build #88421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88421/testReport)** for PR 20787 at commit [`9d99121`](https://github.com/apache/spark/commit/9d991212cb4da219257964b0d20aaf57a194e558).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20861 @hvanhovell Thanks! Your comments are addressed.
[GitHub] spark issue #20787: Documenting months_between direction
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20787 **[Test build #88421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88421/testReport)** for PR 20787 at commit [`9d99121`](https://github.com/apache/spark/commit/9d991212cb4da219257964b0d20aaf57a194e558).
[GitHub] spark issue #20787: Documenting months_between direction
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20787 Merged build finished. Test FAILed.
[GitHub] spark issue #20787: Documenting months_between direction
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20787 **[Test build #88420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88420/testReport)** for PR 20787 at commit [`264ed99`](https://github.com/apache/spark/commit/264ed994e2b74239907fdccc40544ca67d9b0531).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20787: Documenting months_between direction
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88420/ Test FAILed.
[GitHub] spark issue #20787: Documenting months_between direction
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20787 **[Test build #88420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88420/testReport)** for PR 20787 at commit [`264ed99`](https://github.com/apache/spark/commit/264ed994e2b74239907fdccc40544ca67d9b0531).
[GitHub] spark issue #20787: Documenting months_between direction
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20787 Merged build finished. Test FAILed.
[GitHub] spark issue #20787: Documenting months_between direction
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20787 **[Test build #88419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88419/testReport)** for PR 20787 at commit [`9afb314`](https://github.com/apache/spark/commit/9afb3140469f801d1f08c5b30e0698ec524fcc92).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20787: Documenting months_between direction
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20787 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88419/ Test FAILed.
[GitHub] spark issue #20787: Documenting months_between direction
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20787 **[Test build #88419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88419/testReport)** for PR 20787 at commit [`9afb314`](https://github.com/apache/spark/commit/9afb3140469f801d1f08c5b30e0698ec524fcc92).
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20796 LGTM
[GitHub] spark issue #20865: [SPARK-23542] The exists action shoule be further optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20865 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88418/ Test FAILed.
[GitHub] spark issue #20865: [SPARK-23542] The exists action shoule be further optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20865 Merged build finished. Test FAILed.
[GitHub] spark issue #20865: [SPARK-23542] The exists action shoule be further optimi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20865 **[Test build #88418 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88418/testReport)** for PR 20865 at commit [`69e0981`](https://github.com/apache/spark/commit/69e0981a09d4fd85dfc6dba6e74b33f218788cae).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20796 Merged build finished. Test PASSed.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20796 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88413/ Test PASSed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Merged build finished. Test PASSed.
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88415/ Test PASSed.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20796 **[Test build #88413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88413/testReport)** for PR 20796 at commit [`5557a80`](https://github.com/apache/spark/commit/5557a80d4674e929332d9441342e5b90e314eb45). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20727 **[Test build #88415 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88415/testReport)** for PR 20727 at commit [`d6e9160`](https://github.com/apache/spark/commit/d6e91604585b22a27fbd0b7caa0a8e96d3725400). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20865: [SPARK-23542] The exists action should be further optimi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20865 **[Test build #88418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88418/testReport)** for PR 20865 at commit [`69e0981`](https://github.com/apache/spark/commit/69e0981a09d4fd85dfc6dba6e74b33f218788cae). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20865: [SPARK-23542] The exists action should be further optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20865 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1643/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20865: [SPARK-23542] The exists action should be further optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20865 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20861 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88414/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20861 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20861 **[Test build #88414 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88414/testReport)** for PR 20861 at commit [`8676495`](https://github.com/apache/spark/commit/86764959734c3c50ee684584c60766dd4288879c). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Uuid(randomSeed: Option[Long] = None) extends LeafExpression with Nondeterministic ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20864: [SPARK-23745][SQL]Remove the directories of the “hive....
Github user samartinucci commented on the issue: https://github.com/apache/spark/pull/20864 Seems to be related to: https://github.com/apache/spark/pull/18666 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88416/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19108 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19108 **[Test build #88416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88416/testReport)** for PR 19108 at commit [`6187d88`](https://github.com/apache/spark/commit/6187d8893405afc3e488de55fe36d7f736b16cc3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20850 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88411/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20850 **[Test build #88411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88411/testReport)** for PR 20850 at commit [`3637a5c`](https://github.com/apache/spark/commit/3637a5c171ab856051b64bdd3fe01d40c5b2b569). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20865: [SPARK-23542] The exists action should be further optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20865 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20865: [SPARK-23542] The exists action should be further optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20865 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1642/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20862: [SPARK-23744][CORE]Fix memory leak in ReadableChannelFil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20862 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88410/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20862: [SPARK-23744][CORE]Fix memory leak in ReadableChannelFil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20862 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20862: [SPARK-23744][CORE]Fix memory leak in ReadableChannelFil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20862 **[Test build #88410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88410/testReport)** for PR 20862 at commit [`5b58c57`](https://github.com/apache/spark/commit/5b58c57607551328c893a3857717e4b159ecf841). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20865: [SPARK-23542] The exists action should be further optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20865 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20865: [SPARK-23542] The exists action should be further optimi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20865 **[Test build #88417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88417/testReport)** for PR 20865 at commit [`3bf9878`](https://github.com/apache/spark/commit/3bf987828acea096811ba8dd1d42de8221cac62d). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20865: [SPARK-23542] The exists action should be further optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20865 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88417/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20865: [SPARK-23542] The exists action should be further optimi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20865 **[Test build #88417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88417/testReport)** for PR 20865 at commit [`3bf9878`](https://github.com/apache/spark/commit/3bf987828acea096811ba8dd1d42de8221cac62d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20865: [SPARK-23542] The exists action should be further...
GitHub user KaiXinXiaoLei opened a pull request: https://github.com/apache/spark/pull/20865 [SPARK-23542] The exists action should be further optimized in logical plan

## What changes were proposed in this pull request?

The optimized logical plan of the query `select * from tt1 where exists (select * from tt2 where tt1.i = tt2.i)` is:

> == Optimized Logical Plan ==
> Join LeftSemi, (i#14 = i#16)
> :- HiveTableRelation `default`.`tt1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
> +- Project [i#16]
>    +- HiveTableRelation `default`.`tt2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]

The `exists` clause is rewritten as a left semi join. But in the query `select * from tt1 left semi join tt2 on tt2.i = tt1.i`, the optimized logical plan is:

> == Optimized Logical Plan ==
> Join LeftSemi, (i#22 = i#20)
> :- Filter isnotnull(i#20)
> :  +- HiveTableRelation `default`.`tt1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#20, s#21]
> +- Project [i#22]
>    +- HiveTableRelation `default`.`tt2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#22, s#23]

So I think the optimized logical plan of `select * from tt1 where exists (select * from tt2 where tt1.i = tt2.i)` should be optimized further.

## How was this patch tested?

With this patch, the optimized logical plan of `select * from tt1 where exists (select * from tt2 where tt1.i = tt2.i)` is:

> == Optimized Logical Plan ==
> Join LeftSemi, (i#14 = i#16)
> :- Filter isnotnull(i#14)
> :  +- HiveTableRelation `default`.`tt1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#14, s#15]
> +- Project [i#16]
>    +- Filter isnotnull(i#16)
>       +- HiveTableRelation `default`.`tt2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [i#16, s#17]

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KaiXinXiaoLei/spark SPARK-23542

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20865.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20865

commit 3bf987828acea096811ba8dd1d42de8221cac62d
Author: KaiXinXiaoLei <584620569@...>
Date: 2018-03-02T03:33:26Z

    message

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
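The optimization the PR asks for can be sketched with a toy plan model (hypothetical classes, not Spark's Catalyst API): infer an `isnotnull` filter on the keys beneath the left semi join that the `exists` rewrite produces, which is what the explicit `left semi join` query already receives from constraint propagation.

```scala
// Toy plan model (hypothetical, not Catalyst's classes) sketching the
// requested optimization: infer isnotnull filters on the keys of a
// left-semi equi-join produced by the `exists` rewrite.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class SemiJoin(leftKey: String, rightKey: String, left: Plan, right: Plan) extends Plan

// Push isnotnull(key) below each side of the semi join, mirroring what
// the optimizer already does for the explicit semi-join query.
def inferNotNull(plan: Plan): Plan = plan match {
  case SemiJoin(lk, rk, l, r) =>
    SemiJoin(lk, rk, Filter(s"isnotnull($lk)", l), Filter(s"isnotnull($rk)", r))
  case other => other
}

// `exists (select * from tt2 where tt1.i = tt2.i)` rewritten as a semi join:
val rewritten = SemiJoin("i#14", "i#16", Relation("tt1"), Relation("tt2"))
val optimized = inferNotNull(rewritten)
```

This only models the shape of the transformation; in Spark the equivalent effect comes from filter-inference rules over plan constraints, not from a standalone rewrite like this.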
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175725372

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ---
@@ -84,19 +84,49 @@ object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper {
     }
   }

+  // Extract a list of logical plans to be joined for join-order comparisons.
+  // Since `ExtractFiltersAndInnerJoins` handles left-deep trees only, this function has
+  // the same strategy to extract the plan list.
+  private def extractLeftDeepInnerJoins(plan: LogicalPlan): Seq[LogicalPlan] = plan match {
+    case j @ Join(left, right, _: InnerLike, _) => right +: extractLeftDeepInnerJoins(left)
+    case p @ Project(_, j @ Join(_, _, _: InnerLike, _)) => extractLeftDeepInnerJoins(j)
+    case _ => Seq(plan)
+  }
+
+  private def checkSameJoinOrder(plan1: LogicalPlan, plan2: LogicalPlan): Boolean = {
+    extractLeftDeepInnerJoins(plan1) == extractLeftDeepInnerJoins(plan2)
+  }
+
+  private def mayCreateOrderedJoin(
+      originalPlan: LogicalPlan,
+      input: Seq[(LogicalPlan, InnerLike)],
+      conditions: Seq[Expression]): LogicalPlan = {
+    val orderedJoins = createOrderedJoin(input, conditions)
+    if (!checkSameJoinOrder(orderedJoins, originalPlan)) {
--- End diff --

Is this check necessary? I think checking `originalPlan.output != orderedJoins.output` is enough, and faster.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
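The extraction under discussion can be mimicked with a self-contained toy tree (illustrative classes, not Catalyst's) to see why comparing the flattened left-deep lists detects a changed join order:

```scala
// Toy left-deep join tree mirroring the extractLeftDeepInnerJoins logic
// in the diff above (hypothetical classes, not Catalyst's).
sealed trait ToyPlan
case class Leaf(name: String) extends ToyPlan
case class InnerJoin(left: ToyPlan, right: ToyPlan) extends ToyPlan
case class Proj(child: ToyPlan) extends ToyPlan

// Flatten along the left spine only, skipping projections over joins.
def extractLeftDeep(plan: ToyPlan): Seq[ToyPlan] = plan match {
  case InnerJoin(left, right) => right +: extractLeftDeep(left)
  case Proj(j: InnerJoin)     => extractLeftDeep(j)
  case _                      => Seq(plan)
}

def sameJoinOrder(a: ToyPlan, b: ToyPlan): Boolean =
  extractLeftDeep(a) == extractLeftDeep(b)

// (a ⋈ b) ⋈ c versus the reordered (a ⋈ c) ⋈ b:
val original  = InnerJoin(InnerJoin(Leaf("a"), Leaf("b")), Leaf("c"))
val reordered = InnerJoin(InnerJoin(Leaf("a"), Leaf("c")), Leaf("b"))
```

The reviewer's alternative, comparing only the plans' `output`, avoids walking and rebuilding the join list and is cheaper when the output attribute order alone distinguishes the two plans.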
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175696570

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -141,14 +141,16 @@ object ExtractEquiJoinKeys extends Logging with PredicateHelper {
 }

 /**
- * A pattern that collects the filter and inner joins.
+ * A pattern that collects the filter and inner joins (and skip projections in plan sub-trees).
--- End diff --

skip projections with attributes only

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175727668

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinOptimizationSuite.scala ---
@@ -145,4 +159,15 @@ class JoinOptimizationSuite extends PlanTest {
     }
     assert(broadcastChildren.size == 1)
   }
+
+  test("SPARK-23172 skip projections when flattening joins") {
--- End diff --

Could you add a test case which would fail before the fix?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175696187

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -172,17 +174,23 @@ object ExtractFiltersAndInnerJoins extends PredicateHelper {
     case Filter(filterCondition, j @ Join(left, right, _: InnerLike, joinCondition)) =>
       val (plans, conditions) = flattenJoin(j)
       (plans, conditions ++ splitConjunctivePredicates(filterCondition))
-
+    case p @ Project(_, j @ Join(left, right, _: InnerLike, joinCondition)) =>
+      // Keep flattening joins when projects having attributes only
+      if (p.outputSet.subsetOf(j.outputSet)) {
--- End diff --

If we want to make sure the project has attributes only, should it be `p.projectList.forall(_.isInstanceOf[Attribute])`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
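The check the reviewer proposes can be illustrated with toy expression nodes (hypothetical classes, not Catalyst's): a project list qualifies only when every entry is a bare attribute, not a computed expression.

```scala
// Toy expression nodes (hypothetical, not Catalyst's) illustrating the
// attribute-only predicate: forall(_.isInstanceOf[Attribute]) accepts a
// pure column projection and rejects any computed expression.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Add(l: Expr, r: Expr) extends Expr

def attributesOnly(projectList: Seq[Expr]): Boolean =
  projectList.forall(_.isInstanceOf[Attr])

val plain    = Seq(Attr("i"), Attr("s"))              // select i, s
val computed = Seq(Attr("i"), Add(Attr("i"), Attr("s"))) // select i, i + s
```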
[GitHub] spark pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/20345#discussion_r175696302 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -172,17 +174,23 @@ object ExtractFiltersAndInnerJoins extends PredicateHelper { case Filter(filterCondition, j @ Join(left, right, _: InnerLike, joinCondition)) => val (plans, conditions) = flattenJoin(j) (plans, conditions ++ splitConjunctivePredicates(filterCondition)) - +case p @ Project(_, j @ Join(left, right, _: InnerLike, joinCondition)) => + // Keep flattening joins when projects having attributes only --- End diff -- nit: when projects having attributes only => when the project has attributes only --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uui...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20861#discussion_r175727781

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolvedUuidExpressionsSuite.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+
+/**
+ * Test suite for resolving Uuid expressions.
+ */
+class ResolvedUuidExpressionsSuite extends AnalysisTest {
+
+  private lazy val a = 'a.int
+  private lazy val r = LocalRelation(a)
+  private lazy val uuid1 = Uuid().as('_uuid1)
+  private lazy val uuid2 = Uuid().as('_uuid2)
+  private lazy val uuid3 = Uuid().as('_uuid3)
+  private lazy val uuid1Ref = uuid1.toAttribute
+
+  private val analyzer = getAnalyzer(caseSensitive = true)
+
+  private def getUuidExpressions(plan: LogicalPlan): Seq[Uuid] = {
+    val uuids = new ArrayBuffer[Uuid]()
+    plan.transformUp {
--- End diff --

Nit: use `flatMap`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uui...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20861#discussion_r175727739

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolvedUuidExpressionsSuite.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
+
+/**
+ * Test suite for resolving Uuid expressions.
+ */
+class ResolvedUuidExpressionsSuite extends AnalysisTest {
+
+  private lazy val a = 'a.int
+  private lazy val r = LocalRelation(a)
+  private lazy val uuid1 = Uuid().as('_uuid1)
+  private lazy val uuid2 = Uuid().as('_uuid2)
+  private lazy val uuid3 = Uuid().as('_uuid3)
+  private lazy val uuid1Ref = uuid1.toAttribute
+
+  private val analyzer = getAnalyzer(caseSensitive = true)
+
+  private def getUuidExpressions(plan: LogicalPlan): Seq[Uuid] = {
+    val uuids = new ArrayBuffer[Uuid]()
+    plan.transformUp {
+      case p =>
+        p.transformExpressionsUp {
--- End diff --

Nit: use `collect`?

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
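The nit can be sketched on a toy tree (hypothetical classes, not Catalyst's `TreeNode`): a `collect`-style traversal gathers the `Uuid` nodes directly, replacing the mutable `ArrayBuffer` plus `transform` side-effect pattern in the test helper.

```scala
// Toy expression tree with a collect-style traversal, sketching the
// reviewer's suggestion (hypothetical classes, not Catalyst's).
sealed trait ToyExpr {
  def children: Seq[ToyExpr]
  // Gather every node matched by the partial function, top-down.
  def collectAll[T](pf: PartialFunction[ToyExpr, T]): Seq[T] =
    pf.lift(this).toSeq ++ children.flatMap(_.collectAll(pf))
}
case class ToyUuid(seed: Option[Long]) extends ToyExpr { def children = Nil }
case class ToyAlias(child: ToyExpr, name: String) extends ToyExpr { def children = Seq(child) }

// Analogue of getUuidExpressions: no mutable buffer, no transform side effects.
val exprs = Seq(ToyAlias(ToyUuid(Some(1L)), "_uuid1"), ToyAlias(ToyUuid(Some(2L)), "_uuid2"))
val uuids = exprs.flatMap(_.collectAll { case u: ToyUuid => u })
```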
[GitHub] spark pull request #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uui...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20861#discussion_r175725276

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -98,6 +99,8 @@ class Analyzer(
     this(catalog, conf, conf.optimizerMaxIterations)
   }

+  private lazy val random = new Random()
--- End diff --

Shall we put `random` in the `ResolvedUuidExpressions`? That makes it a little bit easier to follow.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
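The seeding scheme under discussion can be sketched with plain `java.util.Random` (names here are illustrative, not Spark's): the analysis rule draws one seed per `Uuid` expression from a shared `Random`, so each expression gets a distinct but reproducible pseudo-random stream.

```scala
import java.util.Random

// Illustrative sketch (not Spark's RandomUUIDGenerator): draw one seed
// per Uuid expression from the rule's shared Random.
val ruleRandom = new Random(42L)
val seed1 = ruleRandom.nextLong()
val seed2 = ruleRandom.nextLong()

// A seeded generator then yields a deterministic pseudo-random UUID:
// the same seed always reproduces the same value.
def uuidFrom(seed: Long): java.util.UUID = {
  val r = new Random(seed)
  new java.util.UUID(r.nextLong(), r.nextLong())
}
```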
[GitHub] spark pull request #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolut...
Github user mgaido91 closed the pull request at: https://github.com/apache/spark/pull/20847 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/20858

@maropu What other libraries do you mean? I'm not aware of any library providing this functionality on top of Spark SQL. When using Spark SQL as an ETL tool for structured and nested data, people are forced to use UDFs for transforming arrays, since the current API for array columns is lacking. This approach brings several drawbacks:
- bad code readability
- Catalyst is blind when performing optimizations
- impossibility to track data lineage of the transformation (a key aspect for the financial industry; see [Spline](https://absaoss.github.io/spline/) and the [Spline paper](https://github.com/AbsaOSS/spline/releases/download/release%2F0.2.7/Spline_paper_IEEE_2018.pdf))

So my colleagues and I decided to extend the current Spark SQL API with well-known collection functions like concat, flatten, zipWithIndex, etc. We don't want to keep this functionality just in our fork of Spark, but would like to share it with others.

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
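The collection semantics these proposed SQL functions mirror can be shown on plain Scala collections (illustrative only; the PR family adds them as Catalyst expressions over array columns, so the optimizer can see through them where a UDF would be opaque):

```scala
// Plain-collection analogues of the proposed array functions.
val concatenated = Seq(1, 2) ++ Seq(3, 4)            // like concat(array1, array2)
val flattened    = Seq(Seq(1, 2), Seq(3)).flatten    // like flatten(arrayOfArrays)
val indexed      = Seq("x", "y").zipWithIndex        // like a zipWithIndex over an array column
```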
[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20774 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20774 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88412/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20774 **[Test build #88412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88412/testReport)** for PR 20774 at commit [`4df6513`](https://github.com/apache/spark/commit/4df6513bba72ffe06047d96e365c0f5e198c0d18). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class PromoteStrings(conf: SQLConf) extends TypeCoercionRule ` * ` case class InConversion(conf: SQLConf) extends TypeCoercionRule ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19108 **[Test build #88416 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88416/testReport)** for PR 19108 at commit [`6187d88`](https://github.com/apache/spark/commit/6187d8893405afc3e488de55fe36d7f736b16cc3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1641/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19108 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20811: [SPARK-23668][K8S] Add config option for passing through...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20811 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1623/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20327 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org