[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/21860 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22118: Branch 2.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22118 Can one of the admins verify this patch?
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20725 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94838/ Test PASSed.
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20725 Merged build finished. Test PASSed.
[GitHub] spark pull request #22118: Branch 2.2
GitHub user speful opened a pull request: https://github.com/apache/spark/pull/22118 Branch 2.2 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22118.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22118 commit 86609a95af4b700e83638b7416c7e3706c2d64c6 Author: Liang-Chi Hsieh Date: 2017-08-08T08:12:41Z [SPARK-21567][SQL] Dataset should work with type alias If we create a type alias for a type workable with Dataset, the type alias doesn't work with Dataset. A reproducible case looks like: object C { type TwoInt = (Int, Int) def tupleTypeAlias: TwoInt = (1, 1) } Seq(1).toDS().map(_ => ("", C.tupleTypeAlias)) It throws an exception like: type T1 is not a class scala.ScalaReflectionException: type T1 is not a class at scala.reflect.api.Symbols$SymbolApi$class.asClass(Symbols.scala:275) ... This patch accesses the dealias of type in many places in `ScalaReflection` to fix it. Added test case. Author: Liang-Chi Hsieh Closes #18813 from viirya/SPARK-21567. (cherry picked from commit ee1304199bcd9c1d5fc94f5b06fdd5f6fe7336a1) Signed-off-by: Wenchen Fan commit e87ffcaa3e5b75f8d313dc995e4801063b60cd5c Author: Wenchen Fan Date: 2017-08-08T08:32:49Z Revert "[SPARK-21567][SQL] Dataset should work with type alias" This reverts commit 86609a95af4b700e83638b7416c7e3706c2d64c6. 
commit d0233145208eb6afcd9fe0c1c3a9dbbd35d7727e Author: pgandhi Date: 2017-08-09T05:46:06Z [SPARK-21503][UI] Spark UI shows incorrect task status for a killed Executor Process The executor tab on the Spark UI page shows a task as completed when the executor process running that task is killed using the kill command. Added the case ExecutorLostFailure, which was previously not there; thus, the default case would be executed, in which case the task would be marked as completed. This case will cover all those cases where the executor's connection to the Spark driver was lost due to killing the executor process, a network failure, etc. ## How was this patch tested? Manually tested the fix by observing the UI change before and after. Before: https://user-images.githubusercontent.com/8190/28482929-571c9cea-6e30-11e7-93dd-728de5cdea95.png After: https://user-images.githubusercontent.com/8190/28482964-8649f5ee-6e30-11e7-91bd-2eb2089c61cc.png Please review http://spark.apache.org/contributing.html before opening a pull request. Author: pgandhi Author: pgandhi999 Closes #18707 from pgandhi999/master. (cherry picked from commit f016f5c8f6c6aae674e9905a5c0b0bede09163a4) Signed-off-by: Wenchen Fan commit 7446be3328ea75a5197b2587e3a8e2ca7977726b Author: WeichenXu Date: 2017-08-09T06:44:10Z [SPARK-21523][ML] update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search ## What changes were proposed in this pull request? Update breeze to 0.13.2 for an emergency bugfix in strong wolfe line search https://github.com/scalanlp/breeze/pull/651 ## How was this patch tested? N/A Author: WeichenXu Closes #18797 from WeichenXu123/update-breeze. (cherry picked from commit b35660dd0e930f4b484a079d9e2516b0a7dacf1d) Signed-off-by: Yanbo Liang commit f6d56d2f1c377000921effea2b1faae15f9cae82 Author: Shixiong Zhu Date: 2017-08-09T06:49:33Z [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value Same PR as #18799 but for branch 2.2. Main discussion is on the other PR.
When I was investigating a flaky test, I realized that many places don't check the return value of `HDFSMetadataLog.get(batchId: Long): Option[T]`. When a batch is supposed to be there, the caller just ignores None rather than throwing an error. If some bug causes a query not to generate a batch metadata file, this behavior will hide it, allow the query to continue running, and finally delete metadata logs, making the issue hard to debug. This PR ensures that places calling HDFSMetadataLog.get always check the
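The pattern this PR enforces can be sketched in plain Java. Everything here is illustrative — the class, its Map-backed store, and the `getRequired` helper are hypothetical stand-ins for Spark's `HDFSMetadataLog`, not its actual API. The point is the contract: when a batch is supposed to exist, fail fast instead of silently ignoring an empty result.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a metadata log keyed by batch id.
class MetadataLog {
    private final Map<Long, String> store;

    MetadataLog(Map<Long, String> store) { this.store = store; }

    // Mirrors get(batchId): Option[T] -- easy for callers to ignore a miss.
    Optional<String> get(long batchId) {
        return Optional.ofNullable(store.get(batchId));
    }

    // The pattern the PR enforces: raise a clear error when a batch that
    // should exist is missing, instead of letting the query limp on.
    String getRequired(long batchId) {
        return get(batchId).orElseThrow(() ->
            new IllegalStateException("batch " + batchId + " doesn't exist"));
    }
}

public class MetadataLogDemo {
    public static void main(String[] args) {
        MetadataLog log = new MetadataLog(Map.of(0L, "offsets-0", 1L, "offsets-1"));
        System.out.println(log.getRequired(1L));  // present: returned directly
        try {
            log.getRequired(99L);                 // absent: fails loudly
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The failure surfaces at the first missing batch rather than later, after the metadata logs have been cleaned up.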
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20725 **[Test build #94838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94838/testReport)** for PR 20725 at commit [`461c326`](https://github.com/apache/spark/commit/461c326f00d68a350a1b5c0f7b644f2871ee0a85). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22117 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94834/ Test FAILed.
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22117 Merged build finished. Test FAILed.
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22117 **[Test build #94834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94834/testReport)** for PR 22117 at commit [`3cad78f`](https://github.com/apache/spark/commit/3cad78f8bb9bc0dc841cd0c31e0b0d52f8e7c764). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20725 This is working now, and BinaryType support is conditional on pyarrow 0.10.0 or higher being used. @HyukjinKwon @cloud-fan what are your thoughts on getting this in for Spark 2.4? I think it would be very useful to have since images in Spark use the BinaryType, and it will be good to have when integrating Spark with DL frameworks.
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20725 **[Test build #94838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94838/testReport)** for PR 20725 at commit [`461c326`](https://github.com/apache/spark/commit/461c326f00d68a350a1b5c0f7b644f2871ee0a85).
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20725 Merged build finished. Test PASSed.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21537 We are fully swamped by the hotfixes and regressions of the 2.3 release and the new features targeting 2.4. We should have posted comments on this PR earlier. Designing an IR for our codegen is the right thing to do. [If you do not agree on this, we can discuss it.] How to design an IR is a challenging task. The whole community is welcome to submit designs and PRs. Everyone can share their ideas. The best idea will win. @HyukjinKwon If you have bandwidth, please also give it a try
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20725 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2236/ Test PASSed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Merged build finished. Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94835/ Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #94835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94835/testReport)** for PR 21860 at commit [`6ff46d9`](https://github.com/apache/spark/commit/6ff46d941a6ddb29345ea0c563aa68b77f540139). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21912 cc @ueshin
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94833/ Test PASSed.
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Merged build finished. Test PASSed.
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22031 **[Test build #94833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94833/testReport)** for PR 22031 at commit [`0342ed9`](https://github.com/apache/spark/commit/0342ed934e65c13c43081f464503800118383a44). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user raofu commented on a diff in the pull request: https://github.com/apache/spark/pull/22113#discussion_r210473687 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala --- @@ -70,7 +70,7 @@ private[hive] object OrcFileOperator extends Logging { hdfsPath.getFileSystem(conf) } -listOrcFiles(basePath, conf).iterator.map { path => +listOrcFiles(basePath, conf).view.map { path => --- End diff -- My bad. I misread the code. Sorry about the noise.
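For context on why this PR's title says "lazily create Reader": Scala's `.iterator` and `.view`, like Java streams, defer the mapping step until elements are actually consumed. A small illustrative Java-stream analogue (not the Spark code itself — the "Reader" string stands in for an expensive ORC reader construction):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class LazyMapDemo {
    public static void main(String[] args) {
        List<String> opened = new ArrayList<>();
        List<String> paths = List.of("f1.orc", "f2.orc", "f3.orc");

        // The map stage simulates an expensive Reader creation; because
        // stream intermediate operations are lazy, it runs only for the
        // elements a downstream consumer actually pulls.
        Optional<String> first = paths.stream()
            .map(p -> { opened.add(p); return "Reader(" + p + ")"; })
            .findFirst();

        System.out.println(first.get());  // Reader(f1.orc)
        System.out.println(opened);       // [f1.orc] -- only one was "opened"
    }
}
```

With an eager collection, all three "readers" would be created up front even if only the first is needed.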
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user raofu closed the pull request at: https://github.com/apache/spark/pull/22113
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22115 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94831/ Test PASSed.
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22115 Merged build finished. Test PASSed.
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22115 **[Test build #94831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94831/testReport)** for PR 22115 at commit [`089c31f`](https://github.com/apache/spark/commit/089c31fcff1a5b84634f5de78c1bd440f738b2f4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
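For readers wondering why `expm1()` merits its own javadoc: for x near zero, computing `exp(x) - 1` directly loses almost all significant digits to cancellation, which is exactly what `expm1` avoids. A small sketch using the JDK's `Math.expm1` (the function SQL `expm1` implementations are typically built on — an assumption here, not a claim about Spark's internals):

```java
public class Expm1Demo {
    public static void main(String[] args) {
        double x = 1e-10;
        // Naive form: exp(x) rounds to a double extremely close to 1.0,
        // and the subtraction then cancels most significant digits.
        double naive = Math.exp(x) - 1.0;
        // expm1 computes e^x - 1 without the intermediate rounding near 1.
        double precise = Math.expm1(x);
        System.out.printf("naive = %.18e%n", naive);
        System.out.printf("expm1 = %.18e%n", precise);
        // True value is x + x*x/2 + ...; expm1 lands much closer to it.
        System.out.println(Math.abs(precise - x) < Math.abs(naive - x));
    }
}
```

This is the standard motivation for documenting `expm1` separately from `exp`: the two are mathematically redundant but numerically very different near zero.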
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210469510 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -497,6 +497,53 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } +/** + * Returns a map that applies the function to each value of the map. + */ +@ExpressionDescription( +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", +examples = """ +Examples: + > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1); +map(array(1, 2, 3), array(2, 3, 4)) + > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v); +map(array(1, 2, 3), array(2, 4, 6)) + """, +since = "2.4.0") --- End diff -- ditto.
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210469494 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -497,6 +497,53 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } +/** + * Returns a map that applies the function to each value of the map. + */ +@ExpressionDescription( +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", +examples = """ --- End diff -- ditto.
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210471011 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -2302,6 +2302,177 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map")) } + test("transform values function - test primitive data types") { +val dfExample1 = Seq( + Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7) +).toDF("i") + +val dfExample2 = Seq( + Map[Boolean, String](false -> "abc", true -> "def") +).toDF("x") + +val dfExample3 = Seq( + Map[String, Int]("a" -> 1, "b" -> 2, "c" -> 3) +).toDF("y") + +val dfExample4 = Seq( + Map[Int, Double](1 -> 1.0, 2 -> 1.40, 3 -> 1.70) +).toDF("z") + +val dfExample5 = Seq( + Map[Int, Array[Int]](1 -> Array(1, 2)) +).toDF("c") + +def testMapOfPrimitiveTypesCombination(): Unit = { + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> k + v)"), +Seq(Row(Map(1 -> 2, 9 -> 18, 8 -> 16, 7 -> 14 + + checkAnswer(dfExample2.selectExpr( +"transform_values(x, (k, v) -> if(k, v, CAST(k AS String)))"), +Seq(Row(Map(false -> "false", true -> "def" + + checkAnswer(dfExample2.selectExpr("transform_values(x, (k, v) -> NOT k AND v = 'abc')"), +Seq(Row(Map(false -> true, true -> false + + checkAnswer(dfExample3.selectExpr("transform_values(y, (k, v) -> v * v)"), +Seq(Row(Map("a" -> 1, "b" -> 4, "c" -> 9 + + checkAnswer(dfExample3.selectExpr( +"transform_values(y, (k, v) -> k || ':' || CAST(v as String))"), +Seq(Row(Map("a" -> "a:1", "b" -> "b:2", "c" -> "c:3" + + checkAnswer( +dfExample3.selectExpr("transform_values(y, (k, v) -> concat(k, cast(v as String)))"), +Seq(Row(Map("a" -> "a1", "b" -> "b2", "c" -> "c3" + + checkAnswer( +dfExample4.selectExpr( + "transform_values(" + +"z,(k, v) -> map_from_arrays(ARRAY(1, 2, 3), " + +"ARRAY('one', 'two', 'three'))[k] || '_' || CAST(v AS String))"), +Seq(Row(Map(1 -> 
"one_1.0", 2 -> "two_1.4", 3 ->"three_1.7" + + checkAnswer( +dfExample4.selectExpr("transform_values(z, (k, v) -> k-v)"), +Seq(Row(Map(1 -> 0.0, 2 -> 0.6001, 3 -> 1.3 + + checkAnswer( +dfExample5.selectExpr("transform_values(c, (k, v) -> k + cardinality(v))"), +Seq(Row(Map(1 -> 3 +} + +// Test with local relation, the Project will be evaluated without codegen +testMapOfPrimitiveTypesCombination() +dfExample1.cache() +dfExample2.cache() +dfExample3.cache() +dfExample4.cache() +dfExample5.cache() +// Test with cached relation, the Project will be evaluated with codegen +testMapOfPrimitiveTypesCombination() + } + + test("transform values function - test empty") { +val dfExample1 = Seq( + Map.empty[Integer, Integer] +).toDF("i") + +val dfExample2 = Seq( + Map.empty[BigInt, String] +).toDF("j") + +def testEmpty(): Unit = { + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> NULL)"), +Seq(Row(Map.empty[Integer, Integer]))) + + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> k)"), +Seq(Row(Map.empty[Integer, Integer]))) + + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> v)"), +Seq(Row(Map.empty[Integer, Integer]))) + + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> 0)"), +Seq(Row(Map.empty[Integer, Integer]))) + + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> 'value')"), +Seq(Row(Map.empty[Integer, String]))) + + checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> true)"), +Seq(Row(Map.empty[Integer, Boolean]))) + + checkAnswer(dfExample2.selectExpr("transform_values(j, (k, v) -> k + cast(v as BIGINT))"), +Seq(Row(Map.empty[BigInt, BigInt]))) +} + +testEmpty() +dfExample1.cache() +dfExample2.cache() +testEmpty() + } + + test("transform values function - test null values") { +val dfExample1 = Seq( + Map[Int, Integer](1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4) +).toDF("a") + +val dfExample2 = Seq( + Map[Int, String](1 -> "a", 2 -> "b", 3 -> null)
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210469472 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -497,6 +497,53 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } +/** + * Returns a map that applies the function to each value of the map. + */ +@ExpressionDescription( +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", --- End diff -- nit: indent
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210470513 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -497,6 +497,53 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } +/** + * Returns a map that applies the function to each value of the map. + */ +@ExpressionDescription( +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.", +examples = """ +Examples: + > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1); +map(array(1, 2, 3), array(2, 3, 4)) + > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v); +map(array(1, 2, 3), array(2, 4, 6)) + """, +since = "2.4.0") +case class TransformValues( +argument: Expression, +function: Expression) + extends MapBasedSimpleHigherOrderFunction with CodegenFallback { + + override def nullable: Boolean = argument.nullable + + @transient lazy val MapType(keyType, valueType, valueContainsNull) = argument.dataType + + override def dataType: DataType = MapType(keyType, function.dataType, valueContainsNull) + + override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction) + : TransformValues = { +copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil)) + } + + @transient lazy val LambdaFunction( + _, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function --- End diff -- nit: indent
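The `transform_values` semantics being reviewed above can be modeled in a few lines of plain Java (an illustrative model, not the Catalyst implementation): every key is preserved, and the lambda receives both key and value to produce the new value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

public class TransformValuesDemo {
    // Model of transform_values(map, (k, v) -> ...): keys are unchanged,
    // each value is replaced by f(key, value).
    static <K, V, W> Map<K, W> transformValues(Map<K, V> m, BiFunction<K, V, W> f) {
        Map<K, W> out = new LinkedHashMap<>();
        m.forEach((k, v) -> out.put(k, f.apply(k, v)));
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> m = Map.of(1, 1, 2, 2, 3, 3);
        System.out.println(transformValues(m, (k, v) -> k + v)); // each value becomes k + v
        System.out.println(transformValues(m, (k, v) -> v + 1)); // each value incremented
    }
}
```

This mirrors the `(k, v) -> k + v` example in the quoted `@ExpressionDescription`: the result keeps the original keys while the values are recomputed.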
[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...
Github user deshanxiao commented on the issue: https://github.com/apache/spark/pull/22109 @vanzin Sorry.. SPARK-22850 has fixed the problem. Maybe I will track the executor loss problem next. Thank you! @vanzin @squito
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210467354 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -442,3 +442,91 @@ case class ArrayAggregate( override def prettyName: String = "aggregate" } + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x)); + array(('a', 1), ('b', 3), ('c', 5)) + > SELECT _FUNC_(array(1, 2), array(3, 4), (x, y) -> x + y)); + array(4, 6) + > SELECT _FUNC_(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y)); + array('ad', 'be', 'cf') + """, + since = "2.4.0") +// scalastyle:on line.size.limit +case class ArraysZipWith( +left: Expression, +right: Expression, +function: Expression) + extends HigherOrderFunction with CodegenFallback with ExpectsInputTypes { + + override def inputs: Seq[Expression] = List(left, right) + + override def functions: Seq[Expression] = List(function) + + def expectingFunctionType: AbstractDataType = AnyDataType + @transient lazy val functionForEval: Expression = functionsForEval.head + + override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType, expectingFunctionType) + + override def nullable: Boolean = inputs.exists(_.nullable) + + override def dataType: ArrayType = ArrayType(function.dataType, function.nullable) + + override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): ArraysZipWith = { +val (leftElementType, leftContainsNull) = left.dataType match { + case ArrayType(elementType, containsNull) => (elementType, containsNull) + case _ => +val ArrayType(elementType, containsNull) = 
ArrayType.defaultConcreteType +(elementType, containsNull) +} +val (rightElementType, rightContainsNull) = right.dataType match { + case ArrayType(elementType, containsNull) => (elementType, containsNull) + case _ => +val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType +(elementType, containsNull) +} +copy(function = f(function, + (leftElementType, leftContainsNull) :: (rightElementType, rightContainsNull) :: Nil)) --- End diff -- If we append `null`s to the shorter array, both of the arguments might be `null`, so we should use `true` for nullabilities of the arguments as @mn-mikke suggested.
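The null-padding semantics ueshin describes — the shorter array is extended with nulls to the length of the longer one before the function is applied — can be modeled in plain Java, with `Optional.empty()` standing in for SQL NULL (an illustrative sketch, not the Catalyst code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

public class ZipWithDemo {
    // Model of zip_with(left, right, f): the shorter array is padded with
    // "nulls" (empty Optionals) so the result matches the longer length.
    static <A, B, C> List<C> zipWith(List<A> left, List<B> right,
                                     BiFunction<Optional<A>, Optional<B>, C> f) {
        int n = Math.max(left.size(), right.size());
        List<C> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            Optional<A> a = i < left.size() ? Optional.of(left.get(i)) : Optional.empty();
            Optional<B> b = i < right.size() ? Optional.of(right.get(i)) : Optional.empty();
            out.add(f.apply(a, b));
        }
        return out;
    }

    public static void main(String[] args) {
        // SQL's x + y is NULL if either side is NULL; flatMap models that,
        // which is why the review asks for nullable lambda arguments.
        List<Optional<Integer>> sums = zipWith(List.of(1, 2, 3), List.of(10, 20),
            (x, y) -> x.flatMap(a -> y.map(b -> a + b)));
        System.out.println(sums);  // last entry is empty: the short side was padded
    }
}
```

This also makes the review comment concrete: because padding can introduce nulls on either side, the lambda's two arguments must both be treated as nullable regardless of the input arrays' own nullability.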
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210468640 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -2302,6 +2302,76 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map")) } + test("arrays zip_with function - for primitive types") { +val df1 = Seq[(Seq[Integer], Seq[Integer])]( + (Seq(9001, 9002, 9003), Seq(4, 5, 6)), + (Seq(1, 2), Seq(3, 4)), + (Seq.empty, Seq.empty), + (null, null) +).toDF("val1", "val2") +val df2 = Seq[(Seq[Integer], Seq[Long])]( + (Seq(1, null, 3), Seq(1L, 2L)), + (Seq(1, 2, 3), Seq(4L, 11L)) +).toDF("val1", "val2") + +val expectedValue1 = Seq( + Row(Seq(9005, 9007, 9009)), + Row(Seq(4, 6)), + Row(Seq.empty), + Row(null)) +checkAnswer(df1.selectExpr("zip_with(val1, val2, (x, y) -> x + y)"), expectedValue1) + +val expectedValue2 = Seq( + Row(Seq(Row(1.0, 1), Row(2.0, null), Row(null, 3))), --- End diff -- Why `1.0` or `2.0` instead of `1L` or `2L`?
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210467721 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala --- @@ -396,4 +396,52 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper map_zip_with(mbb0, mbbn, concat), null) } + + test("ZipWith") { +def zip_with( +left: Expression, +right: Expression, +f: (Expression, Expression) => Expression): Expression = { + val ArrayType(leftT, leftContainsNull) = left.dataType.asInstanceOf[ArrayType] + val ArrayType(rightT, rightContainsNull) = right.dataType.asInstanceOf[ArrayType] + ZipWith(left, right, createLambda(leftT, leftContainsNull, rightT, rightContainsNull, f)) +} + +val ai0 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = false)) +val ai1 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = false)) --- End diff -- What's the difference between `ai0` and `ai1`?
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210467959 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala --- @@ -396,4 +396,52 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper map_zip_with(mbb0, mbbn, concat), null) } + + test("ZipWith") { +def zip_with( +left: Expression, +right: Expression, +f: (Expression, Expression) => Expression): Expression = { + val ArrayType(leftT, leftContainsNull) = left.dataType.asInstanceOf[ArrayType] + val ArrayType(rightT, rightContainsNull) = right.dataType.asInstanceOf[ArrayType] + ZipWith(left, right, createLambda(leftT, leftContainsNull, rightT, rightContainsNull, f)) +} + +val ai0 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = false)) +val ai1 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType, containsNull = false)) +val ai2 = Literal.create(Seq[Integer](1, null, 3), ArrayType(IntegerType, containsNull = true)) +val ai3 = Literal.create(Seq[Integer](1, null), ArrayType(IntegerType, containsNull = true)) +val ain = Literal.create(null, ArrayType(IntegerType, containsNull = false)) + +val add: (Expression, Expression) => Expression = (x, y) => x + y +val plusOne: Expression => Expression = x => x + 1 + +checkEvaluation(zip_with(ai0, ai1, add), Seq(2, 4, 6)) +checkEvaluation(zip_with(ai3, ai2, add), Seq(2, null, null)) +checkEvaluation(zip_with(ai2, ai3, add), Seq(2, null, null)) +checkEvaluation(zip_with(ain, ain, add), null) +checkEvaluation(zip_with(ai1, ain, add), null) +checkEvaluation(zip_with(ain, ai1, add), null) + +val as0 = Literal.create(Seq("a", "b", "c"), ArrayType(StringType, containsNull = false)) +val as1 = Literal.create(Seq("a", null, "c"), ArrayType(StringType, containsNull = true)) +val as2 = Literal.create(Seq("a"), ArrayType(StringType, containsNull = true)) +val asn = Literal.create(null, ArrayType(StringType, 
containsNull = false)) + +val concat: (Expression, Expression) => Expression = (x, y) => Concat(Seq(x, y)) + +checkEvaluation(zip_with(as0, as1, concat), Seq("aa", null, "cc")) +checkEvaluation(zip_with(as0, as2, concat), Seq("aa", null, null)) + +val aai1 = Literal.create(Seq(Seq(1, 2, 3), null, Seq(4, 5)), + ArrayType(ArrayType(IntegerType, containsNull = false), containsNull = true)) +val aai2 = Literal.create(Seq(Seq(1, 2, 3)), + ArrayType(ArrayType(IntegerType, containsNull = false), containsNull = true)) +checkEvaluation( + zip_with(aai1, aai2, (a1, a2) => + Cast(zip_with(transform(a1, plusOne), transform(a2, plusOne), add), StringType)), --- End diff -- nit: indent --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210468814 --- Diff: sql/core/src/test/resources/sql-tests/inputs/higher-order-functions.sql --- @@ -51,3 +51,12 @@ select exists(ys, y -> y > 30) as v from nested; -- Check for element existence in a null array select exists(cast(null as array), y -> y > 30) as v; + +-- Zip with array +select zip_with(ys, zs, (a, b) -> a + size(b)) as v from nested; + +-- Zip with array with concat +select zip_with(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y)) as v; + +-- Zip with array coalesce +select zip_with(array('a'), array('d', null, 'f'), (x, y) -> coalesce(x, y)) as v; --- End diff -- Can you add a line break at the end of the file? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
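The padding behavior these queries exercise (the shorter array is extended with nulls before the lambda is applied) can be sketched in plain Scala; `zipWithPad` below is a hypothetical helper for illustration, not Spark's implementation:

```scala
// Sketch of SQL zip_with semantics: pad the shorter input up to the longer
// length (a missing element is modeled as None), then apply f pairwise.
def zipWithPad[A, B, C](left: Seq[A], right: Seq[B])
                       (f: (Option[A], Option[B]) => C): Seq[C] = {
  val n = math.max(left.length, right.length)
  (0 until n).map(i => f(left.lift(i), right.lift(i)))
}

// Mirrors: select zip_with(array('a'), array('d', 'e', 'f'), (x, y) -> coalesce(x, y))
val coalesced = zipWithPad(Seq("a"), Seq("d", "e", "f")) { (x, y) => x.orElse(y).orNull }
println(coalesced) // Vector(a, e, f)
```

The same helper reproduces the `a + size(b)` and `concat(x, y)` queries by swapping in the corresponding lambda.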
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210466854 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala --- @@ -687,3 +687,89 @@ case class MapZipWith(left: Expression, right: Expression, function: Expression) override def prettyName: String = "map_zip_with" } + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x)); + array(('a', 1), ('b', 2), ('c', 3)) + > SELECT _FUNC_(array(1, 2), array(3, 4), (x, y) -> x + y); + array(4, 6) + > SELECT _FUNC_(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y)); + array('ad', 'be', 'cf') + """, + since = "2.4.0") +// scalastyle:on line.size.limit +case class ZipWith(left: Expression, right: Expression, function: Expression) + extends HigherOrderFunction with CodegenFallback { + + def functionForEval: Expression = functionsForEval.head + + override def arguments: Seq[Expression] = left :: right :: Nil + + override def argumentTypes: Seq[AbstractDataType] = ArrayType :: ArrayType :: Nil + + override def functions: Seq[Expression] = List(function) + + override def functionTypes: Seq[AbstractDataType] = AnyDataType :: Nil + + override def nullable: Boolean = left.nullable || right.nullable + + override def dataType: ArrayType = ArrayType(function.dataType, function.nullable) + + override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): ZipWith = { +val (leftElementType, leftContainsNull) = left.dataType match { + case ArrayType(elementType, containsNull) => (elementType, containsNull) + case _ => +val ArrayType(elementType, 
containsNull) = ArrayType.defaultConcreteType +(elementType, containsNull) +} +val (rightElementType, rightContainsNull) = right.dataType match { + case ArrayType(elementType, containsNull) => (elementType, containsNull) + case _ => +val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType +(elementType, containsNull) +} --- End diff -- Now we can do: ```scala val ArrayType(leftElementType, leftContainsNull) = left.dataType val ArrayType(rightElementType, rightContainsNull) = right.dataType ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
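The one-line form suggested here relies on Scala's pattern-matching `val` definitions: a constructor pattern on the left-hand side destructures the value, throwing a `MatchError` at runtime if the runtime type does not match. A standalone sketch with toy stand-ins for Spark's types (not the real classes):

```scala
// Toy stand-ins for Spark's DataType hierarchy, for illustration only.
sealed trait DataType
case object IntegerType extends DataType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

val dt: DataType = ArrayType(IntegerType, containsNull = true)

// Extractor pattern in a val definition: no asInstanceOf needed,
// but a non-ArrayType value would throw MatchError on this line.
val ArrayType(elementType, containsNull) = dt
println(s"$elementType, $containsNull") // IntegerType, true
```

This is why the review can drop both the explicit `match` block and the `.asInstanceOf[ArrayType]` casts: the argument-type checks upstream already guarantee the inputs are arrays.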
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210467535 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala --- @@ -396,4 +396,52 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper map_zip_with(mbb0, mbbn, concat), null) } + + test("ZipWith") { +def zip_with( +left: Expression, +right: Expression, +f: (Expression, Expression) => Expression): Expression = { + val ArrayType(leftT, leftContainsNull) = left.dataType.asInstanceOf[ArrayType] + val ArrayType(rightT, rightContainsNull) = right.dataType.asInstanceOf[ArrayType] --- End diff -- nit: we don't need `.asInstanceOf[ArrayType]`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/21561#discussion_r210468884 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala --- @@ -246,6 +245,16 @@ class BisectingKMeans private ( new BisectingKMeansModel(root, this.distanceMeasure) } + /** + * Runs the bisecting k-means algorithm. + * @param input RDD of vectors + * @return model for the bisecting kmeans + */ + @Since("1.6.0") --- End diff -- Oh right I get it now, this isn't a new method, it's 'replacing' the definition above. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: [DOCS]Update configuration.md
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22116 Huh OK I thought I looked and this had been fixed. Good catch. Also there's an instance in `cloud-integration.md`, worth fixing too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21561#discussion_r210468639 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala --- @@ -246,6 +245,16 @@ class BisectingKMeans private ( new BisectingKMeansModel(root, this.distanceMeasure) } + /** + * Runs the bisecting k-means algorithm. + * @param input RDD of vectors + * @return model for the bisecting kmeans + */ + @Since("1.6.0") --- End diff -- `def run(input: RDD[Vector]): BisectingKMeansModel` has been a public API since 1.6, and users can call it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/21561#discussion_r210468107 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala --- @@ -246,6 +245,16 @@ class BisectingKMeans private ( new BisectingKMeansModel(root, this.distanceMeasure) } + /** + * Runs the bisecting k-means algorithm. + * @param input RDD of vectors + * @return model for the bisecting kmeans + */ + @Since("1.6.0") --- End diff -- You couldn't call `BisectingKMeans.run(...)` before this, right? It wasn't in a superclass or anything. In that sense, I think this method needs to be marked as new as of 2.4.0, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 @tdas Kind reminder. @zsxwing Could you take a quick look at this and share your thoughts? I think the patch is ready to merge, but it is blocked on a slight difference of views, so more voices would be better. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21561#discussion_r210467653 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala --- @@ -246,6 +245,16 @@ class BisectingKMeans private ( new BisectingKMeansModel(root, this.distanceMeasure) } + /** + * Runs the bisecting k-means algorithm. + * @param input RDD of vectors + * @return model for the bisecting kmeans + */ + @Since("1.6.0") --- End diff -- This API has existed since 1.6.0, so should we keep the `@Since` annotation? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21733 @tdas Kind reminder. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/22115 I have already done the global search. That is the only place that needs changing. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22013 LGTM. @mn-mikke @mgaido91 Do you have any other comments on this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94830/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22031 **[Test build #94830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94830/testReport)** for PR 22031 at commit [`92cb34a`](https://github.com/apache/spark/commit/92cb34af9c1e5742d9fa21f677645daea029bfd6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ZipWith(left: Expression, right: Expression, function: Expression)` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user KraFusion commented on the issue: https://github.com/apache/spark/pull/22116 @HyukjinKwon the same instance did exist in the spark website repo; that PR has been merged. I'm not sure what to change the title to, since the PR instructions don't cover simple typo fixes in documentation that don't have an associated JIRA. Should I prefix the current title with [DOCS]? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22116 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22116 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94836/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22116 **[Test build #94836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94836/testReport)** for PR 22116 at commit [`2b2a61c`](https://github.com/apache/spark/commit/2b2a61c849ddad680819126f8a6fdc28cbbad721). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22115: [SPARK-25082] [SQL] improve the javadoc for expm1()
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22115 Mind fixing Python / R / SQL ones while we are here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2235/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22116 **[Test build #94836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94836/testReport)** for PR 22116 at commit [`2b2a61c`](https://github.com/apache/spark/commit/2b2a61c849ddad680819126f8a6fdc28cbbad721). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22113#discussion_r210462023 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala --- @@ -70,7 +70,7 @@ private[hive] object OrcFileOperator extends Logging { hdfsPath.getFileSystem(conf) } -listOrcFiles(basePath, conf).iterator.map { path => +listOrcFiles(basePath, conf).view.map { path => --- End diff -- Do you mean `collectFirst` actually traverses the `iterator` entirely? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22009 **[Test build #94837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94837/testReport)** for PR 22009 at commit [`0318b4b`](https://github.com/apache/spark/commit/0318b4b1dcbfde0024945308578cedf8d4a09168). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22113: [SPARK-25126] Lazily create Reader for orc files
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22113#discussion_r210461983 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala --- @@ -70,7 +70,7 @@ private[hive] object OrcFileOperator extends Logging { hdfsPath.getFileSystem(conf) } -listOrcFiles(basePath, conf).iterator.map { path => +listOrcFiles(basePath, conf).view.map { path => --- End diff -- Do you mean that with 'iterator', `collectFirst` actually traverses it entirely? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
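Whether `collectFirst` traverses everything depends on the receiver: mapping a strict `List` materializes every element before `collectFirst` looks at anything, while an `Iterator` or a `view` maps lazily and stops at the first match. A quick sketch of the difference being discussed (plain Scala, not the Spark code under review):

```scala
var strictCalls = 0
var lazyCalls = 0

val xs = (1 to 10).toList

// Strict: map builds the whole intermediate list before collectFirst runs.
xs.map { x => strictCalls += 1; x * 2 }.collectFirst { case y if y > 4 => y }

// Lazy: the view maps elements one at a time, stopping at the first match.
xs.view.map { x => lazyCalls += 1; x * 2 }.collectFirst { case y if y > 4 => y }

println(s"strict=$strictCalls lazy=$lazyCalls") // strict=10 lazy=3
```

An `iterator` behaves like the view here, with the caveat that it can be consumed only once, which is presumably why `view` was proposed in the diff.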
[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22109 **[Test build #4263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4263/testReport)** for PR 22109 at commit [`26ca9c2`](https://github.com/apache/spark/commit/26ca9c2c08c62961183e6461183c2963b6a00474). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22116 @KraFusion, mind double-checking whether the same instance exists elsewhere and fixing the PR title to reflect the change? It would also be good to read https://spark.apache.org/contributing.html, even though it's a minor change. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22116: Update configuration.md
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22116 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...
Github user deshanxiao commented on the issue: https://github.com/apache/spark/pull/22109 @squito @vanzin Thanks. The first time I saw this in our cluster was on Spark 2.1. Spark 2.1 has the `setupAndStartListenerBus` method too, but it still looks wrong there. I only noticed the executor-loss symptom yesterday. Maybe we should fix them together. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21537 Yea, I understand. I wasn't trying to say that the impact of introducing AnalysisBarrier was trivial. True, I understand it is not an easy job. Thank you Reynold for that. I also don't mean to say we should just go ahead without sufficient discussion; I wanted to point out that there were positive aspects to the effort and the attempt with AnalysisBarrier too. It wasn't all bad, in a way. > The reason why we did not merge this PR is that we are doubting this is the right thing to do. @rednaxelafx If that's true, the concerns should be mentioned and discussed here. Was there a discussion about it in the community that I missed? I would appreciate it if we could talk here. > Instead of reinventing a compiler, how about letting the compiler internal expert (in our community, we have @kiszk) to lead the effort and offer a design for this. If there is a design concern and a better suggestion, let's file a JIRA. I want to see the problem, the concerns, and the possible suggestions as well. Yup, I get that it might be better for someone with expertise in that area to lead, but I think this should be based purely on community work; in that light, it seemed reasonable for @viirya to go ahead, since it's basically his work. If not, I don't see why any particular person should be preferred. I just wanted to point out that the baseline is open to anyone, not reserved for specific people. Anyone who is willing to do this is welcome to go ahead, so participation should be voluntary, without other factors. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4264/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4266/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #94835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94835/testReport)** for PR 21860 at commit [`6ff46d9`](https://github.com/apache/spark/commit/6ff46d941a6ddb29345ea0c563aa68b77f540139). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22117 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2234/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22117 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22117: [SPARK-23654][BUILD] remove jets3t as a dependenc...
GitHub user steveloughran opened a pull request: https://github.com/apache/spark/pull/22117 [SPARK-23654][BUILD] remove jets3t as a dependency of spark # What changes were proposed in this pull request? Remove jets3t dependency, and bouncy castle which it brings in; update licenses and deps Note this is just #22081 with merge conflict resolved; submitting to see what jenkins says. # How was this patch tested? Existing tests on a JVM with unlimited Java Crypto Extensions You can merge this pull request into a Git repository by running: $ git pull https://github.com/steveloughran/spark incoming/PR-22081-jets3t Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22117.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22117 commit 3cad78f8bb9bc0dc841cd0c31e0b0d52f8e7c764 Author: Sean Owen Date: 2018-08-11T21:41:38Z Remove jets3t dependency, and bouncy castle which it brings in; update licenses and deps --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22117 **[Test build #94834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94834/testReport)** for PR 22117 at commit [`3cad78f`](https://github.com/apache/spark/commit/3cad78f8bb9bc0dc841cd0c31e0b0d52f8e7c764). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...
Github user habren commented on the issue: https://github.com/apache/spark/pull/21868 Hi @HyukjinKwon, I moved the change to the master branch just now. Please help review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...
Github user habren commented on a diff in the pull request: https://github.com/apache/spark/pull/21868#discussion_r210456342 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -401,12 +399,41 @@ case class FileSourceScanExec( fsRelation: HadoopFsRelation): RDD[InternalRow] = { val defaultMaxSplitBytes = fsRelation.sparkSession.sessionState.conf.filesMaxPartitionBytes -val openCostInBytes = fsRelation.sparkSession.sessionState.conf.filesOpenCostInBytes +var openCostInBytes = fsRelation.sparkSession.sessionState.conf.filesOpenCostInBytes val defaultParallelism = fsRelation.sparkSession.sparkContext.defaultParallelism val totalBytes = selectedPartitions.flatMap(_.files.map(_.getLen + openCostInBytes)).sum val bytesPerCore = totalBytes / defaultParallelism -val maxSplitBytes = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore)) +var maxSplitBytes = Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore)) +if(fsRelation.fileFormat.isInstanceOf[ParquetSource] && + fsRelation.sparkSession.sessionState.conf.isParquetSizeAdaptiveEnabled) { + if (relation.dataSchema.map(_.dataType).forall(dataType => +dataType.isInstanceOf[CalendarIntervalType] || dataType.isInstanceOf[StructType] + || dataType.isInstanceOf[MapType] || dataType.isInstanceOf[NullType] + || dataType.isInstanceOf[AtomicType] || dataType.isInstanceOf[ArrayType])) { + +def getTypeLength (dataType : DataType) : Int = { + if (dataType.isInstanceOf[StructType]) { + fsRelation.sparkSession.sessionState.conf.parquetStructTypeLength + } else if (dataType.isInstanceOf[ArrayType]) { + fsRelation.sparkSession.sessionState.conf.parquetArrayTypeLength + } else if (dataType.isInstanceOf[MapType]) { +fsRelation.sparkSession.sessionState.conf.parquetMapTypeLength + } else { +dataType.defaultSize + } +} + +val selectedColumnSize = requiredSchema.map(_.dataType).map(getTypeLength(_)) + .reduceOption(_ + _).getOrElse(StringType.defaultSize) +val 
totalColumnSize = relation.dataSchema.map(_.dataType).map(getTypeLength(_)) + .reduceOption(_ + _).getOrElse(StringType.defaultSize) +val multiplier = totalColumnSize / selectedColumnSize --- End diff -- @viirya Now it also supports ORC. Please help review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
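The computation under review scales a base split size by the ratio of the full row width to the width of the columns actually read. A simplified, self-contained sketch of that arithmetic (the names and numeric values below are illustrative, not Spark's configuration defaults):

```scala
// Simplified model of the split-size computation in FileSourceScanExec.
def maxSplit(defaultMaxSplitBytes: Long, openCostInBytes: Long,
             totalBytes: Long, defaultParallelism: Int): Long = {
  val bytesPerCore = totalBytes / defaultParallelism
  math.min(defaultMaxSplitBytes, math.max(openCostInBytes, bytesPerCore))
}

// With column pruning, the effective split is enlarged by the ratio of
// total column width to the width of the selected columns.
def adaptiveMaxSplit(base: Long, totalColumnSize: Int, selectedColumnSize: Int): Long =
  base * (totalColumnSize / selectedColumnSize)

val base = maxSplit(128L << 20, 4L << 20, 10L << 30, 16) // the 128 MB cap wins here
println(base)                                            // 134217728
println(adaptiveMaxSplit(base, totalColumnSize = 40, selectedColumnSize = 8)) // 671088640
```

The idea is that reading 8 of 40 bytes per row from a columnar file touches roughly a fifth of the data, so splits five times larger yield similarly sized tasks.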
[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22110#discussion_r210455974 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala --- @@ -73,4 +73,14 @@ object TypeUtils { } x.length - y.length } + + /** + * Returns true if elements of the data type could be used as items of a hash set or as keys + * of a hash map. + */ + def typeCanBeHashed(dataType: DataType): Boolean = dataType match { --- End diff -- hey, this is a weird name, `byte[]` can also be hashed. I'd rather call it `typeWithProperEquals`, and document it as @mgaido91 proposed. I don't think we need to consider `hashCode` here, it's a rule in java world that equals and hashCode should be defined in a coherent way. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
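The `byte[]` caveat is easy to demonstrate: JVM arrays hash and compare by identity, so they "can be hashed" yet make poor hash-map keys, which is why the predicate is really about proper `equals` rather than hashability. A quick standalone sketch (plain Scala, not Spark code):

```scala
val a = Array[Byte](1, 2, 3)
val b = Array[Byte](1, 2, 3)

// Arrays are hashable like any JVM object, but equals and hashCode are
// identity-based, so equal contents do not behave as equal keys.
println(a.equals(b))     // false: reference equality
val m = Map(a -> "v")
println(m.contains(b))   // false: byte[] lacks value-based equals
println(java.util.Arrays.equals(a, b)) // true: content comparison must be explicit
```

This also illustrates the point about coherence: because `equals` is identity-based, an identity-based `hashCode` is consistent with it, so checking `equals` alone is sufficient for the proposed `typeWithProperEquals`.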
[GitHub] spark issue #22116: Update configuration.md
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22116 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22116: Update configuration.md
GitHub user KraFusion opened a pull request: https://github.com/apache/spark/pull/22116 Update configuration.md Changed $SPARK_HOME/conf/spark-default.conf to $SPARK_HOME/conf/spark-defaults.conf. No testing necessary, as this was a change to documentation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/KraFusion/spark-1 patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22116.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22116 commit 2b2a61c849ddad680819126f8a6fdc28cbbad721 Author: Joey Krabacher Date: 2018-08-16T01:24:08Z Update configuration.md changed $SPARK_HOME/conf/spark-default.conf to $SPARK_HOME/conf/spark-defaults.conf --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
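For context, the corrected filename is the file Spark reads default properties from at startup; a minimal `$SPARK_HOME/conf/spark-defaults.conf` looks like the following (the property values are examples only):

```
spark.master            spark://master:7077
spark.executor.memory   4g
spark.eventLog.enabled  true
```

Each line is a property name followed by whitespace and a value, and these defaults are overridden by flags passed to spark-submit.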
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320
>> Hello, we've been using your patch at Stripe and we've found something that looks like a new bug:
>
> Thank you for sharing this, @xinxin-stripe. This is very helpful. I will investigate and report back.

I have not been able to reproduce this issue with this branch at commit 0e5594b6ac1dcb94f3f0166e66a7d4e7eae3d00c. However, I'm seeing the same failure scenario as yours on VideoAmp's internal 2.1, 2.2 and 2.3 backports of this branch. I think the reason for this difference is that our internal branches (and probably yours) incorporate rules to support pruning for aggregations. That functionality was removed from this PR. I will fix this and share the fix with you. It would help if you could send me a scenario where you can reproduce this failure with a Spark SQL query. Query plans for datasets built from SQL queries tend to be much more readable. Consider e-mailing me directly on this issue because it does not appear to be strictly related to this PR. My e-mail address is m...@allman.ms. Thanks again!
[GitHub] spark issue #21123: [SPARK-24045][SQL]Create base class for file data source...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21123 @rdblue I really appreciate your work adding these new logical plans, and they are indeed incremental: when you added `AppendData`, you changed `DataFrameWriter` to create `AppendData` only when `SaveMode` is "append"; the other modes still use the old `WriteToDataSourceV2`. That said, every PR we merged for data source v2 made it better and kept it usable. I don't want to change this policy in #22009. I understand your concern about keeping the bad `SaveMode` API in data source v2; I hate it too. We should definitely revisit it before marking data source v2 as stable, but I don't think we need to rush to a decision in #22009, which doesn't mark the v2 API stable. What do you think?
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21537 To Spark users, introducing AnalysisBarrier was a disaster. However, to developers of Spark internals, this is just a bug. If you served customers who are heavily using Spark, you would understand what I am talking about. It is especially hard to debug when the Spark jobs are very complex. Normally, we never commit/merge any PR that is useless, especially when the PR changes are not tiny. Reverting these PRs is also very painful. That is why Reynold took a few days to finish it. It was not a fun job for him to rewrite it. Based on the current work, I expect hundreds of PRs will be submitted for changing the codegen templates and polishing the current code. The reason we did not merge this PR is that we doubt this is the right thing to do. @rednaxelafx I am not saying @viirya and @mgaido91 did a bad job submitting many PRs to improve the existing one. However, we need to think about the fundamental problems we are solving in the codegen. Instead of reinventing a compiler, how about letting a compiler-internals expert (in our community, we have @kiszk) lead the effort and offer a design for this? Coding and designing are different issues. If possible, we need to find the best person to drive it. If @viirya and @mgaido91 think they are familiar with compiler internals, I am also glad to see their designs.
[GitHub] spark pull request #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/22031#discussion_r210452329
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -442,3 +442,91 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x));
+       array(('a', 1), ('b', 2), ('c', 3))
+      > SELECT _FUNC_(array(1, 2), array(3, 4), (x, y) -> x + y);
+       array(4, 6)
+      > SELECT _FUNC_(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y));
+       array('ad', 'be', 'cf')
+  """,
+  since = "2.4.0")
+// scalastyle:on line.size.limit
+case class ArraysZipWith(
+    left: Expression,
+    right: Expression,
+    function: Expression)
+  extends HigherOrderFunction with CodegenFallback with ExpectsInputTypes {
+
+  override def inputs: Seq[Expression] = List(left, right)
+
+  override def functions: Seq[Expression] = List(function)
+
+  def expectingFunctionType: AbstractDataType = AnyDataType
+
+  @transient lazy val functionForEval: Expression = functionsForEval.head
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, ArrayType, expectingFunctionType)
+
+  override def nullable: Boolean = inputs.exists(_.nullable)
+
+  override def dataType: ArrayType = ArrayType(function.dataType, function.nullable)
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): ArraysZipWith = {
+    val (leftElementType, leftContainsNull) = left.dataType match {
+      case ArrayType(elementType, containsNull) => (elementType, containsNull)
+      case _ =>
+        val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType
+        (elementType, containsNull)
+    }
+    val (rightElementType, rightContainsNull) = right.dataType match {
+      case ArrayType(elementType, containsNull) => (elementType, containsNull)
+      case _ =>
+        val ArrayType(elementType, containsNull) = ArrayType.defaultConcreteType
+        (elementType, containsNull)
+    }
+    copy(function = f(function,
+      (leftElementType, leftContainsNull) :: (rightElementType, rightContainsNull) :: Nil))
--- End diff --
@mn-mikke @ueshin "both arrays must be the same length" was how zip_with in Presto used to work; they've since moved to appending nulls and processing regardless.
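The null-padding semantics discussed in the diff above (pad the shorter array with nulls so both match the longer length, then apply the function element-wise) can be sketched outside of Catalyst. This is an illustrative helper under that assumption, not Spark's actual implementation; the names `ZipWithSketch` and `zipWith` are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class ZipWithSketch {
    // Iterates to the length of the longer list, substituting null for
    // missing elements of the shorter one before applying f element-wise.
    static <A, B, R> List<R> zipWith(List<A> xs, List<B> ys, BiFunction<A, B, R> f) {
        int n = Math.max(xs.size(), ys.size());
        List<R> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            A x = i < xs.size() ? xs.get(i) : null;
            B y = i < ys.size() ? ys.get(i) : null;
            out.add(f.apply(x, y));
        }
        return out;
    }

    public static void main(String[] args) {
        // Equal lengths: plain element-wise combination.
        System.out.println(zipWith(List.of(1, 2), List.of(3, 4), (x, y) -> x + y)); // [4, 6]
        // Unequal lengths: the shorter side is padded with nulls.
        System.out.println(zipWith(List.of("a", "b", "c"), List.of("d"),
                (x, y) -> x + "/" + y)); // [a/d, b/null, c/null]
    }
}
```

Note the function must tolerate nulls for the padded positions; Presto's change means the burden of handling the mismatch moves from the caller to the lambda.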
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22031 **[Test build #94833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94833/testReport)** for PR 22031 at commit [`0342ed9`](https://github.com/apache/spark/commit/0342ed934e65c13c43081f464503800118383a44).
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2233/ Test PASSed.
[GitHub] spark issue #22031: [SPARK-23932][SQL] Higher order function zip_with
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22031 Merged build finished. Test PASSed.
[GitHub] spark pull request #22095: [SPARK-23984][K8S] Changed Python Version config ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22095
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #94832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94832/testReport)** for PR 21860 at commit [`768c914`](https://github.com/apache/spark/commit/768c9147c82e3a160bbd6cb29f30da87549518de).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94832/ Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21860 Merged build finished. Test FAILed.
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21860 **[Test build #94832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94832/testReport)** for PR 21860 at commit [`768c914`](https://github.com/apache/spark/commit/768c9147c82e3a160bbd6cb29f30da87549518de).
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4276/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4272/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4271/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22106: [SPARK-25116][TESTS]Fix the Kafka cluster leak and clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22106 **[Test build #4269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4269/testReport)** for PR 22106 at commit [`63cc11d`](https://github.com/apache/spark/commit/63cc11dfa575ac25ee3751a93a2cb5a6b9886218).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22101: [SPARK-25114][Core] Fix RecordBinaryComparator when subt...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/22101 ping @gatorsmile @mridulm @squito