[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94779/ Test FAILed.
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Merged build finished. Test FAILed.
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22009

**[Test build #94779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94779/testReport)** for PR 22009 at commit [`f4f85a8`](https://github.com/apache/spark/commit/f4f85a833ef319a6860134e12655574aca081ed6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20226: [SPARK-23034][SQL] Override `nodeName` for all *ScanExec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20226 **[Test build #94786 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94786/testReport)** for PR 20226 at commit [`1facc05`](https://github.com/apache/spark/commit/1facc0554aae0829a19bbb7607b25ff7eda4ef8d).
[GitHub] spark pull request #20529: [SPARK-23350][SS]Bug fix for exception handling w...
Github user yanlin-Lynn closed the pull request at: https://github.com/apache/spark/pull/20529
[GitHub] spark issue #20529: [SPARK-23350][SS]Bug fix for exception handling when sto...
Github user yanlin-Lynn commented on the issue: https://github.com/apache/spark/pull/20529 No need any more; closing this patch.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94785/ Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889

**[Test build #94785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94785/testReport)** for PR 21889 at commit [`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed.
[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user ajacques commented on a diff in the pull request: https://github.com/apache/spark/pull/21889#discussion_r210170646

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala ---
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import java.io.File
+
+import org.scalactic.Equality
+
+import org.apache.spark.sql.{DataFrame, QueryTest, Row}
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.execution.FileSourceScanExec
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types.StructType
+
+class ParquetSchemaPruningSuite
+  extends QueryTest
+  with ParquetTest
+  with SharedSQLContext {
+  case class FullName(first: String, middle: String, last: String)
+  case class Contact(
+    id: Int,
+    name: FullName,
+    address: String,
+    pets: Int,
+    friends: Array[FullName] = Array(),
+    relatives: Map[String, FullName] = Map())
+
+  val janeDoe = FullName("Jane", "X.", "Doe")
+  val johnDoe = FullName("John", "Y.", "Doe")
+  val susanSmith = FullName("Susan", "Z.", "Smith")
+
+  private val contacts =
+    Contact(0, janeDoe, "123 Main Street", 1, friends = Array(susanSmith),
+      relatives = Map("brother" -> johnDoe)) ::
+    Contact(1, johnDoe, "321 Wall Street", 3, relatives = Map("sister" -> janeDoe)) :: Nil
+
+  case class Name(first: String, last: String)
+  case class BriefContact(id: Int, name: Name, address: String)
+
+  private val briefContacts =
+    BriefContact(2, Name("Janet", "Jones"), "567 Maple Drive") ::
+    BriefContact(3, Name("Jim", "Jones"), "6242 Ash Street") :: Nil
+
+  case class ContactWithDataPartitionColumn(
+    id: Int,
+    name: FullName,
+    address: String,
+    pets: Int,
+    friends: Array[FullName] = Array(),
--- End diff --

This is a definition in a constructor, so I don't think we can do:
```scala
case class Contact(
  [...]
  friends: Array.empty[FullName],
  relatives: Map.empty[String, FullName]
)
```
Scala requires a type annotation after the colon, so I opted for:
```scala
case class Contact(
  [...]
  friends: Array[FullName] = Array.empty,
  relatives: Map[String, FullName] = Map.empty
)
```
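As a side note for readers following along, the second form is ordinary Scala default-argument syntax. A minimal, self-contained sketch (toy field types, not the PR's classes) showing how it behaves:
```scala
// Default arguments are evaluated at each call site, so every Contact
// gets its own empty collections unless the caller supplies values.
case class Contact(
    id: Int,
    friends: Array[String] = Array.empty,
    relatives: Map[String, String] = Map.empty)

object DefaultArgsDemo extends App {
  val bare = Contact(0)
  println(bare.friends.length)            // 0
  println(bare.relatives.size)            // 0
  val full = Contact(1, friends = Array("Sue"))
  println(full.friends.mkString(","))     // Sue
}
```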
[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...
Github user rednaxelafx commented on a diff in the pull request: https://github.com/apache/spark/pull/22103#discussion_r210169785

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
     try {
       val cf = new ClassFile(new ByteArrayInputStream(classBytes))
       val stats = cf.methodInfos.asScala.flatMap { method =>
-        method.getAttributes().filter(_.getClass.getName == codeAttr.getName).map { a =>
+        method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
--- End diff --

> I worried whether the strictness may change behavior.

Right, I can tell. And as I've mentioned above, although my new check is stricter, it doesn't make the behavior any "worse" than before, because we're reflectively accessing the `code` field immediately after via `codeAttrField.get(a)`, and that won't work unless the classes match exactly. The old code before my change was actually too permissive: in the case of a class loader mismatch, the old check would let execution reach the reflective access site, but it would then fail there, because reflection doesn't allow access from the wrong class. This can be exemplified by the following pseudocode:
```
val c1 = new URLClassLoader(somePath).loadClass("Foo") // load a class
val c2 = new URLClassLoader(somePath).loadClass("Foo") // load another class with the same name from the same path, but a different class loader
val nameEq = c1.getName == c2.getName // true
val refEq = c1 eq c2 // false
val f1 = c1.getField("a") // look up the field on the loaded class itself
val o1 = c1.newInstance
val o2 = c2.newInstance
f1.get(o1) // okay
f1.get(o2) // fails with IllegalArgumentException: o2 is not an instance of the field's declaring class
```
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94785/testReport)** for PR 21889 at commit [`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84).
[GitHub] spark issue #22031: [TODO][SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22031 @techaddict Thanks! I look forward to the update.
[GitHub] spark issue #22031: [TODO][SPARK-23932][SQL] Higher order function zip_with
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/22031 Hi @ueshin, I will update the PR tomorrow.
[GitHub] spark issue #22031: [TODO][SPARK-23932][SQL] Higher order function zip_with
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22031 Hi @techaddict, Do you have time to continue working on this? If you don't have enough time, I can take this over, so please let me know. Thanks!
[GitHub] spark issue #22093: [SPARK-25100][CORE] Fix no registering TaskCommitMessage...
Github user deshanxiao commented on the issue: https://github.com/apache/spark/pull/22093 @xuanyuanking I thought about it more carefully later. You are right. The UT just needs to verify that `KryoSerializer` handles this correctly, not everything else. I will add it in `KryoSerializerSuite`. Should I delete the current UT from `FileSuite`?
[GitHub] spark issue #22099: [SPARK-25111][BUILD] increment kinesis client/producer &...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22099 Merged build finished. Test PASSed.
[GitHub] spark issue #22099: [SPARK-25111][BUILD] increment kinesis client/producer &...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22099 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94776/ Test PASSed.
[GitHub] spark issue #22099: [SPARK-25111][BUILD] increment kinesis client/producer &...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22099

**[Test build #94776 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94776/testReport)** for PR 22099 at commit [`e79e5b9`](https://github.com/apache/spark/commit/e79e5b9c0bbdf24dcc3cda30dc2c1a70d12b02aa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210164955

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,60 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Returns a map that applies the function to each value of the map.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
+       map(array(1, 2, 3), array(2, 3, 4))
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
+       map(array(1, 2, 3), array(2, 4, 6))
+  """,
+  since = "2.4.0")
+case class TransformValues(
+    argument: Expression,
+    function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+    val map = argument.dataType.asInstanceOf[MapType]
+    MapType(map.keyType, function.dataType, function.nullable)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
--- End diff --

`lazy val`? Could you add a test when argument is not a map in invalid cases of `DataFrameFunctionsSuite`?
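A sketch of the invalid-case test being requested here, assuming the usual `DataFrameFunctionsSuite` helpers; the exact error text is an assumption, since it depends on the analyzer's type-check message:
```scala
// Hypothetical sketch: a non-map first argument should be rejected
// by the analyzer rather than failing at runtime.
val df = Seq(1).toDF("i")
val ex = intercept[AnalysisException] {
  df.selectExpr("transform_values(i, (k, v) -> v)")
}
assert(ex.getMessage.contains("data type mismatch"))
```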
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210165194

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
@@ -95,6 +95,12 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
     aggregate(expr, zero, merge, identity)
   }
 
+  def transformValues(expr: Expression, f: (Expression, Expression) => Expression): Expression = {
+    val valueType = expr.dataType.asInstanceOf[MapType].valueType
+    val keyType = expr.dataType.asInstanceOf[MapType].keyType
+    TransformValues(expr, createLambda(keyType, false, valueType, true, f))
--- End diff --

We should use `valueContainsNull` instead of `true`?
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210165373

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
@@ -283,6 +289,61 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
       15)
   }
 
+  test("TransformValues") {
+    val ai0 = Literal.create(
+      Map(1 -> 1, 2 -> 2, 3 -> 3),
+      MapType(IntegerType, IntegerType))
+    val ai1 = Literal.create(
+      Map(1 -> 1, 2 -> null, 3 -> 3),
+      MapType(IntegerType, IntegerType))
+    val ain = Literal.create(
+      Map.empty[Int, Int],
+      MapType(IntegerType, IntegerType))
--- End diff --

Can you add tests for `Literal.create(null, MapType(IntegerType, IntegerType))`?
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210165543

--- Diff: sql/core/src/test/resources/sql-tests/inputs/higher-order-functions.sql ---
@@ -51,3 +51,17 @@ select exists(ys, y -> y > 30) as v from nested;
 
 -- Check for element existence in a null array
 select exists(cast(null as array<int>), y -> y > 30) as v;
+
+create or replace temporary view nested as values
+  (1, map(1,1,2,2,3,3)),
+  (2, map(4,4,5,5,6,6))
--- End diff --

nit:
```
  (1, map(1, 1, 2, 2, 3, 3)),
  (2, map(4, 4, 5, 5, 6, 6))
```
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210164879

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,60 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Returns a map that applies the function to each value of the map.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
--- End diff --

nit: we need one more right parenthesis after the second `array(1, 2, 3)`?
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210164976

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,60 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Returns a map that applies the function to each value of the map.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
+       map(array(1, 2, 3), array(2, 3, 4))
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
+       map(array(1, 2, 3), array(2, 4, 6))
+  """,
+  since = "2.4.0")
+case class TransformValues(
+    argument: Expression,
+    function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+    val map = argument.dataType.asInstanceOf[MapType]
+    MapType(map.keyType, function.dataType, function.nullable)
--- End diff --

We can use `keyType` from the following val?
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210165225

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
@@ -283,6 +289,61 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
       15)
   }
 
+  test("TransformValues") {
+    val ai0 = Literal.create(
+      Map(1 -> 1, 2 -> 2, 3 -> 3),
+      MapType(IntegerType, IntegerType))
--- End diff --

Can you add `valueContainsNull` explicitly?
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210165102

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,60 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Returns a map that applies the function to each value of the map.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
+       map(array(1, 2, 3), array(2, 3, 4))
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
+       map(array(1, 2, 3), array(2, 4, 6))
+  """,
+  since = "2.4.0")
+case class TransformValues(
+    argument: Expression,
+    function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+    val map = argument.dataType.asInstanceOf[MapType]
+    MapType(map.keyType, function.dataType, function.nullable)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction)
+    : TransformValues = {
+    copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil))
+  }
+
+  @transient lazy val (keyVar, valueVar) = {
+    val LambdaFunction(
+      _, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
+    (keyVar, valueVar)
+  }
--- End diff --

nit: how about:
```scala
@transient lazy val LambdaFunction(_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
```
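For context on the suggestion: a Scala `lazy val` can itself be a pattern definition, which removes the tuple round-trip entirely. A toy, self-contained illustration (not the Spark types):
```scala
object PatternValDemo extends App {
  case class Lambda(args: List[String], body: String)

  val function = Lambda(List("k", "v"), "k + v")

  // One pattern-matching lazy val binds both names directly,
  // instead of building a tuple and then destructuring it.
  lazy val Lambda(keyName :: valueName :: Nil, _) = function

  println(s"$keyName, $valueName") // k, v
}
```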
[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22045#discussion_r210165448

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,210 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
     assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform values function - test various primitive data types combinations") {
--- End diff --

We don't need so many cases here. We only need to verify the API works end to end. Evaluation checks of the function should be in `HigherOrderFunctionsSuite`.
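Concretely, an end-to-end check here could be as small as the following sketch (hypothetical values; the point is just that the SQL function is wired up):
```scala
// Minimal end-to-end check; exhaustive type coverage stays in
// HigherOrderFunctionsSuite.
val df = Seq(Map(1 -> 1, 2 -> 2, 3 -> 3)).toDF("m")
checkAnswer(
  df.selectExpr("transform_values(m, (k, v) -> v + 1)"),
  Seq(Row(Map(1 -> 2, 2 -> 3, 3 -> 4))))
```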
[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21661 **[Test build #94784 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94784/testReport)** for PR 21661 at commit [`34ca4ef`](https://github.com/apache/spark/commit/34ca4ef2b0757b2832ac2dbcc364b42eb4f34e48).
[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21661 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2202/ Test PASSed.
[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21661 Merged build finished. Test PASSed.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94781/ Test PASSed.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Merged build finished. Test PASSed.
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21561

**[Test build #94781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94781/testReport)** for PR 21561 at commit [`ecab85c`](https://github.com/apache/spark/commit/ecab85c921fbc81865b800be25d533fea8e75fd5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21661: [SPARK-24685][build] Restore support for building old Ha...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21661 retest this please
[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22045 Merged build finished. Test FAILed.
[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22045

**[Test build #94783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94783/testReport)** for PR 22045 at commit [`b73106d`](https://github.com/apache/spark/commit/b73106d43000972ab9adae3d3b463a0dada2b9cc).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22045 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94783/ Test FAILed.
[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22045 **[Test build #94783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94783/testReport)** for PR 22045 at commit [`b73106d`](https://github.com/apache/spark/commit/b73106d43000972ab9adae3d3b463a0dada2b9cc).
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210165079

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,65 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Transform Keys for every entry of the map by applying the transform_keys function.
+ * Returns map with transformed key entries
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
+       map(array(2, 3, 4), array(1, 2, 3))
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
+       map(array(2, 4, 6), array(1, 2, 3))
+  """,
+  since = "2.4.0")
+case class TransformKeys(
+    argument: Expression,
+    function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+    val map = argument.dataType.asInstanceOf[MapType]
+    MapType(function.dataType, map.valueType, map.valueContainsNull)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): TransformKeys = {
+    copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil))
+  }
+
+  @transient lazy val (keyVar, valueVar) = {
+    val LambdaFunction(
+      _, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
+    (keyVar, valueVar)
+  }
--- End diff --

nit: how about:
```scala
@transient lazy val LambdaFunction(_, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
```
[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22045 ok to test.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94780/ Test PASSed.
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086

**[Test build #94780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94780/testReport)** for PR 17086 at commit [`9e59fd5`](https://github.com/apache/spark/commit/9e59fd592e9cbe43e9fc3d5c317cd3c4e2d6ac43).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22013 Btw, we need one more right parenthesis after the second `array(1, 2, 3)` and a space at `(k,v)` in the description?
[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...
Github user deshanxiao commented on the issue: https://github.com/apache/spark/pull/22109 please help me cc @HyukjinKwon
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210161746

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,97 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
     assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform keys function - test various primitive data types combinations") {
+    val dfExample1 = Seq(
+      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+    ).toDF("i")
+
+    val dfExample2 = Seq(
+      Map[Int, Double](1 -> 1.0E0, 2 -> 1.4E0, 3 -> 1.7E0)
+    ).toDF("j")
+
+    val dfExample3 = Seq(
+      Map[Int, Boolean](25 -> true, 26 -> false)
+    ).toDF("x")
+
+    val dfExample4 = Seq(
+      Map[Array[Int], Boolean](Array(1, 2) -> false)
+    ).toDF("y")
+
+
+    def testMapOfPrimitiveTypesCombination(): Unit = {
+      checkAnswer(dfExample1.selectExpr("transform_keys(i, (k, v) -> k + v)"),
+        Seq(Row(Map(2 -> 1, 18 -> 9, 16 -> 8, 14 -> 7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, " +
+        "(k, v) -> map_from_arrays(ARRAY(1, 2, 3), ARRAY('one', 'two', 'three'))[k])"),
+        Seq(Row(Map("one" -> 1.0, "two" -> 1.4, "three" -> 1.7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> CAST(v * 2 AS BIGINT) + k)"),
+        Seq(Row(Map(3 -> 1.0, 4 -> 1.4, 6 -> 1.7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> k + v)"),
+        Seq(Row(Map(2.0 -> 1.0, 3.4 -> 1.4, 4.7 -> 1.7))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> k % 2 = 0 OR v)"),
+        Seq(Row(Map(true -> true, true -> false))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 2 * k, 3 * k))"),
+        Seq(Row(Map(50 -> true, 78 -> false))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 2 * k, 3 * k))"),
+        Seq(Row(Map(50 -> true, 78 -> false))))
+
+      checkAnswer(dfExample4.selectExpr("transform_keys(y, (k, v) -> array_contains(k, 3) AND v)"),
+        Seq(Row(Map(false -> false))))
+    }
+    // Test with local relation, the Project will be evaluated without codegen
+    testMapOfPrimitiveTypesCombination()
+    dfExample1.cache()
+    dfExample2.cache()
+    dfExample3.cache()
+    dfExample4.cache()
+    // Test with cached relation, the Project will be evaluated with codegen
+    testMapOfPrimitiveTypesCombination()
+  }
+
+  test("transform keys function - Invalid lambda functions and exceptions") {
+    val dfExample1 = Seq(
+      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+    ).toDF("i")
+
+    val dfExample2 = Seq(
+      Map[String, String]("a" -> "b")
+    ).toDF("j")
+
+    val dfExample3 = Seq(
+      Map[String, String]("a" -> null)
+    ).toDF("x")
+
+    def testInvalidLambdaFunctions(): Unit = {
+      val ex1 = intercept[AnalysisException] {
+        dfExample1.selectExpr("transform_keys(i, k -> k )")
+      }
+      assert(ex1.getMessage.contains("The number of lambda function arguments '1' does not match"))
+
+      val ex2 = intercept[AnalysisException] {
+        dfExample2.selectExpr("transform_keys(j, (k, v, x) -> k + 1)")
+      }
+      assert(ex2.getMessage.contains(
+          "The number of lambda function arguments '3' does not match"))
--- End diff --

nit: indent
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210161509

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,97 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
     assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform keys function - test various primitive data types combinations") {
+    val dfExample1 = Seq(
+      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+    ).toDF("i")
+
+    val dfExample2 = Seq(
+      Map[Int, Double](1 -> 1.0E0, 2 -> 1.4E0, 3 -> 1.7E0)
+    ).toDF("j")
+
+    val dfExample3 = Seq(
+      Map[Int, Boolean](25 -> true, 26 -> false)
+    ).toDF("x")
+
+    val dfExample4 = Seq(
+      Map[Array[Int], Boolean](Array(1, 2) -> false)
+    ).toDF("y")
+
+
+    def testMapOfPrimitiveTypesCombination(): Unit = {
+      checkAnswer(dfExample1.selectExpr("transform_keys(i, (k, v) -> k + v)"),
+        Seq(Row(Map(2 -> 1, 18 -> 9, 16 -> 8, 14 -> 7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, " +
+        "(k, v) -> map_from_arrays(ARRAY(1, 2, 3), ARRAY('one', 'two', 'three'))[k])"),
+        Seq(Row(Map("one" -> 1.0, "two" -> 1.4, "three" -> 1.7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> CAST(v * 2 AS BIGINT) + k)"),
+        Seq(Row(Map(3 -> 1.0, 4 -> 1.4, 6 -> 1.7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> k + v)"),
+        Seq(Row(Map(2.0 -> 1.0, 3.4 -> 1.4, 4.7 -> 1.7))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> k % 2 = 0 OR v)"),
+        Seq(Row(Map(true -> true, true -> false))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 2 * k, 3 * k))"),
+        Seq(Row(Map(50 -> true, 78 -> false))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 2 * k, 3 * k))"),
+        Seq(Row(Map(50 -> true, 78 -> false))))
+
+      checkAnswer(dfExample4.selectExpr("transform_keys(y, (k, v) -> array_contains(k, 3) AND v)"),
+        Seq(Row(Map(false -> false))))
+    }
+    // Test with local relation, the Project will be evaluated without codegen
+    testMapOfPrimitiveTypesCombination()
+    dfExample1.cache()
+    dfExample2.cache()
+    dfExample3.cache()
+    dfExample4.cache()
+    // Test with cached relation, the Project will be evaluated with codegen
+    testMapOfPrimitiveTypesCombination()
+  }
+
+  test("transform keys function - Invalid lambda functions and exceptions") {
+    val dfExample1 = Seq(
+      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+    ).toDF("i")
+
+    val dfExample2 = Seq(
+      Map[String, String]("a" -> "b")
+    ).toDF("j")
+
+    val dfExample3 = Seq(
+      Map[String, String]("a" -> null)
+    ).toDF("x")
+
+    def testInvalidLambdaFunctions(): Unit = {
+      val ex1 = intercept[AnalysisException] {
+        dfExample1.selectExpr("transform_keys(i, k -> k )")
--- End diff --

nit: extra space after `k -> k`.
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210162791

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
@@ -283,6 +289,75 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
       15)
   }
 
+  test("TransformKeys") {
+    val ai0 = Literal.create(
+      Map(1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4),
+      MapType(IntegerType, IntegerType, valueContainsNull = false))
+    val ai1 = Literal.create(
+      Map.empty[Int, Int],
+      MapType(IntegerType, IntegerType, valueContainsNull = true))
+    val ai2 = Literal.create(
+      Map(1 -> 1, 2 -> null, 3 -> 3),
+      MapType(IntegerType, IntegerType, valueContainsNull = true))
+    val ai3 = Literal.create(null, MapType(IntegerType, IntegerType, valueContainsNull = false))
+
+    val plusOne: (Expression, Expression) => Expression = (k, v) => k + 1
+    val plusValue: (Expression, Expression) => Expression = (k, v) => k + v
+    val modKey: (Expression, Expression) => Expression = (k, v) => k % 3
+
+    checkEvaluation(transformKeys(ai0, plusOne), Map(2 -> 1, 3 -> 2, 4 -> 3, 5 -> 4))
+    checkEvaluation(transformKeys(ai0, plusValue), Map(2 -> 1, 4 -> 2, 6 -> 3, 8 -> 4))
+    checkEvaluation(
+      transformKeys(transformKeys(ai0, plusOne), plusValue), Map(3 -> 1, 5 -> 2, 7 -> 3, 9 -> 4))
+    checkEvaluation(transformKeys(ai0, modKey),
+      ArrayBasedMapData(Array(1, 2, 0, 1), Array(1, 2, 3, 4)))
+    checkEvaluation(transformKeys(ai1, plusOne), Map.empty[Int, Int])
+    checkEvaluation(transformKeys(ai1, plusOne), Map.empty[Int, Int])
+    checkEvaluation(
+      transformKeys(transformKeys(ai1, plusOne), plusValue), Map.empty[Int, Int])
+    checkEvaluation(transformKeys(ai2, plusOne), Map(2 -> 1, 3 -> null, 4 -> 3))
+    checkEvaluation(
+      transformKeys(transformKeys(ai2, plusOne), plusOne), Map(3 -> 1, 4 -> null, 5 -> 3))
+    checkEvaluation(transformKeys(ai3, plusOne), null)
+
+    val as0 = Literal.create(
+      Map("a" -> "xy", "bb" -> "yz", "ccc" -> "zx"),
+      MapType(StringType, StringType, valueContainsNull = false))
+    val as1 = Literal.create(
+      Map("a" -> "xy", "bb" -> "yz", "ccc" -> null),
+      MapType(StringType, StringType, valueContainsNull = true))
+    val as2 = Literal.create(null,
+      MapType(StringType, StringType, valueContainsNull = false))
+    val asn = Literal.create(Map.empty[StringType, StringType],
--- End diff --

`as3`?
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210161936

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,97 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
     assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform keys function - test various primitive data types combinations") {
+    val dfExample1 = Seq(
+      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+    ).toDF("i")
+
+    val dfExample2 = Seq(
+      Map[Int, Double](1 -> 1.0E0, 2 -> 1.4E0, 3 -> 1.7E0)
--- End diff --

Do we need `E0`?
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210162501

--- Diff: sql/core/src/test/resources/sql-tests/inputs/higher-order-functions.sql ---
@@ -51,3 +51,17 @@ select exists(ys, y -> y > 30) as v from nested;
 
 -- Check for element existence in a null array
 select exists(cast(null as array<int>), y -> y > 30) as v;
+
+create or replace temporary view nested as values
+  (1, map(1,1,2,2,3,3)),
+  (2, map(4,4,5,5,6,6))
--- End diff --

nit:
```
  (1, map(1, 1, 2, 2, 3, 3)),
  (2, map(4, 4, 5, 5, 6, 6))
```
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210161616

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,97 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
     assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform keys function - test various primitive data types combinations") {
+    val dfExample1 = Seq(
+      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+    ).toDF("i")
+
+    val dfExample2 = Seq(
+      Map[Int, Double](1 -> 1.0E0, 2 -> 1.4E0, 3 -> 1.7E0)
+    ).toDF("j")
+
+    val dfExample3 = Seq(
+      Map[Int, Boolean](25 -> true, 26 -> false)
+    ).toDF("x")
+
+    val dfExample4 = Seq(
+      Map[Array[Int], Boolean](Array(1, 2) -> false)
+    ).toDF("y")
+
+
+    def testMapOfPrimitiveTypesCombination(): Unit = {
+      checkAnswer(dfExample1.selectExpr("transform_keys(i, (k, v) -> k + v)"),
+        Seq(Row(Map(2 -> 1, 18 -> 9, 16 -> 8, 14 -> 7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, " +
+        "(k, v) -> map_from_arrays(ARRAY(1, 2, 3), ARRAY('one', 'two', 'three'))[k])"),
+        Seq(Row(Map("one" -> 1.0, "two" -> 1.4, "three" -> 1.7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> CAST(v * 2 AS BIGINT) + k)"),
+        Seq(Row(Map(3 -> 1.0, 4 -> 1.4, 6 -> 1.7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> k + v)"),
+        Seq(Row(Map(2.0 -> 1.0, 3.4 -> 1.4, 4.7 -> 1.7))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> k % 2 = 0 OR v)"),
+        Seq(Row(Map(true -> true, true -> false))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 2 * k, 3 * k))"),
+        Seq(Row(Map(50 -> true, 78 -> false))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 2 * k, 3 * k))"),
+        Seq(Row(Map(50 -> true, 78 -> false))))
+
+      checkAnswer(dfExample4.selectExpr("transform_keys(y, (k, v) -> array_contains(k, 3) AND v)"),
+        Seq(Row(Map(false -> false))))
+    }
+    // Test with local relation, the Project will be evaluated without codegen
+    testMapOfPrimitiveTypesCombination()
+    dfExample1.cache()
+    dfExample2.cache()
+    dfExample3.cache()
+    dfExample4.cache()
+    // Test with cached relation, the Project will be evaluated with codegen
+    testMapOfPrimitiveTypesCombination()
+  }
+
+  test("transform keys function - Invalid lambda functions and exceptions") {
+    val dfExample1 = Seq(
+      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+    ).toDF("i")
+
+    val dfExample2 = Seq(
+      Map[String, String]("a" -> "b")
+    ).toDF("j")
+
+    val dfExample3 = Seq(
+      Map[String, String]("a" -> null)
+    ).toDF("x")
+
+    def testInvalidLambdaFunctions(): Unit = {
+      val ex1 = intercept[AnalysisException] {
+        dfExample1.selectExpr("transform_keys(i, k -> k )")
+      }
+      assert(ex1.getMessage.contains("The number of lambda function arguments '1' does not match"))
+
+      val ex2 = intercept[AnalysisException] {
+        dfExample2.selectExpr("transform_keys(j, (k, v, x) -> k + 1)")
+      }
+      assert(ex2.getMessage.contains(
+        "The number of lambda function arguments '3' does not match"))
+
+      val ex3 = intercept[RuntimeException] {
+        dfExample3.selectExpr("transform_keys(x, (k, v) -> v)").show()
+      }
+      assert(ex3.getMessage.contains("Cannot use null as map key!"))
--- End diff --

Seems like we can do those tests only with `dfExample3`?
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210163358

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -2302,6 +2302,97 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
     assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
   }
 
+  test("transform keys function - test various primitive data types combinations") {
+    val dfExample1 = Seq(
+      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+    ).toDF("i")
+
+    val dfExample2 = Seq(
+      Map[Int, Double](1 -> 1.0E0, 2 -> 1.4E0, 3 -> 1.7E0)
+    ).toDF("j")
+
+    val dfExample3 = Seq(
+      Map[Int, Boolean](25 -> true, 26 -> false)
+    ).toDF("x")
+
+    val dfExample4 = Seq(
+      Map[Array[Int], Boolean](Array(1, 2) -> false)
+    ).toDF("y")
+
+
+    def testMapOfPrimitiveTypesCombination(): Unit = {
+      checkAnswer(dfExample1.selectExpr("transform_keys(i, (k, v) -> k + v)"),
+        Seq(Row(Map(2 -> 1, 18 -> 9, 16 -> 8, 14 -> 7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, " +
+        "(k, v) -> map_from_arrays(ARRAY(1, 2, 3), ARRAY('one', 'two', 'three'))[k])"),
+        Seq(Row(Map("one" -> 1.0, "two" -> 1.4, "three" -> 1.7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> CAST(v * 2 AS BIGINT) + k)"),
+        Seq(Row(Map(3 -> 1.0, 4 -> 1.4, 6 -> 1.7))))
+
+      checkAnswer(dfExample2.selectExpr("transform_keys(j, (k, v) -> k + v)"),
+        Seq(Row(Map(2.0 -> 1.0, 3.4 -> 1.4, 4.7 -> 1.7))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> k % 2 = 0 OR v)"),
+        Seq(Row(Map(true -> true, true -> false))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 2 * k, 3 * k))"),
+        Seq(Row(Map(50 -> true, 78 -> false))))
+
+      checkAnswer(dfExample3.selectExpr("transform_keys(x, (k, v) -> if(v, 2 * k, 3 * k))"),
+        Seq(Row(Map(50 -> true, 78 -> false))))
+
+      checkAnswer(dfExample4.selectExpr("transform_keys(y, (k, v) -> array_contains(k, 3) AND v)"),
+        Seq(Row(Map(false -> false))))
+    }
+    // Test with local relation, the Project will be evaluated without codegen
+    testMapOfPrimitiveTypesCombination()
+    dfExample1.cache()
+    dfExample2.cache()
+    dfExample3.cache()
+    dfExample4.cache()
+    // Test with cached relation, the Project will be evaluated with codegen
+    testMapOfPrimitiveTypesCombination()
+  }
+
+  test("transform keys function - Invalid lambda functions and exceptions") {
+    val dfExample1 = Seq(
+      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
+    ).toDF("i")
+
+    val dfExample2 = Seq(
+      Map[String, String]("a" -> "b")
+    ).toDF("j")
+
+    val dfExample3 = Seq(
+      Map[String, String]("a" -> null)
+    ).toDF("x")
+
+    def testInvalidLambdaFunctions(): Unit = {
+      val ex1 = intercept[AnalysisException] {
+        dfExample1.selectExpr("transform_keys(i, k -> k )")
+      }
+      assert(ex1.getMessage.contains("The number of lambda function arguments '1' does not match"))
+
+      val ex2 = intercept[AnalysisException] {
+        dfExample2.selectExpr("transform_keys(j, (k, v, x) -> k + 1)")
+      }
+      assert(ex2.getMessage.contains(
+        "The number of lambda function arguments '3' does not match"))
+
+      val ex3 = intercept[RuntimeException] {
+        dfExample3.selectExpr("transform_keys(x, (k, v) -> v)").show()
+      }
+      assert(ex3.getMessage.contains("Cannot use null as map key!"))
+    }
+
+    testInvalidLambdaFunctions()
+    dfExample1.cache()
+    dfExample2.cache()
+    testInvalidLambdaFunctions()
--- End diff --

We need `dfExample3.cache()` as well?
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210160909

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,65 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Transform Keys for every entry of the map by applying the transform_keys function.
+ * Returns map with transformed key entries
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
+       map(array(2, 3, 4), array(1, 2, 3))
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
+       map(array(2, 4, 6), array(1, 2, 3))
+  """,
+  since = "2.4.0")
+case class TransformKeys(
+    argument: Expression,
+    function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+    val map = argument.dataType.asInstanceOf[MapType]
+    MapType(function.dataType, map.valueType, map.valueContainsNull)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): TransformKeys = {
+    copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil))
+  }
+
+  @transient lazy val (keyVar, valueVar) = {
+    val LambdaFunction(
+      _, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
+    (keyVar, valueVar)
+  }
+
+  override def nullSafeEval(inputRow: InternalRow, argumentValue: Any): Any = {
+    val map = argumentValue.asInstanceOf[MapData]
+    val f = functionForEval
+    val resultKeys = new GenericArrayData(new Array[Any](map.numElements))
+    var i = 0
+    while (i < map.numElements) {
+      keyVar.value.set(map.keyArray().get(i, keyVar.dataType))
+      valueVar.value.set(map.valueArray().get(i, valueVar.dataType))
+      val result = f.eval(inputRow)
+      if (result ==  null) {
+        throw new RuntimeException("Cannot use null as map key!")
+      }
+      resultKeys.update(i, result)
+      i += 1
+    }
+    new ArrayBasedMapData(resultKeys, map.valueArray())
+  }
+
+  override def prettyName: String = "transform_keys"
+ }
--- End diff --

nit: indent
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210159871

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,65 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Transform Keys for every entry of the map by applying the transform_keys function.
+ * Returns map with transformed key entries
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
+       map(array(2, 3, 4), array(1, 2, 3))
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
+       map(array(2, 4, 6), array(1, 2, 3))
+  """,
+  since = "2.4.0")
+case class TransformKeys(
+    argument: Expression,
+    function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+    val map = argument.dataType.asInstanceOf[MapType]
+    MapType(function.dataType, map.valueType, map.valueContainsNull)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
+
+  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction): TransformKeys = {
+    copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil))
+  }
+
+  @transient lazy val (keyVar, valueVar) = {
+    val LambdaFunction(
+      _, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
+    (keyVar, valueVar)
+  }
+
+  override def nullSafeEval(inputRow: InternalRow, argumentValue: Any): Any = {
+    val map = argumentValue.asInstanceOf[MapData]
+    val f = functionForEval
+    val resultKeys = new GenericArrayData(new Array[Any](map.numElements))
+    var i = 0
+    while (i < map.numElements) {
+      keyVar.value.set(map.keyArray().get(i, keyVar.dataType))
+      valueVar.value.set(map.valueArray().get(i, valueVar.dataType))
+      val result = f.eval(inputRow)
+      if (result ==  null) {
--- End diff --

nit: extra space between `==` and `null`.
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210160419

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,65 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
 
+/**
+ * Transform Keys for every entry of the map by applying the transform_keys function.
+ * Returns map with transformed key entries
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
+       map(array(2, 3, 4), array(1, 2, 3))
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
+       map(array(2, 4, 6), array(1, 2, 3))
+  """,
+  since = "2.4.0")
+case class TransformKeys(
+    argument: Expression,
+    function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+    val map = argument.dataType.asInstanceOf[MapType]
+    MapType(function.dataType, map.valueType, map.valueContainsNull)
+  }
+
+  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
--- End diff --

`lazy val`? Could you add a test when `argument` is not a map in invalid cases of `DataFrameFunctionsSuite`?
[GitHub] spark pull request #22013: [SPARK-23939][SQL] Add transform_keys function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22013#discussion_r210160577 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
@@ -497,6 +497,65 @@ case class ArrayAggregate(
   override def prettyName: String = "aggregate"
 }
+/**
+ * Transform Keys for every entry of the map by applying the transform_keys function.
+ * Returns map with transformed key entries
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, func) - Transforms elements in a map using the function.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
+       map(array(2, 3, 4), array(1, 2, 3))
+      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
+       map(array(2, 4, 6), array(1, 2, 3))
+  """,
+  since = "2.4.0")
+case class TransformKeys(
+    argument: Expression,
+    function: Expression)
+  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
+
+  override def nullable: Boolean = argument.nullable
+
+  override def dataType: DataType = {
+    val map = argument.dataType.asInstanceOf[MapType]
+    MapType(function.dataType, map.valueType, map.valueContainsNull)
--- End diff --
Can we use `valueType` and `valueContainsNull` from the following val instead?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
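Concretely, the suggestion is to reuse the members destructured by the pattern `val` rather than re-casting the argument's type; a minimal sketch:

```scala
// Sketch: keyType, valueType and valueContainsNull are already extracted
// from argument.dataType by the pattern val, so dataType can reuse them
// instead of casting argument.dataType again.
override def dataType: DataType =
  MapType(function.dataType, valueType, valueContainsNull)
```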
[GitHub] spark issue #21123: [SPARK-24045][SQL]Create base class for file data source...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21123 @gatorsmile Yeah, it shouldn't break compatibility for now, for the reasons described in the PR, I guess. @rdblue, strictly speaking though, isn't that specific to Data Source V2, or did I misunderstand? I was wondering what is blocking this PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22093: [SPARK-25100][CORE] Fix no registering TaskCommitMessage...
Github user deshanxiao commented on the issue: https://github.com/apache/spark/pull/22093 @xuanyuanking Thanks for your suggestion. We could indeed test whether `TaskCommitMessage` can be serialized by `KryoSerializer`, but that alone would not explain why the framework must serialize `TaskCommitMessage`. So we create our own `SparkContext`; it is more convincing to show that the job fails because the framework does not register `TaskCommitMessage`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
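The class of failure being discussed can be reproduced without a full job: with `spark.kryo.registrationRequired=true`, serializing any unregistered class fails immediately. A hedged sketch (`DemoMessage` is a stand-in, not the real `TaskCommitMessage`):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

// Stand-in for an unregistered, framework-internal message class.
case class DemoMessage(partitionId: Int)

object KryoRegistrationDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Fail fast whenever a class has not been registered with Kryo.
      .set("spark.kryo.registrationRequired", "true")
    val ser = new KryoSerializer(conf).newInstance()
    // Expected to throw ("Class is not registered: ...") until DemoMessage
    // is listed in spark.kryo.classesToRegister or registered in code.
    ser.serialize(DemoMessage(0))
  }
}
```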
[GitHub] spark issue #22013: [SPARK-23939][SQL] Add transform_keys function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22013 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94775/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22013: [SPARK-23939][SQL] Add transform_keys function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22013 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22013: [SPARK-23939][SQL] Add transform_keys function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22013 **[Test build #94775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94775/testReport)** for PR 22013 at commit [`5db526b`](https://github.com/apache/spark/commit/5db526be7bad0fa38dc9743c919014b475cf8aeb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22104 > we can implement a dummy data source v1/v2 on the Scala side There's an example, https://github.com/apache/spark/pull/21007, that implements something in Scala and uses it in a Python-side test. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem of Eve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22109 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21698
> ... assume that computation is idempotent - we do not support non determinism in computation

Ah, this is a reasonable restriction; we should document it in the RDD classdoc. What about the source (root RDD or shuffle)? The output of a reduce task is non-deterministic, because Spark fetches multiple shuffle blocks at the same time and it's random which shuffle blocks finish fetching first. The external sorter has the same problem: the output order can change if spilling happens. Generally I think there are 3 directions:
1. Assume computing functions are idempotent, and also make Spark internal operations idempotent (reducer, external sorter, maybe more). I think this is hard to do, but it would be the clearest semantic.
2. Assume computing functions are idempotent and insensitive to the input data order. Then Spark internal operations can have different output orders. An example is adding a sort before the round-robin, which makes the computing function insensitive to the input data order. But I don't think it's reasonable to apply this restriction to all computing functions.
3. Assume computing functions are random. This is not friendly to the scheduler, as it needs to be able to revert a finished task. We need to think about whether it's possible to revert a result task.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
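For direction 2, the round-robin case can be made insensitive to input order by imposing a local total order before assigning buckets. A hedged sketch at the RDD level (`deterministicRoundRobin` is a hypothetical helper, not the actual patch; it assumes the upstream partition contents are deterministic as multisets even when their order is not):

```scala
import scala.reflect.ClassTag
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

// Sketch: sort each partition before round-robin assignment, so bucket
// membership no longer depends on the order in which shuffle blocks
// happened to finish fetching.
def deterministicRoundRobin[T: Ordering: ClassTag](
    rdd: RDD[T],
    numPartitions: Int): RDD[T] = {
  rdd
    .mapPartitions(iter => iter.toArray.sorted.iterator) // deterministic local order
    .zipWithIndex()                                      // stable position per record
    .map { case (t, idx) => (idx % numPartitions, t) }   // round-robin bucket key
    .partitionBy(new HashPartitioner(numPartitions))
    .values
}
```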
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94773/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21306 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21306 **[Test build #94773 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94773/testReport)** for PR 21306 at commit [`21acda2`](https://github.com/apache/spark/commit/21acda237744d4299e5bb449dce1ec8a1735f6de). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class CaseInsensitiveStringMap implements Map<String, String>` * `public class Catalogs` * `final class AddColumn implements TableChange` * `final class RenameColumn implements TableChange` * `final class UpdateColumn implements TableChange` * `final class DeleteColumn implements TableChange` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22109: [SPARK-25120][CORE][HistoryServer]Fix the problem...
GitHub user deshanxiao opened a pull request: https://github.com/apache/spark/pull/22109 [SPARK-25120][CORE][HistoryServer]Fix the problem of EventLogListener may miss driver SparkListenerBloc…

## What changes were proposed in this pull request?

Sometimes the "Executors" tab in the Spark History Server shows no driver information, because that page is built by replaying the SparkListenerBlockManagerAdded event from the event log. In SparkContext, the driver registers its BlockManager before the EventLoggingListener is added to the LiveListenerBus. In that case, the EventLoggingListener may miss the driver's SparkListenerBlockManagerAdded event, and the History UI won't show the driver's info under "Executors".

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/deshanxiao/spark fix-jira25120

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22109.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22109

commit 26ca9c2c08c62961183e6461183c2963b6a00474 Author: xiaodeshan Date: 2018-08-15T02:49:25Z fix the problem of EventLogListener may miss driver SparkListenerBlockManagerAdded event
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
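The race reduces to a generic pub/sub pitfall: events posted before a listener subscribes are lost, because the bus does not replay. A hedged, minimal model of the sequence described above (not the real `LiveListenerBus` API):

```scala
import scala.collection.mutable.ArrayBuffer

sealed trait Event
case object BlockManagerAdded extends Event

// Minimal bus: no replay for listeners that subscribe late.
class Bus {
  private val listeners = ArrayBuffer.empty[Event => Unit]
  def post(e: Event): Unit = listeners.foreach(_(e))
  def add(l: Event => Unit): Unit = listeners += l
}

object ListenerRaceDemo extends App {
  val bus = new Bus
  bus.post(BlockManagerAdded)          // driver registers its BlockManager first...
  bus.add(e => println(s"logged: $e")) // ...the event-log listener is attached later
  // Nothing is printed: the driver's event never reaches the log, which is
  // why replay in the History Server shows no driver under "Executors".
}
```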
[GitHub] spark issue #17461: [SPARK-20082][ml] LDA incremental model learning
Github user sprintcheng commented on the issue: https://github.com/apache/spark/pull/17461 May I know when this change will be included in an official release? I downloaded Spark 2.3.1 and still do NOT find that this method (lda.setInitialModel) has been added. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2201/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2200/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21561 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21537 **[Test build #94782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94782/testReport)** for PR 21537 at commit [`508e091`](https://github.com/apache/spark/commit/508e091f53084deefc35001ce8d89455ca549e53). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21561 **[Test build #94781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94781/testReport)** for PR 21561 at commit [`ecab85c`](https://github.com/apache/spark/commit/ecab85c921fbc81865b800be25d533fea8e75fd5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21537 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22024: [SPARK-25034][CORE] Remove allocations in onBlockFetchSu...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22024 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 Thanks @HyukjinKwon --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
Github user zhengruifeng commented on a diff in the pull request: https://github.com/apache/spark/pull/21561#discussion_r210158840 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -299,7 +299,7 @@ class KMeans private (
     val bcCenters = sc.broadcast(centers)
 
     // Find the new centers
-    val newCenters = data.mapPartitions { points =>
+    val collected = data.mapPartitions { points =>
--- End diff --
I am neutral on this.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21860: [SPARK-24901][SQL]Merge the codegen of RegularHas...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21860#discussion_r210158345 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ---
@@ -232,6 +232,25 @@ class WholeStageCodegenSuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-24901 check merge FastHashMap and RegularHashMap generate code max size") {
--- End diff --
This test looks weird; it's tricky to check the generated code size, as it can change later.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21860: [SPARK-24901][SQL]Merge the codegen of RegularHas...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21860#discussion_r210157862 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -853,33 +853,42 @@ case class HashAggregateExec(
 
     val updateRowInHashMap: String = {
       if (isFastHashMapEnabled) {
-        ctx.INPUT_ROW = fastRowBuffer
-        val boundUpdateExpr = updateExpr.map(BindReferences.bindReference(_, inputAttr))
-        val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
-        val effectiveCodes = subExprs.codes.mkString("\n")
-        val fastRowEvals = ctx.withSubExprEliminationExprs(subExprs.states) {
-          boundUpdateExpr.map(_.genCode(ctx))
-        }
-        val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) =>
-          val dt = updateExpr(i).dataType
-          CodeGenerator.updateColumn(
-            fastRowBuffer, dt, i, ev, updateExpr(i).nullable, isVectorizedHashMapEnabled)
-        }
+        if (isVectorizedHashMapEnabled) {
+          ctx.INPUT_ROW = fastRowBuffer
+          val boundUpdateExpr = updateExpr.map(BindReferences.bindReference(_, inputAttr))
+          val subExprs = ctx.subexpressionEliminationForWholeStageCodegen(boundUpdateExpr)
+          val effectiveCodes = subExprs.codes.mkString("\n")
+          val fastRowEvals = ctx.withSubExprEliminationExprs(subExprs.states) {
+            boundUpdateExpr.map(_.genCode(ctx))
+          }
+          val updateFastRow = fastRowEvals.zipWithIndex.map { case (ev, i) =>
+            val dt = updateExpr(i).dataType
+            CodeGenerator.updateColumn(
+              fastRowBuffer, dt, i, ev, updateExpr(i).nullable, isVectorizedHashMapEnabled)
+          }
 
-        // If fast hash map is on, we first generate code to update row in fast hash map, if the
-        // previous loop up hit fast hash map. Otherwise, update row in regular hash map.
-        s"""
-          |if ($fastRowBuffer != null) {
-          |  // common sub-expressions
-          |  $effectiveCodes
-          |  // evaluate aggregate function
-          |  ${evaluateVariables(fastRowEvals)}
-          |  // update fast row
-          |  ${updateFastRow.mkString("\n").trim}
-          |} else {
-          |  $updateRowInRegularHashMap
-          |}
-        """.stripMargin
+          // If fast hash map is on, we first generate code to update row in fast hash map, if the
+          // previous loop up hit fast hash map. Otherwise, update row in regular hash map.
+          s"""
+            |if ($fastRowBuffer != null) {
+            |  // common sub-expressions
+            |  $effectiveCodes
+            |  // evaluate aggregate function
+            |  ${evaluateVariables(fastRowEvals)}
+            |  // update fast row
+            |  ${updateFastRow.mkString("\n").trim}
+            |} else {
+            |  $updateRowInRegularHashMap
+            |}
+          """.stripMargin
+        } else {
+          s"""
+            |if ($fastRowBuffer != null) {
+            |  $unsafeRowBuffer = $fastRowBuffer;
+            |}
+            |$updateRowInRegularHashMap
--- End diff --
Can you add a comment to explain it?
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
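One possible shape for the requested explanation, hedged: the wording is an assumption, and the fragment only makes sense inside the surrounding code-generation builder, where `$fastRowBuffer`, `$unsafeRowBuffer`, and `$updateRowInRegularHashMap` are already in scope.

```scala
// Sketch: the same branch with an explanatory comment added.
s"""
   |// The row-based fast hash map hands back an UnsafeRow buffer, so on a
   |// fast-map hit we can alias it to the regular buffer and fall through
   |// to the shared regular-hash-map update code, instead of generating a
   |// second, near-duplicate copy of the update logic.
   |if ($fastRowBuffer != null) {
   |  $unsafeRowBuffer = $fastRowBuffer;
   |}
   |$updateRowInRegularHashMap
 """.stripMargin
```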
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXCEPT ALL and INTERSECT ALL support...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22107 Seems fine. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22104: [SPARK-24721][SQL] Exclude Python UDFs filters in FileSo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22104 @icexelloss we can implement a dummy data source v1/v2 on the Scala side and scan it in a PySpark test. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
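A hedged sketch of such a dummy source against the stable v1 `RelationProvider` API (class and package names are placeholders); from PySpark it could then be scanned with `spark.read.format("<fully.qualified.ClassName>").load()`:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Minimal Data Source V1 relation that always returns the rows 0..9.
class DummySource extends RelationProvider {
  override def createRelation(
      context: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    new BaseRelation with TableScan {
      override def sqlContext: SQLContext = context
      override def schema: StructType = StructType(StructField("i", IntegerType) :: Nil)
      override def buildScan(): RDD[Row] =
        context.sparkContext.parallelize(0 until 10).map(Row(_))
    }
  }
}
```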
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXCEPT ALL and INTERSECT ALL ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22107#discussion_r210157418 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2482,6 +2482,32 @@ test_that("union(), unionByName(), rbind(), except(), and intersect() on a DataF
   unlink(jsonPath2)
 })
 
+test_that("intersectAll() and exceptAll()", {
+  df1 <- createDataFrame(
+    list(list("a", 1),
+         list("a", 1),
+         list("a", 1),
+         list("a", 1),
+         list("b", 3),
+         list("c", 4)),
+    schema = c("a", "b"))
+  df2 <- createDataFrame(
+    list(list("a", 1), list("a", 1), list("b", 3)),
+    schema = c("a", "b"))
+  intersect_all_expected <- data.frame("a" = c("a", "a", "b"), "b" = c(1, 1, 3),
+                                       stringsAsFactors = FALSE)
+  except_all_expected <- data.frame("a" = c("a", "a", "c"), "b" = c(1, 1, 4),
+                                    stringsAsFactors = FALSE)
+  intersect_all_df <- arrange(intersectAll(df1, df2), df1$a)
--- End diff --
Strictly, the naming rule is `intersectAllDf` or `intersect.all.df` (see https://github.com/apache/spark/pull/17590#issuecomment-293732796).
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXCEPT ALL and INTERSECT ALL ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22107#discussion_r210157193 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2482,6 +2482,32 @@ test_that("union(), unionByName(), rbind(), except(), and intersect() on a DataF
   unlink(jsonPath2)
 })
 
+test_that("intersectAll() and exceptAll()", {
+  df1 <- createDataFrame(
+    list(list("a", 1),
+         list("a", 1),
+         list("a", 1),
+         list("a", 1),
+         list("b", 3),
+         list("c", 4)),
+    schema = c("a", "b"))
+  df2 <- createDataFrame(
+    list(list("a", 1), list("a", 1), list("b", 3)),
+    schema = c("a", "b"))
--- End diff --
nit:
```r
df2 <- createDataFrame(list(list("a", 1), list("a", 1), list("b", 3)), schema = c("a", "b"))
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2199/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17086 **[Test build #94780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94780/testReport)** for PR 17086 at commit [`9e59fd5`](https://github.com/apache/spark/commit/9e59fd592e9cbe43e9fc3d5c317cd3c4e2d6ac43). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22107: [SPARK-25117][R] Add EXCEPT ALL and INTERSECT ALL ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22107#discussion_r210156892 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2482,6 +2482,32 @@ test_that("union(), unionByName(), rbind(), except(), and intersect() on a DataF
   unlink(jsonPath2)
 })
 
+test_that("intersectAll() and exceptAll()", {
+  df1 <- createDataFrame(
+    list(list("a", 1),
+         list("a", 1),
+         list("a", 1),
+         list("a", 1),
+         list("b", 3),
+         list("c", 4)),
--- End diff --
nit:
```r
list(list("a", 1), list("a", 1), list("a", 1), list("a", 1), list("b", 3), list("c", 4)),
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17086 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17086 @WeichenXu123, is it good to go? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use weight...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17086 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22098: [SPARK-24886][INFRA] Fix the testing script to increase ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22098 Ah, @shaneknapp, probably one possible motivation is that we don't touch the build configuration often, and controlling the build time within the codebase means committers don't have to bother you every time it needs to change. For instance, although I know how to control the build time too, I don't touch those settings. Shall we just keep the current way and discuss again if we happen to need another increase? I think 400m looks like a generous enough limit, and I believe we will put in the effort to reduce the build time or keep it within the current limit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggregate.row....
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/21931 @kiszk, @cloud-fan I was on vacation recently; I'm sorry for the delayed reply. I have updated it. Could you help review it again when you have some time? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22009 **[Test build #94779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94779/testReport)** for PR 22009 at commit [`f4f85a8`](https://github.com/apache/spark/commit/f4f85a833ef319a6860134e12655574aca081ed6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2198/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22009 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org