[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22044 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22044#discussion_r208873100 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3442,17 +3464,15 @@ case class ArrayDistinct(child: Expression) } override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -nullSafeCodeGen(ctx, ev, (array) => { - val i = ctx.freshName("i") - val j = ctx.freshName("j") - val sizeOfDistinctArray = ctx.freshName("sizeOfDistinctArray") - val getValue1 = CodeGenerator.getValue(array, elementType, i) - val getValue2 = CodeGenerator.getValue(array, elementType, j) - val foundNullElement = ctx.freshName("foundNullElement") - val openHashSet = classOf[OpenHashSet[_]].getName - val hs = ctx.freshName("hs") - val classTag = s"scala.reflect.ClassTag$$.MODULE$$.Object()" - if (elementTypeSupportEquals) { +if (canUseSpecializedHashSet) { + nullSafeCodeGen(ctx, ev, (array) => { +val i = ctx.freshName("i") +val sizeOfDistinctArray = ctx.freshName("sizeOfDistinctArray") +val foundNullElement = ctx.freshName("foundNullElement") +val openHashSet = classOf[OpenHashSet[_]].getName +val hs = ctx.freshName("hs") +val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()" +val getValue1 = CodeGenerator.getValue(array, elementType, i) --- End diff -- nit: `val getValue`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22044#discussion_r208862917 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3410,6 +3410,28 @@ case class ArrayDistinct(child: Expression) case _ => false } + @transient protected lazy val canUseSpecializedHashSet = elementType match { --- End diff -- We can do so. To minimize the changes due to remaining time for cutting, I would like to do this in another PR #21912. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22044#discussion_r208854615 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3517,56 +3510,24 @@ case class ArrayDistinct(child: Expression) """.stripMargin } - private def setNotNullValue(isPrimitive: Boolean, + private def setNotNullValue( --- End diff -- We can inline this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22044#discussion_r208849675 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3410,6 +3410,28 @@ case class ArrayDistinct(child: Expression) case _ => false } + @transient protected lazy val canUseSpecializedHashSet = elementType match { +case ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType => true +case _ => false + } + + @transient protected lazy val (hsPostFix, hsTypeName) = { +val ptName = CodeGenerator.primitiveTypeName (elementType) --- End diff -- nit: extra space after `primitiveTypeName`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22044#discussion_r208855525 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3410,6 +3410,28 @@ case class ArrayDistinct(child: Expression) case _ => false } + @transient protected lazy val canUseSpecializedHashSet = elementType match { --- End diff -- Can we extract those common methods? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22044#discussion_r208856510 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3517,56 +3510,24 @@ case class ArrayDistinct(child: Expression) """.stripMargin } - private def setNotNullValue(isPrimitive: Boolean, + private def setNotNullValue( distinctArray: String, pos: String, getValue1: String, primitiveValueTypeName: String): String = { -if (!isPrimitive) { - s"$distinctArray[$pos] = $getValue1"; -} else { - s"$distinctArray.set$primitiveValueTypeName($pos, $getValue1)"; -} +s"$distinctArray.set$primitiveValueTypeName($pos, $getValue1)" } private def setValueForFastEval( --- End diff -- Maybe we should rename this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/22044 [SPARK-23912][SQL][Followup] Refactor ArrayDistinct ## What changes were proposed in this pull request? This PR simplified code generation for `ArrayDistinct`. #21966 enabled code generation only if the type can be specialized by the hash set. This PR follows this strategy. Optimization of null handling will be implemented in #21912. ## How was this patch tested? Existing UTs You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-23912-follow Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22044.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22044 commit 9d0cb100a85dc77569e1a62b83e0461ee2c33ddb Author: Kazuaki Ishizaki Date: 2018-08-08T18:43:51Z stop code generation when elementTypeSupportEquals is false --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org