[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22044


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22044#discussion_r208873100
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3442,17 +3464,15 @@ case class ArrayDistinct(child: Expression)
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-nullSafeCodeGen(ctx, ev, (array) => {
-  val i = ctx.freshName("i")
-  val j = ctx.freshName("j")
-  val sizeOfDistinctArray = ctx.freshName("sizeOfDistinctArray")
-  val getValue1 = CodeGenerator.getValue(array, elementType, i)
-  val getValue2 = CodeGenerator.getValue(array, elementType, j)
-  val foundNullElement = ctx.freshName("foundNullElement")
-  val openHashSet = classOf[OpenHashSet[_]].getName
-  val hs = ctx.freshName("hs")
-  val classTag = s"scala.reflect.ClassTag$$.MODULE$$.Object()"
-  if (elementTypeSupportEquals) {
+if (canUseSpecializedHashSet) {
+  nullSafeCodeGen(ctx, ev, (array) => {
+val i = ctx.freshName("i")
+val sizeOfDistinctArray = ctx.freshName("sizeOfDistinctArray")
+val foundNullElement = ctx.freshName("foundNullElement")
+val openHashSet = classOf[OpenHashSet[_]].getName
+val hs = ctx.freshName("hs")
+val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
+val getValue1 = CodeGenerator.getValue(array, elementType, i)
--- End diff --

nit: `val getValue`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22044#discussion_r208862917
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3410,6 +3410,28 @@ case class ArrayDistinct(child: Expression)
 case _ => false
   }
 
+  @transient protected lazy val canUseSpecializedHashSet = elementType 
match {
--- End diff --

We can do so. To minimize the changes due to remaining time for cutting, I 
would like to do this in another PR #21912.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22044#discussion_r208854615
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3517,56 +3510,24 @@ case class ArrayDistinct(child: Expression)
 """.stripMargin
   }
 
-  private def setNotNullValue(isPrimitive: Boolean,
+  private def setNotNullValue(
--- End diff --

We can inline this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22044#discussion_r208849675
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3410,6 +3410,28 @@ case class ArrayDistinct(child: Expression)
 case _ => false
   }
 
+  @transient protected lazy val canUseSpecializedHashSet = elementType 
match {
+case ByteType | ShortType | IntegerType | LongType | FloatType | 
DoubleType => true
+case _ => false
+  }
+
+  @transient protected lazy val (hsPostFix, hsTypeName) = {
+val ptName = CodeGenerator.primitiveTypeName (elementType)
--- End diff --

nit: extra space after `primitiveTypeName`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22044#discussion_r208855525
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3410,6 +3410,28 @@ case class ArrayDistinct(child: Expression)
 case _ => false
   }
 
+  @transient protected lazy val canUseSpecializedHashSet = elementType 
match {
--- End diff --

Can we extract those common methods?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22044#discussion_r208856510
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3517,56 +3510,24 @@ case class ArrayDistinct(child: Expression)
 """.stripMargin
   }
 
-  private def setNotNullValue(isPrimitive: Boolean,
+  private def setNotNullValue(
   distinctArray: String,
   pos: String,
   getValue1: String,
   primitiveValueTypeName: String): String = {
-if (!isPrimitive) {
-  s"$distinctArray[$pos] = $getValue1";
-} else {
-  s"$distinctArray.set$primitiveValueTypeName($pos, $getValue1)";
-}
+s"$distinctArray.set$primitiveValueTypeName($pos, $getValue1)"
   }
 
   private def setValueForFastEval(
--- End diff --

Maybe we should rename this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-08 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/22044

[SPARK-23912][SQL][Followup] Refactor ArrayDistinct

## What changes were proposed in this pull request?

This PR simplified code generation for `ArrayDistinct`. #21966 enabled code 
generation only if the type can be specialized by the hash set. This PR follows 
this strategy.

Optimization of null handling will be implemented in #21912.

## How was this patch tested?

Existing UTs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-23912-follow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22044.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22044


commit 9d0cb100a85dc77569e1a62b83e0461ee2c33ddb
Author: Kazuaki Ishizaki 
Date:   2018-08-08T18:43:51Z

stop code generation when elementTypeSupportEquals is false




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org