[GitHub] [spark] dilipbiswal commented on a change in pull request #24073: [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array

GitBox Wed, 13 Mar 2019 13:59:56 -0700

dilipbiswal commented on a change in pull request #24073: [SPARK-27134][SQL] 
array_distinct function does not work correctly with columns containing array 
of array
URL: https://github.com/apache/spark/pull/24073#discussion_r265329064


 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ##########
 @@ -3112,29 +3112,30 @@ case class ArrayDistinct(child: Expression)
     (data: Array[AnyRef]) => new GenericArrayData(data.distinct.asInstanceOf[Array[Any]])
   } else {
     (data: Array[AnyRef]) => {
-      var foundNullElement = false
-      var pos = 0
+      val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
 
 Review comment:
   @srowen Sorry, I'm a little confused. The input is an Array[AnyRef]. If we 
declare the temporary buffer as an ArrayBuffer[Array[AnyRef]], how do we 
populate its contents?
   Examples:
   Input1 : Array[Integer] => Seq(1, 2, , 1)
   In this case our output is : ArrayBuffer[Int] = Array(1, 2)
   Input2 : Array[Array[Integer]] => Seq(Seq(1, 2), Seq(3, 4), Seq(3, 4))
   In this case our output is : ArrayBuffer[Array[Int]] => Array(Array(1, 2), 
Array(3, 4))
   Input3 : Array[Struct] => Seq(struct(...), struct(...))
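
   To make the question concrete, here is a minimal, self-contained sketch of 
why a buffer typed as ArrayBuffer[Any] can hold all three kinds of element, 
while a buffer typed as ArrayBuffer[Array[AnyRef]] could only hold the nested 
case. It is not the code in this pull request: the names DistinctSketch, 
sameElem and distinctAny are hypothetical, and plain JVM arrays stand in for 
Spark's internal array representation. The element-wise comparison is also an 
assumption of the sketch, needed because JVM arrays compare by reference.

```scala
import scala.collection.mutable.ArrayBuffer

object DistinctSketch {
  // Hypothetical equality for this sketch: nested arrays are compared
  // element by element, everything else falls back to ==.
  def sameElem(a: Any, b: Any): Boolean = (a, b) match {
    case (x: Array[_], y: Array[_]) =>
      x.length == y.length && x.indices.forall(i => sameElem(x(i), y(i)))
    case _ => a == b
  }

  // The buffer is typed as Any, so it accepts boxed integers, nested
  // arrays, or any other element type without knowing it up front.
  def distinctAny(data: Array[AnyRef]): Array[Any] = {
    val buf = new ArrayBuffer[Any]
    data.foreach { elem =>
      // Keep an element only if no equal element was kept before it.
      if (!buf.exists(sameElem(_, elem))) buf += elem
    }
    buf.toArray
  }
}
```

   With this sketch, 
DistinctSketch.distinctAny(Array[AnyRef](Array(1, 2), Array(3, 4), Array(3, 4))) 
returns two elements, whereas calling distinct directly on the same input 
keeps all three because each JVM array is a different object.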
   
   Thanks for your help. 

