nvander1 commented on a change in pull request #24761: [SPARK-27905] [SQL] Add higher order function 'forall'
URL: https://github.com/apache/spark/pull/24761#discussion_r293119295
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
 ##########
 @@ -379,6 +379,31 @@ case class ArrayFilter(
   override def prettyName: String = "filter"
 }
 
+trait ArrayExistsForAllBase extends ArrayBasedSimpleHigherOrderFunction with CodegenFallback {
+  def check(cond: Boolean): Boolean
+
+  override def dataType: DataType = BooleanType
+  override def functionType: AbstractDataType = BooleanType
+
+  @transient lazy val LambdaFunction(_, Seq(elementVar: NamedLambdaVariable), _) = function
+
+  override def nullSafeEval(inputRow: InternalRow, argumentValue: Any): Any = {
+    val arr = argumentValue.asInstanceOf[ArrayData]
+    val f = functionForEval
+    var continue = true
+    var i = 0
+    while (i < arr.numElements && continue) {
+      elementVar.value.set(arr.get(i, elementVar.dataType))
+      if (check(f.eval(inputRow).asInstanceOf[Boolean])) {
+        continue = !continue
+      }
+      i += 1
+    }
+    !check(continue)
+  }
 
 Review comment:
   @yeikel, perhaps the following implementation would be clearer?
   ```scala
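     // emptyRes and isConfirmed are abstract members of the base trait,
     // defined separately by ArrayExists and ArrayForAll.
     override def nullSafeEval(inputRow: InternalRow, argumentValue: Any): Any = {
       val arr = argumentValue.asInstanceOf[ArrayData]
       val f = functionForEval
       // Start from the answer for an empty array.
       var res = emptyRes
       var i = 0
       // Stop as soon as the latest predicate result decides the answer.
       while (!isConfirmed(res) && i < arr.numElements) {
         elementVar.value.set(arr.get(i, elementVar.dataType))
         res = f.eval(inputRow).asInstanceOf[Boolean]
         i += 1
       }
       res
     }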
   ```
   
   Here `isConfirmed` indicates whether we can break out of the while loop early:
   - For `ArrayExists`, we can break out as soon as we find an element that matches the predicate.
   - For `ArrayForAll`, we can break out as soon as we find an element that does NOT match the predicate.
   
   So for `ArrayExists`, we define `emptyRes` as false, since an empty array has no element that could satisfy the predicate, and we define `isConfirmed(res)` as simply `res`, the result of the predicate on the most recent element.
   
   For `ArrayForAll`, we define `emptyRes` as true, since the predicate holds vacuously for every element of an empty array, and we define `isConfirmed(res)` as `!res`, the negation of the result of the predicate on the most recent element.
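   To make these semantics concrete, here is a minimal plain-Scala model of the proposal. It assumes `IndexedSeq[A]` stands in for `ArrayData` and an ordinary function stands in for the bound lambda; `ShortCircuitCheck`, `ExistsModel` and `ForAllModel` are hypothetical names for illustration, not Spark classes.

   ```scala
   // Hypothetical model: IndexedSeq[A] stands in for Spark's ArrayData and a
   // plain Scala function stands in for the bound lambda.
   sealed trait ShortCircuitCheck {
     // Result when the array is empty, i.e. when no predicate is evaluated.
     def emptyRes: Boolean
     // True once `res` already decides the final answer, so the loop can stop.
     def isConfirmed(res: Boolean): Boolean

     def eval[A](arr: IndexedSeq[A])(p: A => Boolean): Boolean = {
       var res = emptyRes
       var i = 0
       while (!isConfirmed(res) && i < arr.length) {
         res = p(arr(i))
         i += 1
       }
       res
     }
   }

   // exists: no element of an empty array matches; stop at the first match.
   object ExistsModel extends ShortCircuitCheck {
     val emptyRes: Boolean = false
     def isConfirmed(res: Boolean): Boolean = res
   }

   // forall: vacuously true for an empty array; stop at the first non-match.
   object ForAllModel extends ShortCircuitCheck {
     val emptyRes: Boolean = true
     def isConfirmed(res: Boolean): Boolean = !res
   }
   ```

   For example, `ForAllModel.eval(Vector(2, 3, 4, 6))(_ % 2 == 0)` returns false after evaluating the predicate only twice, because `isConfirmed` becomes true at `3`; on `Vector.empty[Int]`, `ExistsModel` returns false and `ForAllModel` returns true without evaluating the predicate at all.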
   
   This is similar to the approach used by the Scala standard library:
https://github.com/scala/scala/blob/v2.13.0/src/library/scala/collection/IterableOnce.scala#L587-L606
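   For comparison, a sketch of the two-separate-loops alternative, written in the style of the linked standard-library code (paraphrased, not copied; `SeparateLoops` is a hypothetical name). Each loop hard-codes its own initial value and stopping condition instead of going through `emptyRes`/`isConfirmed`.

   ```scala
   // Hypothetical helper object: two deliberately similar loops over an Iterator.
   object SeparateLoops {
     def exists[A](it: Iterator[A])(p: A => Boolean): Boolean = {
       var res = false // empty input: no element satisfies p
       // Keep going only while no match has been found.
       while (!res && it.hasNext) res = p(it.next())
       res
     }

     def forall[A](it: Iterator[A])(p: A => Boolean): Boolean = {
       var res = true // empty input: p holds vacuously
       // Keep going only while every element so far has matched.
       while (res && it.hasNext) res = p(it.next())
       res
     }
   }
   ```

   The cost of this style is a few duplicated lines; the benefit is that each loop can be read on its own without tracing an abstraction.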
   
   The standard library does not, however, abstract the shared loop over `forall` and `exists`. I'm all for keeping the code DRY, as @rxin's suggestion prompted, but if we can't find an easy-to-understand way to do so, maybe we should just have two similar implementations.
   
   Here is a commit on my branch with the changes described above; I can merge it into this one if needed:
   
https://github.com/nvander1/spark/commit/aa5c94f5fb5ce9d677a65af7184c35752d2ca491

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
