nvander1 commented on a change in pull request #24761: [SPARK-27905] [SQL] Add higher order function 'forall'
URL: https://github.com/apache/spark/pull/24761#discussion_r293119295
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
##########
@@ -379,6 +379,31 @@ case class ArrayFilter(
override def prettyName: String = "filter"
}
+trait ArrayExistsForAllBase
+extends ArrayBasedSimpleHigherOrderFunction with CodegenFallback {
+ def check(cond: Boolean): Boolean
+
+ override def dataType: DataType = BooleanType
+ override def functionType: AbstractDataType = BooleanType
+
+ @transient lazy val LambdaFunction(_, Seq(elementVar: NamedLambdaVariable), _) = function
+
+ override def nullSafeEval(inputRow: InternalRow, argumentValue: Any): Any = {
+ val arr = argumentValue.asInstanceOf[ArrayData]
+ val f = functionForEval
+ var continue = true
+ var i = 0
+ while (i < arr.numElements && continue) {
+ elementVar.value.set(arr.get(i, elementVar.dataType))
+ if (check(f.eval(inputRow).asInstanceOf[Boolean])) {
+ continue = !continue
+ }
+ i += 1
+ }
+ !check(continue)
+ }
Review comment:
@yeikel perhaps the following implementation would be clearer?
```scala
override def nullSafeEval(inputRow: InternalRow, argumentValue: Any): Any = {
  val arr = argumentValue.asInstanceOf[ArrayData]
  val f = functionForEval
  var res = emptyRes
  var i = 0
  while (!isConfirmed(res) && i < arr.numElements) {
    elementVar.value.set(arr.get(i, elementVar.dataType))
    res = f.eval(inputRow).asInstanceOf[Boolean]
    i += 1
  }
  res
}
```
Here `isConfirmed` indicates whether we can break out of the while loop early:

- For `ArrayExists`, we can break out as soon as we find an element that
  matches the predicate.
- For `ArrayForAll`, we can break out as soon as we find an element that
  does NOT match the predicate.

So for `ArrayExists`, we define `emptyRes` to be false, since an empty array
has no elements that could satisfy the predicate, and we define `isConfirmed`
to be just the result of the predicate on the most recent element.

For `ArrayForAll`, we define `emptyRes` to be true, since the predicate holds
vacuously for every element of an empty array, and we define `isConfirmed` to
be the negation of the result of the predicate on the most recent element.
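
To make the two definitions concrete, here is a minimal standalone sketch of the same idea over a plain `IndexedSeq` instead of Spark's `ArrayData`. Only `emptyRes` and `isConfirmed` come from the suggestion above; the trait name `ShortCircuitPredicate`, the method `evaluate`, and the objects `Exists` and `ForAll` are hypothetical names chosen for illustration, and null handling is ignored.

```scala
// Hypothetical stand-in for the shared base trait (not Spark code).
trait ShortCircuitPredicate {
  // Result for an empty input.
  def emptyRes: Boolean
  // True once the latest predicate result decides the overall answer.
  def isConfirmed(res: Boolean): Boolean

  def evaluate[A](xs: IndexedSeq[A])(p: A => Boolean): Boolean = {
    var res = emptyRes
    var i = 0
    while (!isConfirmed(res) && i < xs.length) {
      res = p(xs(i))
      i += 1
    }
    res
  }
}

// exists: false on empty input, decided by the first matching element.
object Exists extends ShortCircuitPredicate {
  val emptyRes = false
  def isConfirmed(res: Boolean): Boolean = res
}

// forall: true on empty input, decided by the first non-matching element.
object ForAll extends ShortCircuitPredicate {
  val emptyRes = true
  def isConfirmed(res: Boolean): Boolean = !res
}
```

With these definitions, `Exists.evaluate(Vector(1, 2, 3))(_ > 2)` stops at the third element, and `ForAll.evaluate(Vector(1, 2, 3))(_ < 2)` stops at the second.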
This early-exit loop is similar to the approach used by the Scala stdlib:
https://github.com/scala/scala/blob/v2.13.0/src/library/scala/collection/IterableOnce.scala#L587-L606
although the stdlib does not abstract the operation over forall and exists.
I'm all for keeping the code DRY, as @rxin's suggestion prompted, but if we
can't find a way to do so that is easy to understand, maybe we should just
have two similar implementations.
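
For comparison, a rough sketch of what the two-implementations option could look like, again over a plain `IndexedSeq` rather than `ArrayData`. The names `existsLoop` and `forAllLoop` are hypothetical, and this is neither Spark nor stdlib code; it only mirrors the general shape of a flag-guarded while loop.

```scala
// Hypothetical helper: true if any element satisfies p; stops at the first match.
def existsLoop[A](xs: IndexedSeq[A])(p: A => Boolean): Boolean = {
  var res = false
  var i = 0
  while (!res && i < xs.length) {
    res = p(xs(i))
    i += 1
  }
  res
}

// Hypothetical helper: true if every element satisfies p; stops at the first failure.
def forAllLoop[A](xs: IndexedSeq[A])(p: A => Boolean): Boolean = {
  var res = true
  var i = 0
  while (res && i < xs.length) {
    res = p(xs(i))
    i += 1
  }
  res
}
```

The duplication is limited to the loop skeleton, and each function can be read without knowing about the other, which is the trade-off against a shared base trait.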