rednaxelafx commented on a change in pull request #25666: [SPARK-28962][SQL]
Provide index argument to filter lambda functions
URL: https://github.com/apache/spark/pull/25666#discussion_r330699800
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
##########
@@ -369,6 +383,9 @@ case class ArrayFilter(
var i = 0
while (i < arr.numElements) {
elementVar.value.set(arr.get(i, elementVar.dataType))
+ if (indexVar.isDefined) {
Review comment:
+1 for this is ready to go for now and we can address the optimization
separately.
Side-comment on the version that @maropu gave:
The version that @henrydavidge gave would technically have less overhead:
```
// lambda invocation overhead outside of loop
for each element in array
do specialized filter action
```
whereas the version that @maropu gave would be:
```
// shared loop between the two versions
for each element in array
// lambda invocation overhead per element
invoke mayFillIndex lambda
```
With @maropu 's version, let's assume that we're running on the HotSpot JVM
and both the with-index and without-index paths have been used, then the best
HotSpot JIT could have done is a profile-guided bimorphic devirtualization on
that lambda call site, which will look like the following after
devirtualization+inlining:
```
local_mayFillIndex = this.mayFillIndex
klazz = local_mayFillIndex.klass
for each element in array
if (klazz == lambda_klass_1) {
// no-op
} else if (klazz == lambda_klass_2) {
idxVar.value.set(i)
}
}
```
The point is that this JIT-optimized version is actually a degenerated
version of Henry's hand-written inline per-row check version, so I wouldn't
want to go down this route.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]