rednaxelafx commented on a change in pull request #25666: [SPARK-28962][SQL] 
Provide index argument to filter lambda functions
URL: https://github.com/apache/spark/pull/25666#discussion_r330699800
 
 

 ##########
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
 ##########
 @@ -369,6 +383,9 @@ case class ArrayFilter(
     var i = 0
     while (i < arr.numElements) {
       elementVar.value.set(arr.get(i, elementVar.dataType))
+      if (indexVar.isDefined) {
 
 Review comment:
   +1 for this is ready to go for now and we can address the optimization 
separately.
   
   Side-comment on the version that @maropu gave:
   The version that @henrydavidge gave would technically have less overhead:
   ```
   // lambda invocation overhead outside of loop
   for each element in array
     do specialized filter action
   ```
   whereas the version that @maropu gave would be:
   ```
   // shared loop between the two versions
   for each element in array
     // lambda invocation overhead per element
     invoke mayFillIndex lambda
   ```
   
   With @maropu 's version, let's assume that we're running on the HotSpot JVM 
and both the with-index and without-index paths have been used, then the best 
HotSpot JIT could have done is a profile-guided bimorphic devirtualization on 
that lambda call site, which will look like the following after 
devirtualization+inlining:
   ```
   local_mayFillIndex = this.mayFillIndex
   for each element in array
     klazz = local_mayFillIndex.klass
     if (klazz == lambda_klass_1) {
       // no-op
     } else if (klazz == lambda_klass_2) {
       idxVar.value.set(i)
     }
   }
   ```
   The point is that this JIT-optimized version is actually a degenerated 
version of Henry's hand-written inline per-row check version, so I wouldn't 
want to go down this route.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to