rednaxelafx commented on a change in pull request #25666: [SPARK-28962][SQL] 
Provide index argument to filter lambda functions
URL: https://github.com/apache/spark/pull/25666#discussion_r330699800
 
 

 ##########
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
 ##########
 @@ -369,6 +383,9 @@ case class ArrayFilter(
     var i = 0
     while (i < arr.numElements) {
       elementVar.value.set(arr.get(i, elementVar.dataType))
+      if (indexVar.isDefined) {
 
 Review comment:
   +1 for this is ready to go for now and we can address the optimization 
separately.
   
   Side-comment on the version that @maropu gave:
   The lambda version that @henrydavidge gave (i.e. "batch-wise lambda") would 
technically have less overhead:
   ```
   // lambda invocation overhead outside of loop
   for each element in array
     do specialized filter action
   ```
   whereas the version that @maropu gave (i.e. "element-wise lambda") would be:
   ```
   // shared loop between the two versions
   for each element in array
     // lambda invocation overhead per element
     invoke mayFillIndex lambda
   ```
   
   With @maropu 's version, let's assume that we're running on the HotSpot JVM 
and both the with-index and without-index paths have been used, then the best 
the HotSpot JIT compiler could have done is a profile-guided bimorphic 
devirtualization on that lambda call site, which will look like the following 
after devirtualization+inlining:
   ```
   local_mayFillIndex = this.mayFillIndex
   klazz = local_mayFillIndex.klass
   for each element in array
     // ...
     if (klazz == lambda_klass_1) {
       // no-op
     } else if (klazz == lambda_klass_2) {
       idxVar.value.set(i)
     } else {
       uncommon_trap() // aka deoptimize, or potentially a full virtual call
     }
   }
   ```
   The point is that this JIT-optimized version is actually a degenerated 
version of Henry's hand-written inline per-element check version, so I wouldn't 
want to go down this route.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to