[GitHub] [spark] peter-toth commented on pull request #40473: [SPARK-42851][SQL] Guard EquivalentExpressions.addExpr() with supportedExpression()

2023-03-18 Thread via GitHub


peter-toth commented on PR #40473:
URL: https://github.com/apache/spark/pull/40473#issuecomment-1474880345

   Hm, I think you are right @Kimahriman: `LambdaVariable` and 
`NamedLambdaVariable` are very different, and `NamedLambdaVariable` seems to be 
used only in `LambdaFunction`s. So https://github.com/apache/spark/pull/39046 
doesn't make sense, and it can actually prevent pulling out higher-order 
functions and so cause a performance regression. I think that PR should be 
reverted.
   
   But I feel that is orthogonal to the issue that we use 
`EquivalentExpressions` for different purposes in `PhysicalAggregation` (the 
only place where we use `.addExpr()`) and in executors (`.addExprTree()` for 
subexpression elimination).
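   
   To make the two usages concrete, here is a toy model of the difference. 
It is a sketch only and not Spark's implementation: `Node`, `Leaf`, `Op`, 
`ToyEquivalence` and `UsageDemo` are hypothetical names, and the real 
`addExprTree()` has more rules about what it skips.

```scala
// Toy model, NOT Spark's classes. All names here are hypothetical.
sealed trait Node
case class Leaf(name: String) extends Node
case class Op(fn: String, children: List[Node]) extends Node

class ToyEquivalence {
  private val counts =
    scala.collection.mutable.Map.empty[Node, Int].withDefaultValue(0)

  // Registers only the root: enough for deduplicating whole expressions.
  def addExpr(n: Node): Unit = counts(n) += 1

  // Registers every non-leaf subtree: what subexpression elimination needs.
  def addExprTree(n: Node): Unit = n match {
    case Leaf(_) => ()
    case Op(_, children) =>
      counts(n) += 1
      children.foreach(addExprTree)
  }

  // Expressions that were registered more than once.
  def repeated: Set[Node] = counts.collect { case (n, c) if c > 1 => n }.toSet
}

object UsageDemo {
  // (a + b) is shared by two otherwise different expressions.
  val shared: Node = Op("add", List(Leaf("a"), Leaf("b")))
  val e1: Node = Op("mul", List(shared, Leaf("c")))
  val e2: Node = Op("sub", List(shared, Leaf("d")))

  def rootsOnly: Set[Node] = {
    val t = new ToyEquivalence
    t.addExpr(e1); t.addExpr(e2)
    t.repeated
  }

  def wholeTrees: Set[Node] = {
    val t = new ToyEquivalence
    t.addExprTree(e1); t.addExprTree(e2)
    t.repeated
  }
}
```

   With root-only registration the shared `add` subtree is never noticed, 
because `e1` and `e2` differ as whole expressions. The tree walk finds it.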
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org





peter-toth commented on PR #40473:
URL: https://github.com/apache/spark/pull/40473#issuecomment-1474774246

   Thanks @rednaxelafx for the fix and for pinging me.
   I think you are right that `EquivalentExpressions.addExpr()` should be 
guarded by `supportedExpression()` if we guard `getExprState()`. But I'm not 
sure it is right that we don't deduplicate the `max(transform(array(id), x -> 
x))` in your example query.
   Probably the real issue here is that in `PhysicalAggregation` the class 
`EquivalentExpressions` is used simply for deduplicating whole expressions, 
while on executors we use it for common subexpression elimination. In the 
former case we don't need the `LambdaVariable` guard, but in the latter we 
do. So maybe we should add an argument to `EquivalentExpressions` to 
enable/disable the guards, and disable them in `PhysicalAggregation`?
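   
   A rough sketch of the idea, as a self-contained toy rather than a patch. 
Everything in it is an assumption for illustration: `Expr`, `Attr`, 
`LambdaVar`, `Call`, `ToyEquivalentExpressions` and the 
`guardLambdaVariables` argument are hypothetical names, and the lambda 
`x -> x` is modelled as a bare lambda variable to keep it short.

```scala
// Toy model, NOT Spark's classes. All names here are hypothetical.
sealed trait Expr
case class Attr(name: String) extends Expr
case class LambdaVar(name: String) extends Expr
case class Call(fn: String, args: List[Expr]) extends Expr

class ToyEquivalentExpressions(guardLambdaVariables: Boolean) {
  private val seen = scala.collection.mutable.Set.empty[Expr]

  // True if a lambda variable appears anywhere in the tree.
  private def containsLambdaVar(e: Expr): Boolean = e match {
    case LambdaVar(_)  => true
    case Attr(_)       => false
    case Call(_, args) => args.exists(containsLambdaVar)
  }

  // With the guard disabled, every expression is accepted.
  private def supported(e: Expr): Boolean =
    !guardLambdaVariables || !containsLambdaVar(e)

  // Returns true if an equal expression was already registered.
  // Unsupported expressions are never registered, so never deduplicated.
  def addExpr(e: Expr): Boolean =
    if (!supported(e)) false else !seen.add(e)
}

object GuardFlagDemo {
  // Stand-in for max(transform(array(id), x -> x)).
  val agg: Expr = Call("max", List(
    Call("transform", List(Call("array", List(Attr("id"))), LambdaVar("x")))))

  // Adds the same aggregate twice and reports whether the second
  // call recognised it as a duplicate.
  def secondAddIsDuplicate(guard: Boolean): Boolean = {
    val ee = new ToyEquivalentExpressions(guard)
    ee.addExpr(agg)
    ee.addExpr(agg)
  }
}
```

   In this toy, the aggregate is deduplicated when the guard is off (the 
`PhysicalAggregation` case) and is left alone when the guard is on (the 
executor-side case).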

