holdenk commented on PR #46143:
URL: https://github.com/apache/spark/pull/46143#issuecomment-3712080976

   > That's why my initial suggestion was to not do this optimization at all. We just keep the `Filter` above the `Project`. By doing so we avoid the expensive expression duplication caused by filter pushdown, but all expressions in `Project` now need to be evaluated against the full input. I'm not sure how serious this issue is, and I was just trying to help simplify the algorithm given you are doing this optimization. I'm more than happy if you agree to drop this optimization and simplify the code.
   
   So just always leave complex filters above the `Project` and don't attempt to split them? I think that's sub-optimal for fairly self-evident reasons, *but* if you still find the current implementation too complex I could move it into a follow-on PR so there's less to review here and we *just* fix the perf regression introduced in 3.0.
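
   The duplication being discussed can be sketched with a toy rewrite. This is plain Python standing in for Catalyst expression trees, not Spark code; the `expensive` function, the alias `x`, and the condition are hypothetical. It shows that pushing a filter condition below a project requires substituting each aliased expression into the condition, so an alias referenced N times is duplicated N times:

   ```python
   # Expressions as nested tuples: ("col", name), ("lit", v), ("and", e1, e2), ...

   def substitute(expr, aliases):
       """Rewrite column references to the expressions that define them
       (what pushing a Filter below a Project must do)."""
       if expr[0] == "col":
           return aliases.get(expr[1], expr)
       return (expr[0],) + tuple(
           substitute(e, aliases) if isinstance(e, tuple) else e
           for e in expr[1:]
       )

   def count_op(expr, op):
       """Count occurrences of an operator in an expression tree."""
       return (expr[0] == op) + sum(
           count_op(e, op) for e in expr[1:] if isinstance(e, tuple)
       )

   # Project defines one expensive alias: x = expensive(a)
   aliases = {"x": ("expensive", ("col", "a"))}

   # Filter condition above the Project references x twice: x > 0 AND x < 10
   cond = ("and",
           ("gt", ("col", "x"), ("lit", 0)),
           ("lt", ("col", "x"), ("lit", 10)))

   pushed = substitute(cond, aliases)
   print(count_op(cond, "expensive"))    # 0 -- evaluated once, via the Project
   print(count_op(pushed, "expensive"))  # 2 -- duplicated by pushdown
   ```

   Splitting the condition, as the current implementation does, lets the cheap conjuncts be pushed while the expensive-alias conjuncts stay above the `Project`, which is the middle ground between the two extremes quoted above.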


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

