wangyum opened a new pull request, #48907: URL: https://github.com/apache/spark/pull/48907
### What changes were proposed in this pull request?

The root cause of this issue is that the planner turns `Limit` + `Sort` into `TakeOrderedAndProjectExec`, which adds an additional `Project` that does not exist in the logical plan. We shouldn't use this additional `Project` to optimize out other `Project`s. Otherwise, when AQE turns the physical plan back into a logical plan, we lose the `Project` and may mess up the output column order.

This PR stops removing redundant projects when AQE is enabled and `projectList` is the same as the child output in `TakeOrderedAndProjectExec`.

### Why are the changes needed?

Fix a potential data issue and avoid a Spark Driver crash:

```
# more hs_err_pid193136.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f9d14841bc0, pid=193136, tid=223205
#
# JRE version: OpenJDK Runtime Environment Zulu17.36+18-SA (17.0.4.1+1) (build 17.0.4.1+1-LTS)
# Java VM: OpenJDK 64-Bit Server VM Zulu17.36+18-SA (17.0.4.1+1-LTS, mixed mode, sharing, tiered, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# v ~StubRoutines::jint_disjoint_arraycopy_avx3
#
# Core dump will be written. Default location: /apache/spark-release/3.5.0-20241105/spark/core.193136
...
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

### Was this patch authored or co-authored using generative AI tooling?

No.
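The root cause described under "What changes were proposed" can be illustrated with a toy model. This is a minimal sketch in plain Python, not Spark code: `Node`, `plan_take_ordered`, `remove_redundant_project`, and `replan` are hypothetical names. It rests on a simplifying assumption that re-planning under AQE rebuilds the implicit project list from whatever child is left in the plan.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Toy plan node: a name, its output column order, and an optional child."""
    name: str
    output: List[str]
    child: Optional["Node"] = None


def plan_take_ordered(child: Node) -> Node:
    # Toy planner: Limit + Sort collapses into one node whose implicit
    # project list is copied from the child's output (assumption).
    return Node("TakeOrderedAndProject", list(child.output), child)


def remove_redundant_project(top: Node, aqe_enabled: bool, apply_fix: bool) -> Node:
    child = top.child
    if child is None or child.name != "Project" or child.child is None:
        return top
    # The child Project only reorders columns; it adds or drops none.
    same_columns = sorted(child.output) == sorted(child.child.output)
    # The top node's project list is just the child's output, i.e. it is
    # the implicit Project that has no counterpart in the logical plan.
    implicit_only = top.output == child.output
    if apply_fix and aqe_enabled and implicit_only:
        return top  # keep the explicit Project, as the PR description proposes
    if same_columns:
        return Node(top.name, top.output, child.child)  # drop the Project
    return top


def replan(top: Node) -> Node:
    # Toy stand-in for AQE re-planning: rebuild the top node from its child.
    return plan_take_ordered(top.child)


scan = Node("Scan", ["a", "b"])
project = Node("Project", ["b", "a"], scan)
planned = plan_take_ordered(project)

buggy = replan(remove_redundant_project(planned, aqe_enabled=True, apply_fix=False))
fixed = replan(remove_redundant_project(planned, aqe_enabled=True, apply_fix=True))

print(buggy.output)  # ['a', 'b'] -> column order changed
print(fixed.output)  # ['b', 'a'] -> column order preserved
```

In this model the explicit `Project` is the only place the `b, a` ordering is recorded, so removing it on the strength of the implicit project list leaves nothing to restore that ordering once the plan is rebuilt.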
