[PR] [SPARK-50258][SQL][3.4] Fix output column order changed issue after AQE optimization [spark]

via GitHub Wed, 20 Nov 2024 06:06:46 -0800


wangyum opened a new pull request, #48907:
URL: https://github.com/apache/spark/pull/48907


   ### What changes were proposed in this pull request?
   
   The root cause of this issue is that the planner turns `Limit` + `Sort` into 
`TakeOrderedAndProjectExec`, which adds an additional `Project` that does not 
exist in the logical plan. We shouldn't use this additional `Project` to 
optimize out other `Project`s; otherwise, when AQE turns the physical plan back 
into a logical plan, we lose the `Project` and may change the output column order.
   
   This PR stops removing redundant projects when AQE is enabled and the 
`projectList` of `TakeOrderedAndProjectExec` is the same as its child's output.
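
The guard described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's actual code: the classes `Scan`, `TakeOrderedAndProject`, and `Project`, the function `is_redundant`, and the `aqe_enabled` flag are all hypothetical stand-ins, and column lists are plain strings rather than Catalyst attributes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Scan:
    """Hypothetical leaf node; `output` is its column order."""
    output: List[str]


@dataclass
class TakeOrderedAndProject:
    """Toy stand-in for TakeOrderedAndProjectExec (not the real class)."""
    project_list: List[str]
    child: Scan

    @property
    def output(self) -> List[str]:
        # The node emits exactly the columns in its project list.
        return self.project_list


@dataclass
class Project:
    """Toy stand-in for a physical Project node."""
    project_list: List[str]
    child: Union[Scan, TakeOrderedAndProject]


def is_redundant(project: Project, aqe_enabled: bool) -> bool:
    """Return True if the toy rule would drop `project`."""
    child = project.child
    # A Project that changes the column list is never redundant.
    if project.project_list != child.output:
        return False
    # Assumed guard from the PR description: with AQE on, keep the Project
    # when the child's project list merely mirrors its own child's output,
    # i.e. the inner projection has no counterpart in the logical plan.
    if (aqe_enabled
            and isinstance(child, TakeOrderedAndProject)
            and child.project_list == child.child.output):
        return False
    return True
```

In this model, a `Project(["a", "b"], ...)` sitting on top of `TakeOrderedAndProject(["a", "b"], Scan(["a", "b"]))` is dropped when `aqe_enabled` is false and kept when it is true, so the column order it pins down survives a round trip through the logical plan.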
   
   ### Why are the changes needed?
   
   Fixes a potential data issue and avoids a Spark driver crash:
   ```
   # more hs_err_pid193136.log
   #
   # A fatal error has been detected by the Java Runtime Environment:
   #
   #  SIGSEGV (0xb) at pc=0x00007f9d14841bc0, pid=193136, tid=223205
   #
   # JRE version: OpenJDK Runtime Environment Zulu17.36+18-SA (17.0.4.1+1) (build 17.0.4.1+1-LTS)
   # Java VM: OpenJDK 64-Bit Server VM Zulu17.36+18-SA (17.0.4.1+1-LTS, mixed mode, sharing, tiered, compressed class ptrs, g1 gc, linux-amd64)
   # Problematic frame:
   # v  ~StubRoutines::jint_disjoint_arraycopy_avx3
   #
   # Core dump will be written. Default location: /apache/spark-release/3.5.0-20241105/spark/core.193136
   ...
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


