zhengruifeng opened a new pull request, #48391: URL: https://github.com/apache/spark/pull/48391
### What changes were proposed in this pull request?

1. Introduce a helper class `Lazy` to replace the lazy vals.
2. Fix a deadlock in subquery execution.

### Why are the changes needed?

We observed a deadlock between `QueryPlan.canonicalized` and `QueryPlan.references`.

The main thread, in `TakeOrderedAndProject.doExecute`, is computing `outputOrdering`. It traverses the plan tree top-down and has to acquire the lock of `QueryPlan.canonicalized` on each node along the path. In this deadlock, it has obtained the lock of `WholeStageCodegenExec` and is waiting for the lock of `HashAggregateExec`.

Concurrently, a subquery execution thread is performing code generation. It traverses the tree bottom-up via `def consume`, which checks `WholeStageCodegenExec.usedInputs` and references the lazy val `QueryPlan.references`. It has to acquire the lock of `QueryPlan.references` on each node along the path. In this deadlock, it has obtained the lock of `HashAggregateExec` and is waiting for the lock of `WholeStageCodegenExec`.

The root cause is that a Scala lazy val internally calls `this.synchronized` on the instance that contains the val. As a result, `canonicalized` and `references` on the same node contend for the same monitor, so two threads that walk the tree in opposite directions can each hold one node's lock while waiting for the other's. This creates the potential for deadlocks.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No.
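The description attributes the deadlock to `lazy val` locking on `this` and introduces a `Lazy` helper class, but it does not show that class. The following Scala sketch shows one way such a helper could work, assuming the goal is to give each lazily computed field its own private lock. The body of `Lazy`, its `apply()` accessor, and the `Node` class are hypothetical illustrations. They are not the code in the PR.

```scala
// Hypothetical sketch (not the PR's code): a lazily computed value guarded by
// its own private lock instead of the monitor of the enclosing object, which
// is what a Scala 2 `lazy val` synchronizes on.
final class Lazy[T](initializer: => T) {
  // One lock per Lazy instance, so two lazily computed fields of the same
  // object never contend for the same monitor.
  private val lock = new Object

  // Holds Some(value) once computed. @volatile makes the fully built value
  // visible to threads that take the lock-free fast path.
  @volatile private var cached: Option[T] = None

  def apply(): T = {
    val snapshot = cached
    if (snapshot.isDefined) {
      snapshot.get // fast path: already initialized, no locking
    } else {
      lock.synchronized {
        // Double-checked: another thread may have finished while we waited.
        if (cached.isEmpty) {
          // If the initializer throws, `cached` stays None and a later call
          // retries, similar to a failed lazy val initializer.
          cached = Some(initializer)
        }
        cached.get
      }
    }
  }
}

// Hypothetical stand-in for a plan node, with two independently locked lazy
// fields loosely modeled on `canonicalized` and `references`.
final class Node(val name: String) {
  // Counts how often the `canonicalized` initializer actually runs.
  var evaluations: Int = 0

  val canonicalized: Lazy[String] = new Lazy({
    evaluations += 1
    s"canonical($name)"
  })

  val references: Lazy[Set[String]] = new Lazy(Set(name))
}

val node = new Node("agg")
println(node.canonicalized()) // computes and caches the value
println(node.canonicalized()) // returns the cached value without recomputing
println(node.evaluations)
```

With one lock per field, the two threads in the scenario above would take four distinct locks: the main thread holds the `canonicalized` lock of `WholeStageCodegenExec` and waits for the `canonicalized` lock of `HashAggregateExec`, while the subquery thread holds the `references` lock of `HashAggregateExec` and waits for the `references` lock of `WholeStageCodegenExec`. Neither waits on a lock the other holds, so that particular cycle cannot form. This sketch does not rule out deadlocks in general, for example if one field's initializer on a node depends on that same field of another node in conflicting orders.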
