zhengruifeng opened a new pull request, #48279:
URL: https://github.com/apache/spark/pull/48279
### What changes were proposed in this pull request?
Fix a deadlock in subquery execution due to lazy vals
### Why are the changes needed?
We observed a deadlock between `QueryPlan.canonicalized` and
`QueryPlan.references`:
```
24/09/04 04:46:54 ERROR DeadlockDetector: Found 2 new deadlock thread(s):
"ScalaTest-run-running-SubquerySuite" prio=5 Id=1 BLOCKED on org.apache.spark.sql.execution.aggregate.HashAggregateExec@87abc7f owned by "subquery-5" Id=112
    at app//org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:684)
    - blocked on org.apache.spark.sql.execution.aggregate.HashAggregateExec@87abc7f
    at app//org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:684)
    at app//org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$2(QueryPlan.scala:716)
    at app//org.apache.spark.sql.catalyst.plans.QueryPlan$$Lambda$4058/0x00007f740f3d0cb0.apply(Unknown Source)
    at app//org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314)
    at app//org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1313)
    at app//org.apache.spark.sql.execution.WholeStageCodegenExec.mapChildren(WholeStageCodegenExec.scala:639)
    at app//org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:716)
    ...
"subquery-5" daemon prio=5 Id=112 BLOCKED on org.apache.spark.sql.execution.WholeStageCodegenExec@132a3243 owned by "ScalaTest-run-running-SubquerySuite" Id=1
    at app//org.apache.spark.sql.catalyst.plans.QueryPlan.references$lzycompute(QueryPlan.scala:101)
    - blocked on org.apache.spark.sql.execution.WholeStageCodegenExec@132a3243
    at app//org.apache.spark.sql.catalyst.plans.QueryPlan.references(QueryPlan.scala:101)
    at app//org.apache.spark.sql.execution.CodegenSupport.usedInputs(WholeStageCodegenExec.scala:325)
    at app//org.apache.spark.sql.execution.CodegenSupport.usedInputs$(WholeStageCodegenExec.scala:325)
    at app//org.apache.spark.sql.execution.WholeStageCodegenExec.usedInputs(WholeStageCodegenExec.scala:639)
    at app//org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:187)
    at app//org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:157)
    at app//org.apache.spark.sql.execution.aggregate.HashAggregateExec.consume(HashAggregateExec.scala:53)
```
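This happens because a Scala `lazy val` internally calls `this.synchronized` on
the instance that contains the val, which creates a potential for deadlocks.
In the trace above, the test thread holds the `WholeStageCodegenExec` monitor
(inside `canonicalized`) and waits for the `HashAggregateExec` monitor, while
`subquery-5` holds the `HashAggregateExec` monitor and waits for the
`WholeStageCodegenExec` monitor (inside `references`).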
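
To illustrate the locking behaviour described above, here is a minimal sketch. It is not the change made in this PR. `Node` and `BestEffortLazy` are hypothetical names used only for this example. It assumes Scala 2.12 or 2.13, where a `lazy val` initializer is expected to run while holding the enclosing instance's monitor; other Scala versions may encode lazy vals differently.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical node type, for illustration only; not a Spark class.
final class Node(val name: String) {
  // Records whether the initializer ran while holding this instance's monitor.
  @volatile var initHeldMonitor: Boolean = false

  // Assumption: on Scala 2.12/2.13 this body runs inside this.synchronized.
  lazy val expensive: String = {
    initHeldMonitor = Thread.holdsLock(this)
    name.toUpperCase
  }
}

// Hypothetical alternative that never takes a monitor: racing threads may each
// run `compute`, but only the first published result is kept.
final class BestEffortLazy[T <: AnyRef](compute: () => T) {
  private val ref = new AtomicReference[T]()

  def get: T = {
    val existing = ref.get()
    if (existing != null) existing
    else {
      val fresh = compute()
      // Publish our result unless another thread already published one.
      if (ref.compareAndSet(null.asInstanceOf[T], fresh)) fresh else ref.get()
    }
  }
}

object LazyValLockDemo {
  def main(args: Array[String]): Unit = {
    val node = new Node("agg")
    node.expensive
    println(s"initializer held the node's monitor: ${node.initHeldMonitor}")

    val cell = new BestEffortLazy[String](() => "agg".toUpperCase)
    println(cell.get)
  }
}
```

The trade-off in the second class is that `compute` may run more than once under contention, so it only suits initializers that are idempotent and free of side effects.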
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
Manually checked with `com.databricks.spark.sql.SubquerySuite`.
We hit this issue multiple times in `SubquerySuite` before this fix;
after the fix, it did not recur across multiple runs.
### Was this patch authored or co-authored using generative AI tooling?
no