[PR] [SPARK-49808][SQL] Fix a deadlock in subquery execution due to lazy vals [spark]

via GitHub Fri, 27 Sep 2024 01:22:26 -0700


zhengruifeng opened a new pull request, #48279:
URL: https://github.com/apache/spark/pull/48279


   ### What changes were proposed in this pull request?
   Fix a deadlock in subquery execution due to lazy vals
   
   
   ### Why are the changes needed?
   We observed a deadlock between `QueryPlan.canonicalized` and `QueryPlan.references`:
   ```
   24/09/04 04:46:54 ERROR DeadlockDetector: Found 2 new deadlock thread(s):
   "ScalaTest-run-running-SubquerySuite" prio=5 Id=1 BLOCKED on org.apache.spark.sql.execution.aggregate.HashAggregateExec@87abc7f owned by "subquery-5" Id=112
        at app//org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized$lzycompute(QueryPlan.scala:684)
        -  blocked on org.apache.spark.sql.execution.aggregate.HashAggregateExec@87abc7f
        at app//org.apache.spark.sql.catalyst.plans.QueryPlan.canonicalized(QueryPlan.scala:684)
        at app//org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$doCanonicalize$2(QueryPlan.scala:716)
        at app//org.apache.spark.sql.catalyst.plans.QueryPlan$$Lambda$4058/0x00007f740f3d0cb0.apply(Unknown Source)
        at app//org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314)
        at app//org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1313)
        at app//org.apache.spark.sql.execution.WholeStageCodegenExec.mapChildren(WholeStageCodegenExec.scala:639)
        at app//org.apache.spark.sql.catalyst.plans.QueryPlan.doCanonicalize(QueryPlan.scala:716)
        ...

   "subquery-5" daemon prio=5 Id=112 BLOCKED on org.apache.spark.sql.execution.WholeStageCodegenExec@132a3243 owned by "ScalaTest-run-running-SubquerySuite" Id=1
        at app//org.apache.spark.sql.catalyst.plans.QueryPlan.references$lzycompute(QueryPlan.scala:101)
        -  blocked on org.apache.spark.sql.execution.WholeStageCodegenExec@132a3243
        at app//org.apache.spark.sql.catalyst.plans.QueryPlan.references(QueryPlan.scala:101)
        at app//org.apache.spark.sql.execution.CodegenSupport.usedInputs(WholeStageCodegenExec.scala:325)
        at app//org.apache.spark.sql.execution.CodegenSupport.usedInputs$(WholeStageCodegenExec.scala:325)
        at app//org.apache.spark.sql.execution.WholeStageCodegenExec.usedInputs(WholeStageCodegenExec.scala:639)
        at app//org.apache.spark.sql.execution.CodegenSupport.consume(WholeStageCodegenExec.scala:187)
        at app//org.apache.spark.sql.execution.CodegenSupport.consume$(WholeStageCodegenExec.scala:157)
        at app//org.apache.spark.sql.execution.aggregate.HashAggregateExec.consume(HashAggregateExec.scala:53)
   ```
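   
   To make the cycle concrete, here is a minimal Scala sketch of the same lock-order inversion. It assumes Scala 2.12+ (for the `Runnable` lambda) and uses plain `synchronized` blocks on two objects. The objects `wsc` and `agg` are hypothetical stand-ins for the `WholeStageCodegenExec` and `HashAggregateExec` monitors in the trace. The sketch confirms the cycle with the JDK's `ThreadMXBean.findMonitorDeadlockedThreads`. It is an illustration only, not Spark's `DeadlockDetector`.
   
   ```scala
   import java.lang.management.ManagementFactory
   import java.util.concurrent.CountDownLatch
   
   // Lock-order inversion modeled on the trace above. `wsc` and `agg` are
   // hypothetical stand-ins for the WholeStageCodegenExec and HashAggregateExec
   // instances whose monitors the two threads fight over.
   object LockInversionSketch {
     def deadlockedThreadNames(timeoutMs: Long = 5000): Set[String] = {
       val wsc = new Object
       val agg = new Object
       val bothHoldFirstLock = new CountDownLatch(2)
   
       // Take `first`, wait until the other thread also holds its first monitor,
       // then request `second`. With opposite orders this always deadlocks.
       def lockInOrder(first: AnyRef, second: AnyRef): Runnable = () =>
         first.synchronized {
           bothHoldFirstLock.countDown()
           bothHoldFirstLock.await()
           second.synchronized { () }
         }
   
       // Like the test thread: holds wsc, then wants agg (child's canonicalized).
       val t1 = new Thread(lockInOrder(wsc, agg), "test-thread")
       // Like subquery-5: holds agg, then wants wsc (references).
       val t2 = new Thread(lockInOrder(agg, wsc), "subquery-thread")
       // Daemon threads, so the stuck pair does not keep the JVM alive.
       Seq(t1, t2).foreach { t => t.setDaemon(true); t.start() }
   
       // Poll the JVM's monitor-deadlock detection; it returns null while
       // no cycle exists, so give up after `timeoutMs`.
       val mx = ManagementFactory.getThreadMXBean
       val deadline = System.currentTimeMillis() + timeoutMs
       var ids: Array[Long] = mx.findMonitorDeadlockedThreads()
       while (ids == null && System.currentTimeMillis() < deadline) {
         Thread.sleep(10)
         ids = mx.findMonitorDeadlockedThreads()
       }
       if (ids == null) Set.empty
       else mx.getThreadInfo(ids).map(_.getThreadName).toSet
     }
   
     def main(args: Array[String]): Unit =
       println(deadlockedThreadNames().toSeq.sorted.mkString(", "))
   }
   ```
   
   `deadlockedThreadNames()` should return both thread names. The two daemon threads then stay blocked until the JVM exits. This matches the state the log reports for the test thread and `subquery-5`.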
   
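   This happens because initializing a Scala `lazy val` internally calls `this.synchronized` on the instance that contains the val, so that instance's monitor is held for the whole computation. In the trace, each thread holds one plan node's monitor while waiting for the other's. The test thread holds the `WholeStageCodegenExec` monitor while computing its `canonicalized`, and blocks on `HashAggregateExec.canonicalized`. Meanwhile, `subquery-5` holds the `HashAggregateExec` monitor and blocks on `WholeStageCodegenExec.references`.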
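   
   For reference, Scala 2 compiles a `lazy val` into an accessor plus a `$lzycompute` method (both appear in the trace), and the initializer runs inside `this.synchronized`. The sketch below hand-writes an approximation of that pattern for a hypothetical `PlanNode` class and records `Thread.holdsLock(this)` during initialization. Writing it by hand keeps the behavior independent of the compiler version, since Scala 3 compiles lazy vals differently. Field and method names are illustrative; the real generated code uses compiler-internal bitmap fields.
   
   ```scala
   // Approximation of how Scala 2.12/2.13 compiles
   //   lazy val canonicalized: String = compute()
   // PlanNode and its members are hypothetical; only the locking shape matters.
   final class PlanNode(val name: String, child: Option[PlanNode]) {
     @volatile private var initialized = false
     private var cached: String = null
   
     // Whether this node's monitor was held while its initializer ran.
     @volatile var heldOwnLockDuringInit = false
   
     private def compute(): String = {
       heldOwnLockDuringInit = Thread.holdsLock(this)
       // Reading the child's value takes the child's monitor while this node's
       // monitor is still held: parent first, then child.
       child.map(c => s"$name(${c.canonicalized})").getOrElse(name)
     }
   
     // Corresponds to canonicalized$lzycompute: double-checked locking on `this`.
     private def canonicalizedLzyCompute(): String = {
       this.synchronized {
         if (!initialized) {
           cached = compute()
           initialized = true
         }
       }
       cached
     }
   
     // Corresponds to the generated accessor: no locking once initialized.
     def canonicalized: String =
       if (initialized) cached else canonicalizedLzyCompute()
   }
   
   object LazyValSketch {
     def main(args: Array[String]): Unit = {
       val leaf = new PlanNode("HashAggregate", None)
       val root = new PlanNode("WholeStageCodegen", Some(leaf))
       println(root.canonicalized)
       println(root.heldOwnLockDuringInit)
       println(leaf.heldOwnLockDuringInit)
     }
   }
   ```
   
   Running `LazyValSketch.main` prints `WholeStageCodegen(HashAggregate)` and then `true` twice. The parent's monitor is still held when the child's is taken, so a second thread that takes the same two monitors in the opposite order can deadlock with it.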
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Manually checked with `com.databricks.spark.sql.SubquerySuite`.
   
   Before this fix, we hit this deadlock multiple times in `SubquerySuite`; after the fix, it did not recur across multiple runs.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


