robreeves commented on code in PR #40812:
URL: https://github.com/apache/spark/pull/40812#discussion_r1180671671
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala:
##########
@@ -389,7 +389,7 @@ case class InMemoryRelation(
@volatile var statsOfPlanToCache: Statistics = null
- override def innerChildren: Seq[SparkPlan] = Seq(cachedPlan)
+ override val innerChildren: Seq[SparkPlan] = Seq(cachedPlan.clone())
Review Comment:
Since plans are considered immutable and usually cloned, thread safety is
not an issue. `cachedPlan` was not cloned so it was easy to get in a situation
where multiple threads access the same plan objects. This solves that narrow
issue. After this the `cachedPlan` is no more susceptible to concurrency issues
than any other object in the plan.
I still think there is a wider thread safety issue with the mutable state in
`TreeNode.tags`, but I need to investigate it more to propose a holistic
solution. Consider the case where in the application code:
1. User thread 1 calls `d.explain("formatted")` where `d` is a `Dataset`
object. This will set the `TreeNode.tag` keys `QueryPlan.OP_ID_TAG` and
`QueryPlan.CODEGEN_ID_TAG`.
2. At the same time in user thread 2 they call `d.filter($"foo" > 1)`. This
will create a new dataset that reads the `TreeNode.tags` during the clone.
At first I considered making `TreeNode.tags` a thread-safe map, but that
only solves the concurrency issue. At least for this issue, there would still
be a correctness issue with other threads setting different operator ids for
the same node. `TreeNode.tags` is used in so many other places that I'm not
sure if there are other correctness issues.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]