Re: [PR] [SPARK-49808][SQL] Fix a deadlock in subquery execution due to lazy vals [spark]

via GitHub Tue, 15 Oct 2024 19:13:19 -0700


JoshRosen commented on code in PR #48391:
URL: https://github.com/apache/spark/pull/48391#discussion_r1802217903



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala:
##########
@@ -94,10 +95,11 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]]
    * All Attributes that appear in expressions from this operator.  Note that 
this set does not
    * include attributes that are implicitly referenced by being passed through 
to the output tuple.
    */
-  @transient

Review Comment:
   Sorry for only spotting this now, but:
   
   Previously we would recompute these attribute references if they were 
requested on a QueryPlan that had been serialized and shipped to executors, 
whereas after this PR's change we'll use attributes sent from the driver.
   
   Are there any potentially adverse side effects of this, other than 
serialized size? I vaguely recall that there was some attribute JVM ID 
reference stuff somewhere, but I'm unsure offhand if that's a factor here.
   
   I mention this because the `@transient` was added in 
https://github.com/apache/spark/pull/24866/commits/4483331586090dc5ddc79c646e1c18eb1aa0742c
 and I'm wondering if it was somehow load bearing or whether it's just a perf. 
optimization.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-49808][SQL] Fix a deadlock in subquery execution due to lazy vals [spark]

Reply via email to