[jira] [Commented] (SPARK-26851) CachedRDDBuilder only partially implements double-checked locking

Bruce Robbins (JIRA) Fri, 08 Feb 2019 10:57:44 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-26851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763818#comment-16763818
 ]


Bruce Robbins commented on SPARK-26851:
---------------------------------------

[~maropu] [~cloud_fan]

I will let this Jira marinate for a little while, since I might have missed 
something (misunderstanding of the code, improvements in the JMM since initial 
introduction). I will make a PR if needed, but I don't want to use up 4 hours 
of Jenkins resources for a nonsense PR.

> CachedRDDBuilder only partially implements double-checked locking
> -----------------------------------------------------------------
>
>                 Key: SPARK-26851
>                 URL: https://issues.apache.org/jira/browse/SPARK-26851
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: Bruce Robbins
>            Priority: Minor
>
> In CachedRDDBuilder, {{cachedColumnBuffers}} uses double-checked locking to 
> lazily initialize {{_cachedColumnBuffers}}. Also, clearCache uses 
> double-checked locking to likely avoid synchronization when 
> {{_cachedColumnBuffers}} is still null.
> However, the resource (in this case, {{_cachedColumnBuffers}}) is not 
> declared as volatile, which could cause some visibility problems in both 
> methods. {{cachedColumnBuffers}} may see a null reference when there is 
> actually a constructed RDD, and {{clearCache}} may pick up a reference to 
> partially constructed RDD.
> From Java Concurrency in Practice by Brian Goetz et al:
> {quote}Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to 
> work if resource is made volatile, and the performance impact of this is 
> small since volatile reads are usually only slightly more expensive than 
> nonvolatile reads.
> {quote}
> There are comments in other documentation that volatile is not needed if the 
> resource in question (again, in this case {{_cachedColumnBuffers}}) is 
> immutable. While an RDD is immutable from a Spark user's point of view, it 
> may not be from a JVM's point of view, since not all internal fields are 
> final.
> I've marked this as minor since the race conditions are highly unlikely.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-26851) CachedRDDBuilder only partially implements double-checked locking

Reply via email to