DaveDeCaprio commented on a change in pull request #24028: [SPARK-26917][SQL] 
Further reduce locks in CacheManager
URL: https://github.com/apache/spark/pull/24028#discussion_r268482158
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
 ##########
 @@ -194,46 +167,36 @@ class CacheManager extends Logging {
   private def recacheByCondition(
       spark: SparkSession,
       condition: CachedData => Boolean): Unit = {
-    val needToRecache = scala.collection.mutable.ArrayBuffer.empty[CachedData]
-    readLock {
-      val it = cachedData.iterator()
-      while (it.hasNext) {
-        val cd = it.next()
-        if (condition(cd)) {
-          needToRecache += cd
-        }
-      }
+    val needToRecache = cachedData.filter(condition)
+    this.synchronized {
+    // Remove the cache entries before creating new ones.
+      cachedData = cachedData.filterNot(cd => needToRecache.exists(_ eq cd))
 
 Review comment:
   The write itself is atomic, but you need the this.synchronized block to 
ensure that the value you are writing was computed from the most up-to-date 
value of cachedData. That value is read as part of cachedData.filterNot, so 
you need to make sure cachedData doesn't change between the time it is read 
and the time the new value is written.
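
   To make the race concrete, here is a minimal, self-contained Scala sketch 
(hypothetical names, not the actual CacheManager code) of why the 
read-modify-write on an immutable collection needs the lock even though the 
reference assignment itself is atomic:

```scala
object RecacheRaceSketch {
  // Hypothetical stand-in for CacheManager.cachedData: an immutable
  // sequence held in a mutable reference, as in the PR.
  @volatile private var cachedData: Seq[String] = Seq("a", "b", "c")

  // Unsafe: another thread can reassign cachedData between the filterNot
  // (the read) and the assignment (the write); that thread's update is lost.
  def removeUnsafe(name: String): Unit = {
    cachedData = cachedData.filterNot(_ == name)
  }

  // Safe: the whole read-modify-write runs under one lock, so filterNot
  // always sees the most recent value and no concurrent update is dropped.
  def removeSafe(name: String): Unit = this.synchronized {
    cachedData = cachedData.filterNot(_ == name)
  }
}
```

   Without the lock, two concurrent callers can both read the same old 
sequence, and whichever assignment happens second silently discards the 
other's removal.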
