xuechendi commented on a change in pull request #32717:
URL: https://github.com/apache/spark/pull/32717#discussion_r646303504



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##########
@@ -225,10 +228,25 @@ case class CachedRDDBuilder(
     _cachedColumnBuffers
   }
 
+  def manualClose[T <: CachedBatch](entry: T): T = {
+    entry match {
+      case o: AutoCloseable =>
+        try {
          o.close()
+        } catch {
+          case NonFatal(e) =>
            logWarning("Failed to close a memory entry", e)
+        }
+      case _ =>
+    }
+    entry
+  }
+
   def clearCache(blocking: Boolean = false): Unit = {
     if (_cachedColumnBuffers != null) {
       synchronized {
         if (_cachedColumnBuffers != null) {
+          _cachedColumnBuffers.foreach(manualClose)
           _cachedColumnBuffers.unpersist(blocking)

Review comment:
       > `RDD.unpersist` will go to the new code path you added in 
`MemoryStore`, and release the memory, or do I miss something?
   
   Oh, I didn't realize that; I thought a CachedBatch required an extra close if 
it hadn't been serialized/deserialized for persistence... I just verified what 
you said, and the CachedBatch did get released without calling 
`_cachedColumnBuffers.foreach(manualClose)`.
   
   Thanks for pointing that out! In that case, do you think it would be 
helpful to keep the UT to make sure a user-defined CachedBatch is closed? 
Otherwise I will close this PR. 
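For context, the close-on-`AutoCloseable` pattern from the diff above can be sketched standalone. This is a minimal, self-contained sketch: `CachedBatch` here is a hypothetical stand-in for Spark's trait, and `println` substitutes for `logWarning`.

```scala
import scala.util.control.NonFatal

object ManualCloseSketch {
  // Hypothetical stand-in for Spark's CachedBatch trait.
  trait CachedBatch

  // Close only entries that happen to be AutoCloseable; swallow
  // non-fatal close errors so cleanup never fails the caller.
  def manualClose[T <: CachedBatch](entry: T): T = {
    entry match {
      case o: AutoCloseable =>
        try {
          o.close()
        } catch {
          case NonFatal(e) =>
            println(s"Failed to close a memory entry: $e")
        }
      case _ =>
    }
    entry
  }

  def main(args: Array[String]): Unit = {
    var closed = false
    class CloseableBatch extends CachedBatch with AutoCloseable {
      override def close(): Unit = closed = true
    }
    class PlainBatch extends CachedBatch

    manualClose(new CloseableBatch)
    assert(closed, "AutoCloseable batch should have been closed")
    manualClose(new PlainBatch) // no-op for non-closeable batches
  }
}
```

The `NonFatal` guard matters here: a failing `close()` during cache cleanup should be logged, not propagated, since the entry is being discarded anyway.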




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


