xuechendi commented on a change in pull request #32717:
URL: https://github.com/apache/spark/pull/32717#discussion_r643202392



##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##########
@@ -225,10 +231,31 @@ case class CachedRDDBuilder(
     _cachedColumnBuffers
   }
 
+  def manualClose[T <: CachedBatch](entry: T): T = {
+    val entryManualCloseTasks = Future {
+      entry match {
+        case o: AutoCloseable =>
+          try {
+            o.close

Review comment:
      @srowen , the reason I also want to do the manual close in InMemoryRelation 
is that since 3.1.1 this class supports a user-defined serializer through 
"spark.sql.cache.serializer". With this new config, users can create a 
user-defined CachedBatch and store it inside InMemoryRelation. In our case, we 
want to define an Apache Arrow based CachedBatch to keep ColumnarBatch data in 
memory or serialize it to disk.
   That is why I am also adding a manual close function here, and using 
AutoCloseable to keep it general.
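
   To illustrate the idea (a minimal, self-contained sketch, not the PR's actual code: `CachedBatch` here is a stand-in trait for Spark's, and `ArrowLikeCachedBatch` is a hypothetical name), a user-defined batch can own off-heap resources and implement `AutoCloseable`, so a generic `manualClose` can release it without knowing the concrete type:

```scala
// Stand-in for Spark's CachedBatch trait (illustrative only).
trait CachedBatch { def numRows: Int }

// Hypothetical Arrow-style batch that owns releasable buffers.
class ArrowLikeCachedBatch(val numRows: Int) extends CachedBatch with AutoCloseable {
  private var closed = false
  def isClosed: Boolean = closed
  // In a real implementation this would free Arrow allocator buffers.
  override def close(): Unit = { closed = true }
}

// Close the entry only if it is AutoCloseable, mirroring the pattern
// match in the PR; other batch types are left untouched.
def manualClose[T <: CachedBatch](entry: T): T = {
  entry match {
    case o: AutoCloseable => o.close()
    case _ =>
  }
  entry
}
```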
   
   The reason for wrapping the close in a Future is that, in my local tests, 
releasing a few gigabytes (e.g. 10 GB) of data synchronously blocks 
df.unpersist() for a while.
   I also added a unit test that demonstrates the intended usage with a 
user-defined CachedBatchSerializer and CachedBatch.
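
   The async-release idea can be sketched like this (a hedged, standalone example of the Future pattern, not the PR's code; `closeAsync` is a hypothetical helper and it swallows exceptions best-effort the way the PR's try around `close` suggests):

```scala
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

// Release a set of resources on a background Future so the caller
// (e.g. the thread running df.unpersist()) returns immediately
// instead of blocking while gigabytes of buffers are freed.
def closeAsync(resources: Seq[AutoCloseable]): Future[Unit] = Future {
  resources.foreach { r =>
    try r.close()
    catch { case _: Exception => } // best-effort cleanup
  }
}
```

The caller can simply discard the returned Future, or await it in tests to verify that everything was actually closed.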




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
