xuechendi commented on a change in pull request #32717:
URL: https://github.com/apache/spark/pull/32717#discussion_r643202392
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##########
@@ -225,10 +231,31 @@ case class CachedRDDBuilder(
_cachedColumnBuffers
}
+ def manualClose[T <: CachedBatch](entry: T): T = {
+ val entryManualCloseTasks = Future {
+ entry match {
+ case o: AutoCloseable =>
+ try {
+ o.close
Review comment:
@srowen , the reason I also want to do a manual close in InMemoryRelation
is that, since 3.1.1, this class supports a user-defined serializer through
the "spark.sql.cache.serializer" config. With this config, a user can create
a user-defined CachedBatch and store it inside InMemoryRelation. In our case,
we want to define an Apache Arrow-backed CachedBatch to store ColumnarBatch
data in memory or serialize it to disk.
That is why I am thinking of adding a manual close function here, and using
AutoCloseable to keep it general.
The reason for wrapping the close in a Future is that, in my local tests,
releasing a couple of gigabytes (e.g. 10 GB) of data synchronously would
block df.unpersist() for a while.
I also added a unit test to demonstrate my intent with a user-defined
CachedBatchSerializer and CachedBatch.
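To illustrate the pattern being discussed (not the actual Spark code), here is a minimal, self-contained sketch of an asynchronous manual close: the entry is returned immediately to the caller, while any AutoCloseable resource it holds is released on a background thread, so a large release does not block the calling path (e.g. unpersist). The names `ManualCloseSketch` and `Probe` are hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical sketch of the async manual-close idea; in the real PR the
// type bound would be T <: CachedBatch and the close would be logged.
object ManualCloseSketch {
  // Dedicated single-thread executor so the close work runs off the caller's thread.
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

  def manualClose[T](entry: T): T = {
    Future {
      entry match {
        case c: AutoCloseable =>
          try c.close()
          catch { case _: Exception => () } // swallow; real code would log
        case _ => () // nothing to release
      }
    }
    entry // return immediately; the close completes in the background
  }
}
```

A caller would use it as `ManualCloseSketch.manualClose(batch)` on eviction; the trade-off is that the memory is not guaranteed to be released by the time the call returns.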
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]