Re: [PR] [SPARK-45814][CONNECT][SQL]Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak [spark]

via GitHub Wed, 20 Dec 2023 02:05:15 -0800


HyukjinKwon commented on code in PR #43691:
URL: https://github.com/apache/spark/pull/43691#discussion_r1432506779



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala:
##########
@@ -226,11 +229,19 @@ private[sql] object ArrowConverters extends Logging {
       schema: StructType,
       timeZoneId: String,
       errorOnDuplicatedFieldNames: Boolean): Array[Byte] = {
-    new ArrowBatchWithSchemaIterator(
+    val batches = new ArrowBatchWithSchemaIterator(
         Iterator.empty, schema, 0L, 0L,
         timeZoneId, errorOnDuplicatedFieldNames, TaskContext.get()) {
       override def hasNext: Boolean = true
-    }.next()
+    }
+    Utils.tryWithSafeFinally {
+      batches.next()
+    } {
+      // If taskContext is null, `batches.close()` should be called to avoid 
memory leak.
+      if (TaskContext.get() == null) {
+        batches.close()

Review Comment:
   Can we actually add a small test for this? This memory leaking should be 
easy to test if the `close` is actually being invoked with 
`createEmptyArrowBatch`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-45814][CONNECT][SQL]Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak [spark]

Reply via email to