cloud-fan commented on a change in pull request #23438:
[SPARK-26525][SHUFFLE]Fast release ShuffleBlockFetcherIterator on completion of the iteration
URL: https://github.com/apache/spark/pull/23438#discussion_r252127713
##########
File path:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##########
@@ -609,6 +617,27 @@ private class BufferReleasingInputStream(
override def reset(): Unit = delegate.reset()
}
+/**
+ * A listener to be called at the completion of the ShuffleBlockFetcherIterator
+ * @param data the ShuffleBlockFetcherIterator to process
+ */
+private class ShuffleFetchCompletionListener(var data: ShuffleBlockFetcherIterator)
+ extends TaskCompletionListener {
+
+ override def onTaskCompletion(context: TaskContext): Unit = {
+ if (data != null) {
+ data.cleanup()
+ // Null out the referent here to make sure we don't keep a reference to this
+ // ShuffleBlockFetcherIterator, after we're done reading from it, to let it be
+ // collected during GC. Otherwise we can metadata on block locations(blocksByAddress)
Review comment:
> Otherwise we can metadata on block locations(blocksByAddress)
Do you mean `we can hold metadata ...`?