viirya commented on a change in pull request #24070: [SPARK-23961][PYTHON] Fix
error when toLocalIterator goes out of scope
URL: https://github.com/apache/spark/pull/24070#discussion_r272078879
##########
File path: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
##########
@@ -168,7 +168,42 @@ private[spark] object PythonRDD extends Logging {
   }

   def toLocalIteratorAndServe[T](rdd: RDD[T]): Array[Any] = {
-    serveIterator(rdd.toLocalIterator, s"serve toLocalIterator")
Review comment:
> It is also possible that this change would be very beneficial: if the iterator is not fully consumed, it can avoid triggering unneeded jobs, whereas the previous behavior eagerly queued jobs for all partitions. In this sense, the change here more closely follows the Scala behavior.

Once the local iterator goes out of scope on the Python side, will the remaining jobs still be triggered, given that the Scala side can no longer write to the closed connection?
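For concreteness, a minimal PySpark sketch of the scenario being asked about. The session setup, partition count, and data are illustrative assumptions, not from the patch; only `toLocalIterator()` itself is the API under discussion:

```python
# Hypothetical repro sketch: partially consume a local iterator, then drop it.
# Under the old behavior, jobs for all partitions were queued eagerly; the
# question above is whether the remaining jobs still run on the Scala side
# once the Python-side socket is closed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()
rdd = spark.sparkContext.parallelize(range(100), 10)  # 10 partitions, arbitrary

it = rdd.toLocalIterator()
print(next(it))  # consumes from the first partition only
del it           # iterator goes out of scope before full consumption
```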