BryanCutler commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
URL: https://github.com/apache/spark/pull/24070#issuecomment-472133193
 
 
   I just want to highlight that the error this fixes only kills the
serving thread; Spark can continue normal operation. However, the error is
pretty ugly and would lead users to think that something went terribly wrong.
Since it's pretty common not to fully consume an iterator, e.g. by taking a
slice, I believe this change is worth making.
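To illustrate the "not fully consumed" pattern, here is a minimal pure-Python sketch. It is not PySpark's implementation. `FakeServer` is a hypothetical stand-in for the serving side, and the consumer takes a slice with `itertools.islice`. Calling `close()` on a partially consumed generator (or letting it be garbage collected) runs its `finally` block. That gives the serving side a chance to shut down cleanly instead of failing on a write to a consumer that has gone away.

```python
import itertools


class FakeServer:
    """Hypothetical stand-in for the serving side of a local iterator."""

    def __init__(self, rows):
        self.rows = rows
        self.shut_down = False

    def stream(self):
        try:
            for row in self.rows:
                yield row
        finally:
            # Runs when the generator is exhausted OR closed early
            # (explicitly or when garbage collected), so the server can
            # stop serving instead of erroring on a vanished consumer.
            self.shut_down = True


server = FakeServer(range(100))
it = server.stream()
first_ten = list(itertools.islice(it, 10))  # consume only a slice
it.close()  # what happens implicitly when `it` goes out of scope
print(first_ten)
print(server.shut_down)
```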
   
   This change could also be quite beneficial: the previous behavior eagerly
queued jobs for all partitions, whereas now, if the iterator is not fully
consumed, the jobs for the remaining partitions can be avoided. In this sense,
the change more closely follows the Scala behavior.
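A simplified model of the eager-versus-lazy difference follows. It assumes that each partition costs one job and holds 10 rows. `JobCounter`, `eager_iter`, and `lazy_iter` are hypothetical names and not PySpark APIs. When only the first 15 rows are taken, the eager version runs a job for every partition. The lazy version, which requests one partition at a time as Scala's `RDD.toLocalIterator` does, runs only the jobs it actually needs.

```python
import itertools


class JobCounter:
    """Hypothetical job runner: counts how many partition jobs were run."""

    def __init__(self):
        self.jobs = 0

    def run_job(self, partition):
        self.jobs += 1
        # Assume 10 rows per partition for illustration.
        return list(range(partition * 10, partition * 10 + 10))


def eager_iter(counter, num_partitions):
    # Old-style behavior: run jobs for all partitions up front.
    results = [counter.run_job(p) for p in range(num_partitions)]
    for rows in results:
        yield from rows


def lazy_iter(counter, num_partitions):
    # Scala-like behavior: run a partition's job only when its rows are needed.
    for p in range(num_partitions):
        yield from counter.run_job(p)


eager, lazy = JobCounter(), JobCounter()
eager_rows = list(itertools.islice(eager_iter(eager, 32), 15))
lazy_rows = list(itertools.islice(lazy_iter(lazy, 32), 15))
print(eager.jobs, lazy.jobs)
```

Both versions return the same 15 rows. In this model, the lazy one only touches the first two partitions.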
   
   I'm also not entirely sure why I'm seeing a speedup for the RDD 
toLocalIterator. When using 8 partitions instead of 32, I noticed a slowdown. I 
will try to run some more tests.
