holdenk commented on issue #24070: [SPARK-23961][PYTHON] Fix error when toLocalIterator goes out of scope
URL: https://github.com/apache/spark/pull/24070#issuecomment-483338808

Thanks for the additional context @BryanCutler, that really helps. I think supporting memory-constrained consumption of a large dataset is core to the goal of toLocalIterator, so while this change has a performance penalty, which we can work to minimize, I think it's the right thing to do. I agree with you that solving it on the Python side seems like the right set of trade-offs. I think it _might_ make sense to kick off the job for the next partition (e.g. a lookahead of 1), but we should totally do that in a follow-up PR/JIRA as an optimization. What do you think?
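
To make the lookahead-of-1 idea concrete, here is a rough sketch in plain Python. It is not the code in this PR and not Spark's implementation: `fetch_partition` is a hypothetical callable standing in for "run the job for partition `i` and return its rows", and `iter_with_lookahead` is a name made up for illustration. The point is only that at most two partitions (the one being consumed and the one being prefetched) are held in memory at a time.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_with_lookahead(fetch_partition, num_partitions):
    """Yield rows partition by partition with a lookahead of 1.

    fetch_partition is a hypothetical callable (an assumption for this
    sketch): it takes a partition index and returns that partition's rows.
    """
    if num_partitions <= 0:
        return
    # A single worker thread is enough: only one partition is prefetched.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_partition, 0)
        for i in range(num_partitions):
            # Block until the partition we are about to consume is ready.
            rows = pending.result()
            # Kick off the next partition before handing out the current rows.
            if i + 1 < num_partitions:
                pending = pool.submit(fetch_partition, i + 1)
            for row in rows:
                yield row
```

The trade-off is the one discussed above: the consumer no longer waits for a whole job between partitions, at the cost of holding one extra partition in memory and possibly computing a partition that is never read if the iterator is dropped early.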
