holdenk commented on issue #24070: [SPARK-23961][PYTHON] Fix error when 
toLocalIterator goes out of scope
URL: https://github.com/apache/spark/pull/24070#issuecomment-483338808
 
 
   Thanks for the additional context, @BryanCutler; that really helps.
   
   I think supporting memory-constrained consumption of a large dataset is core 
to the goal of toLocalIterator, so while this change carries a performance 
penalty, which we can work to minimize, I think it's the right thing to do.
   
   I agree with you that solving it on the Python side seems like the right set 
of trade-offs. I think it _might_ make sense to kick off the job for the next 
partition (e.g. a lookahead of 1), but we should totally do that in a follow-up 
PR/JIRA as an optimization. What do you think?
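
   To make the lookahead-of-1 idea concrete, here is a rough sketch in plain Python. It is not the PySpark implementation and not what this PR does: `iterate_with_lookahead` and `fetch_partition` are hypothetical names, and `fetch_partition` stands in for whatever would run the job for a single partition. The sketch assumes a one-thread pool is enough to overlap the next fetch with consumption of the current partition.

```python
from concurrent.futures import ThreadPoolExecutor


def iterate_with_lookahead(fetch_partition, num_partitions):
    """Yield rows partition by partition, fetching one partition ahead.

    fetch_partition is a hypothetical callable (index -> list of rows)
    standing in for the job that collects a single partition.
    """
    if num_partitions <= 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the job for the first partition.
        pending = pool.submit(fetch_partition, 0)
        for index in range(num_partitions):
            rows = pending.result()
            # Kick off the next partition before handing out the current rows.
            if index + 1 < num_partitions:
                pending = pool.submit(fetch_partition, index + 1)
            yield from rows


# Hypothetical in-memory data standing in for a partitioned dataset.
partitions = [[1, 2], [3], [4, 5, 6]]
fetched = []


def fetch_partition(index):
    fetched.append(index)
    return partitions[index]


rows = list(iterate_with_lookahead(fetch_partition, len(partitions)))
```

   In this sketch at most two partitions are held at once (the one being consumed and the one prefetched), which is the memory cost a lookahead of 1 would trade for less waiting between partitions.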
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
