Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16263

@holdenk Thanks for the review. Yes, I have considered the possibility that a partition that takes a long time to evaluate might cause the problem again. However, since the Python side uses buffered reading, my first guess was that this would alleviate that case. I have thought about removing the timeout on the Python side, or applying the timeout only to the connection, but I am concerned that doing so may defeat the purpose of the timeout. As reported on the JIRA, the eager evaluation does not seem to completely solve the issue, so I will try the approach that sets the timeout only for the connection and see if that solves it. Then we can ask for others' opinions about removing the timeout.
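
To make the "timeout only for the connection" idea concrete, here is a rough sketch using the standard `socket` module. It is not the code in this PR: the function name `open_socket_with_connect_timeout` and the default of 3 seconds are assumptions made up for illustration.

```python
import socket


def open_socket_with_connect_timeout(host, port, connect_timeout=3.0):
    """Connect with a timeout, then clear it so later reads can block.

    Hypothetical helper for illustration only; the name and the default
    timeout value are assumptions, not taken from the PR.
    """
    # The timeout applies while the connection is being established.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # Clear the timeout: a partition that takes a long time to evaluate
    # should not cause a socket.timeout on the reading side.
    sock.settimeout(None)
    return sock
```

With something like this, a server that never accepts the connection still fails fast, while a slow producer only delays the read instead of breaking it. Whether that keeps the original purpose of the timeout is the open question above.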