GitHub user davies commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59635035

take() is not the only operation that can cause problems: a user could call mapPartitions() and read only some of the items in infile. We want to reuse not only the worker but also the socket, so calling next() on infile to consume whatever is left may be better than the current approach. We still need a special code to tell the JVM whether the socket can be reused, and any special code could collide with random data. For example, if the serialized data is broken, we might read the special code out of it. To reduce the chance of such a collision, we could use a VERY long special code, such as 64 bits, 128 bits, or even more.
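A minimal sketch of the idea, assuming a simple length-prefixed framing. The framing, the -1 end-of-data sentinel, the marker value, and every function name (`write_items`, `read_items`, `drain`, `finish`, `socket_reusable`) are hypothetical and chosen for illustration. This is not PySpark's actual worker protocol. The worker drains the items the user function left unread (as after take() or a partial mapPartitions()), so the stream is aligned again. It then writes a fixed 64-bit marker, and the reading side reuses the socket only if it sees exactly that marker. A random 8-byte value matches a fixed 64-bit marker with probability 2^-64 (about 5.4e-20).

```python
import io
import struct

# Hypothetical 64-bit "socket can be reused" marker; the value is arbitrary.
REUSE_MARKER = 0x5350_524B_5245_5553
END_OF_DATA = -1  # hypothetical sentinel length closing the data section


def write_items(stream, items):
    """Write each payload as a 4-byte big-endian length followed by its bytes."""
    for payload in items:
        stream.write(struct.pack(">i", len(payload)))
        stream.write(payload)
    stream.write(struct.pack(">i", END_OF_DATA))


def read_items(stream):
    """Lazily yield payloads until the end-of-data sentinel is read."""
    while True:
        (length,) = struct.unpack(">i", stream.read(4))
        if length == END_OF_DATA:
            return
        yield stream.read(length)


def drain(iterator):
    """Consume any items the user function did not read (e.g. after take())."""
    for _ in iterator:
        pass


def finish(outfile, iterator):
    """Drain the input, then tell the other side the socket may be reused."""
    drain(iterator)
    outfile.write(struct.pack(">Q", REUSE_MARKER))


def socket_reusable(infile):
    """Reuse only if exactly the full 8-byte marker is present."""
    data = infile.read(8)
    return len(data) == 8 and struct.unpack(">Q", data)[0] == REUSE_MARKER
```

Usage: write three items into an `io.BytesIO`, read only the first one (as take(1) would), then call `finish`. After draining, the input stream sits right after the sentinel, and `socket_reusable` on the output returns True. An 8-byte run of garbage returns False unless it happens to equal the marker.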