Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/4923#issuecomment-77786229
This looks really good to me overall. It would be great if you could
update the PR description to reflect the most recent changes (collecting via a
socket instead of a temporary file).
A couple of quick questions:
- Does this have a performance impact (positive or negative)?
- Do we need to write any additional tests? Are there any corner cases
that we should explicitly address in their own tests, such as not consuming the
iterator or not connecting to the socket after creating it?
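To make the two corner cases above concrete, here is a minimal, self-contained Python sketch of how they could be exercised. It is not the code from this PR: `serve_once`, `read_lines`, and the `accept_timeout` parameter are hypothetical names introduced only for illustration, and the assumed design is a one-shot local server socket that gives up if no client connects within a timeout.

```python
import socket
import threading


def serve_once(items, accept_timeout=1.0):
    """Hypothetical helper: serve `items` to a single local client.

    Returns (port, thread, outcome). `outcome["status"]` is filled in by
    the background thread once it finishes.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    server.listen(1)
    server.settimeout(accept_timeout)   # assumption: give up if nobody connects
    port = server.getsockname()[1]
    outcome = {}

    def run():
        try:
            conn, _ = server.accept()
        except socket.timeout:
            # Corner case: the socket was created but never connected to.
            outcome["status"] = "timed out"
            return
        finally:
            server.close()              # the listening socket is single-use
        try:
            with conn:
                for item in items:
                    conn.sendall((str(item) + "\n").encode("utf-8"))
            outcome["status"] = "sent"
        except OSError:
            # Corner case: the client stopped reading and closed early.
            outcome["status"] = "client closed early"

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return port, thread, outcome


def read_lines(port):
    """Hypothetical client: lazily yield one line per item from the server."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        with sock.makefile("r", encoding="utf-8") as stream:
            for line in stream:
                yield line.rstrip("\n")
```

A test for "never connecting" would call `serve_once` with a short timeout, join the thread, and check that it exited. A test for "not consuming the iterator" would take one element from `read_lines`, close the generator, and check that the server thread still terminates instead of blocking forever.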