GitHub user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4923#discussion_r26011521
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -341,7 +342,7 @@ private[spark] object PythonRDD extends Logging {
   /**
    * Adapter for calling SparkContext#runJob from Python.
    *
-   * This method will return an iterator of an array that contains all elements in the RDD
+   * This method will serve an iterator of an array that contains all elements in the RDD
    * (effectively a collect()), but allows you to run on a certain subset of partitions,
    * or to enable local execution.
    */
--- End diff --
Even with the updated description, this method's return type could be
confusing to new readers.
It might help to add an explicit description of the return type, e.g.
```scala
@return the port number of a local socket which serves the data collected
from this job.
```
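Putting that together with the diff above, the updated Scaladoc might read roughly like this (a sketch combining the two; the exact wording is an assumption, not the PR's final text):
```scala
/**
 * Adapter for calling SparkContext#runJob from Python.
 *
 * This method will serve an iterator of an array that contains all elements in the RDD
 * (effectively a collect()) over a local socket, but allows you to run on a certain subset
 * of partitions, or to enable local execution.
 *
 * @return the port number of a local socket which serves the data collected from this job.
 */
```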
We should also document the lifecycle of the server socket created by this
method: what happens if a client does not consume the whole iterator? Is it
the caller's responsibility to close the socket at that point?
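One pattern that would answer both questions is to bind the server socket with an accept timeout and close it from a daemon thread once the write finishes, fails, or times out, so an abandoned server does not leak. A minimal sketch of that idea (a hypothetical `serveBytes` helper, not the code in this PR; the name and the 3-second timeout are assumptions):
```scala
import java.io.{BufferedOutputStream, DataOutputStream}
import java.net.{InetAddress, ServerSocket}

// Serve a byte payload over a loopback socket and return the port number.
// If no client connects within the timeout, accept() throws, and the finally
// block closes the server socket; a client that disconnects mid-read causes
// the write to fail, which also lands in the finally block.
def serveBytes(data: Array[Byte]): Int = {
  val serverSocket = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
  serverSocket.setSoTimeout(3000) // give up if the client never connects

  new Thread("serve-bytes") {
    setDaemon(true)
    override def run(): Unit = {
      try {
        val socket = serverSocket.accept()
        val out = new DataOutputStream(new BufferedOutputStream(socket.getOutputStream))
        try {
          out.write(data)
          out.flush()
        } finally {
          socket.close()
        }
      } finally {
        serverSocket.close()
      }
    }
  }.start()

  serverSocket.getLocalPort
}
```
Under this scheme the caller never has to close the socket itself, which is the kind of lifecycle guarantee the doc comment could state explicitly.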