GitHub user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4923#discussion_r26011521
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -341,7 +342,7 @@ private[spark] object PythonRDD extends Logging {
   /**
    * Adapter for calling SparkContext#runJob from Python.
    *
-   * This method will return an iterator of an array that contains all elements in the RDD
+   * This method will serve an iterator of an array that contains all elements in the RDD
    * (effectively a collect()), but allows you to run on a certain subset of partitions,
    * or to enable local execution.
    */
--- End diff --
Even with the updated description, this method's return type could be
confusing to new readers.
It might help to add an explicit description of the return type, e.g.
```scala
@return the port number of a local socket which serves the data collected
from this job.
```
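Putting that together with the diff above, the updated Scaladoc might read roughly like this (a sketch combining the two; the exact wording is an assumption, not the PR's final text):
```scala
/**
 * Adapter for calling SparkContext#runJob from Python.
 *
 * This method will serve an iterator of an array that contains all elements in the RDD
 * (effectively a collect()) over a local socket, but allows you to run on a certain subset
 * of partitions, or to enable local execution.
 *
 * @return the port number of a local socket which serves the data collected from this job.
 */
```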
We should also document the lifecycle of the server socket created by this
method: what happens if a client does not consume the whole iterator? Is it
the caller's responsibility to close the socket at that point?
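One pattern that would answer both questions is to bind the server socket with an accept timeout and close it from a daemon thread once the write finishes, fails, or times out, so an abandoned server does not leak. A minimal sketch of that idea (a hypothetical `serveBytes` helper, not the code in this PR; the name and the 3-second timeout are assumptions):
```scala
import java.io.{BufferedOutputStream, DataOutputStream}
import java.net.{InetAddress, ServerSocket}

// Serve a byte payload over a loopback socket and return the port number.
// If no client connects within the timeout, accept() throws, and the finally
// block closes the server socket; a client that disconnects mid-read causes
// the write to fail, which also lands in the finally block.
def serveBytes(data: Array[Byte]): Int = {
  val serverSocket = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
  serverSocket.setSoTimeout(3000) // give up if the client never connects

  new Thread("serve-bytes") {
    setDaemon(true)
    override def run(): Unit = {
      try {
        val socket = serverSocket.accept()
        val out = new DataOutputStream(new BufferedOutputStream(socket.getOutputStream))
        try {
          out.write(data)
          out.flush()
        } finally {
          socket.close()
        }
      } finally {
        serverSocket.close()
      }
    }
  }.start()

  serverSocket.getLocalPort
}
```
Under this scheme the caller never has to close the socket itself, which is the kind of lifecycle guarantee the doc comment could state explicitly.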