Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21546#discussion_r199618628
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
@@ -398,6 +398,25 @@ private[spark] object PythonRDD extends Logging {
    * data collected from this job, and the secret for authentication.
    */
   def serveIterator(items: Iterator[_], threadName: String): Array[Any] = {
+    serveToStream(threadName) { out =>
+      writeIteratorToStream(items, new DataOutputStream(out))
+    }
+  }
+
+  /**
+   * Create a socket server and background thread to execute the block of code
+   * for the given DataOutputStream.
+   *
+   * The socket server can only accept one connection, or close if no connection
+   * is made within 15 seconds.
+   *
+   * Once a connection comes in, it will execute the block of code and pass in
+   * the socket output stream.
+   *
+   * The thread will terminate after the block of code is executed or if any
+   * exception is thrown.
+   */
+  private[spark] def serveToStream(threadName: String)(block: OutputStream => Unit): Array[Any] = {
--- End diff ---
Yeah, I think I started off with `writeFunc`. I agree, it sounds a bit better.
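
For anyone following along, here is a minimal, self-contained sketch of the pattern `serveToStream` implements (a one-shot socket server with a 15-second accept timeout and a background thread that runs the write function). The object name `ServeToStreamSketch` is hypothetical, and the sketch returns only the port, whereas the real method's `Array[Any]` also carries the authentication secret mentioned in the javadoc:

```scala
import java.io.{BufferedOutputStream, OutputStream}
import java.net.{InetAddress, ServerSocket}

object ServeToStreamSketch {
  // Hypothetical stand-in for PythonRDD.serveToStream: start a one-shot
  // socket server on an ephemeral local port and run writeFunc against the
  // accepted connection's output stream in a background thread.
  def serveToStream(threadName: String)(writeFunc: OutputStream => Unit): Int = {
    val server = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
    server.setSoTimeout(15000) // accept() gives up if no connection in 15 seconds

    new Thread(threadName) {
      setDaemon(true)
      override def run(): Unit = {
        try {
          val socket = server.accept() // only one connection is ever accepted
          try {
            val out = new BufferedOutputStream(socket.getOutputStream)
            writeFunc(out)
            out.flush()
          } finally {
            socket.close()
          }
        } finally {
          // Reached after writeFunc finishes or if any exception is thrown,
          // so the thread always terminates and the port is released.
          server.close()
        }
      }
    }.start()

    server.getLocalPort // the caller connects to this port to read the data
  }
}
```

A caller would then do something like `val port = ServeToStreamSketch.serveToStream("serve data") { out => new java.io.DataOutputStream(out).writeUTF("hello") }` and connect a client socket to `port` within 15 seconds.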
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]