WeichenXu123 commented on code in PR #44294: URL: https://github.com/apache/spark/pull/44294#discussion_r1440018036
##########
core/src/main/scala/org/apache/spark/SparkEnv.scala:
##########
@@ -99,6 +99,10 @@ class SparkEnv (
   private[spark] var executorBackend: Option[ExecutorBackend] = None
 
+  private[spark] var cachedArrowBatchServerPort: Option[Int] = None
+
+  private[spark] var cachedArrowBatchServerSecret: Option[String] = None

Review Comment:
   I am considering adding an API like:
   
   ```
   # 1. The user calls this developer API in a PySpark UDF
   # to start an Arrow stream server in the local executor.
   server_port, server_secret = startChunkServer()
   
   # 2. Read chunk data using the server created above.
   # The user can call this function in the PySpark UDF or in
   # descendant processes of the PySpark UDF.
   readChunk(chunk_id, server_port, server_secret)
   
   # 3. Shut down the server created above.
   shutdownChunkServer(server_port, server_secret)
   ```
   
   so that we can avoid having each executor launch a long-running server.
   
   https://docs.google.com/document/d/1qs8lKQ3IwF5QGGAaa6OIiXYhdG4_HJtS66dswtx9kd0/edit#bookmark=id.f6cwxc97g3ig
   
   Then we don't need these variables.
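   To make the start / read / shut down lifecycle concrete, here is a rough, self-contained sketch. It is not the PR's implementation and not a Spark API: the snake_case function names, the in-memory `_CHUNKS` dict, and the use of Python's standard `multiprocessing.connection` module in place of a real Arrow stream server are all assumptions made for illustration only.

```python
import secrets
import threading
from multiprocessing.connection import Client, Listener

# Hypothetical stand-in for the Arrow batches cached on the executor.
_CHUNKS = {"chunk-0": b"first batch", "chunk-1": b"second batch"}


def start_chunk_server(chunks=_CHUNKS):
    """Start a short-lived local server and return (port, secret)."""
    secret = secrets.token_hex(16)
    # Port 0 lets the OS pick a free port; authkey makes the Listener
    # run an HMAC challenge against every connecting client.
    listener = Listener(("127.0.0.1", 0), authkey=secret.encode())
    port = listener.address[1]

    def serve():
        with listener:
            while True:
                try:
                    conn = listener.accept()
                except Exception:
                    continue  # failed authentication or broken connection
                with conn:
                    try:
                        command, arg = conn.recv()
                    except EOFError:
                        continue
                    if command == "shutdown":
                        conn.send(True)
                        return  # leaving the with-block closes the listener
                    conn.send(chunks.get(arg))  # None if the id is unknown

    threading.Thread(target=serve, daemon=True).start()
    return port, secret


def read_chunk(chunk_id, port, secret):
    """Fetch one chunk; callable from the UDF or from a descendant process."""
    with Client(("127.0.0.1", port), authkey=secret.encode()) as conn:
        conn.send(("read", chunk_id))
        return conn.recv()


def shutdown_chunk_server(port, secret):
    """Ask the server started above to stop."""
    with Client(("127.0.0.1", port), authkey=secret.encode()) as conn:
        conn.send(("shutdown", None))
        conn.recv()


if __name__ == "__main__":
    port, secret = start_chunk_server()
    print(read_chunk("chunk-0", port, secret))
    shutdown_chunk_server(port, secret)
```

   Because the port and secret are returned to the caller and the server lives only between the start and shutdown calls, nothing has to be stored on the environment object in this sketch, which is the point of the proposal.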
