felixcheung commented on a change in pull request #24826: [SPARK-27870][SQL][PYTHON] Add a runtime buffer size configuration for Pandas UDFs
URL: https://github.com/apache/spark/pull/24826#discussion_r291893358
 
 

 ##########
 File path: python/pyspark/daemon.py
 ##########
 @@ -54,8 +54,9 @@ def worker(sock, authenticated):
     # Read the socket using fdopen instead of socket.makefile() because the latter
     # seems to be very slow; note that we need to dup() the file descriptor because
     # otherwise writes also cause a seek that makes us miss data on the read side.
-    infile = os.fdopen(os.dup(sock.fileno()), "rb", 65536)
-    outfile = os.fdopen(os.dup(sock.fileno()), "wb", 65536)
+    buffer_size = int(os.environ.get("SPARK_BUFFER_SIZE", 65536))
 
 Review comment:
   Is `int(os.environ.get("SPARK_BUFFER_SIZE", 65536))` going to return something sensible if `SPARK_BUFFER_SIZE` is set to something crazy like `-1`?
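   For reference: `os.fdopen` passes its buffering argument through to `io.open`, and in CPython a negative value there does not raise; it silently selects Python's default buffer size. `0` gives an unbuffered binary stream, and a non-numeric value makes `int()` raise `ValueError`, which would take the worker down. Below is a minimal sketch of a stricter parse, assuming we simply want to fall back to 65536 for anything unusable; `get_buffer_size` is a hypothetical helper, not something in this PR:

```python
import os

# Fallback taken from the value daemon.py hard-coded before this change.
DEFAULT_BUFFER_SIZE = 65536


def get_buffer_size(env=os.environ, default=DEFAULT_BUFFER_SIZE):
    """Hypothetical helper: parse SPARK_BUFFER_SIZE defensively.

    Returns ``default`` when the variable is unset, is not an integer,
    or is not strictly positive. Without this check, os.fdopen() would
    treat a negative size as "use Python's default buffer size" and 0 as
    "unbuffered", so a bad setting would go unnoticed.
    """
    raw = env.get("SPARK_BUFFER_SIZE")
    if raw is None:
        return default
    try:
        size = int(raw)
    except ValueError:
        # e.g. "64k" or "": int() would otherwise crash the worker
        return default
    return size if size > 0 else default


if __name__ == "__main__":
    for value in ("8192", "-1", "0", "abc"):
        print(value, "->", get_buffer_size({"SPARK_BUFFER_SIZE": value}))
    # 8192 -> 8192
    # -1 -> 65536
    # 0 -> 65536
    # abc -> 65536
```

   Falling back silently is only one option; logging a warning, or validating the value where it is set, would make a bad configuration easier to spot.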

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
