HyukjinKwon commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)
URL: https://github.com/apache/spark/pull/24734#issuecomment-500101639

Sorry, I still don't get why we need to flush instead of setting no buffer. Did we try https://github.com/apache/spark/pull/24734#issuecomment-498092577?

**Buffer size 0**:

```bash
echo "
buf_size = 0
import socket
import os
import time
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(('localhost', 12345))
server_socket.listen(0)
sock, addr = server_socket.accept()
# buf_size = 0: unbuffered (raw) file, read() returns as soon as any bytes arrive
infile = os.fdopen(os.dup(sock.fileno()), 'rb', buf_size)
print('got %s' % repr(infile.read(5 * 8)))
time.sleep(10)
" > server.py

echo "
buf_size = 0
import socket
import os
import time
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('localhost', 12345))
# buf_size = 0: write() goes straight to the socket
outfile = os.fdopen(os.dup(sock.fileno()), 'wb', buf_size)
outfile.write(b'Hello ')
print('sent %s' % repr(b'Hello '))
time.sleep(10)
" > client.py
```

**Buffer size 65536**:

```bash
echo "
buf_size = 65536
import socket
import os
import time
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(('localhost', 54321))
server_socket.listen(0)
sock, addr = server_socket.accept()
# buf_size = 65536: buffered reader, read(40) may wait for all 40 bytes or EOF
infile = os.fdopen(os.dup(sock.fileno()), 'rb', buf_size)
print('got %s' % repr(infile.read(5 * 8)))
time.sleep(10)
" > server.py

echo "
buf_size = 65536
import socket
import os
import time
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('localhost', 54321))
# buf_size = 65536: write() only fills the buffer; nothing is sent until it fills up, flush() or close()
outfile = os.fdopen(os.dup(sock.fileno()), 'wb', buf_size)
outfile.write(b'Hello ')
print('sent %s' % repr(b'Hello '))
time.sleep(10)
" > client.py
```

**Run this in two separate terminals:**

```bash
python server.py
```

```bash
python client.py
```
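The same comparison can also be sketched in a single process, without two terminals, by using `socket.socketpair()` in place of the client/server scripts. This assumes a POSIX system (like the scripts above, it wraps a dup'ed socket descriptor with `os.fdopen`). The helper name `bytes_visible_after_write` and the 0.5-second timeout are hypothetical, illustrative choices, not anything from Spark. It adds a third case, an explicit `flush()` on the buffered writer, which is what this PR proposes.

```python
import os
import select
import socket


def bytes_visible_after_write(buf_size, flush=False, timeout=0.5):
    """Hypothetical helper: write b'Hello ' through a file object with the
    given buffer size and return whatever the peer can read within
    `timeout` seconds (b'' if nothing arrived)."""
    # A connected socket pair stands in for client.py / server.py.
    writer, reader = socket.socketpair()
    try:
        # Same wrapping as client.py: a dup'ed fd opened with an explicit
        # buffer size. Leaving the with-block closes the file, which also
        # flushes anything still buffered (after we have already checked).
        with os.fdopen(os.dup(writer.fileno()), 'wb', buf_size) as outfile:
            outfile.write(b'Hello ')
            if flush:
                outfile.flush()
            # Wait until the reader end is readable, or give up after `timeout`.
            ready, _, _ = select.select([reader], [], [], timeout)
            return reader.recv(1024) if ready else b''
    finally:
        writer.close()
        reader.close()


if __name__ == '__main__':
    for size, do_flush in [(0, False), (65536, False), (65536, True)]:
        got = bytes_visible_after_write(size, do_flush)
        print('buf_size=%d flush=%s -> %r' % (size, do_flush, got))
```

With `buf_size = 0` and with `buf_size = 65536` plus `flush()`, the peer sees `b'Hello '` right away; with `buf_size = 65536` and no flush it sees `b''`, because the 6 bytes are still sitting in the writer's buffer. Both fixes make the data visible immediately; the difference is that with `buf_size = 0` every `write()` becomes its own system call, whereas `flush()` lets small writes batch up between flushes.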
---

This is an automated message from the Apache Git Service.
