HyukjinKwon commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)
URL: https://github.com/apache/spark/pull/24734#issuecomment-500101639
 
 
   Sorry, I still don't get why we need to flush instead of disabling the buffer (buffer size 0). Did we try https://github.com/apache/spark/pull/24734#issuecomment-498092577?
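
For context on what "disabling the buffer" means at the object level: in Python 3, `os.fdopen(fd, 'wb', 0)` returns a raw `FileIO`, whose `write()` passes bytes directly to the OS, while any positive buffer size wraps the fd in a `BufferedWriter` that holds small writes until `flush()` or `close()`. The following is a minimal sketch, assuming Python 3 on a POSIX system. `socket.socketpair()` stands in for the real worker connection, and `wrap` is a hypothetical helper.

```python
import os
import socket

def wrap(sock, buf_size):
    """Hypothetical helper: wrap a dup'ed socket fd the way client.py does."""
    return os.fdopen(os.dup(sock.fileno()), 'wb', buf_size)

# A connected socket pair stands in for the real worker connection.
a, b = socket.socketpair()
unbuffered = wrap(a, 0)
buffered = wrap(a, 65536)

# buffering=0: a raw FileIO, so each write() goes straight to the OS.
print(type(unbuffered).__name__)  # FileIO
# buffering>0: a BufferedWriter that keeps small writes until flush()/close().
print(type(buffered).__name__)    # BufferedWriter

for f in (unbuffered, buffered):
    f.close()
a.close()
b.close()
```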
   
   **Buffer size 0:**
   
   ```bash
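   echo "
   buf_size = 0  # 0 disables buffering on the file object

   import socket
   import os
   import time

   server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   # Let the port be reused right away when the demo is re-run.
   server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
   server_socket.bind(('localhost', 12345))
   server_socket.listen(0)
   sock, addr = server_socket.accept()
   infile = os.fdopen(os.dup(sock.fileno()), 'rb', buf_size)
   # Ask for 40 bytes; the client only sends 6 (b'Hello ').
   print('got %s' % repr(infile.read(5 * 8)))
   time.sleep(10)
   " > server.py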
   
   echo "
   buf_size = 0  # 0 disables buffering on the file object

   import socket
   import os
   import time

   sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   sock.connect(('localhost', 12345))
   outfile = os.fdopen(os.dup(sock.fileno()), 'wb', buf_size)
   # No flush() here: with buffering disabled, write() hands the bytes to the OS directly.
   outfile.write(b'Hello ')
   print('sent %s' % repr(b'Hello '))
   time.sleep(10)
   " > client.py
   ```
   
   **Buffer size 65536:**
   
   ```bash
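   echo "
   buf_size = 65536  # buffered file object

   import socket
   import os
   import time

   server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   # Let the port be reused right away when the demo is re-run.
   server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
   server_socket.bind(('localhost', 54321))
   server_socket.listen(0)
   sock, addr = server_socket.accept()
   infile = os.fdopen(os.dup(sock.fileno()), 'rb', buf_size)
   # Ask for 40 bytes; the client only sends 6 (b'Hello ').
   print('got %s' % repr(infile.read(5 * 8)))
   time.sleep(10)
   " > server.py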
   
   echo "
   buf_size = 65536  # buffered file object

   import socket
   import os
   import time

   sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   sock.connect(('localhost', 54321))
   outfile = os.fdopen(os.dup(sock.fileno()), 'wb', buf_size)
   # No flush() here: these 6 bytes stay in the buffer until it is flushed or closed.
   outfile.write(b'Hello ')
   print('sent %s' % repr(b'Hello '))
   time.sleep(10)
   " > client.py
   ```
   
   
   **Run the server first, then the client, in two separate terminals:**
   
   ```bash
   python server.py
   ```
   
   ```bash
   python client.py
   ```
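
The same comparison can also be run in a single process, without two terminals. This is a minimal sketch, assuming Python 3 on a POSIX system. `socket.socketpair()` replaces the localhost TCP connection, and `select.select()` with a short timeout checks whether the bytes have reached the reading end; `readable` and `demo` are hypothetical helpers. It covers buffer size 0, buffer size 65536 without a flush, and buffer size 65536 followed by an explicit `flush()` (the approach this PR takes).

```python
import os
import select
import socket

def readable(sock, timeout=0.5):
    """Return True if `sock` has data waiting within `timeout` seconds."""
    r, _, _ = select.select([sock], [], [], timeout)
    return bool(r)

def demo(buf_size, flush):
    """Hypothetical helper: write b'Hello ' and report whether the peer sees it."""
    writer_sock, reader_sock = socket.socketpair()
    # Same wrapping as client.py: a file object over a dup'ed socket fd.
    outfile = os.fdopen(os.dup(writer_sock.fileno()), 'wb', buf_size)
    outfile.write(b'Hello ')
    if flush:
        outfile.flush()
    arrived = readable(reader_sock)
    data = reader_sock.recv(64) if arrived else b''
    outfile.close()  # close() flushes anything still buffered
    writer_sock.close()
    reader_sock.close()
    return arrived, data

for buf_size, flush in [(0, False), (65536, False), (65536, True)]:
    arrived, data = demo(buf_size, flush)
    print('buf_size=%d flush=%s -> arrived=%s data=%r'
          % (buf_size, flush, arrived, data))
```

This should report the data as arrived in the unbuffered and flushed cases, and as not arrived in the buffered case without a flush. The 0.5 s timeout is an arbitrary choice for the sketch.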