[GitHub] [spark] BryanCutler commented on a change in pull request #24834: [SPARK-27992][PYTHON] Synchronize with Python connection thread to propagate errors

GitBox Mon, 24 Jun 2019 10:23:44 -0700

BryanCutler commented on a change in pull request #24834: [SPARK-27992][PYTHON] Synchronize with Python connection thread to propagate errors
URL: https://github.com/apache/spark/pull/24834#discussion_r296829988


 ##########
 File path: python/pyspark/rdd.py
 ##########
 @@ -140,14 +140,29 @@ def _parse_memory(s):
 
 
 def _create_local_socket(sock_info):
-    (sockfile, sock) = local_connect_and_auth(*sock_info)
+    """
+    Create a local socket that can be used to load deserialized data from the JVM
+
+    :param sock_info: Tuple containing port number and authentication secret for a local socket.
+    :return: sockfile file descriptor of the local socket
+    """
+    port = sock_info[0]
+    auth_secret = sock_info[1]
+    sockfile, sock = local_connect_and_auth(port, auth_secret)
     # The RDD materialization time is unpredictable, if we set a timeout for socket reading
     # operation, it will very possibly fail. See SPARK-18281.
     sock.settimeout(None)
     return sockfile
 
 
 def _load_from_socket(sock_info, serializer):
 
 Review comment:
   Ugh, yeah, I'm not too happy with this. Java returns a 3-tuple of (port, auth_secret, server), and most places, such as `_load_from_socket`, only use the first two. That gets a little confusing, so I thought it might be better to expand the values returned by Java for `serveToStream` etc., but that led to a lot of changes where the third value is just ignored, like this:
   
   ```python
   port, auth_secret, _ = ...
   ```
   and I don't think that really made things clearer. I'll try to think of something better and maybe do a follow-up.
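
For illustration only, here is a minimal sketch of one way to avoid scattering `port, auth_secret, _ = ...` across call sites: wrap the 3-tuple in a named container and expose just the two values most callers need. `SocketInfo` and `connection_args` are hypothetical names introduced for this example, not PySpark APIs, and this is not necessarily what the follow-up would look like.

```python
from collections import namedtuple

# Hypothetical container for the (port, auth_secret, server) tuple described
# above; the name is illustrative and not part of PySpark.
SocketInfo = namedtuple("SocketInfo", ["port", "auth_secret", "server"])


def connection_args(sock_info):
    """Return only the (port, auth_secret) pair needed to connect locally."""
    info = SocketInfo(*sock_info)
    return info.port, info.auth_secret


# Stand-in values; a real tuple would come from the JVM.
port, auth_secret = connection_args((50123, "not-a-real-secret", object()))
```

Callers that do need the third element could still read it as `SocketInfo(*sock_info).server`, so the ignored-value pattern would stay in one place.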
