HyukjinKwon commented on a change in pull request #24834: [SPARK-27992][PYTHON]
Synchronize with Python connection thread to propagate errors
URL: https://github.com/apache/spark/pull/24834#discussion_r296549441
##########
File path: python/pyspark/rdd.py
##########
@@ -140,14 +140,29 @@ def _parse_memory(s):
def _create_local_socket(sock_info):
- (sockfile, sock) = local_connect_and_auth(*sock_info)
+ """
+ Create a local socket that can be used to load deserialized data from the JVM
+
+ :param sock_info: Tuple containing port number and authentication secret for a local socket.
+ :return: sockfile file descriptor of the local socket
+ """
+ port = sock_info[0]
+ auth_secret = sock_info[1]
+ sockfile, sock = local_connect_and_auth(port, auth_secret)
# The RDD materialization time is unpredictable, if we set a timeout for socket reading
# operation, it will very possibly fail. See SPARK-18281.
sock.settimeout(None)
return sockfile
def _load_from_socket(sock_info, serializer):
Review comment:
@BryanCutler, what is `sock_info` expected to be? It seems it can be either a
2-tuple or a 3-tuple (with server).