BryanCutler commented on a change in pull request #24834:
[WIP][SPARK-27992][PYTHON] Synchronize with Python connection thread to
propagate errors
URL: https://github.com/apache/spark/pull/24834#discussion_r292222080
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2200,10 +2200,13 @@ def _collectAsArrow(self):
.. note:: Experimental.
"""
with SCCallSiteSync(self._sc) as css:
- sock_info = self._jdf.collectAsArrowToPython()
+ port, auth_secret, jserver_obj = self._jdf.collectAsArrowToPython()
# Collect list of un-ordered batches where last element is a list of correct order indices
- results = list(_load_from_socket(sock_info, ArrowCollectSerializer()))
+ from pyspark.rdd import _create_local_socket
Review comment:
This should be cleaned up: the code below basically duplicates `_load_from_socket`.
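
One possible shape for that cleanup, as a minimal standard-library sketch rather than PySpark's actual code: keep the connection setup in a single helper and have every reader go through it. The bodies of `_create_local_socket` and `_load_from_socket` are not shown in this hunk, so the names `create_local_socket`, `load_from_socket`, `serve_once`, and `read_lines` below are hypothetical stand-ins, and the authentication step implied by `auth_secret` is omitted.

```python
import socket
import threading


def create_local_socket(port):
    """Hypothetical shared helper: connect to a local server, return a file object."""
    sock = socket.create_connection(("127.0.0.1", port))
    sock.settimeout(None)  # block on reads; the server decides when the stream ends
    sockfile = sock.makefile("rb", 65536)
    # The file object keeps the connection alive until it is closed too.
    sock.close()
    return sockfile


def load_from_socket(port, load_stream):
    """Reuse the helper instead of repeating the connection setup at each call site."""
    return load_stream(create_local_socket(port))


def serve_once(payload):
    """Test-only stand-in for the JVM side: send `payload` to one local client."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def run():
        conn, _ = server.accept()
        with conn:
            conn.sendall(payload)
        server.close()

    threading.Thread(target=run, daemon=True).start()
    return port


def read_lines(sockfile):
    """Toy deserializer: read newline-delimited records until EOF."""
    with sockfile:
        return [line.rstrip(b"\n") for line in sockfile]


if __name__ == "__main__":
    print(load_from_socket(serve_once(b"a\nb\n"), read_lines))
```

With this split, a caller that needs extra steps around the read (for example, synchronizing with the serving thread afterwards) can call the helper directly and still share the setup code with the generic loader.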