zhengruifeng commented on code in PR #38468:
URL: https://github.com/apache/spark/pull/38468#discussion_r1013560372


##########
python/pyspark/sql/connect/client.py:
##########
@@ -182,6 +191,10 @@ def _to_pandas(self, plan: pb2.Plan) -> Optional[pandas.DataFrame]:
         req = pb2.Request()
         req.user_context.user_id = self._user_id
         req.plan.CopyFrom(plan)
+        if self.has_arrow:
+            req.preferred_result_type = pb2.Request.ArrowBatch
+        else:
+            req.preferred_result_type = pb2.Request.JSONBatch

Review Comment:
   I noticed that pyspark checks whether the schema is supported 
(https://github.com/apache/spark/blob/master/python/pyspark/sql/pandas/conversion.py#L102).
   I moved this check to the server side, since it needs the schema.
   
   ~~Maybe we can keep it as a fallback, in case JSON supports more data types 
(not sure about this).~~
   
   Yes, we always prefer Arrow, but the client may get JSON batches from the 
server if the schema is not supported.
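
   To illustrate the fallback discussed above, here is a minimal sketch of the 
server-side decision: honor the client's preferred result type, but downgrade 
Arrow to JSON when the result schema contains a type Arrow conversion cannot 
handle. The helper name, the type-name strings, and the supported-type set are 
all hypothetical; the real check lives in 
`python/pyspark/sql/pandas/conversion.py` and the real enum values are 
`pb2.Request.ArrowBatch` / `pb2.Request.JSONBatch` from the diff above.

   ```python
   # Hypothetical subset of Spark SQL type names that Arrow conversion
   # supports; the actual check inspects the schema's data types.
   ARROW_SUPPORTED_TYPES = {"integer", "long", "double", "string", "boolean"}


   def choose_result_type(preferred: str, schema_types: list) -> str:
       """Return the batch format the server would send: keep the client's
       preference, but fall back from ArrowBatch to JSONBatch when any
       column type in the schema is not Arrow-supported."""
       if preferred == "ArrowBatch" and all(
           t in ARROW_SUPPORTED_TYPES for t in schema_types
       ):
           return "ArrowBatch"
       return "JSONBatch"


   # Client prefers Arrow and the schema is fully supported: Arrow is used.
   print(choose_result_type("ArrowBatch", ["long", "string"]))
   # An unsupported column type makes the server fall back to JSON batches.
   print(choose_result_type("ArrowBatch", ["long", "map"]))
   ```

   This keeps the client code in the diff unchanged: it always states a 
preference, and the server alone decides whether that preference can be 
honored for a given schema.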



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

