itholic commented on PR #40525:
URL: https://github.com/apache/spark/pull/40525#issuecomment-1501047078

   Thank you for the feedback, @bjornjorgensen !
   
   IMHO, it seems more reasonable to add `grpcio` as a dependency for the 
Pandas API on Spark rather than reverting all of these changes (oh, it seems 
you already opened https://github.com/apache/spark/pull/40716 for this? 😄)
   
   The purpose of Spark Connect is to let users run their existing PySpark 
projects through a remote client without any code changes. Therefore, if a 
user's existing code uses the `pyspark.pandas` module, it should work the same 
way through the remote client as well.
   
   I think we should support as much of PySpark's functionality as possible, 
including the Pandas API on Spark, since at this point we cannot be sure 
whether existing PySpark users (not only existing pandas users) will use the 
Pandas API on Spark through Spark Connect.
   
   Alternatively, we could create a completely separate package path for the 
Pandas API on Spark under Spark Connect. That would allow the existing Pandas 
API on Spark to be used without installing `grpcio`, but it would involve far 
more overhead than simply changing the policy to add one package as an 
additional installation.
   
   WDYT? also cc @HyukjinKwon @grundprinzip @ueshin @zhengruifeng FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

