[GitHub] [spark] HyukjinKwon commented on pull request #40525: [SPARK-42859][CONNECT][PS] Basic support for pandas API on Spark Connect

via GitHub Sun, 09 Apr 2023 17:58:29 -0700


HyukjinKwon commented on PR #40525:
URL: https://github.com/apache/spark/pull/40525#issuecomment-1501261686


   > Is it so hard to add the dependency for grpc when using the pandas API?
   
   It's not super hard. But it's a bit odd to add this to pandas API on Spark 
alone. We should probably think about adding grpcio as a hard dependency for 
the whole PySpark project, but definitely not for pandas API on Spark alone.
   
   > What are we achieving with this? GRPC is a stable protocol and not a 
random library. It's available throughout all platforms.
   > What's the benefit of trying this pure approach?
   
   So for the current status, we're trying to add only the dependencies that 
each module needs, so users won't need to install unnecessary dependencies. In 
addition, adding a hard dependency breaks existing applications when they 
migrate from 3.4 to 3.5. This matters when PySpark is installed without `pip` 
(the non-`pip` route is actually the official release channel of Apache Spark).
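
As a rough illustration of the per-module approach described above, an optional dependency can be checked only at the point where a feature needs it, instead of being required at install time. This is a minimal sketch, not the code from this PR: the helper name `require_optional_dependency` and its message text are hypothetical, and it only handles top-level module names.

```python
import importlib.util


def require_optional_dependency(module_name: str, feature: str) -> None:
    """Raise ImportError with a helpful message if `module_name` is missing.

    Hypothetical helper for illustration; `module_name` is assumed to be a
    top-level module name such as "grpc".
    """
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'; "
            "install it to use this feature."
        )


def start_connect_feature() -> None:
    # The check runs only when this feature is used, so users who never
    # call it are not forced to install the extra package.
    require_optional_dependency("grpc", "Spark Connect support (illustrative)")
```

With this pattern, an existing application that never touches the feature keeps working after an upgrade even if the extra package is absent.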
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

