HyukjinKwon commented on issue #24234: [WIP][SPARK_26022][PYTHON][DOCS] PySpark Comparison with Pandas
URL: https://github.com/apache/spark/pull/24234#issuecomment-477447661

Hi, @gatorsmile, @BryanCutler, @ueshin, @rxin, @viirya, @thunterdb

I realised the difference is too vast, so I tried to narrow it down to:

1. Describing fundamental differences
2. Common DataFrame related API usages
3. Notable differences

A few concerns from me:

- This has to describe both in detail in order to compare them, and those details can change soon in both Pandas and PySpark. I tried to avoid details that are likely to change soon.
- Since it is a comparison, it's very easy for me to be biased towards one side. I tried my best to avoid this too.
- It's too vast to compare. At a high level, both are similar; in the details, however, many things are completely different.
