[GitHub] [spark] HyukjinKwon commented on pull request #32036: [SPARK-34890][PYTHON] Port/integrate Koalas main codes into PySpark

GitBox Sat, 03 Apr 2021 19:08:43 -0700


HyukjinKwon commented on pull request #32036:
URL: https://github.com/apache/spark/pull/32036#issuecomment-812957328



   Yeah, I have actually thought a lot about this and discussed it offline 
with some other people.
   
   > since we already have pre-existing Pandas integration, having unrelated 
options referring to pandas could be confusing.
   
   This is a very good point. I am thinking about using a different name 
internally, such as pandas-on-Spark. For example, we could have a 
configuration such as `spark.pandas-on-spark.blahblah`.
   
   My current thought is to stick with `pyspark.pandas` because:
   - I checked a few references, such as Modin (probably the most similar case 
to ours), which uses `modin.pandas`.
   - To the best of my knowledge, Spark components generally do not carry 
their own brand names. For example, [Shark](https://github.com/amplab/shark) 
became Spark SQL.
   - Koalas might not be a good name in the long run either (as far as I know, 
it was chosen mostly for branding?). It might be best for the component name 
to describe what the component is.
   
   However, I am open to change and to other names if many people think that 
`pyspark.koalas` or other alternatives are better.


