HyukjinKwon commented on a change in pull request #32835:
URL: https://github.com/apache/spark/pull/32835#discussion_r648793340
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -5,23 +5,23 @@ Best Practices
Leverage PySpark APIs
---------------------
-Koalas uses Spark under the hood; therefore, many features and performance optimization are available
-in Koalas as well. Leverage and combine those cutting-edge features with Koalas.
+Pandas APIs on Spark uses Spark under the hood; therefore, many features and performance optimization are available
+in pandas APIs on Spark as well. Leverage and combine those cutting-edge features with pandas APIs on Spark.
-Existing Spark context and Spark sessions are used out of the box in Koalas. If you already have your own
-configured Spark context or sessions running, Koalas uses them.
+Existing Spark context and Spark sessions are used out of the box in pandas APIs on Spark. If you already have your own
+configured Spark context or sessions running, pandas APIs on Spark uses them.
If there is no Spark context or session running in your environment (e.g., ordinary Python interpreter),
such configurations can be set to ``SparkContext`` and/or ``SparkSession``.
-Once Spark context and/or session is created, Koalas can use this context and/or session automatically.
+Once Spark context and/or session is created, pandas APIs on Spark can use this context and/or session automatically.
For example, if you want to configure the executor memory in Spark, you can do as below:
.. code-block:: python
from pyspark import SparkConf, SparkContext
conf = SparkConf()
conf.set('spark.executor.memory', '2g')
- # Koalas automatically uses this Spark context with the configurations set.
+ # Pandas APIs on Spark automatically uses this Spark context with the configurations set.
Review comment:
```suggestion
# Pandas APIs on Spark automatically use this Spark context with the configurations set.
```
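FWIW, a minimal runnable sketch of the flow this snippet documents, assuming a plain Python interpreter with `pyspark` installed (the `ks.range` call and alias are illustrative, matching the `import pyspark.pandas as ks` used later in this doc):

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.set('spark.executor.memory', '2g')
# Create the context before any pandas-on-Spark call so the
# configuration above is picked up automatically.
SparkContext(conf=conf)

import pyspark.pandas as ks  # noqa: E402

# This computation now runs on the context configured above.
kdf = ks.range(10)
```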
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -33,23 +33,23 @@ it can be set into Spark session as below:
.. code-block:: python
from pyspark.sql import SparkSession
- builder = SparkSession.builder.appName("Koalas")
+ builder = SparkSession.builder.appName("pandas-on-spark")
builder = builder.config("spark.sql.execution.arrow.enabled", "true")
- # Koalas automatically uses this Spark session with the configurations set.
+ # Pandas APIs on Spark automatically uses this Spark session with the configurations set.
Review comment:
```suggestion
# Pandas APIs on Spark automatically use this Spark session with the configurations set.
```
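For reference, a minimal sketch of the full session flow this snippet belongs to (the trailing `ks.DataFrame` call is illustrative):

```python
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName("pandas-on-spark")
builder = builder.config("spark.sql.execution.arrow.enabled", "true")
# getOrCreate() materializes the configured session; pandas APIs on
# Spark reuse it automatically from this point on.
builder.getOrCreate()

import pyspark.pandas as ks  # noqa: E402

kdf = ks.DataFrame({'a': [1, 2, 3]})  # runs on the Arrow-enabled session
```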
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -33,23 +33,23 @@ it can be set into Spark session as below:
.. code-block:: python
from pyspark.sql import SparkSession
- builder = SparkSession.builder.appName("Koalas")
+ builder = SparkSession.builder.appName("pandas-on-spark")
builder = builder.config("spark.sql.execution.arrow.enabled", "true")
- # Koalas automatically uses this Spark session with the configurations set.
+ # Pandas APIs on Spark automatically uses this Spark session with the configurations set.
builder.getOrCreate()
import pyspark.pandas as ks
...
-All Spark features such as history server, web UI and deployment modes can be used as are with Koalas.
+All Spark features such as history server, web UI and deployment modes can be used as are with pandas APIs on Spark.
If you are interested in performance tuning, please see also `Tuning Spark <https://spark.apache.org/docs/latest/tuning.html>`_.
Check execution plans
---------------------
Expensive operations can be predicted by leveraging PySpark API `DataFrame.spark.explain()`
-before the actual computation since Koalas is based on lazy execution. For example, see below.
+before the actual computation since pandas APIs on Spark is based on lazy execution. For example, see below.
Review comment:
```suggestion
before the actual computation since pandas APIs on Spark are based on lazy execution. For example, see below.
```
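As a quick, illustrative sketch of the check being described (the frame and filter are made up):

```python
import pyspark.pandas as ks

kdf = ks.DataFrame({'id': range(10)})
kdf = kdf[kdf.id > 5]
# Nothing has been computed yet because of lazy execution; inspect
# the plan before triggering the actual job.
kdf.spark.explain()
```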
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -65,14 +65,14 @@ before the actual computation since Koalas is based on lazy execution. For examp
Whenever you are not sure about such cases, you can check the actual execution plans and
foresee the expensive cases.
-Even though Koalas tries its best to optimize and reduce such shuffle operations by leveraging Spark
+Even though pandas APIs on Spark tries its best to optimize and reduce such shuffle operations by leveraging Spark
Review comment:
```suggestion
Even though pandas APIs on Spark try its best to optimize and reduce such shuffle operations by leveraging Spark
```
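To make the shuffle point concrete, a hedged sketch (data and group-by are illustrative); the aggregation typically surfaces as an `Exchange` node in the plan:

```python
import pyspark.pandas as ks

kdf = ks.DataFrame({'id': [0, 1, 0, 1], 'v': [1.0, 2.0, 3.0, 4.0]})
# Group-by aggregations generally introduce a shuffle, visible as an
# Exchange node in the physical plan printed below.
kdf.groupby('id').sum().spark.explain()
```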
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -157,14 +157,14 @@ as it is less expensive because data can be distributed and computed for each gr
Avoid reserved column names
---------------------------
-Columns with leading ``__`` and trailing ``__`` are reserved in Koalas. To handle internal behaviors for, such as, index,
-Koalas uses some internal columns. Therefore, it is discouraged to use such column names and not guaranteed to work.
+Columns with leading ``__`` and trailing ``__`` are reserved in pandas APIs on Spark. To handle internal behaviors for, such as, index,
+pandas APIs on Spark uses some internal columns. Therefore, it is discouraged to use such column names and not guaranteed to work.
Do not use duplicated column names
----------------------------------
-It is disallowed to use duplicated column names because Spark SQL does not allow this in general. Koalas inherits
+It is disallowed to use duplicated column names because Spark SQL does not allow this in general. Pandas APIs on Spark inherits
Review comment:
```suggestion
It is disallowed to use duplicated column names because Spark SQL does not allow this in general. Pandas APIs on Spark inherit
```
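For reference, the failure mode being documented can be reproduced with a sketch like this (the rename is illustrative):

```python
import pyspark.pandas as ks

kdf = ks.DataFrame({'a': [1, 2], 'b': [3, 4]})
# Renaming 'b' to 'a' creates duplicated column names, which Spark SQL
# rejects, e.g.: Reference 'a' is ambiguous, could be: a, a.
kdf.rename(columns={'b': 'a'})
```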
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -175,7 +175,7 @@ this behavior. For instance, see below:
...
Reference 'a' is ambiguous, could be: a, a.;
-Additionally, it is strongly discouraged to use case sensitive column names. Koalas disallows it by default.
+Additionally, it is strongly discouraged to use case sensitive column names. Pandas APIs on Spark disallows it by default.
Review comment:
```suggestion
Additionally, it is strongly discouraged to use case sensitive column names. Pandas APIs on Spark disallow it by default.
```
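If case-sensitive names are genuinely needed, Spark's `spark.sql.caseSensitive` option can be enabled before the session is created; a minimal sketch:

```python
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName("pandas-on-spark")
# Opt in to case-sensitive column resolution; it is off by default.
builder = builder.config("spark.sql.caseSensitive", "true")
builder.getOrCreate()

import pyspark.pandas as ks  # noqa: E402

kdf = ks.DataFrame({'a': [1, 2], 'A': [3, 4]})  # allowed only when enabled
```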
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]