HyukjinKwon commented on a change in pull request #32835:
URL: https://github.com/apache/spark/pull/32835#discussion_r648793340
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -5,23 +5,23 @@ Best Practices
Leverage PySpark APIs
---------------------
-Koalas uses Spark under the hood; therefore, many features and performance optimization are available
-in Koalas as well. Leverage and combine those cutting-edge features with Koalas.
+Pandas APIs on Spark uses Spark under the hood; therefore, many features and performance optimization are available
+in pandas APIs on Spark as well. Leverage and combine those cutting-edge features with pandas APIs on Spark.
-Existing Spark context and Spark sessions are used out of the box in Koalas. If you already have your own
-configured Spark context or sessions running, Koalas uses them.
+Existing Spark context and Spark sessions are used out of the box in pandas APIs on Spark. If you already have your own
+configured Spark context or sessions running, pandas APIs on Spark uses them.
If there is no Spark context or session running in your environment (e.g., ordinary Python interpreter),
such configurations can be set to ``SparkContext`` and/or ``SparkSession``.
-Once Spark context and/or session is created, Koalas can use this context and/or session automatically.
+Once Spark context and/or session is created, pandas APIs on Spark can use this context and/or session automatically.
For example, if you want to configure the executor memory in Spark, you can do as below:
.. code-block:: python
from pyspark import SparkConf, SparkContext
conf = SparkConf()
conf.set('spark.executor.memory', '2g')
- # Koalas automatically uses this Spark context with the configurations set.
+ # Pandas APIs on Spark automatically uses this Spark context with the configurations set.
Review comment:
```suggestion
# Pandas APIs on Spark automatically use this Spark context with the configurations set.
```
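FWIW, a minimal runnable sketch of the flow this snippet documents, assuming a plain Python interpreter with `pyspark` installed (the `ks.range` call and alias are illustrative, matching the `import pyspark.pandas as ks` used later in this doc):

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.set('spark.executor.memory', '2g')
# Create the context before any pandas-on-Spark call so the
# configuration above is picked up automatically.
SparkContext(conf=conf)

import pyspark.pandas as ks  # noqa: E402

# This computation now runs on the context configured above.
kdf = ks.range(10)
```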
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -33,23 +33,23 @@ it can be set into Spark session as below:
.. code-block:: python
from pyspark.sql import SparkSession
- builder = SparkSession.builder.appName("Koalas")
+ builder = SparkSession.builder.appName("pandas-on-spark")
builder = builder.config("spark.sql.execution.arrow.enabled", "true")
- # Koalas automatically uses this Spark session with the configurations set.
+ # Pandas APIs on Spark automatically uses this Spark session with the configurations set.
Review comment:
```suggestion
# Pandas APIs on Spark automatically use this Spark session with the configurations set.
```
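For reference, a minimal sketch of the full session flow this snippet belongs to (the trailing `ks.DataFrame` call is illustrative):

```python
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName("pandas-on-spark")
builder = builder.config("spark.sql.execution.arrow.enabled", "true")
# getOrCreate() materializes the configured session; pandas APIs on
# Spark reuse it automatically from this point on.
builder.getOrCreate()

import pyspark.pandas as ks  # noqa: E402

kdf = ks.DataFrame({'a': [1, 2, 3]})  # runs on the Arrow-enabled session
```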
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -33,23 +33,23 @@ it can be set into Spark session as below:
.. code-block:: python
from pyspark.sql import SparkSession
- builder = SparkSession.builder.appName("Koalas")
+ builder = SparkSession.builder.appName("pandas-on-spark")
builder = builder.config("spark.sql.execution.arrow.enabled", "true")
- # Koalas automatically uses this Spark session with the configurations set.
+ # Pandas APIs on Spark automatically uses this Spark session with the configurations set.
builder.getOrCreate()
import pyspark.pandas as ks
...
-All Spark features such as history server, web UI and deployment modes can be used as are with Koalas.
+All Spark features such as history server, web UI and deployment modes can be used as are with pandas APIs on Spark.
If you are interested in performance tuning, please see also `Tuning Spark <https://spark.apache.org/docs/latest/tuning.html>`_.
Check execution plans
---------------------
Expensive operations can be predicted by leveraging PySpark API `DataFrame.spark.explain()`
-before the actual computation since Koalas is based on lazy execution. For example, see below.
+before the actual computation since pandas APIs on Spark is based on lazy execution. For example, see below.
Review comment:
```suggestion
before the actual computation since pandas APIs on Spark are based on lazy execution. For example, see below.
```
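As a quick, illustrative sketch of the check being described (the frame and filter are made up):

```python
import pyspark.pandas as ks

kdf = ks.DataFrame({'id': range(10)})
kdf = kdf[kdf.id > 5]
# Nothing has been computed yet because of lazy execution; inspect
# the plan before triggering the actual job.
kdf.spark.explain()
```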
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -65,14 +65,14 @@ before the actual computation since Koalas is based on lazy execution. For examp
Whenever you are not sure about such cases, you can check the actual execution plans and
foresee the expensive cases.
-Even though Koalas tries its best to optimize and reduce such shuffle operations by leveraging Spark
+Even though pandas APIs on Spark tries its best to optimize and reduce such shuffle operations by leveraging Spark
Review comment:
```suggestion
Even though pandas APIs on Spark try its best to optimize and reduce such shuffle operations by leveraging Spark
```
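To make the shuffle point concrete, a hedged sketch (data and group-by are illustrative); the aggregation typically surfaces as an `Exchange` node in the plan:

```python
import pyspark.pandas as ks

kdf = ks.DataFrame({'id': [0, 1, 0, 1], 'v': [1.0, 2.0, 3.0, 4.0]})
# Group-by aggregations generally introduce a shuffle, visible as an
# Exchange node in the physical plan printed below.
kdf.groupby('id').sum().spark.explain()
```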
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -157,14 +157,14 @@ as it is less expensive because data can be distributed and computed for each gr
Avoid reserved column names
---------------------------
-Columns with leading ``__`` and trailing ``__`` are reserved in Koalas. To handle internal behaviors for, such as, index,
-Koalas uses some internal columns. Therefore, it is discouraged to use such column names and not guaranteed to work.
+Columns with leading ``__`` and trailing ``__`` are reserved in pandas APIs on Spark. To handle internal behaviors for, such as, index,
+pandas APIs on Spark uses some internal columns. Therefore, it is discouraged to use such column names and not guaranteed to work.
Do not use duplicated column names
----------------------------------
-It is disallowed to use duplicated column names because Spark SQL does not allow this in general. Koalas inherits
+It is disallowed to use duplicated column names because Spark SQL does not allow this in general. Pandas APIs on Spark inherits
Review comment:
```suggestion
It is disallowed to use duplicated column names because Spark SQL does not allow this in general. Pandas APIs on Spark inherit
```
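For reference, the failure mode being documented can be reproduced with a sketch like this (the rename is illustrative):

```python
import pyspark.pandas as ks

kdf = ks.DataFrame({'a': [1, 2], 'b': [3, 4]})
# Renaming 'b' to 'a' creates duplicated column names, which Spark SQL
# rejects, e.g.: Reference 'a' is ambiguous, could be: a, a.
kdf.rename(columns={'b': 'a'})
```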
##########
File path: python/docs/source/user_guide/pandas_on_spark/best_practices.rst
##########
@@ -175,7 +175,7 @@ this behavior. For instance, see below:
...
Reference 'a' is ambiguous, could be: a, a.;
-Additionally, it is strongly discouraged to use case sensitive column names. Koalas disallows it by default.
+Additionally, it is strongly discouraged to use case sensitive column names. Pandas APIs on Spark disallows it by default.
Review comment:
```suggestion
Additionally, it is strongly discouraged to use case sensitive column names. Pandas APIs on Spark disallow it by default.
```
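If case-sensitive names are genuinely needed, Spark's `spark.sql.caseSensitive` option can be enabled before the session is created; a minimal sketch:

```python
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName("pandas-on-spark")
# Opt in to case-sensitive column resolution; it is off by default.
builder = builder.config("spark.sql.caseSensitive", "true")
builder.getOrCreate()

import pyspark.pandas as ks  # noqa: E402

kdf = ks.DataFrame({'a': [1, 2], 'A': [3, 4]})  # allowed only when enabled
```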
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]