HyukjinKwon commented on code in PR #46096:
URL: https://github.com/apache/spark/pull/46096#discussion_r1569712898


##########
python/docs/source/getting_started/install.rst:
##########
@@ -165,16 +168,109 @@ To install PySpark from source, refer to |building_spark|_.
 
 Dependencies
 ------------
-========================== ========================= ======================================================================================
-Package                    Supported version Note
-========================== ========================= ======================================================================================
-`py4j`                     >=0.10.9.7                Required
-`pandas`                   >=1.4.4                   Required for pandas API on Spark and Spark Connect; Optional for Spark SQL
-`pyarrow`                  >=10.0.0                  Required for pandas API on Spark and Spark Connect; Optional for Spark SQL
-`numpy`                    >=1.21                    Required for pandas API on Spark and MLLib DataFrame-based API; Optional for Spark SQL
-`grpcio`                   >=1.62.0                  Required for Spark Connect
-`grpcio-status`            >=1.62.0                  Required for Spark Connect
-`googleapis-common-protos` >=1.56.4                  Required for Spark Connect
-========================== ========================= ======================================================================================
+
+Required dependencies
+~~~~~~~~~~~~~~~~~~~~~
+
+PySpark requires the following dependencies.
+
+========================== ========================= =======================
+Package                    Supported version         Note
+========================== ========================= =======================
+`py4j`                     >=0.10.9.7                Used to interact with the JVM
+========================== ========================= =======================
+
+Additional libraries that enhance functionality but are not included in the installation packages:
+
+- **memory-profiler**: Used for PySpark UDF memory profiling, ``spark.profile.show(...)`` and ``spark.sql.pyspark.udf.profiler``.
 
 Note that PySpark requires Java 17 or later with ``JAVA_HOME`` properly set and refer to |downloading|_.
+
+
+.. _optional-dependencies:
+
+Optional dependencies
+~~~~~~~~~~~~~~~~~~~~~
+
+PySpark has several optional dependencies that enhance its functionality for specific modules.
+These dependencies are only required for certain features and are not necessary for the basic functionality of PySpark.
+If these optional dependencies are not installed, PySpark will function correctly for basic operations but will raise an ``ImportError``
+when you try to use features that require these dependencies.
+
+Spark Connect
+^^^^^^^^^^^^^
+
+Installable with ``pip install "pyspark[connect]"``.
+
+========================== ================= ====================================================================
+Package                    Supported version Note
+========================== ================= ====================================================================
+`pandas`                   >=1.4.4           Required for Spark Connect.
+`pyarrow`                  >=10.0.0          Crucial for data serialization and network communication efficiency.

Review Comment:
   ```suggestion
   `pyarrow`                  >=10.0.0          Used for data serialization and network communication efficiency.
   ```
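
The deferred-``ImportError`` behavior described in the hunk (basic operations work, the error surfaces only when a feature needing the optional package is used) can be sketched as follows. The helper name `require_optional` is hypothetical, for illustration only, and is not part of the PySpark API:

```python
import importlib


def require_optional(name, extra="connect"):
    """Import an optional dependency lazily, raising ImportError with
    installation guidance only when the feature is actually used."""
    try:
        return importlib.import_module(name)
    except ImportError as e:
        raise ImportError(
            f"{name} is required for this feature; "
            f'install it with: pip install "pyspark[{extra}]"'
        ) from e


# Basic operations proceed without the optional package; the error
# appears only when the dependent feature is invoked.
try:
    require_optional("grpc")  # the module provided by the grpcio package
except ImportError as err:
    print(err)
```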



##########
python/docs/source/getting_started/install.rst:
##########
@@ -165,16 +168,109 @@ To install PySpark from source, refer to |building_spark|_.
 
 Dependencies
 ------------
-========================== ========================= ======================================================================================
-Package                    Supported version Note
-========================== ========================= ======================================================================================
-`py4j`                     >=0.10.9.7                Required
-`pandas`                   >=1.4.4                   Required for pandas API on Spark and Spark Connect; Optional for Spark SQL
-`pyarrow`                  >=10.0.0                  Required for pandas API on Spark and Spark Connect; Optional for Spark SQL
-`numpy`                    >=1.21                    Required for pandas API on Spark and MLLib DataFrame-based API; Optional for Spark SQL
-`grpcio`                   >=1.62.0                  Required for Spark Connect
-`grpcio-status`            >=1.62.0                  Required for Spark Connect
-`googleapis-common-protos` >=1.56.4                  Required for Spark Connect
-========================== ========================= ======================================================================================
+
+Required dependencies
+~~~~~~~~~~~~~~~~~~~~~
+
+PySpark requires the following dependencies.
+
+========================== ========================= =======================
+Package                    Supported version         Note
+========================== ========================= =======================
+`py4j`                     >=0.10.9.7                Used to interact with the JVM
+========================== ========================= =======================
+
+Additional libraries that enhance functionality but are not included in the installation packages:
+
+- **memory-profiler**: Used for PySpark UDF memory profiling, ``spark.profile.show(...)`` and ``spark.sql.pyspark.udf.profiler``.
 
 Note that PySpark requires Java 17 or later with ``JAVA_HOME`` properly set and refer to |downloading|_.
+
+
+.. _optional-dependencies:
+
+Optional dependencies
+~~~~~~~~~~~~~~~~~~~~~
+
+PySpark has several optional dependencies that enhance its functionality for specific modules.
+These dependencies are only required for certain features and are not necessary for the basic functionality of PySpark.
+If these optional dependencies are not installed, PySpark will function correctly for basic operations but will raise an ``ImportError``
+when you try to use features that require these dependencies.
+
+Spark Connect
+^^^^^^^^^^^^^
+
+Installable with ``pip install "pyspark[connect]"``.
+
+========================== ================= ====================================================================
+Package                    Supported version Note
+========================== ================= ====================================================================
+`pandas`                   >=1.4.4           Required for Spark Connect.
+`pyarrow`                  >=10.0.0          Crucial for data serialization and network communication efficiency.
+`grpcio`                   >=1.62.0          Necessary for implementing RPC functionalities in Spark Connect.

Review Comment:
   ```suggestion
   `grpcio`                   >=1.62.0          Used for implementing RPC functionalities in Spark Connect.
   ```
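
As a side note, the minimum versions listed in the tables above can be checked at runtime. The sketch below uses only the standard library; the helper names and the naive version comparison are illustrative assumptions, not PySpark utilities:

```python
from importlib import metadata

# Minimum versions copied from the dependency tables in the diff above.
MINIMUMS = {"grpcio": "1.62.0", "pandas": "1.4.4", "pyarrow": "10.0.0"}


def version_tuple(v):
    # Naive comparison: numeric dot-separated parts; anything else compares as 0.
    return tuple(int(p) if p.isdigit() else 0 for p in v.split("."))


def check(pkg, minimum):
    """Return True if the installed version meets the minimum, False if it is
    too old, and None if the optional package is not installed at all."""
    try:
        installed = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None  # optional dependency absent; the feature is unavailable
    return version_tuple(installed) >= version_tuple(minimum)


for pkg, minimum in MINIMUMS.items():
    print(pkg, check(pkg, minimum))
```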



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

