Re: [PR] [SPARK-47864][PYTHON][DOCS] Enhance "Installation" page to cover all installable options [spark]

via GitHub Wed, 17 Apr 2024 02:05:09 -0700


HyukjinKwon commented on code in PR #46096:
URL: https://github.com/apache/spark/pull/46096#discussion_r1568483558



##########
python/docs/source/getting_started/install.rst:
##########
@@ -165,16 +168,117 @@ To install PySpark from source, refer to |building_spark|_.
 
 Dependencies
 ------------
-========================== ========================= ======================================================================================
-Package                    Supported version Note
-========================== ========================= ======================================================================================
-`py4j`                     >=0.10.9.7                Required
-`pandas`                   >=1.4.4                   Required for pandas API on Spark and Spark Connect; Optional for Spark SQL
-`pyarrow`                  >=10.0.0                  Required for pandas API on Spark and Spark Connect; Optional for Spark SQL
-`numpy`                    >=1.21                    Required for pandas API on Spark and MLLib DataFrame-based API; Optional for Spark SQL
-`grpcio`                   >=1.62.0                  Required for Spark Connect
-`grpcio-status`            >=1.62.0                  Required for Spark Connect
-`googleapis-common-protos` >=1.56.4                  Required for Spark Connect
-========================== ========================= ======================================================================================
+
+Required dependencies
+~~~~~~~~~~~~~~~~~~~~~
+
+PySpark requires the following dependencies.
+
+========================== ========================= ============================================
+Package                    Supported version         Note
+========================== ========================= ============================================
+`py4j`                     >=0.10.9.7                Essential for Python to interface with the
+                                                     Java objects in Spark; ensures seamless
+                                                     interaction between Python and JVM.
+========================== ========================= ============================================
+
+Additional libraries that enhance functionality but are not included in the installation packages:
+
+- **memory-profiler**: Useful for diagnosing and analyzing memory usage in PySpark applications.

Review Comment:
   ```suggestion
   - **memory-profiler**: Used for PySpark UDF memory profiling, ``spark.profile.show(...)`` and ``spark.sql.pyspark.udf.profiler``.
   ```
   
   cc @xinrong-meng 
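
For readers unfamiliar with the two names in the suggestion, the following is a minimal sketch of how they might fit together. It is not taken from the PR. The config key and `spark.profile.show` come from the suggestion above; the `"memory"` config value, the `type="memory"` argument, and the helper names `memory_profiler_installed`, `profile_udf_memory`, and `add_one` are assumptions made for illustration, and the Spark part needs a PySpark version that provides `spark.profile`.

```python
import importlib.util

# Config key quoted in the review suggestion. Setting it to "memory" is an
# assumption about how the UDF memory profiler is switched on.
PROFILER_CONF = "spark.sql.pyspark.udf.profiler"


def memory_profiler_installed() -> bool:
    """Return True if the optional memory-profiler package can be imported."""
    return importlib.util.find_spec("memory_profiler") is not None


def profile_udf_memory(spark) -> None:
    """Run a trivial UDF with memory profiling enabled, then print the result.

    `spark` is expected to be an active SparkSession.
    """
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark.sql.functions import col, udf

    spark.conf.set(PROFILER_CONF, "memory")

    @udf(returnType="long")
    def add_one(x):  # hypothetical UDF, for illustration only
        return x + 1

    # Execute the UDF so the profiler has something to record.
    spark.range(10).select(add_one(col("id"))).collect()

    # Print the collected memory profiles for the UDFs run in this session.
    spark.profile.show(type="memory")
```

With an active `SparkSession` named `spark`, one would check `memory_profiler_installed()` first and then call `profile_udf_memory(spark)`. The check is kept separate so that a missing `memory-profiler` package can be reported before any Spark job is started.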



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
