HyukjinKwon commented on a change in pull request #34315:
URL: https://github.com/apache/spark/pull/34315#discussion_r733247234



##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -83,46 +83,43 @@ Note that this installation way of PySpark with/without a specific Hadoop versio
 Using Conda
 -----------
 
-Conda is an open-source package management and environment management system which is a part of
-the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and
-language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/latest/>`_ and
-`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+Conda is an open-source package management and environment management system (developed by
+`Anaconda <https://www.anaconda.com/>`_), which is best installed through
+`Miniconda <https://docs.conda.io/en/latest/miniconda.html/>`_ or `Miniforge <https://github.com/conda-forge/miniforge/>`_.
+The tool is both cross-platform and language agnostic, and in practice, conda can replace both
+`pip <https://pip.pypa.io/en/latest/>`_ and `virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
 
-Create new virtual environment from your terminal as shown below:
+Conda uses so-called channels to distribute packages, and together with the default channels by
+Anaconda itself, the most important channel is `conda-forge <https://conda-forge.org/>`_, which
+is the community-driven packaging effort that is the most extensive & the most current (and also
+serves as the upstream for the Anaconda channels in most cases).
 
-.. code-block:: bash
-
-    conda create -n pyspark_env
-
-After the virtual environment is created, it should be visible under the list of Conda environments
-which can be seen using the following command:
-
-.. code-block:: bash
-
-    conda env list
-
-Now activate the newly created environment with the following command:
+To create a new conda environment from your terminal and activate it, proceed as shown below:
 
 .. code-block:: bash
 
+    conda create -n pyspark_env
     conda activate pyspark_env
 
-You can install pyspark by `Using PyPI <#using-pypi>`_ to install PySpark in the newly created
-environment, for example as below. It will install PySpark under the new virtual environment
-``pyspark_env`` created above.
+After activating the environment, use the following command to install pyspark,
+a python version of your choice, as well as other packages you want to use in
+the same session as pyspark (you can install in several steps too).
 
 .. code-block:: bash
 
-    pip install pyspark
-
-Alternatively, you can install PySpark from Conda itself as below:
+    conda install -c conda-forge pyspark python [other packages]  # can also use python=3.8, etc.

Review comment:
   ```suggestion
       conda install -c conda-forge pyspark  # can also use python=3.8, etc.
   ```

##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -83,46 +83,43 @@ Note that this installation way of PySpark with/without a specific Hadoop versio
 Using Conda
 -----------
 
-Conda is an open-source package management and environment management system which is a part of
-the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and
-language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/latest/>`_ and
-`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+Conda is an open-source package management and environment management system (developed by
+`Anaconda <https://www.anaconda.com/>`_), which is best installed through
+`Miniconda <https://docs.conda.io/en/latest/miniconda.html/>`_ or `Miniforge <https://github.com/conda-forge/miniforge/>`_.
+The tool is both cross-platform and language agnostic, and in practice, conda can replace both
+`pip <https://pip.pypa.io/en/latest/>`_ and `virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
 
-Create new virtual environment from your terminal as shown below:
+Conda uses so-called channels to distribute packages, and together with the default channels by
+Anaconda itself, the most important channel is `conda-forge <https://conda-forge.org/>`_, which
+is the community-driven packaging effort that is the most extensive & the most current (and also
+serves as the upstream for the Anaconda channels in most cases).
 
-.. code-block:: bash
-
-    conda create -n pyspark_env
-
-After the virtual environment is created, it should be visible under the list of Conda environments
-which can be seen using the following command:
-
-.. code-block:: bash
-
-    conda env list
-
-Now activate the newly created environment with the following command:
+To create a new conda environment from your terminal and activate it, proceed as shown below:
 
 .. code-block:: bash
 
+    conda create -n pyspark_env
     conda activate pyspark_env
 
-You can install pyspark by `Using PyPI <#using-pypi>`_ to install PySpark in the newly created
-environment, for example as below. It will install PySpark under the new virtual environment
-``pyspark_env`` created above.
+After activating the environment, use the following command to install pyspark,
+a python version of your choice, as well as other packages you want to use in
+the same session as pyspark (you can install in several steps too).
 
 .. code-block:: bash
 
-    pip install pyspark
-
-Alternatively, you can install PySpark from Conda itself as below:
+    conda install -c conda-forge pyspark python [other packages]  # can also use python=3.8, etc.
 
-.. code-block:: bash
+Note that `PySpark for conda <https://anaconda.org/conda-forge/pyspark>`_ is maintained
+separately by the community; while new versions generally get packaged quickly, the
+availability through conda(-forge) is not directly in sync with the PySpark release cycle.
 
-    conda install pyspark
+While using pip in a conda environment is technically feasible (with the same command as
+`above <#using-pypi>`_), this approach is `discouraged <https://www.anaconda.com/blog/using-pip-in-a-conda-environment/>`_,
+because pip does not interoperate with conda. In particular, pip might install over existing
+(conda-installed) packages and consequently break the functionality of the environment.

Review comment:
   ```suggestion
   because pip does not interoperate with conda.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


