GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r193673500
--- Diff: docs/submitting-applications.md ---
@@ -218,6 +218,115 @@ These commands can be used with `pyspark`, `spark-shell`, and `spark-submit` to
For Python, the equivalent `--py-files` option can be used to distribute `.egg`, `.zip` and `.py` libraries to executors.
+# VirtualEnv for PySpark (This is an experimental feature and may evolve in future versions)
+For a simple PySpark application, we can use `--py-files` to add its dependencies. For a large PySpark application,
+however, you will usually have many dependencies, which may in turn have transitive dependencies, and some dependencies
+may even need to be compiled before they can be installed. In this case `--py-files` is not so convenient. Luckily, in
+the Python world we have virtualenv/conda to help create an isolated Python working environment. We also implement
+virtualenv in PySpark (it is only supported in YARN mode for now). Users can use this feature in two scenarios:
+* Batch mode (submit a Spark app via spark-submit)
+* Interactive mode (PySpark shell or other third-party Spark notebooks)
+
--- End diff --
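For context, the plain `--py-files` flow that this new section contrasts against looks roughly like the sketch below; the archive and application names are placeholders, not anything from the PR.
```
# Package pure-Python dependencies into an archive (names are placeholders).
zip -r deps.zip mypackage/

# Batch mode: --py-files takes comma-separated .zip, .egg and .py files
# and distributes them to the executors alongside the application.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --py-files deps.zip,extra_lib.egg \
  my_app.py
```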
Ah, maybe we can leave a note at the end instead of adding it in the title:
```
Note that this is an experimental feature added in Spark 2.4.0 and may evolve in future versions.
```
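For anyone who wants to try the feature while it is still in review, batch-mode usage would look roughly like the sketch below. The `spark.pyspark.virtualenv.*` keys are the ones proposed in this PR and may change before it lands; `requirements.txt` and the virtualenv binary path are placeholders for your own environment.
```
# Sketch of a virtualenv-enabled submission (experimental; the proposed
# configuration keys may still change):
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.pyspark.virtualenv.enabled=true \
  --conf spark.pyspark.virtualenv.type=native \
  --conf spark.pyspark.virtualenv.requirements=requirements.txt \
  --conf spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv \
  my_app.py
```
In interactive mode, the same configuration would presumably be passed when launching the PySpark shell instead of spark-submit.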