nchammas commented on issue #28200:
URL: https://github.com/apache/spark/pull/28200#issuecomment-618525131
One use case I have for this feature is working around the fact that
`spark.jars` and `spark.driver.extraClassPath` don't support relative paths.
The feature is handy if you use Spark locally to run tests or to execute
small-scale data tasks.
For example, say you have two projects on your local machine that each use
PySpark.
```
project1/
    spark-conf/
        spark-defaults.conf
    jars/
        some.jar
    local-tool1.py
    requirements.txt
project2/
    spark-conf/
        spark-defaults.conf
    jars/
        another.jar
    local-tool2.py
    requirements.txt
```
You install/manage PySpark for each project via separate Python virtual
environments. The projects need different versions of Spark, and depend on
different jars and packages that Spark needs to be pointed to. You specify all
of that via each project's spark-defaults.conf, and set `SPARK_CONF_DIR`
depending on what project you're working on.
The problem is pointing Spark at the appropriate jars. `spark-defaults.conf`
is not much help here, since you must specify absolute paths for `spark.jars`
and `spark.driver.extraClassPath`. The absolute path to the jars will be
different for each developer: they can clone these projects to arbitrary
directories, and the path will likely include their username in some way.
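To make the absolute-path problem concrete, here is a minimal sketch of the kind of workaround each project would otherwise need: resolving the jars directory at runtime and building a comma-separated value from it. The helper name `jars_conf_value` and the `jars` subdirectory default are hypothetical, chosen to match the layout above; this is not something Spark provides.

```python
from pathlib import Path


def jars_conf_value(project_root: str, jars_subdir: str = "jars") -> str:
    """Build a comma-separated list of absolute jar paths for one project.

    Hypothetical helper: it only assembles a string. How that string is
    handed to Spark is up to the caller.
    """
    # resolve() turns a relative project root into this machine's absolute path
    jars_dir = Path(project_root).resolve() / jars_subdir
    # sorted() keeps the value stable regardless of directory listing order
    return ",".join(str(p) for p in sorted(jars_dir.glob("*.jar")))
```

The result could then be set as `spark.jars` when the session is created, for example through `SparkSession.builder.config(...)`. That moves per-machine path handling into every tool's startup code, which is the duplication this comment is trying to avoid.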
I believe `SPARK_JARS_DIR` solves this problem. We can remove any mention of
`spark.jars` or `spark.driver.extraClassPath` from the defaults files, which
are checked in to version control, and instead have people set `SPARK_JARS_DIR`
depending on what project they are using.
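As a rough sketch of what "set `SPARK_JARS_DIR` depending on the project" could look like in practice, the helper below derives both variables from a project root. The function name `spark_env_for` is hypothetical, the `spark-conf` and `jars` directory names come from the layout above, and the sketch assumes `SPARK_JARS_DIR` is honored in the way this comment describes.

```python
import os
from pathlib import Path


def spark_env_for(project_root: str) -> dict:
    """Return a copy of the current environment with per-project Spark settings.

    Hypothetical helper; the directory names follow the example layout.
    """
    root = Path(project_root).resolve()
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    env["SPARK_CONF_DIR"] = str(root / "spark-conf")
    # Assumption: SPARK_JARS_DIR is picked up as discussed in this PR.
    env["SPARK_JARS_DIR"] = str(root / "jars")
    return env
```

A tool could then be launched with something like `subprocess.run([sys.executable, "local-tool1.py"], env=spark_env_for("project1"))`, so nothing machine-specific has to be checked in to version control.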
I know this is not the use case that @marblejenka had in mind, but we have
this problem at my current job. Perhaps this is more an argument for using
Spark via Docker instead of virtual environments, but I wanted to share it
for the record.