nchammas commented on issue #28200:
URL: https://github.com/apache/spark/pull/28200#issuecomment-618525131


   One use case I have for this feature is working around the fact that
`spark.jars` and `spark.driver.extraClassPath` don't support relative paths.
That limitation matters if you use Spark locally to run tests or to execute
small-scale data tasks.
   
   For example, say you have two projects on your local machine that each use 
PySpark.
   
   ```
   project1/
     spark-conf/
       spark-defaults.conf
       jars/
         some.jar
     local-tool1.py
     requirements.txt
   project2/
     spark-conf/
       spark-defaults.conf
       jars/
         another.jar
     local-tool2.py
     requirements.txt
   ```
   
   You install and manage PySpark for each project in a separate Python virtual
environment. The projects need different versions of Spark, and depend on
different jars and packages that Spark needs to be pointed to. You specify all
of that in each project's `spark-defaults.conf`, and set `SPARK_CONF_DIR`
according to which project you're working on.
   
   The problem is that you also want to point Spark at the appropriate jars,
and `spark-defaults.conf` does not help much there, because you must specify
absolute paths for `spark.jars` and `spark.driver.extraClassPath`. The absolute
path to the jars will be different for each developer, since they can clone
these projects to arbitrary directories, and since the absolute path will
likely include their username in some way.
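
   To illustrate why a checked-in value cannot work, here is a minimal Python
sketch that computes the absolute jar paths at runtime. It assumes the
`spark-conf/jars/` layout shown above; the helper name `jars_conf_value` is
hypothetical and not part of Spark.

```python
from pathlib import Path


def jars_conf_value(project_root):
    """Return a comma-separated string of absolute jar paths.

    Assumes the jars live under <project_root>/spark-conf/jars, as in the
    example layout. The result differs per machine, which is why it cannot
    be hard-coded in a shared spark-defaults.conf.
    """
    jars_dir = Path(project_root).resolve() / "spark-conf" / "jars"
    # Sort so the result is stable across runs and file systems.
    return ",".join(str(path) for path in sorted(jars_dir.glob("*.jar")))
```

   The returned string could then be passed to
`SparkSession.builder.config("spark.jars", ...)`, but every tool in every
project would need to do this itself.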
   
   I believe `SPARK_JARS_DIR` solves this problem. We can remove any mention of
`spark.jars` or `spark.driver.extraClassPath` from the defaults files, which
are checked into version control, and instead have people set `SPARK_JARS_DIR`
according to which project they are using.
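
   As a rough sketch of what that per-project setup could look like from
Python: the helper below builds an environment for one project. It assumes
`SPARK_JARS_DIR` is honored the way this pull request proposes, and that the
jars sit under `spark-conf/jars/` as in the layout above. The function name
`spark_env_for` is hypothetical.

```python
import os
from pathlib import Path


def spark_env_for(project_root, base_env=None):
    """Build an environment dict that points Spark at one project.

    Paths are derived from the project's location on this machine, so
    nothing absolute has to be checked into version control.
    """
    root = Path(project_root).resolve()
    env = dict(os.environ if base_env is None else base_env)
    env["SPARK_CONF_DIR"] = str(root / "spark-conf")
    # Assumption: SPARK_JARS_DIR can be overridden as proposed in this PR.
    env["SPARK_JARS_DIR"] = str(root / "spark-conf" / "jars")
    return env
```

   A tool could then be launched with, for example,
`subprocess.run([sys.executable, "local-tool1.py"], env=spark_env_for("project1"))`.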
   
   I know this is not the use case that @marblejenka had in mind, but we have
this problem at my current job. Perhaps this is more an argument for using
Spark via Docker instead of virtual environments, but I wanted to share it for
the record.




