HyukjinKwon opened a new pull request, #39463:
URL: https://github.com/apache/spark/pull/39463

   ### What changes were proposed in this pull request?
   
This PR mainly proposes to pass user-specified configurations through to the local 
remote mode.
   
   Previously, all user-specified configurations were ignored when using the PySpark 
shell (e.g., `./bin/pyspark`) or a plain Python interpreter; the PySpark application 
submission case was fine.
   
   Now, configurations are properly passed to the server side. For example, with 
`./bin/pyspark --remote local --conf aaa=bbb`, `aaa=bbb` is properly passed 
to the server side.
   
   For `spark.master` and `spark.plugins`, user-specified configurations are 
respected. If they are unset, they are set automatically (e.g., `spark.plugins` 
defaults to `org.apache.spark.sql.connect.SparkConnectPlugin`). If users set them, 
they have to provide the proper values themselves to overwrite the defaults, 
meaning either:
   
   ```bash
   ./bin/pyspark --remote local --conf 
spark.plugins="other.Plugin,org.apache.spark.sql.connect.SparkConnectPlugin"
   ```
   
   or
   
   ```bash
   ./bin/pyspark --remote local
   ```
   
   In addition, this PR fixes the related code as below:
   - Adds a `spark.local.connect` internal configuration to be used in Spark 
Submit (so we don't have to parse and manipulate user-specified arguments in 
Python in order to remove the `--remote` option or the `spark.remote` configuration).
   - Adds more validation of arguments in `SparkSubmitCommandBuilder` so 
invalid combinations fail fast (e.g., setting both remote and master, as in 
`--master ...` together with `--conf spark.remote=...`).
   - In dev mode, no longer sets `spark.jars`, since `addJarToCurrentClassLoader` 
already adds the jars to the classpath of the JVM.
   
   ### Why are the changes needed?
   
   To correctly pass the configurations specified from users.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, Spark Connect has not been released yet.
   This is kind of a followup of https://github.com/apache/spark/pull/39441 to 
complete its support.
   
   ### How was this patch tested?
   
   Manually tested combinations such as:
   
   ```bash
   ./bin/pyspark --conf spark.remote=local
   ./bin/pyspark --conf spark.remote=local --conf spark.jars=a
   ./bin/pyspark --conf spark.remote=local --jars 
/.../spark/connector/connect/server/target/scala-2.12/spark-connect-assembly-3.4.0-SNAPSHOT.jar
   ./bin/spark-submit --conf spark.remote=local --jars 
/.../spark/connector/connect/server/target/scala-2.12/spark-connect-assembly-3.4.0-SNAPSHOT.jar
 app.py
   ./bin/pyspark --conf spark.remote=local --conf 
spark.jars=/.../spark/connector/connect/server/target/scala-2.12/spark-connect-assembly-3.4.0-SNAPSHOT.jar
   ./bin/pyspark --master "local[*]" --remote "local"
   ./bin/spark-submit --conf spark.remote=local app.py
   ./bin/spark-submit --master="local[*]" --conf spark.remote=local app.py
   ./bin/spark-submit --master="local[*]" --remote=local app.py
   ./bin/pyspark --master "local[*]" --conf spark.remote="local"
   ./bin/pyspark --remote local
   ```
   