HyukjinKwon commented on code in PR #47434:
URL: https://github.com/apache/spark/pull/47434#discussion_r1734006850


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -859,6 +865,49 @@ object SparkSession extends Logging {
     }
   }
 
+  /**
+   * Create a new Spark Connect server to connect locally.
+   */
+  private[sql] def withLocalConnectServer[T](f: => T): T = {
+    synchronized {
+      val remoteString = initOptions

Review Comment:
   > Can you give an example of what spark.remote is supposed to contain? 
Instead of a Spark Connect string, it seems to contain the Spark master URL. I 
don't really like overloading configuration like this. This is also illustrated 
a little further down, where we are reconfiguring this thing.
   
   The code path is a bit convoluted; a rough sketch follows the list below.
   - `spark.remote` can contain `local[...]`, `local-cluster[...]` and 
`sc://...` only.
   - This specific `withLocalConnectServer` path is taken when `local[...]` or 
`local-cluster[...]` is given (via the condition below, 
`remoteString.exists(_.startsWith("local"))`).
     - After that, we switch it to `sc://localhost` internally when actually 
creating the Spark Connect client.
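   
   For illustration only, here is a minimal Scala sketch of that dispatch. It 
assumes the remote string is carried in the `spark.remote` system property, and 
`startLocalConnectServer` is a hypothetical stand-in for the server bootstrap 
in this PR, not the actual helper.
   
   ```scala
   object LocalConnectLauncher {
     // Hypothetical stand-in for the bootstrap added in this PR; the real code
     // starts a Spark Connect server for the local master.
     private def startLocalConnectServer(master: String): Unit = ()
   
     // Sketch of the dispatch described above, assuming `spark.remote` is read
     // from a system property.
     def withLocalConnectServer[T](f: => T): T = synchronized {
       Option(System.getProperty("spark.remote")) match {
         // `local[...]` / `local-cluster[...]`: start an embedded server, then
         // rewrite the remote so the Connect client dials it over sc://.
         case Some(r) if r.startsWith("local") =>
           startLocalConnectServer(r)
           System.setProperty("spark.remote", "sc://localhost")
         // `sc://...`: nothing to launch, just connect.
         case _ =>
       }
       f
     }
   }
   ```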
   
   > In a perfect world I think we should not start the driver process here. 
IMO that belongs in the launcher. The more pressing downside is that submit now 
only works for the languages we directly support, which conflicts with the 
language agnostic nature of connect. In that case this code can disappear :)...
   
   I totally support getting rid of this concept - in fact, I don't like it 
very much either. We can hide it or make it dev/test-only. The only problem is 
the migration from Spark Classic.
   
   As we discussed offline, I am open to rearranging this, and it would be 
pretty easy, so I don't worry much about it.
   
   > For now, I would just make the launcher set the normal configurations and 
set the spark.remote flag to what it is supposed to be. If the master conf is 
present we will start a driver process. The spark remote could also be 
generated, in Scala/JVM land we know that we are in connect mode if this code 
is being invoked.
   
   This is also related to the Spark Classic migration. Without launching a 
Spark Connect server locally, we can't start the Spark Shell the way Spark 
Classic does, and users would have to turn Spark Connect off.
   
   > If the master conf is present we will start a driver process. The spark 
remote could also be generated, in Scala/JVM land we know that we are in 
connect mode if this code is being invoked.
   
   So, I think this is what we need to define. It would be pretty easy to fix 
once we settle on it.
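   
   Purely as a sketch of that direction (not what this PR implements), the 
launcher-side rule could look roughly like the following; `deriveConnectConf` 
is a hypothetical name:
   
   ```scala
   object LauncherSketch {
     // Sketch: keep `spark.master` untouched and derive `spark.remote` from it,
     // instead of overloading one key with both meanings.
     def deriveConnectConf(conf: Map[String, String]): Map[String, String] =
       if (conf.contains("spark.master")) {
         // Master present: a local driver + Connect server will be started,
         // so the client simply dials it on localhost.
         conf + ("spark.remote" -> "sc://localhost")
       } else {
         // No master: `spark.remote` must already be an sc:// connection string.
         conf
       }
   }
   ```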
   
   > How does this work in PySpark? Do we need a remote for this?
   
   How it's done in PySpark is consistent with the current patch. BTW, for 
clarification, after this change:
   - `spark-submit --remote local*` is still disabled (although the rest of the 
code is ready, it will fail during Spark Connect client creation)
   - `spark-submit --remote "sc://"` works
   - `spark-shell --remote local*` works
   - `spark-shell --remote "sc://"` works
   
   For PySpark, the cases below already work; a minimal client-side sketch 
follows the list:
   - `spark-submit --remote local*` works
   - `spark-submit --remote "sc://"` works
   - `pyspark --remote local*` works
   - `pyspark --remote "sc://"` works
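   
   For reference, once the remote is (re)written to an `sc://` string, the 
Scala side is just the usual Connect client builder call. This is a minimal 
sketch and assumes the default Spark Connect port 15002:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   // Connect to a locally started Spark Connect server (default port 15002).
   val spark = SparkSession.builder()
     .remote("sc://localhost:15002")
     .getOrCreate()
   
   spark.range(5).show()
   ```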
   


