HyukjinKwon commented on code in PR #47434:
URL: https://github.com/apache/spark/pull/47434#discussion_r1734021012
##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala:
##########
@@ -859,6 +865,49 @@ object SparkSession extends Logging {
}
}
+  /**
+   * Create a new Spark Connect server to connect locally.
+   */
+  private[sql] def withLocalConnectServer[T](f: => T): T = {
+    synchronized {
+      val remoteString = initOptions
+        .get("spark.remote")
+        .orElse(Option(System.getProperty("spark.remote"))) // Set from Spark Submit
+        .orElse(sys.env.get(SparkConnectClient.SPARK_REMOTE))
+
+      if (server.isEmpty && remoteString.exists(_.startsWith("local"))) {
+        val sparkHome = System.getenv("SPARK_HOME")
+        server = Some {
+          val args = Seq(
+            Paths.get(sparkHome, "sbin", "start-connect-server.sh").toString,
+            "--master",
+            remoteString.get) ++ initOptions
+            .filter(p => !p._1.startsWith("spark.remote"))
+            .flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
+          val pb = new ProcessBuilder(args: _*)
+          // So that the spark-sql jar is not excluded from the classpath
+          pb.environment().remove(SparkConnectClient.SPARK_REMOTE)
+          pb.start()
+        }
+
+        // Let the server start. We will directly request to set the configurations,
+        // and this sleep makes the retries less noisy.
+        Thread.sleep(2000L)
+        System.setProperty("spark.remote", "sc://localhost")
+
+        // scalastyle:off runtimeaddshutdownhook
+        Runtime.getRuntime.addShutdownHook(new Thread() {
Review Comment:
I actually struggled a lot with these points.

> Is the driver process started a child process?

It is not a child process, since it is started via the script. For example, if we don't explicitly kill it from this hook, the Spark Connect server cannot be killed together with the client JVM.
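
To illustrate the distinction, here is a minimal sketch (the object name is made up, and the spark-daemon.sh hand-off is my reading of `sbin/start-connect-server.sh`, not something this PR adds):

```scala
import java.nio.file.Paths

object DaemonizedServerSketch {
  def main(args: Array[String]): Unit = {
    val sparkHome = sys.env.getOrElse("SPARK_HOME", ".")
    val script = Paths.get(sparkHome, "sbin", "start-connect-server.sh").toString

    // The script hands off to spark-daemon.sh, which forks the server JVM and
    // exits, so `wrapper` is only the short-lived shell, not the server itself.
    val wrapper = new ProcessBuilder(script, "--master", "local[*]").start()
    wrapper.waitFor()

    // A no-op for the already-detached server; this is why the shutdown hook
    // has to issue an explicit kill instead of relying on process teardown.
    wrapper.destroy()
  }
}
```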
> Is the runtime hook a way to gracefully shutdown? From the code it seems to issue a kill command.

Probably there should be a better way, but I couldn't really come up with one... For now, this will only be used for `local[...]` and `local-cluster[...]`, so it should be fine since those aren't encouraged in production. However, if this is used in other cases too (e.g., Spark Classic master configurations), we should probably take a deeper look at it ...
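
For reference, one plausible shape for the hook body, assuming `sbin/stop-connect-server.sh` is what issues the kill (a sketch, not necessarily this PR's exact code):

```scala
import java.nio.file.Paths

// scalastyle:off runtimeaddshutdownhook
Runtime.getRuntime.addShutdownHook(new Thread() {
  override def run(): Unit = {
    // stop-connect-server.sh resolves the daemon's PID file and kills the
    // process, which is the "kill command" behavior mentioned above.
    val stopScript =
      Paths.get(System.getenv("SPARK_HOME"), "sbin", "stop-connect-server.sh").toString
    new ProcessBuilder(stopScript).start().waitFor()
  }
})
// scalastyle:on runtimeaddshutdownhook
```

With that wiring, something like `./bin/spark-shell --remote "local"` should spin up the local server on first use and tear it down when the client JVM exits.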