HyukjinKwon commented on code in PR #41942:
URL: https://github.com/apache/spark/pull/41942#discussion_r1260951116
##########
core/src/main/scala/org/apache/spark/SparkContext.scala:
##########
@@ -1775,21 +1773,31 @@ class SparkContext(config: SparkConf) extends Logging {
}
val timestamp = if (addedOnSubmit) startTime else System.currentTimeMillis
+ // If the session ID was specified from SparkSession, it's from a Spark Connect client.
+ // Specify a dedicated directory for Spark Connect client.
+ // We're running Spark Connect as a service so regular PySpark path is not affected.
+ lazy val root = if (jobArtifactUUID != "default") {
+ val newDest = new File(SparkFiles.getRootDirectory(), jobArtifactUUID)
Review Comment:
Yeah, it now needs to reuse `PythonWorkerFactory`, which assumes that there is
a UUID-named directory under `SparkFiles.getRootDirectory()` on both the
Driver and the Executor. We _could_ try to reuse the local artifact directory,
but I would prefer to keep a separate local copy for now, for better
maintainability and reusability.
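For illustration, a minimal sketch of the layout that assumption implies (the `resolveSessionRoot` helper is hypothetical; only `SparkFiles.getRootDirectory()` and the `"default"` sentinel come from the hunk above):
```scala
import java.io.File
import org.apache.spark.SparkFiles

// Hypothetical helper mirroring the assumption: a Spark Connect session keeps
// its artifacts in a UUID-named subdirectory of the SparkFiles root, on both
// the driver and the executors; the regular path keeps using the root itself.
def resolveSessionRoot(jobArtifactUUID: String): File =
  if (jobArtifactUUID != "default") {
    // e.g. <sparkFilesRoot>/<uuid>/
    new File(SparkFiles.getRootDirectory(), jobArtifactUUID)
  } else {
    new File(SparkFiles.getRootDirectory())
  }
```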
Otherwise, it would upload to the Spark file server twice (as we discussed
offline). I pushed new changes to avoid this. After this change, we no longer
upload twice, because:
1. The `spark://` URI is passed directly to `addFile` and `addJar`.
2. `addFile` and `addJar` do not attempt to upload such files again, but pass
the original URI through as-is (see the sketch below).
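A hedged sketch of that short-circuit (the `resolveForFileServer` name and the upload callback are made up for illustration; the actual change lives inside `addFile`/`addJar` in the PR):
```scala
import java.net.URI

// Sketch: a `spark://` URI already points at the Spark file server, so it is
// handed through unchanged instead of being uploaded a second time; anything
// else goes through the normal upload path.
def resolveForFileServer(path: String, upload: String => String): String =
  if (new URI(path).getScheme == "spark") path else upload(path)
```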
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]