[GitHub] [spark] vicennial commented on a diff in pull request #41942: [SPARK-44348][CORE][CONNECT][PYTHON] Reenable test_artifact with relevant changes

via GitHub Wed, 12 Jul 2023 08:38:21 -0700


vicennial commented on code in PR #41942:
URL: https://github.com/apache/spark/pull/41942#discussion_r1261364588



##########
core/src/main/scala/org/apache/spark/SparkContext.scala:
##########
@@ -1775,21 +1773,31 @@ class SparkContext(config: SparkConf) extends Logging {
     }
 
     val timestamp = if (addedOnSubmit) startTime else System.currentTimeMillis
+    // If the session ID was specified from SparkSession, it's from a Spark Connect client.
+    // Specify a dedicated directory for Spark Connect client.
+    // We're running Spark Connect as a service so regular PySpark path
+    // is not affected.
+    lazy val root = if (jobArtifactUUID != "default") {
+      val newDest = new File(SparkFiles.getRootDirectory(), jobArtifactUUID)

Review Comment:
   > reuse PythonWorkerFactory, which assumes that there is a UUID-named directory under SparkFiles.getRootDirectory() on both the driver and the executor
   
   Ahh gotcha, I'm not very familiar with the Python side, good to know 👍
   
   > So, after this change, we no longer upload twice, because:
   > - The spark:// URI is passed directly to addFile and addJar.
   > - addFile and addJar do not attempt to upload the files, but pass the original URI through.
   
   Awesome!
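
For illustration, the directory selection in the hunk above could be sketched as follows. This is a minimal, standalone sketch rather than the PR's code: `ArtifactRootSketch` and `resolveRoot` are hypothetical names, `baseDir` stands in for whatever `SparkFiles.getRootDirectory()` returns, and creating the directory eagerly is an assumption made only so the example runs on its own.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical helper, not part of Spark's API.
object ArtifactRootSketch {
  // Returns baseDir itself for the "default" session, otherwise a
  // session-specific subdirectory named after the job artifact UUID.
  def resolveRoot(baseDir: File, jobArtifactUUID: String): File =
    if (jobArtifactUUID != "default") {
      val sessionDir = new File(baseDir, jobArtifactUUID)
      // Assumption for this sketch: create the directory if it is missing.
      sessionDir.mkdirs()
      sessionDir
    } else {
      baseDir
    }

  def main(args: Array[String]): Unit = {
    // A temporary directory stands in for SparkFiles.getRootDirectory().
    val base = Files.createTempDirectory("spark-files-sketch").toFile
    val defaultRoot = resolveRoot(base, "default")
    val sessionRoot = resolveRoot(base, "hypothetical-session-uuid")
    assert(defaultRoot == base)
    assert(sessionRoot.getParentFile == base)
    assert(sessionRoot.isDirectory)
    println(s"default root: $defaultRoot")
    println(s"session root: $sessionRoot")
  }
}
```

With this layout, the regular (non-Connect) path keeps using the base directory, while each Spark Connect session gets its own subdirectory, which matches the assumption about a UUID-named directory quoted above.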
   
   


