tgravescs commented on a change in pull request #32810:
URL: https://github.com/apache/spark/pull/32810#discussion_r654460021



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1464,6 +1464,34 @@ private object Client extends Logging {
     (mainUri ++ secondaryUris).toArray
   }
 
+  /**
+   * Returns a list of local, absolute file URLs representing the user classpath. Note that this
+   * must be executed on the same host which will access the URLs, as it will resolve relative
+   * paths based on the current working directory.
+   *
+   * @param conf Spark configuration.
+   * @param useClusterPath Whether to use the 'cluster' path when resolving paths with the
+   *                       `local` scheme. This should be used when running on the cluster, but
+   *                       not when running on the gateway (i.e. for the driver in `client` mode).
+   * @return Array of local URLs ready to be passed to a [[java.net.URLClassLoader]].
+   */
+  def getUserClasspathUrls(conf: SparkConf, useClusterPath: Boolean): Array[URL] = {
+    Client.getUserClasspath(conf).map { uri =>
+      val inputPath = uri.getPath
+      val localPath = if (Utils.isLocalUri(uri.toString)) {

Review comment:
       so this isn't exactly the same as the logic in ExecutorRunnable: this only applies getClusterPath when the URI has the local: scheme, whereas ExecutorRunnable applied it to any absolute path. I would have expected getClusterPath to apply to file: paths as well. Is there a reason we didn't just keep applying it to any absolute path? Really, if someone specifies local: they mean the jar will be at that location on all the worker nodes; I don't see that the original issue that added this said anything about local: (https://issues.apache.org/jira/browse/SPARK-8302)
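
For illustration, here is a minimal, self-contained Scala sketch of the two behaviours being contrasted (the roots and helper names are made up for illustration, not the PR's actual code or config values): rewriting only local: URIs versus rewriting any absolute path the way ExecutorRunnable did.

    import java.io.File
    import java.net.URI

    object ClusterPathPolicies {
      // Hypothetical gateway/cluster roots, standing in for the values of
      // spark.yarn.config.gatewayPath / spark.yarn.config.replacementPath.
      val gatewayRoot = "/local/matching/replace"
      val clusterRoot = "/cluster/replace"

      def replaceGatewayRoot(path: String): String =
        if (path.startsWith(gatewayRoot)) clusterRoot + path.stripPrefix(gatewayRoot) else path

      // Policy in the PR as written: only rewrite URIs with the local: scheme.
      def resolveLocalOnly(uri: URI): String =
        if (uri.getScheme == "local") replaceGatewayRoot(uri.getPath) else uri.getPath

      // Policy ExecutorRunnable effectively used: rewrite any absolute path.
      def resolveAnyAbsolute(uri: URI): String = {
        val path = uri.getPath
        if (new File(path).isAbsolute) replaceGatewayRoot(path) else path
      }

      def main(args: Array[String]): Unit = {
        val localUri = new URI("local:/local/matching/replace/foo.jar")
        val fileUri = new URI("file:/local/matching/replace/foo.jar")
        // A local: URI under the gateway root is rewritten under both policies.
        assert(resolveLocalOnly(localUri) == "/cluster/replace/foo.jar")
        assert(resolveAnyAbsolute(localUri) == "/cluster/replace/foo.jar")
        // A matching file: URI is only rewritten under the broader policy.
        assert(resolveLocalOnly(fileUri) == "/local/matching/replace/foo.jar")
        assert(resolveAnyAbsolute(fileUri) == "/cluster/replace/foo.jar")
      }
    }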

##########
File path: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
##########
@@ -583,6 +584,31 @@ class ClientSuite extends SparkFunSuite with Matchers {
     }
   }
 
+  test("SPARK-35672: test Client.getUserClasspathUrls") {
+    val conf = new SparkConf()
+        .set(SECONDARY_JARS, Seq(
+          "local:/local/matching/replace/foo.jar",
+          "local:/local/not/matching/replace/foo.jar",
+          "file:/absolute/file/path/foo.jar",
+          "relative/file/path/foo.jar"

Review comment:
       we should make sure absolute paths that match the gateway root path are 
replaced

##########
File path: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
##########
@@ -583,6 +584,31 @@ class ClientSuite extends SparkFunSuite with Matchers {
     }
   }
 
+  test("SPARK-35672: test Client.getUserClasspathUrls") {
+    val conf = new SparkConf()
+        .set(SECONDARY_JARS, Seq(
+          "local:/local/matching/replace/foo.jar",
+          "local:/local/not/matching/replace/foo.jar",
+          "file:/absolute/file/path/foo.jar",
+          "relative/file/path/foo.jar"

Review comment:
       we should make sure absolute paths that match the gateway root path are replaced for non-local: schemes as well
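
As a rough sketch of the kind of assertion being asked for here (this assumes the ClientSuite context and the spark.yarn.config.gatewayPath / spark.yarn.config.replacementPath settings; the jar paths and expected values are illustrative, not taken from the PR):

    test("gateway root replacement applies to matching absolute paths (sketch)") {
      val conf = new SparkConf()
        .set("spark.yarn.config.gatewayPath", "/local/matching/replace")
        .set("spark.yarn.config.replacementPath", "/cluster/replace")
        .set(SECONDARY_JARS, Seq(
          "local:/local/matching/replace/foo.jar",
          "file:/local/matching/replace/bar.jar"))

      val paths = Client.getUserClasspathUrls(conf, useClusterPath = true).map(_.getPath)

      // local: jars under the gateway root should come back rewritten to the cluster root ...
      assert(paths.contains("/cluster/replace/foo.jar"))
      // ... and this comment is asking for the same treatment for a non-local: absolute
      // path that matches the gateway root (requested behaviour, not necessarily current).
      assert(paths.contains("/cluster/replace/bar.jar"))
    }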

##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1464,6 +1464,34 @@ private object Client extends Logging {
     (mainUri ++ secondaryUris).toArray
   }
 
+  /**
+   * Returns a list of local, absolute file URLs representing the user classpath. Note that this
+   * must be executed on the same host which will access the URLs, as it will resolve relative
+   * paths based on the current working directory.
+   *
+   * @param conf Spark configuration.
+   * @param useClusterPath Whether to use the 'cluster' path when resolving paths with the
+   *                       `local` scheme. This should be used when running on the cluster, but
+   *                       not when running on the gateway (i.e. for the driver in `client` mode).
+   * @return Array of local URLs ready to be passed to a [[java.net.URLClassLoader]].
+   */
+  def getUserClasspathUrls(conf: SparkConf, useClusterPath: Boolean): Array[URL] = {
+    Client.getUserClasspath(conf).map { uri =>
+      val inputPath = uri.getPath
+      val localPath = if (Utils.isLocalUri(uri.toString)) {

Review comment:
       there is a test in ClientSuite.scala, but it doesn't cover this part; it only covers the client side, where it tests that an absolute path without local: is changed.
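
To make the coverage gap concrete (assuming a conf set up like the existing test; per this comment, the existing test exercises the first call but not the second):

    // Gateway/client-side resolution, which the existing ClientSuite test covers.
    val gatewayUrls = Client.getUserClasspathUrls(conf, useClusterPath = false)
    // Cluster-side resolution, which is the part not covered yet.
    val clusterUrls = Client.getUserClasspathUrls(conf, useClusterPath = true)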

##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1464,6 +1464,34 @@ private object Client extends Logging {
     (mainUri ++ secondaryUris).toArray
   }
 
+  /**
+   * Returns a list of local, absolute file URLs representing the user classpath. Note that this
+   * must be executed on the same host which will access the URLs, as it will resolve relative
+   * paths based on the current working directory.
+   *
+   * @param conf Spark configuration.
+   * @param useClusterPath Whether to use the 'cluster' path when resolving paths with the
+   *                       `local` scheme. This should be used when running on the cluster, but
+   *                       not when running on the gateway (i.e. for the driver in `client` mode).
+   * @return Array of local URLs ready to be passed to a [[java.net.URLClassLoader]].
+   */
+  def getUserClasspathUrls(conf: SparkConf, useClusterPath: Boolean): Array[URL] = {
+    Client.getUserClasspath(conf).map { uri =>
+      val inputPath = uri.getPath
+      val localPath = if (Utils.isLocalUri(uri.toString)) {

Review comment:
       ah yes, you are right, this is only looking at the jars we uploaded; I was thinking of the other classpath entries, but those are handled differently.



