tgravescs commented on a change in pull request #32810:
URL: https://github.com/apache/spark/pull/32810#discussion_r654460021
##########
File path:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1464,6 +1464,34 @@ private object Client extends Logging {
(mainUri ++ secondaryUris).toArray
}
+ /**
+ * Returns a list of local, absolute file URLs representing the user
classpath. Note that this
+ * must be executed on the same host which will access the URLs, as it will
resolve relative
+ * paths based on the current working directory.
+ *
+ * @param conf Spark configuration.
+ * @param useClusterPath Whether to use the 'cluster' path when resolving
paths with the
+ * `local` scheme. This should be used when running on
the cluster, but
+ * not when running on the gateway (i.e. for the
driver in `client` mode).
+ * @return Array of local URLs ready to be passed to a
[[java.net.URLClassLoader]].
+ */
+ def getUserClasspathUrls(conf: SparkConf, useClusterPath: Boolean):
Array[URL] = {
+ Client.getUserClasspath(conf).map { uri =>
+ val inputPath = uri.getPath
+ val localPath = if (Utils.isLocalUri(uri.toString)) {
Review comment:
so this isn't exactly the same as the logic in ExecutorRunnable since
this only applies getClusterPath if they have local: scheme and it applied to
any absolute path. I would have expected that the getClusterPath applied to
just file: as well. is there a reason we didn't just keep it if absolute path?
Really if someone specifies local: they mean it will be at that location on
all the worker nodes in that location, I don't see that the original issue that
added this said anything about local:
(https://issues.apache.org/jira/browse/SPARK-8302)
##########
File path:
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
##########
@@ -583,6 +584,31 @@ class ClientSuite extends SparkFunSuite with Matchers {
}
}
+ test("SPARK-35672: test Client.getUserClasspathUrls") {
+ val conf = new SparkConf()
+ .set(SECONDARY_JARS, Seq(
+ "local:/local/matching/replace/foo.jar",
+ "local:/local/not/matching/replace/foo.jar",
+ "file:/absolute/file/path/foo.jar",
+ "relative/file/path/foo.jar"
Review comment:
we should make sure absolute paths that match the gateway root path are
replaced
##########
File path:
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala
##########
@@ -583,6 +584,31 @@ class ClientSuite extends SparkFunSuite with Matchers {
}
}
+ test("SPARK-35672: test Client.getUserClasspathUrls") {
+ val conf = new SparkConf()
+ .set(SECONDARY_JARS, Seq(
+ "local:/local/matching/replace/foo.jar",
+ "local:/local/not/matching/replace/foo.jar",
+ "file:/absolute/file/path/foo.jar",
+ "relative/file/path/foo.jar"
Review comment:
we should make sure absolute paths that match the gateway root path are
replaced for non local: as well
##########
File path:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1464,6 +1464,34 @@ private object Client extends Logging {
(mainUri ++ secondaryUris).toArray
}
+ /**
+ * Returns a list of local, absolute file URLs representing the user
classpath. Note that this
+ * must be executed on the same host which will access the URLs, as it will
resolve relative
+ * paths based on the current working directory.
+ *
+ * @param conf Spark configuration.
+ * @param useClusterPath Whether to use the 'cluster' path when resolving
paths with the
+ * `local` scheme. This should be used when running on
the cluster, but
+ * not when running on the gateway (i.e. for the
driver in `client` mode).
+ * @return Array of local URLs ready to be passed to a
[[java.net.URLClassLoader]].
+ */
+ def getUserClasspathUrls(conf: SparkConf, useClusterPath: Boolean):
Array[URL] = {
+ Client.getUserClasspath(conf).map { uri =>
+ val inputPath = uri.getPath
+ val localPath = if (Utils.isLocalUri(uri.toString)) {
Review comment:
there is a test in ClientSuite.scala but it doesn't cover this part, it
just covers client side, there it tests that absolute path without local: is
changed.
##########
File path:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1464,6 +1464,34 @@ private object Client extends Logging {
(mainUri ++ secondaryUris).toArray
}
+ /**
+ * Returns a list of local, absolute file URLs representing the user
classpath. Note that this
+ * must be executed on the same host which will access the URLs, as it will
resolve relative
+ * paths based on the current working directory.
+ *
+ * @param conf Spark configuration.
+ * @param useClusterPath Whether to use the 'cluster' path when resolving
paths with the
+ * `local` scheme. This should be used when running on
the cluster, but
+ * not when running on the gateway (i.e. for the
driver in `client` mode).
+ * @return Array of local URLs ready to be passed to a
[[java.net.URLClassLoader]].
+ */
+ def getUserClasspathUrls(conf: SparkConf, useClusterPath: Boolean):
Array[URL] = {
+ Client.getUserClasspath(conf).map { uri =>
+ val inputPath = uri.getPath
+ val localPath = if (Utils.isLocalUri(uri.toString)) {
Review comment:
ah yes you are right, this is only looking at the jars we uploaded, was
thinking of other class path, but those are handled differently.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]