[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202543543

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

I mean, I'm happy with the warning message but less sure about the note. I'm okay with it.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202536141

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

@HyukjinKwon Our support receives a few "bug" reports per month. For now we can at least provide a link to the note. The warning itself is needed by our support engineers to detect such problems from the logs of already finished jobs. Customers actually do not say in their bug reports that files/jars weren't overwritten (that would be easier). They report problems like a call to a method from a lib crashing due to an incompatible method signature, or a class that doesn't exist. Or the final result of a Spark job is incorrect because config/resource files added via `addFile()` are not up to date. Now I can detect the situation from the logs and provide a link to the docs for `addFile()`/`addJar()`.
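The behavior described above comes down to SparkContext keeping already-added paths in a concurrent map and short-circuiting when the key is already present. A minimal plain-Java sketch of that putIfAbsent guard (illustrative only; the class name and paths here are hypothetical, not Spark source):

```java
import java.util.concurrent.ConcurrentHashMap;

// Mirrors, in plain Java, the bookkeeping that makes a second addFile()/addJar()
// call a no-op in SparkContext: putIfAbsent only succeeds for an absent key.
public class AddFileSketch {
    private final ConcurrentHashMap<String, Long> addedFiles = new ConcurrentHashMap<>();

    /** Returns true when the path was newly added (and would be fetched). */
    public boolean addFile(String path, long timestamp) {
        if (addedFiles.putIfAbsent(path, timestamp) == null) {
            return true;  // fresh add: Spark logs "Added file ..." and fetches it
        }
        // Key already present: this is the branch where the PR's logWarning fires.
        // The original timestamp and file contents are kept; nothing is re-fetched.
        return false;
    }

    public static void main(String[] args) {
        AddFileSketch sc = new AddFileSketch();
        System.out.println(sc.addFile("hdfs:/tmp/config.json", 1L)); // true
        System.out.println(sc.addFile("hdfs:/tmp/config.json", 2L)); // false: not overwritten
    }
}
```

This is why a stale config file can silently survive a second `addFile()` call: the duplicate is rejected on the driver before any fetch happens.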
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202531999

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

I was wondering how common it is for a user to add the same jar expecting it to be overwritten, since we mostly treat those as immutable resources.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202531920

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1849,6 +1858,9 @@ class SparkContext(config: SparkConf) extends Logging {
     if (addedJars.putIfAbsent(key, timestamp).isEmpty) {
       logInfo(s"Added JAR $path at $key with timestamp $timestamp")
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The jar $path has been added already. Overwriting of added jars " +
--- End diff --

This could be confused with what `spark.files.overwrite` means.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202531879

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

I meant a comment.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21771
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202531555

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

The notes are also reasonable to me. This is a common user error.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202531553

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

We normally do not post the JIRA number in the message.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202530515

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

Shall we leave a JIRA link as a comment, for example SPARK-16787 and/or SPARK-19417?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202530492

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

eh, @MaxGekk, how about we just give warnings without notes for now?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202530444

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

How does it relate to `spark.files.overwrite`?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202530411

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

How does it relate to `spark.files.overwrite`, and how likely is it that we reach here?
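For context on that question: as I understand it, `spark.files.overwrite` governs only the executor-side fetch (whether an already-downloaded file with differing contents may be replaced), while the driver-side duplicate check this PR touches short-circuits before any fetch happens. A hypothetical decision table sketched in plain Java (my reading of the behavior, not Spark source):

```java
// Models the fetch-side choice that spark.files.overwrite influences.
// It only matters once a file with the same name already exists locally
// with different contents; the driver's duplicate-add check never gets here.
public class FetchDecision {
    public static String resolve(boolean targetExists, boolean contentsMatch, boolean overwrite) {
        if (!targetExists) return "fetch";      // nothing local yet: just download
        if (contentsMatch) return "skip";       // identical copy already present
        if (overwrite) return "overwrite";      // spark.files.overwrite=true
        return "fail";                          // contents differ and overwrite is off
    }
}
```

So the new warning would fire on a duplicate `addFile()`/`addJar()` regardless of how `spark.files.overwrite` is set.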