[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202543543
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

I mean, I'm happy with the warning message but less sure about the note. I'm okay with it, though.


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-15 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202536141
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

@HyukjinKwon Our support receives a few "bug" reports per month about this. For now we can at least provide a link to the note. The warning itself is needed so that our support engineers can detect this kind of problem from the logs of already-finished jobs. Customers actually do not say in their bug reports that files/jars weren't overwritten (that would be easier). They report problems like a call to a library method crashing due to an incompatible method signature, or a class that doesn't exist. Or the final result of a Spark job is incorrect because a config/resource file added via `addFile()` is not up to date. With this warning I can detect the situation from the logs and provide a link to the docs for `addFile()`/`addJar()`.
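
For illustration, a minimal sketch of the scenario just described, assuming a local `SparkContext`; the file path and app name are placeholders, not taken from the actual reports:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("addFile-twice"))

sc.addFile("/tmp/app.conf")   // first call: the file is registered and fetched
// ... /tmp/app.conf is later updated on the driver ...
sc.addFile("/tmp/app.conf")   // second call: the path is already registered, so it is
                              // silently ignored; executors keep the stale copy, and with
                              // this PR a warning now shows up in the driver log
```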


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202531999
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

I was wondering how common it is for a user to add the same jar expecting it to be overwritten, since we mostly consider those cases to be immutable resources.


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202531920
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1849,6 +1858,9 @@ class SparkContext(config: SparkConf) extends Logging {
     if (addedJars.putIfAbsent(key, timestamp).isEmpty) {
       logInfo(s"Added JAR $path at $key with timestamp $timestamp")
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The jar $path has been added already. Overwriting of added jars " +
--- End diff --

This could be confused with what `spark.files.overwrite` means.
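
For reference, a sketch of how that flag is set; the app name is a placeholder, and the reading below (that `spark.files.overwrite` only controls whether an already-fetched file is replaced when its contents change, not whether a repeated `addFile()`/`addJar()` call re-registers the path) is my understanding, not something stated in this PR:

```scala
import org.apache.spark.SparkConf

// spark.files.overwrite affects fetching on the executor side; the warning added in
// this PR fires earlier, when the same path is registered a second time on the driver.
val conf = new SparkConf()
  .setAppName("files-overwrite-demo")
  .set("spark.files.overwrite", "true")
```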


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202531879
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

I meant a comment.


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21771


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202531555
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

The note also seems reasonable to me. This is a common user error.


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-14 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202531553
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

We normally do not post the JIRA number in the message.


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202530515
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

Shall we leave a JIRA link as a comment, for example SPARK-16787 and/or SPARK-19417?


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202530492
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

eh, @MaxGekk, how about we just give warnings without notes for now?


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202530444
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

How does it relate to `spark.files.overwrite`?


---




[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21771#discussion_r202530411
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging {
       Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
         env.securityManager, hadoopConfiguration, timestamp, useCache = false)
       postEnvironmentUpdate()
+    } else {
+      logWarning(s"The path $path has been added already. Overwriting of added paths " +
--- End diff --

How does it relate to `spark.files.overwrite`, and how likely are we to reach here?


---
