[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when output commit coordination enabled
Github user caneGuy closed the pull request at: https://github.com/apache/spark/pull/21526

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/21526#discussion_r202006002

```diff
--- core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -357,6 +357,11 @@ package object config {
     .intConf
     .createWithDefault(256)

+  private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
+    ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
+      .booleanConf
```

Done, @cloud-fan. Thanks!
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/21526#discussion_r202005902

```diff
--- core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -357,6 +357,11 @@ package object config {
     .intConf
     .createWithDefault(256)

+  private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
+    ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
+      .booleanConf
```

Done, @cloud-fan.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21526#discussion_r202005442

```diff
--- core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -357,6 +357,11 @@ package object config {
     .intConf
     .createWithDefault(256)

+  private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
+    ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
+      .booleanConf
```

.doc("when enabled, tasks will coordinate with the driver to make sure that, for a certain partition, at most one task attempt can commit.")
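Combining the suggested `.doc` text with the `ConfigBuilder` pattern used elsewhere in `package.scala`, the completed entry might look like the following sketch. The exact doc wording and the default of `true` are taken from this thread, not from the merged code, so treat them as assumptions:

```scala
// Sketch only: follows the ConfigBuilder pattern seen in
// org.apache.spark.internal.config. Doc text and default value
// are inferred from this review discussion.
private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
  ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
    .doc("When enabled, tasks will coordinate with the driver to make sure " +
      "that, for a certain partition, at most one task attempt can commit.")
    .booleanConf
    .createWithDefault(true)
```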
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/21526#discussion_r201963160

```diff
--- core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     // users that they may loss data if they are using a direct output committer.
     val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
     val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
-    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+    val outputCommitCoordinationEnabled = self.conf.getBoolean(
+      "spark.hadoop.outputCommitCoordination.enabled", true)
+    if (speculationEnabled && outputCommitterClass.contains("Direct")
+      && !outputCommitCoordinationEnabled) {
       val warningMessage =
```

Also modified `HiveFileFormat`. @cloud-fan @jiangxb1987 The reason I did not extract this into a shared helper function is that I couldn't find a good place to put it. Any suggestions?
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/21526#discussion_r201962635

```diff
--- core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     // users that they may loss data if they are using a direct output committer.
     val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
     val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
-    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+    val outputCommitCoordinationEnabled = self.conf.getBoolean(
+      "spark.hadoop.outputCommitCoordination.enabled", true)
```

Updated, @cloud-fan. Thanks.
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21526#discussion_r201720609

```diff
--- core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     // users that they may loss data if they are using a direct output committer.
     val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
     val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
-    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+    val outputCommitCoordinationEnabled = self.conf.getBoolean(
+      "spark.hadoop.outputCommitCoordination.enabled", true)
+    if (speculationEnabled && outputCommitterClass.contains("Direct")
+      && !outputCommitCoordinationEnabled) {
       val warningMessage =
```

Yea, I found a similar message in `HiveFileFormat`.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21526#discussion_r201717982

```diff
--- core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     // users that they may loss data if they are using a direct output committer.
     val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
     val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
-    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+    val outputCommitCoordinationEnabled = self.conf.getBoolean(
+      "spark.hadoop.outputCommitCoordination.enabled", true)
+    if (speculationEnabled && outputCommitterClass.contains("Direct")
+      && !outputCommitCoordinationEnabled) {
       val warningMessage =
```

Is this the only place? IIRC we log a warning for this in several places.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21526#discussion_r201717626

```diff
--- core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     // users that they may loss data if they are using a direct output committer.
     val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
     val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
-    if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+    val outputCommitCoordinationEnabled = self.conf.getBoolean(
+      "spark.hadoop.outputCommitCoordination.enabled", true)
```

Since we are touching it, can we define this config in `org.apache.spark.internal.config`?
GitHub user caneGuy opened a pull request: https://github.com/apache/spark/pull/21526

[SPARK-24515][CORE] No need to warning when output commit coordination enabled

## What changes were proposed in this pull request?

There is no need to warn the user when output commit coordination is enabled:

```
// When speculation is on and output committer class name contains "Direct", we should warn
// users that they may loss data if they are using a direct output committer.
val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
if (speculationEnabled && outputCommitterClass.contains("Direct")) {
  val warningMessage =
    s"$outputCommitterClass may be an output committer that writes data directly to " +
      "the final location. Because speculation is enabled, this output committer may " +
      "cause data loss (see the case in SPARK-10063). If possible, please use an output " +
      "committer that does not have this behavior (e.g. FileOutputCommitter)."
  logWarning(warningMessage)
}
```

## How was this patch tested?

UT

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/caneGuy/spark zhoukang/fix-warning

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21526.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21526

commit 6bac1531929e914764d980e4eb4228a10436876b
Author: zhoukang
Date: 2018-06-11T08:57:11Z

[SPARK][CORE] No need to warning when output commit coordination enabled
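The condition discussed throughout this thread can be isolated as a small predicate: warn only when speculation is on, a "Direct" output committer is configured, and output commit coordination is disabled (with coordination on, at most one task attempt can commit per partition, so the data-loss warning does not apply). The helper name `shouldWarnDirectCommitter` below is hypothetical, introduced here purely to illustrate the logic:

```scala
// Hypothetical standalone sketch of the guard condition proposed in this PR.
// Not Spark code: it only models the three inputs read from the configs
// "spark.speculation", "mapred.output.committer.class", and
// "spark.hadoop.outputCommitCoordination.enabled".
object DirectCommitterWarning {
  def shouldWarnDirectCommitter(
      speculationEnabled: Boolean,
      outputCommitterClass: String,
      outputCommitCoordinationEnabled: Boolean): Boolean = {
    speculationEnabled &&
      outputCommitterClass.contains("Direct") &&
      !outputCommitCoordinationEnabled
  }
}
```

With coordination enabled (the default `true`), the predicate is always false, which is exactly the behavior change this PR proposes: the warning fires only in the genuinely unsafe configuration.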