[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

2018-11-14 Thread caneGuy
Github user caneGuy closed the pull request at:

https://github.com/apache/spark/pull/21526


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

2018-07-12 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/21526#discussion_r202006002
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -357,6 +357,11 @@ package object config {
   .intConf
   .createWithDefault(256)
 
+  private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
+ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
+  .booleanConf
--- End diff --

Done @cloud-fan Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

2018-07-12 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/21526#discussion_r202005902
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -357,6 +357,11 @@ package object config {
   .intConf
   .createWithDefault(256)
 
+  private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
+ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
+  .booleanConf
--- End diff --

Done @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

2018-07-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21526#discussion_r202005442
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -357,6 +357,11 @@ package object config {
   .intConf
   .createWithDefault(256)
 
+  private[spark] val HADOOP_OUTPUTCOMMITCOORDINATION_ENABLED =
+ConfigBuilder("spark.hadoop.outputCommitCoordination.enabled")
+  .booleanConf
--- End diff --

.doc("when enabled, tasks will coordinate with the driver to make sure 
that, for a certain partition, at most one task attempt can commit.")


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

2018-07-12 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/21526#discussion_r201963160
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala 
---
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
 // users that they may loss data if they are using a direct output 
committer.
 val speculationEnabled = self.conf.getBoolean("spark.speculation", 
false)
 val outputCommitterClass = 
hadoopConf.get("mapred.output.committer.class", "")
-if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+val outputCommitCoordinationEnabled = self.conf.getBoolean(
+  "spark.hadoop.outputCommitCoordination.enabled", true)
+if (speculationEnabled && outputCommitterClass.contains("Direct")
+  && !outputCommitCoordinationEnabled) {
   val warningMessage =
--- End diff --

Also modify `HiveFileFormat`. @cloud-fan @jiangxb1987 
And the reason i do not use an other common function to refactor this is 
that i can't find a good place to put the function.Any suggestion?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

2018-07-12 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/21526#discussion_r201962635
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala 
---
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
 // users that they may loss data if they are using a direct output 
committer.
 val speculationEnabled = self.conf.getBoolean("spark.speculation", 
false)
 val outputCommitterClass = 
hadoopConf.get("mapred.output.committer.class", "")
-if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+val outputCommitCoordinationEnabled = self.conf.getBoolean(
+  "spark.hadoop.outputCommitCoordination.enabled", true)
--- End diff --

Update @cloud-fan Thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

2018-07-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21526#discussion_r201720609
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala 
---
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
 // users that they may loss data if they are using a direct output 
committer.
 val speculationEnabled = self.conf.getBoolean("spark.speculation", 
false)
 val outputCommitterClass = 
hadoopConf.get("mapred.output.committer.class", "")
-if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+val outputCommitCoordinationEnabled = self.conf.getBoolean(
+  "spark.hadoop.outputCommitCoordination.enabled", true)
+if (speculationEnabled && outputCommitterClass.contains("Direct")
+  && !outputCommitCoordinationEnabled) {
   val warningMessage =
--- End diff --

Yea, I found a similar message in `HiveFileFormat`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

2018-07-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21526#discussion_r201717982
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala 
---
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
 // users that they may loss data if they are using a direct output 
committer.
 val speculationEnabled = self.conf.getBoolean("spark.speculation", 
false)
 val outputCommitterClass = 
hadoopConf.get("mapred.output.committer.class", "")
-if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+val outputCommitCoordinationEnabled = self.conf.getBoolean(
+  "spark.hadoop.outputCommitCoordination.enabled", true)
+if (speculationEnabled && outputCommitterClass.contains("Direct")
+  && !outputCommitCoordinationEnabled) {
   val warningMessage =
--- End diff --

is this the only place? IIRC we log a warning for this in several places.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

2018-07-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21526#discussion_r201717626
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala 
---
@@ -1053,7 +1053,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
 // users that they may loss data if they are using a direct output 
committer.
 val speculationEnabled = self.conf.getBoolean("spark.speculation", 
false)
 val outputCommitterClass = 
hadoopConf.get("mapred.output.committer.class", "")
-if (speculationEnabled && outputCommitterClass.contains("Direct")) {
+val outputCommitCoordinationEnabled = self.conf.getBoolean(
+  "spark.hadoop.outputCommitCoordination.enabled", true)
--- End diff --

since we are touching it, can we define this config in 
`org.apache.spark.internal.config`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21526: [SPARK-24515][CORE] No need to warning when outpu...

2018-06-11 Thread caneGuy
GitHub user caneGuy opened a pull request:

https://github.com/apache/spark/pull/21526

[SPARK-24515][CORE] No need to warning when output commit coordination 
enabled

## What changes were proposed in this pull request?

No need to warning user when output commit coordination enabled
```
// When speculation is on and output committer class name contains 
"Direct", we should warn
// users that they may loss data if they are using a direct output 
committer.
val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", 
"")
if (speculationEnabled && outputCommitterClass.contains("Direct")) {
 val warningMessage =
 s"$outputCommitterClass may be an output committer that writes data 
directly to " +
 "the final location. Because speculation is enabled, this output committer 
may " +
 "cause data loss (see the case in SPARK-10063). If possible, please use an 
output " +
 "committer that does not have this behavior (e.g. FileOutputCommitter)."
 logWarning(warningMessage)
}
```

## How was this patch tested?
UT


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/caneGuy/spark zhoukang/fix-warning

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21526


commit 6bac1531929e914764d980e4eb4228a10436876b
Author: zhoukang 
Date:   2018-06-11T08:57:11Z

[SPARK][CORE] No need to warning when output commit coordination enabled




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org