Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8687#discussion_r39184789
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -1051,6 +1049,15 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
         val jobAttemptId = newTaskAttemptID(jobtrackerID, stageId, isMap = true, 0, 0)
         val jobTaskContext = newTaskAttemptContext(wrappedConf.value, jobAttemptId)
         val jobCommitter = jobFormat.getOutputCommitter(jobTaskContext)
    +
    +    // If speculation is enabled, we only allow FileOutputCommitter.
    +    val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
    +    if (speculationEnabled && jobCommitter.getClass !=
    +      classOf[org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]) {
    +      throw new SparkException(s"Cannot use ${jobCommitter.getClass} as an output committer " +
    +        "when speculation is enabled.")
    +    }
    +
    --- End diff --
    
    `jobFormat.getOutputCommitter()` returns the output committer associated with the output format. In normal cases you cannot specify an output committer for the `mapreduce` API (the new API), so I think we should not make this change here. Also, `jobCommitter.getClass != classOf[org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]` is too strong a condition: every output format implemented against the `mapreduce` API can supply its own output committer, and many of them are subclasses of `FileOutputCommitter` (e.g. `org.apache.parquet.hadoop.ParquetOutputCommitter`), which this class-equality check would reject.
    
    Let's leave this part unchanged.
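
    To illustrate the point about the class-equality check being too strong, here is a minimal, self-contained sketch. `FileOutputCommitter` and `ParquetOutputCommitter` below are stand-in classes (not the real Hadoop/Parquet implementations) that mirror the real inheritance relationship; a strict `==` on the class rejects subclasses, while `isAssignableFrom` accepts them:

    ```scala
    // Stand-ins (assumption): ParquetOutputCommitter extends FileOutputCommitter,
    // mirroring the real org.apache.parquet.hadoop.ParquetOutputCommitter.
    class FileOutputCommitter
    class ParquetOutputCommitter extends FileOutputCommitter

    object CommitterCheck {
      // The check from the diff: exact class equality.
      def strictCheck(c: Class[_]): Boolean =
        c == classOf[FileOutputCommitter]

      // A looser alternative: accept FileOutputCommitter and any subclass.
      def subclassCheck(c: Class[_]): Boolean =
        classOf[FileOutputCommitter].isAssignableFrom(c)

      def main(args: Array[String]): Unit = {
        assert(strictCheck(classOf[FileOutputCommitter]))
        // Strict equality rejects the Parquet committer even though it
        // is a FileOutputCommitter subclass.
        assert(!strictCheck(classOf[ParquetOutputCommitter]))
        // isAssignableFrom accepts the subclass.
        assert(subclassCheck(classOf[ParquetOutputCommitter]))
        println("all checks passed")
      }
    }
    ```

    Even with the looser check, an output format could in principle ship a committer that does not extend `FileOutputCommitter`, which is why leaving this code path unchanged is the safer option.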

