Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/20824#discussion_r174829859
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala ---
@@ -145,15 +146,23 @@ object FileCommitProtocol {
jobId: String,
outputPath: String,
dynamicPartitionOverwrite: Boolean = false): FileCommitProtocol = {
+
+ logDebug(s"Creating committer $className; job $jobId;
output=$outputPath;" +
+ s" dynamic=$dynamicPartitionOverwrite")
val clazz =
Utils.classForName(className).asInstanceOf[Class[FileCommitProtocol]]
// First try the constructor with arguments (jobId: String,
outputPath: String,
// dynamicPartitionOverwrite: Boolean).
// If that doesn't exist, try the one with (jobId: string, outputPath:
String).
try {
val ctor = clazz.getDeclaredConstructor(classOf[String],
classOf[String], classOf[Boolean])
+ logDebug("Using (String, String, Boolean) constructor")
ctor.newInstance(jobId, outputPath,
dynamicPartitionOverwrite.asInstanceOf[java.lang.Boolean])
} catch {
case _: NoSuchMethodException =>
+ logDebug("Falling back to (String, String) constructor")
+ require(!dynamicPartitionOverwrite,
+ "Dynamic Partition Overwrite is enabled but" +
+ s" the committer ${className} does not have the appropriate
constructor")
--- End diff --
Problem is that the dynamic partition logic in
`InsertIntoHadoopFsRelationCommand` assumes that rename() is a fast reliable
operation you can do with any implementation of the FileCommitProtocol, sets
itself up for it when enabled, then instantiates the inner committer, and
carries on with the dynamic partitioning, irrespective of whether or not.
rename() doesn't always work like that, breaking the rest of the algorithm.
If the committer doesn't have that 3-arg constructor, you can't be
confident that you can do that. To silently log and continue is to run the risk
that the underlying committers commit algorithm isn't compatible with the
algorithm.
A fail-fast ensures that when the outcome is going to be unknown, you aren'
t left trying to work out what's happened.
Regarding your example, yes, it's in trouble. Problem is: how to
differentiate that from subclasses which don't know anything at all about the
new feature. you can't even look for an interface on the newly created object
if the base class implements it; you are left with some dynamic probe of the
instance.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]