GitHub user steveloughran opened a pull request:
https://github.com/apache/spark/pull/20824
With SPARK-20236, FileCommitProtocol.instantiate() looks for a three …
## What changes were proposed in this pull request?
With SPARK-20236, `FileCommitProtocol.instantiate()` looks for a three-argument
constructor, passing in the `dynamicPartitionOverwrite` parameter. If there is
no such constructor, it falls back to the classic two-arg one.
When `InsertIntoHadoopFsRelationCommand` passes down that
`dynamicPartitionOverwrite` flag to `FileCommitProtocol.instantiate()`, it
assumes that the instantiated protocol supports the specific requirements of
dynamic partition overwrite. It does not detect when this does not hold, so the
generated output may be incorrect.
This patch changes `FileCommitProtocol.instantiate()` so that when
`dynamicPartitionOverwrite == true`, it requires the protocol implementation to
have a 3-arg constructor. Classic two-arg constructors are still supported when
the flag is false.
It also adds some debug-level logging for anyone trying to understand
what's going on.
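The constructor-selection rule described above can be illustrated with a short, self-contained Java sketch. It is not Spark's Scala implementation: `CommitProtocolLoader`, `CommitProtocol`, `TwoArgProtocol` and `ThreeArgProtocol` are hypothetical names, and the sketch assumes protocol constructors take either `(jobId, path)` or `(jobId, path, dynamicPartitionOverwrite)`. The choice of `IllegalArgumentException` for the rejected case is also an assumption, and the debug logging is omitted.

```java
import java.lang.reflect.Constructor;

/**
 * Minimal sketch of the constructor-selection rule described in the PR.
 * All type names are hypothetical; this is not Spark's FileCommitProtocol code.
 */
public class CommitProtocolLoader {

    /** Hypothetical stand-in for Spark's FileCommitProtocol. */
    public interface CommitProtocol { }

    /** Hypothetical protocol with only the classic (jobId, path) constructor. */
    public static class TwoArgProtocol implements CommitProtocol {
        public TwoArgProtocol(String jobId, String path) { }
    }

    /** Hypothetical protocol that is also told about dynamicPartitionOverwrite. */
    public static class ThreeArgProtocol implements CommitProtocol {
        public final boolean dynamicPartitionOverwrite;

        public ThreeArgProtocol(String jobId, String path, boolean dynamicPartitionOverwrite) {
            this.dynamicPartitionOverwrite = dynamicPartitionOverwrite;
        }
    }

    public static CommitProtocol instantiate(String className, String jobId,
                                             String outputPath, boolean dynamicPartitionOverwrite) {
        try {
            // asSubclass throws ClassCastException if the class is not a CommitProtocol.
            Class<? extends CommitProtocol> clazz =
                    Class.forName(className).asSubclass(CommitProtocol.class);
            try {
                // First choice: the 3-arg constructor, which receives the flag.
                Constructor<? extends CommitProtocol> ctor =
                        clazz.getDeclaredConstructor(String.class, String.class, boolean.class);
                return ctor.newInstance(jobId, outputPath, dynamicPartitionOverwrite);
            } catch (NoSuchMethodException noThreeArg) {
                // A 2-arg class cannot know about dynamic overwrite, so refuse it
                // instead of silently producing possibly incorrect output.
                if (dynamicPartitionOverwrite) {
                    throw new IllegalArgumentException("Dynamic partition overwrite is enabled but "
                            + className + " has no 3-arg constructor");
                }
                // Classic fallback, reached only after the 3-arg lookup has failed.
                Constructor<? extends CommitProtocol> ctor =
                        clazz.getDeclaredConstructor(String.class, String.class);
                return ctor.newInstance(jobId, outputPath);
            }
        } catch (ReflectiveOperationException e) {
            // Unknown class, missing 2-arg constructor, or constructor failure.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        String twoArg = TwoArgProtocol.class.getName();
        String threeArg = ThreeArgProtocol.class.getName();
        System.out.println(instantiate(twoArg, "job-1", "/tmp/out", false).getClass().getSimpleName());
        System.out.println(instantiate(threeArg, "job-1", "/tmp/out", true).getClass().getSimpleName());
        try {
            instantiate(twoArg, "job-1", "/tmp/out", true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Trying the 3-arg constructor first and refusing the 2-arg fallback only when the flag is set keeps existing two-arg committers working for ordinary writes, while failing fast instead of risking incorrect output under dynamic partition overwrite.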
## How was this patch tested?
Unit tests verify that
* classes with only a 2-arg constructor cannot be used with dynamic overwrite
* classes with only a 2-arg constructor can be used without dynamic overwrite
* classes with a 3-arg constructor can be used with both
* the fallback to any 2-arg constructor takes place after the attempt to load
  the 3-arg constructor
* passing in invalid class types fails as expected (regression tests on
  expected behavior)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/steveloughran/spark stevel/SPARK-23683-protocol-instantiate
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20824.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20824
----
commit 529db0851e50f5bf15cc71d400a030db4e696350
Author: Steve Loughran <stevel@...>
Date: 2018-03-14T17:43:34Z
With SPARK-20236, FileCommitProtocol.instantiate() looks for a three-argument
constructor, passing in the dynamicPartitionOverwrite parameter. If there is no
such constructor, it falls back to the classic two-arg one.
When InsertIntoHadoopFsRelationCommand passes down that
dynamicPartitionOverwrite flag to FileCommitProtocol.instantiate(), it assumes
that the instantiated protocol supports the specific requirements of dynamic
partition overwrite. It does not detect when this does not hold, so the
generated output may be incorrect.
This patch changes FileCommitProtocol.instantiate() so when
dynamicPartitionOverwrite == true, it requires the protocol implementation to
have a 3-arg constructor.
Tests verify that
* classes with only a 2-arg constructor cannot be used with dynamic overwrite
* classes with only a 2-arg constructor can be used without dynamic overwrite
* classes with a 3-arg constructor can be used with both
* the fallback to any 2-arg constructor takes place after the attempt to load
  the 3-arg constructor
* passing in invalid class types fails as expected (regression tests on
  expected behavior)
Change-Id: I694868aecf865cfa552e031ea3f6dde8b600fa7b
----