Github user lindblombr commented on the issue:
https://github.com/apache/spark/pull/21847
@gengliangwang I think I agree with your take on the multi-type unions in
principle. The only issue is that Avro itself does support that as a valid use
case. However, when I think about the behavior:
If we read a record `A` into Spark SQL, where field `a` has type
`["null", "int", "long"]`, Spark will automatically up-convert the field to a
long. This means that even though the original data may have contained a mix of
records stored as either "int" or "long", any attempt to write that same data
back out with a user-specified schema will convert all ints to longs, resulting
in a slightly different dataset.
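For illustration, here is a minimal sketch of that round trip (the file paths are hypothetical, and this assumes the `avroSchema` write option proposed in this PR):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("avro-union-demo").getOrCreate()

// Hypothetical input: an Avro file whose field `a` has the union type
// ["null", "int", "long"]. spark-avro reads this union as a nullable
// LongType, so the per-record int-vs-long distinction is lost on read.
val df = spark.read.format("avro").load("/tmp/in.avro")
df.printSchema() // |-- a: long (nullable = true)

// Writing back with the original schema via the user-specified-schema
// option encodes every value of `a` through the "long" branch of the
// union, even for values that were originally stored as "int".
val userSchema =
  """{"type": "record", "name": "A", "fields":
    |  [{"name": "a", "type": ["null", "int", "long"]}]}""".stripMargin
df.write.format("avro").option("avroSchema", userSchema).save("/tmp/out.avro")
```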
This side effect may be undesirable in some cases. I personally don't have a
use case for this, but the test data in this module itself includes a
multi-type union of this nature, and I wanted this functionality to work for
as much of the test data as possible. If, for the sake of simplicity, we'd
like to restrict user-specified schemas to two-type unions only, that would
still work for my use cases.