Github user lindblombr commented on the issue:
https://github.com/apache/spark/pull/21847
@gengliangwang I think I agree with your take on the multi-type unions in
principle. The only issue is that Avro itself does support that as a valid use
case. However, when I think about the behavior:
If we read a record `A` into Spark SQL, where field `a` has type
`["null", "int", "long"]`, Spark will automatically up-convert the field to a
long. This means that even though the original data may have contained a mix of
records stored as either "int" or "long", any attempt to write that same data
back out with a user-specified schema will convert all ints to longs, resulting
in a slightly different dataset.
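For illustration, here is a minimal sketch of that round trip (the file paths are hypothetical, and this assumes the `avroSchema` write option proposed in this PR):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("avro-union-demo").getOrCreate()

// Hypothetical input: an Avro file whose field `a` has the union type
// ["null", "int", "long"]. spark-avro reads this union as a nullable
// LongType, so the per-record int-vs-long distinction is lost on read.
val df = spark.read.format("avro").load("/tmp/in.avro")
df.printSchema() // |-- a: long (nullable = true)

// Writing back with the original schema via the user-specified-schema
// option encodes every value of `a` through the "long" branch of the
// union, even for values that were originally stored as "int".
val userSchema =
  """{"type": "record", "name": "A", "fields":
    |  [{"name": "a", "type": ["null", "int", "long"]}]}""".stripMargin
df.write.format("avro").option("avroSchema", userSchema).save("/tmp/out.avro")
```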
This side effect may be undesirable in some cases. I personally don't have a
use case for this, but the test data in this module itself includes a
multi-type union of this nature, and I wanted this functionality to work for
as much of the test data as possible. If, for the sake of simplicity, we'd
like to restrict user-specified schemas to two-type unions only, that would
still work for my use cases.