I think what happened is type coercion to the tightest common type. When unioning, the columns are widened to a common type, and the tightest common type between a string and an int is string.
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala#L144

On Tue, Apr 7, 2015 at 5:00 PM, Justin Yip <yipjus...@prediction.io> wrote:
> Hello,
>
> I am experimenting with DataFrame. I tried to construct two DataFrames
> with:
>
> 1. case class A(a: Int, b: String)
> scala> adf.printSchema()
> root
>  |-- a: integer (nullable = false)
>  |-- b: string (nullable = true)
>
> 2. case class B(a: String, c: Int)
> scala> bdf.printSchema()
> root
>  |-- a: string (nullable = true)
>  |-- c: integer (nullable = false)
>
> Then I unioned these two DataFrames with the unionAll function, and I
> get the following schema. It is kind of a mixture of A and B.
>
> scala> val udf = adf.unionAll(bdf)
> scala> udf.printSchema()
> root
>  |-- a: string (nullable = false)
>  |-- b: string (nullable = true)
>
> The unionAll documentation says it behaves like the SQL UNION ALL
> function. However, unioning incompatible types is not well defined for SQL.
> Is there any expected behavior for unioning incompatible data frames?
>
> Thanks.
>
> Justin
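For intuition, the widening rule linked above can be sketched in plain Scala. This is a hypothetical toy model of the "tightest common type" lookup, not Spark's actual HiveTypeCoercion code (the type names and the function below are simplifications I'm assuming for illustration):

```scala
// Toy sketch of tightest-common-type coercion between two column types.
// NOT Spark's real implementation -- just the idea: identical types are
// kept, and a string paired with anything widens to string.
object WideningSketch {
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType

  // Returns the common type the union column would take, if one exists.
  def tightestCommonType(a: DataType, b: DataType): Option[DataType] =
    (a, b) match {
      case (x, y) if x == y                  => Some(x)
      case (StringType, _) | (_, StringType) => Some(StringType)
      case _                                 => None
    }

  def main(args: Array[String]): Unit = {
    // Column `a` is Int in adf and String in bdf, so the unioned
    // schema widens it to string, as seen in udf.printSchema().
    println(tightestCommonType(IntegerType, StringType))
  }
}
```

That is why column `a` comes out as string in the unioned schema above.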