I think what happened is type coercion to the tightest common type. When unioning, the columns are widened to a common type, and the tightest common type between a string and an int is string.
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala#L144

On Tue, Apr 7, 2015 at 5:00 PM, Justin Yip <yipjus...@prediction.io> wrote:
> Hello,
>
> I am experimenting with DataFrame. I tried to construct two DataFrames
> with:
>
> 1. case class A(a: Int, b: String)
> scala> adf.printSchema()
> root
>  |-- a: integer (nullable = false)
>  |-- b: string (nullable = true)
>
> 2. case class B(a: String, c: Int)
> scala> bdf.printSchema()
> root
>  |-- a: string (nullable = true)
>  |-- c: integer (nullable = false)
>
> Then I unioned these two DataFrames with the unionAll function, and I
> get the following schema. It is kind of a mixture of A and B.
>
> scala> val udf = adf.unionAll(bdf)
> scala> udf.printSchema()
> root
>  |-- a: string (nullable = false)
>  |-- b: string (nullable = true)
>
> The unionAll documentation says it behaves like the SQL UNION ALL
> function. However, unioning incompatible types is not well defined for SQL.
> Is there any expected behavior for unioning incompatible data frames?
>
> Thanks.
>
> Justin
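For intuition, the widening rule linked above can be sketched in plain Scala. This is a hypothetical toy model of the "tightest common type" lookup, not Spark's actual HiveTypeCoercion code (the type names and the function below are simplifications I'm assuming for illustration):

```scala
// Toy sketch of tightest-common-type coercion between two column types.
// NOT Spark's real implementation -- just the idea: identical types are
// kept, and a string paired with anything widens to string.
object WideningSketch {
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType

  // Returns the common type the union column would take, if one exists.
  def tightestCommonType(a: DataType, b: DataType): Option[DataType] =
    (a, b) match {
      case (x, y) if x == y                  => Some(x)
      case (StringType, _) | (_, StringType) => Some(StringType)
      case _                                 => None
    }

  def main(args: Array[String]): Unit = {
    // Column `a` is Int in adf and String in bdf, so the unioned
    // schema widens it to string, as seen in udf.printSchema().
    println(tightestCommonType(IntegerType, StringType))
  }
}
```

That is why column `a` comes out as string in the unioned schema above.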