That explains it. Thanks Reynold.

Justin

On Mon, Apr 13, 2015 at 11:26 PM, Reynold Xin <r...@databricks.com> wrote:

> I think what happened is type coercion: to union the two columns, type
> widening is required, and the tightest common type between a string and
> an int is string.
>
>
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala#L144
>
>
>
> On Tue, Apr 7, 2015 at 5:00 PM, Justin Yip <yipjus...@prediction.io>
> wrote:
>
>> Hello,
>>
>> I am experimenting with DataFrame. I tried to construct two DataFrames
>> with:
>> 1. case class A(a: Int, b: String)
>> scala> adf.printSchema()
>> root
>>  |-- a: integer (nullable = false)
>>  |-- b: string (nullable = true)
>>
>> 2. case class B(a: String, c: Int)
>> scala> bdf.printSchema()
>> root
>>  |-- a: string (nullable = true)
>>  |-- c: integer (nullable = false)
>>
>>
>> Then I unioned these two DataFrames with the unionAll function, and I
>> got the following schema. It is a mixture of A and B.
>>
>> scala> val udf = adf.unionAll(bdf)
>> scala> udf.printSchema()
>> root
>>  |-- a: string (nullable = false)
>>  |-- b: string (nullable = true)
>>
>> The unionAll documentation says it behaves like the SQL UNION ALL
>> operator. However, unioning incompatible types is not well defined in SQL.
>> Is there an expected behavior for unioning DataFrames with incompatible
>> schemas?
>>
>> Thanks.
>>
>> Justin
>>
>
>
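For reference, the rule Reynold points to in HiveTypeCoercion can be sketched as a small self-contained function. This is a simplified illustration, not Spark's actual code; the type names, the numeric widening order, and the "anything with string widens to string" shortcut are assumptions made for the sketch:

```scala
// Simplified model of "find the tightest common type both sides can
// widen to" (illustrative only, not Spark's HiveTypeCoercion itself).
sealed trait DType
case object IntType extends DType
case object LongType extends DType
case object DoubleType extends DType
case object StringType extends DType

object Coercion {
  // Assumed numeric widening order: Int -> Long -> Double.
  private val numericOrder: Seq[DType] = Seq(IntType, LongType, DoubleType)

  def tightestCommonType(a: DType, b: DType): Option[DType] = (a, b) match {
    // Identical types need no widening.
    case _ if a == b => Some(a)
    // In this sketch, any type paired with a string widens to string,
    // which is why unioning an int column with a string column yields string.
    case (StringType, _) | (_, StringType) => Some(StringType)
    // Two numeric types widen to the wider of the two.
    case _ if numericOrder.contains(a) && numericOrder.contains(b) =>
      Some(numericOrder(math.max(numericOrder.indexOf(a), numericOrder.indexOf(b))))
    // No common type: the union would be rejected.
    case _ => None
  }
}
```

Under this model, `tightestCommonType(IntType, StringType)` returns `Some(StringType)`, matching the schema Justin observed for column `a` after the unionAll.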
