Hi Haopu,
actually `key` is nullable here because that is what your input's schema says:

scala> result.printSchema
root
 |-- key: string (nullable = true)
 |-- SUM(value): long (nullable = true)

scala> df.printSchema
root
 |-- key: string (nullable = true)
 |-- value: long (nullable = false)

I tried it with a schema where the key is not flagged as nullable, and that
schema is actually respected. What you can argue, however, is that SUM(value)
should also be non-nullable, since value is not nullable.
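
For reference, here is roughly what I ran to check this (a minimal sketch
using sqlContext.createDataFrame with an explicit StructType; the details may
differ from your setup):

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.functions.sum
  import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

  // Declare both columns as non-nullable in the schema up front.
  val strictSchema = StructType(Seq(
    StructField("key", StringType, nullable = false),
    StructField("value", LongType, nullable = false)))

  val rows = sc.parallelize(Seq(Row("k1", 2L), Row("k1", 1L)))
  val dfStrict = sqlContext.createDataFrame(rows, strictSchema)

  // key keeps nullable = false in the result; only SUM(value) is still
  // reported as nullable.
  dfStrict.groupBy("key").agg(sum("value")).printSchema()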

@rxin do you think it would be reasonable to derive the Sum aggregation
function's nullability from the input expression's schema instead of always
flagging it as nullable?
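
To make the idea concrete, here is a toy model of the proposal (these are not
Spark's actual Catalyst classes, just an illustration of deriving nullability
from the child expression):

  // Illustration only: an aggregate's nullability mirrors its input
  // instead of being hard-coded to true.
  trait Expr { def nullable: Boolean }
  case class ColumnRef(name: String, nullable: Boolean) extends Expr
  case class SumAgg(child: Expr) extends Expr {
    // Proposed behaviour: SUM can only be null if its input can be null.
    def nullable: Boolean = child.nullable
  }

  SumAgg(ColumnRef("value", nullable = false)).nullable  // false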

Regards,

Olivier.
On Mon, May 11, 2015 at 22:07, Reynold Xin <r...@databricks.com> wrote:

> Not by design. Would you be interested in submitting a pull request?
>
> On Mon, May 11, 2015 at 1:48 AM, Haopu Wang <hw...@qilinsoft.com> wrote:
>
>> I try to get the result schema of aggregate functions using DataFrame
>> API.
>>
>> However, I find that the result fields for the groupBy columns are always
>> nullable, even when the source field is not nullable.
>>
>> I want to know whether this is by design, thank you! Below is a simple
>> code snippet that shows the issue.
>>
>> ======
>>
>>   import sqlContext.implicits._
>>   import org.apache.spark.sql.functions._
>>   case class Test(key: String, value: Long)
>>   val df = sc.makeRDD(Seq(Test("k1",2),Test("k1",1))).toDF
>>
>>   val result = df.groupBy("key").agg($"key", sum("value"))
>>
>>   // From the output, you can see the "key" column is nullable, why??
>>   result.printSchema
>> //    root
>> //     |-- key: string (nullable = true)
>> //     |-- SUM(value): long (nullable = true)
>>
>>
>>
>
