Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21611#discussion_r198509965
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala ---
@@ -333,4 +406,28 @@ class DatasetAggregatorSuite extends QueryTest with SharedSQLContext {
df.groupBy($"i").agg(VeryComplexResultAgg.toColumn),
Row(1, Row(Row(1, "a"), Row(1, "a"))) :: Row(2, Row(Row(2, "bc"), Row(2, "bc"))) :: Nil)
}
+
+ test("SPARK-24569: Aggregator with output type Option[Boolean] creates column of type Row") {
+ val df = Seq(
+ OptionBooleanData("bob", Some(true)),
+ OptionBooleanData("bob", Some(false)),
+ OptionBooleanData("bob", None)).toDF()
+ val group = df
+ .groupBy("name")
--- End diff --
Yes, if you use a similar `Aggregator` with `groupByKey`, you get a struct too:
```scala
val df = Seq(
  OptionBooleanData("bob", Some(true)),
  OptionBooleanData("bob", Some(false)),
  OptionBooleanData("bob", None)).toDF()
val df2 = df.groupByKey((r: Row) => r.getString(0))
  .agg(OptionBooleanAggregator("isGood").toColumn)
df2.printSchema
```
```
root
|-- value: string (nullable = true)
|-- OptionBooleanAggregator(org.apache.spark.sql.Row): struct (nullable = true)
| |-- value: boolean (nullable = true)
```
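For anyone who wants to reproduce this outside the test suite, here is a minimal standalone sketch. The definitions of `OptionBooleanData` and `OptionBooleanAggregator` are not quoted in this comment, so the versions below are assumptions written to match how they are used above. The `SchemaRepro` object, the local `SparkSession` settings, and the "any true wins" merge rule are likewise illustrative, not taken from the PR.

```scala
import org.apache.spark.sql.{Encoder, Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

// Assumed shape of the test data: a name plus an optional flag.
case class OptionBooleanData(name: String, isGood: Option[Boolean])

// Assumed aggregator: reads a nullable boolean column from each Row and
// returns Some(true) if any value is true, otherwise the first defined value.
case class OptionBooleanAggregator(colName: String)
    extends Aggregator[Row, Option[Boolean], Option[Boolean]] {

  override def zero: Option[Boolean] = None

  override def reduce(buffer: Option[Boolean], row: Row): Option[Boolean] = {
    val index = row.fieldIndex(colName)
    val value = if (row.isNullAt(index)) None else Some(row.getBoolean(index))
    merge(buffer, value)
  }

  override def merge(b1: Option[Boolean], b2: Option[Boolean]): Option[Boolean] =
    (b1, b2) match {
      case (Some(true), _) | (_, Some(true)) => Some(true)
      case (Some(_), _) => b1
      case _ => b2
    }

  override def finish(reduction: Option[Boolean]): Option[Boolean] = reduction

  // The output encoder for Option[Boolean] is what determines the result column type.
  override def bufferEncoder: Encoder[Option[Boolean]] = ExpressionEncoder()
  override def outputEncoder: Encoder[Option[Boolean]] = ExpressionEncoder()
}

// Hypothetical driver object, only for running the comparison locally.
object SchemaRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-24569-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      OptionBooleanData("bob", Some(true)),
      OptionBooleanData("bob", Some(false)),
      OptionBooleanData("bob", None)).toDF()

    // Untyped grouping, as in the test under review.
    df.groupBy("name")
      .agg(OptionBooleanAggregator("isGood").toColumn)
      .printSchema()

    // Typed grouping, as in the snippet above.
    df.groupByKey((r: Row) => r.getString(0))
      .agg(OptionBooleanAggregator("isGood").toColumn)
      .printSchema()

    spark.stop()
  }
}
```

Printing both schemas side by side should make it easy to check whether the two grouping paths agree on the result column type for a given Spark build.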