Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22316#discussion_r214842133
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
@@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
assert(exception.getMessage.contains("aggregate functions are not allowed"))
}
+
+ test("pivoting column list with values") {
+ val expected = Row(2012, 10000.0, null) :: Row(2013, 48000.0, 30000.0) :: Nil
+ val df = trainingSales
+ .groupBy($"sales.year")
+ .pivot(struct(lower($"sales.course"), $"training"), Seq(
+ struct(lit("dotnet"), lit("Experts")),
+ struct(lit("java"), lit("Dummies")))
+ ).agg(sum($"sales.earnings"))
+
+ checkAnswer(df, expected)
+ }
+
+ test("pivoting column list") {
+ val exception = intercept[RuntimeException] {
+ trainingSales
+ .groupBy($"sales.year")
+ .pivot(struct(lower($"sales.course"), $"training"))
+ .agg(sum($"sales.earnings"))
+ .collect()
--- End diff --
> I miss something?
No, you don't. The exception is indeed thrown inside `lit`, because
`collect()` returns a complex value that `lit` cannot wrap. That is exactly
what the test I added checks, to document the existing behavior.
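
To make the failure mode concrete, here is a minimal sketch (not code from the PR) that calls `lit` directly on a plain string and on a `Row`, which is the kind of value `collect()` hands back for a struct column. The object name `LitComplexValueSketch` and the helper `tryLit` are made up for illustration, and the exact exception class and message are an assumption that may differ between Spark versions, so the sketch only prints whatever happens.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.lit
import scala.util.{Failure, Success, Try}

// Hypothetical helper object, for illustration only.
object LitComplexValueSketch {
  // Wraps `value` with `lit` and reports either success or the exception raised.
  def tryLit(value: Any): String = Try(lit(value)) match {
    case Success(_) => s"ok: wrapped ${value.getClass.getSimpleName}"
    case Failure(e) => s"${e.getClass.getSimpleName}: ${e.getMessage}"
  }

  def main(args: Array[String]): Unit = {
    // A simple value: `lit` can turn it into a literal column.
    println(tryLit("dotnet"))
    // A Row, as produced by collecting a struct column. On the Spark version
    // discussed here this is expected to fail inside `lit` (assumption: the
    // outcome may differ on other versions).
    println(tryLit(Row("dotnet", "Experts")))
  }
}
```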
> btw, IMHO AnalysisException is better than RuntimeException in this case?
@maropu Could you please explain why you think `AnalysisException` is
better for an error that occurs at run time?
To be clear, in this PR I don't aim to change the behavior of the existing
method `def pivot(pivotColumn: Column): RelationalGroupedDataset`. I believe any
change to user-visible behavior should be discussed separately.
The PR aims to improve `def pivot(pivotColumn: Column, values: Seq[Any]):
RelationalGroupedDataset`, in particular to allow users to specify `struct`
literals. Please see the description.
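
For illustration, a minimal standalone sketch of the usage the PR targets, assuming a Spark build in which `pivot(pivotColumn, values)` accepts `struct` literal columns as values. The object name `PivotStructSketch`, the flat column names, and the rows are made up for this example (they are not the `trainingSales` fixture); the rows are chosen to mirror the `expected` rows in the test above.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{lit, lower, struct, sum}

// Hypothetical example object, for illustration only.
object PivotStructSketch {
  // Pivots by the pair (lower(course), training) using struct literals as values.
  def run(spark: SparkSession): Seq[Row] = {
    import spark.implicits._

    // Illustrative data: (year, course, training, earnings).
    val sales = Seq(
      (2012, "dotNET", "Experts", 10000.0),
      (2012, "JAVA", "Experts", 20000.0),
      (2012, "dotNet", "Dummies", 5000.0),
      (2013, "dotNET", "Experts", 48000.0),
      (2013, "Java", "Dummies", 30000.0)
    ).toDF("year", "course", "training", "earnings")

    sales
      .groupBy($"year")
      .pivot(struct(lower($"course"), $"training"), Seq(
        struct(lit("dotnet"), lit("Experts")),
        struct(lit("java"), lit("Dummies"))))
      .agg(sum($"earnings"))
      .orderBy($"year") // make the row order deterministic
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pivot-struct-sketch")
      .getOrCreate()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```

Passing the values explicitly means they are never collected and wrapped by `lit`, so the single-argument `pivot(pivotColumn)` code path and its current behavior are not involved.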