[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

MaxGekk Tue, 04 Sep 2018 02:14:25 -0700

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22316#discussion_r214842133
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala ---
    @@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest with SharedSQLContext {
     
         assert(exception.getMessage.contains("aggregate functions are not allowed"))
       }
    +
    +  test("pivoting column list with values") {
    +    val expected = Row(2012, 10000.0, null) :: Row(2013, 48000.0, 30000.0) :: Nil
    +    val df = trainingSales
    +      .groupBy($"sales.year")
    +      .pivot(struct(lower($"sales.course"), $"training"), Seq(
    +        struct(lit("dotnet"), lit("Experts")),
    +        struct(lit("java"), lit("Dummies")))
    +      ).agg(sum($"sales.earnings"))
    +
    +    checkAnswer(df, expected)
    +  }
    +
    +  test("pivoting column list") {
    +    val exception = intercept[RuntimeException] {
    +      trainingSales
    +        .groupBy($"sales.year")
    +        .pivot(struct(lower($"sales.course"), $"training"))
    +        .agg(sum($"sales.earnings"))
    +        .collect()
    --- End diff --
    
    > I miss something?
    
    No, you are not missing anything. The exception is indeed thrown inside
    `lit`, because `collect()` returns a complex value that `lit` cannot wrap.
    That is exactly what the test I added checks, to document the existing
    behavior.
    
    > btw, IMHO AnalysisException is better than RuntimeException in this case?
    
    @maropu Could you please explain why you think `AnalysisException` is
    better for an error that occurs at run time?
    
    Just in case: in this PR I don't aim to change the behavior of the existing
    method `def pivot(pivotColumn: Column): RelationalGroupedDataset`. I believe
    any change to user-visible behavior should be discussed separately. The PR
    aims to improve
    `def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset`
    so that users can specify `struct` literals in particular. Please see the
    description.


