[GitHub] spark pull request #22030: [SPARK-25048][SQL] Pivoting by multiple columns i...

maryannxue Tue, 07 Aug 2018 16:00:47 -0700

GitHub user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22030#discussion_r208410422
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -384,6 +392,10 @@ class RelationalGroupedDataset protected[sql](
           .sort(pivotColumn)  // ensure that the output columns are in a consistent logical order
           .collect()
           .map(_.get(0))
    +      .collect {
    +        case row: GenericRow => struct(row.values.map(lit): _*)
    --- End diff --
    
    I suspect this will not work for nested struct types, for example when
    one of several pivot columns is itself a struct. Could you please add a
    test like:
    ```
      test("pivoting column list") {
        val expected = ...
        val df = trainingSales
          .groupBy($"sales.year")
          .pivot(struct($"sales", $"training"))
          .agg(sum($"sales.earnings"))
        checkAnswer(df, expected)
      }
    ```
    Can we also check whether it works for other complex nested types, such
    as Array(Struct(...))?
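
    To make the concern concrete: the match in the diff only unwraps the
    top-level row and passes each value straight to `lit`, which, as far as
    I know, does not accept a nested `Row`. A rough sketch of a recursive
    conversion is below. `toLiteral` is a hypothetical helper name, not
    something in this PR; it assumes collected arrays come back as `Seq`,
    it does not handle maps, and the rebuilt structs get default field
    names rather than the original ones.
    ```scala
    import org.apache.spark.sql.{Column, Row}
    import org.apache.spark.sql.functions.{array, lit, struct}

    // Hypothetical helper (not part of the PR): turn a collected pivot
    // value into a Column expression, descending into nested rows and
    // sequences instead of handing them to lit() directly.
    def toLiteral(value: Any): Column = value match {
      // Nested struct: rebuild it field by field (original names are lost).
      case row: Row => struct(row.toSeq.map(toLiteral): _*)
      // Array column: assumed to be collected as a Seq.
      case seq: Seq[_] => array(seq.map(toLiteral): _*)
      // Anything else is assumed to be a type lit() supports.
      case other => lit(other)
    }
    ```
    Whether something like this is the right fix is a separate question;
    the tests requested above should show if the nested cases need it.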




