[GitHub] [spark] EnricoMi commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

GitBox Thu, 06 Oct 2022 01:17:03 -0700


EnricoMi commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r988700017



##########
sql/core/src/test/resources/sql-tests/results/unpivot.sql.out:
##########
@@ -0,0 +1,409 @@
+-- Automatically generated by SQLQueryTestSuite
+-- !query
+create temporary view courseEarnings as select * from values
+  ("dotNET", 15000, 48000, 22500),
+  ("Java", 20000, 30000, NULL)
+  as courseEarnings(course, `2012`, `2013`, `2014`)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+SELECT * FROM courseEarnings
+UNPIVOT (
+  earningsYear FOR year IN (`2012`, `2013`, `2014`)
+)
+-- !query schema
+struct<course:string,year:string,earningsYear:int>
+-- !query output
+Java   2012    20000
+Java   2013    30000
+dotNET 2012    15000
+dotNET 2013    48000
+dotNET 2014    22500
+
+
+-- !query
+SELECT * FROM courseEarnings
+UNPIVOT INCLUDE NULLS (
+  earningsYear FOR year IN (`2012`, `2013`, `2014`)
+)
+-- !query schema
+struct<course:string,year:string,earningsYear:int>
+-- !query output
+Java   2012    20000
+Java   2013    30000
+Java   2014    NULL
+dotNET 2012    15000
+dotNET 2013    48000
+dotNET 2014    22500
+
+
+-- !query
+SELECT * FROM courseEarnings
+UNPIVOT EXCLUDE NULLS (
+  earningsYear FOR year IN (`2012`, `2013`, `2014`)
+)
+-- !query schema
+struct<course:string,year:string,earningsYear:int>
+-- !query output
+Java   2012    20000
+Java   2013    30000
+dotNET 2012    15000
+dotNET 2013    48000
+dotNET 2014    22500
+
+
+-- !query
+SELECT * FROM (
+  SELECT course, `the.earnings`.* FROM (
+    SELECT course, struct(`2012`, `2013`, `2014`) AS `the.earnings`
+    FROM courseEarnings
+  )
+)
+UNPIVOT (
+  earningsYear FOR year IN (`2012`, `2013`, `2014`)
+)
+-- !query schema
+struct<course:string,year:string,earningsYear:int>
+-- !query output
+Java   2012    20000
+Java   2013    30000
+dotNET 2012    15000
+dotNET 2013    48000
+dotNET 2014    22500
+
+
+-- !query
+SELECT * FROM (
+  SELECT course, struct(`2012`, `2013`, `2014`) AS `the.earnings`
+  FROM courseEarnings
+)
+UNPIVOT (
+  earningsYear FOR year IN (`the.earnings`.`2012`, `the.earnings`.`2013`, 
`the.earnings`.`2014`)
+)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "UNPIVOT_REQUIRES_ATTRIBUTES",
+  "sqlState" : "42000",
+  "messageParameters" : {
+    "empty" : "ids",
+    "given" : "values",
+    "types" : "Alias (the.earnings.2012 AS the.earnings.2012, 
the.earnings.2013 AS the.earnings.2013, the.earnings.2014 AS the.earnings.2014)"
+  }
+}
+
+
+-- !query
+SELECT * FROM courseEarnings
+UNPIVOT (
+  earningsYear FOR year IN (`2013`, `2014`)
+)
+-- !query schema
+struct<course:string,2012:int,year:string,earningsYear:int>
+-- !query output
+Java   20000   2013    30000
+dotNET 15000   2013    48000
+dotNET 15000   2014    22500
+
+
+-- !query
+SELECT * FROM courseEarningsAndSales
+UNPIVOT (
+  values FOR year IN ()
+)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.parser.ParseException
+{
+  "errorClass" : "PARSE_SYNTAX_ERROR",
+  "sqlState" : "42000",
+  "messageParameters" : {
+    "error" : "')'",
+    "hint" : ""
+  }
+}
+
+
+-- !query
+SELECT * FROM courseEarnings
+UNPIVOT (
+  earningsYear FOR year IN (`2012` as `twenty-twelve`, `2013` as 
`twenty-thirteen`, `2014` as `twenty-fourteen`)
+)
+-- !query schema
+struct<course:string,year:string,earningsYear:int>
+-- !query output
+Java   twenty-thirteen 30000
+Java   twenty-twelve   20000
+dotNET twenty-fourteen 22500
+dotNET twenty-thirteen 48000
+dotNET twenty-twelve   15000
+
+
+-- !query
+SELECT up.* FROM courseEarnings
+UNPIVOT (
+  earningsYear FOR year IN (`2012`, `2013`, `2014`)
+) up
+-- !query schema
+struct<course:string,year:string,earningsYear:int>
+-- !query output
+Java   2012    20000
+Java   2013    30000
+dotNET 2012    15000
+dotNET 2013    48000
+dotNET 2014    22500
+
+
+-- !query
+SELECT up.* FROM courseEarnings
+UNPIVOT (
+  earningsYear FOR year IN (`2012`, `2013`, `2014`)
+) AS up

Review Comment:
   Those parser suites parse the SQL statement and assert the logical plan, but 
the plan is not fully analyzed. Some of the situations tested in `unpivot.sql` 
require full analysis, so they cannot be covered by `UnpivotParserSuite`.
   
   I could add those to `DatasetUnpivotSuite` and assert the result of 
`spark.sql("...")`.
   
   I'll sketch that out so we can see how this looks like.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] EnricoMi commented on a diff in pull request #37407: [SPARK-39876][SQL] Add UNPIVOT to SQL syntax

Reply via email to