srielau commented on code in PR #36150:
URL: https://github.com/apache/spark/pull/36150#discussion_r907584961
##########
core/src/main/resources/error/error-classes.json:
##########
@@ -256,6 +256,18 @@
"Key <keyValue> does not exist. Use `try_element_at` to tolerate
non-existent key and return NULL instead. If necessary set <config> to
\"false\" to bypass this error."
]
},
+ "MELT_REQUIRES_VALUE_COLUMNS" : {
Review Comment:
Given that even pandas uses UNPIVOT in its description, can we use that verb instead of melt? https://pandas.pydata.org/docs/reference/api/pandas.melt.html
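
To illustrate the point: pandas' `melt` is exactly the operation SQL dialects call UNPIVOT, turning wide columns into `(variable, value)` rows. A minimal sketch (the column names here are made up for the example):

```python
import pandas as pd

# Wide frame: one id column plus two measure columns.
wide = pd.DataFrame({"id": [1, 2], "a": [10, 20], "b": [30, 40]})

# melt == UNPIVOT: each measure column becomes rows of (variable, value).
long = pd.melt(wide, id_vars=["id"], value_vars=["a", "b"],
               var_name="variable", value_name="value")
print(long)
```

The result has one row per (id, measure-column) pair, which is the UNPIVOT shape.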
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -422,6 +426,15 @@ trait CheckAnalysis extends PredicateHelper with
LookupCatalog {
}
metrics.foreach(m => checkMetric(m, m))
+ // see Analyzer.ResolveMelt
+ case m: Melt if m.childrenResolved && m.ids.forall(_.resolved) && m.values.isEmpty =>
+   failAnalysis("MELT_REQUIRES_VALUE_COLUMNS", Array(m.ids.mkString(", ")))
Review Comment:
I think toSQLId() is needed to make sure the ids are properly decorated with back-ticks.
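
For context, the real helper is Spark's Scala-side `toSQLId`; a hypothetical Python sketch of the decoration it performs (back-tick quoting each name part, doubling embedded back-ticks) might look like:

```python
# Hypothetical sketch of the back-tick decoration Spark's toSQLId applies
# to identifiers in error messages; not Spark's actual implementation.
def to_sql_id(parts):
    # Quote each name part in back-ticks; escape embedded back-ticks by
    # doubling them, then join the parts with dots.
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)

print(to_sql_id(["default", "my col"]))
```

With this decoration, `m.ids.mkString(", ")` would render as `` `a`, `b` `` rather than raw expression strings.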
##########
core/src/main/resources/error/error-classes.json:
##########
@@ -256,6 +256,18 @@
"Key <keyValue> does not exist. Use `try_element_at` to tolerate
non-existent key and return NULL instead. If necessary set <config> to
\"false\" to bypass this error."
]
},
+ "MELT_REQUIRES_VALUE_COLUMNS" : {
+ "message" : [
+ "At least one non-id column is required to melt. All columns are id
columns: [<ids>]"
Review Comment:
I don't understand this restriction. What does it mean?
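
For comparison, pandas hits the same edge case: if every column is declared an id column, there is nothing left to unpivot, and pandas silently returns an empty frame instead of raising. A sketch, assuming pandas semantics:

```python
import pandas as pd

df = pd.DataFrame({"id1": [1, 2], "id2": [3, 4]})

# Every column is an id column -> no value columns remain to melt,
# so pandas produces zero rows rather than an error.
melted = pd.melt(df, id_vars=["id1", "id2"])
print(len(melted))
```

The error class in this PR presumably makes that degenerate case an explicit analysis error instead of a silently empty result.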
##########
core/src/main/resources/error/error-classes.json:
##########
@@ -256,6 +256,18 @@
"Key <keyValue> does not exist. Use `try_element_at` to tolerate
non-existent key and return NULL instead. If necessary set <config> to
\"false\" to bypass this error."
]
},
+ "MELT_REQUIRES_VALUE_COLUMNS" : {
+ "message" : [
+ "At least one non-id column is required to melt. All columns are id
columns: [<ids>]"
+ ],
+ "sqlState" : "42000"
+ },
+ "MELT_VALUE_DATA_TYPE_MISMATCH" : {
+ "message" : [
+ "Melt value columns must have compatible data types, some data types are
not compatible: [<types>]"
Review Comment:
I think "share a least-common type" is the term that is well defined.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -422,6 +426,15 @@ trait CheckAnalysis extends PredicateHelper with
LookupCatalog {
}
metrics.foreach(m => checkMetric(m, m))
+ // see Analyzer.ResolveMelt
+ case m: Melt if m.childrenResolved && m.ids.forall(_.resolved) && m.values.isEmpty =>
+   failAnalysis("MELT_REQUIRES_VALUE_COLUMNS", Array(m.ids.mkString(", ")))
+ // see TypeCoercionBase.MeltCoercion
+ case m: Melt if m.values.nonEmpty && m.values.forall(_.resolved) && m.valueType.isEmpty =>
+   failAnalysis("MELT_VALUE_DATA_TYPE_MISMATCH", Array(
+     m.values.map(_.dataType).toSet.mkString(", ")
Review Comment:
I think the types need to be decorated/cleaned up @MaxGekk
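
The concern is that `m.values.map(_.dataType).toSet.mkString(", ")` would print raw Catalyst `toString` names like `IntegerType`. A hypothetical sketch of the kind of clean-up meant (Spark's actual helper is `toSQLType`, whose exact SQL names differ, e.g. `IntegerType` renders as `"INT"`):

```python
# Hypothetical clean-up of Catalyst type names for error messages; a
# simplification, not Spark's real toSQLType mapping.
def to_sql_type(catalyst_name):
    # Strip the "Type" suffix and upper-case the rest, then quote it:
    # "StringType" -> '"STRING"'.
    return '"' + catalyst_name.removesuffix("Type").upper() + '"'

print(to_sql_type("StringType"))
```

Decorated this way, the message lists recognizable SQL type names instead of internal class names.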
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]