[GitHub] [spark] Hisoka-X commented on a diff in pull request #42220: [SPARK-44577][SQL] Fix INSERT BY NAME returns nonsensical error message

via GitHub Thu, 17 Aug 2023 07:34:06 -0700


Hisoka-X commented on code in PR #42220:
URL: https://github.com/apache/spark/pull/42220#discussion_r1297314416



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##########
@@ -238,11 +238,17 @@ object TableOutputResolver {
 
     if (reordered.length == expectedCols.length) {
       if (matchedCols.size < inputCols.length) {
-        val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
-          .map(col => s"${toSQLId(col.name)}").mkString(", ")
-        throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(
-          tableName, colPath.quoted, extraCols
-        )
+        if (colPath.isEmpty) {
+          val cannotFindCol = expectedCols.filter(col => !matchedCols.contains(col.name)).head.name
+          throw QueryCompilationErrors.incompatibleDataToTableCannotFindDataError(tableName,
+            cannotFindCol)
+        } else {
+          val extraCols = inputCols.filterNot(col => matchedCols.contains(col.name))
+            .map(col => s"${toSQLId(col.name)}").mkString(", ")
+          throw QueryCompilationErrors.incompatibleDataToTableExtraStructFieldsError(

Review Comment:
   Here is the mapping between DS API version and error info:
   | API | Column type | Error | Error info | Thrown at |
   | --- | --- | --- | --- | --- |
   | V1  | Top-level column | missing field | CANNOT_FIND_DATA | TableOutputResolver:243 |
   | V1  | Top-level column | extra field | TOO_MANY_DATA_COLUMNS | TableOutputResolver:51 |
   | V2  | Top-level column | extra field | TOO_MANY_DATA_COLUMNS | TableOutputResolver:51 |
   | V2  | Top-level column | missing field | CANNOT_FIND_DATA | TableOutputResolver:201 |
   | V2  | Struct | missing field | CANNOT_FIND_DATA | TableOutputResolver:201 |
   | V2  | Struct | extra field | EXTRA_STRUCT_FIELDS | TableOutputResolver:248 |
   
   There are two kinds of error info: `CANNOT_FIND_DATA` and `EXTRA_STRUCT_FIELDS`/`TOO_MANY_DATA_COLUMNS`.
   
   The key difference between `TableOutputResolver:248` and `TableOutputResolver:243` is that 243, the top-level column branch, only reports `CANNOT_FIND_DATA` and never `EXTRA_STRUCT_FIELDS`. That is because extra fields on a top-level column have already been checked at `TableOutputResolver:51` (`TOO_MANY_DATA_COLUMNS` is the top-level equivalent of `EXTRA_STRUCT_FIELDS`). That check only applies to top-level columns, so when we reach `TableOutputResolver:243` the error can only be `CANNOT_FIND_DATA`. Structs have no check like `TOO_MANY_DATA_COLUMNS`, so for them we should throw `EXTRA_STRUCT_FIELDS`. A struct's `CANNOT_FIND_DATA` has already been checked at `TableOutputResolver:201`, so when we reach `TableOutputResolver:248` the error can only be `EXTRA_STRUCT_FIELDS`. However, the `CANNOT_FIND_DATA` check at `TableOutputResolver:201` is skipped for a V1 top-level column, because the missing column gets a default value (`null`).
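
   To make the case analysis above easier to follow, here is a minimal Scala sketch of it. It is a simplified model, not Spark's `TableOutputResolver`: the object name `ByNameCheckSketch`, the error case classes, and the `fillDefaults` flag (standing in for the V1 behaviour of filling a missing column with `null`) are all hypothetical, and column names are assumed to be distinct.

```scala
// Simplified, hypothetical model of the checks discussed in this comment.
// It is NOT Spark's TableOutputResolver; it only tracks which error each path can raise.
object ByNameCheckSketch {
  sealed trait ResolveError
  final case class TooManyDataColumns(extra: Seq[String]) extends ResolveError
  final case class CannotFindData(missing: String) extends ResolveError
  final case class ExtraStructFields(extra: Seq[String]) extends ResolveError

  /** colPath is empty for top-level columns and non-empty inside a struct. */
  def check(
      colPath: Seq[String],
      inputCols: Seq[String],
      expectedCols: Seq[String],
      fillDefaults: Boolean): Option[ResolveError] = {
    val matched = expectedCols.filter(c => inputCols.contains(c))
    val missing = expectedCols.filterNot(c => inputCols.contains(c))
    val extra = inputCols.filterNot(c => expectedCols.contains(c))

    if (colPath.isEmpty && inputCols.size > expectedCols.size) {
      // Stand-in for the count check at :51, which only runs for top-level columns.
      Some(TooManyDataColumns(extra))
    } else if (missing.nonEmpty && !fillDefaults) {
      // Stand-in for :201; skipped when a missing column is filled with a default.
      Some(CannotFindData(missing.head))
    } else if (matched.size < inputCols.size) {
      // Stand-in for the branch changed in this diff (:243 vs :248).
      if (colPath.isEmpty) missing.headOption.map(m => CannotFindData(m))
      else Some(ExtraStructFields(extra))
    } else {
      None
    }
  }
}
```

   For example, `ByNameCheckSketch.check(Seq.empty, Seq("a", "c"), Seq("a", "b"), fillDefaults = true)` takes the top-level branch and reports `b` as missing, while the same kind of mismatch inside a struct that has an extra field takes the struct branch.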
   
   If we want to improve this logic, I think we should add a `TOO_MANY_DATA_COLUMNS` (`EXTRA_STRUCT_FIELDS`) check for structs before `reorderColumnsByName`; then we can replace `TableOutputResolver:248` with `TableOutputResolver:243`.
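
   A rough sketch of what that could look like, under stated assumptions: the names `UniformExtraCheckSketch`, `preCheck`, and `resolve` are hypothetical and not part of Spark, column names are assumed distinct, and default-value filling is left out. The idea is only that once the field-count check runs at every nesting level, the by-name step has a single error left to report.

```scala
// Hypothetical sketch of the proposed refactor; none of these names exist in Spark.
object UniformExtraCheckSketch {
  sealed trait Outcome
  final case class ExtraFields(path: Seq[String], extra: Seq[String]) extends Outcome
  final case class MissingField(path: Seq[String], name: String) extends Outcome
  case object Ok extends Outcome

  // Proposed pre-check: reject inputs with more fields than expected,
  // for top-level columns (empty path) and structs (non-empty path) alike.
  def preCheck(
      path: Seq[String],
      input: Seq[String],
      expected: Seq[String]): Option[ExtraFields] =
    if (input.size > expected.size) {
      Some(ExtraFields(path, input.filterNot(c => expected.contains(c))))
    } else {
      None
    }

  def resolve(path: Seq[String], input: Seq[String], expected: Seq[String]): Outcome =
    preCheck(path, input, expected).getOrElse {
      // After the pre-check, an unmatched input column implies that some
      // expected column is missing, so only one error kind remains here.
      expected.find(c => !input.contains(c)) match {
        case Some(name) => MissingField(path, name)
        case None => Ok
      }
    }
}
```

   With this shape, `resolve(Seq("s"), Seq("x", "y", "z"), Seq("x", "y"))` is rejected by the pre-check, and `resolve(Seq("s"), Seq("x", "z"), Seq("x", "y"))` reports `y` as missing, the same way a top-level column would.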
   
   
   


