Github user skambha commented on a diff in the pull request:
https://github.com/apache/spark/pull/19747#discussion_r151331207
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -507,6 +508,7 @@ private[hive] class HiveClientImpl(
    // these properties are still available to the others that share the same Hive metastore.
    // If users explicitly alter these Hive-specific properties through ALTER TABLE DDL, we respect
    // these user-specified values.
+   verifyColumnDataType(table.dataSchema)
--- End diff ---
Thanks @gatorsmile for the review. I'll incorporate your other comments
in my next commit.
In the current codebase, another recent PR renamed verifyColumnNames to
verifyDataSchema.
The reason I could not put the check in verifyDataSchema (or the old
verifyColumnNames):
- verifyDataSchema is called at the beginning of the doCreateTable method, but
we cannot error out that early: later in that method we create the data source
table, and if it cannot be stored in a Hive-compatible format, it falls back to
the Spark SQL specific format, which works fine.
- For example, if I put the check there, the following CREATE TABLE for a data
source table would throw an exception right away, which we do not want (see the
sketch after this list):
```CREATE TABLE t(q STRUCT<`$a`:INT, col2:STRING>, i1 INT) USING PARQUET```
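To make the ordering concrete, here is a minimal, self-contained Scala sketch of
why the check has to sit on the Hive-compatible path rather than next to
verifyDataSchema. This is not the actual HiveExternalCatalog/HiveClientImpl
code: the Table case class, the println persistence stubs, and the `$`-based
type check are simplified stand-ins for illustration only.
```scala
object CreateTableSketch {
  final case class Table(name: String, dataSchema: Seq[(String, String)])

  // Runs for every table at the top of doCreateTable (early validation).
  def verifyDataSchema(t: Table): Unit =
    require(t.dataSchema.forall { case (name, _) => name.nonEmpty }, "empty column name")

  // Stand-in for the PR's check on data types Hive cannot store
  // (e.g. struct fields whose names contain characters like `$`).
  def verifyColumnDataType(t: Table): Unit =
    require(t.dataSchema.forall { case (_, tpe) => !tpe.contains("$") },
      "data type not representable in a Hive-compatible table")

  def doCreateTable(t: Table): Unit = {
    verifyDataSchema(t) // too early for the Hive type check: the fallback is not decided yet
    try {
      verifyColumnDataType(t) // Hive-compatible path only
      println(s"stored ${t.name} in Hive-compatible format")
    } catch {
      case e: IllegalArgumentException =>
        println(s"falling back to Spark SQL specific format for ${t.name}: ${e.getMessage}")
    }
  }

  def main(args: Array[String]): Unit =
    // Mirrors the CREATE TABLE example above: the struct type is not Hive-legal,
    // so the table falls back instead of failing.
    doCreateTable(Table("t", Seq("q" -> "struct<`$a`:int,col2:string>", "i1" -> "int")))
}
```
With this placement, the example table above is still created; it just ends up
stored in the Spark SQL specific format.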
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]