Github user skambha commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19747#discussion_r151331207

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -507,6 +508,7 @@ private[hive] class HiveClientImpl(
         // these properties are still available to the others that share the same Hive metastore.
         // If users explicitly alter these Hive-specific properties through ALTER TABLE DDL, we respect
         // these user-specified values.
    +    verifyColumnDataType(table.dataSchema)
    --- End diff --

Thanks @gatorsmile for the review. I'll incorporate your other comments in my next commit.

In the current codeline, another recent PR renamed verifyColumnNames to verifyDataSchema. The reason I could not put the check in verifyDataSchema (or the old verifyColumnNames):
- verifyDataSchema is called at the beginning of the doCreateTable method, but we cannot error out that early: later on in that method, we create the datasource table. If the datasource table cannot be stored in a Hive-compatible format, it falls back to the Spark SQL-specific format, which works fine.
- For example, if I put the check there, the following CREATE TABLE of a datasource table would throw an exception right away, which we do not want:

```CREATE TABLE t(q STRUCT<`$a`:INT, col2:STRING>, i1 INT) USING PARQUET```
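To make the ordering argument concrete, here is a small, self-contained Scala sketch (not Spark code) of the control flow described above. The names `hiveCompatible`, `chooseFormat`, and the `Format` cases are hypothetical stand-ins; the point is only that a schema check run up front would reject names like `` `$a` `` that the real fallback path handles fine:

```scala
// Hypothetical model of why the column check cannot live in verifyDataSchema:
// a name Hive rejects is still valid when the table falls back to Spark's
// own (Spark SQL-specific) on-disk format instead of a Hive-compatible one.
object CreateTableSketch {
  sealed trait Format
  case object HiveCompatible extends Format
  case object SparkSqlSpecific extends Format

  // Hypothetical stand-in for Hive's column-name restrictions.
  def hiveCompatible(colName: String): Boolean =
    colName.matches("[a-zA-Z0-9_]+")

  // Mirrors the doCreateTable flow: instead of failing early on a
  // non-Hive-compatible name, fall back to the Spark SQL-specific format.
  def chooseFormat(columnNames: Seq[String]): Format =
    if (columnNames.forall(hiveCompatible)) HiveCompatible
    else SparkSqlSpecific

  def main(args: Array[String]): Unit = {
    // The struct field `$a` from the example DDL is not Hive-compatible,
    // so the table is stored in Spark's own format rather than rejected.
    println(chooseFormat(Seq("$a", "col2", "i1")))
    println(chooseFormat(Seq("q", "i1")))
  }
}
```

An early check in verifyDataSchema would amount to throwing inside `chooseFormat` before the fallback branch is ever reached, which is exactly the behavior the comment argues against for datasource tables.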