Github user skambha commented on a diff in the pull request:
https://github.com/apache/spark/pull/19747#discussion_r151331207
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -507,6 +508,7 @@ private[hive] class HiveClientImpl(
    // these properties are still available to the others that share the same Hive metastore.
    // If users explicitly alter these Hive-specific properties through ALTER TABLE DDL, we respect
    // these user-specified values.
+   verifyColumnDataType(table.dataSchema)
--- End diff ---
Thanks @gatorsmile for the review. I'll incorporate your other comments
in my next commit.
In the current codebase, another recent PR renamed verifyColumnNames to
verifyDataSchema.
The reason I could not put the check in verifyDataSchema (or the old
verifyColumnNames):
- verifyDataSchema is called at the beginning of the doCreateTable method, but
we cannot error out that early: later in that method we create the data source
table, and if it cannot be stored in a Hive-compatible format, it falls back to
the Spark SQL specific format, which works fine.
- For example, if I put the check there, the following CREATE TABLE for a data
source table would throw an exception right away, which we do not want (see the
sketch after this list):
```CREATE TABLE t(q STRUCT<`$a`:INT, col2:STRING>, i1 INT) USING PARQUET```
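To make the ordering concrete, here is a minimal, self-contained Scala sketch of
why the check has to sit on the Hive-compatible path rather than next to
verifyDataSchema. This is not the actual HiveExternalCatalog/HiveClientImpl
code: the Table case class, the println persistence stubs, and the `$`-based
type check are simplified stand-ins for illustration only.
```scala
object CreateTableSketch {
  final case class Table(name: String, dataSchema: Seq[(String, String)])

  // Runs for every table at the top of doCreateTable (early validation).
  def verifyDataSchema(t: Table): Unit =
    require(t.dataSchema.forall { case (name, _) => name.nonEmpty }, "empty column name")

  // Stand-in for the PR's check on data types Hive cannot store
  // (e.g. struct fields whose names contain characters like `$`).
  def verifyColumnDataType(t: Table): Unit =
    require(t.dataSchema.forall { case (_, tpe) => !tpe.contains("$") },
      "data type not representable in a Hive-compatible table")

  def doCreateTable(t: Table): Unit = {
    verifyDataSchema(t) // too early for the Hive type check: the fallback is not decided yet
    try {
      verifyColumnDataType(t) // Hive-compatible path only
      println(s"stored ${t.name} in Hive-compatible format")
    } catch {
      case e: IllegalArgumentException =>
        println(s"falling back to Spark SQL specific format for ${t.name}: ${e.getMessage}")
    }
  }

  def main(args: Array[String]): Unit =
    // Mirrors the CREATE TABLE example above: the struct type is not Hive-legal,
    // so the table falls back instead of failing.
    doCreateTable(Table("t", Seq("q" -> "struct<`$a`:int,col2:string>", "i1" -> "int")))
}
```
With this placement, the example table above is still created; it just ends up
stored in the Spark SQL specific format.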
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]