rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606299173
 
 
   > I don't know who marked comments as resolved
   
   That was me, because I thought they had fairly straightforward and definitive answers. Since that apparently wasn't the case, we can certainly discuss them further.
   
   > If I skimmed the code correctly, this "requires" end users to add USING hive even when they add a Hive-specific CREATE TABLE clause.
   
   That's not quite right.
   
   The parser now creates a `CreateTableStatement` that contains everything 
that was parsed from the SQL string, with minimal interpretation. That is what 
the user requested, assuming that request was well-formed -- meaning that it 
doesn't have duplicate clauses or mixed partition fields.
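   To make that concrete, here is a minimal sketch of the kind of node the parser produces. The field names are illustrative and simplified, not Spark's exact internals:

   ```scala
   // Illustrative sketch only: a statement node that records everything the
   // parser saw, with minimal interpretation. Field names are hypothetical
   // and simplified relative to Spark's actual CreateTableStatement.
   case class SerdeInfo(
       storedAs: Option[String] = None,             // STORED AS format
       serde: Option[String] = None,                // SERDE class name
       serdeProperties: Map[String, String] = Map.empty)

   case class CreateTableStatement(
       tableName: Seq[String],                      // multi-part identifier
       columns: Seq[(String, String)],              // (name, type) pairs, simplified
       partitioning: Seq[String],                   // PARTITIONED BY fields
       provider: Option[String],                    // USING clause, if any
       serdeInfo: Option[SerdeInfo],                // STORED AS / ROW FORMAT, if any
       properties: Map[String, String],             // TBLPROPERTIES
       location: Option[String],
       comment: Option[String],
       ifNotExists: Boolean)
   ```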
   
   The next step is to convert that statement to a plan. At the moment, this 
only converts to v2 plans because we no longer use v1 and this was ported from 
our internal version. The interpretation of that statement for v1 -- converting 
to v1 `CreateTable` -- has not yet been added. I plan to implement that like this (a sketch follows the list):
   * If `provider` is present from USING, set the `CatalogTable` provider and 
validate that no `SerdeInfo` is set.
   * If `SerdeInfo` is present, validate that `provider` was not set and use a 
Hive `CatalogTable`.
   * If neither `SerdeInfo` nor `provider` is set, use `SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED` or a similar setting to choose a default provider or default serde properties.
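   Here is a hedged sketch of that conversion, reusing the hypothetical statement types above. `CreateHiveTable` and `CreateDatasourceTable` are stand-ins for the v1 plans, and the boolean flag stands in for `SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED`:

   ```scala
   // Sketch of the v1 conversion rules listed above. The plan types and the
   // default provider are illustrative, not Spark's actual classes.
   sealed trait V1Plan
   case class CreateDatasourceTable(stmt: CreateTableStatement, provider: String) extends V1Plan
   case class CreateHiveTable(stmt: CreateTableStatement, serde: SerdeInfo) extends V1Plan

   def convertToV1(stmt: CreateTableStatement, hiveByDefault: Boolean): V1Plan =
     (stmt.provider, stmt.serdeInfo) match {
       case (Some(p), Some(_)) =>
         // Rules 1 and 2: a provider and serde info are mutually exclusive.
         throw new IllegalArgumentException(
           s"Cannot combine USING $p with STORED AS / ROW FORMAT SERDE")
       case (Some(p), None) =>
         CreateDatasourceTable(stmt, p)              // provider path
       case (None, Some(s)) =>
         CreateHiveTable(stmt, s)                    // Hive path
       case (None, None) =>
         // Rule 3: fall back to the configured default.
         if (hiveByDefault) CreateHiveTable(stmt, SerdeInfo())
         else CreateDatasourceTable(stmt, "parquet") // hypothetical default provider
     }
   ```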
   
   Speaking of well-formed statements: I think it would be reasonable to ensure at parse time that a provider and serde info are not both present, raising an exception whenever `USING` is combined with `STORED AS` or `SERDE` clauses. In that case, we could skip the mutual-exclusion checks above (a sketch of that check follows). What do you think?
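   A minimal sketch of that parse-time check, again using the hypothetical statement type from above:

   ```scala
   // Reject USING combined with STORED AS / ROW FORMAT SERDE up front, so the
   // per-plan checks in the conversion sketch become unnecessary.
   def validateClauses(stmt: CreateTableStatement): CreateTableStatement = {
     if (stmt.provider.isDefined && stmt.serdeInfo.isDefined) {
       throw new IllegalArgumentException(
         "CREATE TABLE cannot specify both USING and STORED AS / ROW FORMAT SERDE")
     }
     stmt
   }
   ```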
   
   Coming back to the current implementation for the v2 interface: Spark is 
passing through everything that was parsed because Spark doesn't know what kind 
of table the catalog is going to create. The v2 catalog that we use can create 
Hive tables, Spark datasource tables, and Iceberg tables, so all of the 
information needs to be passed through -- again, with the requirement that the 
SQL was well-formed.
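   As a rough illustration (the property keys here are hypothetical, not Spark's actual reserved keys), the pass-through amounts to flattening everything that was parsed into the generic properties map that a v2 catalog receives, so the catalog can decide what kind of table to create:

   ```scala
   // Illustrative sketch, not Spark's exact conversion: flatten the parsed
   // statement into a single properties map for the v2 catalog.
   def toV2Properties(stmt: CreateTableStatement): Map[String, String] = {
     val base =
       stmt.properties ++
         stmt.provider.map("provider" -> _) ++
         stmt.location.map("location" -> _) ++
         stmt.comment.map("comment" -> _)
     val serde = stmt.serdeInfo.toSeq.flatMap { s =>
       s.storedAs.toSeq.map("hive.stored-as" -> _) ++
         s.serde.toSeq.map("hive.serde" -> _) ++
         s.serdeProperties.map { case (k, v) => s"hive.serde.$k" -> v }
     }
     base ++ serde
   }
   ```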
