rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-613177337
 
 
   > If we implement Hive with catalog v2 . . .
   
   Anyone can plug an implementation into the catalog API. Spark cannot dictate 
what those implementations do or what they support; it can only choose how the 
Spark catalogs behave.
   
   What if Spark decided that ROW FORMAT SERDE can only be used with 
SEQUENCEFILE and not TEXTFILE? That's a reasonable choice because sequence files 
are splittable and compressible, but it would be ridiculous for Spark to do 
validation that is so specific to a source. The idea of validating EXTERNAL is 
the same: it breaks the catalog abstraction and disallows a valid Hive 
configuration.
   
   If Spark chooses to enforce the LOCATION/EXTERNAL relationship, then there 
are correct ways to do it:
   1. Enforce this in the Hive connector that ships with Spark, but pass external 
through the API
   2. Remove EXTERNAL from the parser and stop passing it through the API
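
   A rough sketch of option 1, written in Python only to keep it short. The 
class names, the `create_table` signature, and the `external` and `location` 
property keys are all hypothetical, and the rule shown (EXTERNAL requires 
LOCATION) is an assumed reading of "the LOCATION/EXTERNAL relationship". The 
point is only that the check lives in one connector, while another 
implementation of the same API remains free to accept the table:

```python
class PermissiveCatalog:
    """Hypothetical catalog plugin: accepts whatever it is given."""

    def create_table(self, name, properties):
        return {"name": name, "properties": dict(properties)}


class BundledHiveCatalog(PermissiveCatalog):
    """Hypothetical stand-in for the Hive connector that ships with Spark."""

    def create_table(self, name, properties):
        # The check is local to this connector, not to the parser or the API.
        is_external = properties.get("external") == "true"
        if is_external and "location" not in properties:
            raise ValueError(f"{name}: EXTERNAL requires LOCATION in this catalog")
        return super().create_table(name, properties)


# The same request passes through the API unchanged in both cases.
request = {"external": "true"}
accepted = PermissiveCatalog().create_table("db.t", request)

try:
    BundledHiveCatalog().create_table("db.t", request)
    rejected = False
except ValueError:
    rejected = True
```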
   
   > How are we going to differentiate SERDEPROPERTIES, TBLPROPERTIES, and 
OPTIONS?
   
   * TBLPROPERTIES are passed as-is, with options added
   * OPTIONS are prefixed with `option.` so they can be recovered
   * SERDEPROPERTIES are prefixed with `option.` for the same reason. I used 
the same prefix because both OPTIONS and SERDEPROPERTIES are stored in Hive's 
serde properties
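
   The mapping above can be sketched as follows. This is an illustration in 
Python, not the code in the PR: the function names and the sample keys are 
made up, and only the `option.` prefix comes from the description above.

```python
OPTION_PREFIX = "option."


def to_table_properties(tblproperties, options=None, serdeproperties=None):
    """Flatten the three clauses into one property map (hypothetical helper)."""
    merged = dict(tblproperties)  # TBLPROPERTIES are passed as-is
    for clause in (options or {}, serdeproperties or {}):
        for key, value in clause.items():
            # OPTIONS and SERDEPROPERTIES share the same prefix
            merged[OPTION_PREFIX + key] = value
    return merged


def recover_options(properties):
    """Strip the prefix again to get the options back (hypothetical helper)."""
    return {
        key[len(OPTION_PREFIX):]: value
        for key, value in properties.items()
        if key.startswith(OPTION_PREFIX)
    }


props = to_table_properties(
    {"owner": "etl"},
    options={"compression": "snappy"},
    serdeproperties={"field.delim": ","},
)
```

   Because both clauses use one prefix, `recover_options(props)` cannot tell 
which clause a key came from, which is what treating them as synonyms means 
in practice.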
   
   We could be a lot more strict here. We could throw an error if both OPTIONS 
and SERDEPROPERTIES are set. We could also throw an error if keys in OPTIONS 
conflict with SERDEPROPERTIES, or if the `option.` prefix is used by 
TBLPROPERTIES. I don't think these rules are worth the trouble. As long as we 
document how these are passed, it is fine that they are synonyms. I think it is 
very unlikely that this would cause confusion, let alone a problem.
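
   For reference, the stricter rules mentioned above would look roughly like 
this. It is a Python sketch with a hypothetical function name and made-up 
messages, not something I am proposing to add:

```python
OPTION_PREFIX = "option."


def find_violations(tblproperties, options, serdeproperties):
    """Return a message for each strict rule that the clauses break."""
    violations = []
    # Rule 1: OPTIONS and SERDEPROPERTIES may not both be set.
    if options and serdeproperties:
        violations.append("both OPTIONS and SERDEPROPERTIES are set")
    # Rule 2: the same key may not appear in both clauses.
    for key in sorted(set(options) & set(serdeproperties)):
        violations.append(f"key '{key}' is in both OPTIONS and SERDEPROPERTIES")
    # Rule 3: TBLPROPERTIES may not use the reserved prefix.
    for key in sorted(tblproperties):
        if key.startswith(OPTION_PREFIX):
            violations.append(f"TBLPROPERTIES key '{key}' uses the reserved prefix")
    return violations


clean = find_violations({"owner": "etl"}, {"path": "/data"}, {})
messy = find_violations({"option.path": "/x"}, {"path": "/a"}, {"path": "/b"})
```

   Here `clean` is empty and `messy` reports one violation per rule.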
   
   > Can you post the link to your comment?
   
   https://github.com/apache/spark/pull/28026#issuecomment-608524608
