rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-613177337

> If we implement Hive with catalog v2 . . .

Anyone can plug an implementation into the catalog API. Spark cannot dictate what those implementations do or what they support; it can only choose how the Spark catalogs behave.

What if Spark decided that ROW FORMAT SERDE can only be used with SEQUENCEFILE and not TEXTFILE? That's a reasonable choice because sequence file is splittable and compressible, but it would be ridiculous for Spark to do that kind of validation that is so specific to a source. The idea to validate EXTERNAL is the same: it breaks the catalog abstraction and disallows a valid Hive configuration.

If Spark chooses to enforce the LOCATION/EXTERNAL relationship, then there are correct ways to do it:

1. Enforce this in the Hive connector that ships with Spark, but pass external through the API
2. Remove EXTERNAL from the parser and stop passing it through the API

> How are we going to differentiate SERDEPROPERTIES, TBLPROPERTIES, and OPTIONS?

* TBLPROPERTIES are passed as-is, with options added
* OPTIONS are prefixed with `option.` so they can be recovered
* SERDEPROPERTIES are prefixed with `option.` for the same reason. I used the same prefix because both OPTIONS and SERDEPROPERTIES are stored in Hive's serde properties

We could be a lot more strict here. We could throw an error if both OPTIONS and SERDEPROPERTIES are set. We could also throw an error if keys in OPTIONS conflict with SERDEPROPERTIES, or if the `option.` prefix is used by TBLPROPERTIES.

I don't think these rules are worth the trouble. As long as we document how these are passed, it is fine that they are synonyms. I think it is very unlikely that this would cause confusion, let alone a problem.

> Can you post the link to your comment?

https://github.com/apache/spark/pull/28026#issuecomment-608524608
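To make the `option.` prefix scheme described above concrete, here is a minimal sketch in Python. It is not Spark's implementation: the names `OPTION_PREFIX`, `to_catalog_properties`, and `recover_options` are hypothetical, and the choice that OPTIONS overwrite SERDEPROPERTIES on a key collision is an assumption made only for illustration, since the comment leaves conflicts unvalidated.

```python
# Illustrative sketch only; these names are hypothetical, not Spark APIs.
OPTION_PREFIX = "option."


def to_catalog_properties(tbl_properties, options=None, serde_properties=None):
    """Flatten the three clauses into one property map.

    TBLPROPERTIES pass through as-is; OPTIONS and SERDEPROPERTIES both get
    the same `option.` prefix, so they behave as synonyms.
    """
    merged = dict(tbl_properties)
    # Assumption: OPTIONS are applied last, so they win on a key collision.
    for source in (serde_properties or {}, options or {}):
        for key, value in source.items():
            merged[OPTION_PREFIX + key] = value
    return merged


def recover_options(properties):
    """Recover the prefixed entries by stripping the `option.` prefix."""
    return {
        key[len(OPTION_PREFIX):]: value
        for key, value in properties.items()
        if key.startswith(OPTION_PREFIX)
    }


props = to_catalog_properties(
    {"owner": "etl"},
    options={"path": "/data/t"},
    serde_properties={"field.delim": ","},
)
recovered = recover_options(props)
```

In this sketch a TBLPROPERTIES key that already starts with `option.` would be indistinguishable from an option after the merge, which is the case the comment argues is unlikely enough not to need a dedicated error.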
