rdblue commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/26750#issuecomment-563012602

I find this approach a little awkward because it mixes really different use cases into the same API. One is where you have a metastore as the source of truth for schema and partitioning, and the other is where the implementation is the source of truth.

This leads to strange requirements, like throwing an `IllegalArgumentException` to reject a schema or partitioning. That doesn't make much sense when the source of truth is the metastore. And the API doesn't distinguish between these cases, so an implementation doesn't know whether the table is being created by a `DataFrameWriter` (and should reject partitioning that doesn't match) or if it is created from metastore information (and should use the partitioning from the metastore).

That's why I liked the approach of moving the schema and partitioning inference outside of this API. That way, Spark is responsible for determining things like whether schemas "match" and can use more context to make a reasonable choice.

Why abandon the other approach? I thought that we were making progress and that the primary blocker was trying to do too much to be reviewed in a single PR.
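For concreteness, here is a minimal sketch of the ambiguity described above, written against the `getTable(StructType, Transform[], Map)` shape this PR proposes and the Spark 3.x `org.apache.spark.sql.connector` packages. `ExampleProvider`, `ExampleTable`, and the hard-coded schema are hypothetical stand-ins, not code from the PR:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableCapability;
import org.apache.spark.sql.connector.catalog.TableProvider;
import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

public class ExampleProvider implements TableProvider {

  @Override
  public StructType inferSchema(CaseInsensitiveStringMap options) {
    // Implementation-side source of truth (stand-in for real inference,
    // e.g. reading file footers under options.get("path")).
    return new StructType().add("id", DataTypes.LongType);
  }

  @Override
  public Table getTable(StructType schema, Transform[] partitioning,
                        Map<String, String> properties) {
    // The ambiguity: this method cannot tell where `schema` came from.
    // - DataFrameWriter path: `schema` is user-supplied, so a mismatch
    //   with the underlying files should be rejected.
    // - Metastore path: `schema` is already the source of truth, so
    //   rejecting it makes no sense.
    StructType actual = new StructType().add("id", DataTypes.LongType);
    if (!schema.equals(actual)) {
      // The only signal available is an exception, regardless of context.
      throw new IllegalArgumentException("Schema does not match: " + schema);
    }
    return new ExampleTable(schema);
  }

  // Minimal Table so the sketch compiles; a real implementation would
  // report capabilities and expose scan/write builders.
  private static class ExampleTable implements Table {
    private final StructType schema;

    ExampleTable(StructType schema) {
      this.schema = schema;
    }

    @Override
    public String name() {
      return "example";
    }

    @Override
    public StructType schema() {
      return schema;
    }

    @Override
    public Set<TableCapability> capabilities() {
      return Collections.emptySet();
    }
  }
}
```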
