rdblue commented on issue #26750: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/26750#issuecomment-563012602
 
 
I find this approach awkward because it mixes two very different use cases into the same API. In one, a metastore is the source of truth for schema and partitioning; in the other, the implementation itself is the source of truth.
   
This leads to strange requirements, like throwing an `IllegalArgumentException` to reject a schema or partitioning, which makes little sense when the source of truth is the metastore. And because the API doesn't distinguish between the two cases, an implementation can't tell whether the table is being created by a `DataFrameWriter` (and should reject partitioning that doesn't match) or from metastore information (and should use the metastore's partitioning), as the sketch below shows.
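To make that concrete, here is a minimal sketch of a source implementing the proposed `getTable(StructType, Transform[], Map)` shape; the fixed "actual" schema and the table class are hypothetical stand-ins. The point is that the same `IllegalArgumentException` is the implementation's only signal, whether the metadata came from a `DataFrameWriter` or from a metastore:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableCapability;
import org.apache.spark.sql.connector.catalog.TableProvider;
import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

// Hypothetical source whose real schema is fixed; a real implementation
// would inspect files or a service instead.
public class ExampleProvider implements TableProvider {
  private static final StructType ACTUAL =
      new StructType().add("id", DataTypes.LongType);

  @Override
  public StructType inferSchema(CaseInsensitiveStringMap options) {
    return ACTUAL;
  }

  @Override
  public Table getTable(StructType schema, Transform[] partitioning,
                        Map<String, String> properties) {
    // The implementation cannot tell who supplied `schema` and
    // `partitioning`. Throwing is right for a user-supplied schema from
    // DataFrameWriter, but wrong when a metastore is the source of truth.
    if (!ACTUAL.equals(schema)) {
      throw new IllegalArgumentException("Unsupported schema: " + schema);
    }
    if (partitioning.length > 0) {
      throw new IllegalArgumentException("Source is unpartitioned");
    }
    return new ExampleTable();
  }

  private static class ExampleTable implements Table {
    @Override public String name() { return "example"; }
    @Override public StructType schema() { return ACTUAL; }
    @Override public Set<TableCapability> capabilities() {
      return Collections.emptySet();
    }
  }
}
```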
   
That's why I liked the approach of moving schema and partitioning inference outside of this API. That way, Spark is responsible for deciding whether schemas "match" and can use more context to make a reasonable choice.
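For contrast, a rough sketch of what that caller-side alternative could look like; `resolve` and its mismatch rule are hypothetical stand-ins, not Spark's actual resolution logic. Spark performs the inference and the comparison itself, so it can apply different rules depending on where the declared schema came from:

```java
import java.util.Map;

import org.apache.spark.sql.connector.catalog.Table;
import org.apache.spark.sql.connector.catalog.TableProvider;
import org.apache.spark.sql.connector.expressions.Transform;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

public final class CallerSideResolution {
  // Spark-side helper (hypothetical): the caller decides whether the
  // declared schema is acceptable, using context the source never sees.
  static Table resolve(TableProvider provider,
                       StructType declared,
                       Transform[] partitioning,
                       CaseInsensitiveStringMap options,
                       Map<String, String> properties,
                       boolean fromDataFrameWriter) {
    StructType inferred = provider.inferSchema(options);
    if (fromDataFrameWriter && !inferred.equals(declared)) {
      // Writer-supplied metadata must match what the source reports.
      // (A real rule would also consider nullability, case sensitivity, etc.)
      throw new IllegalArgumentException(
          "Declared schema does not match source: " + declared);
    }
    // Metastore-backed case: the metastore is the source of truth, so the
    // declared schema is passed through without asking the source to veto it.
    return provider.getTable(declared, partitioning, properties);
  }

  private CallerSideResolution() {}
}
```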
   
Why abandon the other approach? I thought we were making progress, and that the primary blocker was that the change was too large to review in a single PR.
