rdblue commented on issue #25747: [SPARK-29039][SQL] centralize the catalog and table lookup logic URL: https://github.com/apache/spark/pull/25747#issuecomment-531427202 @cloud-fan, overall I like some parts of this PR quite a bit. It is really nice how the rules are collapsed in this PR. The problem is that I'm not sure that the trade-off is worth it. I don't think it is good for SQL statements to be responsible for converting themselves into DSv2 plans. We want to keep the statements that are parsed and the plan implementations separate. And it seems odd to move lots of rule logic into plan nodes themselves. I think that the right approach for keeping this logic in one place is to use extractors in rules. Recent changes have moved further away from that, but I think that's the cleanest way to keep this logic in one place. The reason we have changed this recently is the need to handle v1 tables. But I think we should take a different approach to support those. The only reason why catalog/identifier resolution and table resolution happen at the same time is to support fallback to v1. But we don't need fallback to v1 outside of the `DataSourceResolution` rules. Here's what I would do: 1. Use an extractor to resolve catalog and identifier that matches if and only if the catalog must be v2 -- that is, when the catalog is not `spark_catalog` (built-in session catalog). Rules in analyzer should use this extractor. Note that when v1 goes away, these rules don't need to change. 2. If the `spark_catalog` is responsible for an identifier, the above extractor doesn't match because v1 might be used, depending on the table. Then rule should be in `DataSourceResolution` and will load the table and check its provider or check the provider for create plans. With this approach, catalog/identifier resolution is separate from table resolution. Table resolution would happen in `ResolveTables`. Identifier resolution would happen in the extractor. Both are cleanly separated. What do you think?
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
