[GitHub] [spark] rdblue commented on issue #25747: [SPARK-29039][SQL] centralize the catalog and table lookup logic

GitBox Fri, 13 Sep 2019 17:13:03 -0700

rdblue commented on issue #25747: [SPARK-29039][SQL] centralize the catalog and 
table lookup logic
URL: https://github.com/apache/spark/pull/25747#issuecomment-531427202
 
 
   @cloud-fan, overall I like some parts of this PR quite a bit. It is really 
nice how the rules are collapsed in this PR. The problem is that I'm not sure 
that the trade-off is worth it. I don't think it is good for SQL statements to 
be responsible for converting themselves into DSv2 plans. We want to keep the 
statements that are parsed and the plan implementations separate. And it seems 
odd to move lots of rule logic into plan nodes themselves.
   
   I think that the right approach for keeping this logic in one place is to 
use extractors in rules. Recent changes have moved further away from that, but 
I think that's the cleanest way to keep this logic in one place.
   
   The reason we have changed this recently is the need to handle v1 tables. 
But I think we should take a different approach to support those. The only 
reason why catalog/identifier resolution and table resolution happen at the 
same time is to support fallback to v1. But we don't need fallback to v1 
outside of the `DataSourceResolution` rules.
   
   Here's what I would do:
   1. Use an extractor to resolve catalog and identifier that matches if and 
only if the catalog must be v2 -- that is, when the catalog is not 
`spark_catalog` (built-in session catalog). Rules in analyzer should use this 
extractor. Note that when v1 goes away, these rules don't need to change.
   2. If the `spark_catalog` is responsible for an identifier, the above 
extractor doesn't match because v1 might be used, depending on the table. Then 
rule should be in `DataSourceResolution` and will load the table and check its 
provider or check the provider for create plans.
   
   With this approach, catalog/identifier resolution is separate from table 
resolution. Table resolution would happen in `ResolveTables`. Identifier 
resolution would happen in the extractor. Both are cleanly separated.
   
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] rdblue commented on issue #25747: [SPARK-29039][SQL] centralize the catalog and table lookup logic

Reply via email to