cloud-fan commented on issue #26214: [SPARK-29558][SQL] ResolveTables and 
ResolveRelations should be order-insensitive
URL: https://github.com/apache/spark/pull/26214#issuecomment-546353998
 
 
   Let me explain more about `ResolveRelations` to make sure we are on the same 
page. It resolves temp views, and it resolves view text (that's why it 
recursively calls the analyzer). Temp views are managed by Spark, view text can 
refer to v2 tables (e.g. `CREATE VIEW v AS SELECT * FROM myCatalog.tbl`), so I 
don't agree we can simply say `ResolveRelations` is a v1 rule. It's the rule to 
resolve relations, including temp view, view and table. I don't think this rule 
is badly designed. It only does one thing: resolve relations. And it will be 
very tricky if we have multiples rules to resolve relations: a lot of effort 
needs to be done to make sure the resolution order is corrected.
   
   `ResolveTables` is kind of a patch to `ResolveRelations`, to add the ability 
of multi-catalog support, and return v2 relation for v2 tables (including v2 
tables from the session catalog). And the patch must be run before 
`ResolveRelations`.
   
   The approach from @brkyvz does make things better. It enforces the order so 
that we won't mess it up by accident in the future. However, this is more like 
a workaround. It's hard for other people to understand the table lookup logic 
now:
   1. Begin `ResolveTables`. Check if it's a temp view. If it is, skip and wait 
for `ResolveRelations` to run
   2. Determine the catalog
   3. If the catalog is not session catalog, lookup table from it using v2 API 
and return v2 plan
   4. If the catalog is session catalog, lookup table from it using v2 API. If 
the table is a v1 table, skip and wait for `ResolveRelations` to run. 
Otherwise, return v2 plan.
   5. Begin `ResolveRelations`. Check if it's a temp view. If it is, return the 
temp view plan
   6. Lookup table from session catalog using v1 API, and return v1 plan.
   
   This PR simplifies it quite a bit:
   1. Begin `ResolveRelations`. Check if it's a temp view. If it is, return the 
temp view plan
   2. Determine the catalog
   3. If the catalog is not session catalog, lookup table from it using v2 API 
and return v2 plan
   4. If the catalog is session catalog, lookup table from it using v2 API. If 
the table is a v1 table, return v1 plan. Otherwise, return v2 plan.
   
   In `V2SessionCatalog`, the `loadTable` method simply calls the v1 catalog 
API to get the table and wrap it with `V1Table`, I don't think it's risky to 
load v1 table with v2 catalog API.
   
   I'm OK to take the approach from @brkyvz to fix the order dependent problem 
first. But We do need to clean up the table lookup logic a little bit. I'm glad 
to see more ideas about it.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to