cloud-fan commented on issue #26214: [SPARK-29558][SQL] ResolveTables and ResolveRelations should be order-insensitive URL: https://github.com/apache/spark/pull/26214#issuecomment-546353998 Let me explain more about `ResolveRelations` to make sure we are on the same page. It resolves temp views, and it resolves view text (that's why it recursively calls the analyzer). Temp views are managed by Spark, view text can refer to v2 tables (e.g. `CREATE VIEW v AS SELECT * FROM myCatalog.tbl`), so I don't agree we can simply say `ResolveRelations` is a v1 rule. It's the rule to resolve relations, including temp view, view and table. I don't think this rule is badly designed. It only does one thing: resolve relations. And it will be very tricky if we have multiples rules to resolve relations: a lot of effort needs to be done to make sure the resolution order is corrected. `ResolveTables` is kind of a patch to `ResolveRelations`, to add the ability of multi-catalog support, and return v2 relation for v2 tables (including v2 tables from the session catalog). And the patch must be run before `ResolveRelations`. The approach from @brkyvz does make things better. It enforces the order so that we won't mess it up by accident in the future. However, this is more like a workaround. It's hard for other people to understand the table lookup logic now: 1. Begin `ResolveTables`. Check if it's a temp view. If it is, skip and wait for `ResolveRelations` to run 2. Determine the catalog 3. If the catalog is not session catalog, lookup table from it using v2 API and return v2 plan 4. If the catalog is session catalog, lookup table from it using v2 API. If the table is a v1 table, skip and wait for `ResolveRelations` to run. Otherwise, return v2 plan. 5. Begin `ResolveRelations`. Check if it's a temp view. If it is, return the temp view plan 6. Lookup table from session catalog using v1 API, and return v1 plan. This PR simplifies it quite a bit: 1. Begin `ResolveRelations`. Check if it's a temp view. If it is, return the temp view plan 2. Determine the catalog 3. If the catalog is not session catalog, lookup table from it using v2 API and return v2 plan 4. If the catalog is session catalog, lookup table from it using v2 API. If the table is a v1 table, return v1 plan. Otherwise, return v2 plan. In `V2SessionCatalog`, the `loadTable` method simply calls the v1 catalog API to get the table and wrap it with `V1Table`, I don't think it's risky to load v1 table with v2 catalog API. I'm OK to take the approach from @brkyvz to fix the order dependent problem first. But We do need to clean up the table lookup logic a little bit. I'm glad to see more ideas about it.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
