cloud-fan commented on issue #26684: [WIP][SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables.
URL: https://github.com/apache/spark/pull/26684#issuecomment-558959828

This seems like a hard problem. What we need is:
1. Access Hive metadata only once when resolving a table.
2. Allow a catalog name in the table name for v1 tables.

Two goals conflict:
1. We want to make as few changes as possible to the v1 code path, so we still want to get the v1 table through `SessionCatalog.lookupRelation`.
2. We want to know whether the table from the session catalog is v1 or v2, which we find out through `V2SessionCatalog.loadTable`.

To do both with a single Hive metastore access, we have 3 options:
1. In `ResolveTables`, if we see a `V1Table`, return a v1 relation instead of skipping it. This requires refactoring the view resolution, so that we don't need to resolve views and tables recursively in the single rule `ResolveRelations`.
2. In `ResolveRelations`, look up the table using the v2 API `V2SessionCatalog.loadTable`.
3. Introduce a cache. This needs to be designed carefully, so that the cache only takes effect between `ResolveTables` and `ResolveRelations`.

I think option 2 is the easiest to do at the current stage.

cc @rdblue @brkyvz
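
To make option 2 concrete, here is a minimal toy sketch of the idea: one catalog lookup per table, followed by a branch on whether the result is a v1 or a v2 table. It does not use Spark's real classes. `ToySessionCatalog`, `NativeV2Table`, `V1Relation`, `V2Relation`, `resolve`, and the `SessionCatalogName` value are all hypothetical names made up for illustration; only the general shape (load once through a v2-style `loadTable`, then pattern match on `V1Table`) comes from the comment above.

```scala
// Toy model only: none of these types are Spark's real classes.
object SingleLookupSketch {
  // Assumed name of the session catalog, used only by this sketch.
  val SessionCatalogName = "spark_catalog"

  // Hypothetical stand-ins for what a v2-style catalog can return.
  sealed trait Table { def name: String }
  final case class V1Table(name: String) extends Table       // wraps v1 (Hive) metadata
  final case class NativeV2Table(name: String) extends Table // a genuine v2 table

  // Hypothetical stand-ins for resolved plan nodes.
  sealed trait Relation
  final case class V1Relation(table: String) extends Relation
  final case class V2Relation(table: String) extends Relation

  // Stand-in for the session catalog; counts simulated metastore round trips.
  final class ToySessionCatalog(tables: Map[String, Table]) {
    var metastoreCalls: Int = 0
    def loadTable(name: String): Option[Table] = {
      metastoreCalls += 1
      tables.get(name)
    }
  }

  // Resolve a multi-part name with exactly one catalog lookup.
  def resolve(catalog: ToySessionCatalog, nameParts: Seq[String]): Option[Relation] = {
    // Allow an optional leading catalog name, also for v1 tables.
    val ident = nameParts match {
      case SessionCatalogName +: rest if rest.nonEmpty => rest
      case other => other
    }
    // Single lookup; the v1/v2 decision reuses its result.
    catalog.loadTable(ident.mkString(".")).map {
      case V1Table(n) => V1Relation(n) // hand over to the v1 code path
      case t          => V2Relation(t.name)
    }
  }

  def main(args: Array[String]): Unit = {
    val catalog = new ToySessionCatalog(Map(
      "db.t1" -> V1Table("db.t1"),
      "db.t2" -> NativeV2Table("db.t2")
    ))
    println(resolve(catalog, Seq(SessionCatalogName, "db", "t1")))
    println(resolve(catalog, Seq("db", "t2")))
    println(s"metastore calls: ${catalog.metastoreCalls}")
  }
}
```

In this sketch each call to `resolve` touches the simulated metastore exactly once, regardless of whether the table turns out to be v1 or v2, which is the property the three options are trying to preserve.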
