cloud-fan opened a new pull request #25077: [SPARK-28301][SQL] fix the behavior 
of table name resolution with multi-catalog
URL: https://github.com/apache/spark/pull/25077
 
 
   ## What changes were proposed in this pull request?
   
   Users can now register multiple catalogs in Spark, so table name 
resolution must work correctly with multiple catalogs. The expected behavior is 
simple:
   * For DDL commands that can only deal with tables
       * If the table name has only one name part, then it's a table in the 
default catalog.
       * If the table name has more than one name part, like `a.b.c`:
           * if `a` is a registered catalog, then it's a table `c` under 
namespace `b` in catalog `a`.
           * if `a` is not a registered catalog, then it's a table `c` under 
namespace `a.b` in the default catalog.
   * For SELECT/INSERT, which can handle both tables and temp views, first check 
if the table name is a temp view or global temp view. Otherwise, the rule is the 
same as for DDL commands.
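
The expected rules above can be sketched as a small standalone function. This is only an illustration of the described behavior, not Spark's implementation: `resolve_table`, `resolve_relation`, `ResolvedTable`, `TempView`, and the way temp views are modeled (a set of name-part tuples) are all hypothetical names and choices made for this example.

```python
from typing import NamedTuple, Sequence, Set, Tuple, Union


class ResolvedTable(NamedTuple):
    catalog: str
    namespace: Tuple[str, ...]
    table: str


class TempView(NamedTuple):
    name_parts: Tuple[str, ...]


def resolve_table(parts: Sequence[str], catalogs: Set[str],
                  default_catalog: str) -> ResolvedTable:
    """DDL rule: the name can only refer to a table."""
    if not parts:
        raise ValueError("empty table name")
    if len(parts) == 1:
        # A single name part is a table in the default catalog.
        return ResolvedTable(default_catalog, (), parts[0])
    if parts[0] in catalogs:
        # `a.b.c` with `a` registered: table `c`, namespace `b`, catalog `a`.
        return ResolvedTable(parts[0], tuple(parts[1:-1]), parts[-1])
    # `a.b.c` with `a` not registered: table `c`, namespace `a.b`, default catalog.
    return ResolvedTable(default_catalog, tuple(parts[:-1]), parts[-1])


def resolve_relation(parts: Sequence[str], temp_views: Set[Tuple[str, ...]],
                     catalogs: Set[str],
                     default_catalog: str) -> Union[TempView, ResolvedTable]:
    """SELECT/INSERT rule: temp views (local or global) win, then the DDL rule."""
    if tuple(parts) in temp_views:
        return TempView(tuple(parts))
    return resolve_table(parts, catalogs, default_catalog)
```

For example, with `catalogs = {"a"}` and a default catalog named `def`, `resolve_table(["a", "b", "c"], catalogs, "def")` yields catalog `a`, namespace `("b",)`, table `c`, while `resolve_table(["x", "b", "c"], catalogs, "def")` yields catalog `def`, namespace `("x", "b")`, table `c`.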
   
   However, we need to adjust the expected behavior slightly because the 
built-in Hive catalog hasn't migrated to the new catalog API yet:
   1. If the default catalog config is set, pick it as the default catalog. 
Otherwise, pick the Hive catalog as the default catalog.
   2. If the default catalog config is not set and the table name has more 
than 2 name parts, we should fail with "no catalog specified for table".
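
A minimal sketch of this adjusted fallback follows. It is an illustration only, not the code in this PR: `resolve_with_fallback` and the `"hive"` label are hypothetical, and it assumes the "more than 2 name parts" failure applies only when the first name part does not match a registered catalog.

```python
from typing import Optional, Sequence, Set, Tuple

HIVE_CATALOG = "hive"  # hypothetical label for the built-in Hive catalog


def resolve_with_fallback(
    parts: Sequence[str],
    catalogs: Set[str],
    default_catalog_conf: Optional[str],
) -> Tuple[str, Tuple[str, ...], str]:
    """Return (catalog, namespace, table) under the adjusted rules."""
    if not parts:
        raise ValueError("empty table name")
    # Assumption: an explicit registered catalog prefix is honored first.
    if len(parts) > 1 and parts[0] in catalogs:
        return parts[0], tuple(parts[1:-1]), parts[-1]
    # Rule 1: a configured default catalog takes precedence over Hive.
    if default_catalog_conf is not None:
        return default_catalog_conf, tuple(parts[:-1]), parts[-1]
    # Rule 2: with no default catalog configured, names with more than
    # 2 parts cannot be handled by the Hive catalog, so fail.
    if len(parts) > 2:
        raise ValueError("no catalog specified for table: " + ".".join(parts))
    return HIVE_CATALOG, tuple(parts[:-1]), parts[-1]
```

With no default catalog configured, `resolve_with_fallback(["db", "t"], set(), None)` falls back to the Hive catalog, whereas `resolve_with_fallback(["x", "y", "t"], set(), None)` raises the "no catalog specified for table" error.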
   
   The current behavior of table name resolution is a little confusing:
   * For DDL commands that can only deal with tables
       * If the first part of the table name matches a registered catalog, then 
it's a table in that catalog. (expected)
       * Otherwise, if the table name has fewer than 3 parts and the provider 
name is v1, go with the built-in Hive catalog. (This is not expected. By design, 
different catalogs can interpret the table provider name differently. We should go 
with the default catalog if the config is set, no matter what the table 
provider name is.)
   * For SELECT/INSERT that can handle both tables and temp views
       * If the first part of the table name does not match a registered 
catalog and it has fewer than 3 parts, go with the built-in Hive catalog. (This 
is not expected, as we need to respect the default catalog config.)
       * If the first part of the table name does not match a registered 
catalog and it has more than 2 parts, the query is unresolved. (This is not 
expected, as we need to respect the default catalog config.)
   
   This PR fixes the behavior of the table name resolution.
   
   ## How was this patch tested?
   
   New test cases.
   

