[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...

nsyca Thu, 22 Dec 2016 12:19:44 -0800

Github user nsyca commented on the issue:

    https://github.com/apache/spark/pull/16233
  
    @hvanhovell: Would the issue of set/reset currentDB be resolved if we 
support a table/view as a 2-part name? The 2-part name, or in general, n-part 
name, is used in other databases, SQLServer, Oracle and DB2 call the database 
part, schema name. Hive seems to use the terms DATABASE and SCHEMA 
interchangeably.
    
    If we think of an object is fully qualified by a 2-part name, 
[database].[table]. Every time we resolve a name of a table or a view to an 
object in the catalog, the object is represented by a 2-part name. Of course, 
when we define a new table, the table will need to be recorded in the catalog 
as [database].[table].
    
    If a name "T1" is specified with its database part "DB1", the resolution 
searches an object in the catalog with the fully qualified name "DB1"."T1".
    
    If a name "T2" is not specified with its database part, the resolution gets 
the default database name from the current context, says "DEFAULT-DB", adds to 
the "T2" and searches an object of the name "DEFAULT-DB"."T2".
    
    A view is then recorded as a tuple of
    
    (view_database, view_name, view_definition, qualifier_database)
    
    Example:
    
    ````
    CREATE DATABASE DB1;
    CREATE VIEW MYDB.VIEW1 AS SELECT * FROM T1;
    ````
    The record in the catalog for this view is
    
    (view_database, view_name, view_definition, qualifier_database) =
    ("MYDB", "VIEW1", "SELECT * FROM T1", "DB1")
    
    When resolving the name T1 from the view definition, if T1 is not qualified 
by its database part (which is in this case), the name T1 is then augmented by 
the current database to "DB1"."T1" and the fully qualified name is used for a 
search in the catalog.
    
    With this mechanism, the way to create and resolve a nested view is 
demonstrated below:
    ````
    USE DATABASE MYDB;
    CREATE VIEW OTHERDB.VIEW2 AS SELECT * FROM VIEW1
    ````
    (view_database, view_name, view_definition, qualifier_database) =
    ("OTHERDB", "VIEW2", "SELECT * FROM VIEW1", "MYDB")
    
    To resolve the statement
    ````
    USE DATABASE OTHERDB;
    SELECT * FROM VIEW2;
    ````
    The first statement sets the following state.
    ````
    analysisContext.currentDB = "OTHERDB"
    ````
    We then build an unresolved LogicalPlan from the second statement.
    ````
    'Project [*]
    +- 'UnresolvedRelation `VIEW2`
    ````
    -> Augment VIEW2 with current database OTHERDB and search the catalog
    -> Expand the definition of the view OTHERDB.VIEW2
    ````
    'Project [*]
    +- 'Project [*]
       +- 'UnresolvedRelation `MYDB`.`VIEW1`
    ````
    -> VIEW1 is augmented with the qualifier_database "MYDB" from the view 
record in the catalog
    -> Expand the definition of the view MYDB.VIEW1
    ````
    'Project [*]
    +- 'Project [*]
       +- 'Project [*]
          +- 'UnresolvedRelation `DB1`.`T1`
    ````
    -> Expand the definition of the table DB1.T1
    ````
    Project [... ]
    +- Project [... ]
       +- Project [... ]
          +- SubqueryAlias `DB1`.`T1`
             +- Project [... ]
                +- ...
    ````
    
    From here it then goes back to the very first question: do we plan to 
support 2-part name or n-part name objects in Spark as in other databases?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...

Reply via email to