[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...

dongjoon-hyun Tue, 28 Aug 2018 10:13:24 -0700

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22198
  
    @maropu and @dilipbiswal and @gatorsmile .
    
    The complexity comes because this PR duplicates the existing name 
resolution logic. Although we may move `matchedTableIdentifier` to 
`SessionCatalog`, it seems that we had better clarify the purpose of this PR 
again.
    
    From your PR description,
    
    > Currently, spark ignores a database name in multi-part names;
    
    Originally, `ResolveBroadcastHints` is designed to be executed at the first 
batch before `ResolveRelation`. So, it only compare table names. `/*+ 
MAPJOIN(t) */` will broadcast `testDb.t` and `testDb2.t`. We will keep this 
behavior, won't we?
    
    For `/*+ MAPJOIN(testDb.t) */`,
    
    1. What we want at the beginning it only supporting matching `testDb.t` to 
`testDb.t` .
    2. After @dilipbiswal 's 
[comment](https://github.com/apache/spark/pull/22198#issuecomment-415950379), 
this PR aims to a real resolution by `/*+ MAPJOIN(testDb.t) */` to `t`. `t` 
depends on the session (`currentDatabase` and temporary views and global 
temporary views.)
    
    So, until now, we are moving forward to (2), but is (2) really required for 
SPARK-25121? If we choose (1), it will become simpler and consistent with the 
original design choice (matching based on unresolved strings).



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...

Reply via email to