Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22198
@maropu and @dilipbiswal and @gatorsmile .
The complexity comes because this PR duplicates the existing name
resolution logic. Although we may move `matchedTableIdentifier` to
`SessionCatalog`, it seems that we had better clarify the purpose of this PR
again.
From your PR description,
> Currently, spark ignores a database name in multi-part names;
Originally, `ResolveBroadcastHints` is designed to be executed at the first
batch before `ResolveRelation`. So, it only compare table names. `/*+
MAPJOIN(t) */` will broadcast `testDb.t` and `testDb2.t`. We will keep this
behavior, won't we?
For `/*+ MAPJOIN(testDb.t) */`,
1. What we want at the beginning it only supporting matching `testDb.t` to
`testDb.t` .
2. After @dilipbiswal 's
[comment](https://github.com/apache/spark/pull/22198#issuecomment-415950379),
this PR aims to a real resolution by `/*+ MAPJOIN(testDb.t) */` to `t`. `t`
depends on the session (`currentDatabase` and temporary views and global
temporary views.)
So, until now, we are moving forward to (2), but is (2) really required for
SPARK-25121? If we choose (1), it will become simpler and consistent with the
original design choice (matching based on unresolved strings).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]