GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/22198
[SPARK-25121][SQL] Supports multi-part table names for broadcast hint resolution
## What changes were proposed in this pull request?
This PR fixes broadcast table hint resolution so that it respects the database name in a multi-part table name.
Currently, Spark ignores the database name in multi-part names, so a hint such as `testDb.t` is not applied:
```
scala> sql("CREATE DATABASE testDb")
scala> spark.range(10).write.saveAsTable("testDb.t")
// without this patch
scala> spark.range(10).join(spark.table("testDb.t"), "id").hint("broadcast", "testDb.t").explain
== Physical Plan ==
*(2) Project [id#24L]
+- *(2) BroadcastHashJoin [id#24L], [id#26L], Inner, BuildLeft
   :- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
   :  +- *(1) Range (0, 10, step=1, splits=4)
   +- *(2) Project [id#26L]
      +- *(2) Filter isnotnull(id#26L)
         +- *(2) FileScan parquet testdb.t[id#26L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-2.3.1-bin-hadoop2.7/spark-warehouse..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>

// with this patch
scala> spark.range(10).join(spark.table("testDb.t"), "id").hint("broadcast", "testDb.t").explain
== Physical Plan ==
*(2) Project [id#3L]
+- *(2) BroadcastHashJoin [id#3L], [id#5L], Inner, BuildRight
   :- *(2) Range (0, 10, step=1, splits=4)
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
      +- *(1) Project [id#5L]
         +- *(1) Filter isnotnull(id#5L)
            +- *(1) FileScan parquet testdb.t[id#5L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-master/spark-warehouse/testdb.db/t], PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>
```
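The matching rule this change is about can be sketched in plain Scala, without any Spark dependency. This is only an illustration and not the code in this patch: `HintNameMatching`, `TableId`, `matchesHint`, and the `currentDb` default are hypothetical names. The case-insensitive comparison and the choice to let a single-part hint match on the table name alone are assumptions as well.

```scala
// Hypothetical sketch, NOT the code from this patch: decides whether a
// broadcast hint name such as "testDb.t" refers to a given relation.
object HintNameMatching {
  // Simplified stand-in for a resolved table identifier (assumption).
  final case class TableId(database: Option[String], table: String)

  def matchesHint(
      hintName: String,
      relation: TableId,
      currentDb: String = "default"): Boolean = {
    // Assumption: names are compared case-insensitively.
    def same(a: String, b: String): Boolean = a.equalsIgnoreCase(b)

    // A relation without an explicit database is treated as living in currentDb.
    val relationDb = relation.database.getOrElse(currentDb)

    // Naive split on '.'; backquoted identifiers are not handled here.
    hintName.split('.').toList match {
      // Single-part hint: match on the table name only (assumption).
      case table :: Nil => same(table, relation.table)
      // Two-part hint: both the database and the table name must match.
      case db :: table :: Nil => same(db, relationDb) && same(table, relation.table)
      // Anything else (empty parts, three or more parts) is not matched.
      case _ => false
    }
  }
}
```

Under these assumptions, `matchesHint("testDb.t", TableId(Some("testdb"), "t"))` holds, while a hint naming a different database, such as `otherDb.t`, does not match the same relation.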
## How was this patch tested?
Added tests in `DataFrameJoinSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maropu/spark SPARK-25121
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22198.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22198
----
commit d2be6920ba1cc052e9d5d8364cf48375cea8ba44
Author: Takeshi Yamamuro <yamamuro@...>
Date: 2018-08-23T07:20:51Z
Fix
----