GitHub user wangyum opened a pull request:
https://github.com/apache/spark/pull/19831
[SPARK-22489][SQL] Wrong Hive table statistics may trigger OOM if join
reorder is enabled in CBO
## What changes were proposed in this pull request?
How to reproduce:
```bash
bin/spark-shell --conf spark.sql.cbo.enabled=true --conf spark.sql.cbo.joinReorder.enabled=true
```
```scala
import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec

spark.sql("CREATE TABLE small (c1 bigint) TBLPROPERTIES ('numRows'='3', 'rawDataSize'='600', 'totalSize'='800')")
// Big table with wrong statistics, numRows=0
spark.sql("CREATE TABLE big (c1 bigint) TBLPROPERTIES ('numRows'='0', 'rawDataSize'='60000000000', 'totalSize'='8000000000000')")

val plan = spark.sql("select * from small t1 join big t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide
println(buildSide)
```
The result is `BuildRight`, even though the right side is the big table, so
the big table is chosen as the broadcast build side.
For the `big` table, `totalSize` or `rawDataSize` is > 0 while `rowCount` = 0,
so at least one of these statistics must be wrong.
https://github.com/apache/spark/blob/ed7352f2191308965a1b2abb6cd075a90b7f7bb7/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L432-L434
This PR ensures that when `totalSize` or `rawDataSize` is > 0, `rowCount` must
also be > 0.
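
The rule above can be illustrated with a minimal, self-contained sketch. This is not the code of the patch: `TableStats` and `StatsSanitizer` are hypothetical names that stand in for Spark's own classes, and both the choice to drop a zero row count (rather than reject the statistics entirely) and the preference for `totalSize` over `rawDataSize` are assumptions made for illustration.

```scala
import scala.util.Try

// Hypothetical stand-in for Spark's catalog statistics; not the real class.
final case class TableStats(sizeInBytes: BigInt, rowCount: Option[BigInt])

object StatsSanitizer {
  // Read a table property as a BigInt, ignoring missing or malformed values.
  private def parse(props: Map[String, String], key: String): Option[BigInt] =
    props.get(key).flatMap(v => Try(BigInt(v.trim)).toOption)

  // Build statistics from Hive table properties.
  // Assumption: a row count of 0 next to a positive size is treated as
  // unknown (None) instead of being trusted by the optimizer.
  def fromProperties(props: Map[String, String]): Option[TableStats] = {
    val totalSize   = parse(props, "totalSize").filter(_ > 0)
    val rawDataSize = parse(props, "rawDataSize").filter(_ > 0)
    val rowCount    = parse(props, "numRows").filter(_ > 0)
    // Assumption: prefer totalSize, fall back to rawDataSize.
    totalSize.orElse(rawDataSize).map(size => TableStats(size, rowCount))
  }
}
```

With the properties of the `big` table from the reproduction, this sketch keeps the size but reports the row count as unknown, so a cost-based decision would no longer be driven by `numRows=0`.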
## How was this patch tested?
unit tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangyum/spark SPARK-22626
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19831.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19831
----
commit ed7352f2191308965a1b2abb6cd075a90b7f7bb7
Author: Yuming Wang <[email protected]>
Date: 2017-11-28T08:56:16Z
if dataSize > 0, rowCount should bigger than 0.
commit b16f88ef971040e682fafe28f0ff06877814e3df
Author: Yuming Wang <[email protected]>
Date: 2017-11-28T10:33:46Z
add test
----