GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/20763
[SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect result caused by the
rule OptimizeMetadataOnlyQuery
This PR backports https://github.com/apache/spark/pull/20684 and
https://github.com/apache/spark/pull/20693 to the Spark 2.3 branch.
---
## What changes were proposed in this pull request?
```Scala
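// `path` is assumed to be a java.io.File pointing at an empty temporary directory.
// The partition directories are nested in the order cOl3, cOl1, cOl5.
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
// Select only partition columns, in a different order and with different casing.
val df = spark.read.json(path.getCanonicalPath)
  .select("CoL1", "CoL5", "CoL3")
  .distinct()
df.show()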
```
This query returns an incorrect result. The expected row is `[a,e,c]`, but it produces:
```
[c,e,a]
```
The rule `OptimizeMetadataOnlyQuery` has a bug: it does not respect
the attribute order of the original leaf node. This PR fixes it.
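To illustrate how an ordering mismatch could produce that row, here is a minimal plain-Scala sketch with no Spark dependency. It assumes, purely for illustration, that the attribute list was ordered `cOl1, cOl3, cOl5` while the partition values arrived in directory order (`cOl3, cOl1, cOl5`). The object `AttributeOrderSketch` and its members are hypothetical and are not taken from the patch.

```scala
object AttributeOrderSketch {
  // Partition values in directory order: cOl3=c/cOl1=a/cOl5=e
  val partitionValues: Seq[String] = Seq("c", "a", "e")

  // Assumed (hypothetical) attribute order, different from the directory order
  val attrs: Seq[String] = Seq("cOl1", "cOl3", "cOl5")

  // Positional pairing: each attribute gets whatever value sits at its index
  val row: Map[String, String] = attrs.zip(partitionValues).toMap

  // Case-insensitive column lookup, mirroring select("CoL1", "CoL5", "CoL3")
  def select(cols: String*): Seq[String] =
    cols.map { c =>
      row.collectFirst { case (k, v) if k.equalsIgnoreCase(c) => v }.get
    }

  // Evaluates to Seq("c", "e", "a") instead of the expected Seq("a", "e", "c")
  val result: Seq[String] = select("CoL1", "CoL5", "CoL3")
}
```

Under these assumptions, pairing values with attributes by position alone is enough to reproduce the wrong row, which is why the attribute order has to follow the leaf node's output.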
## How was this patch tested?
Added a test case
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark backport23523
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20763.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20763
----
commit b47f1d4243ec72eeab69ae619c35bbbd9f9f2e6d
Author: gatorsmile <gatorsmile@...>
Date: 2018-02-27T16:44:25Z
[SPARK-23523][SQL] Fix the incorrect result caused by the rule
OptimizeMetadataOnlyQuery
Author: gatorsmile <gatorsmile@...>
Closes #20684 from gatorsmile/optimizeMetadataOnly.
commit c0ac5ef3a1f00eee44dd50be925f983be852fe96
Author: Xingbo Jiang <xingbo.jiang@...>
Date: 2018-02-28T20:16:26Z
[SPARK-23523][SQL][FOLLOWUP] Minor refactor of OptimizeMetadataOnlyQuery
## What changes were proposed in this pull request?
Inside `OptimizeMetadataOnlyQuery.getPartitionAttrs`, avoid using `zip` to
build the attribute map.
Also includes minor updates to comments and formatting.
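As a rough sketch of the idea (not the actual Catalyst code), the attributes can be resolved by a case-insensitive name lookup instead of positional `zip`, so the result follows the order of the partition column names whatever the order of the relation's output. The `Attr` case class, the `partitionAttrs` helper, and the exception type are hypothetical stand-ins chosen for this example.

```scala
import java.util.Locale

object PartitionAttrSketch {
  // Hypothetical stand-in for a Catalyst attribute
  final case class Attr(name: String, ordinal: Int)

  // Returns attributes in the order of partitionColumnNames, resolved by
  // case-insensitive name rather than by position.
  def partitionAttrs(partitionColumnNames: Seq[String], output: Seq[Attr]): Seq[Attr] = {
    val byName = output.map(a => a.name.toLowerCase(Locale.ROOT) -> a).toMap
    partitionColumnNames.map { col =>
      byName.getOrElse(
        col.toLowerCase(Locale.ROOT),
        throw new IllegalArgumentException(s"No attribute found for partition column $col"))
    }
  }

  // Relation output in schema order; partition columns listed in directory order
  val output: Seq[Attr] =
    Seq("cOl1", "cOl2", "cOl3", "cOl4", "cOl5").zipWithIndex.map { case (n, i) => Attr(n, i) }
  val resolved: Seq[Attr] = partitionAttrs(Seq("col3", "col1", "col5"), output)
}
```

A lookup keyed by name fails loudly when a partition column has no matching attribute, whereas `zip` silently pairs whatever happens to share an index.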
## How was this patch tested?
Existing test cases.
Author: Xingbo Jiang <xingbo.jiang@...>
Closes #20693 from jiangxb1987/SPARK-23523.
----
---