[GitHub] spark pull request #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorr...

gatorsmile Wed, 07 Mar 2018 15:49:03 -0800

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/20763


    [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect result caused by the 
rule OptimizeMetadataOnlyQuery

    This PR backports https://github.com/apache/spark/pull/20684 and 
https://github.com/apache/spark/pull/20693 to the Spark 2.3 branch.
    
    ---
    
    ## What changes were proposed in this pull request?
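    ```Scala
    import java.io.File

    // Assumes `path` is a temporary base directory and `spark` is an active
    // SparkSession with `spark.implicits._` imported (needed for `toDF`).
    // The partition directories are nested in the order cOl3, cOl1, cOl5.
    val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
    Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
      .write.json(tablePath.getCanonicalPath)
    // Select the partition columns in a different order and with different casing.
    val df = spark.read.json(path.getCanonicalPath)
      .select("CoL1", "CoL5", "CoL3").distinct()
    df.show()
    ```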
    
    The query above should return `[a,e,c]` for the selected columns `CoL1`, 
`CoL5`, `CoL3`, but it returns a wrong result:
    ```
    [c,e,a]
    ```
    
    The rule `OptimizeMetadataOnlyQuery` has a bug: it does not respect the 
attribute order of the original leaf node. This PR fixes it.
    
    ## How was this patch tested?
    Added a test case
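
As a rough illustration of the attribute-order problem described above, the following plain-Scala sketch (no Spark required) models what happens when values laid out in partition-directory order are labelled with attributes in a different order. The names `AttributeOrderSketch`, `leafOutput`, and `select` are hypothetical, and the leaf-node order is an assumption chosen to reproduce the reported output. This is not the actual rule implementation.

```scala
import java.util.Locale

object AttributeOrderSketch {
  // Partition columns and values in directory order: cOl3=c/cOl1=a/cOl5=e.
  val partitionNames: Seq[String] = Seq("cOl3", "cOl1", "cOl5")
  val partitionValues: Seq[String] = Seq("c", "a", "e")

  // ASSUMPTION: the attribute order exposed by the original leaf node.
  val leafOutput: Seq[String] = Seq("cOl1", "cOl3", "cOl5")

  private def norm(s: String): String = s.toLowerCase(Locale.ROOT)

  // Case-insensitive projection of `cols` from a row labelled by `attrs`.
  def select(attrs: Seq[String], row: Seq[String], cols: Seq[String]): Seq[String] = {
    val byName = attrs.map(norm).zip(row).toMap
    cols.map(c => byName(norm(c)))
  }

  val requested: Seq[String] = Seq("CoL1", "CoL5", "CoL3")

  // Mismatch: the row is in partition order but labelled in leaf-node order.
  val wrong: Seq[String] = select(leafOutput, partitionValues, requested)

  // Aligned: reorder the values to follow the leaf-node order first.
  private val valueByName = partitionNames.map(norm).zip(partitionValues).toMap
  val reordered: Seq[String] = leafOutput.map(n => valueByName(norm(n)))
  val right: Seq[String] = select(leafOutput, reordered, requested)

  def main(args: Array[String]): Unit = {
    println(wrong.mkString("[", ",", "]")) // prints [c,e,a]
    println(right.mkString("[", ",", "]")) // prints [a,e,c]
  }
}
```

In this model the only difference between the two results is whether the row values are aligned with the leaf node's attribute order before they are labelled.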

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark backport23523

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20763.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20763
    
----
commit b47f1d4243ec72eeab69ae619c35bbbd9f9f2e6d
Author: gatorsmile <gatorsmile@...>
Date:   2018-02-27T16:44:25Z

    [SPARK-23523][SQL] Fix the incorrect result caused by the rule 
OptimizeMetadataOnlyQuery
    
    ## What changes were proposed in this pull request?
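    ```Scala
    import java.io.File

    // Assumes `path` is a temporary base directory and `spark` is an active
    // SparkSession with `spark.implicits._` imported (needed for `toDF`).
    // The partition directories are nested in the order cOl3, cOl1, cOl5.
    val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
    Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
      .write.json(tablePath.getCanonicalPath)
    // Select the partition columns in a different order and with different casing.
    val df = spark.read.json(path.getCanonicalPath)
      .select("CoL1", "CoL5", "CoL3").distinct()
    df.show()
    ```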
    
    The query above should return `[a,e,c]` for the selected columns `CoL1`, 
`CoL5`, `CoL3`, but it returns a wrong result:
    ```
    [c,e,a]
    ```
    
    The rule `OptimizeMetadataOnlyQuery` has a bug: it does not respect the 
attribute order of the original leaf node. This PR fixes it.
    
    ## How was this patch tested?
    Added a test case
    
    Author: gatorsmile <[email protected]>
    
    Closes #20684 from gatorsmile/optimizeMetadataOnly.

commit c0ac5ef3a1f00eee44dd50be925f983be852fe96
Author: Xingbo Jiang <xingbo.jiang@...>
Date:   2018-02-28T20:16:26Z

    [SPARK-23523][SQL][FOLLOWUP] Minor refactor of OptimizeMetadataOnlyQuery
    
    ## What changes were proposed in this pull request?
    
    Inside `OptimizeMetadataOnlyQuery.getPartitionAttrs`, avoid using `zip` to 
generate the attribute map.
    Also includes other minor updates to comments and formatting.
    
    ## How was this patch tested?
    
    Existing test cases.
    
    Author: Xingbo Jiang <[email protected]>
    
    Closes #20693 from jiangxb1987/SPARK-23523.
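
To show why a positional `zip` is fragile when building an attribute map, here is a minimal plain-Scala sketch. `Attr`, `AttributeMapSketch`, `byName`, and the ids are hypothetical stand-ins invented for illustration. They are not Catalyst classes and not the code from the follow-up commit.

```scala
import java.util.Locale

object AttributeMapSketch {
  // Hypothetical stand-in for an attribute: a name plus a unique id.
  final case class Attr(name: String, id: Int)

  // The same three columns, listed in two different orders (an assumption).
  val relationAttrs: Seq[Attr] = Seq(Attr("cOl1", 1), Attr("cOl3", 3), Attr("cOl5", 5))
  val partitionAttrs: Seq[Attr] = Seq(Attr("cOl3", 13), Attr("cOl1", 11), Attr("cOl5", 15))

  // Positional pairing: only correct if both sequences share one order.
  val zipped: Map[Attr, Attr] = relationAttrs.zip(partitionAttrs).toMap

  // Name-based pairing: independent of order, and fails loudly on a miss.
  def byName(from: Seq[Attr], to: Seq[Attr]): Map[Attr, Attr] = {
    val index = to.map(a => a.name.toLowerCase(Locale.ROOT) -> a).toMap
    from.map { a =>
      val key = a.name.toLowerCase(Locale.ROOT)
      a -> index.getOrElse(key, sys.error(s"no attribute named ${a.name}"))
    }.toMap
  }

  val resolved: Map[Attr, Attr] = byName(relationAttrs, partitionAttrs)

  def main(args: Array[String]): Unit = {
    println(zipped(Attr("cOl1", 1)))   // paired by position with cOl3
    println(resolved(Attr("cOl1", 1))) // paired by name with cOl1
  }
}
```

The positional version silently pairs `cOl1` with `cOl3` here, while the name-based version pairs each attribute with its namesake regardless of ordering.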

----

