[GitHub] spark pull request #22597: [SPARK-25579][SQL] Use quoted attribute names if ...

dongjoon-hyun Mon, 15 Oct 2018 22:01:41 -0700

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22597#discussion_r225397862
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala ---
    @@ -383,4 +385,17 @@ class OrcFilterSuite extends OrcTest with SharedSQLContext {
           )).get.toString
         }
       }
    +
    +  test("SPARK-25579 ORC PPD should support column names with dot") {
    +    import testImplicits._
    +
    +    withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") {
    +      withTempDir { dir =>
    +        val path = new File(dir, "orc").getCanonicalPath
    +        Seq((1, 2), (3, 4)).toDF("col.dot.1", "col.dot.2").write.orc(path)
    +        val df = spark.read.orc(path).where("`col.dot.1` = 1 and `col.dot.2` = 2")
    +        checkAnswer(stripSparkFilter(df), Row(1, 2))
    --- End diff --
    
    Thank you for the review, @dbtsai! I ignored PPD on nested columns here because Spark has not pushed down predicates on nested columns up to and including Spark 2.4. With your PR (#22573), Spark 3.0 will support that, and we can update this to handle those cases, too.
    
    @cloud-fan, actually, ORC 1.5.0 started supporting PPD on nested columns ([ORC-323](https://issues.apache.org/jira/browse/ORC-323)), so @dbtsai and I have discussed supporting it before. We are going to support ORC PPD with nested columns in Spark 3.0 without regression.
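
To illustrate the distinction discussed here, the following is a minimal sketch of the "quote if needed" idea from the PR title. It assumes that a top-level column name containing a dot must be wrapped in backticks so that it is not read as a path into a nested column. The names `QuoteSketch` and `quoteIfNeeded` are hypothetical and are not taken from the Spark code base.

```scala
object QuoteSketch {
  // Hypothetical helper (an assumption for illustration, not Spark's API):
  // wrap a top-level column name in backticks when it contains a dot and is
  // not already quoted, so "col.dot.1" is treated as one attribute name
  // rather than as field "1" of "dot" of "col".
  def quoteIfNeeded(name: String): String =
    if (name.contains(".") && !name.contains("`")) s"`$name`" else name

  def main(args: Array[String]): Unit = {
    // Print the quoted form of a dotted name, a plain name, and an
    // already-quoted name.
    Seq("col.dot.1", "plain", "`a.b`").foreach(n => println(quoteIfNeeded(n)))
  }
}
```

Under this assumption, the dotted names used in the test (`col.dot.1`, `col.dot.2`) would be quoted, while plain names and already-quoted names pass through unchanged.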


