Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22597#discussion_r225397862
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala
---
@@ -383,4 +385,17 @@ class OrcFilterSuite extends OrcTest with SharedSQLContext {
)).get.toString
}
}
+
+ test("SPARK-25579 ORC PPD should support column names with dot") {
+ import testImplicits._
+
+ withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") {
+ withTempDir { dir =>
+ val path = new File(dir, "orc").getCanonicalPath
+ Seq((1, 2), (3, 4)).toDF("col.dot.1", "col.dot.2").write.orc(path)
+ val df = spark.read.orc(path).where("`col.dot.1` = 1 and `col.dot.2` = 2")
+ checkAnswer(stripSparkFilter(df), Row(1, 2))
--- End diff --
Thank you for the review, @dbtsai! I ignored PPD on nested columns here
because Spark does not push those predicates down, either in Spark 2.4 or as
of now. With your PR (#22573), Spark 3.0 will support that, and we can update
this to handle those cases, too.
@cloud-fan, actually, ORC has supported PPD on nested columns since 1.5.0
([ORC-323](https://issues.apache.org/jira/browse/ORC-323)). So @dbtsai and I
have discussed supporting that before. We are going to support ORC PPD on
nested columns in Spark 3.0 without regression.
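
To make the distinction above concrete, here is a rough standalone sketch of why a top-level column named `col.dot.1` differs from a nested field reference. This is not code from this PR or from Spark: `DottedNames`, `quoteIfNeeded`, and `parsePath` are hypothetical names, and the path parsing is deliberately naive.

```scala
// Hypothetical illustration only; none of these names exist in Spark or ORC.
object DottedNames {
  // Wrap a name in backticks when it contains a dot, so the whole string is
  // treated as a single top-level column name. Names that already contain a
  // backtick are returned unchanged.
  def quoteIfNeeded(name: String): String =
    if (name.contains(".") && !name.contains("`")) s"`$name`" else name

  // Naive reading of a column reference: a fully backtick-quoted reference is
  // one name; an unquoted reference is split on dots into a nested field path.
  def parsePath(ref: String): Seq[String] =
    if (ref.length >= 2 && ref.startsWith("`") && ref.endsWith("`")) {
      Seq(ref.substring(1, ref.length - 1))
    } else {
      ref.split('.').toSeq
    }
}
```

With these assumptions, `DottedNames.parsePath("col.dot.1")` is read as a three-level nested path, while `DottedNames.parsePath(DottedNames.quoteIfNeeded("col.dot.1"))` is read as a single column. The test in this diff covers only the second case, which is why it uses backticks in the `where` clause.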
---