Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22597#discussion_r225295336
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala
---
@@ -383,4 +385,17 @@ class OrcFilterSuite extends OrcTest with
SharedSQLContext {
)).get.toString
}
}
+
+ test("SPARK-25579 ORC PPD should support column names with dot") {
+ import testImplicits._
+
+ withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") {
+ withTempDir { dir =>
+ val path = new File(dir, "orc").getCanonicalPath
+ Seq((1, 2), (3, 4)).toDF("col.dot.1", "col.dot.2").write.orc(path)
--- End diff --
We are using the default parallelism from `TestSparkSession` on two rows,
and it already generates [separate output
files](https://github.com/apache/spark/pull/22597#discussion_r225004937).
If you are concerned about possible flakiness, we can increase the number
of rows to `10`, call `repartition(10)`, and check `assert(actual < 10)`,
as you did
[before](https://github.com/apache/spark/blob/5d726b865948f993911fd5b9730b25cfa94e16c7/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala#L1016-L1040).
Do you want that?
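
For reference, the suggested variant might look roughly like the sketch
below. This is only an illustration of the repartition-and-count idea, not
the actual patch; it assumes the existing `OrcTest`/`SharedSQLContext` test
harness and the `stripSparkFilter` helper from `SQLTestUtils`, and the
column names and literals are placeholders.

```scala
// Hypothetical variant: write 10 rows into 10 files so that ORC
// predicate pushdown can skip whole files, then check that the scan
// (with the Spark-side filter stripped) returns fewer than 10 rows.
test("SPARK-25579 ORC PPD should support column names with dot (10 partitions)") {
  import testImplicits._

  withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") {
    withTempDir { dir =>
      val path = new File(dir, "orc").getCanonicalPath
      // Back-ticks are required because the column names contain dots.
      spark.range(10)
        .selectExpr("id as `col.dot.1`", "id * 2 as `col.dot.2`")
        .repartition(10)
        .write.orc(path)

      val df = spark.read.orc(path).where("`col.dot.1` = 5")
      // stripSparkFilter removes Spark's own Filter node, so any row
      // reduction below 10 must come from ORC-side pushdown.
      val actual = stripSparkFilter(df).count()
      assert(actual < 10)
    }
  }
}
```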
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]