Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21623#discussion_r198551889
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -660,6 +661,56 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
assert(df.where("col > 0").count() === 2)
}
}
+
+ test("filter pushdown - StringStartsWith") {
+ withParquetDataFrame((1 to 4).map(i => Tuple1(i + "str" + i))) { implicit df =>
--- End diff ---
I think all of these tests go through the `keep` method rather than `canDrop` and `inverseCanDrop`, so those two methods still need coverage. You could exercise them by constructing a Parquet file with row groups that have predictable statistics, but that would be difficult. An easier approach is to define the predicate class elsewhere and write a unit test for it that passes in different statistics values.
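To illustrate the suggestion, here is a minimal sketch of what such a unit test could look like. The object and method names below are hypothetical stand-ins, not Spark's or Parquet's actual classes: the pruning logic is extracted into a plain object so it can be driven directly with min/max statistics values instead of real row groups.

```scala
// Hypothetical sketch of startsWith row-group pruning, extracted for unit
// testing. Names are illustrative; Spark's real predicate implements
// Parquet's UserDefinedPredicate and receives a Statistics object.
object StartsWithPruning {
  // A row group can be dropped when no string in [min, max] can start
  // with the prefix: compare the prefix against the truncated bounds.
  def canDrop(prefix: String, min: String, max: String): Boolean = {
    val len = prefix.length
    max.take(len) < prefix || min.take(len) > prefix
  }

  // The inverse predicate (NOT startsWith) can drop a row group only if
  // every value in it starts with the prefix, i.e. both bounds do.
  def inverseCanDrop(prefix: String, min: String, max: String): Boolean =
    min.startsWith(prefix) && max.startsWith(prefix)
}

object StartsWithPruningSpec extends App {
  // A row group bounded by ["1str1", "4str4"] cannot contain "9..." values.
  assert(StartsWithPruning.canDrop("9", "1str1", "4str4"))
  // ...but it may contain values starting with "2", so it must be kept.
  assert(!StartsWithPruning.canDrop("2", "1str1", "4str4"))
  // Every value in ["2str1", "2str9"] starts with "2str", so the
  // inverse predicate can drop the whole group.
  assert(StartsWithPruning.inverseCanDrop("2str", "2str1", "2str9"))
  assert(!StartsWithPruning.inverseCanDrop("2str", "1str1", "4str4"))
  println("all pruning checks passed")
}
```

Because the logic takes plain bounds as arguments, each branch of `canDrop` and `inverseCanDrop` can be hit with hand-picked statistics, which is exactly what is hard to arrange through `keep`-based end-to-end tests.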
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]