GitHub user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22036#discussion_r208795446
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -204,6 +204,24 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
}
}
+ test("SPARK-25028: column stats collection for null partitioning columns") {
+ val table = "analyze_partition_with_null"
+ withTempDir { dir =>
+ withTable(table) {
+ sql(s"""
+ |CREATE TABLE $table (name string, value string)
+ |USING PARQUET
+ |PARTITIONED BY (name)
+ |LOCATION '${dir.toURI}'""".stripMargin)
+ val df = Seq(("a", null), ("b", null)).toDF("value", "name")
--- End diff --
Super nit: it would be better to also include a non-null partition value, e.g.,
`val df = Seq(("a", null), ("b", null), ("c", "1")).toDF("value", "name")`.
Btw, why are the columns in reverse order here ("value", "name" rather than
"name", "value")?
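
One possible reason for the column order, sketched here as a toy model rather than actual Spark code: if the table ends up with its partition column last (`value`, then `name`) and the rows are later written with a position-based insert (the write call is not shown in this diff excerpt, so this is an assumption), then the DataFrame has to list `value` first. The object `ColumnOrderSketch` and its helper `insertByPosition` are hypothetical names used only for illustration; the rows include the non-null partition value suggested above.

```scala
// Toy model, not Spark code: columns are matched by position, not by name.
object ColumnOrderSketch {
  // Assumed table layout: data column first, partition column last.
  val tableColumns: Seq[String] = Seq("value", "name")

  // Pair each field of a row with a table column purely by position.
  def insertByPosition(row: Seq[String]): Map[String, String] =
    tableColumns.zip(row).toMap

  def main(args: Array[String]): Unit = {
    // Rows written as (value, name): two null partition values and one non-null.
    val rows: Seq[Seq[String]] =
      Seq(Seq("a", null), Seq("b", null), Seq("c", "1"))

    val stored = rows.map(insertByPosition)

    // Group rows by partition value; Option(null) becomes None.
    val byPartition = stored.groupBy(r => Option(r("name")))

    println(s"rows in null partition: ${byPartition(None).size}")
    println(s"rows in partition '1': ${byPartition(Some("1")).size}")
  }
}
```

In this model, swapping the fields to ("name", "value") order would put the data values into the partition column, which is why the order matters under position-based matching.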