Cheng Lian created SPARK-19887:
----------------------------------

             Summary: __HIVE_DEFAULT_PARTITION__ not interpreted as NULL partition value in partitioned persisted tables
                 Key: SPARK-19887
                 URL: https://issues.apache.org/jira/browse/SPARK-19887
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Cheng Lian

The following Spark shell snippet under Spark 2.1 reproduces this issue:

{code}
val data = Seq(
  ("p1", 1, 1),
  ("p2", 2, 2),
  (null, 3, 3)
)

// Correct case: Saving partitioned data to file system.

val path = "/tmp/partitioned"

data.
  toDF("a", "b", "c").
  write.
  mode("overwrite").
  partitionBy("a", "b").
  parquet(path)

spark.read.parquet(path).filter($"a".isNotNull).show(truncate = false)
// +---+---+---+
// |c  |a  |b  |
// +---+---+---+
// |2  |p2 |2  |
// |1  |p1 |1  |
// +---+---+---+

// Incorrect case: Saving partitioned data as persisted table.

data.
  toDF("a", "b", "c").
  write.
  mode("overwrite").
  partitionBy("a", "b").
  saveAsTable("test_null")

spark.table("test_null").filter($"a".isNotNull).show(truncate = false)
// +---+--------------------------+---+
// |c  |a                         |b  |
// +---+--------------------------+---+
// |3  |__HIVE_DEFAULT_PARTITION__|3  | <-- This line should not be here
// |1  |p1                        |1  |
// |2  |p2                        |2  |
// +---+--------------------------+---+
{code}

Hive-style partitioned tables use the magic string {{"__HIVE_DEFAULT_PARTITION__"}} to indicate {{NULL}} partition values in partition directory names. However, in the case of persisted partitioned tables, this magic string is not interpreted as {{NULL}} but as a regular string.
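
To make the expected interpretation concrete, here is a minimal plain-Scala sketch of how a partition directory name could be mapped to a nullable value. It is an illustration only and not Spark's actual implementation: the object {{PartitionValueSketch}} and its methods {{parseSegment}} and {{parsePath}} are hypothetical names, and the sketch ignores any escaping of special characters in directory names.

```scala
// Hypothetical helper for illustration; not part of Spark.
object PartitionValueSketch {
  // Magic directory-name value quoted in the issue description.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Parse one directory segment such as "a=p1" into (column, value).
  // The value is None when the segment carries the magic string,
  // which stands for a NULL partition value.
  def parseSegment(segment: String): Option[(String, Option[String])] = {
    val idx = segment.indexOf('=')
    if (idx <= 0) {
      None // not a "column=value" segment
    } else {
      val column = segment.substring(0, idx)
      val raw = segment.substring(idx + 1)
      val value = if (raw == DefaultPartitionName) None else Some(raw)
      Some((column, value))
    }
  }

  // Parse a relative partition path such as "a=p1/b=1".
  // Segments that are not of the form "column=value" are skipped.
  def parsePath(path: String): Seq[(String, Option[String])] =
    path.split('/').toSeq.flatMap(s => parseSegment(s))
}
```

Under this sketch, a directory such as {{a=__HIVE_DEFAULT_PARTITION__/b=3}} yields no value for column {{a}}, so a filter like {{$"a".isNotNull}} would be expected to drop the corresponding row, as it does in the file system case above.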