yujhe opened a new pull request #26807: [SPARK-30181][SQL] Only add string or integral type column to metastore partition filter
URL: https://github.com/apache/spark/pull/26807

### What changes were proposed in this pull request?

The following SQL throws a runtime exception because Spark casts `dt` to string and adds the comparison to the metastore partition filter (the Hive metastore filter becomes `dt >= "2019-12-01 00:00:00"`). The root cause is that the Hive metastore client only supports filtering on partition keys of string or integral type:
https://github.com/apache/hive/blob/release-1.2.1/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java#L433

```scala
spark.sql("CREATE TABLE timestamp_part (value INT) PARTITIONED BY (dt TIMESTAMP)")
val df = Seq(
  (1, java.sql.Timestamp.valueOf("2019-12-01 00:00:00"), 1),
  (2, java.sql.Timestamp.valueOf("2019-12-01 01:00:00"), 1)
).toDF("id", "dt", "value")
df.write.partitionBy("dt").mode("overwrite").saveAsTable("timestamp_part")
spark.sql("select * from timestamp_part where dt >= '2019-12-01 00:00:00'").explain(true)
```

```
java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
	at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:774)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:679)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:677)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
	...
Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
	at org.apache.hadoop.hive.metastore.parser.ExpressionTree$FilterBuilder.setError(ExpressionTree.java:185)
	at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.getJdoFilterPushdownParam(ExpressionTree.java:440)
	at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilterOverPartitions(ExpressionTree.java:357)
```

This PR avoids adding a partition column to the metastore filter when its data type is neither string nor integral.

### Does this PR introduce any user-facing change?

Yes. The above SQL now fetches all partitions from the metastore and no longer throws a runtime exception.

### How was this patch tested?

New test added.
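For illustration, here is a minimal Scala sketch of the rule this change implies (the object and method names below are hypothetical, not Spark's actual API): only string and integral partition keys are eligible for metastore filter pushdown, and predicates on any other type are evaluated by Spark after the partitions have been fetched.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper illustrating the rule enforced by this PR: Hive's metastore
// only supports partition-filter pushdown on string and integral partition keys,
// so predicates on other types must stay on the Spark side.
object MetastorePushdownSketch {
  def isPushableType(dt: DataType): Boolean = dt match {
    case StringType                                    => true
    case ByteType | ShortType | IntegerType | LongType => true
    case _                                             => false // e.g. TimestampType, DateType, DecimalType
  }

  /** Partition columns whose predicates could safely be pushed to the metastore. */
  def pushablePartitionColumns(partitionSchema: StructType): Seq[String] =
    partitionSchema.collect {
      case f if isPushableType(f.dataType) => f.name
    }
}

// Example: a TIMESTAMP partition key is excluded, so a filter like
// dt >= '2019-12-01 00:00:00' is applied by Spark after listing partitions
// rather than being sent to the metastore.
// pushablePartitionColumns(
//   new StructType().add("dt", TimestampType).add("region", StringType))
// => Seq("region")
```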
