yujhe commented on issue #26807: [SPARK-30181][SQL] Only add string or integral type column to metastore partition filter
URL: https://github.com/apache/spark/pull/26807#issuecomment-566386817

@cloud-fan I have tested on the `master` and `branch-2.4` branches with hive-1.2. The `master` branch passes the test, but `branch-2.4` does not. I think the reason is that the partition key `dt` is not cast to string type on `master`, as it is on `branch-2.4`. We still need to handle the case where a partition column may be cast to string or integral type.

Here is the plan executed on the `master` branch:

```
scala> spark.sql("select * from timestamp_part where dt >= '2019-12-01 00:00:00'").explain(true)
== Parsed Logical Plan ==
'Project [*]
+- 'Filter ('dt >= 2019-12-01 00:00:00)
   +- 'UnresolvedRelation [timestamp_part]

== Analyzed Logical Plan ==
id: int, value: int, dt: timestamp
Project [id#19, value#20, dt#21]
+- Filter (dt#21 >= cast(2019-12-01 00:00:00 as timestamp))
   +- SubqueryAlias `default`.`timestamp_part`
      +- Relation[id#19,value#20,dt#21] parquet

== Optimized Logical Plan ==
Filter (isnotnull(dt#21) AND (dt#21 >= 1575129600000000))
+- Relation[id#19,value#20,dt#21] parquet
```

Here is the exception thrown on `branch-2.4`:

```
scala> spark.sql("select * from timestamp_part where dt >= '2019-12-01 00:00:00'").explain(true)
java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:681)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:679)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:277)
  at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215)
  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214)
  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260)
  ...
Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:759)
  ... 94 more
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at org.apache.hadoop.hive.metastore.parser.ExpressionTree$FilterBuilder.setError(ExpressionTree.java:185)
  at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.getJdoFilterPushdownParam(ExpressionTree.java:440)
  at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilterOverPartitions(ExpressionTree.java:357)
```
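For context, a minimal sketch of how the `timestamp_part` table could be set up to reproduce the behavior above. The exact DDL is not shown in this comment, so this is an assumption based on the analyzed plan (`id: int, value: int, dt: timestamp`) and the fact that the error mentions `spark.sql.hive.manageFilesourcePartitions`, which applies to partitioned datasource tables tracked in the Hive metastore:

```
// Hypothetical reproduction, not the author's exact setup: a parquet
// datasource table partitioned by a timestamp column `dt`.
spark.sql(
  """CREATE TABLE timestamp_part (id INT, value INT, dt TIMESTAMP)
    |USING parquet
    |PARTITIONED BY (dt)""".stripMargin)

// Insert one row so a partition exists in the metastore.
spark.sql(
  "INSERT INTO timestamp_part VALUES (1, 1, CAST('2019-12-02 00:00:00' AS TIMESTAMP))")

// On branch-2.4 this query pushes a filter on the timestamp partition key to
// the metastore and fails with the MetaException shown above.
spark.sql("select * from timestamp_part where dt >= '2019-12-01 00:00:00'").show()
```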
