yujhe opened a new pull request #26807: [SPARK-30181][SQL] Only add string or integral type column to metastore partition filter
URL: https://github.com/apache/spark/pull/26807
 
 
   ### What changes were proposed in this pull request?
   
   The following SQL throws a runtime exception, because Spark casts `dt` to string and adds the comparison to the metastore partition filter (the Hive metastore filter becomes: `dt >= "2019-12-01 00:00:00"`).
   
   The reason is that the Hive metastore client only supports filtering on partition keys of string or integral types:
   
https://github.com/apache/hive/blob/release-1.2.1/metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java#L433
   
   ```scala
   spark.sql("CREATE TABLE timestamp_part (value INT) PARTITIONED BY (dt TIMESTAMP)")
   val df = Seq(
       (1, java.sql.Timestamp.valueOf("2019-12-01 00:00:00"), 1),
       (2, java.sql.Timestamp.valueOf("2019-12-01 01:00:00"), 1)
     ).toDF("id", "dt", "value")
   df.write.partitionBy("dt").mode("overwrite").saveAsTable("timestamp_part")
   
   spark.sql("select * from timestamp_part where dt >= '2019-12-01 00:00:00'").explain(true)
   Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
   java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
     at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:774)
     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:679)
     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:677)
     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
     at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
   ...
   Caused by: MetaException(message:Filtering is supported only on partition keys of type string)
     at org.apache.hadoop.hive.metastore.parser.ExpressionTree$FilterBuilder.setError(ExpressionTree.java:185)
     at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.getJdoFilterPushdownParam(ExpressionTree.java:440)
     at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilterOverPartitions(ExpressionTree.java:357)
   ```
   
   This PR avoids adding a partition column to the metastore partition filter when its data type is not string or an integral type.
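   
   Conceptually, the rule is the one sketched below. This is a minimal illustration only; `isPushableToMetastore` is a hypothetical helper name for this write-up, not the actual code changed in `HiveShim.scala`:
   
   ```scala
   import org.apache.spark.sql.types._
   
   // Hypothetical helper illustrating the rule described above: only predicates on
   // partition columns of string or integral types are pushed into the Hive metastore
   // filter; predicates on other types (e.g. TIMESTAMP, DATE) are left out, so Spark
   // evaluates them after listing the partitions.
   def isPushableToMetastore(dataType: DataType): Boolean = dataType match {
     case StringType                                    => true
     case ByteType | ShortType | IntegerType | LongType => true
     case _                                             => false
   }
   ```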
   
   ### Does this PR introduce any user-facing change?
   Yes, for the above SQL Spark now fetches all partitions from the metastore (and evaluates the predicate itself) instead of throwing a runtime exception.
   
   ### How was this patch tested?
   New test added.
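   
   For illustration, such a test could look roughly like the sketch below. This is hypothetical: it assumes a Hive test suite that mixes in `SQLTestUtils`, and it is not the exact test added by this PR:
   
   ```scala
   test("SPARK-30181: do not push non-string/integral partition predicates to metastore") {
     withTable("timestamp_part") {
       sql("CREATE TABLE timestamp_part (value INT) PARTITIONED BY (dt TIMESTAMP)")
       // Before the fix this threw "Filtering is supported only on partition keys of
       // type string"; now the TIMESTAMP predicate is kept out of the metastore filter
       // and the query succeeds, with the predicate evaluated on the Spark side.
       sql("SELECT * FROM timestamp_part WHERE dt >= '2019-12-01 00:00:00'").collect()
     }
   }
   ```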
