yujhe commented on issue #26807: [SPARK-30181][SQL] Only add string or integral type column to metastore partition filter
URL: https://github.com/apache/spark/pull/26807#issuecomment-566386817

@cloud-fan I have tested on the `master` and `branch-2.4` branches with hive-1.2. The `master` branch passes the test, but `branch-2.4` does not. I think the reason is that the partition key `dt` is not cast to string type on `master`, as it is on `branch-2.4`. We still need to handle the case where a partition column may be cast to string or integral type.

Here is the plan executed on the `master` branch:

```
scala> spark.sql("select * from timestamp_part where dt >= '2019-12-01 00:00:00'").explain(true)
== Parsed Logical Plan ==
'Project [*]
+- 'Filter ('dt >= 2019-12-01 00:00:00)
   +- 'UnresolvedRelation [timestamp_part]

== Analyzed Logical Plan ==
id: int, value: int, dt: timestamp
Project [id#19, value#20, dt#21]
+- Filter (dt#21 >= cast(2019-12-01 00:00:00 as timestamp))
   +- SubqueryAlias `default`.`timestamp_part`
      +- Relation[id#19,value#20,dt#21] parquet

== Optimized Logical Plan ==
Filter (isnotnull(dt#21) AND (dt#21 >= 1575129600000000))
+- Relation[id#19,value#20,dt#21] parquet
```

Here is the exception thrown on `branch-2.4`:

```
scala> spark.sql("select * from timestamp_part where dt >= '2019-12-01 00:00:00'").explain(true)
java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:681)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:679)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:277)
  at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215)
  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214)
  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260)
  ...
Caused by: java.lang.reflect.InvocationTargetException: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:759)
  ... 94 more
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only on partition keys of type string
  at org.apache.hadoop.hive.metastore.parser.ExpressionTree$FilterBuilder.setError(ExpressionTree.java:185)
  at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.getJdoFilterPushdownParam(ExpressionTree.java:440)
  at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilterOverPartitions(ExpressionTree.java:357)
```
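For context, a minimal sketch of how the `timestamp_part` table could be set up to reproduce the behavior above. The exact DDL is not shown in this comment, so this is an assumption based on the analyzed plan (`id: int, value: int, dt: timestamp`) and the fact that the error mentions `spark.sql.hive.manageFilesourcePartitions`, which applies to partitioned datasource tables tracked in the Hive metastore:

```
// Hypothetical reproduction, not the author's exact setup: a parquet
// datasource table partitioned by a timestamp column `dt`.
spark.sql(
  """CREATE TABLE timestamp_part (id INT, value INT, dt TIMESTAMP)
    |USING parquet
    |PARTITIONED BY (dt)""".stripMargin)

// Insert one row so a partition exists in the metastore.
spark.sql(
  "INSERT INTO timestamp_part VALUES (1, 1, CAST('2019-12-02 00:00:00' AS TIMESTAMP))")

// On branch-2.4 this query pushes a filter on the timestamp partition key to
// the metastore and fails with the MetaException shown above.
spark.sql("select * from timestamp_part where dt >= '2019-12-01 00:00:00'").show()
```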
