southernriver opened a new pull request #30708: URL: https://github.com/apache/spark/pull/30708
### What changes were proposed in this pull request?

Currently, partition pruning pushdown is limited in scope. See the relevant implementation in the source: https://github.com/apache/spark/blob/031c5ef280e0cba8c4718a6457a44b6cccb17f46/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L840

Hive's `getPartitionsByFilter()` takes a string of partition predicates such as `str_key="value" and int_key=1 ...`, but predicates containing ordinary functions such as `concat`/`concat_ws`/`substr` are not converted to that form and therefore cannot be pushed down. This PR adds support for pruning partitions through `concat`/`concat_ws`, and the framework is extensible: more functions such as `substring`, and combinations of different functions, can be added later.

### Why are the changes needed?

Without this, queries whose partition predicates use such functions scan a large number of partitions, which increases the amount of data involved in the computation and puts pressure on the metastore service.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
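To illustrate the idea, here is a minimal, self-contained sketch (not the PR's actual code, and independent of Spark's Catalyst classes) of how a predicate like `dt = concat('2020', '12')` could be constant-folded into the plain filter string that Hive's `getPartitionsByFilter()` accepts. The expression classes (`Attr`, `Lit`, `Concat`, `ConcatWs`, `EqualTo`) and the `HiveFilterSketch` object are hypothetical names introduced for this example:

```scala
// Hypothetical, simplified expression tree standing in for Catalyst expressions.
sealed trait Expr
case class Attr(name: String) extends Expr                       // partition column reference
case class Lit(value: String) extends Expr                       // string literal
case class Concat(children: Seq[Expr]) extends Expr              // concat(...)
case class ConcatWs(sep: String, children: Seq[Expr]) extends Expr // concat_ws(sep, ...)
case class EqualTo(left: Expr, right: Expr) extends Expr

object HiveFilterSketch {
  // Fold an expression down to a literal string when every input is a literal;
  // return None if it references a column, in which case it cannot be folded.
  def foldToLiteral(e: Expr): Option[String] = e match {
    case Lit(v)           => Some(v)
    case Attr(_)          => None
    case Concat(cs)       => sequence(cs.map(foldToLiteral)).map(_.mkString)
    case ConcatWs(s, cs)  => sequence(cs.map(foldToLiteral)).map(_.mkString(s))
    case _                => None
  }

  private def sequence[A](xs: Seq[Option[A]]): Option[Seq[A]] =
    if (xs.forall(_.isDefined)) Some(xs.map(_.get)) else None

  // Convert `part_col = <constant-foldable expr>` into a Hive filter string
  // of the form the PR description mentions, e.g. dt = "202012".
  def toHiveFilter(pred: Expr): Option[String] = pred match {
    case EqualTo(Attr(name), value) =>
      foldToLiteral(value).map(v => name + " = \"" + v + "\"")
    case _ => None
  }
}
```

With this sketch, `toHiveFilter(EqualTo(Attr("dt"), Concat(Seq(Lit("2020"), Lit("12")))))` yields a pushable filter string, while a predicate whose function arguments reference another column yields `None` and would fall back to client-side pruning.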
