cxzl25 opened a new pull request #34431:
URL: https://github.com/apache/spark/pull/34431


   ### What changes were proposed in this pull request?
   Improve partition filtering speed and reduce pressure on the Hive MetaStore.
   We can first pull all the partition names, filter them by the partition expressions on the client, and then obtain detailed information only for the matching partitions from the MetaStore Server.
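
A minimal Python sketch of the three steps, assuming a hypothetical in-memory client. `FakeMetastore`, `get_partition_names`, `get_partitions_by_names` and `prune_partitions` are illustrative names chosen to echo the corresponding Hive MetaStore calls; this is not Spark's actual `HiveShim` code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeMetastore:
    """Hypothetical in-memory stand-in for a Hive MetaStore client."""

    # Partition name -> full partition details (location, stats, ...).
    partitions: Dict[str, dict] = field(default_factory=dict)

    def get_partition_names(self) -> List[str]:
        # Cheap call: names only, e.g. "ds=2021-10-05".
        return list(self.partitions)

    def get_partitions_by_names(self, names: List[str]) -> List[dict]:
        # Expensive call: full details, but only for the requested names.
        return [self.partitions[n] for n in names]


def parse_partition_name(name: str) -> Dict[str, str]:
    # Simplified: splits "k1=v1/k2=v2" and ignores Hive's escaping of
    # special characters in keys and values.
    return dict(part.split("=", 1) for part in name.split("/"))


def prune_partitions(
    ms: FakeMetastore, predicate: Callable[[Dict[str, str]], bool]
) -> List[dict]:
    # Step 1: pull all partition names from the metastore.
    names = ms.get_partition_names()
    # Step 2: evaluate the partition filter on the client side.
    selected = [n for n in names if predicate(parse_partition_name(n))]
    # Step 3: fetch full details only for the partitions that matched.
    return ms.get_partitions_by_names(selected)
```

For example, with partitions `ds=2021-10-01` through `ds=2021-10-30` and the predicate `lambda spec: spec["ds"].endswith("5")`, only the details of the three matching partitions are requested.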
   
   ### Why are the changes needed?
   When `convertFilters` cannot take effect, the queried partitions cannot be filtered in advance on the Hive MetaStore Server. In that case, `getAllPartitionsOf` fetches the details of all partitions.
   
   When the Hive client cannot use server-side filtering, it first obtains the values of all partitions and then filters them.
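
The partition values arrive encoded in partition names such as `ds=2021-10-01/hr=00`. A sketch of recovering them, assuming that special characters in keys and values are written as `%XX` (two hex digits), which is our understanding of Hive's path-name escaping; `unescape_path_name` and `partition_values` are hypothetical helpers.

```python
import re
from typing import List, Tuple

# Assumption: a special character c is escaped as "%XX", XX = hex code of c.
_ESCAPED = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_path_name(s: str) -> str:
    # Replace every "%XX" with the character whose code point is 0xXX;
    # a "%" not followed by two hex digits is left untouched.
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), s)


def partition_values(name: str) -> List[Tuple[str, str]]:
    """Split a partition name such as "k1=v1/k2=v2" into (key, value) pairs."""
    pairs = []
    for part in name.split("/"):
        # Literal "/" and "=" inside keys or values are escaped (%2F, %3D),
        # so splitting on the raw characters before unescaping is safe.
        key, value = part.split("=", 1)
        pairs.append((unescape_path_name(key), unescape_path_name(value)))
    return pairs
```

A client-side filter then works on these decoded values, for example keeping only names whose `ds` value is `>= "2021-10-15"`.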
   
   When a table has a large number of partitions and the filter cannot be evaluated on the MetaStore Server, we fetch the details of every partition and filter them on the client side. This is slow and puts heavy pressure on the MetaStore Server.
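
To make the load difference concrete, a sketch comparing the two fallback paths on a synthetic 10,000-partition table. `CountingMetastore`, `old_fallback` and `new_fallback` are hypothetical; the stub only counts how many full partition objects it serves, so the numbers are illustrative, not measurements.

```python
from typing import Callable, Dict, List


class CountingMetastore:
    """Hypothetical metastore stub that counts full partition objects served."""

    def __init__(self, names: List[str]) -> None:
        self.names = names
        self.full_objects_served = 0

    def get_all_partitions(self) -> List[Dict[str, str]]:
        # Old fallback: every partition's full details are sent.
        self.full_objects_served += len(self.names)
        return [self._details(n) for n in self.names]

    def get_partition_names(self) -> List[str]:
        # Names only: no full partition objects are built.
        return list(self.names)

    def get_partitions_by_names(self, names: List[str]) -> List[Dict[str, str]]:
        self.full_objects_served += len(names)
        return [self._details(n) for n in names]

    @staticmethod
    def _details(name: str) -> Dict[str, str]:
        return {"name": name, "location": "/warehouse/t/" + name}


def old_fallback(ms: CountingMetastore, keep: Callable[[str], bool]) -> List[Dict[str, str]]:
    # Fetch everything, then filter on the client.
    return [p for p in ms.get_all_partitions() if keep(p["name"])]


def new_fallback(ms: CountingMetastore, keep: Callable[[str], bool]) -> List[Dict[str, str]]:
    # Filter names on the client, then fetch details only for the matches.
    return ms.get_partitions_by_names([n for n in ms.get_partition_names() if keep(n)])
```

With names `id=0` … `id=9999` and a filter keeping `id < 10`, both paths return the same 10 partitions, but the stub serves 10,000 full objects on the old path versus 10 on the new one, at the cost of one extra names-only call.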
   
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


