LantaoJin opened a new pull request #24774: [SPARK-27899][SQL] Make 
HiveMetastoreClient.getTableObjectsByName available in 
ExternalCatalog/SessionCatalog API
URL: https://github.com/apache/spark/pull/24774
 
 
   ## What changes were proposed in this pull request?
   
   The new Spark ThriftServer SparkGetTablesOperation implemented in 
https://github.com/apache/spark/pull/22794 issues a separate 
catalog.getTableMetadata request for every table. This becomes very slow for 
large schemas (~50ms per table with an external Hive metastore).
   Hive ThriftServer's GetTablesOperation uses 
HiveMetastoreClient.getTableObjectsByName to fetch table information in bulk, 
but we don't expose that call through our catalog API chain: Hive -> 
HiveClientImpl (HiveClient) -> HiveExternalCatalog (ExternalCatalog) -> 
SessionCatalog.
   
   If we added and exposed getTableObjectsByName through our catalog APIs, we 
could resolve that performance problem in SparkGetTablesOperation.
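   
The difference between per-table and bulk lookups can be sketched with a toy model. This is only an illustration under stated assumptions: `TableInfo`, `ToyMetastore`, `getTable`, and `getTablesByName` are hypothetical names, not Spark or Hive classes, and each method call stands in for one metastore round trip. At the ~50ms per table quoted above, 1,000 per-table requests would spend roughly 50 seconds on round trips alone, while a bulk call pays that latency once.

```scala
// Hypothetical stand-ins for illustration; none of these are Spark or Hive classes.
final case class TableInfo(db: String, name: String, tableType: String)

// A toy metastore that counts simulated round trips instead of doing real I/O.
final class ToyMetastore(tables: Seq[TableInfo]) {
  private val byKey: Map[(String, String), TableInfo] =
    tables.map(t => (t.db, t.name) -> t).toMap

  var roundTrips: Int = 0

  // One round trip per table, like a per-table metadata request.
  def getTable(db: String, name: String): Option[TableInfo] = {
    roundTrips += 1
    byKey.get((db, name))
  }

  // One round trip for the whole batch; names that do not exist are skipped.
  def getTablesByName(db: String, names: Seq[String]): Seq[TableInfo] = {
    roundTrips += 1
    names.flatMap(n => byKey.get((db, n)))
  }
}

object BulkLookupSketch {
  def main(args: Array[String]): Unit = {
    val names = (1 to 1000).map(i => s"t$i")
    val store = new ToyMetastore(names.map(n => TableInfo("default", n, "MANAGED")))

    // Per-table path: one simulated request for each table name.
    val perTable = names.flatMap(n => store.getTable("default", n))
    val perTableTrips = store.roundTrips

    // Bulk path: a single simulated request for all names.
    store.roundTrips = 0
    val bulk = store.getTablesByName("default", names)
    val bulkTrips = store.roundTrips

    println(s"per-table: ${perTable.size} tables, $perTableTrips round trips")
    println(s"bulk: ${bulk.size} tables, $bulkTrips round trips")
  }
}
```

In this sketch the bulk method silently skips unknown names; whether a real catalog API should skip or fail on missing tables is a design choice this example does not settle.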
   
   ## How was this patch tested?
   
   Added unit tests.
   

