Juliusz Sompolski created SPARK-27899:
-----------------------------------------

             Summary: Make HiveMetastoreClient.getTableObjectsByName available 
in ExternalCatalog/SessionCatalog API
                 Key: SPARK-27899
                 URL: https://issues.apache.org/jira/browse/SPARK-27899
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Juliusz Sompolski


The new Spark ThriftServer SparkGetTablesOperation implemented in 
https://github.com/apache/spark/pull/22794 does a catalog.getTableMetadata 
request for every table. This can get very slow for large schemas (~50ms per 
table with an external Hive metastore).
Hive ThriftServer GetTablesOperation uses 
HiveMetastoreClient.getTableObjectsByName to get table information in bulk, but 
we don't expose that through our APIs that go through Hive -> HiveClientImpl 
(HiveClient) -> HiveExternalCatalog (ExternalCatalog) -> SessionCatalog.

If we added and exposed getTableObjectsByName through our catalog APIs, we 
could resolve that performance problem in SparkGetTablesOperation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to