Hello Takanobu, We did not test with vanilla Hive 3.1.3 and Metastore databases can be different, so I don't know why Metastore responses are very slow. I can only share some results of testing CachedStore in Metastore. Please note that we did not use vanilla Hive 3.1.3 and instead used our own fork of Hive 3.1.3 (which applies many additional patches).
1. When CachedStore is enabled, column stats are not computed. As a result, some queries generate very inefficient plans because of wrong/inaccurate stats. Perhaps this is because not all patches for CachedStore have been merged to Hive 3.1.3. For example, these patches are not merged. Or, there might be some way to properly configure CachedStore so that it correctly computes column stats. HIVE-20896: CachedStore fail to cache stats in multiple code paths HIVE-21063: Support statistics in cachedStore for transactional table HIVE-24258: Data mismatch between CachedStore and ObjectStore for constraint So, we decided that CachedStore should not be enabled in Hive 3.1.3. (If anyone is running Hive Metastore 3.1.3 in production with CachedStore enabled, please let us know how you configure it.) 2. Setting metastore.stats.fetch.bitvector=true can also help generate more efficient query plans. --- Sungwoo On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma <tasan...@apache.org> wrote: > Hi Sungwoo Park, > > I'm sorry for the late reply to this old email. > We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and > noticed that the response of the Hive3 MetaStore is very slow. > We suspect that HIVE-14187 might be causing this slowness. > Could you tell me if you have resolved this problem? Are there still any > problems when you enable CachedStore? > > Regards, > - Takanobu > > 2018年6月13日(水) 0:37 Sungwoo Park <glap...@gmail.com>: > >> Hello Hive users, >> >> I am experience a problem with MetaStore in Hive 3.0. >> >> 1. Start MetaStore >> with >> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore. >> >> 2. Generate TPC-DS data. >> >> 3. TPC-DS queries run okay and produce correct results. E.g., from query >> 1: >> +-------------------+ >> | c_customer_id | >> +-------------------+ >> | AAAAAAAAAAAACHAA | >> | AAAAAAAAAAAADCAA | >> | AAAAAAAAAAAADDAA | >> ... >> | AAAAAAAAAAAILIAA | >> +-------------------+ >> 100 rows selected (69.901 seconds) >> >> However, the query compilation takes long ( >> https://issues.apache.org/jira/browse/HIVE-16520). >> >> 4. Now, restart MetaStore with >> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore. >> >> 5. TPC-DS queries run okay, but produce wrong results. E.g, from query 1: >> +----------------+ >> | c_customer_id | >> +----------------+ >> +----------------+ >> No rows selected (37.448 seconds) >> >> What I noticed is that with hive.metastore.rawstore.impl=CachedStore, >> HiveServer2 produces such log messages: >> >> 2018-06-12T23:50:04,223 WARN [b3041385-0290-492f-aef8-c0249de328ad >> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for >> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year >> 2018-06-12T23:50:04,223 INFO [b3041385-0290-492f-aef8-c0249de328ad >> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for >> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year >> 2018-06-12T23:50:04,225 WARN [b3041385-0290-492f-aef8-c0249de328ad >> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for >> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk >> 2018-06-12T23:50:04,225 INFO [b3041385-0290-492f-aef8-c0249de328ad >> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for >> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk >> 2018-06-12T23:50:04,226 WARN [b3041385-0290-492f-aef8-c0249de328ad >> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for >> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk, >> c_customer_id >> 2018-06-12T23:50:04,226 INFO [b3041385-0290-492f-aef8-c0249de328ad >> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for >> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk, >> c_customer_id >> >> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad >> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: >> Invalid column stats: No of nulls > cardinality >> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad >> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: >> Invalid column stats: No of nulls > cardinality >> 2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad >> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory: >> Invalid column stats: No of nulls > cardinality >> >> However, even after computing column stats, queries still return wrong >> results, despite the fact that the above log messages disappear. >> >> I guess I am missing some configuration parameters (because I imported >> hive-site.xml from Hive 2). Any suggestion would be appreciated. >> >> Thanks a lot, >> >> --- Sungwoo Park >> >>