[ https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149997#comment-17149997 ]
zhangbutao edited comment on HIVE-23721 at 7/2/20, 7:12 AM:
------------------------------------------------------------

When hive.in.test=true is set, MetaStoreDirectSql.ensureDbInit checks the metadata before each SQL request:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L212]

However, the two queries do not use the index correctly:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L280]
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L281]

This is because the tables TAB_COL_STATS and PART_COL_STATS have a combined (composite) index:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L742]
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L774]

According to the leftmost-prefix matching rule for combined indexes, we should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.

was (Author: zhangbutao):

We set hive.in.test=true; MetaStoreDirectSql.ensureDbInit will check metadata before each SQL request.
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L212]

However, the two queries do not use the index correctly:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L280]
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L281]

This is because the tables TAB_COL_STATS and PART_COL_STATS have a combined index:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L742]
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L774]

According to the leftmost matching principle of the combined index, we should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
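
To illustrate the leftmost-prefix rule mentioned above, here is a minimal sketch in Java. It is a simplified model and not Hive or MySQL code: the class LeftmostPrefix and the method usablePrefixLength are hypothetical names, and the column list after CAT_NAME is an assumption made for illustration (the comment only states that "catName" is the first index column). It counts how many leading index columns are constrained by equality predicates; a result of 0 means an index lookup is not possible under this model.

```java
import java.util.List;
import java.util.Set;

public class LeftmostPrefix {

    /**
     * Counts how many leading columns of a composite index are covered by
     * equality predicates. Simplified model: the index is usable for a lookup
     * only if the count is at least 1.
     */
    public static int usablePrefixLength(List<String> indexColumns,
                                         Set<String> equalityColumns) {
        int n = 0;
        for (String col : indexColumns) {
            if (!equalityColumns.contains(col)) {
                break; // the first uncovered column ends the usable prefix
            }
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Assumed column order; only CAT_NAME being first comes from the comment.
        List<String> index = List.of("CAT_NAME", "DB_NAME", "TABLE_NAME", "COLUMN_NAME");

        // Filter on dbName only: the leading column is missing, so no prefix matches.
        System.out.println(usablePrefixLength(index, Set.of("DB_NAME")));  // prints 0

        // Filter on catName: the leading column matches.
        System.out.println(usablePrefixLength(index, Set.of("CAT_NAME"))); // prints 1
    }
}
```

Under this model the "dbName == ''" filter cannot use the index, while "catName == ''" can. Real query optimizers have additional strategies, so the actual plan should still be confirmed with EXPLAIN on the metastore database.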
> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> -----------------------------------------------------------
>
>                 Key: HIVE-23721
>                 URL: https://issues.apache.org/jira/browse/HIVE-23721
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 3.1.2
>         Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timelineserver enabled, https enabled)
> Hive 3.1 (15 HS2 instances)
> 60000+ YARN applications every day
>            Reporter: YulongZ
>            Assignee: zhangbutao
>            Priority: Critical
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23721.01.patch
>
>
> Since Hive 3.0, catalogs have been added to the Hive metastore: many metastore schema tables gained a "catName" column, and the table indexes gained the "catName" column as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
> "
> initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
> initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> "
> should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
> When the metastore data becomes large, for example when the table behind MPartitionColumnStatistics has millions of rows, the query newQuery(MPartitionColumnStatistics.class, "dbName == ''") executes very slowly on the metastore, and the "show tables" query on HiveServer2 executes very slowly too.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)