[ 
https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149997#comment-17149997
 ] 

zhangbutao edited comment on HIVE-23721 at 7/2/20, 7:12 AM:
------------------------------------------------------------

When hive.in.test=true is set, MetaStoreDirectSql.ensureDbInit checks the 
metadata before each SQL request.

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L212]

 

However, the following two queries do not use the index effectively:

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L280]

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L281]

 

This is because the TAB_COL_STATS and PART_COL_STATS tables each have a composite index:

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L742]

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L774]

 

According to the leftmost-prefix matching rule for composite indexes, we should 
use "catName == ''" instead of "dbName == ''", because "catName" is the first 
index column.
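
To illustrate the leftmost-prefix rule, here is a minimal sketch that models a composite index as a sorted set ordered by (catName, dbName, rowId). It is a toy model, not Hive or MySQL code: the class LeftmostPrefixDemo, its Key type, and the generated rows are all hypothetical, and a real optimizer may choose a different plan (for example a full index scan rather than a table scan). It only shows why a predicate on the leading column can seek directly, while a predicate on the second column alone has to look at every entry.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class LeftmostPrefixDemo {

    /** Hypothetical index entry, sorted by catName, then dbName, then rowId. */
    static final class Key {
        final String catName;
        final String dbName;
        final long rowId;

        Key(String catName, String dbName, long rowId) {
            this.catName = catName;
            this.dbName = dbName;
            this.rowId = rowId;
        }
    }

    // Sort order of the toy composite index: leading column first.
    static final Comparator<Key> INDEX_ORDER = Comparator
            .comparing((Key k) -> k.catName)
            .thenComparing(k -> k.dbName)
            .thenComparingLong(k -> k.rowId);

    /** Builds a toy index whose rows all live in catalog "hive" (assumed data). */
    static TreeSet<Key> buildIndex(int rows) {
        TreeSet<Key> index = new TreeSet<>(INDEX_ORDER);
        for (int i = 0; i < rows; i++) {
            index.add(new Key("hive", "db" + (i % 100), i));
        }
        return index;
    }

    /** Predicate on the leading column: seek to the first candidate, stop at the first mismatch. */
    static int examinedByCatName(TreeSet<Key> index, String catName) {
        int examined = 0;
        Key seekPoint = new Key(catName, "", Long.MIN_VALUE);
        for (Key k : index.tailSet(seekPoint, true)) {
            examined++;
            if (!k.catName.equals(catName)) {
                break; // past the range for this catName, nothing further can match
            }
        }
        return examined;
    }

    /** Predicate on the second column only: the sort order does not help, so every entry is visited. */
    static int examinedByDbName(TreeSet<Key> index, String dbName) {
        int examined = 0;
        for (Key k : index) {
            examined++;
            // rows with k.dbName.equals(dbName) would be collected here
        }
        return examined;
    }

    public static void main(String[] args) {
        TreeSet<Key> index = buildIndex(100_000);
        System.out.println("catName == '' examined: " + examinedByCatName(index, ""));
        System.out.println("dbName == '' examined: " + examinedByDbName(index, ""));
    }
}
```

In this model, with 100,000 entries all in catalog "hive", the catName probe examines a single entry before stopping, while the dbName probe examines all 100,000, even though both return no rows.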


> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> -----------------------------------------------------------
>
>                 Key: HIVE-23721
>                 URL: https://issues.apache.org/jira/browse/HIVE-23721
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.2
>         Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 60000+ YARN Applications every day
>            Reporter: YulongZ
>            Assignee: zhangbutao
>            Priority: Critical
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23721.01.patch
>
>
> Since Hive 3.0, catalogs have been added to the Hive metastore: many metastore 
> tables gained a "catName" column, and the table indexes now include "catName" 
> as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>
>       initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>       initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
>
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When the metastore data grows large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the 
> "newQuery(MPartitionColumnStatistics.class, "dbName == ''")" query against the 
> metastore runs very slowly, and "show tables" in HiveServer2 becomes very 
> slow as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
