[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454283 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 05:47
Start Date: 03/Jul/20 05:47
Worklog Time Spent: 10m
Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449387162

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -543,10 +557,30 @@ static void prewarm(RawStore rawStore) {
           tableColStats = rawStore.getTableColumnStatistics(catName, dbName, tblName, colNames, CacheUtils.HIVE_ENGINE);
           Deadline.stopTimer();
         }
+        Deadline.startTimer("getPrimaryKeys");
+        primaryKeys = rawStore.getPrimaryKeys(catName, dbName, tblName);
+        Deadline.stopTimer();
+        cacheObjects.setPrimaryKeys(primaryKeys);
+
+        Deadline.startTimer("getForeignKeys");
+        foreignKeys = rawStore.getForeignKeys(catName, null, null, dbName, tblName);

Review comment:
Then would we need to store foreign key mappings against the parent db and table for quick access (otherwise we would be scanning all the dbs/tables in the cache)? That would also mean keeping two copies: one keyed by the parent table and another keyed by the foreign table.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 454283)
Time Spent: 1h 20m (was: 1h 10m)

> [CachedStore] Cache table constraints in CachedStore
>
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
> Issue Type: Sub-task
> Reporter: Daniel Dai
> Assignee: Adesh Kumar Rao
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive pulls all constraints
> from the tables involved in a query, which results in multiple db reads (including
> get_primary_keys, get_foreign_keys, get_unique_constraints, etc.). The effort
> to cache them is small, as they are just another table component.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
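The trade-off raised in the review comment above is between adding a second index keyed by the parent table and scanning every cached table. The sketch below illustrates one way to handle it and is hypothetical: `ForeignKeyCacheSketch` and its nested `ForeignKey` type are stand-ins, not Hive's `SharedCache` or `SQLForeignKey`. Both indexes hold references to the same constraint object. The "two copies" are therefore two lookup paths rather than two deep copies, but every add or invalidation must update both.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical foreign-key cache indexed by both the child (foreign) and the parent table. */
final class ForeignKeyCacheSketch {

  /** Minimal stand-in for a foreign-key constraint (not Hive's SQLForeignKey). */
  static final class ForeignKey {
    final String parentDb, parentTbl, childDb, childTbl, name;

    ForeignKey(String parentDb, String parentTbl, String childDb, String childTbl, String name) {
      this.parentDb = parentDb;
      this.parentTbl = parentTbl;
      this.childDb = childDb;
      this.childTbl = childTbl;
      this.name = name;
    }
  }

  private final Map<String, List<ForeignKey>> byChildTable = new HashMap<>();
  private final Map<String, List<ForeignKey>> byParentTable = new HashMap<>();

  /** Case-insensitive "db.table" key, mimicking identifier normalization. */
  private static String key(String db, String tbl) {
    return db.toLowerCase(Locale.ROOT) + "." + tbl.toLowerCase(Locale.ROOT);
  }

  /** Registers one constraint under both indexes; both hold the same object reference. */
  void add(ForeignKey fk) {
    byChildTable.computeIfAbsent(key(fk.childDb, fk.childTbl), k -> new ArrayList<>()).add(fk);
    byParentTable.computeIfAbsent(key(fk.parentDb, fk.parentTbl), k -> new ArrayList<>()).add(fk);
  }

  /** Foreign keys declared on the given (child) table: one hash lookup. */
  List<ForeignKey> declaredOn(String db, String tbl) {
    return Collections.unmodifiableList(byChildTable.getOrDefault(key(db, tbl), List.of()));
  }

  /** Foreign keys that reference the given (parent) table: one hash lookup, no full scan. */
  List<ForeignKey> referencing(String db, String tbl) {
    return Collections.unmodifiableList(byParentTable.getOrDefault(key(db, tbl), List.of()));
  }
}
```

The cost of this design is the invalidation rule. Dropping or altering either table has to remove the constraint from both maps.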
[jira] [Work logged] (HIVE-22634) Improperly SemanticException when filter is optimized to False on a partition table
[ https://issues.apache.org/jira/browse/HIVE-22634?focusedWorklogId=454276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454276 ]

ASF GitHub Bot logged work on HIVE-22634:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 05:20
Start Date: 03/Jul/20 05:20
Worklog Time Spent: 10m
Work Description: dengzhhu653 commented on pull request #865:
URL: https://github.com/apache/hive/pull/865#issuecomment-653355058

+1

Issue Time Tracking
---

Worklog Id: (was: 454276)
Time Spent: 20m (was: 10m)

> Improperly SemanticException when filter is optimized to False on a partition table
> ---
>
> Key: HIVE-22634
> URL: https://issues.apache.org/jira/browse/HIVE-22634
> Project: Hive
> Issue Type: Improvement
> Reporter: EdisonWang
> Assignee: EdisonWang
> Priority: Minor
> Labels: pull-request-available
> Attachments: HIVE-22634.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When a filter is optimized to False on a partitioned table, Hive improperly
> throws a SemanticException reporting that no partition predicate was found.
> The steps to reproduce are:
> {code:java}
> set hive.strict.checks.no.partition.filter=true;
> CREATE TABLE test(id int, name string) PARTITIONED BY (`date` string);
> select * from test where `date` = '20191201' and 1<>1;
> {code}
>
> The above SQL throws a "Queries against partitioned tables without a
> partition filter" exception.
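The sketch below shows one way to avoid the false positive. It treats a scan whose filter has been folded to FALSE as reading no partitions and skips the strict check for it. The class, fields and exception are hypothetical. They do not mirror Hive's semantic analyzer or the attached patch, and the exception message only paraphrases the one quoted in the issue.

```java
/**
 * Hypothetical model of the HIVE-22634 false positive: a strict "no partition filter"
 * check that runs after the WHERE clause has been folded to FALSE.
 */
final class StrictPartitionCheckSketch {

  /** What the optimizer knows about a scan's filter after constant folding. */
  static final class ScanFilter {
    final boolean foldedToFalse;          // e.g. `date` = '20191201' AND 1<>1
    final boolean hasPartitionPredicate;  // a predicate on a partition column survived

    ScanFilter(boolean foldedToFalse, boolean hasPartitionPredicate) {
      this.foldedToFalse = foldedToFalse;
      this.hasPartitionPredicate = hasPartitionPredicate;
    }
  }

  /** Throws only when a partitioned table would really be scanned without a partition filter. */
  static void check(boolean strictNoPartitionFilter, boolean partitionedTable, ScanFilter filter) {
    if (!strictNoPartitionFilter || !partitionedTable) {
      return;
    }
    if (filter.foldedToFalse) {
      // An always-false filter reads no partitions, so the strict check has nothing to protect.
      return;
    }
    if (!filter.hasPartitionPredicate) {
      throw new IllegalStateException(
          "Queries against partitioned tables without a partition filter are disabled");
    }
  }
}
```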
[jira] [Work logged] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
[ https://issues.apache.org/jira/browse/HIVE-23721?focusedWorklogId=454250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454250 ]

ASF GitHub Bot logged work on HIVE-23721:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 02:51
Start Date: 03/Jul/20 02:51
Worklog Time Spent: 10m
Work Description: butaozhang commented on pull request #1202:
URL: https://github.com/apache/hive/pull/1202#issuecomment-653309334

The failed tests do not seem to be related to this PR, and they run successfully in my local environment.

Issue Time Tracking
---

Worklog Id: (was: 454250)
Time Spent: 20m (was: 10m)

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0, 3.1.2
> Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timelineserver enabled, https enabled)
> Hive 3.1 (15 HS2 instances)
> 6+ YARN Applications every day
> Reporter: YulongZ
> Assignee: zhangbutao
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs have been added to the Hive metastore. Many metastore
> tables gained a "catName" column, and their indexes gained "catName" as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
>
> should use "catName == ''" instead of "dbName == ''", because "catName" is the
> first index column.
> When the metastore data becomes large (for example, when the
> MPartitionColumnStatistics table has millions of rows), the
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") query executes very
> slowly, and so does "show tables" in HiveServer2.
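The reasoning in the description relies on the leftmost-prefix rule of composite B-tree indexes. The toy model below uses a `TreeMap` as a stand-in for an index on (catName, dbName, tblName). A predicate on the leading column becomes a bounded range scan, while a predicate on dbName alone has to visit every entry. Real databases may have other access paths (for example, skip scans), so treat this as an illustration only. The class and method names are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy stand-in for a composite index ordered by (catName, dbName, tblName). */
final class LeadingColumnIndexSketch {
  private static final char SEP = '\u0000';          // separator that sorts before any name character
  private final NavigableMap<String, String> index = new TreeMap<>();
  int lastExamined;                                   // entries visited by the most recent lookup

  void put(String catName, String dbName, String tblName, String row) {
    index.put(catName + SEP + dbName + SEP + tblName, row);
  }

  /** catName == value: only the contiguous key range with that prefix is visited. */
  int countByCatName(String catName) {
    NavigableMap<String, String> range =
        index.subMap(catName + SEP, true, catName + '\u0001', false);
    lastExamined = range.size();
    return range.size();
  }

  /** dbName == value: the index order does not help, so every key is visited. */
  int countByDbName(String dbName) {
    int matches = 0;
    lastExamined = 0;
    for (String key : index.keySet()) {
      lastExamined++;
      int first = key.indexOf(SEP);
      int second = key.indexOf(SEP, first + 1);
      if (key.substring(first + 1, second).equals(dbName)) {
        matches++;
      }
    }
    return matches;
  }
}
```

With the `== ''` predicates used in ensureDbInit(), the leading-column version finds an empty range immediately. The dbName version still walks every entry before concluding there are no matches.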
[jira] [Commented] (HIVE-23797) Throw exception when no metastore found in zookeeper
[ https://issues.apache.org/jira/browse/HIVE-23797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150676#comment-17150676 ]

Zhihua Deng commented on HIVE-23797:

[~ashutosh.bapat] [~anishek] could you please review these changes? Thanks.

> Throw exception when no metastore found in zookeeper
> -
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
> Issue Type: Improvement
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When service discovery is enabled for the metastore, the client may find no
> metastore URIs available in ZooKeeper, for example while the metastores are
> starting up or when the client is configured with the wrong path. This results
> in redundant retries and, finally, a MetaException with an "Unknown exception" message.
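The sketch below shows the fail-fast behaviour the issue asks for. It assumes the client obtains the registered URIs from some discovery call. If the list is empty, the client raises an error that names the likely causes instead of retrying into a generic failure. `resolveUris`, the `Supplier`-based discovery hook and the use of `IllegalStateException` (rather than Hive's `MetaException`) are assumptions for illustration.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical client-side resolution of metastore URIs published in ZooKeeper. */
final class MetastoreDiscoverySketch {

  static List<String> resolveUris(Supplier<List<String>> discovery, String zkNamespace) {
    List<String> uris = discovery.get();
    if (uris == null || uris.isEmpty()) {
      // Either nothing is registered yet (metastores still starting) or the client points at
      // the wrong ZooKeeper path. In both cases, fail immediately with an actionable message.
      throw new IllegalStateException("No metastore URIs found in ZooKeeper under namespace '"
          + zkNamespace + "'; check that metastores are running and the namespace is configured correctly");
    }
    return uris;
  }
}
```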
[jira] [Commented] (HIVE-23764) Remove unnecessary getLastFlushLength when checking delete delta files
[ https://issues.apache.org/jira/browse/HIVE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150614#comment-17150614 ]

Rajesh Balamohan commented on HIVE-23764:
-

[~pvary]: We can get this fix committed and revise the other ticket later.

> Remove unnecessary getLastFlushLength when checking delete delta files
> --
>
> Key: HIVE-23764
> URL: https://issues.apache.org/jira/browse/HIVE-23764
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls
> OrcAcidUtils.getLastFlushLength for every delete delta file.
> The code comment itself says:
> {code}
> // NOTE: Calling last flush length below is more for future-proofing when we have
> // streaming deletes. But currently we don't support streaming deletes, and this can
> // be removed if this becomes a performance issue.
> {code}
> If we have a table with 5 updates (1 base + 5 delta + 5 delete_delta), then
> for every base and delta directory we check all of the delete_delta directories
> and call getLastFlushLength for each. This results in 6*5=30 unnecessary
> NN/S3 calls.
> We should remove the check, as the comment already proposes.
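The call count in the description comes from pairing every base and delta directory with every delete_delta directory. The snippet below is a worked check of that arithmetic. It models the counting argument only; it is not Hive code, and the method name is hypothetical.

```java
/** Counts the getLastFlushLength lookups implied by the description (a model, not Hive code). */
final class FlushLengthCallCount {

  /** Each base or delta directory is paired with every delete_delta directory. */
  static int extraCalls(int baseDirs, int deltaDirs, int deleteDeltaDirs) {
    return (baseDirs + deltaDirs) * deleteDeltaDirs;
  }

  public static void main(String[] args) {
    // 5 updates: 1 base + 5 delta + 5 delete_delta directories
    System.out.println(extraCalls(1, 5, 5)); // prints 30
  }
}
```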
[jira] [Updated] (HIVE-23797) Throw exception when no metastore found in zookeeper
[ https://issues.apache.org/jira/browse/HIVE-23797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng updated HIVE-23797:
---
Summary: Throw exception when no metastore found in zookeeper (was: Throwing exception when no metastore found in zookeeper)

> Throw exception when no metastore found in zookeeper
> -
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
> Issue Type: Improvement
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When service discovery is enabled for the metastore, the client may find no
> metastore URIs available in ZooKeeper, for example while the metastores are
> starting up or when the client is configured with the wrong path. This results
> in redundant retries and, finally, a MetaException with an "Unknown exception" message.
[jira] [Updated] (HIVE-23797) Throwing exception when no metastore found in zookeeper
[ https://issues.apache.org/jira/browse/HIVE-23797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng updated HIVE-23797:
---
Summary: Throwing exception when no metastore found in zookeeper (was: Throwing exception when no metastore spec found in zookeeper)

> Throwing exception when no metastore found in zookeeper
>
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
> Issue Type: Improvement
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When service discovery is enabled for the metastore, the client may find no
> metastore URIs available in ZooKeeper, for example while the metastores are
> starting up or when the client is configured with the wrong path. This results
> in redundant retries and, finally, a MetaException with an "Unknown exception" message.
[jira] [Work logged] (HIVE-23665) Rewrite last_value to first_value to enable streaming results
[ https://issues.apache.org/jira/browse/HIVE-23665?focusedWorklogId=454094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454094 ]

ASF GitHub Bot logged work on HIVE-23665:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 17:16
Start Date: 02/Jul/20 17:16
Worklog Time Spent: 10m
Work Description: jcamachor commented on a change in pull request #1177:
URL: https://github.com/apache/hive/pull/1177#discussion_r449158491

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -2439,6 +2440,9 @@ private RelNode applyPostJoinOrderingTransform(RelNode basePlan, RelMetadataProv
             HiveWindowingFixRule.INSTANCE);
       }
+      generatePartialProgram(program, false, HepMatchOrder.DEPTH_FIRST,

Review comment:
Can you incorporate the rule into the block above?
```
if (profilesCBO.contains(ExtendedCBOProfile.WINDOWING_POSTPROCESSING)) {
  generatePartialProgram(program, false, HepMatchOrder.DEPTH_FIRST,
      HiveWindowingLastValueRewrite.INSTANCE,
      HiveWindowingFixRule.INSTANCE);
}
```

## File path: ql/src/test/results/clientpositive/llap/vector_ptf_part_simple.q.out
##
@@ -314,46 +386,46 @@ POSTHOOK: type: QUERY
 POSTHOOK: Input: default@vector_ptf_part_simple_orc
 A masked pattern was here
 p_mfgr p_name p_retailprice rn r dr fv lv c cs
-Manufacturer#2 almond aquamarine rose maroon antique 900.66 1 1 1 900.66 2031.98 8 8
-Manufacturer#2 almond aquamarine rose maroon antique 1698.66 2 1 1 900.66 2031.98 8 8
-Manufacturer#2 almond antique violet turquoise frosted 1800.7 3 1 1 900.66 2031.98 8 8
-Manufacturer#2 almond antique violet chocolate turquoise 1690.68 4 1 1 900.66 2031.98 8 8
-Manufacturer#2 almond antique violet turquoise frosted 1800.7 5 1 1 900.66 2031.98 8 8
-Manufacturer#2 almond antique violet turquoise frosted 1800.7 6 1 1 900.66 2031.98 8 8
-Manufacturer#2 almond aquamarine sandy cyan gainsboro 1000.6 7 1 1 900.66 2031.98 8 8
-Manufacturer#2 almond aquamarine midnight light salmon 2031.98 8 1 1 900.66 2031.98 8 8
 Manufacturer#3 almond antique forest lavender goldenrod 1190.27 1 1 1 1190.27 1190.27 7 8
-Manufacturer#3 almond antique chartreuse khaki white 99.68 2 1 1 1190.27 1190.27 7 8
-Manufacturer#3 almond antique forest lavender goldenrod NULL 3 1 1 1190.27 1190.27 7 8
-Manufacturer#3 almond antique metallic orange dim 55.39 4 1 1 1190.27 1190.27 7 8
-Manufacturer#3 almond antique misty red olive 1922.98 5 1 1 1190.27 1190.27 7 8
-Manufacturer#3 almond antique forest lavender goldenrod 590.27 6 1 1 1190.27 1190.27 7 8
-Manufacturer#3 almond antique olive coral navajo 1337.29 7 1 1 1190.27 1190.27 7 8
 Manufacturer#3 almond antique forest lavender goldenrod 1190.27 8 1 1 1190.27 1190.27 7 8
-Manufacturer#4 almond antique gainsboro frosted violet NULL 1 1 1 NULL 1290.35 4 6
-Manufacturer#4 almond aquamarine floral ivory bisque NULL 2 1 1 NULL 1290.35 4 6
-Manufacturer#4 almond antique violet mint lemon 1375.42 3 1 1 NULL 1290.35 4 6
-Manufacturer#4 almond aquamarine yellow dodger mint 1844.92 4 1 1 NULL 1290.35 4 6
-Manufacturer#4 almond aquamarine floral ivory bisque 1206.26 5 1 1 NULL 1290.35 4 6
-Manufacturer#4 almond azure aquamarine papaya violet 1290.35 6 1 1 NULL 1290.35 4 6
+Manufacturer#3 almond antique olive coral navajo 1337.29 7 1 1 1190.27 1190.27 7 8
+Manufacturer#3 almond antique forest lavender goldenrod 590.27 6 1 1 1190.27 1190.27 7 8
+Manufacturer#3 almond antique misty red olive 1922.98 5 1 1 1190.27 1190.27 7 8
+Manufacturer#3 almond antique metallic orange dim 55.39 4 1 1 1190.27 1190.27 7 8
+Manufacturer#3 almond antique forest lavender goldenrod NULL 3 1 1 1190.27 1190.27 7 8
+Manufacturer#3 almond antique chartreuse khaki white 99.68 2 1 1 1190.27 1190.27 7 8
+Manufacturer#1 almond aquamarine pink moccasin thistle 1632.66 1 1 1 1632.66 1632.66 11 12
+Manufacturer#1 almond antique chartreuse lavender yellow 1
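The sketch below shows the identity that a rewrite of this kind can rely on, under stated assumptions. With a frame covering the whole partition and no ties in the ordering key, LAST_VALUE over an ascending order equals FIRST_VALUE over the reversed order. FIRST_VALUE can be produced as soon as the first row of the partition arrives instead of after buffering the partition. The helper names are hypothetical, and this is not the HiveWindowingLastValueRewrite rule itself. With ties, the two sides may pick different rows, because `List.sort` keeps tied rows in input order in both directions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helpers showing LAST_VALUE(asc) == FIRST_VALUE(desc) for a whole-partition frame. */
final class LastToFirstValueSketch {

  /** LAST_VALUE(value) OVER (ORDER BY order) with a frame covering the whole partition. */
  static <R, V> V lastValue(List<R> partition, Comparator<? super R> order,
      Function<? super R, ? extends V> value) {
    List<R> sorted = new ArrayList<>(partition);
    sorted.sort(order);                       // stable sort: tied rows keep input order
    return value.apply(sorted.get(sorted.size() - 1));
  }

  /** FIRST_VALUE(value) OVER (ORDER BY order) with the same frame. */
  static <R, V> V firstValue(List<R> partition, Comparator<? super R> order,
      Function<? super R, ? extends V> value) {
    // Sorting keeps the sketch symmetric. A streaming operator only needs the first row it
    // receives in ORDER BY order, so it can emit the result without buffering the partition.
    List<R> sorted = new ArrayList<>(partition);
    sorted.sort(order);
    return value.apply(sorted.get(0));
  }
}
```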
[jira] [Work logged] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache
[ https://issues.apache.org/jira/browse/HIVE-23768?focusedWorklogId=454089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454089 ]

ASF GitHub Bot logged work on HIVE-23768:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 17:07
Start Date: 02/Jul/20 17:07
Worklog Time Spent: 10m
Work Description: jcamachor merged pull request #1186:
URL: https://github.com/apache/hive/pull/1186

Issue Time Tracking
---

Worklog Id: (was: 454089)
Remaining Estimate: 0h
Time Spent: 10m

> Metastore's update service wrongly strips partition column stats from the cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
> Issue Type: Bug
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Critical
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Metastore's update service wrongly strips partition column stats from the
> cache in an attempt to update them. The issue may go unnoticed since missing
> stats do not lead to query failures.
> However, they can significantly alter the query plan, affecting performance.
> Moreover, they lead to flakiness: sometimes the stats are present and
> sometimes they are not, so the same query can get a different plan over time.
> Normally, missing elements from the cache shouldn't be a correctness problem
> since we can always fall back to the raw stats. Unfortunately, there are many
> interconnections with other parts of the code (e.g., code to obtain aggregate
> statistics) where this contract breaks.
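The sketch below shows one way to keep a cache refresh from stripping entries it is meant to update, under stated assumptions. It assumes a concurrent map from column name to a single statistic. Fresh values are loaded first and then written over the old ones, so a concurrent reader sees either the old or the new value for a column and never a gap. Names and value types are hypothetical. Removal of dropped partitions or columns, which would need explicit handling, is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stats cache whose refresh overwrites entries instead of removing them first. */
final class StatsCacheRefreshSketch {
  private final Map<String, Long> statByColumn = new ConcurrentHashMap<>();

  Long get(String column) {
    return statByColumn.get(column);
  }

  /**
   * Applies freshly loaded values. Each put replaces its mapping atomically, so readers never
   * observe a missing entry for a column that has stats both before and after the refresh.
   * Columns absent from {@code fresh} are kept; dropping them would need an explicit removal.
   */
  void refresh(Map<String, Long> fresh) {
    statByColumn.putAll(fresh);
  }
}
```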
[jira] [Commented] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache
[ https://issues.apache.org/jira/browse/HIVE-23768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150457#comment-17150457 ]

Jesus Camacho Rodriguez commented on HIVE-23768:

Pushed to master, thanks [~zabetak]!

> Metastore's update service wrongly strips partition column stats from the cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
> Issue Type: Bug
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Metastore's update service wrongly strips partition column stats from the
> cache in an attempt to update them. The issue may go unnoticed since missing
> stats do not lead to query failures.
> However, they can significantly alter the query plan, affecting performance.
> Moreover, they lead to flakiness: sometimes the stats are present and
> sometimes they are not, so the same query can get a different plan over time.
> Normally, missing elements from the cache shouldn't be a correctness problem
> since we can always fall back to the raw stats. Unfortunately, there are many
> interconnections with other parts of the code (e.g., code to obtain aggregate
> statistics) where this contract breaks.
[jira] [Updated] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache
[ https://issues.apache.org/jira/browse/HIVE-23768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-23768:
--
Labels: pull-request-available (was: )

> Metastore's update service wrongly strips partition column stats from the cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
> Issue Type: Bug
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Metastore's update service wrongly strips partition column stats from the
> cache in an attempt to update them. The issue may go unnoticed since missing
> stats do not lead to query failures.
> However, they can significantly alter the query plan, affecting performance.
> Moreover, they lead to flakiness: sometimes the stats are present and
> sometimes they are not, so the same query can get a different plan over time.
> Normally, missing elements from the cache shouldn't be a correctness problem
> since we can always fall back to the raw stats. Unfortunately, there are many
> interconnections with other parts of the code (e.g., code to obtain aggregate
> statistics) where this contract breaks.
[jira] [Resolved] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache
[ https://issues.apache.org/jira/browse/HIVE-23768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez resolved HIVE-23768.
Fix Version/s: 4.0.0
Resolution: Fixed

> Metastore's update service wrongly strips partition column stats from the cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
> Issue Type: Bug
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Metastore's update service wrongly strips partition column stats from the
> cache in an attempt to update them. The issue may go unnoticed since missing
> stats do not lead to query failures.
> However, they can significantly alter the query plan, affecting performance.
> Moreover, they lead to flakiness: sometimes the stats are present and
> sometimes they are not, so the same query can get a different plan over time.
> Normally, missing elements from the cache shouldn't be a correctness problem
> since we can always fall back to the raw stats. Unfortunately, there are many
> interconnections with other parts of the code (e.g., code to obtain aggregate
> statistics) where this contract breaks.
[jira] [Commented] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache
[ https://issues.apache.org/jira/browse/HIVE-23768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150454#comment-17150454 ]

Jesus Camacho Rodriguez commented on HIVE-23768:

+1

> Metastore's update service wrongly strips partition column stats from the cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
> Issue Type: Bug
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Critical
>
> Metastore's update service wrongly strips partition column stats from the
> cache in an attempt to update them. The issue may go unnoticed since missing
> stats do not lead to query failures.
> However, they can significantly alter the query plan, affecting performance.
> Moreover, they lead to flakiness: sometimes the stats are present and
> sometimes they are not, so the same query can get a different plan over time.
> Normally, missing elements from the cache shouldn't be a correctness problem
> since we can always fall back to the raw stats. Unfortunately, there are many
> interconnections with other parts of the code (e.g., code to obtain aggregate
> statistics) where this contract breaks.
[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore
[ https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=453958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453958 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 13:41
Start Date: 02/Jul/20 13:41
Worklog Time Spent: 10m
Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r447415363

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2610,87 @@ long getPartsFound() {
   @Override public List getPrimaryKeys(String catName, String dbName, String tblName) throws MetaException {
-    // TODO constraintCache
-    return rawStore.getPrimaryKeys(catName, dbName, tblName);
+    catName = normalizeIdentifier(catName);
+    dbName = StringUtils.normalizeIdentifier(dbName);
+    tblName = StringUtils.normalizeIdentifier(tblName);
+    if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) {
+      return rawStore.getPrimaryKeys(catName, dbName, tblName);
+    }
+
+    Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+    if (tbl == null) {
+      // The table containing the primary keys is not yet loaded in cache
+      return rawStore.getPrimaryKeys(catName, dbName, tblName);
+    }
+    List keys = sharedCache.listCachedPrimaryKeys(catName, dbName, tblName);
+
+    return keys;
   }
   @Override public List getForeignKeys(String catName, String parentDbName, String parentTblName, String foreignDbName, String foreignTblName) throws MetaException {
-    // TODO constraintCache
-    return rawStore.getForeignKeys(catName, parentDbName, parentTblName, foreignDbName, foreignTblName);
+    // Get correct ForeignDBName and TableName
+    if (foreignDbName == null || foreignTblName == null) {
+      return rawStore.getForeignKeys(catName, parentDbName, parentTblName, foreignDbName, foreignTblName);

Review comment:
This flow is a candidate for improvement: it fetches all foreign keys of a given parent table (and vice versa), which is a frequent operation. Please create a follow-up JIRA to use CachedStore for this case too.

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2610,87 @@ long getPartsFound() {
   @Override public List getPrimaryKeys(String catName, String dbName, String tblName) throws MetaException {
-    // TODO constraintCache
-    return rawStore.getPrimaryKeys(catName, dbName, tblName);
+    catName = normalizeIdentifier(catName);
+    dbName = StringUtils.normalizeIdentifier(dbName);
+    tblName = StringUtils.normalizeIdentifier(tblName);
+    if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && rawStore.isActiveTransaction())) {
+      return rawStore.getPrimaryKeys(catName, dbName, tblName);
+    }
+
+    Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+    if (tbl == null) {
+      // The table containing the primary keys is not yet loaded in cache
+      return rawStore.getPrimaryKeys(catName, dbName, tblName);
+    }
+    List keys = sharedCache.listCachedPrimaryKeys(catName, dbName, tblName);
+
+    return keys;
   }
   @Override public List getForeignKeys(String catName, String parentDbName, String parentTblName, String foreignDbName, String foreignTblName) throws MetaException {
-    // TODO constraintCache
-    return rawStore.getForeignKeys(catName, parentDbName, parentTblName, foreignDbName, foreignTblName);
+    // Get correct ForeignDBName and TableName
+    if (foreignDbName == null || foreignTblName == null) {

Review comment:
We should take the same path if parentDbName or parentTblName is null.
## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -867,6 +909,77 @@ private void updateTableColStats(RawStore rawStore, String catName, String dbNam
       }
     }
+    private void updateTableForeignKeys(RawStore rawStore, String catName, String dbName, String tblName) {
+      LOG.debug("CachedStore: updating cached foreign keys objects for catalog: {}, database: {}, table: {}", catName,
+          dbName, tblName);
+      try {
+        Deadline.startTimer("getForeignKeys");
+        List fks = rawStore.getForeignKeys(catName, null, null, dbName, tblName);
+        Deadline.stopTimer();
+        sharedCache.refreshForeignKeysInCache(StringUtils.normalizeIdentifier(catName),
+            StringUtils.normalizeIdentifier(dbName), StringUtils.normalizeIdentifier(tblName), fks);
+        LOG.debug("CachedStore: updated cached foreign keys objects for catalo
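The `getPrimaryKeys` change under review reads constraints through a cache with a fallback. It normalizes the identifiers, bypasses the cache when the table is not cacheable or a transaction is active, and falls back to the raw store when the table is not loaded yet. The self-contained sketch below reproduces that control flow with hypothetical stand-ins for `RawStore` and `SharedCache`. Primary keys are reduced to lists of column names, and normalization is assumed to be trim plus lower-case.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical cache-or-fallback read of primary keys, mirroring the reviewed control flow. */
final class CachedConstraintReadSketch {
  // Presence of a key means "table loaded in cache"; the value may be an empty list.
  private final Map<String, List<String>> primaryKeysByTable = new HashMap<>();

  private static String normalize(String identifier) {
    return identifier.trim().toLowerCase(Locale.ROOT);
  }

  private static String tableKey(String cat, String db, String tbl) {
    return normalize(cat) + "." + normalize(db) + "." + normalize(tbl);
  }

  void cacheTable(String cat, String db, String tbl, List<String> pkColumns) {
    primaryKeysByTable.put(tableKey(cat, db, tbl), List.copyOf(pkColumns));
  }

  List<String> getPrimaryKeys(String cat, String db, String tbl, boolean shouldCacheTable,
      boolean bypassForTransaction, Supplier<List<String>> rawStore) {
    if (!shouldCacheTable || bypassForTransaction) {
      return rawStore.get();                  // cache not applicable: read the backing store
    }
    List<String> cached = primaryKeysByTable.get(tableKey(cat, db, tbl));
    if (cached == null) {
      return rawStore.get();                  // table not yet loaded into the cache
    }
    return cached;                            // served from the cache (possibly empty)
  }
}
```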
[jira] [Work logged] (HIVE-22674) Replace Base64 in serde Package
[ https://issues.apache.org/jira/browse/HIVE-22674?focusedWorklogId=453934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453934 ]

ASF GitHub Bot logged work on HIVE-22674:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 12:46
Start Date: 02/Jul/20 12:46
Worklog Time Spent: 10m
Work Description: belugabehr opened a new pull request #1203:
URL: https://github.com/apache/hive/pull/1203

Issue Time Tracking
---

Worklog Id: (was: 453934)
Remaining Estimate: 0h
Time Spent: 10m

> Replace Base64 in serde Package
> ---
>
> Key: HIVE-22674
> URL: https://issues.apache.org/jira/browse/HIVE-22674
> Project: Hive
> Issue Type: Sub-task
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Attachments: HIVE-22674.1.patch, HIVE-22674.2.patch,
> HIVE-22674.2.patch, HIVE-22674.2.patch, HIVE-22674.2.patch, HIVE-22674.2.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Updated] (HIVE-22674) Replace Base64 in serde Package
[ https://issues.apache.org/jira/browse/HIVE-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-22674:
--
Labels: pull-request-available (was: )

> Replace Base64 in serde Package
> ---
>
> Key: HIVE-22674
> URL: https://issues.apache.org/jira/browse/HIVE-22674
> Project: Hive
> Issue Type: Sub-task
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Labels: pull-request-available
> Attachments: HIVE-22674.1.patch, HIVE-22674.2.patch,
> HIVE-22674.2.patch, HIVE-22674.2.patch, HIVE-22674.2.patch, HIVE-22674.2.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
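These sub-tasks replace one Base64 implementation with another. Assuming the target is the JDK's built-in `java.util.Base64` (available since Java 8), typical call sites would look like the sketch below. If the code being replaced is Commons Codec's `Base64`, note one behavioural difference during the migration. That decoder silently ignores characters outside the Base64 alphabet, while `Base64.getDecoder()` throws `IllegalArgumentException` on them (`Base64.getMimeDecoder()` is the lenient variant). The wrapper class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper showing encode/decode with the JDK's java.util.Base64. */
final class JdkBase64Sketch {

  static String encode(byte[] data) {
    return Base64.getEncoder().encodeToString(data);   // standard alphabet, with padding
  }

  static byte[] decode(String encoded) {
    return Base64.getDecoder().decode(encoded);        // strict: rejects non-alphabet characters
  }

  public static void main(String[] args) {
    String encoded = encode("hive".getBytes(StandardCharsets.UTF_8));
    System.out.println(encoded);                       // prints aGl2ZQ==
    System.out.println(new String(decode(encoded), StandardCharsets.UTF_8)); // prints hive
  }
}
```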
[jira] [Updated] (HIVE-23797) Throwing exception when no metastore spec found in zookeeper
[ https://issues.apache.org/jira/browse/HIVE-23797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihua Deng updated HIVE-23797:
---
Issue Type: Improvement (was: Bug)

> Throwing exception when no metastore spec found in zookeeper
>
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
> Issue Type: Improvement
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When service discovery is enabled for the metastore, the client may find no
> metastore URIs available in ZooKeeper, for example while the metastores are
> starting up or when the client is configured with the wrong path. This results
> in redundant retries and, finally, a MetaException with an "Unknown exception" message.
[jira] [Updated] (HIVE-22676) Replace Base64 in hive-service Package
[ https://issues.apache.org/jira/browse/HIVE-22676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HIVE-22676:
--
Fix Version/s: 4.0.0
Resolution: Fixed
Status: Resolved (was: Patch Available)

Pushed to master. Thanks [~pvary] and [~ngangam] (privately) for the review!!

> Replace Base64 in hive-service Package
> --
>
> Key: HIVE-22676
> URL: https://issues.apache.org/jira/browse/HIVE-22676
> Project: Hive
> Issue Type: Sub-task
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22676.1.patch, HIVE-22676.2.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-22676) Replace Base64 in hive-service Package
[ https://issues.apache.org/jira/browse/HIVE-22676?focusedWorklogId=453921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453921 ]

ASF GitHub Bot logged work on HIVE-22676:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 12:32
Start Date: 02/Jul/20 12:32
Worklog Time Spent: 10m
Work Description: belugabehr merged pull request #1090:
URL: https://github.com/apache/hive/pull/1090

Issue Time Tracking
---

Worklog Id: (was: 453921)
Time Spent: 40m (was: 0.5h)

> Replace Base64 in hive-service Package
> --
>
> Key: HIVE-22676
> URL: https://issues.apache.org/jira/browse/HIVE-22676
> Project: Hive
> Issue Type: Sub-task
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Minor
> Labels: pull-request-available
> Attachments: HIVE-22676.1.patch, HIVE-22676.2.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cleanup
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=453836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453836 ]

ASF GitHub Bot logged work on HIVE-23727:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 02/Jul/20 08:25
    Start Date: 02/Jul/20 08:25
    Worklog Time Spent: 10m
    Work Description: dengzhhu653 opened a new pull request #1149:
    URL: https://github.com/apache/hive/pull/1149

   ## NOTICE

   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute

Issue Time Tracking
-------------------

    Worklog Id: (was: 453836)
    Time Spent: 1h 20m (was: 1h 10m)

> Improve SQLOperation log handling when cleanup
> ----------------------------------------------
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
> Issue Type: Improvement
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> SQLOperation checks _if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT)_ before cancelling the background task. When that condition is true, the state cannot be OperationState.CANCELED, so the logging guarded by state == OperationState.CANCELED can never happen.
>
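The reasoning in the HIVE-23727 description can be checked with a small model. This is a minimal sketch, not Hive's SQLOperation code: the OperationState enum below is a hypothetical stand-in containing only a few states, shouldRunAsync() is modeled as a boolean parameter, and cleanup() and its log messages are invented for illustration. Because the outer guard already excludes CANCELED, the inner branch that tests for CANCELED can never run.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupGuardSketch {
    // Hypothetical stand-in for Hive's OperationState; only a few constants are modeled.
    enum OperationState { RUNNING, FINISHED, CANCELED, TIMEDOUT, CLOSED }

    // Hypothetical model of the guard quoted in HIVE-23727.
    // Returns the messages that would be logged instead of writing to a real logger.
    static List<String> cleanup(boolean shouldRunAsync, OperationState state) {
        List<String> logged = new ArrayList<>();
        if (shouldRunAsync && state != OperationState.CANCELED && state != OperationState.TIMEDOUT) {
            // Inside this block the state is neither CANCELED nor TIMEDOUT,
            // so the first branch below is unreachable.
            if (state == OperationState.CANCELED) {
                logged.add("cancelled background task");
            } else {
                logged.add("cancelling background task, state=" + state);
            }
        }
        return logged;
    }

    public static void main(String[] args) {
        for (OperationState s : OperationState.values()) {
            System.out.println(s + " -> " + cleanup(true, s));
        }
    }
}
```

Running main prints an empty list for CANCELED and TIMEDOUT and only the "cancelling" message for the other states; the "cancelled" message never appears, which is the dead logging the description points out.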
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cleanup
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=453835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453835 ]

ASF GitHub Bot logged work on HIVE-23727:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 02/Jul/20 08:25
    Start Date: 02/Jul/20 08:25
    Worklog Time Spent: 10m
    Work Description: dengzhhu653 closed pull request #1149:
    URL: https://github.com/apache/hive/pull/1149

Issue Time Tracking
-------------------

    Worklog Id: (was: 453835)
    Time Spent: 1h 10m (was: 1h)

> Improve SQLOperation log handling when cleanup
> ----------------------------------------------
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
> Issue Type: Improvement
> Reporter: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> SQLOperation checks _if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT)_ before cancelling the background task. When that condition is true, the state cannot be OperationState.CANCELED, so the logging guarded by state == OperationState.CANCELED can never happen.
>
[jira] [Updated] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
[ https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangbutao updated HIVE-23721:
------------------------------
    Affects Version/s: 4.0.0

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> -----------------------------------------------------------
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0, 3.1.2
> Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timelineserver enabled, https enabled)
> Hive 3.1 (15 HS2 instances)
> 6+ YARN Applications every day
> Reporter: YulongZ
> Assignee: zhangbutao
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs have been added to the metastore: many metastore tables gained a "catName" column, and the indexes on those tables now include "catName".
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
> When the metastore holds a lot of data (for example, millions of rows in the MPartitionColumnStatistics table), the query newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly on the metastore, and "show tables" on HiveServer2 runs very slowly too.
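The "catName is the first index column" argument relies on the leftmost-prefix rule for composite B-tree indexes. The sketch below is a simplified model under stated assumptions: the index is assumed to be ordered (CAT_NAME, DB_NAME, TABLE_NAME, COLUMN_NAME), but the issue only states that CAT_NAME comes first, so the linked hive-schema-4.0.0.mysql.sql remains the authority on the real definition. usablePrefixLength is a hypothetical helper that counts how many leading index columns are pinned by equality predicates. A predicate on dbName alone pins no leading column, so the index cannot narrow the scan, while a predicate on catName pins the first one.

```java
import java.util.List;
import java.util.Set;

public class LeftmostPrefixSketch {
    // Assumed column order of the composite index; the issue only states that CAT_NAME is first.
    static final List<String> INDEX_COLUMNS = List.of("CAT_NAME", "DB_NAME", "TABLE_NAME", "COLUMN_NAME");

    // Counts the leading index columns fixed by equality predicates.
    // Zero means the index cannot narrow the search, so a full scan is likely.
    static int usablePrefixLength(List<String> indexColumns, Set<String> equalityColumns) {
        int n = 0;
        for (String col : indexColumns) {
            if (!equalityColumns.contains(col)) {
                break; // the first unconstrained column ends the usable prefix
            }
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Filter "dbName == ''": DB_NAME is not the leading column, so no usable prefix.
        System.out.println(usablePrefixLength(INDEX_COLUMNS, Set.of("DB_NAME")));
        // Filter "catName == ''": CAT_NAME is the leading column, so the index applies.
        System.out.println(usablePrefixLength(INDEX_COLUMNS, Set.of("CAT_NAME")));
    }
}
```

main prints 0 for the dbName filter and 1 for the catName filter. Real optimizers add refinements (MySQL 8.0, for example, can sometimes use a skip scan on a non-leading column), so this illustrates the rule rather than predicting a particular query plan.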
[jira] [Work logged] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
[ https://issues.apache.org/jira/browse/HIVE-23721?focusedWorklogId=453816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453816 ]

ASF GitHub Bot logged work on HIVE-23721:
-----------------------------------------

    Author: ASF GitHub Bot
    Created on: 02/Jul/20 07:47
    Start Date: 02/Jul/20 07:47
    Worklog Time Spent: 10m
    Work Description: butaozhang opened a new pull request #1202:
    URL: https://github.com/apache/hive/pull/1202

   ## NOTICE

   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute

Issue Time Tracking
-------------------

    Worklog Id: (was: 453816)
    Remaining Estimate: 0h
    Time Spent: 10m

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> -----------------------------------------------------------
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.2
> Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timelineserver enabled, https enabled)
> Hive 3.1 (15 HS2 instances)
> 6+ YARN Applications every day
> Reporter: YulongZ
> Assignee: zhangbutao
> Priority: Critical
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs have been added to the metastore: many metastore tables gained a "catName" column, and the indexes on those tables now include "catName".
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
> When the metastore holds a lot of data (for example, millions of rows in the MPartitionColumnStatistics table), the query newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly on the metastore, and "show tables" on HiveServer2 runs very slowly too.
[jira] [Updated] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
[ https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-23721:
----------------------------------
    Labels: pull-request-available (was: )

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> -----------------------------------------------------------
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.2
> Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timelineserver enabled, https enabled)
> Hive 3.1 (15 HS2 instances)
> 6+ YARN Applications every day
> Reporter: YulongZ
> Assignee: zhangbutao
> Priority: Critical
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs have been added to the metastore: many metastore tables gained a "catName" column, and the indexes on those tables now include "catName".
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
> When the metastore holds a lot of data (for example, millions of rows in the MPartitionColumnStatistics table), the query newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly on the metastore, and "show tables" on HiveServer2 runs very slowly too.
[jira] [Comment Edited] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
[ https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149997#comment-17149997 ]

zhangbutao edited comment on HIVE-23721 at 7/2/20, 7:12 AM:
-----------------------------------------------------------

When hive.in.test=true is set, MetaStoreDirectSql.ensureDbInit() checks the metadata before each SQL request:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L212]

However, the two queries do not use the index correctly:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L280]
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L281]

This is because the tables TAB_COL_STATS and PART_COL_STATS have a composite index:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L742]
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L774]

According to the leftmost-prefix rule for composite indexes, we should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.

was (Author: zhangbutao):
We set hive.in.test=true; MetaStoreDirectSql.ensureDbInit will check the metadata before each SQL request:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L212]

However, the two queries do not use the index correctly:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L280]
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L281]

This is because the tables TAB_COL_STATS and PART_COL_STATS have a composite index:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L742]
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L774]

According to the leftmost-prefix rule for composite indexes, we should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> -----------------------------------------------------------
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.2
> Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timelineserver enabled, https enabled)
> Hive 3.1 (15 HS2 instances)
> 6+ YARN Applications every day
> Reporter: YulongZ
> Assignee: zhangbutao
> Priority: Critical
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
>
> Since Hive 3.0, catalogs have been added to the metastore: many metastore tables gained a "catName" column, and the indexes on those tables now include "catName".
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
> When the metastore holds a lot of data (for example, millions of rows in the MPartitionColumnStatistics table), the query newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly on the metastore, and "show tables" on HiveServer2 runs very slowly too.
[jira] [Commented] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
[ https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149997#comment-17149997 ]

zhangbutao commented on HIVE-23721:
-----------------------------------

We set hive.in.test=true; MetaStoreDirectSql.ensureDbInit will check the metadata before each SQL request:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L212]

However, the two queries do not use the index correctly:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L280]
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L281]

This is because the tables TAB_COL_STATS and PART_COL_STATS have a composite index:
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L742]
[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L774]

According to the leftmost-prefix rule for composite indexes, we should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> -----------------------------------------------------------
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.2
> Environment: Hadoop 3.1 (1700+ nodes)
> YARN 3.1 (with timelineserver enabled, https enabled)
> Hive 3.1 (15 HS2 instances)
> 6+ YARN Applications every day
> Reporter: YulongZ
> Assignee: zhangbutao
> Priority: Critical
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
>
> Since Hive 3.0, catalogs have been added to the metastore: many metastore tables gained a "catName" column, and the indexes on those tables now include "catName".
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the first index column.
> When the metastore holds a lot of data (for example, millions of rows in the MPartitionColumnStatistics table), the query newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly on the metastore, and "show tables" on HiveServer2 runs very slowly too.