[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=454283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454283
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 05:47
Start Date: 03/Jul/20 05:47
Worklog Time Spent: 10m 
  Work Description: adesh-rao commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r449387162



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -543,10 +557,30 @@ static void prewarm(RawStore rawStore) {
 tableColStats = rawStore.getTableColumnStatistics(catName, 
dbName, tblName, colNames, CacheUtils.HIVE_ENGINE);
 Deadline.stopTimer();
   }
+  Deadline.startTimer("getPrimaryKeys");
+  primaryKeys = rawStore.getPrimaryKeys(catName, dbName, tblName);
+  Deadline.stopTimer();
+  cacheObjects.setPrimaryKeys(primaryKeys);
+
+  Deadline.startTimer("getForeignKeys");
+  foreignKeys = rawStore.getForeignKeys(catName, null, null, 
dbName, tblName);

Review comment:
   Then would we need to store foreign key mappings against the parent db 
and table for quick access (otherwise we would be scanning all the dbs/tables 
in the cache)? 
   
   And this also means we will be keeping two copies, one with the parent table 
and another with the foreign table.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 454283)
Time Spent: 1h 20m  (was: 1h 10m)

> [CachedStore] Cache table constraints in CachedStore
> 
>
> Key: HIVE-22015
> URL: https://issues.apache.org/jira/browse/HIVE-22015
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Daniel Dai
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently table constraints are not cached. Hive pulls all constraints 
> from the tables involved in a query, which results in multiple db reads 
> (including get_primary_keys, get_foreign_keys, get_unique_constraints, etc). 
> The effort to cache this is small as it's just another table component.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22634) Improperly SemanticException when filter is optimized to False on a partition table

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22634?focusedWorklogId=454276=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454276
 ]

ASF GitHub Bot logged work on HIVE-22634:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 05:20
Start Date: 03/Jul/20 05:20
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #865:
URL: https://github.com/apache/hive/pull/865#issuecomment-653355058


   +1





Issue Time Tracking
---

Worklog Id: (was: 454276)
Time Spent: 20m  (was: 10m)

> Improperly SemanticException when filter is optimized to False on a partition 
> table
> ---
>
> Key: HIVE-22634
> URL: https://issues.apache.org/jira/browse/HIVE-22634
> Project: Hive
>  Issue Type: Improvement
>Reporter: EdisonWang
>Assignee: EdisonWang
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-22634.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a filter is optimized to False on a partitioned table, Hive improperly 
> throws a SemanticException reporting that no partition predicate was 
> found.
> Steps to reproduce:
> {code:java}
> set hive.strict.checks.no.partition.filter=true;
> CREATE TABLE test(id int, name string)PARTITIONED BY (`date` string);
> select * from test where `date` = '20191201' and 1<>1;
> {code}
>  
> The SQL above throws the "Queries against partitioned tables without a 
> partition filter" exception.





[jira] [Work logged] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23721?focusedWorklogId=454250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454250
 ]

ASF GitHub Bot logged work on HIVE-23721:
-

Author: ASF GitHub Bot
Created on: 03/Jul/20 02:51
Start Date: 03/Jul/20 02:51
Worklog Time Spent: 10m 
  Work Description: butaozhang commented on pull request #1202:
URL: https://github.com/apache/hive/pull/1202#issuecomment-653309334


   The failed tests do not seem to be related to this PR, and they run 
successfully in my local environment.





Issue Time Tracking
---

Worklog Id: (was: 454250)
Time Spent: 20m  (was: 10m)

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.1.2
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: zhangbutao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs have been added to the Hive metastore: many 
> metastore schema tables gained a "catName" column, and the table indexes 
> gained "catName" as a column too.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
> "
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> "
> should use "catName == ''" instead of "dbName == ''", because "catName" is 
> the first index column.
> When the metastore data grows large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the 
> "newQuery(MPartitionColumnStatistics.class, "dbName == ''")" query runs very 
> slowly in the metastore, and "show tables" in HiveServer2 runs very slowly 
> too.
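
Why the leading index column matters can be shown with a toy model. The sketch below is not DataNucleus or metastore code; `LeadingColumnSketch` and its methods are hypothetical, and a sorted set of composite keys stands in for an index ordered by (catName, dbName, tableName). It assumes names never contain the NUL character used as a separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of a composite index ordered by (catName, dbName, tableName).
final class LeadingColumnSketch {
  private static final char SEP = (char) 0;       // separator between key parts
  private static final char AFTER_SEP = (char) 1; // smallest char greater than SEP

  private final NavigableSet<String> index = new TreeSet<>();
  int lastExamined; // number of index entries the most recent lookup touched

  void add(String cat, String db, String tbl) {
    index.add(cat + SEP + db + SEP + tbl);
  }

  // Filtering on the leading column maps to one contiguous range of the index.
  List<String> byCatName(String cat) {
    List<String> hits =
        new ArrayList<>(index.subSet(cat + SEP, true, cat + AFTER_SEP, false));
    lastExamined = hits.size();
    return hits;
  }

  // Filtering on a non-leading column has no usable range: every entry is visited.
  List<String> byDbName(String db) {
    List<String> hits = new ArrayList<>();
    lastExamined = 0;
    for (String key : index) {
      lastExamined++;
      int first = key.indexOf(SEP);
      int second = key.indexOf(SEP, first + 1);
      if (key.substring(first + 1, second).equals(db)) {
        hits.add(key);
      }
    }
    return hits;
  }
}
```

With millions of rows, the second lookup is the shape of work the report describes for "dbName == ''", while the first only touches matching entries. How a particular database plans the real query is outside what this model shows.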





[jira] [Commented] (HIVE-23797) Throw exception when no metastore found in zookeeper

2020-07-02 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150676#comment-17150676
 ] 

Zhihua Deng commented on HIVE-23797:


[~ashutosh.bapat]  [~anishek]  could you please review these changes? Thanks.

> Throw exception when no metastore  found in zookeeper
> -
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When service discovery is enabled for the metastore, there is a chance that 
> the client finds no metastore URIs available in ZooKeeper, for example during 
> metastore startup or when the client is configured with the wrong path. This 
> results in redundant retries and finally a MetaException with an "Unknown 
> exception" message.
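
The behaviour the issue title asks for amounts to failing fast with a descriptive message instead of retrying against an empty URI list. A minimal sketch, assuming a hypothetical helper `requireUris` and using `IllegalStateException` as a stand-in (the actual patch may raise a different exception type and use a different message):

```java
import java.util.List;

// Hypothetical sketch: class, method, and message text are illustrative only.
final class MetastoreUriCheckSketch {
  // Returns the URIs unchanged, or fails immediately with a descriptive message
  // when service discovery produced nothing to connect to.
  static List<String> requireUris(List<String> urisFromZooKeeper, String zkPath) {
    if (urisFromZooKeeper == null || urisFromZooKeeper.isEmpty()) {
      throw new IllegalStateException(
          "No metastore URIs found in ZooKeeper under path: " + zkPath);
    }
    return urisFromZooKeeper;
  }
}
```

Checking once before the retry loop avoids the redundant retries, and the message names the path so a misconfiguration is visible to the user.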





[jira] [Commented] (HIVE-23764) Remove unnecessary getLastFlushLength when checking delete delta files

2020-07-02 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150614#comment-17150614
 ] 

Rajesh Balamohan commented on HIVE-23764:
-

[~pvary] : We can get this fix committed and revise the other ticket later.

> Remove unnecessary getLastFlushLength when checking delete delta files
> --
>
> Key: HIVE-23764
> URL: https://issues.apache.org/jira/browse/HIVE-23764
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls 
> OrcAcidUtils.getLastFlushLength for every delete delta file.
> Even the comment says:
> {code}
>   // NOTE: Calling last flush length below is more for 
> future-proofing when we have
>   // streaming deletes. But currently we don't support streaming 
> deletes, and this can
>   // be removed if this becomes a performance issue.
> {code}
> If we have a table with 5 updates (1 base + 5 delta + 5 delete_delta), then 
> for every base + delta dir we will check all of the delete_delta directories, 
> and check the getLastFlushLength method which will result in 6*5=30 
> unnecessary NN/S3 calls.
> We should remove the check as already proposed in the comment.
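
The call count in the description follows from checking every delete_delta directory once per base or delta directory. A small sketch of that arithmetic, with a hypothetical helper name:

```java
// Hypothetical helper that only reproduces the arithmetic from the description.
final class DeleteDeltaCallCountSketch {
  // Each base or delta directory is checked against every delete_delta directory,
  // and each check issues one getLastFlushLength call.
  static int lastFlushLengthCalls(int baseDirs, int deltaDirs, int deleteDeltaDirs) {
    return (baseDirs + deltaDirs) * deleteDeltaDirs;
  }
}
```

For the example in the description, `lastFlushLengthCalls(1, 5, 5)` gives (1 + 5) * 5 = 30.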





[jira] [Updated] (HIVE-23797) Throw exception when no metastore found in zookeeper

2020-07-02 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23797:
---
Summary: Throw exception when no metastore  found in zookeeper  (was: 
Throwing exception when no metastore  found in zookeeper)

> Throw exception when no metastore  found in zookeeper
> -
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-23797) Throwing exception when no metastore found in zookeeper

2020-07-02 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23797:
---
Summary: Throwing exception when no metastore  found in zookeeper  (was: 
Throwing exception when no metastore spec found in zookeeper)

> Throwing exception when no metastore  found in zookeeper
> 
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-23665) Rewrite last_value to first_value to enable streaming results

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23665?focusedWorklogId=454094=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454094
 ]

ASF GitHub Bot logged work on HIVE-23665:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 17:16
Start Date: 02/Jul/20 17:16
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1177:
URL: https://github.com/apache/hive/pull/1177#discussion_r449158491



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -2439,6 +2440,9 @@ private RelNode applyPostJoinOrderingTransform(RelNode 
basePlan, RelMetadataProv
 HiveWindowingFixRule.INSTANCE);
   }
 
+  generatePartialProgram(program, false, HepMatchOrder.DEPTH_FIRST,

Review comment:
   Can you incorporate the rule into the block above?
   
   ```
   if (profilesCBO.contains(ExtendedCBOProfile.WINDOWING_POSTPROCESSING)) {
     generatePartialProgram(program, false, HepMatchOrder.DEPTH_FIRST,
         HiveWindowingLastValueRewrite.INSTANCE, HiveWindowingFixRule.INSTANCE);
   }
   ```

##
File path: ql/src/test/results/clientpositive/llap/vector_ptf_part_simple.q.out
##
@@ -314,46 +386,46 @@ POSTHOOK: type: QUERY
 POSTHOOK: Input: default@vector_ptf_part_simple_orc
  A masked pattern was here 
 p_mfgr p_name  p_retailprice   rn  r   dr  fv  lv  c   
cs
-Manufacturer#2 almond aquamarine rose maroon antique   900.66  1   1   
1   900.66  2031.98 8   8
-Manufacturer#2 almond aquamarine rose maroon antique   1698.66 2   1   
1   900.66  2031.98 8   8
-Manufacturer#2 almond antique violet turquoise frosted 1800.7  3   1   
1   900.66  2031.98 8   8
-Manufacturer#2 almond antique violet chocolate turquoise   1690.68 4   
1   1   900.66  2031.98 8   8
-Manufacturer#2 almond antique violet turquoise frosted 1800.7  5   1   
1   900.66  2031.98 8   8
-Manufacturer#2 almond antique violet turquoise frosted 1800.7  6   1   
1   900.66  2031.98 8   8
-Manufacturer#2 almond aquamarine sandy cyan gainsboro  1000.6  7   1   
1   900.66  2031.98 8   8
-Manufacturer#2 almond aquamarine midnight light salmon 2031.98 8   1   
1   900.66  2031.98 8   8
 Manufacturer#3 almond antique forest lavender goldenrod1190.27 1   
1   1   1190.27 1190.27 7   8
-Manufacturer#3 almond antique chartreuse khaki white   99.68   2   1   
1   1190.27 1190.27 7   8
-Manufacturer#3 almond antique forest lavender goldenrodNULL3   
1   1   1190.27 1190.27 7   8
-Manufacturer#3 almond antique metallic orange dim  55.39   4   1   
1   1190.27 1190.27 7   8
-Manufacturer#3 almond antique misty red olive  1922.98 5   1   1   
1190.27 1190.27 7   8
-Manufacturer#3 almond antique forest lavender goldenrod590.27  6   
1   1   1190.27 1190.27 7   8
-Manufacturer#3 almond antique olive coral navajo   1337.29 7   1   
1   1190.27 1190.27 7   8
 Manufacturer#3 almond antique forest lavender goldenrod1190.27 8   
1   1   1190.27 1190.27 7   8
-Manufacturer#4 almond antique gainsboro frosted violet NULL1   1   
1   NULL1290.35 4   6
-Manufacturer#4 almond aquamarine floral ivory bisque   NULL2   1   
1   NULL1290.35 4   6
-Manufacturer#4 almond antique violet mint lemon1375.42 3   1   
1   NULL1290.35 4   6
-Manufacturer#4 almond aquamarine yellow dodger mint1844.92 4   1   
1   NULL1290.35 4   6
-Manufacturer#4 almond aquamarine floral ivory bisque   1206.26 5   1   
1   NULL1290.35 4   6
-Manufacturer#4 almond azure aquamarine papaya violet   1290.35 6   1   
1   NULL1290.35 4   6
+Manufacturer#3 almond antique olive coral navajo   1337.29 7   1   
1   1190.27 1190.27 7   8
+Manufacturer#3 almond antique forest lavender goldenrod590.27  6   
1   1   1190.27 1190.27 7   8
+Manufacturer#3 almond antique misty red olive  1922.98 5   1   1   
1190.27 1190.27 7   8
+Manufacturer#3 almond antique metallic orange dim  55.39   4   1   
1   1190.27 1190.27 7   8
+Manufacturer#3 almond antique forest lavender goldenrodNULL3   
1   1   1190.27 1190.27 7   8
+Manufacturer#3 almond antique chartreuse khaki white   99.68   2   1   
1   1190.27 1190.27 7   8
+Manufacturer#1 almond aquamarine pink moccasin thistle 1632.66 1   1   
1   1632.66 1632.66 11  12
+Manufacturer#1 almond antique chartreuse lavender yellow   

[jira] [Work logged] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23768?focusedWorklogId=454089=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-454089
 ]

ASF GitHub Bot logged work on HIVE-23768:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 17:07
Start Date: 02/Jul/20 17:07
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1186:
URL: https://github.com/apache/hive/pull/1186


   





Issue Time Tracking
---

Worklog Id: (was: 454089)
Remaining Estimate: 0h
Time Spent: 10m

> Metastore's update service wrongly strips partition column stats from the 
> cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Metastore's update service wrongly strips partition column stats from the 
> cache in an attempt to update them. The issue may go unnoticed since missing 
> stats do not lead to query failures. 
> However, they can significantly alter the query plan, affecting performance. 
> Moreover, they lead to flakiness: sometimes the stats are present and 
> sometimes they are not, so the same query can get a different plan over time. 
> Normally missing elements from the cache shouldn't be a correctness problem 
> since we can always fall back to the raw stats. Unfortunately, there are many 
> interconnections with other parts of the code (e.g., code to obtain aggregate 
> statistics) where this contract breaks.
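
The contract mentioned in the description, that a cache miss is harmless because the raw store can answer instead, can be sketched as below. The names are hypothetical and the stats are plain strings; this illustrates the intended contract only, not the code paths where the issue says it breaks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the "fall back to the raw store" contract.
// None of these names are Hive APIs.
final class StatsLookupSketch {
  private final Map<String, String> cachedStats = new HashMap<>();
  private final Function<String, String> rawStore;

  StatsLookupSketch(Function<String, String> rawStore) {
    this.rawStore = rawStore;
  }

  void cache(String partition, String stats) {
    cachedStats.put(partition, stats);
  }

  // Simulates an entry being dropped from the cache.
  void evict(String partition) {
    cachedStats.remove(partition);
  }

  // A cache miss is answered by the raw store, so callers see the same stats
  // whether or not the entry is currently cached.
  String statsFor(String partition) {
    String cached = cachedStats.get(partition);
    return cached != null ? cached : rawStore.apply(partition);
  }
}
```

A caller that reads the cache directly and treats a miss as "no stats" skips the fallback, which is the kind of break in the contract the description refers to.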





[jira] [Commented] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache

2020-07-02 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150457#comment-17150457
 ] 

Jesus Camacho Rodriguez commented on HIVE-23768:


Pushed to master, thanks [~zabetak]!

> Metastore's update service wrongly strips partition column stats from the 
> cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23768:
--
Labels: pull-request-available  (was: )

> Metastore's update service wrongly strips partition column stats from the 
> cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Resolved] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache

2020-07-02 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-23768.

Fix Version/s: 4.0.0
   Resolution: Fixed

> Metastore's update service wrongly strips partition column stats from the 
> cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Commented] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache

2020-07-02 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150454#comment-17150454
 ] 

Jesus Camacho Rodriguez commented on HIVE-23768:


+1

> Metastore's update service wrongly strips partition column stats from the 
> cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Critical
>





[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=453958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453958
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 13:41
Start Date: 02/Jul/20 13:41
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r447415363



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2610,87 @@ long getPartsFound() {
 
   @Override public List getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);
+
+return keys;
   }
 
   @Override public List getForeignKeys(String catName, String 
parentDbName, String parentTblName,
   String foreignDbName, String foreignTblName) throws MetaException {
-// TODO constraintCache
-return rawStore.getForeignKeys(catName, parentDbName, parentTblName, 
foreignDbName, foreignTblName);
+ // Get correct ForeignDBName and TableName
+if (foreignDbName == null || foreignTblName == null) {
+  return rawStore.getForeignKeys(catName, parentDbName, parentTblName, 
foreignDbName, foreignTblName);

Review comment:
   This flow is a candidate for improvement, as fetching all foreign keys of 
a given parent table (and vice versa) is a frequent operation. 
Please create a follow-up JIRA to use CachedStore for this case too.

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2497,26 +2610,87 @@ long getPartsFound() {
 
   @Override public List getPrimaryKeys(String catName, String 
dbName, String tblName)
   throws MetaException {
-// TODO constraintCache
-return rawStore.getPrimaryKeys(catName, dbName, tblName);
+catName = normalizeIdentifier(catName);
+dbName = StringUtils.normalizeIdentifier(dbName);
+tblName = StringUtils.normalizeIdentifier(tblName);
+if (!shouldCacheTable(catName, dbName, tblName) || (canUseEvents && 
rawStore.isActiveTransaction())) {
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+
+Table tbl = sharedCache.getTableFromCache(catName, dbName, tblName);
+if (tbl == null) {
+  // The table containing the primary keys is not yet loaded in cache
+  return rawStore.getPrimaryKeys(catName, dbName, tblName);
+}
+List keys = sharedCache.listCachedPrimaryKeys(catName, 
dbName, tblName);
+
+return keys;
   }
 
   @Override public List getForeignKeys(String catName, String 
parentDbName, String parentTblName,
   String foreignDbName, String foreignTblName) throws MetaException {
-// TODO constraintCache
-return rawStore.getForeignKeys(catName, parentDbName, parentTblName, 
foreignDbName, foreignTblName);
+ // Get correct ForeignDBName and TableName
+if (foreignDbName == null || foreignTblName == null) {

Review comment:
   We should take the same path if parentDbName or parentTblName is null.
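
One reading of this suggestion is that the raw-store fallback should apply whenever any of the four identifiers is null, not only the foreign ones. A minimal sketch of such a guard, with a hypothetical class and method name:

```java
// Hypothetical guard; the real condition in CachedStore may differ.
final class ForeignKeyArgCheckSketch {
  // True when any identifier is missing, meaning the request cannot be served
  // from a single per-table cache entry and should go to the raw store.
  static boolean mustUseRawStore(String parentDbName, String parentTblName,
      String foreignDbName, String foreignTblName) {
    return parentDbName == null || parentTblName == null
        || foreignDbName == null || foreignTblName == null;
  }
}
```

The guard only decides which path to take; the lookup itself is left out of the sketch.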

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -867,6 +909,77 @@ private void updateTableColStats(RawStore rawStore, String 
catName, String dbNam
   }
 }
 
+private void updateTableForeignKeys(RawStore rawStore, String catName, 
String dbName, String tblName) {
+  LOG.debug("CachedStore: updating cached foreign keys objects for 
catalog: {}, database: {}, table: {}", catName,
+  dbName, tblName);
+  try {
+Deadline.startTimer("getForeignKeys");
+List fks = rawStore.getForeignKeys(catName, null, null, 
dbName, tblName);
+Deadline.stopTimer();
+
sharedCache.refreshForeignKeysInCache(StringUtils.normalizeIdentifier(catName),
+StringUtils.normalizeIdentifier(dbName), 
StringUtils.normalizeIdentifier(tblName), fks);
+LOG.debug("CachedStore: updated cached foreign keys objects for 
catalog: 

[jira] [Work logged] (HIVE-22674) Replace Base64 in serde Package

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22674?focusedWorklogId=453934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453934
 ]

ASF GitHub Bot logged work on HIVE-22674:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 12:46
Start Date: 02/Jul/20 12:46
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1203:
URL: https://github.com/apache/hive/pull/1203


   





Issue Time Tracking
---

Worklog Id: (was: 453934)
Remaining Estimate: 0h
Time Spent: 10m

> Replace Base64 in serde Package
> ---
>
> Key: HIVE-22674
> URL: https://issues.apache.org/jira/browse/HIVE-22674
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-22674.1.patch, HIVE-22674.2.patch, 
> HIVE-22674.2.patch, HIVE-22674.2.patch, HIVE-22674.2.patch, HIVE-22674.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-22674) Replace Base64 in serde Package

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-22674:
--
Labels: pull-request-available  (was: )

> Replace Base64 in serde Package
> ---
>
> Key: HIVE-22674
> URL: https://issues.apache.org/jira/browse/HIVE-22674
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-22674.1.patch, HIVE-22674.2.patch, 
> HIVE-22674.2.patch, HIVE-22674.2.patch, HIVE-22674.2.patch, HIVE-22674.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-23797) Throwing exception when no metastore spec found in zookeeper

2020-07-02 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-23797:
---
Issue Type: Improvement  (was: Bug)

> Throwing exception when no metastore spec found in zookeeper
> 
>
> Key: HIVE-23797
> URL: https://issues.apache.org/jira/browse/HIVE-23797
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-22676) Replace Base64 in hive-service Package

2020-07-02 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-22676:
--
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master.  Thanks [~pvary] and [~ngangam] (privately) for the review!!

> Replace Base64 in hive-service Package
> --
>
> Key: HIVE-22676
> URL: https://issues.apache.org/jira/browse/HIVE-22676
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22676.1.patch, HIVE-22676.2.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22676) Replace Base64 in hive-service Package

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22676?focusedWorklogId=453921=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453921
 ]

ASF GitHub Bot logged work on HIVE-22676:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 12:32
Start Date: 02/Jul/20 12:32
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1090:
URL: https://github.com/apache/hive/pull/1090


   





Issue Time Tracking
---

Worklog Id: (was: 453921)
Time Spent: 40m  (was: 0.5h)

> Replace Base64 in hive-service Package
> --
>
> Key: HIVE-22676
> URL: https://issues.apache.org/jira/browse/HIVE-22676
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-22676.1.patch, HIVE-22676.2.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cleanup

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=453836=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453836
 ]

ASF GitHub Bot logged work on HIVE-23727:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 08:25
Start Date: 02/Jul/20 08:25
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1149:
URL: https://github.com/apache/hive/pull/1149


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   





Issue Time Tracking
---

Worklog Id: (was: 453836)
Time Spent: 1h 20m  (was: 1h 10m)

> Improve SQLOperation log handling when cleanup
> --
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ before cancelling 
> the background task. If the check passes, the state cannot be 
> OperationState.CANCELED, so the logging guarded by state == 
> OperationState.CANCELED inside that branch can never run.
>  
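
The reasoning above can be checked mechanically. The following is a small sketch, not the SQLOperation code itself: the enum is a reduced, hypothetical stand-in for Hive's OperationState, `shouldRunAsync` is passed in as a plain flag, and the class and method names are made up for illustration.

```java
public class CleanupStateCheck {

    /** Reduced stand-in for Hive's OperationState; the real enum has more values. */
    enum OperationState { RUNNING, FINISHED, CANCELED, TIMEDOUT, CLOSED }

    /** Mirrors the guard quoted in the issue description. */
    static boolean entersCancelBranch(boolean shouldRunAsync, OperationState state) {
        return shouldRunAsync
            && state != OperationState.CANCELED
            && state != OperationState.TIMEDOUT;
    }

    /** True only if a nested "state == CANCELED" log statement could ever execute. */
    static boolean nestedCanceledLogReachable(boolean shouldRunAsync, OperationState state) {
        return entersCancelBranch(shouldRunAsync, state) && state == OperationState.CANCELED;
    }

    public static void main(String[] args) {
        // Enumerate every combination; the nested condition is false for all of them.
        for (boolean async : new boolean[] {true, false}) {
            for (OperationState state : OperationState.values()) {
                System.out.println(async + " " + state + " -> "
                    + nestedCanceledLogReachable(async, state));
            }
        }
    }
}
```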





[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cleanup

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=453835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453835
 ]

ASF GitHub Bot logged work on HIVE-23727:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 08:25
Start Date: 02/Jul/20 08:25
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #1149:
URL: https://github.com/apache/hive/pull/1149


   





Issue Time Tracking
---

Worklog Id: (was: 453835)
Time Spent: 1h 10m  (was: 1h)

> Improve SQLOperation log handling when cleanup
> --
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ before cancelling 
> the background task. If the check passes, the state cannot be 
> OperationState.CANCELED, so the logging guarded by state == 
> OperationState.CANCELED inside that branch can never run.
>  





[jira] [Updated] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2020-07-02 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-23721:
--
Affects Version/s: 4.0.0

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.1.2
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: zhangbutao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs have been added to the Hive metastore: many metastore 
> schema tables gained a "catName" column, and the table indexes gained a 
> "catName" column as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When the metastore data becomes large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the query 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly in 
> the metastore, and "show tables" in HiveServer2 runs very slowly too.





[jira] [Work logged] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23721?focusedWorklogId=453816=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453816
 ]

ASF GitHub Bot logged work on HIVE-23721:
-

Author: ASF GitHub Bot
Created on: 02/Jul/20 07:47
Start Date: 02/Jul/20 07:47
Worklog Time Spent: 10m 
  Work Description: butaozhang opened a new pull request #1202:
URL: https://github.com/apache/hive/pull/1202







Issue Time Tracking
---

Worklog Id: (was: 453816)
Remaining Estimate: 0h
Time Spent: 10m

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: zhangbutao
>Priority: Critical
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs have been added to the Hive metastore: many metastore 
> schema tables gained a "catName" column, and the table indexes gained a 
> "catName" column as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When the metastore data becomes large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the query 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly in 
> the metastore, and "show tables" in HiveServer2 runs very slowly too.





[jira] [Updated] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2020-07-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23721:
--
Labels: pull-request-available  (was: )

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: zhangbutao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since Hive 3.0, catalogs have been added to the Hive metastore: many metastore 
> schema tables gained a "catName" column, and the table indexes gained a 
> "catName" column as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When the metastore data becomes large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the query 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly in 
> the metastore, and "show tables" in HiveServer2 runs very slowly too.





[jira] [Comment Edited] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2020-07-02 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149997#comment-17149997
 ] 

zhangbutao edited comment on HIVE-23721 at 7/2/20, 7:12 AM:


When hive.in.test=true is set, MetaStoreDirectSql.ensureDbInit checks the 
metadata before each SQL request.

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L212]

 

However, the two queries do not use the index correctly:

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L280]

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L281]

 

This is because the tables TAB_COL_STATS and PART_COL_STATS have a composite index:

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L742]

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L774]

 

According to the leftmost-prefix rule for composite indexes, we should 
use "catName == ''" instead of "dbName == ''", because "catName" is the first 
index column. 


was (Author: zhangbutao):
We set hive.in.test=true; MetaStoreDirectSql.ensureDbInit will check the metadata 
before each SQL request.

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L212]

 

However, the two queries do not use the index correctly:

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L280]

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L281]

 

This is because the tables TAB_COL_STATS and PART_COL_STATS have a composite index:

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L742]

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L774]

 

According to the leftmost-prefix rule for composite indexes, we should 
use "catName == ''" instead of "dbName == ''", because "catName" is the first 
index column. 

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: zhangbutao
>Priority: Critical
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
>
> Since Hive 3.0, catalogs have been added to the Hive metastore: many metastore 
> schema tables gained a "catName" column, and the table indexes gained a 
> "catName" column as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When the metastore data becomes large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the query 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly in 
> the metastore, and "show tables" in HiveServer2 runs very slowly too.





[jira] [Commented] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2020-07-02 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149997#comment-17149997
 ] 

zhangbutao commented on HIVE-23721:
---

We set hive.in.test=true; MetaStoreDirectSql.ensureDbInit will check the metadata 
before each SQL request.

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L212]

 

However, the two queries do not use the index correctly:

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L280]

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L281]

 

This is because the tables TAB_COL_STATS and PART_COL_STATS have a composite index:

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L742]

[https://github.com/apache/hive/blob/4942a7c0b4be3a5b0c889a89b903e9a70c57d494/standalone-metastore/metastore-server/src/main/sql/mysql/hive-schema-4.0.0.mysql.sql#L774]

 

According to the leftmost-prefix rule for composite indexes, we should 
use "catName == ''" instead of "dbName == ''", because "catName" is the first 
index column. 
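
To illustrate the leftmost-prefix rule mentioned above, here is a simplified model in Java. It only counts how many leading index columns a filter constrains; real query planners have further strategies, so treat it as a sketch. The class and method names are hypothetical, and the column order after "catName" is an assumption (the comment only states that "catName" comes first).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LeftmostPrefix {

    /**
     * Counts the leading index columns that the filter constrains.
     * A result of 0 means the filter skips the first index column,
     * so this simplified model treats the index as unusable for the lookup.
     */
    static int usablePrefixLength(List<String> indexColumns, Set<String> filteredColumns) {
        int length = 0;
        for (String column : indexColumns) {
            if (!filteredColumns.contains(column)) {
                break; // the prefix ends at the first unconstrained column
            }
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        // Assumed column order for illustration; only "catName" being first is from the comment.
        List<String> index = Arrays.asList("catName", "dbName", "tableName");

        // Filtering on dbName alone skips the first index column.
        System.out.println(usablePrefixLength(index, Collections.singleton("dbName")));   // 0
        // Filtering on catName matches the first index column.
        System.out.println(usablePrefixLength(index, Collections.singleton("catName")));  // 1
        // Filtering on both matches a two-column prefix.
        System.out.println(usablePrefixLength(index,
            new HashSet<>(Arrays.asList("catName", "dbName"))));                          // 2
    }
}
```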

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: zhangbutao
>Priority: Critical
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
>
> Since Hive 3.0, catalogs have been added to the Hive metastore: many metastore 
> schema tables gained a "catName" column, and the table indexes gained a 
> "catName" column as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When the metastore data becomes large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the query 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly in 
> the metastore, and "show tables" in HiveServer2 runs very slowly too.





[jira] [Assigned] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2020-07-02 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-23721:
-

Assignee: zhangbutao

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: zhangbutao
>Priority: Critical
> Attachments: HIVE-23721.01.patch
>
>
> Since Hive 3.0, catalogs have been added to the Hive metastore: many metastore 
> schema tables gained a "catName" column, and the table indexes gained a 
> "catName" column as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When the metastore data becomes large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the query 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly in 
> the metastore, and "show tables" in HiveServer2 runs very slowly too.





[jira] [Updated] (HIVE-23721) MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL

2020-07-02 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-23721:
--
   Attachment: HIVE-23721.01.patch
Fix Version/s: 4.0.0
   Status: Patch Available  (was: Open)

> MetaStoreDirectSql.ensureDbInit() need to optimize QuerySQL
> ---
>
> Key: HIVE-23721
> URL: https://issues.apache.org/jira/browse/HIVE-23721
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2
> Environment: Hadoop 3.1(1700+ nodes)
> YARN 3.1 (with timelineserver enabled,https enabled)
> Hive 3.1 (15 HS2 instance)
> 6+ YARN Applications every day
>Reporter: YulongZ
>Assignee: zhangbutao
>Priority: Critical
> Fix For: 4.0.0
>
> Attachments: HIVE-23721.01.patch
>
>
> Since Hive 3.0, catalogs have been added to the Hive metastore: many metastore 
> schema tables gained a "catName" column, and the table indexes gained a 
> "catName" column as well.
> In MetaStoreDirectSql.ensureDbInit(), the two queries below
>   initQueries.add(pm.newQuery(MTableColumnStatistics.class, "dbName == ''"));
>   initQueries.add(pm.newQuery(MPartitionColumnStatistics.class, "dbName == ''"));
> should use "catName == ''" instead of "dbName == ''", because "catName" is the 
> first index column.
> When the metastore data becomes large, for example when the 
> MPartitionColumnStatistics table has millions of rows, the query 
> newQuery(MPartitionColumnStatistics.class, "dbName == ''") runs very slowly in 
> the metastore, and "show tables" in HiveServer2 runs very slowly too.


