[jira] [Work logged] (HIVE-25113) Connection starvation in TxnHandler.getValidWriteIds

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25113?focusedWorklogId=596440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596440
 ]

ASF GitHub Bot logged work on HIVE-25113:
-

Author: ASF GitHub Bot
Created on: 14/May/21 00:49
Start Date: 14/May/21 00:49
Worklog Time Spent: 10m 
  Work Description: kishendas commented on pull request #2272:
URL: https://github.com/apache/hive/pull/2272#issuecomment-840920625


   Can you write simple tests to ensure these methods reuse the existing 
connection if it is not null? Maybe you could mock Connection?
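
   A minimal sketch of what such a test could look like. It assumes the PR's
approach of accepting an already-held connection; connectionFor below is an
illustrative stand-in for that logic, not the actual TxnHandler API.

{code:java}
import static org.mockito.Mockito.*;

import java.sql.Connection;
import javax.sql.DataSource;

public class ConnectionReuseSketch {

  // Stand-in for the pattern under test: use the caller's connection when
  // present; fall back to the pool only when it is null.
  static Connection connectionFor(Connection existing, DataSource pool) throws Exception {
    return existing != null ? existing : pool.getConnection();
  }

  public static void main(String[] args) throws Exception {
    DataSource pool = mock(DataSource.class);
    Connection held = mock(Connection.class);

    // With a connection supplied, the pool must never be asked for one.
    Connection used = connectionFor(held, pool);
    if (used != held) {
      throw new AssertionError("existing connection was not reused");
    }
    verify(pool, never()).getConnection();

    // With no connection supplied, exactly one is requested from the pool.
    when(pool.getConnection()).thenReturn(mock(Connection.class));
    connectionFor(null, pool);
    verify(pool, times(1)).getConnection();
  }
}
{code}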


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596440)
Time Spent: 20m  (was: 10m)

> Connection starvation in TxnHandler.getValidWriteIds
> 
>
> Key: HIVE-25113
> URL: https://issues.apache.org/jira/browse/HIVE-25113
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
> The current code looks like below.
> {code:java}
> dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
> validTxnList = TxnUtils.createValidReadTxnList(getOpenTxns(), 0);
> {code}
> The function getOpenTxns requests another connection from the pool, even 
> though the calling thread already holds one. When there are more than 10 
> (the default connection pool size) simultaneous getValidWriteIds requests, 
> this can cause starvation: each thread holds one connection while waiting 
> for another. We then see the following exception after the timeout.
> {code:java}
> metastore.RetryingHMSHandler: MetaException(message:Unable to select from 
> transaction database, java.sql.SQLTransientConnectionException: HikariPool-3 
> - Connection is not available, request timed out after 3ms.{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24596) Explain ddl for debugging

2021-05-13 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved HIVE-24596.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: output, query, table_definitions
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details such as the table schema, 
> statistics, partition details, and query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details and help recreate schema and planner issues without sample data.
>  
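
A hedged usage sketch over JDBC: the exact syntax is assumed from the feature
name ("explain ddl"), mirroring Hive's other EXPLAIN variants, and the host
and table names are placeholders.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExplainDdlSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement();
         // Dump the DDL, statistics, and partition details needed to
         // recreate a planner issue without shipping sample data.
         ResultSet rs = stmt.executeQuery(
             "EXPLAIN DDL SELECT count(*) FROM sample_table WHERE ds = '2021-05-13'")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
{code}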



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24596) Explain ddl for debugging

2021-05-13 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344191#comment-17344191
 ] 

Rajesh Balamohan commented on HIVE-24596:
-

Merged PR to master branch. Thanks [~harshit.gupta].

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Attachments: output, query, table_definitions
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details such as the table schema, 
> statistics, partition details, and query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details and help recreate schema and planner issues without sample data.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24596) Explain ddl for debugging

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24596?focusedWorklogId=596409&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596409
 ]

ASF GitHub Bot logged work on HIVE-24596:
-

Author: ASF GitHub Bot
Created on: 13/May/21 23:41
Start Date: 13/May/21 23:41
Worklog Time Spent: 10m 
  Work Description: rbalamohan merged pull request #2033:
URL: https://github.com/apache/hive/pull/2033


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596409)
Time Spent: 3h  (was: 2h 50m)

> Explain ddl for debugging
> -
>
> Key: HIVE-24596
> URL: https://issues.apache.org/jira/browse/HIVE-24596
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Attachments: output, query, table_definitions
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> For debugging query issues, basic details such as the table schema, 
> statistics, partition details, and query plans are needed.
> It would be good to have "explain ddl" support, which can generate these 
> details and help recreate schema and planner issues without sample data.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25114) Optimize get_tables() api call in HMS

2021-05-13 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-25114:



> Optimize get_tables() api call in HMS
> 
>
> Key: HIVE-25114
> URL: https://issues.apache.org/jira/browse/HIVE-25114
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> Optimize the get_tables() call in the HMS API. There should be only one call 
> to the object store, instead of two, to return the table objects.
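
An illustrative sketch of the intended change. The two calls shown are the
RawStore-style lookups implied by the description; the single-call overload
is an assumption, not a confirmed HMS API.

{code:java}
// Before: two ObjectStore round trips, one for the table names and then one
// for the table objects.
List<String> names = objectStore.getTables(catName, dbName, pattern);
List<Table> tables = objectStore.getTableObjectsByName(catName, dbName, names);

// After: a single ObjectStore fetch that returns the table objects directly,
// e.g. a pattern-accepting overload of getTableObjectsByName (hypothetical).
{code}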



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-21643) Fix Broken support for ISO Time with Zone in Hive UDFs

2021-05-13 Thread Bradley Peterson (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344093#comment-17344093
 ] 

Bradley Peterson commented on HIVE-21643:
-

This PR was closed without review. Is there any plan to support parsing ISO 
timestamps with timezone in Hive 3.1?

> Fix Broken support for ISO Time with Zone in Hive UDFs
> --
>
> Key: HIVE-21643
> URL: https://issues.apache.org/jira/browse/HIVE-21643
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 3.1.0, 3.0.0, 3.1.1
>Reporter: RAJKAMAL
>Assignee: Navya Sruthi Sunkarapalli
>Priority: Major
>  Labels: patch-available, pull-request-available
> Attachments: HIVE-21643.1.patch, Hive-21643.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following UDFs, date_format and to_date, used to support ISO dates with 
> time zones; that support has been broken since the Hive 3.x release.
> Example:
> date_format('2017-03-16T00:10:42Z', 'y')
> date_format('2017-03-16T00:10:42+01:00', 'y')
> date_format('2017-03-16T00:10:42-01:00', 'y')
> to_date('2015-04-11T01:30:45Z')
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25113) Connection starvation in TxnHandler.getValidWriteIds

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25113:
--
Labels: pull-request-available  (was: )

> Connection starvation in TxnHandler.getValidWriteIds
> 
>
> Key: HIVE-25113
> URL: https://issues.apache.org/jira/browse/HIVE-25113
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> The current code looks like below.
> {code:java}
> dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
> validTxnList = TxnUtils.createValidReadTxnList(getOpenTxns(), 0);
> {code}
> The function getOpenTxns requests another connection from the pool, even 
> though the calling thread already holds one. When there are more than 10 
> (the default connection pool size) simultaneous getValidWriteIds requests, 
> this can cause starvation: each thread holds one connection while waiting 
> for another. We then see the following exception after the timeout.
> {code:java}
> metastore.RetryingHMSHandler: MetaException(message:Unable to select from 
> transaction database, java.sql.SQLTransientConnectionException: HikariPool-3 
> - Connection is not available, request timed out after 3ms.{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25113) Connection starvation in TxnHandler.getValidWriteIds

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25113?focusedWorklogId=596328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596328
 ]

ASF GitHub Bot logged work on HIVE-25113:
-

Author: ASF GitHub Bot
Created on: 13/May/21 19:33
Start Date: 13/May/21 19:33
Worklog Time Spent: 10m 
  Work Description: hsnusonic opened a new pull request #2272:
URL: https://github.com/apache/hive/pull/2272


   
   
   
   ### What changes were proposed in this pull request?
   Pass dbConn to the function getOpenTxns so that it doesn't request another 
connection from the pool.
   
   
   
   ### Why are the changes needed?
   If we let a thread hold more than one connection, a starvation problem 
might emerge when there are multiple simultaneous requests.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Existing tests
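
   A condensed sketch of the change described above, mirroring the snippet
quoted in the issue; passing the held connection into getOpenTxns is the PR's
approach, but the exact signatures here are illustrative.

{code:java}
dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
// Before: getOpenTxns() acquired a second connection internally, so each
// request could hold two pool connections at once.
// After: the already-held connection is threaded through, so each request
// holds at most one.
validTxnList = TxnUtils.createValidReadTxnList(getOpenTxns(dbConn), 0);
{code}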


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596328)
Remaining Estimate: 0h
Time Spent: 10m

> Connection starvation in TxnHandler.getValidWriteIds
> 
>
> Key: HIVE-25113
> URL: https://issues.apache.org/jira/browse/HIVE-25113
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> The current code looks like below.
> {code:java}
> dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
> validTxnList = TxnUtils.createValidReadTxnList(getOpenTxns(), 0);
> {code}
> The function getOpenTxns requests another connection from the pool, even 
> though the calling thread already holds one. When there are more than 10 
> (the default connection pool size) simultaneous getValidWriteIds requests, 
> this can cause starvation: each thread holds one connection while waiting 
> for another. We then see the following exception after the timeout.
> {code:java}
> metastore.RetryingHMSHandler: MetaException(message:Unable to select from 
> transaction database, java.sql.SQLTransientConnectionException: HikariPool-3 
> - Connection is not available, request timed out after 3ms.{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-25113) Connection starvation in TxnHandler.getValidWriteIds

2021-05-13 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25113 started by Yu-Wen Lai.
-
> Connection starvation in TxnHandler.getValidWriteIds
> 
>
> Key: HIVE-25113
> URL: https://issues.apache.org/jira/browse/HIVE-25113
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>
>  
> The current code looks like below.
> {code:java}
> dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
> validTxnList = TxnUtils.createValidReadTxnList(getOpenTxns(), 0);
> {code}
> The function getOpenTxns requests another connection from the pool, even 
> though the calling thread already holds one. When there are more than 10 
> (the default connection pool size) simultaneous getValidWriteIds requests, 
> this can cause starvation: each thread holds one connection while waiting 
> for another. We then see the following exception after the timeout.
> {code:java}
> metastore.RetryingHMSHandler: MetaException(message:Unable to select from 
> transaction database, java.sql.SQLTransientConnectionException: HikariPool-3 
> - Connection is not available, request timed out after 3ms.{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25113) Connection starvation in TxnHandler.getValidWriteIds

2021-05-13 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai reassigned HIVE-25113:
-


> Connection starvation in TxnHandler.getValidWriteIds
> 
>
> Key: HIVE-25113
> URL: https://issues.apache.org/jira/browse/HIVE-25113
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>
>  
> The current code looks like below.
> {code:java}
> dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
> validTxnList = TxnUtils.createValidReadTxnList(getOpenTxns(), 0);
> {code}
> The function getOpenTxns requests another connection from the pool, even 
> though the calling thread already holds one. When there are more than 10 
> (the default connection pool size) simultaneous getValidWriteIds requests, 
> this can cause starvation: each thread holds one connection while waiting 
> for another. We then see the following exception after the timeout.
> {code:java}
> metastore.RetryingHMSHandler: MetaException(message:Unable to select from 
> transaction database, java.sql.SQLTransientConnectionException: HikariPool-3 
> - Connection is not available, request timed out after 3ms.{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25107) Classpath logging should be on DEBUG level

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25107:
--
Labels: pull-request-available  (was: )

> Classpath logging should be on DEBUG level
> --
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has been the case since HIVE-21584.
> I have a *72M* LLAP executor log file; grepping just the "thread class 
> path" lines into a separate file yielded a *22M* file, so roughly a quarter 
> to a third of the log was classpath info, which is not useful most of the 
> time. This overwhelming amount of classpath info is not needed: classpath 
> issues are reproducible with more or less effort, so the user should be 
> responsible for turning this expensive logging on on demand. Not to mention 
> the performance implications, which cannot be ignored beyond a certain 
> volume of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577
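
A minimal sketch of the intent, not the actual patch: emit the classpath dump
at DEBUG, and guard the call so the (large) classpath string is never
assembled unless DEBUG logging is actually enabled.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ClasspathLogSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ClasspathLogSketch.class);

  static void logThreadClasspath() {
    // The guard skips the expensive argument formatting entirely when the
    // DEBUG level is off, so routine runs pay nothing for this diagnostics.
    if (LOG.isDebugEnabled()) {
      LOG.debug("thread class path: {}",
          Thread.currentThread().getContextClassLoader());
    }
  }
}
{code}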



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25107) Classpath logging should be on DEBUG level

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25107?focusedWorklogId=596266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596266
 ]

ASF GitHub Bot logged work on HIVE-25107:
-

Author: ASF GitHub Bot
Created on: 13/May/21 17:15
Start Date: 13/May/21 17:15
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #2271:
URL: https://github.com/apache/hive/pull/2271


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596266)
Remaining Estimate: 0h
Time Spent: 10m

> Classpath logging should be on DEBUG level
> --
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has been the case since HIVE-21584.
> I have a *72M* LLAP executor log file; grepping just the "thread class 
> path" lines into a separate file yielded a *22M* file, so roughly a quarter 
> to a third of the log was classpath info, which is not useful most of the 
> time. This overwhelming amount of classpath info is not needed: classpath 
> issues are reproducible with more or less effort, so the user should be 
> responsible for turning this expensive logging on on demand. Not to mention 
> the performance implications, which cannot be ignored beyond a certain 
> volume of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25107) Classpath logging should be on DEBUG level

2021-05-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25107:

Summary: Classpath logging should be on DEBUG level  (was: Thread classpath 
logging should be on DEBUG level)

> Classpath logging should be on DEBUG level
> --
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> This has been the case since HIVE-21584.
> I have a *72M* LLAP executor log file; grepping just the "thread class 
> path" lines into a separate file yielded a *22M* file, so roughly a quarter 
> to a third of the log was classpath info, which is not useful most of the 
> time. This overwhelming amount of classpath info is not needed: classpath 
> issues are reproducible with more or less effort, so the user should be 
> responsible for turning this expensive logging on on demand. Not to mention 
> the performance implications, which cannot be ignored beyond a certain 
> volume of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25107) Thread classpath logging should be on DEBUG level

2021-05-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-25107:

Summary: Thread classpath logging should be on DEBUG level  (was: Thread 
classpath logging should be optional/DEBUG)

> Thread classpath logging should be on DEBUG level
> -
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> This has been the case since HIVE-21584.
> I have a *72M* LLAP executor log file; grepping just the "thread class 
> path" lines into a separate file yielded a *22M* file, so roughly a quarter 
> to a third of the log was classpath info, which is not useful most of the 
> time. This overwhelming amount of classpath info is not needed: classpath 
> issues are reproducible with more or less effort, so the user should be 
> responsible for turning this expensive logging on on demand. Not to mention 
> the performance implications, which cannot be ignored beyond a certain 
> volume of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25112?focusedWorklogId=596229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596229
 ]

ASF GitHub Bot logged work on HIVE-25112:
-

Author: ASF GitHub Bot
Created on: 13/May/21 16:22
Start Date: 13/May/21 16:22
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2270:
URL: https://github.com/apache/hive/pull/2270#discussion_r631935634



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -250,49 +251,48 @@ void gatherStats() {
 
   static final class CompactionHeartbeater extends Thread {
 static final private Logger LOG = 
LoggerFactory.getLogger(CompactionHeartbeater.class);
-private final AtomicBoolean stop = new AtomicBoolean();
 private final CompactionTxn compactionTxn;
 private final String tableName;
 private final HiveConf conf;
-private final long interval;
+private final long txnTimeout;
+
 public CompactionHeartbeater(CompactionTxn compactionTxn, String 
tableName, HiveConf conf) {
-  this.tableName = tableName;
-  this.compactionTxn = compactionTxn;
-  this.conf = conf;
+  this.tableName = Objects.requireNonNull(tableName);
+  this.compactionTxn = Objects.requireNonNull(compactionTxn);
+  this.conf = Objects.requireNonNull(conf);
+
+  this.txnTimeout = MetastoreConf.getTimeVar(conf, 
MetastoreConf.ConfVars.TXN_TIMEOUT, TimeUnit.MILLISECONDS);
 
-  this.interval =
-  MetastoreConf.getTimeVar(conf, MetastoreConf.ConfVars.TXN_TIMEOUT, 
TimeUnit.MILLISECONDS) / 2;
   setDaemon(true);
   setPriority(MIN_PRIORITY);
   setName("CompactionHeartbeater-" + compactionTxn.getTxnId());
 }
+
 @Override
 public void run() {
+  LOG.debug("Heartbeating compaction transaction id {} for table: {}", 
compactionTxn, tableName);
+
   IMetaStoreClient msc = null;
   try {
-// We need to create our own metastore client since the thrifts clients
-// are not thread safe.
+// Create a metastore client for each thread since it is not thread 
safe
 msc = HiveMetaStoreUtils.getHiveMetastoreClient(conf);
-LOG.debug("Heartbeating compaction transaction id {} for table: {}", 
compactionTxn, tableName);
-while(!stop.get()) {
+while (true) {
   msc.heartbeat(compactionTxn.getTxnId(), 0);
-  Thread.sleep(interval);
+
+  // Send a heart beat before a timeout occurs. Scale the interval 
based
+  // on the server's transaction timeout allowance
+  Thread.sleep(txnTimeout / 2);
 }
+  } catch (InterruptedException ie) {
+LOG.debug("Successfully stop the heartbeating the transaction {}", 
this.compactionTxn);
   } catch (Exception e) {
-LOG.error("Error while heartbeating txn {} in {}, error: ", 
compactionTxn, Thread.currentThread().getName(), e.getMessage());

Review comment:
   The thread name will be captured by the logging framework, and the 
message too.  Just need to pass in the entire Exception object.
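
   A small illustration of the reviewer's point, applied to the line under
review (the surrounding names come from the quoted diff):

{code:java}
} catch (Exception e) {
  // Passing the Throwable as the final argument lets the logging framework
  // print the full stack trace; the thread name is already captured by the
  // logging pattern, so neither needs to be formatted into the message.
  LOG.error("Error while heartbeating txn {}", compactionTxn, e);
}
{code}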




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596229)
Time Spent: 20m  (was: 10m)

> Simplify TXN Compactor Heartbeat Thread
> ---
>
> Key: HIVE-25112
> URL: https://issues.apache.org/jira/browse/HIVE-25112
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Simplify the Thread structure.  Threads do not need a separate 
> "start"/"stop" state; they already have one in the running/interrupted 
> state, which is designed to work with thread pools and forced exits.
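
A condensed sketch of the interrupt-driven pattern, mirroring the patch
quoted in the review above (names such as heartbeat and txnTimeout are
illustrative): instead of polling an AtomicBoolean stop flag, the owner calls
thread.interrupt(), which breaks the sleep and ends the loop.

{code:java}
@Override
public void run() {
  try {
    while (true) {
      heartbeat();                  // e.g. msc.heartbeat(txnId, 0)
      Thread.sleep(txnTimeout / 2); // beat well before the server-side timeout
    }
  } catch (InterruptedException ie) {
    // The interrupt is the stop signal; the thread simply exits.
  }
}
{code}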



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25112?focusedWorklogId=596228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596228
 ]

ASF GitHub Bot logged work on HIVE-25112:
-

Author: ASF GitHub Bot
Created on: 13/May/21 16:21
Start Date: 13/May/21 16:21
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2270:
URL: https://github.com/apache/hive/pull/2270


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596228)
Remaining Estimate: 0h
Time Spent: 10m

> Simplify TXN Compactor Heartbeat Thread
> ---
>
> Key: HIVE-25112
> URL: https://issues.apache.org/jira/browse/HIVE-25112
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Simplify the Thread structure.  Threads do not need a separate 
> "start"/"stop" state; they already have one in the running/interrupted 
> state, which is designed to work with thread pools and forced exits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25112:
--
Labels: pull-request-available  (was: )

> Simplify TXN Compactor Heartbeat Thread
> ---
>
> Key: HIVE-25112
> URL: https://issues.apache.org/jira/browse/HIVE-25112
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Simplify the Thread structure.  Threads do not need a separate 
> "start"/"stop" state; they already have one in the running/interrupted 
> state, which is designed to work with thread pools and forced exits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25112) Simplify TXN Compactor Heartbeat Thread

2021-05-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25112:
-


> Simplify TXN Compactor Heartbeat Thread
> ---
>
> Key: HIVE-25112
> URL: https://issues.apache.org/jira/browse/HIVE-25112
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> Simplify the Thread structure.  Threads do not need a separate 
> "start"/"stop" state; they already have one in the running/interrupted 
> state, which is designed to work with thread pools and forced exits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25111) Metastore Catalog Methods JDO Persistence

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25111?focusedWorklogId=596187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596187
 ]

ASF GitHub Bot logged work on HIVE-25111:
-

Author: ASF GitHub Bot
Created on: 13/May/21 15:08
Start Date: 13/May/21 15:08
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2269:
URL: https://github.com/apache/hive/pull/2269#discussion_r631883645



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
##
@@ -160,10 +160,11 @@
* @param catName name of the catalog to alter.
* @param cat new version of the catalog.
* @throws MetaException something went wrong, usually in the database.
+   * @throws NoSuchObjectException no catalog of this name exists.
* @throws InvalidOperationException attempt to change something about the 
catalog that is not
* changeable, like the name.
*/
-  void alterCatalog(String catName, Catalog cat) throws MetaException, 
InvalidOperationException;
+  void alterCatalog(String catName, Catalog cat) throws MetaException, 
NoSuchObjectException, InvalidOperationException;

Review comment:
   This should also throw NoSuchObjectException if the user is attempting to 
alter a non-existent catalog




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596187)
Time Spent: 0.5h  (was: 20m)

> Metastore Catalog Methods JDO Persistence
> -
>
> Key: HIVE-25111
> URL: https://issues.apache.org/jira/browse/HIVE-25111
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25111) Metastore Catalog Methods JDO Persistence

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25111:
--
Labels: pull-request-available  (was: )

> Metastore Catalog Methods JDO Persistence
> -
>
> Key: HIVE-25111
> URL: https://issues.apache.org/jira/browse/HIVE-25111
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25111) Metastore Catalog Methods JDO Persistence

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25111?focusedWorklogId=596185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596185
 ]

ASF GitHub Bot logged work on HIVE-25111:
-

Author: ASF GitHub Bot
Created on: 13/May/21 15:07
Start Date: 13/May/21 15:07
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #2269:
URL: https://github.com/apache/hive/pull/2269#discussion_r631882813



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##
@@ -642,112 +642,140 @@ public void rollbackTransaction() {
   @Override
   public void createCatalog(Catalog cat) throws MetaException {
 LOG.debug("Creating catalog {}", cat);
-boolean committed = false;
+
 MCatalog mCat = catToMCat(cat);
+
+final Transaction tx = pm.currentTransaction();
+
 try {
-  openTransaction();
+  tx.begin();
   pm.makePersistent(mCat);
-  committed = commitTransaction();
+  tx.commit();
 } finally {
-  if (!committed) {
-rollbackTransaction();
+  if (tx.isActive()) {
+tx.rollback();
   }
 }
   }
 
   @Override
   public void alterCatalog(String catName, Catalog cat)
-  throws MetaException, InvalidOperationException {
+  throws MetaException, InvalidOperationException, NoSuchObjectException {

Review comment:
   This should also throw NoSuchObjectException if the user is attempting to 
alter a non-existent catalog
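
   A brief sketch of the requested behavior (illustrative; the lookup helper
getMCatalog is assumed, not a confirmed ObjectStore method):

{code:java}
MCatalog mCat = getMCatalog(catName);
if (mCat == null) {
  throw new NoSuchObjectException("Catalog " + catName + " does not exist");
}
{code}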




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596185)
Time Spent: 20m  (was: 10m)

> Metastore Catalog Methods JDO Persistence
> -
>
> Key: HIVE-25111
> URL: https://issues.apache.org/jira/browse/HIVE-25111
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25111) Metastore Catalog Methods JDO Persistence

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25111?focusedWorklogId=596181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596181
 ]

ASF GitHub Bot logged work on HIVE-25111:
-

Author: ASF GitHub Bot
Created on: 13/May/21 15:07
Start Date: 13/May/21 15:07
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #2269:
URL: https://github.com/apache/hive/pull/2269


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596181)
Remaining Estimate: 0h
Time Spent: 10m

> Metastore Catalog Methods JDO Persistence
> -
>
> Key: HIVE-25111
> URL: https://issues.apache.org/jira/browse/HIVE-25111
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25111) Metastore Catalog Methods JDO Persistence

2021-05-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25111:
-


> Metastore Catalog Methods JDO Persistence
> -
>
> Key: HIVE-25111
> URL: https://issues.apache.org/jira/browse/HIVE-25111
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25111) Metastore Catalog Methods JDO Persistence

2021-05-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-25111:
--
Priority: Minor  (was: Major)

> Metastore Catalog Methods JDO Persistence
> -
>
> Key: HIVE-25111
> URL: https://issues.apache.org/jira/browse/HIVE-25111
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25110) Upgrade JDO Persistence to Use DN5 Features

2021-05-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-25110:
-


> Upgrade JDO Persistence to Use DN5 Features
> ---
>
> Key: HIVE-25110
> URL: https://issues.apache.org/jira/browse/HIVE-25110
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Standalone Metastore
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> Hive has updated DataNucleus for Hive v4 but is not taking advantage of its 
> new features and paradigms.  There is a ton of code in Hive that can be 
> removed in favor of relying on the underlying libraries and their best 
> practices.
>  
> https://www.datanucleus.org/products/accessplatform_5_2/index.html
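
A sketch of the DN5-style idiom this work moves toward; the transaction
handling comes straight from the createCatalog diff quoted earlier in this
digest, while the wrapper method itself is illustrative, not a Hive API.

{code:java}
import javax.jdo.PersistenceManager;
import javax.jdo.Transaction;

public class JdoTxSketch {
  // Drive the transaction through pm.currentTransaction() and roll back in
  // finally only if it is still active, instead of tracking a committed flag.
  static void persist(PersistenceManager pm, Object entity) {
    final Transaction tx = pm.currentTransaction();
    try {
      tx.begin();
      pm.makePersistent(entity);
      tx.commit();
    } finally {
      if (tx.isActive()) {
        tx.rollback();
      }
    }
  }
}
{code}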



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=596130&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596130
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 13/May/21 13:51
Start Date: 13/May/21 13:51
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1998:
URL: https://github.com/apache/hive/pull/1998


   
   
   ### What changes were proposed in this pull request?
   
   
   
   
   ### Why are the changes needed?
   Currently we provide getQueryLog in HiveStatement to fetch the operation 
log, and the operation log is deleted when the operation closes (with a delay 
for canceled operations).  Sometimes it is not easy for the user (JDBC) or 
administrators to dig into the details of a finished (or failed) operation, so 
we present the operation log on the web UI and keep it for some time for later 
analysis.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   Disabled by default; set hive.server2.historic.operation.log.enabled = true 
to enable it.
   
   
   
   ### How was this patch tested?
   unit test / local machine
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596130)
Time Spent: 2h 50m  (was: 2h 40m)

> Show operation log at webui
> ---
>
> Key: HIVE-24802
> URL: https://issues.apache.org/jira/browse/HIVE-24802
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: operationlog.png
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently we provide getQueryLog in HiveStatement to fetch the operation 
> log, and the operation log is deleted when the operation closes (with a 
> delay for canceled operations).  Sometimes it is not easy for the user 
> (JDBC) or administrators to dig into the details of a finished (or failed) 
> operation, so we present the operation log on the web UI and keep it for 
> some time for later analysis.
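
A usage sketch of the existing client-side mechanism mentioned above, polling
the operation log over JDBC; the host and query are placeholders, and this
only works while the operation is still open, which is the gap the web UI
feature addresses.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.List;
import org.apache.hive.jdbc.HiveStatement;

public class QueryLogSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         HiveStatement stmt = (HiveStatement) conn.createStatement()) {
      stmt.executeAsync("SELECT count(*) FROM sample_table");
      // Poll the server-side operation log while the query runs; once the
      // operation closes, the log is deleted and can no longer be fetched.
      while (stmt.hasMoreLogs()) {
        List<String> logs = stmt.getQueryLog();
        logs.forEach(System.out::println);
        Thread.sleep(500);
      }
    }
  }
}
{code}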



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=596129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596129
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 13/May/21 13:50
Start Date: 13/May/21 13:50
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #1998:
URL: https://github.com/apache/hive/pull/1998


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596129)
Time Spent: 2h 40m  (was: 2.5h)

> Show operation log at webui
> ---
>
> Key: HIVE-24802
> URL: https://issues.apache.org/jira/browse/HIVE-24802
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: operationlog.png
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently we provide getQueryLog in HiveStatement to fetch the operation 
> log, and the operation log is deleted when the operation closes (with a 
> delay for canceled operations).  Sometimes it is not easy for the user 
> (JDBC) or administrators to dig into the details of a finished (or failed) 
> operation, so we present the operation log on the web UI and keep it for 
> some time for later analysis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24802) Show operation log at webui

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24802?focusedWorklogId=596127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596127
 ]

ASF GitHub Bot logged work on HIVE-24802:
-

Author: ASF GitHub Bot
Created on: 13/May/21 13:50
Start Date: 13/May/21 13:50
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1998:
URL: https://github.com/apache/hive/pull/1998#issuecomment-840575070


   trigger a new test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596127)
Time Spent: 2.5h  (was: 2h 20m)

> Show operation log at webui
> ---
>
> Key: HIVE-24802
> URL: https://issues.apache.org/jira/browse/HIVE-24802
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
> Attachments: operationlog.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently we provide getQueryLog in HiveStatement to fetch the operation 
> log, and the operation log is deleted when the operation closes (with a 
> delay for canceled operations).  Sometimes it is not easy for the user 
> (JDBC) or administrators to dig into the details of a finished (or failed) 
> operation, so we present the operation log on the web UI and keep it for 
> some time for later analysis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25109) CBO fails when updating table has constraints defined

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25109:
--
Labels: pull-request-available  (was: )

> CBO fails when updating table has constraints defined
> -
>
> Key: HIVE-25109
> URL: https://issues.apache.org/jira/browse/HIVE-25109
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> create table acid_uami_n0(i int,
>  de decimal(5,2) constraint nn1 not null enforced,
>  vc varchar(128) constraint ch2 CHECK (de >= cast(i as 
> decimal(5,2))) enforced)
>  clustered by (i) into 2 buckets stored as orc TBLPROPERTIES 
> ('transactional'='true');
> -- update
> explain cbo
> update acid_uami_n0 set de = 893.14 where de = 103.00;
> {code}
> hive.log
> {code}
> 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] 
> parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result 
> Schema didn't match Optimized Op Tree Schema
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:208)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeUpdate(UpdateDeleteSemanticAnalyzer.java:63)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyze(UpdateDeleteSemanticAnalyzer.java:53)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver

[jira] [Work logged] (HIVE-25109) CBO fails when updating table has constraints defined

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25109?focusedWorklogId=596105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596105
 ]

ASF GitHub Bot logged work on HIVE-25109:
-

Author: ASF GitHub Bot
Created on: 13/May/21 13:23
Start Date: 13/May/21 13:23
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #2268:
URL: https://github.com/apache/hive/pull/2268


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596105)
Remaining Estimate: 0h
Time Spent: 10m

> CBO fails when updating table has constraints defined
> -
>
> Key: HIVE-25109
> URL: https://issues.apache.org/jira/browse/HIVE-25109
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> create table acid_uami_n0(i int,
>  de decimal(5,2) constraint nn1 not null enforced,
>  vc varchar(128) constraint ch2 CHECK (de >= cast(i as 
> decimal(5,2))) enforced)
>  clustered by (i) into 2 buckets stored as orc TBLPROPERTIES 
> ('transactional'='true');
> -- update
> explain cbo
> update acid_uami_n0 set de = 893.14 where de = 103.00;
> {code}
> hive.log
> {code}
> 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] 
> parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result 
> Schema didn't match Optimized Op Tree Schema
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:208)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeUpdate(UpdateDeleteSemanticAnalyzer.java:63)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyze(UpdateDeleteSemanticAnalyzer.java:53)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-e

[jira] [Assigned] (HIVE-25109) CBO fails when updating table has constraints defined

2021-05-13 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-25109:
-


> CBO fails when updating table has constraints defined
> -
>
> Key: HIVE-25109
> URL: https://issues.apache.org/jira/browse/HIVE-25109
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Logical Optimizer
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> {code}
> create table acid_uami_n0(i int,
>  de decimal(5,2) constraint nn1 not null enforced,
>  vc varchar(128) constraint ch2 CHECK (de >= cast(i as 
> decimal(5,2))) enforced)
>  clustered by (i) into 2 buckets stored as orc TBLPROPERTIES 
> ('transactional'='true');
> -- update
> explain cbo
> update acid_uami_n0 set de = 893.14 where de = 103.00;
> {code}
> hive.log
> {code}
> 2021-05-13T06:08:05,547 ERROR [061f4d3b-9cbd-464f-80db-f0cd443dc3d7 main] 
> parse.UpdateDeleteSemanticAnalyzer: CBO failed, skipping CBO. 
> org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Result 
> Schema didn't match Optimized Op Tree Schema
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.renameTopLevelSelectInResultSchema(PlanModifierForASTConv.java:217)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.PlanModifierForASTConv.convertOpTree(PlanModifierForASTConv.java:105)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:119)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1410)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12488)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:449)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:67)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.reparseAndSuperAnalyze(UpdateDeleteSemanticAnalyzer.java:208)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeUpdate(UpdateDeleteSemanticAnalyzer.java:63)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyze(UpdateDeleteSemanticAnalyzer.java:53)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.RewriteSemanticAnalyzer.analyzeInternal(RewriteSemanticAnalyzer.java:72)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:316)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403) 
> [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.ree

[jira] [Resolved] (HIVE-25108) Do Not Log and Throw MetaExceptions

2021-05-13 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-25108.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master. Thanks [~mgergely] for the review!

> Do Not Log and Throw MetaExceptions
> ---
>
> Key: HIVE-25108
> URL: https://issues.apache.org/jira/browse/HIVE-25108
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> "Log and throw" is a bad pattern and leads to logging the same error multiple 
> times.
> There is code in Hive that explicitly implements this behavior and should 
> therefore be removed.
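
For illustration, a minimal, hypothetical example of the pattern and its fix; the class, openStore(), and connect() are stand-ins, not the actual Hive code:
{code:java}
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical illustration only; connect() is a stand-in for real work.
public class LogAndThrowExample {
  private static final Logger LOG = LoggerFactory.getLogger(LogAndThrowExample.class);

  // Anti-pattern: the failure is logged here and again by whichever caller
  // handles the MetaException, so the same error shows up twice in the logs.
  void openStore() throws MetaException {
    try {
      connect();
    } catch (Exception e) {
      LOG.error("Unable to open store", e);
      throw new MetaException(e.getMessage());
    }
  }

  // Preferred: carry the context in the exception and let a single top-level
  // handler decide where (and whether) to log it.
  void openStoreFixed() throws MetaException {
    try {
      connect();
    } catch (Exception e) {
      throw new MetaException("Unable to open store: " + e.getMessage());
    }
  }

  private void connect() { /* stand-in for the real connection logic */ }
}
{code}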



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25108) Do Not Log and Throw MetaExceptions

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25108?focusedWorklogId=596103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-596103
 ]

ASF GitHub Bot logged work on HIVE-25108:
-

Author: ASF GitHub Bot
Created on: 13/May/21 13:18
Start Date: 13/May/21 13:18
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #2265:
URL: https://github.com/apache/hive/pull/2265


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 596103)
Time Spent: 20m  (was: 10m)

> Do Not Log and Throw MetaExceptions
> ---
>
> Key: HIVE-25108
> URL: https://issues.apache.org/jira/browse/HIVE-25108
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> "Log and throw" is a bad pattern and leads to logging the same error multiple 
> times.
> There is code in Hive that explicitly implements this behavior and should 
> therefore be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?focusedWorklogId=595957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-595957
 ]

ASF GitHub Bot logged work on HIVE-24663:
-

Author: ASF GitHub Bot
Created on: 13/May/21 08:00
Start Date: 13/May/21 08:00
Worklog Time Spent: 10m 
  Work Description: maheshk114 opened a new pull request #2266:
URL: https://github.com/apache/hive/pull/2266


   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 595957)
Remaining Estimate: 0h
Time Spent: 10m

> Batch process in ColStatsProcessor for partitions.
> --
>
> Key: HIVE-24663
> URL: https://issues.apache.org/jira/browse/HIVE-24663
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: performance
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a large number of partitions (>20K) are processed, ColStatsProcessor 
> runs into DB issues. 
> {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours, and in 
> some cases Postgres stops processing. 
> It would be good to introduce small batches for stats gathering in 
> ColStatsProcessor instead of a bulk update.
> Ref: 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24663:
--
Labels: performance pull-request-available  (was: performance)

> Batch process in ColStatsProcessor for partitions.
> --
>
> Key: HIVE-24663
> URL: https://issues.apache.org/jira/browse/HIVE-24663
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a large number of partitions (>20K) are processed, ColStatsProcessor 
> runs into DB issues. 
> {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours, and in 
> some cases Postgres stops processing. 
> It would be good to introduce small batches for stats gathering in 
> ColStatsProcessor instead of a bulk update.
> Ref: 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24663) Batch process in ColStatsProcessor for partitions.

2021-05-13 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-24663:
---
Summary: Batch process in ColStatsProcessor for partitions.  (was: Batch 
process in ColStatsProcessor)

> Batch process in ColStatsProcessor for partitions.
> --
>
> Key: HIVE-24663
> URL: https://issues.apache.org/jira/browse/HIVE-24663
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: performance
>
> When a large number of partitions (>20K) are processed, ColStatsProcessor 
> runs into DB issues. 
> {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours, and in 
> some cases Postgres stops processing. 
> It would be good to introduce small batches for stats gathering in 
> ColStatsProcessor instead of a bulk update.
> Ref: 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24663) Batch process in ColStatsProcessor

2021-05-13 Thread mahesh kumar behera (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343755#comment-17343755
 ] 

mahesh kumar behera commented on HIVE-24663:


The original slowness is because of the way column stats are processed at HMS. 
The stats are updated one by one at HMS using JDO connections, which causes 
performance issues because JDO does a lot of conversion. So the proper fix is 
to batch the processing into single SQL statements and execute them using 
direct SQL.
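
A minimal sketch of that batching idea (BatchSink and the string row type are 
illustrative stand-ins, not actual HMS classes):
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: flush column stats in fixed-size groups, each group
// executed as one direct-SQL statement, instead of one-by-one JDO updates.
public class StatsBatcher {
  private static final int BATCH_SIZE = 1000;

  public interface BatchSink {
    // Would run a single multi-row INSERT/UPDATE for the whole batch.
    void flush(List<String> statsRows);
  }

  public void persist(List<String> statsRows, BatchSink sink) {
    List<String> batch = new ArrayList<>(BATCH_SIZE);
    for (String row : statsRows) {
      batch.add(row);
      if (batch.size() == BATCH_SIZE) {
        sink.flush(new ArrayList<>(batch)); // defensive copy per batch
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      sink.flush(new ArrayList<>(batch)); // final partial batch
    }
  }
}
{code}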

> Batch process in ColStatsProcessor
> --
>
> Key: HIVE-24663
> URL: https://issues.apache.org/jira/browse/HIVE-24663
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: performance
>
> When a large number of partitions (>20K) are processed, ColStatsProcessor 
> runs into DB issues. 
> {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours, and in 
> some cases Postgres stops processing. 
> It would be good to introduce small batches for stats gathering in 
> ColStatsProcessor instead of a bulk update.
> Ref: 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24663) Batch process in ColStatsProcessor

2021-05-13 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera reassigned HIVE-24663:
--

Assignee: mahesh kumar behera

> Batch process in ColStatsProcessor
> --
>
> Key: HIVE-24663
> URL: https://issues.apache.org/jira/browse/HIVE-24663
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: performance
>
> When a large number of partitions (>20K) are processed, ColStatsProcessor 
> runs into DB issues. 
> {{ db.setPartitionColumnStatistics(request);}} gets stuck for hours, and in 
> some cases Postgres stops processing. 
> It would be good to introduce small batches for stats gathering in 
> ColStatsProcessor instead of a bulk update.
> Ref: 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L181
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L199



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25107) Thread classpath logging should be optional/DEBUG

2021-05-13 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-25107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343749#comment-17343749
 ] 

László Bodor commented on HIVE-25107:
-

[~zmatyus]: thanks for the input. I agree that we can lower it to DEBUG in the 
other cases too; I'm going to handle it!
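
In case it helps, a hedged sketch of the direction (class and method names are 
illustrative, not the actual HIVE-21584 code):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch: build and emit the expensive classpath dump only when
// DEBUG is enabled, so INFO-level runs pay neither the logging cost nor the
// string-construction cost.
public class ClasspathLogging {
  private static final Logger LOG = LoggerFactory.getLogger(ClasspathLogging.class);

  public static void logThreadClasspathIfDebug() {
    if (LOG.isDebugEnabled()) {
      ClassLoader cl = Thread.currentThread().getContextClassLoader();
      LOG.debug("Thread context classloader chain: {}", describe(cl));
    }
  }

  private static String describe(ClassLoader cl) {
    StringBuilder sb = new StringBuilder();
    for (ClassLoader c = cl; c != null; c = c.getParent()) {
      sb.append(c).append(" -> "); // walk up the parent chain
    }
    return sb.append("bootstrap").toString();
  }
}
{code}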

> Thread classpath logging should be optional/DEBUG
> -
>
> Key: HIVE-25107
> URL: https://issues.apache.org/jira/browse/HIVE-25107
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> This is since HIVE-21584
> I have a *72M* LLAP executor log file; grepping it for just "thread class 
> path" and piping the matches into a separate file produced a *22M* file. 
> 1/3 to 1/4 of the file was classpath info, which is not usable most of the 
> time. This overwhelming amount of classpath info is not needed: assuming that 
> classpath issues are reproducible with more or less effort, the user should 
> be responsible for turning on this expensive logging on demand. Not to 
> mention the performance implications, which cannot be ignored beyond a 
> certain volume of log messages.
> https://github.com/apache/hive/commit/a234475faa2cab2606f2a74eb9ca071f006998e2#diff-44b2ff3a3c4a6cfcaed0fcb40b74031844f8586e40a6f8261637e5ebcd558b73R4577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24948) Enhancing performance of OrcInputFormat.getSplits with bucket pruning

2021-05-13 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-24948:

Description: 
The summarized flow of generating input splits at the Tez AM is as below (by 
calling HiveSplitGenerator.initialize()):
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L260-L260]
 # Perform bucket pruning with the list above if it's possible
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L299-L301]

But I observed that step 2, getting the list of InputSplits, can create big 
overhead when the inputs are ORC files in HDFS.

For example, there is an ORC table T partitioned by 'log_date', and each 
partition is bucketed by a column 'q'. There are 240 buckets in each partition, 
and the size of each bucket (ORC file) is, let's say, 100MB.

The SQL is like this:
{noformat}
set hive.tez.bucket.pruning=true;
select count(*) from T
where log_date between '2020-01-01' and '2020-06-30'
and q = 'foobar';{noformat}
It means there are 240 * 183 (days) = 43680 ORC files in the input paths, but 
thanks to bucket pruning, only 183 files should be processed.

In my company's environment, the whole processing time of the SQL was roughly 5 
minutes. However, I've checked that it took more than 3 minutes to build the 
list of OrcSplits for the 43680 ORC files. The logs with 
tez.am.log.level=DEBUG showed the following:
{noformat}
2021-03-25 01:21:31,850 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits started
...
2021-03-25 01:24:51,435 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits finished
2021-03-25 01:24:51,444 [INFO] [InputInitializer {Map 1} #0] 
|io.HiveInputFormat|: number of splits 43680
2021-03-25 01:24:51,444 [DEBUG] [InputInitializer {Map 1} #0] |log.PerfLogger|: 

...
2021-03-25 01:26:03,385 [INFO] [Dispatcher thread {Central}] 
|app.DAGAppMaster|: DAG completed, dagId=dag_1615862187190_731117_1, 
dagState=SUCCEEDED {noformat}
The 43680 - 183 = 43497 InputSplits, which consume about 60% of the entire 
processing time, are simply discarded by step 3, pruneBuckets().

With bucket pruning, I think building the whole list of ORC InputSplits is not 
necessary.

Therefore, I suggest that the flow be like this (a rough sketch follows the 
list):
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
 ## OrcInputFormat.getSplits() returns the bucket-pruned list if BitSet from 
FixedBucketPruningOptimizer exists
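
For illustration only, a rough sketch of what step 2.1 could look like, 
assuming the bucket id can be parsed from the file name; the class and method 
names below are hypothetical, not the actual OrcInputFormat API:
{code:java}
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustrative sketch: filter the candidate bucket files against the pruning
// BitSet *before* split generation, so the 43497 doomed splits are never
// built at all.
public class BucketPrunedSplitListing {

  static List<String> survivingFiles(List<String> bucketFiles, BitSet includedBuckets) {
    if (includedBuckets == null) {
      return bucketFiles; // no pruning info available; keep everything
    }
    List<String> result = new ArrayList<>();
    for (String file : bucketFiles) {
      if (includedBuckets.get(bucketIdOf(file))) {
        result.add(file); // only surviving buckets reach split generation
      }
    }
    return result;
  }

  // Assumes the Hive bucket-file naming convention, e.g. "000042_0", where
  // the digits before '_' are the bucket id.
  static int bucketIdOf(String fileName) {
    return Integer.parseInt(fileName.substring(0, fileName.indexOf('_')));
  }
}
{code}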

  was:
The summarized flow of generating input splits at the Tez AM is as below (by 
calling HiveSplitGenerator.initialize()):
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L260-L260]
 # Perform bucket pruning with the list above if it's possible
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L299-L301]

But I observed that step 2, getting the list of InputSplits, can create big 
overhead when the inputs are ORC files in HDFS.

For example, there is an ORC table T partitioned by 'log_date', and each 
partition is bucketed by a column 'q'. There are 240 buckets in each partition, 
and the size of each bucket (ORC file) is, let's say, 100MB.

The SQL is like this:
{noformat}
set hive.tez.bucket.pruning=true;
select q, count(*) from T
where log_date between '2020-01-01' and '2020-06-30'
and q = 'foobar'
group by q;{noformat}
It means there are 240 * 183 (days) = 43680 ORC files in the input paths, but 
thanks to bucket pruning, only 183 files should be processed.

In my company's environment, the whole processing time of the SQL was roughly 5 
minutes. However, I've checked that it took more than 3 minutes to build the 
list of OrcSplits for the 43680 ORC files. The logs with 
tez.am.log.level=DEBUG showed the following:
{noformat}
2021-03-25 01:21:31,850 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits started
...
2021-03-25 01:24:51,435 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits finished
2021-03-25 01:24:51,444 [INFO] [InputInitializer {Map 1} #0] 
|io.HiveInputFormat|: number of splits 43680
2021-03-25 01:24:51,444 [DEBUG] [InputInitializer {Map 1} #0] |log.PerfLogger|: 


[jira] [Updated] (HIVE-24948) Enhancing performance of OrcInputFormat.getSplits with bucket pruning

2021-05-13 Thread Eugene Chung (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Chung updated HIVE-24948:

Description: 
The summarized flow of generating input splits at the Tez AM is as below (by 
calling HiveSplitGenerator.initialize()):
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L260-L260]
 # Perform bucket pruning with the list above if it's possible
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L299-L301]

But I observed that step 2, getting the list of InputSplits, can create big 
overhead when the inputs are ORC files in HDFS.

For example, there is an ORC table T partitioned by 'log_date', and each 
partition is bucketed by a column 'q'. There are 240 buckets in each partition, 
and the size of each bucket (ORC file) is, let's say, 100MB.

The SQL is like this:
{noformat}
set hive.tez.bucket.pruning=true;
select q, count(*) from T
where log_date between '2020-01-01' and '2020-06-30'
and q = 'foobar'
group by q;{noformat}
It means there are 240 * 183 (days) = 43680 ORC files in the input paths, but 
thanks to bucket pruning, only 183 files should be processed.

In my company's environment, the whole processing time of the SQL was roughly 5 
minutes. However, I've checked that it took more than 3 minutes to build the 
list of OrcSplits for the 43680 ORC files. The logs with 
tez.am.log.level=DEBUG showed the following:
{noformat}
2021-03-25 01:21:31,850 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits started
...
2021-03-25 01:24:51,435 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits finished
2021-03-25 01:24:51,444 [INFO] [InputInitializer {Map 1} #0] 
|io.HiveInputFormat|: number of splits 43680
2021-03-25 01:24:51,444 [DEBUG] [InputInitializer {Map 1} #0] |log.PerfLogger|: 

...
2021-03-25 01:26:03,385 [INFO] [Dispatcher thread {Central}] 
|app.DAGAppMaster|: DAG completed, dagId=dag_1615862187190_731117_1, 
dagState=SUCCEEDED {noformat}
The 43680 - 183 = 43497 InputSplits, which consume about 60% of the entire 
processing time, are simply discarded by step 3, pruneBuckets().

With bucket pruning, I think building the whole list of ORC InputSplits is not 
necessary.

Therefore, I suggest that the flow be like this:
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
 ## OrcInputFormat.getSplits() returns the bucket-pruned list if BitSet from 
FixedBucketPruningOptimizer exists

  was:
The summarized flow of generating input splits at the Tez AM is as below (by 
calling HiveSplitGenerator.initialize()):
 # Perform dynamic partition pruning
 # Get the list of InputSplit by calling InputFormat.getSplits()
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L260-L260]
 # Perform bucket pruning with the list above if it's possible
 
[https://github.com/apache/hive/blob/624f62aadc08577cafaa299cfcf17c71fa6cdb3a/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L299-L301]

But I observed that step 2, getting the list of InputSplits, can create big 
overhead when the inputs are ORC files in HDFS.

For example, there is an ORC table T partitioned by 'log_date', and each 
partition is bucketed by a column 'q'. There are 240 buckets in each partition, 
and the size of each bucket (ORC file) is, let's say, 100MB.

The SQL is like this:
{noformat}
set hive.tez.bucket.pruning=true;
select q, count(*) from T
where log_date between '2020-01-01' and '2020-06-30'
and q = 'foobar'
group by q;{noformat}
It means there are 240 * 183 (days) = 43680 ORC files in the input path, but 
thanks to bucket pruning, only 183 files should be processed.

In my company's environment, the whole processing time of the SQL was roughly 5 
minutes. However, I've checked that it took more than 3 minutes to build the 
list of OrcSplits for the 43680 ORC files. The logs with 
tez.am.log.level=DEBUG showed the following:
{noformat}
2021-03-25 01:21:31,850 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits started
...
2021-03-25 01:24:51,435 [DEBUG] [InputInitializer {Map 1} #0] 
|orc.OrcInputFormat|: getSplits finished
2021-03-25 01:24:51,444 [INFO] [InputInitializer {Map 1} #0] 
|io.HiveInputFormat|: number of splits 43680
2021-03-25 01:24:51,444 [DEBUG] [InputInitializer {Map 1} #0] |log.PerfLogger|: