[jira] [Resolved] (HIVE-25847) Unable to run hive issue

2022-01-04 Thread Tushar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tushar  resolved HIVE-25847.

Release Note: Issue resolved 
  Resolution: Fixed

> Unable to run hive issue
> 
>
> Key: HIVE-25847
> URL: https://issues.apache.org/jira/browse/HIVE-25847
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Tushar 
>Assignee: Tushar 
>Priority: Minor
>
> Unable to connect to Hive



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25847) Unable to run hive issue

2022-01-04 Thread Tushar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469044#comment-17469044
 ] 

Tushar  commented on HIVE-25847:


RCA: issue resolved.

> Unable to run hive issue
> 
>
> Key: HIVE-25847
> URL: https://issues.apache.org/jira/browse/HIVE-25847
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Tushar 
>Assignee: Tushar 
>Priority: Minor
>
> Unable to connect to Hive





[jira] [Assigned] (HIVE-25847) Unable to run hive issue

2022-01-04 Thread Tushar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tushar  reassigned HIVE-25847:
--

Assignee: Tushar 

> Unable to run hive issue
> 
>
> Key: HIVE-25847
> URL: https://issues.apache.org/jira/browse/HIVE-25847
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Tushar 
>Assignee: Tushar 
>Priority: Minor
>
> Unable to connect to Hive





[jira] [Updated] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-21100:
--
Labels: pull-request-available  (was: )

> Allow flattening of table subdirectories resulted when using TEZ engine and 
> UNION clause
> 
>
> Key: HIVE-21100
> URL: https://issues.apache.org/jira/browse/HIVE-21100
> Project: Hive
>  Issue Type: Improvement
>Reporter: George Pachitariu
>Assignee: George Pachitariu
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, 
> HIVE-21100.3.patch, HIVE-21100.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and a UNION ALL 
> clause is the last step of the query, Hive on Tez creates a subdirectory for 
> each branch of the UNION ALL.
> With this patch, the subdirectories are removed, and the files are renamed and 
> moved to the parent directory.
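The flattening described above can be sketched on a local filesystem with `java.nio.file`. This is not the code from the patch (which would operate through Hadoop's FileSystem API); the `UnionSubdirFlattener` class, the `HIVE_UNION_SUBDIR_` directory prefix used in the example, and the `<branch>_<file>` renaming scheme are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper: local-filesystem sketch of flattening UNION ALL branch subdirectories. */
public final class UnionSubdirFlattener {

    private UnionSubdirFlattener() {}

    /** Lists the immediate children of dir accepted by the filter. */
    private static List<Path> children(Path dir, DirectoryStream.Filter<Path> filter) throws IOException {
        List<Path> result = new ArrayList<>();
        // Collect first so that nothing is moved while the directory is still being iterated.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, filter)) {
            for (Path entry : entries) {
                result.add(entry);
            }
        }
        return result;
    }

    /** Moves files from every branch subdirectory named prefix* up into tableDir; returns the new paths. */
    public static List<Path> flatten(Path tableDir, String prefix) throws IOException {
        List<Path> moved = new ArrayList<>();
        List<Path> branches = children(tableDir,
                p -> Files.isDirectory(p) && p.getFileName().toString().startsWith(prefix));
        for (Path branch : branches) {
            String branchName = branch.getFileName().toString();
            for (Path file : children(branch, Files::isRegularFile)) {
                // Prefix with the branch name so same-named files from different branches cannot collide.
                Path target = tableDir.resolve(branchName + "_" + file.getFileName());
                moved.add(Files.move(file, target)); // throws FileAlreadyExistsException if target exists
            }
            // Delete the branch directory only if nothing (e.g. a nested directory) is left in it.
            boolean empty;
            try (Stream<Path> rest = Files.list(branch)) {
                empty = !rest.findAny().isPresent();
            }
            if (empty) {
                Files.delete(branch);
            }
        }
        return moved;
    }
}
```

Renaming with the branch name is just one way to keep file names unique once several branches share a directory; the real patch may use a different scheme.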





[jira] [Work logged] (HIVE-21100) Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=703743=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703743
 ]

ASF GitHub Bot logged work on HIVE-21100:
-

Author: ASF GitHub Bot
Created on: 05/Jan/22 05:15
Start Date: 05/Jan/22 05:15
Worklog Time Spent: 10m 
  Work Description: hsnusonic opened a new pull request #2921:
URL: https://github.com/apache/hive/pull/2921


   
   
   ### What changes were proposed in this pull request?
   
   A new flag, hive.tez.union.flatten.subdirectories, is added. If the flag is 
true, the UNION subdirectories are removed.
   
   ### Why are the changes needed?
   
   Additional directories add a burden to the file system.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. If hive.tez.union.flatten.subdirectories is set to true, the resulting 
file structure after UNION ALL is different.
   
   ### How was this patch tested?
   
   A new unit test, TestMoveTask, is added to ensure that 
`flattenUnionSubdirectories()` works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 703743)
Remaining Estimate: 0h
Time Spent: 10m

> Allow flattening of table subdirectories resulted when using TEZ engine and 
> UNION clause
> 
>
> Key: HIVE-21100
> URL: https://issues.apache.org/jira/browse/HIVE-21100
> Project: Hive
>  Issue Type: Improvement
>Reporter: George Pachitariu
>Assignee: George Pachitariu
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, 
> HIVE-21100.3.patch, HIVE-21100.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with the Tez engine and a UNION ALL 
> clause is the last step of the query, Hive on Tez creates a subdirectory for 
> each branch of the UNION ALL.
> With this patch, the subdirectories are removed, and the files are renamed and 
> moved to the parent directory.





[jira] [Work logged] (HIVE-25783) Enforce ASF headers on Metastore

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25783?focusedWorklogId=703730=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703730
 ]

ASF GitHub Bot logged work on HIVE-25783:
-

Author: ASF GitHub Bot
Created on: 05/Jan/22 04:15
Start Date: 05/Jan/22 04:15
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2852:
URL: https://github.com/apache/hive/pull/2852#discussion_r778534809



##
File path: standalone-metastore/metastore-common/pom.xml
##
@@ -530,84 +499,6 @@
   
 
   
-  

Review comment:
   Same as above: we do not need integration tests in `metastore-common`. 
This plugin should probably be placed in `metastore-server`, where it already 
exists.

##
File path: standalone-metastore/metastore-common/pom.xml
##
@@ -445,21 +436,6 @@
 org.apache.maven.plugins
 maven-antrun-plugin
 
-  

Review comment:
   `metastore-common` provides the common utilities and APIs for the metastore, 
and this module has only one test class, 
[RetryTest](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/test/java/org/apache/hadoop/hive/metastore/utils/RetryTest.java),
 which can be run without any additional requirements.

##
File path: 
standalone-metastore/metastore-common/src/main/resources/datanucleus-log4j.properties
##
@@ -1,17 +0,0 @@
-# Define the destination and format of our logging
-log4j.appender.A1=org.apache.log4j.FileAppender

Review comment:
   I do not see any usage of this file in `metastore-common`, so I removed it.

##
File path: standalone-metastore/metastore-server/pom.xml
##
@@ -514,26 +514,6 @@
   
 
   
-  

Review comment:
   There is no protobuf file in `metastore-server`; this should be in 
`metastore-common` instead.

##
File path: standalone-metastore/metastore-common/pom.xml
##
@@ -404,15 +404,6 @@
   
 
   
-

Review comment:
   There is no package.jdo file under the `metastore-common` directory.

##
File path: standalone-metastore/metastore-common/pom.xml
##
@@ -640,22 +531,6 @@
   
 
   
-  

Review comment:
   There is no need for ANTLR to parse the grammar file in `metastore-common`; 
perhaps it is needed in `metastore-server` for parsing `Filter.g`: 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/Filter.g.

##
File path: 
standalone-metastore/metastore-common/src/main/resources/metastore-site.xml
##
@@ -1,34 +0,0 @@
-
-
> Key: HIVE-25783
> URL: https://issues.apache.org/jira/browse/HIVE-25783
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This Jira investigates whether we can add a RAT check to the CI to make sure 
> that newly added source files contain the ASF license information. 





[jira] [Work logged] (HIVE-25783) Enforce ASF headers on Metastore

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25783?focusedWorklogId=703731=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703731
 ]

ASF GitHub Bot logged work on HIVE-25783:
-

Author: ASF GitHub Bot
Created on: 05/Jan/22 04:15
Start Date: 05/Jan/22 04:15
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on a change in pull request #2852:
URL: https://github.com/apache/hive/pull/2852#discussion_r778531171



##
File path: standalone-metastore/metastore-common/pom.xml
##
@@ -640,22 +531,6 @@
   
 
   
-  

Review comment:
   There is no need for ANTLR to parse the grammar file in `metastore-common`; 
perhaps it is needed in `metastore-server` for parsing `Filter.g`: 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/Filter.g.






Issue Time Tracking
---

Worklog Id: (was: 703731)
Time Spent: 50m  (was: 40m)

> Enforce ASF headers on Metastore
> 
>
> Key: HIVE-25783
> URL: https://issues.apache.org/jira/browse/HIVE-25783
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This Jira investigates whether we can add a RAT check to the CI to make sure 
> that newly added source files contain the ASF license information. 
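In a Maven build the check this issue asks for would normally come from Apache RAT (for example via the apache-rat-plugin). To show what such a check does, here is a toy header audit in plain Java: it walks a source tree and reports files whose first lines lack the opening phrase of the standard ASF header. The class name and the 20-line search window are assumptions; this does not reproduce RAT's behavior or its exclusion rules.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy license-header audit; a stand-in for, not a reimplementation of, Apache RAT. */
public final class AsfHeaderCheck {

    // Opening phrase of the standard ASF source header.
    static final String MARKER = "Licensed to the Apache Software Foundation (ASF)";
    // Assumption: the header must appear within the first 20 lines of a file.
    static final int HEAD_LINES = 20;

    private AsfHeaderCheck() {}

    static boolean hasHeader(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.limit(HEAD_LINES).anyMatch(line -> line.contains(MARKER));
        }
    }

    /** Returns, in sorted order, the files under root ending with suffix that lack the marker. */
    public static List<Path> missingHeaders(Path root, String suffix) throws IOException {
        List<Path> candidates;
        try (Stream<Path> walk = Files.walk(root)) {
            candidates = walk.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(suffix))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> missing = new ArrayList<>();
        for (Path file : candidates) {
            if (!hasHeader(file)) {
                missing.add(file);
            }
        }
        return missing;
    }
}
```

A CI job would fail the build when `missingHeaders` returns a non-empty list.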





[jira] [Updated] (HIVE-25846) Ensure that deregistering hive servers works, even after zookeeper session expired

2022-01-04 Thread Jeongdae Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeongdae Kim updated HIVE-25846:

Description: ZooKeeper watchers are one-time triggers. When a ZooKeeper 
session expires (for example, because of a long GC pause) and is then 
reconnected, all previously registered watchers are gone, so we should add the 
deregister watchers again to keep getting notifications.  (was: Zookeeper watchers are one time trigger and 
when zookeeper session is expired by long gc pause or something, we should add 
deregister watchers again to get notification.)

> Ensure that deregistering hive servers works, even after zookeeper session 
> expired
> --
>
> Key: HIVE-25846
> URL: https://issues.apache.org/jira/browse/HIVE-25846
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8, 3.1.2
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ZooKeeper watchers are one-time triggers. When a ZooKeeper session expires 
> (for example, because of a long GC pause) and is then reconnected, all 
> previously registered watchers are gone, so we should add the deregister 
> watchers again to keep getting notifications.
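The failure mode described above can be shown with a toy, in-memory model. This is not the ZooKeeper or Curator API: `ToyCoordinator`, `DeregisterWatchDemo`, and the znode path are hypothetical. Watches fire once, a session expiry drops all of them, and only a client that re-registers its deregister watcher on reconnect sees the later node deletion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public final class DeregisterWatchDemo {

    /** Toy coordination service with one-shot watches; hypothetical, not the ZooKeeper API. */
    static final class ToyCoordinator {
        private final Map<String, List<Consumer<String>>> deleteWatches = new HashMap<>();
        private final List<Runnable> reconnectListeners = new ArrayList<>();

        void watchDeletion(String path, Consumer<String> watcher) {
            deleteWatches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
        }

        void onReconnect(Runnable listener) {
            reconnectListeners.add(listener);
        }

        /** Session expiry drops every registered watch; reconnect listeners then run. */
        void expireSessionAndReconnect() {
            deleteWatches.clear();
            reconnectListeners.forEach(Runnable::run);
        }

        /** Deleting a node fires its watches once and then forgets them (one-time trigger). */
        void delete(String path) {
            List<Consumer<String>> fired = deleteWatches.remove(path);
            if (fired != null) {
                fired.forEach(w -> w.accept(path));
            }
        }
    }

    /** Returns the paths the "server" was notified about after a session expiry. */
    public static List<String> run(boolean reregisterOnReconnect) {
        ToyCoordinator zk = new ToyCoordinator();
        List<String> deregistered = new ArrayList<>();
        String node = "/hiveserver2/instance-1"; // hypothetical znode path
        Consumer<String> deregisterWatcher = deregistered::add;

        zk.watchDeletion(node, deregisterWatcher);
        if (reregisterOnReconnect) {
            zk.onReconnect(() -> zk.watchDeletion(node, deregisterWatcher));
        }
        zk.expireSessionAndReconnect();
        zk.delete(node);
        return deregistered;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints []
        System.out.println(run(true));  // prints [/hiveserver2/instance-1]
    }
}
```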





[jira] [Work started] (HIVE-25846) Ensure that deregistering hive servers works, even after zookeeper session expired

2022-01-04 Thread Jeongdae Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-25846 started by Jeongdae Kim.
---
> Ensure that deregistering hive servers works, even after zookeeper session 
> expired
> --
>
> Key: HIVE-25846
> URL: https://issues.apache.org/jira/browse/HIVE-25846
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8, 3.1.2
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ZooKeeper watchers are one-time triggers, so when a ZooKeeper session expires 
> (for example, because of a long GC pause), we should add the deregister 
> watchers again to keep getting notifications.





[jira] [Work logged] (HIVE-25846) Ensure that deregistering hive servers works, even after zookeeper session expired

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25846?focusedWorklogId=703727=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703727
 ]

ASF GitHub Bot logged work on HIVE-25846:
-

Author: ASF GitHub Bot
Created on: 05/Jan/22 04:04
Start Date: 05/Jan/22 04:04
Worklog Time Spent: 10m 
  Work Description: JeongDaeKim opened a new pull request #2920:
URL: https://github.com/apache/hive/pull/2920


   
   
   
   ### What changes were proposed in this pull request?
   
   
   Register the deregister watcher again when the ZooKeeper session is re-established.
   
   ### Why are the changes needed?
   
   
   After a ZooKeeper session expires, Hive servers cannot get the node-deleted 
event from ZooKeeper, even though they already registered a deregister watcher. 
Therefore, Hive servers cannot be deregistered once the session has expired.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   
   A test was added.
   




Issue Time Tracking
---

Worklog Id: (was: 703727)
Remaining Estimate: 0h
Time Spent: 10m

> Ensure that deregistering hive servers works, even after zookeeper session 
> expired
> --
>
> Key: HIVE-25846
> URL: https://issues.apache.org/jira/browse/HIVE-25846
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8, 3.1.2
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ZooKeeper watchers are one-time triggers, so when a ZooKeeper session expires 
> (for example, because of a long GC pause), we should add the deregister 
> watchers again to keep getting notifications.





[jira] [Updated] (HIVE-25846) Ensure that deregistering hive servers works, even after zookeeper session expired

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25846:
--
Labels: pull-request-available  (was: )

> Ensure that deregistering hive servers works, even after zookeeper session 
> expired
> --
>
> Key: HIVE-25846
> URL: https://issues.apache.org/jira/browse/HIVE-25846
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8, 3.1.2
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ZooKeeper watchers are one-time triggers, so when a ZooKeeper session expires 
> (for example, because of a long GC pause), we should add the deregister 
> watchers again to keep getting notifications.





[jira] [Updated] (HIVE-25846) Ensure that deregistering hive servers works, even after zookeeper session expired

2022-01-04 Thread Jeongdae Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeongdae Kim updated HIVE-25846:

Description: ZooKeeper watchers are one-time triggers, so when a ZooKeeper 
session expires (for example, because of a long GC pause), we should add the 
deregister watchers again to keep getting notifications.  (was: Zookeeper watchers are one time 
trigger and when zookeeper session is expired by long gc pause or something, we 
should add reregister watchers again to get notification.)

> Ensure that deregistering hive servers works, even after zookeeper session 
> expired
> --
>
> Key: HIVE-25846
> URL: https://issues.apache.org/jira/browse/HIVE-25846
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8, 3.1.2
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>
> ZooKeeper watchers are one-time triggers, so when a ZooKeeper session expires 
> (for example, because of a long GC pause), we should add the deregister 
> watchers again to keep getting notifications.





[jira] [Updated] (HIVE-25846) Ensure that deregistering hive servers works, even after zookeeper session expired

2022-01-04 Thread Jeongdae Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeongdae Kim updated HIVE-25846:

Description: ZooKeeper watchers are one-time triggers, so when a ZooKeeper 
session expires (for example, because of a long GC pause), we should add the 
reregister watchers again to keep getting notifications.  (was: Zookeeper watchers are one time 
trigger and when zookeeper sessions is expired by long gc pause or something, 
we should add reregister watchers again to get notification.)

> Ensure that deregistering hive servers works, even after zookeeper session 
> expired
> --
>
> Key: HIVE-25846
> URL: https://issues.apache.org/jira/browse/HIVE-25846
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.8, 3.1.2
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>
> ZooKeeper watchers are one-time triggers, so when a ZooKeeper session expires 
> (for example, because of a long GC pause), we should add the reregister 
> watchers again to keep getting notifications.





[jira] [Assigned] (HIVE-25846) Ensure that deregistering hive servers works, even after zookeeper session expired

2022-01-04 Thread Jeongdae Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeongdae Kim reassigned HIVE-25846:
---


> Ensure that deregistering hive servers works, even after zookeeper session 
> expired
> --
>
> Key: HIVE-25846
> URL: https://issues.apache.org/jira/browse/HIVE-25846
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.2, 2.3.8
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>
> ZooKeeper watchers are one-time triggers, so when a ZooKeeper session expires 
> (for example, because of a long GC pause), we should add the reregister 
> watchers again to keep getting notifications.





[jira] [Updated] (HIVE-25623) Create a parametrized test to check against the disabled MIN_HISTORY config

2022-01-04 Thread Mark Bathori (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Bathori updated HIVE-25623:

Status: Patch Available  (was: Open)

> Create a parametrized test to check against the disabled MIN_HISTORY config
> ---
>
> Key: HIVE-25623
> URL: https://issues.apache.org/jira/browse/HIVE-25623
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Mark Bathori
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, all test cases in TestDbTxnManager2/TestCommands(X) run with the 
> MIN_HISTORY config enabled. We should also execute them for the scenario where 
> the MIN_HISTORY_LEVEL table is missing.
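In JUnit this is usually done with a parameterized runner (for example JUnit 4's `Parameterized`). The dependency-free sketch below shows only the underlying idea: the same test body runs once per configuration value, and failures are reported per value. `ParametrizedRunner` and the boolean parameter standing in for the MIN_HISTORY setting are hypothetical; the placeholder body does not touch any real transaction manager.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public final class ParametrizedRunner {

    private ParametrizedRunner() {}

    /** Runs the same test body once per parameter value and reports each failing value. */
    public static <P> List<String> runForEach(List<P> params, Consumer<P> testBody) {
        List<String> failures = new ArrayList<>();
        for (P param : params) {
            try {
                testBody.accept(param);
            } catch (AssertionError | RuntimeException e) {
                failures.add(param + ": " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Hypothetical parameter: true = MIN_HISTORY_LEVEL table present, false = table missing.
        List<Boolean> minHistoryEnabled = Arrays.asList(true, false);
        List<String> failures = runForEach(minHistoryEnabled, enabled -> {
            // Placeholder body; a real test would configure the transaction manager
            // for this setting and then run the existing assertions.
            System.out.println("running with MIN_HISTORY enabled = " + enabled);
        });
        System.out.println(failures.isEmpty() ? "all configurations passed" : failures.toString());
    }
}
```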





[jira] [Work logged] (HIVE-25623) Create a parametrized test to check against the disabled MIN_HISTORY config

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25623?focusedWorklogId=703459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703459
 ]

ASF GitHub Bot logged work on HIVE-25623:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 18:17
Start Date: 04/Jan/22 18:17
Worklog Time Spent: 10m 
  Work Description: mbathori-cloudera commented on pull request #2835:
URL: https://github.com/apache/hive/pull/2835#issuecomment-1005058783


   The TestNegativeLlapLocalCliDriver [authorization_import_ptn] test failure 
seems to be intermittent; the test passes locally.




Issue Time Tracking
---

Worklog Id: (was: 703459)
Time Spent: 20m  (was: 10m)

> Create a parametrized test to check against the disabled MIN_HISTORY config
> ---
>
> Key: HIVE-25623
> URL: https://issues.apache.org/jira/browse/HIVE-25623
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Mark Bathori
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, all test cases in TestDbTxnManager2/TestCommands(X) run with the 
> MIN_HISTORY config enabled. We should also execute them for the scenario where 
> the MIN_HISTORY_LEVEL table is missing.





[jira] [Work logged] (HIVE-25667) Unify code managing JDBC databases in tests

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25667?focusedWorklogId=703452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703452
 ]

ASF GitHub Bot logged work on HIVE-25667:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 17:54
Start Date: 04/Jan/22 17:54
Worklog Time Spent: 10m 
  Work Description: mbathori-cloudera opened a new pull request #2919:
URL: https://github.com/apache/hive/pull/2919


   ### What changes were proposed in this pull request?
   Currently there are two class hierarchies managing JDBC databases in tests: 
DatabaseRule and AbstractExternalDB. These changes are meant to unify this 
database-related functionality and get rid of the duplicated code.
   
   
   ### Why are the changes needed?
   The current solution is redundant and contains a lot of duplicated code. 
Unified database handling makes the code easier to extend and modify.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   All the existing tests that cover the affected database functionality pass 
with the same expected results.
   




Issue Time Tracking
---

Worklog Id: (was: 703452)
Remaining Estimate: 0h
Time Spent: 10m

> Unify code managing JDBC databases in tests
> ---
>
> Key: HIVE-25667
> URL: https://issues.apache.org/jira/browse/HIVE-25667
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Mark Bathori
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently there are two class hierarchies managing JDBC databases in tests, 
> [DatabaseRule|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java]
>  and 
> [AbstractExternalDB|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java].
>  There are many similarities between these hierarchies and certain parts are 
> duplicated. 
> The goal of this JIRA is to refactor the aforementioned hierarchies to reduce 
> code duplication and improve extensibility.
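One common way to remove this kind of duplication is the template-method pattern: shared logic is written once in a base class, and each database subclass supplies only what differs. The sketch below is hypothetical; `ExternalDatabase`, `Postgres`, and `MySql` are not the actual DatabaseRule/AbstractExternalDB classes, and JDBC URL construction stands in for whatever logic the real hierarchies share.

```java
import java.util.Arrays;
import java.util.List;

public final class UnifiedDbSketch {

    private UnifiedDbSketch() {}

    /** Hypothetical single base class; subclasses supply only what differs per database. */
    public abstract static class ExternalDatabase {
        protected abstract String scheme();   // JDBC sub-protocol, e.g. "postgresql"
        protected abstract int defaultPort();

        /** Shared logic lives here once instead of being duplicated in two hierarchies. */
        public final String jdbcUrl(String host, String dbName) {
            return "jdbc:" + scheme() + "://" + host + ":" + defaultPort() + "/" + dbName;
        }
    }

    public static final class Postgres extends ExternalDatabase {
        @Override protected String scheme() { return "postgresql"; }
        @Override protected int defaultPort() { return 5432; }
    }

    public static final class MySql extends ExternalDatabase {
        @Override protected String scheme() { return "mysql"; }
        @Override protected int defaultPort() { return 3306; }
    }

    public static void main(String[] args) {
        List<ExternalDatabase> dbs = Arrays.asList(new Postgres(), new MySql());
        for (ExternalDatabase db : dbs) {
            // prints jdbc:postgresql://localhost:5432/metastore, then jdbc:mysql://localhost:3306/metastore
            System.out.println(db.jdbcUrl("localhost", "metastore"));
        }
    }
}
```

Adding another database then means one small subclass instead of changes in two parallel hierarchies.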





[jira] [Updated] (HIVE-25667) Unify code managing JDBC databases in tests

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25667:
--
Labels: pull-request-available  (was: )

> Unify code managing JDBC databases in tests
> ---
>
> Key: HIVE-25667
> URL: https://issues.apache.org/jira/browse/HIVE-25667
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Mark Bathori
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently there are two class hierarchies managing JDBC databases in tests, 
> [DatabaseRule|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java]
>  and 
> [AbstractExternalDB|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java].
>  There are many similarities between these hierarchies and certain parts are 
> duplicated. 
> The goal of this JIRA is to refactor the aforementioned hierarchies to reduce 
> code duplication and improve extensibility.





[jira] [Assigned] (HIVE-25667) Unify code managing JDBC databases in tests

2022-01-04 Thread Mark Bathori (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Bathori reassigned HIVE-25667:
---

Assignee: Mark Bathori

> Unify code managing JDBC databases in tests
> ---
>
> Key: HIVE-25667
> URL: https://issues.apache.org/jira/browse/HIVE-25667
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Mark Bathori
>Priority: Major
>
> Currently there are two class hierarchies managing JDBC databases in tests, 
> [DatabaseRule|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java]
>  and 
> [AbstractExternalDB|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java].
>  There are many similarities between these hierarchies and certain parts are 
> duplicated. 
> The goal of this JIRA is to refactor the aforementioned hierarchies to reduce 
> code duplication and improve extensibility.





[jira] [Work logged] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25844?focusedWorklogId=703371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703371
 ]

ASF GitHub Bot logged work on HIVE-25844:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 15:37
Start Date: 04/Jan/22 15:37
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #2918:
URL: https://github.com/apache/hive/pull/2918


   …rminate immediately
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 703371)
Remaining Estimate: 0h
Time Spent: 10m

> Exception deserialization error-s may cause beeline to terminate immediately
> 
>
> Key: HIVE-25844
> URL: https://issues.apache.org/jira/browse/HIVE-25844
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 3.1.2
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> An exception on the server side leads to the following sequence:
>  * fetch task conversion is on
>  * an exception occurs while reading the table, and the error bubbles up
>  * => a message is transmitted to beeline saying that the error class name is 
> "org.apache.phoenix.schema.ColumnNotFoundException", along with the message
>  * the client tries to reconstruct the exception around HiveSQLException
>  * but the constructor call needs 
> org.apache.phoenix.exception.SQLExceptionCode, which fails to load 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
>  * a java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service is thrown, which 
> is not handled in that method, so it becomes a real error and shuts down 
> the client
> {code:java}
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
> [...]
> at java.lang.Class.forName(Class.java:264)
> at 
> org.apache.hive.service.cli.HiveSQLException.newInstance(HiveSQLException.java:245)
> at 
> org.apache.hive.service.cli.HiveSQLException.toStackTrace(HiveSQLException.java:211)
> [...]
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.shaded.com.google.protobuf.Service
> [...]
> {code}
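One way to keep the client alive is to treat any failure while re-creating the remote exception class as non-fatal. The sketch below is not the fix from the linked pull request (its description is empty in this thread); `SafeExceptionRebuild.rebuild` is a hypothetical helper that mirrors the `Class.forName` step visible in the stack trace and catches `LinkageError` (the superclass of `NoClassDefFoundError`) alongside the usual reflection exceptions.

```java
import java.lang.reflect.Constructor;

public final class SafeExceptionRebuild {

    private SafeExceptionRebuild() {}

    /**
     * Tries to recreate a Throwable of the named class with the given message.
     * Any failure to load, initialize, or instantiate the class (including
     * NoClassDefFoundError) yields a plain RuntimeException instead of propagating.
     */
    public static Throwable rebuild(String className, String message) {
        try {
            Class<?> cls = Class.forName(className);
            if (Throwable.class.isAssignableFrom(cls)) {
                Constructor<?> ctor = cls.getConstructor(String.class);
                return (Throwable) ctor.newInstance(message);
            }
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class missing, no (String) constructor, or a dependency failed to load: fall back below.
        }
        return new RuntimeException(className + ": " + message);
    }

    public static void main(String[] args) {
        System.out.println(rebuild("java.lang.IllegalStateException", "boom")); // java.lang.IllegalStateException: boom
        System.out.println(rebuild("org.example.Missing", "boom")); // java.lang.RuntimeException: org.example.Missing: boom
    }
}
```

Catching `LinkageError` is narrower than catching `Throwable`: it covers class-loading and initialization failures like the one in the trace without swallowing, for example, `OutOfMemoryError`.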





[jira] [Updated] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25844:
--
Labels: pull-request-available  (was: )

> Exception deserialization error-s may cause beeline to terminate immediately
> 
>
> Key: HIVE-25844
> URL: https://issues.apache.org/jira/browse/HIVE-25844
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 3.1.2
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> An exception on the server side leads to the following sequence:
>  * fetch task conversion is on
>  * an exception occurs while reading the table, and the error bubbles up
>  * => a message is transmitted to beeline saying that the error class name is 
> "org.apache.phoenix.schema.ColumnNotFoundException", along with the message
>  * the client tries to reconstruct the exception around HiveSQLException
>  * but the constructor call needs 
> org.apache.phoenix.exception.SQLExceptionCode, which fails to load 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
>  * a java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service is thrown, which 
> is not handled in that method, so it becomes a real error and shuts down 
> the client
> {code:java}
> java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
> [...]
> at java.lang.Class.forName(Class.java:264)
> at org.apache.hive.service.cli.HiveSQLException.newInstance(HiveSQLException.java:245)
> at org.apache.hive.service.cli.HiveSQLException.toStackTrace(HiveSQLException.java:211)
> [...]
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.shaded.com.google.protobuf.Service
> [...]
> {code}
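A minimal Java sketch of the failure mode described above, under stated assumptions: it is not Hive's HiveSQLException.newInstance() code, and the names ExceptionRebuilder, rebuild() and the demo class BrokenDependencyException are hypothetical. It shows why a handler that only catches ReflectiveOperationException lets java.lang.NoClassDefFoundError escape (it is a LinkageError, not an Exception), and how also catching LinkageError turns the problem into a fallback exception instead of terminating the client.

```java
import java.lang.reflect.Constructor;

/**
 * Hypothetical sketch, not Hive's HiveSQLException code: rebuild a remote
 * exception from its class name and message, falling back to a generic
 * exception when the class (or something it depends on) cannot be loaded.
 */
final class ExceptionRebuilder {

  private ExceptionRebuilder() {}

  static Throwable rebuild(String className, String message) {
    try {
      // Class.forName both loads and initializes the class; initialization is
      // where a missing dependency surfaces as NoClassDefFoundError.
      Class<?> clazz = Class.forName(className);
      if (!Throwable.class.isAssignableFrom(clazz)) {
        return new RuntimeException(className + ": " + message);
      }
      Constructor<?> ctor = clazz.getConstructor(String.class);
      return (Throwable) ctor.newInstance(message);
    } catch (ReflectiveOperationException | LinkageError e) {
      // ReflectiveOperationException covers ClassNotFoundException,
      // NoSuchMethodException and exceptions thrown by the constructor.
      // LinkageError covers NoClassDefFoundError and
      // ExceptionInInitializerError, which would otherwise escape.
      return new RuntimeException(className + ": " + message, e);
    }
  }

  /** Demo-only class whose static initializer simulates a missing dependency. */
  static final class BrokenDependencyException extends Exception {
    static {
      simulateMissingDependency();
    }

    public BrokenDependencyException(String message) {
      super(message);
    }

    private static void simulateMissingDependency() {
      // An Error thrown from a static initializer is propagated as-is.
      throw new NoClassDefFoundError("org/example/MissingDependency");
    }
  }
}
```

The fallback here is a plain RuntimeException that keeps the remote class name and message visible; whether a real fix should fall back to HiveSQLException or to something else is not something this sketch decides.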





[jira] [Assigned] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately

2022-01-04 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25844:
---


> Exception deserialization error-s may cause beeline to terminate immediately
> 
>
> Key: HIVE-25844
> URL: https://issues.apache.org/jira/browse/HIVE-25844
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 3.1.2
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>





[jira] [Updated] (HIVE-25843) Add flag to disable Iceberg FileIO config serialization

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25843:
--
Labels: pull-request-available  (was: )

> Add flag to disable Iceberg FileIO config serialization
> ---
>
> Key: HIVE-25843
> URL: https://issues.apache.org/jira/browse/HIVE-25843
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive serializes the Iceberg table object into each individual split. Since
> the FileIO is part of the Iceberg table and it has its own Hadoop
> configuration, this configuration becomes the dominant factor in the size
> of the serialized split. In our tests we found that, due to this serialized
> config, Iceberg splits are 15-20x larger than normal Hive splits (which led
> to OOM in some of our perf tests).
> This PR proposes introducing a config that can turn off this config
> serialization and let the deserializer side fill in the config values
> instead (which works for Hive executors, since they have all the config
> values at hand). Based on local tests, this can reduce the Iceberg split
> size by ~20x.
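To illustrate the size argument, here is a hypothetical Java sketch that uses a plain Map<String, String> as a stand-in for the Hadoop Configuration carried by the FileIO, and java.io serialization as a stand-in for Hive's split serialization. SplitPayload, its serializeConf switch and effectiveConf() are assumptions made for the example, not the flag or classes introduced by the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical split payload: a table reference plus an optional copy of the
 * configuration. A Map stands in for the Hadoop Configuration held by FileIO.
 */
final class SplitPayload implements Serializable {
  private static final long serialVersionUID = 1L;

  final String tableLocation;
  // Null when config serialization is disabled; the executor supplies its own.
  final HashMap<String, String> conf;

  SplitPayload(String tableLocation, Map<String, String> conf, boolean serializeConf) {
    this.tableLocation = tableLocation;
    this.conf = serializeConf ? new HashMap<>(conf) : null;
  }

  /** Executor side: use the shipped config if present, else the local one. */
  Map<String, String> effectiveConf(Map<String, String> localConf) {
    return conf != null ? conf : localConf;
  }

  static byte[] toBytes(SplitPayload payload) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static SplitPayload fromBytes(byte[] data) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (SplitPayload) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The saving grows with the size of the config; the exact ratio depends on its contents, so this sketch does not reproduce the ~20x figure reported above.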





[jira] [Work logged] (HIVE-25843) Add flag to disable Iceberg FileIO config serialization

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25843?focusedWorklogId=703278=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703278
 ]

ASF GitHub Bot logged work on HIVE-25843:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 12:26
Start Date: 04/Jan/22 12:26
Worklog Time Spent: 10m 
  Work Description: marton-bod opened a new pull request #2917:
URL: https://github.com/apache/hive/pull/2917


   Cross-porting https://github.com/apache/iceberg/pull/3752 from Iceberg 
(commit da712eaf60744c933c08fe1cab7a00cdcb2f4829)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 703278)
Remaining Estimate: 0h
Time Spent: 10m

> Add flag to disable Iceberg FileIO config serialization
> ---
>
> Key: HIVE-25843
> URL: https://issues.apache.org/jira/browse/HIVE-25843
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Assigned] (HIVE-25843) Add flag to disable Iceberg FileIO config serialization

2022-01-04 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25843:
-


> Add flag to disable Iceberg FileIO config serialization
> ---
>
> Key: HIVE-25843
> URL: https://issues.apache.org/jira/browse/HIVE-25843
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>





[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=703277=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703277
 ]

ASF GitHub Bot logged work on HIVE-25842:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 12:20
Start Date: 04/Jan/22 12:20
Worklog Time Spent: 10m 
  Work Description: lcspinter opened a new pull request #2916:
URL: https://github.com/apache/hive/pull/2916


   
   
   ### What changes were proposed in this pull request?
   Move delta metric collection from the Tez side to the compaction side. All delta file metrics are collected during the Initiator, Worker and Cleaner phases.
   
   ### Why are the changes needed?
   Metrics are collected only when a Tez query runs against a table (select * and select count( * ) don't update the metrics).
   Metrics aren't updated after compaction or after the cleaning that follows compaction, so users will probably see "issues" with compaction (like many active, obsolete or small deltas) that don't exist.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Manual test, unit test
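As a rough illustration of metric collection driven by the compaction phases rather than by Tez queries, the hypothetical helper below summarises one table or partition from its directory listing, which the Initiator, Worker or Cleaner already has to inspect. It assumes the usual ACID directory names (base_N, delta_min_max with an optional statement id, delete_delta_min_max, optional _vN suffix) and a simplified rule that a delta is obsolete once a base covers its max write id; aborted and open transactions, and the file sizes a "small deltas" metric would need, are ignored. It is not DeltaFilesMetricsReporter or any code from the PR.

```java
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical per-partition summary of ACID directory names. */
final class DeltaDirStats {
  private static final Pattern BASE = Pattern.compile("base_(\\d+)(?:_v\\d+)?");
  // Optional "delete_" prefix, min/max write ids, optional statement id and visibility suffix.
  private static final Pattern DELTA =
      Pattern.compile("(delete_)?delta_(\\d+)_(\\d+)(?:_\\d+)?(?:_v\\d+)?");

  int activeDeltas;
  int activeDeleteDeltas;
  int obsoleteDeltas;

  static DeltaDirStats of(Collection<String> dirNames) {
    // Simplification: the newest base covers every delta whose max write id is not above it.
    long highestBase = -1;
    for (String name : dirNames) {
      Matcher m = BASE.matcher(name);
      if (m.matches()) {
        highestBase = Math.max(highestBase, Long.parseLong(m.group(1)));
      }
    }
    DeltaDirStats stats = new DeltaDirStats();
    for (String name : dirNames) {
      Matcher m = DELTA.matcher(name);
      if (!m.matches()) {
        continue;
      }
      long maxWriteId = Long.parseLong(m.group(3));
      if (maxWriteId <= highestBase) {
        stats.obsoleteDeltas++;
      } else if (m.group(1) != null) {
        stats.activeDeleteDeltas++;
      } else {
        stats.activeDeltas++;
      }
    }
    return stats;
  }
}
```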
   
   




Issue Time Tracking
---

Worklog Id: (was: 703277)
Time Spent: 0.5h  (was: 20m)

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> FUNCTIONALITY: Metrics are collected only when a Tez query runs against a
> table (select * and select count( * ) don't update the metrics).
> Metrics aren't updated after compaction or after the cleaning that follows
> compaction, so users will probably see "issues" with compaction (like many
> active, obsolete or small deltas) that don't exist.
> RISK: Metrics are collected during queries; we tried to put a try-catch
> around each method in DeltaFilesMetricsReporter, but of course this isn't
> foolproof. This is a HUGE performance and functionality liability. Tests
> caught some issues, but our tests aren't perfect.





[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=703275=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703275
 ]

ASF GitHub Bot logged work on HIVE-25842:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 12:18
Start Date: 04/Jan/22 12:18
Worklog Time Spent: 10m 
  Work Description: lcspinter closed pull request #2915:
URL: https://github.com/apache/hive/pull/2915


   




Issue Time Tracking
---

Worklog Id: (was: 703275)
Time Spent: 20m  (was: 10m)

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-25842) Reimplement delta file metric collection

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25842?focusedWorklogId=703271=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703271
 ]

ASF GitHub Bot logged work on HIVE-25842:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 12:12
Start Date: 04/Jan/22 12:12
Worklog Time Spent: 10m 
  Work Description: laszlocsabapinter opened a new pull request #2915:
URL: https://github.com/apache/hive/pull/2915


   ### What changes were proposed in this pull request?
   Move delta metric collection from the Tez side to the compaction side. All delta file metrics are collected during the Initiator, Worker and Cleaner phases.
   
   ### Why are the changes needed?
   Metrics are collected only when a Tez query runs against a table (select * and select count( * ) don't update the metrics).
   Metrics aren't updated after compaction or after the cleaning that follows compaction, so users will probably see "issues" with compaction (like many active, obsolete or small deltas) that don't exist.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Manual test, unit test
   




Issue Time Tracking
---

Worklog Id: (was: 703271)
Remaining Estimate: 0h
Time Spent: 10m

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-25842) Reimplement delta file metric collection

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25842:
--
Labels: pull-request-available  (was: )

> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-24805) Compactor: Initiator shouldn't fetch table details again and again for partitioned tables

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24805?focusedWorklogId=703270=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703270
 ]

ASF GitHub Bot logged work on HIVE-24805:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 12:11
Start Date: 04/Jan/22 12:11
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2906:
URL: https://github.com/apache/hive/pull/2906#discussion_r778034243



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
##
@@ -101,6 +103,25 @@ public void init(AtomicBoolean stop) throws Exception {
 }
   }
 
+  @Override
+  public void setCache(CompactorMetadataCache metadataCache) {
+this.metadataCache = metadataCache;
+  }
+
+  protected Table cacheAndResolveTable(CompactionInfo ci) throws MetaException {
+if (metadataCache != null) {

Review comment:
   do you think it's worth using Optional here?
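For reference, the two styles under discussion can be compared in a small, hypothetical Java sketch; MetadataCache and the String-valued loader stand in for CompactorMetadataCache and the Table loader in the diff, and are not Hive code.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical illustration of the two styles discussed in the review; not Hive code. */
final class OptionalCacheLookup {
  /** Stand-in for a metadata cache that resolves a value by key, loading it on a miss. */
  interface MetadataCache {
    String resolve(String fullTableName, Supplier<String> loader);
  }

  private OptionalCacheLookup() {}

  /** Null-check style, as in the diff under review. */
  static String withNullCheck(MetadataCache cache, String name, Supplier<String> loader) {
    if (cache != null) {
      return cache.resolve(name, loader);
    }
    return loader.get();
  }

  /** Optional style suggested in the review comment. */
  static String withOptional(MetadataCache cache, String name, Supplier<String> loader) {
    return Optional.ofNullable(cache)
        .map(c -> c.resolve(name, loader))
        .orElseGet(loader);
  }
}
```

One subtle difference: if resolve() returned null, the Optional version would fall through to orElseGet() and invoke the loader directly, while the null-check version would return null.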






Issue Time Tracking
---

Worklog Id: (was: 703270)
Time Spent: 1h  (was: 50m)

> Compactor: Initiator shouldn't fetch table details again and again for 
> partitioned tables
> -
>
> Key: HIVE-24805
> URL: https://issues.apache.org/jira/browse/HIVE-24805
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Rajesh Balamohan
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Initiator shouldn't fetch table details for every partition. When there
> are a large number of databases/tables, the Initiator takes a lot of time to
> complete its initial iteration, and the load on the DB also goes up.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L129
> https://github.com/apache/hive/blob/64bb52316f19426ebea0087ee15e282cbde1d852/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L456
> For all of the following partitions, the table details are the same.
> However, it ends up fetching table details from HMS again and again.
> {noformat}
> 2021-02-22 08:13:16,106 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2451899
> 2021-02-22 08:13:16,124 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2451830
> 2021-02-22 08:13:16,140 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452586
> 2021-02-22 08:13:16,149 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452698
> 2021-02-22 08:13:16,158 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452063
> {noformat}
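The requested behaviour amounts to memoising table lookups per table so that all of its partitions reuse one fetch. The patch under review builds this on Guava's Cache with expireAfterAccess() and softValues(); the dependency-free Java sketch below only illustrates the idea. TableMetadataCache, its injectable clock and the String key (for example a db.table name such as ci.getFullTableName(), as suggested in review) are assumptions, not Hive code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical expire-after-access cache keyed by full table name; not Hive's CompactorMetadataCache. */
final class TableMetadataCache<V> {
  private static final class Entry<T> {
    final T value;
    long lastAccessNanos;

    Entry(T value, long now) {
      this.value = value;
      this.lastAccessNanos = now;
    }
  }

  private final ConcurrentHashMap<String, Entry<V>> entries = new ConcurrentHashMap<>();
  private final long ttlNanos;
  private final LongSupplier clock;

  TableMetadataCache(long ttlNanos, LongSupplier clock) {
    this.ttlNanos = ttlNanos;
    this.clock = clock;
  }

  /** Returns the cached value, calling loader only on a miss or after the entry expired. */
  V get(String fullTableName, Function<String, V> loader) {
    long now = clock.getAsLong();
    // compute() is atomic per key, so concurrent lookups of the same table
    // trigger at most one load; the loader runs while that key is locked.
    Entry<V> entry = entries.compute(fullTableName, (key, old) -> {
      if (old != null && now - old.lastAccessNanos < ttlNanos) {
        old.lastAccessNanos = now; // expire-after-access: refresh on every hit
        return old;
      }
      return new Entry<>(loader.apply(key), now);
    });
    return entry.value;
  }
}
```

Unlike Guava's softValues(), this sketch never releases memory for tables that are not looked up again; expired entries are only replaced on the next access, which is fine for an illustration but not for a long-running Initiator.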





[jira] [Work logged] (HIVE-24805) Compactor: Initiator shouldn't fetch table details again and again for partitioned tables

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24805?focusedWorklogId=703264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703264
 ]

ASF GitHub Bot logged work on HIVE-24805:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 12:03
Start Date: 04/Jan/22 12:03
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2906:
URL: https://github.com/apache/hive/pull/2906#discussion_r778029906



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CacheAwareCompactor.java
##
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.txn;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.Cache;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.util.concurrent.UncheckedExecutionException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+
+public interface CacheAwareCompactor {
+
+  static void trySetCache(Object obj, CompactorMetadataCache cache) {
+if (CacheAwareCompactor.class.isAssignableFrom(obj.getClass())) {
+  ((CacheAwareCompactor) obj).setCache(cache);
+}
+  }
+
+  void setCache(CompactorMetadataCache cache);
+
+  class CompactorMetadataCache {
+
+private final Cache tableCache;
+private final Cache partitionCache;
+
+@VisibleForTesting
+public CompactorMetadataCache(long timeout, TimeUnit unit) {
+  this.tableCache = CacheBuilder.newBuilder().expireAfterAccess(timeout, unit).softValues().build();
+  this.partitionCache = CacheBuilder.newBuilder().expireAfterAccess(timeout, unit).softValues().build();
+}
+
+public static CompactorMetadataCache createIfEnabled(Configuration conf) {
+  long timeout = MetastoreConf.getTimeVar(conf,
+MetastoreConf.ConfVars.COMPACTOR_METADATA_CACHE_TIMEOUT, TimeUnit.SECONDS);
+  if (timeout == 0) {
+return null;
+  }
+  return new CompactorMetadataCache(timeout, TimeUnit.SECONDS);
+}
+
+public Table resolveTable(CompactionInfo ci, Callable loader) {
+  try {
+TableCacheKey key = new TableCacheKey(ci);
+return tableCache.get(key, loader);
+  } catch (ExecutionException e) {
+throw new UncheckedExecutionException(e);
+  }
+}
+
+public Partition resolvePartition(CompactionInfo ci, Callable loader) {
+  try {
+if (ci.partName == null) {
+  return null;
+}
+PartitionCacheKey key = new PartitionCacheKey(ci);

Review comment:
   same as above:
   
   ci.getFullPartitionName()
   






Issue Time Tracking
---

Worklog Id: (was: 703264)
Time Spent: 50m  (was: 40m)

> Compactor: Initiator shouldn't fetch table details again and again for 
> partitioned tables
> -
>
> Key: HIVE-24805
> URL: https://issues.apache.org/jira/browse/HIVE-24805
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Rajesh Balamohan
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-24805) Compactor: Initiator shouldn't fetch table details again and again for partitioned tables

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24805?focusedWorklogId=703263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703263
 ]

ASF GitHub Bot logged work on HIVE-24805:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 12:02
Start Date: 04/Jan/22 12:02
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2906:
URL: https://github.com/apache/hive/pull/2906#discussion_r778029687



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CacheAwareCompactor.java
##
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.txn;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.Cache;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.util.concurrent.UncheckedExecutionException;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+
+import java.util.Objects;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+
+public interface CacheAwareCompactor {
+
+  static void trySetCache(Object obj, CompactorMetadataCache cache) {
+if (CacheAwareCompactor.class.isAssignableFrom(obj.getClass())) {
+  ((CacheAwareCompactor) obj).setCache(cache);
+}
+  }
+
+  void setCache(CompactorMetadataCache cache);
+
+  class CompactorMetadataCache {
+
+private final Cache tableCache;
+private final Cache partitionCache;
+
+@VisibleForTesting
+public CompactorMetadataCache(long timeout, TimeUnit unit) {
+  this.tableCache = CacheBuilder.newBuilder().expireAfterAccess(timeout, unit).softValues().build();
+  this.partitionCache = CacheBuilder.newBuilder().expireAfterAccess(timeout, unit).softValues().build();
+}
+
+public static CompactorMetadataCache createIfEnabled(Configuration conf) {
+  long timeout = MetastoreConf.getTimeVar(conf,
+MetastoreConf.ConfVars.COMPACTOR_METADATA_CACHE_TIMEOUT, TimeUnit.SECONDS);
+  if (timeout == 0) {
+return null;
+  }
+  return new CompactorMetadataCache(timeout, TimeUnit.SECONDS);
+}
+
+public Table resolveTable(CompactionInfo ci, Callable loader) {
+  try {
+TableCacheKey key = new TableCacheKey(ci);

Review comment:
   I think you could simplify the key here to a String:
   
   ci.getFullTableName()
   






Issue Time Tracking
---

Worklog Id: (was: 703263)
Time Spent: 40m  (was: 0.5h)

> Compactor: Initiator shouldn't fetch table details again and again for 
> partitioned tables
> -
>
> Key: HIVE-24805
> URL: https://issues.apache.org/jira/browse/HIVE-24805
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Rajesh Balamohan
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>

[jira] [Assigned] (HIVE-25842) Reimplement delta file metric collection

2022-01-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-25842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér reassigned HIVE-25842:



> Reimplement delta file metric collection
> 
>
> Key: HIVE-25842
> URL: https://issues.apache.org/jira/browse/HIVE-25842
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>





[jira] [Work logged] (HIVE-25800) loadDynamicPartitions in Hive.java should not load all partitions of a managed table

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25800?focusedWorklogId=703260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703260
 ]

ASF GitHub Bot logged work on HIVE-25800:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 11:50
Start Date: 04/Jan/22 11:50
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on pull request #2868:
URL: https://github.com/apache/hive/pull/2868#issuecomment-1004743262


   Thank you @lcspinter for the approval and @nrg4878 for merging it. Commit 
merged to master: 
https://github.com/apache/hive/commit/63c6d8ba70dfa59e791e1e49f44629c22fb41b7f




Issue Time Tracking
---

Worklog Id: (was: 703260)
Time Spent: 1h 10m  (was: 1h)

> loadDynamicPartitions in Hive.java should not load all partitions of a 
> managed table 
> -
>
> Key: HIVE-25800
> URL: https://issues.apache.org/jira/browse/HIVE-25800
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HIVE-20661 added an improvement to the loadDynamicPartitions() API in
> Hive.java so that partitions are not added to HMS one by one. As part of
> that improvement, the following code was introduced:
> {code:java}
> // fetch all the partitions matching the part spec using the partition iterable
> // this way the maximum batch size configuration parameter is considered
> PartitionIterable partitionIterable = new PartitionIterable(Hive.get(), tbl, partSpec,
>     conf.getInt(MetastoreConf.ConfVars.BATCH_RETRIEVE_MAX.getVarname(), 300));
> Iterator<Partition> iterator = partitionIterable.iterator();
> // Match valid partition path to partitions
> while (iterator.hasNext()) {
>   Partition partition = iterator.next();
>   partitionDetailsMap.entrySet().stream()
>       .filter(entry -> entry.getValue().fullSpec.equals(partition.getSpec()))
>       .findAny().ifPresent(entry -> {
>         entry.getValue().partition = partition;
>         entry.getValue().hasOldPartition = true;
>       });
> } {code}
> The above code fetches all the existing partitions of a table from HMS and 
> compares them against the dynamic partitions list to decide which partitions are 
> old and which are new; the partitions are then added to HMS in batches. The call 
> to fetch all partitions has introduced a performance regression for tables with 
> a large number of partitions (on the order of 100K). 
>  
> This is fixed for external tables in 
> https://issues.apache.org/jira/browse/HIVE-25178. However, for ACID tables 
> there is an open Jira (HIVE-25187). Until we have an appropriate fix in 
> HIVE-25187, we can apply the following: 
> Skip fetching all partitions. Instead, in the thread pool that loads each 
> partition individually, call get_partition() to check whether the partition 
> already exists in HMS. 
> This introduces an additional getPartition() call for every partition to be 
> loaded dynamically, but it removes fetching all existing partitions of a table. 
> I believe this is fine: for tables with a small number of existing 
> partitions in HMS, getPartitions() won't add much overhead, but for 
> tables with a large number of existing partitions, this will certainly avoid 
> fetching all partitions from HMS. 
> cc - [~lpinter] [~ngangam] 
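A minimal sketch of the proposed per-partition existence check. It assumes a hypothetical `PartitionLookup` interface in place of the real HMS client (its `getPartition` stands in for the get_partition() call described above), uses plain `Map<String, String>` partition specs, and is not Hive's actual implementation. Each pool task looks up only its own partition, so nothing proportional to the table's total partition count is fetched; the cost is one extra metastore round trip per dynamically loaded partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the HMS client's get_partition(); not the real API.
interface PartitionLookup {
  // Returns the existing partition (here just its name) for a spec, if any.
  Optional<String> getPartition(Map<String, String> spec);
}

final class PartitionLoadSketch {
  // Outcome for one dynamically created partition.
  static final class Loaded {
    final Map<String, String> spec;
    final boolean hasOldPartition;

    Loaded(Map<String, String> spec, boolean hasOldPartition) {
      this.spec = spec;
      this.hasOldPartition = hasOldPartition;
    }
  }

  // Each task checks only its own partition, so the work scales with the
  // number of partitions being loaded rather than with the table's size.
  static List<Loaded> load(List<Map<String, String>> dynamicSpecs,
                           PartitionLookup hms, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Loaded>> futures = new ArrayList<>();
      for (Map<String, String> spec : dynamicSpecs) {
        // One point lookup per partition, executed inside the pool.
        futures.add(pool.submit(() -> new Loaded(spec, hms.getPartition(spec).isPresent())));
      }
      List<Loaded> results = new ArrayList<>();
      for (Future<Loaded> f : futures) {
        results.add(f.get()); // results keep the input order
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

The trade-off mirrors the description: for a handful of new partitions on a table with ~100K existing ones, a few point lookups are far cheaper than iterating over every partition.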





[jira] [Work logged] (HIVE-25800) loadDynamicPartitions in Hive.java should not load all partitions of a managed table

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25800?focusedWorklogId=703258=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703258
 ]

ASF GitHub Bot logged work on HIVE-25800:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 11:49
Start Date: 04/Jan/22 11:49
Worklog Time Spent: 10m 
  Work Description: sourabh912 closed pull request #2868:
URL: https://github.com/apache/hive/pull/2868


   




Issue Time Tracking
---

Worklog Id: (was: 703258)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-24805) Compactor: Initiator shouldn't fetch table details again and again for partitioned tables

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24805?focusedWorklogId=703257=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703257
 ]

ASF GitHub Bot logged work on HIVE-24805:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 11:48
Start Date: 04/Jan/22 11:48
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2906:
URL: https://github.com/apache/hive/pull/2906#discussion_r778021444



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -201,7 +201,7 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, boolean metricsEnabled
 
   Partition p = null;
   if (ci.partName != null) {
-p = resolvePartition(ci);
+p = cacheAndResolvePartition(ci);

Review comment:
   same as above
   
   resolvePartitionAndCache
   






Issue Time Tracking
---

Worklog Id: (was: 703257)
Time Spent: 0.5h  (was: 20m)

> Compactor: Initiator shouldn't fetch table details again and again for 
> partitioned tables
> -
>
> Key: HIVE-24805
> URL: https://issues.apache.org/jira/browse/HIVE-24805
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Rajesh Balamohan
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Initiator shouldn't fetch table details for every partition. When there 
> are a large number of databases/tables, it takes a long time for the Initiator to 
> complete its initial iteration, and the load on the DB also increases.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L129
> https://github.com/apache/hive/blob/64bb52316f19426ebea0087ee15e282cbde1d852/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L456
> For all of the following partitions, the table details are the same. However, 
> the Initiator ends up fetching them from HMS again and again.
> {noformat}
> 2021-02-22 08:13:16,106 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2451899
> 2021-02-22 08:13:16,124 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2451830
> 2021-02-22 08:13:16,140 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452586
> 2021-02-22 08:13:16,149 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452698
> 2021-02-22 08:13:16,158 INFO  org.apache.hadoop.hive.ql.txn.compactor.Initiator: [Thread-11]: Checking to see if we should compact tpcds_bin_partitioned_orc_1000.store_returns_tmp2.sr_returned_date_sk=2452063
> {noformat}
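A minimal sketch of the caching idea behind this issue: memoize the table lookup so that every partition of the same table reuses one metastore fetch per pass. It assumes a hypothetical `TableInfo` type and a fetch function keyed by `db.table` in place of the real HMS client; the method name follows the `resolveTableAndCache` suggestion from the PR #2906 review and is illustrative, not Hive's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified table metadata (the real code works with HMS Table objects).
final class TableInfo {
  final String dbName;
  final String tableName;

  TableInfo(String dbName, String tableName) {
    this.dbName = dbName;
    this.tableName = tableName;
  }
}

// Memoizes table lookups for one Initiator pass so that every partition of
// the same table reuses a single metastore fetch.
final class TableCache {
  private final Map<String, TableInfo> cache = new HashMap<>();
  // Hypothetical fetcher standing in for the HMS call; receives "db.table".
  private final Function<String, TableInfo> fetchFromHms;

  TableCache(Function<String, TableInfo> fetchFromHms) {
    this.fetchFromHms = fetchFromHms;
  }

  // Resolve first, cache as a side effect.
  TableInfo resolveTableAndCache(String dbName, String tableName) {
    String key = dbName + "." + tableName;
    // The fetcher runs only on a cache miss; a null result is not cached.
    return cache.computeIfAbsent(key, fetchFromHms);
  }

  // Drop cached entries between passes so table changes are picked up.
  void clear() {
    cache.clear();
  }
}
```

With the log above, all five `store_returns_tmp2` partitions would trigger a single table fetch. A plain `HashMap` assumes one thread per pass; a cache shared across threads would need `ConcurrentHashMap` and a bounded lifetime so stale table definitions are not reused.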





[jira] [Work logged] (HIVE-24805) Compactor: Initiator shouldn't fetch table details again and again for partitioned tables

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24805?focusedWorklogId=703256=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703256
 ]

ASF GitHub Bot logged work on HIVE-24805:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 11:47
Start Date: 04/Jan/22 11:47
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #2906:
URL: https://github.com/apache/hive/pull/2906#discussion_r778021118



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -184,7 +184,7 @@ private void clean(CompactionInfo ci, long minOpenTxnGLB, boolean metricsEnabled
   if (metricsEnabled) {
 perfLogger.perfLogBegin(CLASS_NAME, cleanerMetric);
   }
-  Table t = resolveTable(ci);
+  Table t = cacheAndResolveTable(ci);

Review comment:
   What do you think about putting `cache` at the end and keeping the primary goal 
at the beginning? 
   
   resolveTableAndCache
   






Issue Time Tracking
---

Worklog Id: (was: 703256)
Time Spent: 20m  (was: 10m)



[jira] [Assigned] (HIVE-25841) Improve performance of deleteColumnStatsState

2022-01-04 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-25841:
-

Assignee: Peter Vary

> Improve performance of deleteColumnStatsState
> -
>
> Key: HIVE-25841
> URL: https://issues.apache.org/jira/browse/HIVE-25841
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The performance of {{MetaStoreDirectSql.deleteColumnStatsState()}} is poor 
> when the {{PARTITION_PARAMS}} and {{PARTITIONS}} tables have a high number of rows.





[jira] [Work logged] (HIVE-25841) Improve performance of deleteColumnStatsState

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25841?focusedWorklogId=703237=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703237
 ]

ASF GitHub Bot logged work on HIVE-25841:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 09:59
Start Date: 04/Jan/22 09:59
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #2914:
URL: https://github.com/apache/hive/pull/2914


   ### What changes were proposed in this pull request?
   Different query for the mysql
   
   ### Why are the changes needed?
   To improve the performance
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Local machine




Issue Time Tracking
---

Worklog Id: (was: 703237)
Remaining Estimate: 0h
Time Spent: 10m



[jira] [Updated] (HIVE-25841) Improve performance of deleteColumnStatsState

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25841:
--
Labels: pull-request-available  (was: )



[jira] [Work logged] (HIVE-25829) Tez exec mode support for credential provider for jobs

2022-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25829?focusedWorklogId=703215=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703215
 ]

ASF GitHub Bot logged work on HIVE-25829:
-

Author: ASF GitHub Bot
Created on: 04/Jan/22 08:02
Start Date: 04/Jan/22 08:02
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #2911:
URL: https://github.com/apache/hive/pull/2911#discussion_r777886508



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConfUtil.java
##
@@ -199,16 +199,16 @@ public static void updateJobCredentialProviders(Configuration jobConf) {
 if (credstorePassword != null) {
   String execEngine = jobConf.get(ConfVars.HIVE_EXECUTION_ENGINE.varname);
 
-  if ("mr".equalsIgnoreCase(execEngine)) {
-// if the execution engine is MR set the map/reduce env with the credential store password
+  if ("mr".equalsIgnoreCase(execEngine) || "tez".equalsIgnoreCase(execEngine)) {
+// if the execution engine is MR/Tez set the map/reduce env with the credential store password
 
 Collection redactedProperties =
     jobConf.getStringCollection(MRJobConfig.MR_JOB_REDACTED_PROPERTIES);
-
 Stream.of(
 JobConf.MAPRED_MAP_TASK_ENV,
 JobConf.MAPRED_REDUCE_TASK_ENV,
-MRJobConfig.MR_AM_ADMIN_USER_ENV)
+MRJobConfig.MR_AM_ADMIN_USER_ENV,
+"tez.am.launch.env")

Review comment:
   Thanks @nareshpr, I'll double-check. I tested this patch on a cluster as a 
POC and saw that a simple map/reduce DAG worked.
   Also, I still need to decide how to handle this, since referring to 
TezConfiguration.TEZ_AM_LAUNCH_ENV is not possible: that class is currently not 
visible in hive-common.






Issue Time Tracking
---

Worklog Id: (was: 703215)
Time Spent: 0.5h  (was: 20m)

> Tez exec mode support for credential provider for jobs
> --
>
> Key: HIVE-25829
> URL: https://issues.apache.org/jira/browse/HIVE-25829
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Ádám Szita
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-14822 introduced support for securely forwarding a job-specific Java 
> credential store path and a corresponding password to the backend executors. 
> This is currently implemented only for the MR2 and Spark execution engines. I 
> propose we extend this feature by adding Tez mode to said list.
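A simplified sketch of what extending the credential-store forwarding to Tez amounts to, modeled on the `HiveConfUtil` diff in this thread: when the engine is `mr` or `tez`, a `KEY=password` entry is appended to each task/AM environment property, and those properties are marked as redacted. This is not Hive's code. A plain `Map` stands in for Hadoop's `JobConf`; the property-name strings other than `tez.am.launch.env`, and the environment variable name `HADOOP_CREDSTORE_PASSWORD`, are assumed values of the Hadoop constants rather than text taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified sketch: a plain Map stands in for Hadoop's JobConf.
final class CredstoreEnvSketch {
  // Assumed values of the constants used in the patch; verify against your Hadoop/Tez versions.
  static final String MAP_ENV = "mapreduce.map.env";                         // JobConf.MAPRED_MAP_TASK_ENV
  static final String REDUCE_ENV = "mapreduce.reduce.env";                   // JobConf.MAPRED_REDUCE_TASK_ENV
  static final String AM_ADMIN_ENV = "yarn.app.mapreduce.am.admin.user.env"; // MRJobConfig.MR_AM_ADMIN_USER_ENV
  static final String TEZ_AM_ENV = "tez.am.launch.env";                      // literal used in the diff
  static final String REDACTED = "mapreduce.job.redacted-properties";        // MRJobConfig.MR_JOB_REDACTED_PROPERTIES
  static final String PASSWORD_VAR = "HADOOP_CREDSTORE_PASSWORD";            // assumed env var name

  // Appends PASSWORD_VAR=password to each environment property and marks the
  // property as redacted, but only for the MR and Tez execution engines.
  static void forwardPassword(Map<String, String> conf, String execEngine, String password) {
    if (password == null) {
      return;
    }
    if (!"mr".equalsIgnoreCase(execEngine) && !"tez".equalsIgnoreCase(execEngine)) {
      return;
    }
    List<String> redacted = new ArrayList<>();
    String existing = conf.get(REDACTED);
    if (existing != null && !existing.isEmpty()) {
      redacted.addAll(List.of(existing.split(",")));
    }
    String pair = PASSWORD_VAR + "=" + password;
    for (String property : List.of(MAP_ENV, REDUCE_ENV, AM_ADMIN_ENV, TEZ_AM_ENV)) {
      String current = conf.get(property);
      // Environment properties hold comma-separated KEY=VALUE entries.
      conf.put(property, (current == null || current.isEmpty()) ? pair : current + "," + pair);
      if (!redacted.contains(property)) {
        redacted.add(property);
      }
    }
    conf.put(REDACTED, String.join(",", redacted));
  }
}
```

As the review comment on the diff notes, the real patch would ideally reference `TezConfiguration.TEZ_AM_LAUNCH_ENV` instead of a string literal, but that class is not visible from hive-common.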


