[jira] [Work logged] (HIVE-27187) Incremental rebuild of materialized view having aggregate and stored by iceberg

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27187?focusedWorklogId=855998&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855998
 ]

ASF GitHub Bot logged work on HIVE-27187:
-

Author: ASF GitHub Bot
Created on: 11/Apr/23 05:33
Start Date: 11/Apr/23 05:33
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4166:
URL: https://github.com/apache/hive/pull/4166#issuecomment-1502711234

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4166)
   
   [1 Bug](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4166&resolved=false&types=BUG) (rating C)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4166&resolved=false&types=VULNERABILITY) (rating A)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4166&resolved=false&types=SECURITY_HOTSPOT) (rating A)
   [6 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4166&resolved=false&types=CODE_SMELL) (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 855998)
Time Spent: 3h 40m  (was: 3.5h)

> Incremental rebuild of materialized view having aggregate and stored by 
> iceberg
> ---
>
> Key: HIVE-27187
> URL: https://issues.apache.org/jira/browse/HIVE-27187
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, an incremental rebuild of a materialized view stored by Iceberg 
> whose definition query contains an aggregate operator is transformed into an 
> insert overwrite statement containing a union operator, provided the source 
> tables contain insert operations only. One branch of the union scans the 
> view; the other produces the delta.
> This can be improved further: transform the statement into a multi-insert 
> statement representing a merge statement that inserts new aggregations and 
> updates existing ones.
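
For illustration only (not the patch itself), the intended merge-style rewrite
could look roughly like the following, assuming a view mv(k, s, c) grouped on
k with SUM and COUNT aggregates, and a delta computed from newly inserted
source rows; all table and column names here are hypothetical:

{code:sql}
-- Hypothetical sketch of the merge-style incremental rebuild:
MERGE INTO mv
USING (SELECT k, SUM(v) AS s, COUNT(*) AS c
       FROM src_inserts
       GROUP BY k) delta
ON mv.k = delta.k
WHEN MATCHED THEN UPDATE SET s = mv.s + delta.s, c = mv.c + delta.c
WHEN NOT MATCHED THEN INSERT VALUES (delta.k, delta.s, delta.c);
{code}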



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-23567) authorization_disallow_transform.q is unstable

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23567:
--
Labels: pull-request-available  (was: )

> authorization_disallow_transform.q is unstable
> --
>
> Key: HIVE-23567
> URL: https://issues.apache.org/jira/browse/HIVE-23567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-23567) authorization_disallow_transform.q is unstable

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23567?focusedWorklogId=855992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855992
 ]

ASF GitHub Bot logged work on HIVE-23567:
-

Author: ASF GitHub Bot
Created on: 11/Apr/23 04:18
Start Date: 11/Apr/23 04:18
Worklog Time Spent: 10m 
  Work Description: rkirtir opened a new pull request, #4215:
URL: https://github.com/apache/hive/pull/4215

   
   
   ### What changes were proposed in this pull request?
   HIVE-23567
   
   
   ### Why are the changes needed?
   Enabling authorization_disallow_transform.q in the test suite
   
   ### Does this PR introduce _any_ user-facing change?
   NO
   
   
   ### How was this patch tested?
   via test
   




Issue Time Tracking
---

Worklog Id: (was: 855992)
Remaining Estimate: 0h
Time Spent: 10m

> authorization_disallow_transform.q is unstable
> --
>
> Key: HIVE-23567
> URL: https://issues.apache.org/jira/browse/HIVE-23567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-23548) TestActivePassiveHA is unstable

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23548?focusedWorklogId=855991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855991
 ]

ASF GitHub Bot logged work on HIVE-23548:
-

Author: ASF GitHub Bot
Created on: 11/Apr/23 04:09
Start Date: 11/Apr/23 04:09
Worklog Time Spent: 10m 
  Work Description: rkirtir opened a new pull request, #4214:
URL: https://github.com/apache/hive/pull/4214

   
   
   ### What changes were proposed in this pull request?
   HIVE-23548
   
   
   ### Why are the changes needed?
   Enabling TestActivePassiveHA in the test suite
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   via test
   




Issue Time Tracking
---

Worklog Id: (was: 855991)
Remaining Estimate: 0h
Time Spent: 10m

> TestActivePassiveHA is unstable
> ---
>
> Key: HIVE-23548
> URL: https://issues.apache.org/jira/browse/HIVE-23548
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: KIRTI RUGE
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-23548) TestActivePassiveHA is unstable

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23548:
--
Labels: pull-request-available  (was: )

> TestActivePassiveHA is unstable
> ---
>
> Key: HIVE-23548
> URL: https://issues.apache.org/jira/browse/HIVE-23548
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: KIRTI RUGE
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27164) Create Temp Txn Table As Select is failing at tablePath validation

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27164?focusedWorklogId=855988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855988
 ]

ASF GitHub Bot logged work on HIVE-27164:
-

Author: ASF GitHub Bot
Created on: 11/Apr/23 02:24
Start Date: 11/Apr/23 02:24
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4176:
URL: https://github.com/apache/hive/pull/4176#issuecomment-1502599259

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4176)
   
   [1 Bug](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4176&resolved=false&types=BUG) (rating C)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4176&resolved=false&types=VULNERABILITY) (rating A)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4176&resolved=false&types=SECURITY_HOTSPOT) (rating A)
   [11 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4176&resolved=false&types=CODE_SMELL) (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 855988)
Time Spent: 3h  (was: 2h 50m)

> Create Temp Txn Table As Select is failing at tablePath validation
> --
>
> Key: HIVE-27164
> URL: https://issues.apache.org/jira/browse/HIVE-27164
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Naresh P R
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
> Attachments: mm_cttas.q
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> After HIVE-25303, every CTAS makes a 
> HiveMetaStore$HMSHandler#translate_table_dryrun() call to fetch the table 
> location, which fails with the following exception for temp tables if 
> MetastoreDefaultTransformer is set.
> {code:java}
> 2023-03-17 16:41:23,390 INFO  
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
> [pool-6-thread-196]: Starting translation for CreateTable for processor 
> HMSClient-@localhost with [EXTWRITE, EXTREAD, HIVEBUCKET2, HIVEFULLACIDREAD, 
> HIVEFULLACIDWRITE, HIVECACHEINVALIDATE, HIVEMANAGESTATS, 
> HIVEMANAGEDINSERTWRITE, HIVEMANAGEDINSERTREAD, HIVESQL, HIVEMQT, 
> HIVEONLYMQTWRITE] on table test_temp
> 2023-03-17 16:41:23,392 ERROR 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-6-thread-196]: 
> MetaException(message:Illegal location for managed table, it has to be within 
> database's managed location)
>        
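
A minimal repro along the lines of the description might be (hypothetical
sketch; the actual repro is in the attached mm_cttas.q, whose contents are not
shown here):

{code:sql}
-- Assumes CTAS defaults to transactional (ACID) managed tables:
SET hive.create.as.acid=true;
CREATE TEMPORARY TABLE test_temp AS SELECT 1 AS col;
{code}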

[jira] [Resolved] (HIVE-27143) Optimize HCatStorer move task

2023-04-10 Thread Daniel Dai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved HIVE-27143.
---
Fix Version/s: 4.0.0
 Hadoop Flags: Reviewed
 Release Note: PR merged.
   Resolution: Fixed

> Optimize HCatStorer move task
> -
>
> Key: HIVE-27143
> URL: https://issues.apache.org/jira/browse/HIVE-27143
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 3.1.3
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> moveTask in HCatalog is inefficient: it does two iterations, a dryRun and the 
> execution, and is sequential. This can be improved.
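
A minimal sketch of the single-pass, parallel direction this could take
(illustration only; 'fs', 'sources', and 'destDir' are hypothetical names, not
HCatalog's actual fields):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelMoveSketch {
  // Move every source into destDir concurrently, instead of dryRun + sequential moves.
  static void parallelMove(FileSystem fs, List<Path> sources, Path destDir) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(16);
    try {
      List<Future<Boolean>> moves = new ArrayList<>();
      for (Path src : sources) {
        moves.add(pool.submit(() -> fs.rename(src, new Path(destDir, src.getName()))));
      }
      for (Future<Boolean> move : moves) {
        if (!move.get()) {          // surface the first failed rename
          throw new IOException("move failed");
        }
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}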



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26986?focusedWorklogId=855981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855981
 ]

ASF GitHub Bot logged work on HIVE-26986:
-

Author: ASF GitHub Bot
Created on: 11/Apr/23 00:19
Start Date: 11/Apr/23 00:19
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #3998: 
HIVE-26986: Fix OperatorGraph when a query plan contains UnionOperator
URL: https://github.com/apache/hive/pull/3998




Issue Time Tracking
---

Worklog Id: (was: 855981)
Time Spent: 50m  (was: 40m)

> A DAG created by OperatorGraph is not equal to the Tez DAG.
> ---
>
> Key: HIVE-26986
> URL: https://issues.apache.org/jira/browse/HIVE-26986
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0-alpha-2
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.0-must, pull-request-available
> Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A DAG created by OperatorGraph is not equal to the corresponding DAG that is 
> submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges as 
> a parallel edge.
> We observed this problem by comparing the OperatorGraph and the Tez DAG when 
> running TPC-DS query 71 on a 1TB ORC-format managed table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26957) Add convertCharset(s, from, to) function

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26957?focusedWorklogId=855982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855982
 ]

ASF GitHub Bot logged work on HIVE-26957:
-

Author: ASF GitHub Bot
Created on: 11/Apr/23 00:19
Start Date: 11/Apr/23 00:19
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #3982:
URL: https://github.com/apache/hive/pull/3982#issuecomment-1502504331

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 855982)
Time Spent: 4h 20m  (was: 4h 10m)

> Add convertCharset(s, from, to) function
> 
>
> Key: HIVE-26957
> URL: https://issues.apache.org/jira/browse/HIVE-26957
> Project: Hive
>  Issue Type: New Feature
>Reporter: Bingye Chen
>Assignee: Bingye Chen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Add a convertCharset(s, from, to) function.
> The function converts the string `s` from the `from` charset to the `to` 
> charset. It is already implemented in ClickHouse.
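
Hypothetical usage once the UDF exists (the exact Hive signature is an
assumption based on the description above):

{code:sql}
SELECT convertCharset('résumé', 'UTF-8', 'ISO-8859-1');
{code}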



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26985) Create a trackable hive configuration object

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26985?focusedWorklogId=855980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855980
 ]

ASF GitHub Bot logged work on HIVE-26985:
-

Author: ASF GitHub Bot
Created on: 11/Apr/23 00:19
Start Date: 11/Apr/23 00:19
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #4002: 
HIVE-26985: Create a trackable hive configuration object
URL: https://github.com/apache/hive/pull/4002




Issue Time Tracking
---

Worklog Id: (was: 855980)
Time Spent: 1h  (was: 50m)

> Create a trackable hive configuration object
> 
>
> Key: HIVE-26985
> URL: https://issues.apache.org/jira/browse/HIVE-26985
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: hive.log
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> During configuration-related investigations, I want to be able to easily find 
> out when and how a certain configuration is changed. I'm looking for an 
> improvement that simply logs if "hive.a.b.c" is changed from "hello" to 
> "asdf" or even null, and on which thread/codepath.
> Not sure if there is already a trackable configuration object in Hadoop that 
> we can reuse, or whether we need to implement it in Hive.
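
A minimal sketch of the idea (an assumption-laden illustration, not the PR's
code): wrap Hadoop's Configuration and log every set() with the old value, new
value, thread, and codepath.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TrackedConfiguration extends Configuration {
  private static final Logger LOG = LoggerFactory.getLogger(TrackedConfiguration.class);

  @Override
  public void set(String name, String value) {
    String old = get(name);
    // The trailing Throwable makes slf4j print a stack trace, i.e. the codepath.
    LOG.info("conf change: {} : '{}' -> '{}' on thread {}",
        name, old, value, Thread.currentThread().getName(), new Exception("codepath"));
    super.set(name, value);
  }
}
{code}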



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27143) Optimize HCatStorer move task

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27143?focusedWorklogId=855979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855979
 ]

ASF GitHub Bot logged work on HIVE-27143:
-

Author: ASF GitHub Bot
Created on: 11/Apr/23 00:19
Start Date: 11/Apr/23 00:19
Worklog Time Spent: 10m 
  Work Description: daijy merged PR #4177:
URL: https://github.com/apache/hive/pull/4177




Issue Time Tracking
---

Worklog Id: (was: 855979)
Time Spent: 40m  (was: 0.5h)

> Optimize HCatStorer move task
> -
>
> Key: HIVE-27143
> URL: https://issues.apache.org/jira/browse/HIVE-27143
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 3.1.3
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> moveTask in HCatalog is inefficient: it does two iterations, a dryRun and the 
> execution, and is sequential. This can be improved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27032) Introduce liquibase for HMS schema evolution

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27032?focusedWorklogId=855976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855976
 ]

ASF GitHub Bot logged work on HIVE-27032:
-

Author: ASF GitHub Bot
Created on: 11/Apr/23 00:00
Start Date: 11/Apr/23 00:00
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4060:
URL: https://github.com/apache/hive/pull/4060#issuecomment-1502487017

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4060)
   
   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4060&resolved=false&types=BUG) (rating A)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4060&resolved=false&types=VULNERABILITY) (rating A)
   [4 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4060&resolved=false&types=SECURITY_HOTSPOT) (rating E)
   [204 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4060&resolved=false&types=CODE_SMELL) (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 855976)
Time Spent: 1h 40m  (was: 1.5h)

> Introduce liquibase for HMS schema evolution
> 
>
> Key: HIVE-27032
> URL: https://issues.apache.org/jira/browse/HIVE-27032
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Végh
>Assignee: László Végh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Introduce Liquibase, and replace the current upgrade procedure with it.
> The Schematool CLI API should remain untouched, while under the hood 
> Liquibase should be used for HMS schema evolution.
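
A rough sketch of what "Liquibase under the hood" could look like (the
changelog name and wiring here are assumptions, not the PR's actual code):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import liquibase.Liquibase;
import liquibase.database.Database;
import liquibase.database.DatabaseFactory;
import liquibase.database.jvm.JdbcConnection;
import liquibase.resource.ClassLoaderResourceAccessor;

public class SchemaUpgradeSketch {
  // Apply all pending changesets from a (hypothetical) HMS changelog.
  static void upgrade(String jdbcUrl, String user, String pass) throws Exception {
    try (Connection c = DriverManager.getConnection(jdbcUrl, user, pass)) {
      Database db = DatabaseFactory.getInstance()
          .findCorrectDatabaseImplementation(new JdbcConnection(c));
      new Liquibase("hms-changelog.xml", new ClassLoaderResourceAccessor(), db).update("");
    }
  }
}
{code}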



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27150) Drop single partition can also support direct sql

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27150?focusedWorklogId=855972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855972
 ]

ASF GitHub Bot logged work on HIVE-27150:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 22:55
Start Date: 10/Apr/23 22:55
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #4123:
URL: https://github.com/apache/hive/pull/4123#discussion_r1162150844


##
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestObjectStore.java:
##
@@ -498,6 +497,68 @@ public void testPartitionOpsWhenTableDoesNotExist() throws InvalidObjectException
     }
   }
 
+  @Test
+  public void testDropPartitionByName() throws Exception {
+    Database db1 = new DatabaseBuilder()
+        .setName(DB1)
+        .setDescription("description")
+        .setLocation("locationurl")
+        .build(conf);
+    try (AutoCloseable c = deadline()) {
+      objectStore.createDatabase(db1);
+    }
+    StorageDescriptor sd = createFakeSd("location");
+    HashMap<String, String> tableParams = new HashMap<>();
+    tableParams.put("EXTERNAL", "false");
+    FieldSchema partitionKey1 = new FieldSchema("Country", ColumnType.STRING_TYPE_NAME, "");
+    FieldSchema partitionKey2 = new FieldSchema("State", ColumnType.STRING_TYPE_NAME, "");
+    Table tbl1 =
+        new Table(TABLE1, DB1, "owner", 1, 2, 3, sd, Arrays.asList(partitionKey1, partitionKey2),
+            tableParams, null, null, "MANAGED_TABLE");
+    try (AutoCloseable c = deadline()) {
+      objectStore.createTable(tbl1);
+    }
+    HashMap<String, String> partitionParams = new HashMap<>();
+    partitionParams.put("PARTITION_LEVEL_PRIVILEGE", "true");
+    List<String> value1 = Arrays.asList("US", "CA");
+    Partition part1 = new Partition(value1, DB1, TABLE1, 111, 111, sd, partitionParams);
+    part1.setCatName(DEFAULT_CATALOG_NAME);
+    try (AutoCloseable c = deadline()) {
+      objectStore.addPartition(part1);
+    }
+    List<String> value2 = Arrays.asList("US", "MA");
+    Partition part2 = new Partition(value2, DB1, TABLE1, 222, 222, sd, partitionParams);
+    part2.setCatName(DEFAULT_CATALOG_NAME);
+    try (AutoCloseable c = deadline()) {
+      objectStore.addPartition(part2);
+    }
+
+    List<Partition> partitions;
+    try (AutoCloseable c = deadline()) {
+      objectStore.dropPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, "country=US/state=CA");
+      partitions = objectStore.getPartitions(DEFAULT_CATALOG_NAME, DB1, TABLE1, 10);
+    }
+    Assert.assertEquals(1, partitions.size());
+    Assert.assertEquals(222, partitions.get(0).getCreateTime());
+    try (AutoCloseable c = deadline()) {
+      objectStore.dropPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, "country=US/state=MA");
+      partitions = objectStore.getPartitions(DEFAULT_CATALOG_NAME, DB1, TABLE1, 10);
+    }
+    Assert.assertEquals(0, partitions.size());
+
+    try (AutoCloseable c = deadline()) {
+      // An illegal partName does nothing here; that is acceptable because the
+      // real HMSHandler guarantees the partName is legal and exists.
+      objectStore.dropPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, "country=US/state=NON_EXIST");
+      objectStore.dropPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, "country=US/st=CA");

Review Comment:
   If this API is returning false, we could assert false here.
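
If dropPartition indeed returns false for an illegal name (an assumption based
on this review thread), the test could pin that behavior down explicitly:

{code:java}
// Hypothetical assertion suggested by the review comment:
Assert.assertFalse(objectStore.dropPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, "country=US/st=CA"));
{code}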





Issue Time Tracking
---

Worklog Id: (was: 855972)
Time Spent: 4h 40m  (was: 4.5h)

> Drop single partition can also support direct sql
> -
>
> Key: HIVE-27150
> URL: https://issues.apache.org/jira/browse/HIVE-27150
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> *Background:*
> [HIVE-6980|https://issues.apache.org/jira/browse/HIVE-6980] supports direct 
> SQL for drop_partitions; we can reuse this huge improvement in drop_partition.
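
For illustration, "direct SQL" here means issuing backend statements of
roughly this shape instead of going through the JDO/ORM layer (table and
column names follow the well-known metastore schema; the patch's actual
statements may differ):

{code:sql}
DELETE FROM PARTITION_PARAMS WHERE PART_ID = :partId;
DELETE FROM PARTITIONS WHERE PART_ID = :partId;
{code}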



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27150) Drop single partition can also support direct sql

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27150?focusedWorklogId=855970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855970
 ]

ASF GitHub Bot logged work on HIVE-27150:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 22:39
Start Date: 10/Apr/23 22:39
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #4123:
URL: https://github.com/apache/hive/pull/4123#discussion_r1162143358


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:
##
@@ -5026,20 +5026,18 @@ private boolean drop_partition_common(RawStore ms, String catName, String db_name
         verifyIsWritablePath(partPath);
       }
 
-      if (!ms.dropPartition(catName, db_name, tbl_name, part_vals)) {
-        throw new MetaException("Unable to drop partition");
-      } else {
-        if (!transactionalListeners.isEmpty()) {
+      String partName = Warehouse.makePartName(tbl.getPartitionKeys(), part_vals);
+      ms.dropPartition(catName, db_name, tbl_name, partName);

Review Comment:
   If dropPartition in the object store is not successful, then we should throw 
a meta exception, right? 
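
Concretely, the reviewer's suggestion could look like this (assuming
dropPartition still returns a boolean, as the removed code implies):

{code:java}
if (!ms.dropPartition(catName, db_name, tbl_name, partName)) {
  throw new MetaException("Unable to drop partition " + partName);
}
{code}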





Issue Time Tracking
---

Worklog Id: (was: 855970)
Time Spent: 4.5h  (was: 4h 20m)

> Drop single partition can also support direct sql
> -
>
> Key: HIVE-27150
> URL: https://issues.apache.org/jira/browse/HIVE-27150
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Background:*
> [HIVE-6980|https://issues.apache.org/jira/browse/HIVE-6980] supports direct 
> SQL for drop_partitions; we can reuse this huge improvement in drop_partition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27189) Remove duplicate debug log in Hive.isSubDir

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27189?focusedWorklogId=855968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855968
 ]

ASF GitHub Bot logged work on HIVE-27189:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 22:32
Start Date: 10/Apr/23 22:32
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera merged PR #4167:
URL: https://github.com/apache/hive/pull/4167




Issue Time Tracking
---

Worklog Id: (was: 855968)
Time Spent: 1h 50m  (was: 1h 40m)

> Remove duplicate debug log in Hive.isSubDir
> ---
>
> Key: HIVE-27189
> URL: https://issues.apache.org/jira/browse/HIVE-27189
> Project: Hive
>  Issue Type: Improvement
>Reporter: shuyouZZ
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In class {{org.apache.hadoop.hive.ql.metadata.Hive}}, invoking the method 
> {{isSubDir}} prints the following debug log twice:
> {code:java}
> LOG.debug("The source path is " + fullF1 + " and the destination path is " + 
> fullF2);{code}
> We should remove the duplicate debug log.
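
As a side note (a style suggestion, not part of the patch), the single
remaining call could use parameterized logging so the strings are not
concatenated when DEBUG is off:

{code:java}
LOG.debug("The source path is {} and the destination path is {}", fullF1, fullF2);
{code}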



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27164) Create Temp Txn Table As Select is failing at tablePath validation

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27164?focusedWorklogId=855966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855966
 ]

ASF GitHub Bot logged work on HIVE-27164:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 22:09
Start Date: 10/Apr/23 22:09
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4176:
URL: https://github.com/apache/hive/pull/4176#issuecomment-1502392192

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4176)
   
   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4176&resolved=false&types=BUG) (rating A)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4176&resolved=false&types=VULNERABILITY) (rating A)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4176&resolved=false&types=SECURITY_HOTSPOT) (rating A)
   [11 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4176&resolved=false&types=CODE_SMELL) (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 855966)
Time Spent: 2h 50m  (was: 2h 40m)

> Create Temp Txn Table As Select is failing at tablePath validation
> --
>
> Key: HIVE-27164
> URL: https://issues.apache.org/jira/browse/HIVE-27164
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Naresh P R
>Assignee: Venugopal Reddy K
>Priority: Major
>  Labels: pull-request-available
> Attachments: mm_cttas.q
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> After HIVE-25303, every CTAS makes a 
> HiveMetaStore$HMSHandler#translate_table_dryrun() call to fetch the table 
> location, which fails with the following exception for temp tables if 
> MetastoreDefaultTransformer is set.
> {code:java}
> 2023-03-17 16:41:23,390 INFO  
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
> [pool-6-thread-196]: Starting translation for CreateTable for processor 
> HMSClient-@localhost with [EXTWRITE, EXTREAD, HIVEBUCKET2, HIVEFULLACIDREAD, 
> HIVEFULLACIDWRITE, HIVECACHEINVALIDATE, HIVEMANAGESTATS, 
> HIVEMANAGEDINSERTWRITE, HIVEMANAGEDINSERTREAD, HIVESQL, HIVEMQT, 
> HIVEONLYMQTWRITE] on table test_temp
> 2023-03-17 16:41:23,392 ERROR 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-6-thread-196]: 
> MetaException(message:Illegal location for managed table, it has to be within 
> database's managed location)

[jira] [Updated] (HIVE-27240) NPE on Hive Hook Proto Log Writer

2023-04-10 Thread Shubham Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Sharma updated HIVE-27240:
--
Description: 
After deploying Hive 4.0.0-alpha-1, an NPE blocks the proto logger from 
serializing JSON in HiveHookEventProtoPartialBuilder:
{code:java}
2023-04-10T17:43:44,226 ERROR [Hive Hook Proto Log Writer 0]: 
hooks.HiveHookEventProtoPartialBuilder (:()) - Unexpected exception while 
serializing json.
java.lang.NullPointerException: null
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:986) 
~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:908) 
~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:1263) 
~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.outputStagePlans(ExplainTask.java:1408)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:367) 
~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:268) 
~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.getExplainJSON(HiveHookEventProtoPartialBuilder.java:84)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.addQueryObj(HiveHookEventProtoPartialBuilder.java:75)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.build(HiveHookEventProtoPartialBuilder.java:55)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook$EventLogger.writeEvent(HiveProtoLoggingHook.java:312)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook$EventLogger.lambda$handle$1(HiveProtoLoggingHook.java:274)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_362]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_362]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 ~[?:1.8.0_362]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 ~[?:1.8.0_362]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_362]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_362]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362] {code}

ExplainTask isn't getting initialised the way it used to be, leaving its 
QueryState null. For reference, the earlier init code from HiveProtoLoggingHook:
{code:java}
explain.initialize(hookContext.getQueryState(), plan, null, null); {code}

  was: (previous revision of the description: identical text and stack trace 
without the closing ExplainTask initialization note; omitted here as a 
verbatim duplicate)
[jira] [Created] (HIVE-27240) NPE on Hive Hook Proto Log Writer

2023-04-10 Thread Shubham Sharma (Jira)
Shubham Sharma created HIVE-27240:
-

 Summary: NPE on Hive Hook Proto Log Writer
 Key: HIVE-27240
 URL: https://issues.apache.org/jira/browse/HIVE-27240
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 4.0.0-alpha-2, 4.0.0-alpha-1
Reporter: Shubham Sharma
Assignee: Shubham Sharma


After deploying Hive 4.0.0-alpha-1, an NPE blocks the proto logger from 
serializing JSON in HiveHookEventProtoPartialBuilder:
{code:java}
2023-04-10T17:43:44,226 ERROR [Hive Hook Proto Log Writer 0]: 
hooks.HiveHookEventProtoPartialBuilder (:()) - Unexpected exception while 
serializing json.
java.lang.NullPointerException: null
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:986) 
~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:908) 
~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:1263) 
~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.outputStagePlans(ExplainTask.java:1408)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:367) 
~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:268) 
~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.getExplainJSON(HiveHookEventProtoPartialBuilder.java:84)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.addQueryObj(HiveHookEventProtoPartialBuilder.java:75)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.build(HiveHookEventProtoPartialBuilder.java:55)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook$EventLogger.writeEvent(HiveProtoLoggingHook.java:312)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook$EventLogger.lambda$handle$1(HiveProtoLoggingHook.java:274)
 ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_362]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_362]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 ~[?:1.8.0_362]
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 ~[?:1.8.0_362]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_362]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_362]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855950
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 20:30
Start Date: 10/Apr/23 20:30
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4131:
URL: https://github.com/apache/hive/pull/4131#issuecomment-1502282051

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4131)
   
   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4131&resolved=false&types=BUG) (rating A)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4131&resolved=false&types=VULNERABILITY) (rating A)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4131&resolved=false&types=SECURITY_HOTSPOT) (rating A)
   [12 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4131&resolved=false&types=CODE_SMELL) (rating A)
   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 855950)
Time Spent: 8h 50m  (was: 8h 40m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26537) Deprecate older APIs in the HMS

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26537?focusedWorklogId=855937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855937
 ]

ASF GitHub Bot logged work on HIVE-26537:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 18:39
Start Date: 10/Apr/23 18:39
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera commented on code in PR #3599:
URL: https://github.com/apache/hive/pull/3599#discussion_r1161975796


##
standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift:
##
@@ -2679,7 +2742,8 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req)
 
   list<string> get_partition_names(1:string db_name, 2:string tbl_name, 3:i16 max_parts=-1)

Review Comment:
   We'll deprecate the older APIs in the next release.





Issue Time Tracking
---

Worklog Id: (was: 855937)
Time Spent: 7h 10m  (was: 7h)

> Deprecate older APIs in the HMS
> ---
>
> Key: HIVE-26537
> URL: https://issues.apache.org/jira/browse/HIVE-26537
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> This Jira is to track the clean-up work (deprecating older APIs and pointing 
> the HMS client to the newer APIs) in the Hive metastore server.
> More details will be added here soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26127) INSERT OVERWRITE throws FileNotFound when destination partition is deleted

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26127?focusedWorklogId=855924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855924
 ]

ASF GitHub Bot logged work on HIVE-26127:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 17:35
Start Date: 10/Apr/23 17:35
Worklog Time Spent: 10m 
  Work Description: vihangk1 opened a new pull request, #3561:
URL: https://github.com/apache/hive/pull/3561

   …tition is deleted
   
   ### What changes were proposed in this pull request?
   Backports HIVE-26127 to branch-3 from master.
   
   ### Why are the changes needed?
   The issue reported in HIVE-26127 also affects branch-3.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added a q test from the original patch.
   




Issue Time Tracking
---

Worklog Id: (was: 855924)
Time Spent: 2h 50m  (was: 2h 40m)

> INSERT OVERWRITE throws FileNotFound when destination partition is deleted 
> ---
>
> Key: HIVE-26127
> URL: https://issues.apache.org/jira/browse/HIVE-26127
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  # create external table src (col int) partitioned by (year int);
>  # create external table dest (col int) partitioned by (year int);
>  # insert into src partition (year=2022) values (1);
>  # insert into dest partition (year=2022) values (2);
>  # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022
>  # insert overwrite table dest select * from src;
> We will get FileNotFoundException as below.
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory 
> file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1
>  could not be cleaned up.
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387)
>     at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) {code}
> It is because it calls listStatus on a path that doesn't exist. We should not 
> fail insert overwrite because there is nothing to clean up.
> {code:java}
> fs.listStatus(path, pathFilter){code}
>  
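
A minimal sketch of the guard the description implies (an assumed
illustration, not necessarily the committed fix):

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class ListStatusSketch {
  // Treat a missing directory as "nothing to clean up" instead of failing the write.
  static FileStatus[] listStatusIfExists(FileSystem fs, Path path, PathFilter filter)
      throws IOException {
    try {
      return fs.listStatus(path, filter);
    } catch (FileNotFoundException e) {
      return new FileStatus[0];
    }
  }
}
{code}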



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26127) INSERT OVERWRITE throws FileNotFound when destination partition is deleted

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26127?focusedWorklogId=855923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855923
 ]

ASF GitHub Bot logged work on HIVE-26127:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 17:35
Start Date: 10/Apr/23 17:35
Worklog Time Spent: 10m 
  Work Description: vihangk1 commented on PR #3561:
URL: https://github.com/apache/hive/pull/3561#issuecomment-1502096693

   Unfortunately, I missed the notification of PR being approved and it was 
marked stale. Let me reopen this.




Issue Time Tracking
---

Worklog Id: (was: 855923)
Time Spent: 2h 40m  (was: 2.5h)

> INSERT OVERWRITE throws FileNotFound when destination partition is deleted 
> ---
>
> Key: HIVE-26127
> URL: https://issues.apache.org/jira/browse/HIVE-26127
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  # create external table src (col int) partitioned by (year int);
>  # create external table dest (col int) partitioned by (year int);
>  # insert into src partition (year=2022) values (1);
>  # insert into dest partition (year=2022) values (2);
>  # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022
>  # insert overwrite table dest select * from src;
> We will get FileNotFoundException as below.
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory 
> file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1
>  could not be cleaned up.
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387)
>     at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657)
>     at 
> org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748) {code}
> It is because it calls listStatus on a path that doesn't exist. We should not 
> fail insert overwrite because there is nothing to clean up.
> {code:java}
> fs.listStatus(path, pathFilter){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27239) Upgrade async-profiler libs to recent version

2023-04-10 Thread Dmitriy Fingerman (Jira)
Dmitriy Fingerman created HIVE-27239:


 Summary: Upgrade async-profiler libs to recent version
 Key: HIVE-27239
 URL: https://issues.apache.org/jira/browse/HIVE-27239
 Project: Hive
  Issue Type: Improvement
 Environment: Apache Hive has ProfileServlet, which uses async-profiler 
for profiling various events. It would be good to upgrade the async-profiler 
libs to a recent version.
Reporter: Dmitriy Fingerman
Assignee: Dmitriy Fingerman






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855905&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855905
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 16:14
Start Date: 10/Apr/23 16:14
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161627935


##
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorWithAbortCleanupUsingCompactionCycle.java:
##
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.ql.txn.compactor.TestCompactor;
+import org.junit.Before;
+
+public class TestCompactorWithAbortCleanupUsingCompactionCycle extends 
TestCompactor {

Review Comment:
   I don't think that should be supported anymore





Issue Time Tracking
---

Worklog Id: (was: 855905)
Time Spent: 11h 50m  (was: 11h 40m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855900=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855900
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 16:10
Start Date: 10/Apr/23 16:10
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161857242


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table 
getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
 return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+if (table.currentSnapshot() != null) {
+  Path statsPath = getStatsPath(table);
+  if (getStatsSource().equals(ICEBERG)) {
+try (FileSystem fs = statsPath.getFileSystem(conf)) {
+  if (fs.exists(statsPath)) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> 
getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+String statsPath = getStatsPath(table).toString();
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  List<BlobMetadata> blobMetadata = reader.fileMetadata().blobs();
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(blobMetadata)).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+  
ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+  return collect.get(blobMetadata.get(0)).get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.error(String.valueOf(e));
+}
+return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table 
table,
+  List<ColumnStatistics> colStats) {
+TableDesc tableDesc = Utilities.getTableDesc(table);
+Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+byte[] serializeColStats = SerializationUtils.serialize((Serializable) 
colStats);
+
+try (PuffinWriter writer = 
Puffin.write(tbl.io().newOutputFile(getStatsPath(tbl).toString()))
+.createdBy("Hive").build()) {

Review Comment:
   Done



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table 
getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
 return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+if (table.currentSnapshot() != null) {
+  Path statsPath = getStatsPath(table);
+  if (getStatsSource().equals(ICEBERG)) {
+try (FileSystem fs = statsPath.getFileSystem(conf)) {
+  if (fs.exists(statsPath)) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> 
getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+String statsPath = getStatsPath(table).toString();
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  List<BlobMetadata> blobMetadata = reader.fileMetadata().blobs();
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(blobMetadata)).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+  
ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+  return collect.get(blobMetadata.get(0)).get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.error(String.valueOf(e));
+}
+return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table 
table,
+  List<ColumnStatistics> colStats) {
+

[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855901=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855901
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 16:10
Start Date: 10/Apr/23 16:10
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161857489


##
ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java:
##
@@ -1069,8 +1069,12 @@ public static List<ColStatistics> getTableColumnStats(
 }
 if (fetchColStats && !colStatsToRetrieve.isEmpty()) {
   try {
-List<ColumnStatisticsObj> colStat = 
Hive.get().getTableColumnStatistics(
-dbName, tabName, colStatsToRetrieve, false);
+List<ColumnStatisticsObj> colStat;
+if (table != null && table.isNonNative() && 
table.getStorageHandler().canProvideColStatistics(table)) {

Review Comment:
   Fixed





Issue Time Tracking
---

Worklog Id: (was: 855901)
Time Spent: 8h 40m  (was: 8.5h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855898=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855898
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 16:09
Start Date: 10/Apr/23 16:09
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161856412


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table 
getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
 return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+if (table.currentSnapshot() != null) {
+  Path statsPath = getStatsPath(table);
+  if (getStatsSource().equals(ICEBERG)) {
+try (FileSystem fs = statsPath.getFileSystem(conf)) {
+  if (fs.exists(statsPath)) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> 
getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+String statsPath = getStatsPath(table).toString();
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  List<BlobMetadata> blobMetadata = reader.fileMetadata().blobs();
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(blobMetadata)).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+  
ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+  return collect.get(blobMetadata.get(0)).get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.error(String.valueOf(e));
+}
+return null;

Review Comment:
   Even in the absence of stats, the query can run successfully.
   I was thinking it would be better to log an error message rather than 
failing the entire query. 



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table 
getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
 return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+if (table.currentSnapshot() != null) {
+  Path statsPath = getStatsPath(table);
+  if (getStatsSource().equals(ICEBERG)) {
+try (FileSystem fs = statsPath.getFileSystem(conf)) {
+  if (fs.exists(statsPath)) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> 
getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+String statsPath = getStatsPath(table).toString();
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  List<BlobMetadata> blobMetadata = reader.fileMetadata().blobs();
+  Map<BlobMetadata, List<ColumnStatistics>> collect =
+  
Streams.stream(reader.readAll(blobMetadata)).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+  
ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+  return collect.get(blobMetadata.get(0)).get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.error(String.valueOf(e));
+}
+return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table 
table,
+  List<ColumnStatistics> colStats) {
+TableDesc tableDesc = Utilities.getTableDesc(table);
+Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 855898)
Time Spent: 8h 10m  (was: 8h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> 

[jira] [Work logged] (HIVE-27184) Add class name profiling option in ProfileServlet

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27184?focusedWorklogId=855897=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855897
 ]

ASF GitHub Bot logged work on HIVE-27184:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 16:09
Start Date: 10/Apr/23 16:09
Worklog Time Spent: 10m 
  Work Description: difin commented on PR #4196:
URL: https://github.com/apache/hive/pull/4196#issuecomment-1502008694

   > Thanks @difin. LGTM.
   > 
   > minor comment: Need to check if parameters with "$" (e.g java classnames) 
should be decoded. It can be a separate ticket.
   
   Hi @rbalamohan,
   I checked what happens when profiling a method with "$".
   
   From the command line, the profiling command works if you escape the dollar 
sign by adding "\" before "$":
   `curl 
"http://localhost:10002/prof?output=tree=30=1=java.util.concurrent.locks.AbstractQueuedSynchronizer\$ConditionObject.awaitNanos"`
   
   Then it generates an output file with a name like this:
   
`async-prof-pid-73790-java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos-4.tree`
   
   To open it in a Linux shell, the file name also needs to be escaped.
   
[async-prof-pid-73790-java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos-4.tree.html.txt](https://github.com/apache/hive/files/11191887/async-prof-pid-73790-java.util.concurrent.locks.AbstractQueuedSynchronizer.ConditionObject.awaitNanos-4.tree.html.txt)
   
   The output inside the output file is fine. A sample output file is attached.
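   On the decoding follow-up: a minimal sketch of servlet-side handling, assuming the 
parameter arrives URL-encoded (the helper below is illustrative, not the actual 
ProfileServlet code):
{code:java}
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public final class ParamDecodeSketch {
  // Decode a query parameter so characters like '$' (sent as %24) in class
  // names survive without manual shell escaping on the client side.
  static String decodeEventParam(String raw) throws UnsupportedEncodingException {
    return URLDecoder.decode(raw, "UTF-8");
  }
}
{code}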




Issue Time Tracking
---

Worklog Id: (was: 855897)
Time Spent: 1h 10m  (was: 1h)

> Add class name profiling option in ProfileServlet
> -
>
> Key: HIVE-27184
> URL: https://issues.apache.org/jira/browse/HIVE-27184
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> With async-profiler "-e classname.method", it is possible to profile specific 
> events. Currently ProfileServlet supports events like cpu, alloc, lock etc. 
> It would be good to enhance it to support method name profiling as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855899
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 16:09
Start Date: 10/Apr/23 16:09
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161857030


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table 
getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
 return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+if (table.currentSnapshot() != null) {
+  Path statsPath = getStatsPath(table);
+  if (getStatsSource().equals(ICEBERG)) {
+try (FileSystem fs = statsPath.getFileSystem(conf)) {
+  if (fs.exists(statsPath)) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List 
getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+String statsPath = getStatsPath(table).toString();
+LOG.info("Using stats from puffin file at:" + statsPath);
+try (PuffinReader reader = 
Puffin.read(table.io().newInputFile(statsPath)).build()) {
+  List blobMetadata = reader.fileMetadata().blobs();
+  Map> collect =
+  
Streams.stream(reader.readAll(blobMetadata)).collect(Collectors.toMap(Pair::first,
+  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+  
ByteBuffers.toByteArray(blobMetadataByteBufferPair.second();
+  return collect.get(blobMetadata.get(0)).get(0).getStatsObj();
+} catch (IOException e) {
+  LOG.error(String.valueOf(e));
+}
+return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table 
table,
+  List colStats) {
+TableDesc tableDesc = Utilities.getTableDesc(table);
+Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();

Review Comment:
   Fixed, the null check is moved to canSetColStatistics.





Issue Time Tracking
---

Worklog Id: (was: 855899)
Time Spent: 8h 20m  (was: 8h 10m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855896
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 16:08
Start Date: 10/Apr/23 16:08
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161856243


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table 
getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
 return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+if (table.currentSnapshot() != null) {
+  Path statsPath = getStatsPath(table);
+  if (getStatsSource().equals(ICEBERG)) {
+try (FileSystem fs = statsPath.getFileSystem(conf)) {
+  if (fs.exists(statsPath)) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());

Review Comment:
   Done.



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table 
getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
 return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+if (table.currentSnapshot() != null) {
+  Path statsPath = getStatsPath(table);
+  if (getStatsSource().equals(ICEBERG)) {
+try (FileSystem fs = statsPath.getFileSystem(conf)) {
+  if (fs.exists(statsPath)) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> 
getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());

Review Comment:
   Done



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table 
getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
 return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+if (table.currentSnapshot() != null) {
+  Path statsPath = getStatsPath(table);
+  if (getStatsSource().equals(ICEBERG)) {
+try (FileSystem fs = statsPath.getFileSystem(conf)) {
+  if (fs.exists(statsPath)) {
+return true;
+  }
+} catch (IOException e) {
+  LOG.warn(e.getMessage());
+}
+  }
+}
+return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> 
getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+String statsPath = getStatsPath(table).toString();
+LOG.info("Using stats from puffin file at:" + statsPath);

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 855896)
Time Spent: 8h  (was: 7h 50m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855895
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 16:08
Start Date: 10/Apr/23 16:08
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161856054


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table 
getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
 return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());

Review Comment:
   Done



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table 
getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
 return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean 
canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties());
+if (table.currentSnapshot() != null) {
+  Path statsPath = getStatsPath(table);
+  if (getStatsSource().equals(ICEBERG)) {

Review Comment:
   Fixed.





Issue Time Tracking
---

Worklog Id: (was: 855895)
Time Spent: 7h 50m  (was: 7h 40m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855893=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855893
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 16:08
Start Date: 10/Apr/23 16:08
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161855860


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -2205,9 +2205,8 @@ public static enum ConfVars {
 "padding tolerance config (hive.exec.orc.block.padding.tolerance)."),
 HIVE_ORC_CODEC_POOL("hive.use.orc.codec.pool", false,
 "Whether to use codec pool in ORC. Disable if there are bugs with 
codec reuse."),
-HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from 
iceberg table snapshot for query " +
-"planning. This has three values metastore, puffin and iceberg"),
-
+HIVE_ICEBERG_STATS_SOURCE("hive.iceberg.stats.source","iceberg","Use stats 
from iceberg table snapshot for query " +
+"planning. This has three values metastore and iceberg"),

Review Comment:
   Fixed. There will be only 2 values.





Issue Time Tracking
---

Worklog Id: (was: 855893)
Time Spent: 7h 40m  (was: 7.5h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27238) Avoid Calcite Code generation for RelMetaDataProvider on every query

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27238?focusedWorklogId=855892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855892
 ]

ASF GitHub Bot logged work on HIVE-27238:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 16:06
Start Date: 10/Apr/23 16:06
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4212:
URL: https://github.com/apache/hive/pull/4212#issuecomment-1502004561

   Kudos, SonarCloud Quality Gate passed!  [![Quality Gate 
passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png
 'Quality Gate 
passed')](https://sonarcloud.io/dashboard?id=apache_hive=4212)
   
   
[![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png
 
'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=BUG)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=BUG)
 [0 
Bugs](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=BUG)
  
   
[![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png
 
'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=VULNERABILITY)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=VULNERABILITY)
 [0 
Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=VULNERABILITY)
  
   [![Security 
Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png
 'Security 
Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4212=false=SECURITY_HOTSPOT)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4212=false=SECURITY_HOTSPOT)
 [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4212=false=SECURITY_HOTSPOT)
  
   [![Code 
Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png
 'Code 
Smell')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=CODE_SMELL)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=CODE_SMELL)
 [1 Code 
Smell](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=CODE_SMELL)
   
   [![No Coverage 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png
 'No Coverage 
information')](https://sonarcloud.io/component_measures?id=apache_hive=4212=coverage=list)
 No Coverage information  
   [![No Duplication 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png
 'No Duplication 
information')](https://sonarcloud.io/component_measures?id=apache_hive=4212=duplicated_lines_density=list)
 No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 855892)
Time Spent: 0.5h  (was: 20m)

> Avoid Calcite Code generation for RelMetaDataProvider on every query
> 
>
> Key: HIVE-27238
> URL: https://issues.apache.org/jira/browse/HIVE-27238
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In CalcitePlanner, we are instantiating a new CachingRelMetadataProvider on 
> every query.  Within the Calcite code, the provider is kept as the cache key to 
> prevent a new MetadataHandler class from being created.  But by generating a new 
> provider, the cache never gets a hit, so we keep instantiating new 
> MetadataHandlers.
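A minimal sketch of the reuse idea, assuming the underlying provider chain is stable 
across queries (the class and field names below are illustrative, not the actual fix; 
a real change would also have to account for the planner differing between queries):
{code:java}
import org.apache.calcite.plan.RelOptPlanner;
import org.apache.calcite.rel.metadata.CachingRelMetadataProvider;
import org.apache.calcite.rel.metadata.RelMetadataProvider;

final class SharedMetadataProviderSketch {
  private static volatile RelMetadataProvider shared;

  // Build the caching provider once and hand the same instance to every query,
  // so the MetadataHandler classes generated for it are reused instead of
  // regenerated on each planning run.
  static RelMetadataProvider get(RelMetadataProvider chain, RelOptPlanner planner) {
    RelMetadataProvider local = shared;
    if (local == null) {
      synchronized (SharedMetadataProviderSketch.class) {
        if (shared == null) {
          shared = new CachingRelMetadataProvider(chain, planner);
        }
        local = shared;
      }
    }
    return local;
  }
}
{code}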



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27200) Backport HIVE-24928 to branch-3

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27200?focusedWorklogId=855878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855878
 ]

ASF GitHub Bot logged work on HIVE-27200:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 15:19
Start Date: 10/Apr/23 15:19
Worklog Time Spent: 10m 
  Work Description: sunchao merged PR #4175:
URL: https://github.com/apache/hive/pull/4175




Issue Time Tracking
---

Worklog Id: (was: 855878)
Time Spent: 0.5h  (was: 20m)

> Backport HIVE-24928 to branch-3
> ---
>
> Key: HIVE-27200
> URL: https://issues.apache.org/jira/browse/HIVE-27200
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This is to backport HIVE-24928 so that for HiveStorageHandler tables 'ANALYZE 
> TABLE ... COMPUTE STATISTICS' can use the storage handler to provide basic stats 
> with BasicStatsNoJobTask



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27200) Backport HIVE-24928 to branch-3

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27200?focusedWorklogId=855879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855879
 ]

ASF GitHub Bot logged work on HIVE-27200:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 15:19
Start Date: 10/Apr/23 15:19
Worklog Time Spent: 10m 
  Work Description: sunchao commented on PR #4175:
URL: https://github.com/apache/hive/pull/4175#issuecomment-1501946089

   Merged, thanks @yigress !




Issue Time Tracking
---

Worklog Id: (was: 855879)
Time Spent: 40m  (was: 0.5h)

> Backport HIVE-24928 to branch-3
> ---
>
> Key: HIVE-27200
> URL: https://issues.apache.org/jira/browse/HIVE-27200
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is to backport HIVE-24928 so that for HiveStorageHandler tables 'ANALYZE 
> TABLE ... COMPUTE STATISTICS' can use the storage handler to provide basic stats 
> with BasicStatsNoJobTask



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27200) Backport HIVE-24928 to branch-3

2023-04-10 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HIVE-27200.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

> Backport HIVE-24928 to branch-3
> ---
>
> Key: HIVE-27200
> URL: https://issues.apache.org/jira/browse/HIVE-27200
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Yi Zhang
>Assignee: Yi Zhang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is to backport HIVE-24928 so that for HiveStorageHandler tables 'ANALYZE 
> TABLE ... COMPUTE STATISTICS' can use the storage handler to provide basic stats 
> with BasicStatsNoJobTask



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27173) Add method for Spark to be able to trigger DML events

2023-04-10 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-27173.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged to master. 

> Add method for Spark to be able to trigger DML events
> -
>
> Key: HIVE-27173
> URL: https://issues.apache.org/jira/browse/HIVE-27173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Spark currently uses Hive.java from Hive as a convenient way to avoid having 
> to deal with the HMS client and the Thrift objects. Currently, Hive has 
> support for DML events (it can generate events on DML operations) but does 
> not expose a public method to do so. It has a private method that takes in 
> Hive objects like Table etc. It would be nice to have something that uses 
> more primitive datatypes.
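A hypothetical shape for such a public helper, using only plain identifiers (the 
interface and signature below are illustrative, not an existing Hive API):
{code:java}
import java.util.List;
import java.util.Map;

// Illustrative sketch only: fire a DML (insert) event from primitive-ish
// arguments, so callers like Spark don't have to construct
// org.apache.hadoop.hive.ql.metadata.Table objects first.
public interface DmlEventNotifierSketch {
  void fireInsertEvent(String dbName, String tableName,
      Map<String, String> partitionSpec, List<String> newFiles, boolean isOverwrite);
}
{code}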



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27173) Add method for Spark to be able to trigger DML events

2023-04-10 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-27173:


Assignee: Naveen Gangam

> Add method for Spark to be able to trigger DML events
> -
>
> Key: HIVE-27173
> URL: https://issues.apache.org/jira/browse/HIVE-27173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Spark currently uses Hive.java from Hive as a convenient way to avoid having 
> to deal with the HMS client and the Thrift objects. Currently, Hive has 
> support for DML events (it can generate events on DML operations) but does 
> not expose a public method to do so. It has a private method that takes in 
> Hive objects like Table etc. It would be nice to have something that uses 
> more primitive datatypes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27173) Add method for Spark to be able to trigger DML events

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27173?focusedWorklogId=855861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855861
 ]

ASF GitHub Bot logged work on HIVE-27173:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 14:35
Start Date: 10/Apr/23 14:35
Worklog Time Spent: 10m 
  Work Description: nrg4878 merged PR #4201:
URL: https://github.com/apache/hive/pull/4201




Issue Time Tracking
---

Worklog Id: (was: 855861)
Time Spent: 40m  (was: 0.5h)

> Add method for Spark to be able to trigger DML events
> -
>
> Key: HIVE-27173
> URL: https://issues.apache.org/jira/browse/HIVE-27173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Spark currently uses Hive.java from Hive as a convenient way to avoid having 
> to deal with the HMS client and the Thrift objects. Currently, Hive has 
> support for DML events (it can generate events on DML operations) but does 
> not expose a public method to do so. It has a private method that takes in 
> Hive objects like Table etc. It would be nice to have something that uses 
> more primitive datatypes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27173) Add method for Spark to be able to trigger DML events

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27173?focusedWorklogId=855862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855862
 ]

ASF GitHub Bot logged work on HIVE-27173:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 14:35
Start Date: 10/Apr/23 14:35
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on PR #4201:
URL: https://github.com/apache/hive/pull/4201#issuecomment-1501893908

   Thank you for the review @dengzhhu653 




Issue Time Tracking
---

Worklog Id: (was: 855862)
Time Spent: 50m  (was: 40m)

> Add method for Spark to be able to trigger DML events
> -
>
> Key: HIVE-27173
> URL: https://issues.apache.org/jira/browse/HIVE-27173
> Project: Hive
>  Issue Type: Improvement
>Reporter: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Spark currently uses Hive.java from Hive as a convenient way to avoid having 
> to deal with the HMS client and the Thrift objects. Currently, Hive has 
> support for DML events (it can generate events on DML operations) but does 
> not expose a public method to do so. It has a private method that takes in 
> Hive objects like Table etc. It would be nice to have something that uses 
> more primitive datatypes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855860
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 14:34
Start Date: 10/Apr/23 14:34
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161772310


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java:
##
@@ -82,7 +82,29 @@ public static ValidTxnList 
createValidTxnListForCleaner(GetOpenTxnsResponse txns
 bitSet.set(0, abortedTxns.length);
 //add ValidCleanerTxnList? - could be problematic for all the places that 
read it from
 // string as they'd have to know which object to instantiate
-return new ValidReadTxnList(abortedTxns, bitSet, highWaterMark, 
Long.MAX_VALUE);
+return new ValidReadTxnList(abortedTxns, bitSet, highWatermark, 
Long.MAX_VALUE);
+  }
+
+  public static ValidTxnList 
createValidTxnListForAbortedTxnCleaner(GetOpenTxnsResponse txns, long 
minOpenTxn) {

Review Comment:
   how is that different from `createValidTxnListForCleaner`? everything in the 
Open_txns list `< minOpenTxn - 1` would be aborted





Issue Time Tracking
---

Worklog Id: (was: 855860)
Time Spent: 11h 40m  (was: 11.5h)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27223) Show Compactions failing with NPE

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27223?focusedWorklogId=855856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855856
 ]

ASF GitHub Bot logged work on HIVE-27223:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 14:28
Start Date: 10/Apr/23 14:28
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4204:
URL: https://github.com/apache/hive/pull/4204#issuecomment-1501886365

   Kudos, SonarCloud Quality Gate passed!  [![Quality Gate 
passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png
 'Quality Gate 
passed')](https://sonarcloud.io/dashboard?id=apache_hive=4204)
   
   
[![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png
 
'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=BUG)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=BUG)
 [0 
Bugs](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=BUG)
  
   
[![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png
 
'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=VULNERABILITY)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=VULNERABILITY)
 [0 
Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=VULNERABILITY)
  
   [![Security 
Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png
 'Security 
Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4204=false=SECURITY_HOTSPOT)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4204=false=SECURITY_HOTSPOT)
 [0 Security 
Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4204=false=SECURITY_HOTSPOT)
  
   [![Code 
Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png
 'Code 
Smell')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=CODE_SMELL)
 
[![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png
 
'A')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=CODE_SMELL)
 [0 Code 
Smells](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=CODE_SMELL)
   
   [![No Coverage 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png
 'No Coverage 
information')](https://sonarcloud.io/component_measures?id=apache_hive=4204=coverage=list)
 No Coverage information  
   [![No Duplication 
information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png
 'No Duplication 
information')](https://sonarcloud.io/component_measures?id=apache_hive=4204=duplicated_lines_density=list)
 No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 855856)
Time Spent: 0.5h  (was: 20m)

> Show Compactions failing with NPE
> -
>
> Key: HIVE-27223
> URL: https://issues.apache.org/jira/browse/HIVE-27223
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {noformat}
> java.lang.NullPointerException: null
>   at java.io.DataOutputStream.writeBytes(DataOutputStream.java:274) ~[?:?]
>   at 
> org.apache.hadoop.hive.ql.ddl.process.show.compactions.ShowCompactionsOperation.writeRow(ShowCompactionsOperation.java:135)
>  
>   at 
> org.apache.hadoop.hive.ql.ddl.process.show.compactions.ShowCompactionsOperation.execute(ShowCompactionsOperation.java:57)
>  
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360) 
> {noformat}
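A minimal sketch of the null-guard implied by the trace, assuming some compaction 
fields (e.g. worker id, start time) can legitimately be absent (the helper name and 
placeholder are illustrative, not the actual fix):
{code:java}
import java.io.DataOutputStream;
import java.io.IOException;

public final class NullSafeWriteSketch {
  // DataOutputStream.writeBytes(null) is what raises the NPE in writeRow, so
  // substitute a placeholder for fields that may be null before writing.
  static void writeField(DataOutputStream out, String value) throws IOException {
    out.writeBytes(value == null ? "NULL" : value);
    out.write('\t'); // column separator for the tab-separated row
  }
}
{code}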



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855855
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 14:28
Start Date: 10/Apr/23 14:28
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161767398


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java:
##
@@ -82,7 +82,29 @@ public static ValidTxnList 
createValidTxnListForCleaner(GetOpenTxnsResponse txns
 bitSet.set(0, abortedTxns.length);
 //add ValidCleanerTxnList? - could be problematic for all the places that 
read it from
 // string as they'd have to know which object to instantiate
-return new ValidReadTxnList(abortedTxns, bitSet, highWaterMark, 
Long.MAX_VALUE);
+return new ValidReadTxnList(abortedTxns, bitSet, highWatermark, 
Long.MAX_VALUE);
+  }
+
+  public static ValidTxnList 
createValidTxnListForAbortedTxnCleaner(GetOpenTxnsResponse txns, long 
minOpenTxn) {
+long highWatermark = minOpenTxn - 1;
+long[] exceptions = new long[txns.getOpen_txnsSize()];
+int i = 0;
+BitSet abortedBits = BitSet.valueOf(txns.getAbortedBits());
+// getOpen_txns() guarantees that the list contains only aborted & open 
txns.
+// exceptions list must contain both txn types since validWriteIdList 
filters out the aborted ones and valid ones for that table.
+// If a txn is not in exception list, it is considered as a valid one and 
thought of as an uncompacted write.
+// See TxnHandler#getValidWriteIdsForTable() for more details.
+for(long txnId : txns.getOpen_txns()) {
+  if(txnId > highWatermark) {
+break;
+  }
+  exceptions[i] = txnId;
+  i++;
+}
+exceptions = Arrays.copyOf(exceptions, i);
+//add ValidCleanerTxnList? - could be problematic for all the places that 
read it from

Review Comment:
   is this a leftover comment?





Issue Time Tracking
---

Worklog Id: (was: 855855)
Time Spent: 11.5h  (was: 11h 20m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855852
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 14:27
Start Date: 10/Apr/23 14:27
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161766383


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java:
##
@@ -82,7 +82,29 @@ public static ValidTxnList 
createValidTxnListForCleaner(GetOpenTxnsResponse txns
 bitSet.set(0, abortedTxns.length);
 //add ValidCleanerTxnList? - could be problematic for all the places that 
read it from
 // string as they'd have to know which object to instantiate
-return new ValidReadTxnList(abortedTxns, bitSet, highWaterMark, 
Long.MAX_VALUE);
+return new ValidReadTxnList(abortedTxns, bitSet, highWatermark, 
Long.MAX_VALUE);
+  }
+
+  public static ValidTxnList 
createValidTxnListForAbortedTxnCleaner(GetOpenTxnsResponse txns, long 
minOpenTxn) {
+long highWatermark = minOpenTxn - 1;
+long[] exceptions = new long[txns.getOpen_txnsSize()];
+int i = 0;
+BitSet abortedBits = BitSet.valueOf(txns.getAbortedBits());
+// getOpen_txns() guarantees that the list contains only aborted & open 
txns.
+// exceptions list must contain both txn types since validWriteIdList 
filters out the aborted ones and valid ones for that table.
+// If a txn is not in exception list, it is considered as a valid one and 
thought of as an uncompacted write.
+// See TxnHandler#getValidWriteIdsForTable() for more details.
+for(long txnId : txns.getOpen_txns()) {

Review Comment:
   txns.getOpen_txns() is sorted, so there is no need to scan the whole list





Issue Time Tracking
---

Worklog Id: (was: 855852)
Time Spent: 11h 20m  (was: 11h 10m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855845
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 14:18
Start Date: 10/Apr/23 14:18
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161759075


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java:
##
@@ -82,7 +82,29 @@ public static ValidTxnList 
createValidTxnListForCleaner(GetOpenTxnsResponse txns
 bitSet.set(0, abortedTxns.length);
 //add ValidCleanerTxnList? - could be problematic for all the places that 
read it from
 // string as they'd have to know which object to instantiate
-return new ValidReadTxnList(abortedTxns, bitSet, highWaterMark, 
Long.MAX_VALUE);
+return new ValidReadTxnList(abortedTxns, bitSet, highWatermark, 
Long.MAX_VALUE);
+  }
+
+  public static ValidTxnList 
createValidTxnListForAbortedTxnCleaner(GetOpenTxnsResponse txns, long 
minOpenTxn) {
+long highWatermark = minOpenTxn - 1;
+long[] exceptions = new long[txns.getOpen_txnsSize()];
+int i = 0;
+BitSet abortedBits = BitSet.valueOf(txns.getAbortedBits());
+// getOpen_txns() guarantees that the list contains only aborted & open 
txns.
+// exceptions list must contain both txn types since validWriteIdList 
filters out the aborted ones and valid ones for that table.
+// If a txn is not in exception list, it is considered as a valid one and 
thought of as an uncompacted write.
+// See TxnHandler#getValidWriteIdsForTable() for more details.
+for(long txnId : txns.getOpen_txns()) {

Review Comment:
   reformat, missing space





Issue Time Tracking
---

Worklog Id: (was: 855845)
Time Spent: 11h 10m  (was: 11h)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855844=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855844
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 14:18
Start Date: 10/Apr/23 14:18
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161758527


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java:
##
@@ -60,20 +60,20 @@
 public class TxnUtils {
   private static final Logger LOG = LoggerFactory.getLogger(TxnUtils.class);
 
-  public static ValidTxnList createValidTxnListForCleaner(GetOpenTxnsResponse 
txns, long minOpenTxnGLB) {
-long highWaterMark = minOpenTxnGLB - 1;
+  public static ValidTxnList 
createValidTxnListForCompactionCleaner(GetOpenTxnsResponse txns, long 
minOpenTxn) {
+long highWatermark = minOpenTxn - 1;
 long[] abortedTxns = new long[txns.getOpen_txnsSize()];
 BitSet abortedBits = BitSet.valueOf(txns.getAbortedBits());
 int i = 0;
 for(long txnId : txns.getOpen_txns()) {
-  if(txnId > highWaterMark) {
+  if(txnId > highWatermark) {
 break;
   }
   if(abortedBits.get(i)) {

Review Comment:
   space





Issue Time Tracking
---

Worklog Id: (was: 855844)
Time Spent: 11h  (was: 10h 50m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855843
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 14:17
Start Date: 10/Apr/23 14:17
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161758279


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java:
##
@@ -60,20 +60,20 @@
 public class TxnUtils {
   private static final Logger LOG = LoggerFactory.getLogger(TxnUtils.class);
 
-  public static ValidTxnList createValidTxnListForCleaner(GetOpenTxnsResponse txns, long minOpenTxnGLB) {
-    long highWaterMark = minOpenTxnGLB - 1;
+  public static ValidTxnList createValidTxnListForCompactionCleaner(GetOpenTxnsResponse txns, long minOpenTxn) {
+    long highWatermark = minOpenTxn - 1;
     long[] abortedTxns = new long[txns.getOpen_txnsSize()];
     BitSet abortedBits = BitSet.valueOf(txns.getAbortedBits());
     int i = 0;
     for(long txnId : txns.getOpen_txns()) {
-      if(txnId > highWaterMark) {
+      if(txnId > highWatermark) {

Review Comment:
   space





Issue Time Tracking
---

Worklog Id: (was: 855843)
Time Spent: 10h 50m  (was: 10h 40m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855799
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 12:37
Start Date: 10/Apr/23 12:37
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161623163


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -2205,9 +2205,8 @@ public static enum ConfVars {
         "padding tolerance config (hive.exec.orc.block.padding.tolerance)."),
     HIVE_ORC_CODEC_POOL("hive.use.orc.codec.pool", false,
         "Whether to use codec pool in ORC. Disable if there are bugs with codec reuse."),
-    HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from iceberg table snapshot for query " +
-        "planning. This has three values metastore, puffin and iceberg"),
-
+    HIVE_ICEBERG_STATS_SOURCE("hive.iceberg.stats.source","iceberg","Use stats from iceberg table snapshot for query " +
+        "planning. This has three values metastore and iceberg"),

Review Comment:
   > This has three values metastore and iceberg
   
   what is the third value,?
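
As context for the comment above, the value list in the description no longer matches the code. A hedged sketch of how a caller might read the new key; the key string comes from the diff, while the default and the lower-casing are assumptions:

```java
import org.apache.hadoop.conf.Configuration;

public class StatsSourceSketch {
  static final String ICEBERG = "iceberg"; // assumed constant, mirroring the handler's ICEBERG

  // Resolve the stats source, defaulting to "iceberg" as the diff's default suggests.
  static String getStatsSource(Configuration conf) {
    return conf.get("hive.iceberg.stats.source", ICEBERG).toLowerCase();
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("hive.iceberg.stats.source", "metastore");
    System.out.println(getStatsSource(conf)); // -> metastore
  }
}
```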



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
     return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+    Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties());
+    if (table.currentSnapshot() != null) {
+      Path statsPath = getStatsPath(table);
+      if (getStatsSource().equals(ICEBERG)) {

Review Comment:
   can use ```canSetColStatistics()```
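
A sketch of the refactor the comment suggests, with the surrounding types stubbed out; only the control flow is the point here, this is not the PR's actual code:

```java
public class ColStatsGateSketch {
  private String statsSource = "iceberg";

  // Shared gate: stats can only be written when the source is Iceberg.
  boolean canSetColStatistics() {
    return "iceberg".equals(statsSource);
  }

  // Reuse the gate instead of repeating the getStatsSource().equals(ICEBERG) check inline.
  boolean canProvideColStatistics(boolean snapshotExists, boolean statsFileExists) {
    return canSetColStatistics() && snapshotExists && statsFileExists;
  }

  public static void main(String[] args) {
    ColStatsGateSketch h = new ColStatsGateSketch();
    System.out.println(h.canProvideColStatistics(true, true));  // true
    System.out.println(h.canProvideColStatistics(true, false)); // false
  }
}
```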



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
     return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+    Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties());
+    if (table.currentSnapshot() != null) {
+      Path statsPath = getStatsPath(table);
+      if (getStatsSource().equals(ICEBERG)) {
+        try (FileSystem fs = statsPath.getFileSystem(conf)) {
+          if (fs.exists(statsPath)) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List<ColStatistics> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+    Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties());
+    String statsPath = getStatsPath(table).toString();
+    LOG.info("Using stats from puffin file at:" + statsPath);

Review Comment:
   Logger format: 
   ```
   LOG.info("Using stats from puffin file at: {}", statsPath);
   ```



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
     return table;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+    Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties());
+    if (table.currentSnapshot() != null) {
+      Path statsPath = getStatsPath(table);
+      if (getStatsSource().equals(ICEBERG)) {
+        try (FileSystem fs = statsPath.getFileSystem(conf)) {
+          if (fs.exists(statsPath)) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List<ColStatistics> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+    Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties());
+    String statsPath = getStatsPath(table).toString();
+    LOG.info("Using stats from puffin file at:" + statsPath);
+    try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) {
+      List<BlobMetadata> blobMetadata = reader.fileMetadata().blobs();
+      Map> collect =

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855793
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 12:14
Start Date: 10/Apr/23 12:14
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161672912


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java:
##
@@ -516,6 +516,19 @@ Set<CompactionInfo> findPotentialCompactions(int abortedThreshold, long abortedT
   @RetrySemantics.ReadOnly
   List<CompactionInfo> findReadyToClean(long minOpenTxnWaterMark, long retentionTime) throws MetaException;
 
+  /**
+   * Find the aborted entries in TXN_COMPONENTS which can be used to
+   * clean directories belonging to transactions in aborted state.
+   * @param abortedTimeThreshold Age of table/partition's oldest aborted transaction involving a given table
+   *                             or partition that will trigger cleanup.
+   * @param abortedThreshold Number of aborted transactions involving a given table or partition
+   *                         that will trigger cleanup.
+   * @return Information of potential abort items that needs to be cleaned.
+   * @throws MetaException
+   */
+  @RetrySemantics.ReadOnly
+  List<AcidTxnInfo> findReadyToCleanForAborts(long abortedTimeThreshold, int abortedThreshold) throws MetaException;

Review Comment:
   maybe `findReadyToCleanAborts` ?





Issue Time Tracking
---

Worklog Id: (was: 855793)
Time Spent: 10.5h  (was: 10h 20m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855794
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 12:14
Start Date: 10/Apr/23 12:14
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161673317


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java:
##
@@ -541,6 +554,15 @@ Set<CompactionInfo> findPotentialCompactions(int abortedThreshold, long abortedT
   @RetrySemantics.CannotRetry
   void markCleaned(CompactionInfo info) throws MetaException;
 
+  /**
+   * This will remove an aborted entries from TXN_COMPONENTS table after
+   * the aborted directories are removed from the filesystem.
+   * @param info info on the aborted directories cleanup that needs to be removed
+   * @throws MetaException
+   */
+  @RetrySemantics.CannotRetry
+  void markCleanedForAborts(AcidTxnInfo info) throws MetaException;

Review Comment:
   I wouldn't create a separate API just for that





Issue Time Tracking
---

Worklog Id: (was: 855794)
Time Spent: 10h 40m  (was: 10.5h)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855792
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 12:12
Start Date: 10/Apr/23 12:12
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161672116


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java:
##
@@ -516,6 +516,19 @@ Set<CompactionInfo> findPotentialCompactions(int abortedThreshold, long abortedT
   @RetrySemantics.ReadOnly
   List<CompactionInfo> findReadyToClean(long minOpenTxnWaterMark, long retentionTime) throws MetaException;
 
+  /**
+   * Find the aborted entries in TXN_COMPONENTS which can be used to
+   * clean directories belonging to transactions in aborted state.
+   * @param abortedTimeThreshold Age of table/partition's oldest aborted transaction involving a given table
+   *                             or partition that will trigger cleanup.
+   * @param abortedThreshold Number of aborted transactions involving a given table or partition
+   *                         that will trigger cleanup.
+   * @return Information of potential abort items that needs to be cleaned.
+   * @throws MetaException
+   */
+  @RetrySemantics.ReadOnly
+  List<AcidTxnInfo> findReadyToCleanForAborts(long abortedTimeThreshold, int abortedThreshold) throws MetaException;

Review Comment:
   I wouldn't introduce new API for that





Issue Time Tracking
---

Worklog Id: (was: 855792)
Time Spent: 10h 20m  (was: 10h 10m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855791
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 12:10
Start Date: 10/Apr/23 12:10
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161670816


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##
@@ -702,6 +699,102 @@ public void markCleaned(CompactionInfo info) throws MetaException {
     }
   }
 
+  @Override
+  public void markCleanedForAborts(AcidTxnInfo info) throws MetaException {
+    // Do cleanup of TXN_COMPONENTS table
+    LOG.debug("Running markCleanedForAborts with CompactionInfo: {}", info);
+    try {
+      Connection dbConn = null;
+      try {
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED, connPoolCompaction);
+        markAbortCleaned(dbConn, info);
+        LOG.debug("Going to commit");
+        dbConn.commit();
+      } catch (SQLException e) {
+        LOG.error("Unable to delete from txn components due to {}", e.getMessage());
+        LOG.debug("Going to rollback");
+        rollbackDBConn(dbConn);
+        checkRetryable(e, "markCleanedForAborts(" + info + ")");
+        throw new MetaException("Unable to connect to transaction database " +
+            e.getMessage());
+      } finally {
+        closeDbConn(dbConn);
+      }
+    } catch (RetryException e) {
+      markCleanedForAborts(info);
+    }
+  }
+
+  private void markAbortCleaned(Connection dbConn, AcidTxnInfo info) throws MetaException, RetryException {

Review Comment:
   rename to `removeTxnComponents`
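
The hunk above follows the txn handler's usual retry shape: attempt the delete, roll back on SQLException, let checkRetryable decide, then re-enter on RetryException. A self-contained sketch of that shape; RetryException, the table name, and the always-retry classification here are stand-ins, not Hive's classes or policy:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class RetryShapeSketch {
  static class RetryException extends Exception {}

  static void markCleaned(String jdbcUrl, long txnId) throws Exception {
    try {
      try (Connection dbConn = DriverManager.getConnection(jdbcUrl)) {
        dbConn.setAutoCommit(false);
        try (PreparedStatement ps =
            dbConn.prepareStatement("DELETE FROM TXN_COMPONENTS WHERE TC_TXNID = ?")) {
          ps.setLong(1, txnId);
          ps.executeUpdate();
        }
        dbConn.commit();                  // the "Going to commit" step in the diff
      } catch (SQLException e) {
        // the real code rolls back and calls checkRetryable(e, ...) here;
        // this sketch simply classifies every failure as retryable
        throw new RetryException();
      }
    } catch (RetryException e) {
      markCleaned(jdbcUrl, txnId);        // re-enter, mirroring markCleanedForAborts(info)
    }
  }
}
```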





Issue Time Tracking
---

Worklog Id: (was: 855791)
Time Spent: 10h 10m  (was: 10h)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855789
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 12:03
Start Date: 10/Apr/23 12:03
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161666811


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidTxnInfo.java:
##
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.txn;
+
+import org.apache.commons.lang3.builder.ToStringBuilder;
+import org.apache.hadoop.hive.common.ValidCompactorWriteIdList;
+import org.apache.hadoop.hive.metastore.api.TableValidWriteIds;
+
+import java.util.Set;
+
+/**
+ * A class used for encapsulating information of abort-cleanup activities and compaction activities.
+ */
+public class AcidTxnInfo {

Review Comment:
   Can we reuse CompactionInfo object and not create another entity?





Issue Time Tracking
---

Worklog Id: (was: 855789)
Time Spent: 10h  (was: 9h 50m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855788
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 12:00
Start Date: 10/Apr/23 12:00
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161665575


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/TaskHandlerFactory.java:
##
@@ -43,7 +44,14 @@ private TaskHandlerFactory() {
 
   public List<TaskHandler> getHandlers(HiveConf conf, TxnStore txnHandler, MetadataCache metadataCache,
                                        boolean metricsEnabled, FSRemover fsRemover) {
-    return Arrays.asList(new CompactionCleaner(conf, txnHandler, metadataCache,
+    boolean useAbortHandler = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.COMPACTOR_CLEAN_ABORTS_USING_CLEANER);
+    List<TaskHandler> taskHandlers = new ArrayList<>();
+    if (useAbortHandler) {

Review Comment:
   no need for that check, from now on use Cleaner to handle aborts
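
A sketch of the simplification being requested: register both handlers unconditionally instead of gating the abort cleaner behind the config flag. The handler classes here are stand-ins for the real CompactionCleaner/AbortedTxnCleaner:

```java
import java.util.Arrays;
import java.util.List;

public class HandlerFactorySketch {
  interface TaskHandler {}
  static class CompactionCleaner implements TaskHandler {}
  static class AbortedTxnCleaner implements TaskHandler {}

  // No COMPACTOR_CLEAN_ABORTS_USING_CLEANER check: aborts always go to the cleaner.
  static List<TaskHandler> getHandlers() {
    return Arrays.asList(new CompactionCleaner(), new AbortedTxnCleaner());
  }

  public static void main(String[] args) {
    System.out.println(getHandlers().size()); // 2
  }
}
```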





Issue Time Tracking
---

Worklog Id: (was: 855788)
Time Spent: 9h 50m  (was: 9h 40m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855785
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:58
Start Date: 10/Apr/23 11:58
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161664755


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/CompactionCleaner.java:
##
@@ -259,49 +247,11 @@ private void cleanUsingAcidDir(CompactionInfo ci, String location, long minOpenT
      */
 
     // Creating 'reader' list since we are interested in the set of 'obsolete' files
-    ValidReaderWriteIdList validWriteIdList = getValidCleanerWriteIdList(ci, validTxnList);
-    LOG.debug("Cleaning based on writeIdList: {}", validWriteIdList);
-
-    Path path = new Path(location);
-    FileSystem fs = path.getFileSystem(conf);
-
-    // Collect all the files/dirs
-    Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = AcidUtils.getHdfsDirSnapshotsForCleaner(fs, path);
-    AcidDirectory dir = AcidUtils.getAcidState(fs, path, conf, validWriteIdList, Ref.from(false), false,
-        dirSnapshots);
+    ValidReaderWriteIdList validWriteIdList = getValidCleanerWriteIdListForCompactionCleaner(ci, validTxnList);
     Table table = metadataCache.computeIfAbsent(ci.getFullTableName(), () -> resolveTable(ci.dbname, ci.tableName));
-    boolean isDynPartAbort = CompactorUtil.isDynPartAbort(table, ci.partName);
-
-    List<Path> obsoleteDirs = CompactorUtil.getObsoleteDirs(dir, isDynPartAbort);
-    if (isDynPartAbort || dir.hasUncompactedAborts()) {
-      ci.setWriteIds(dir.hasUncompactedAborts(), dir.getAbortedWriteIds());
-    }
-
-    List<Path> deleted = fsRemover.clean(new CleanupRequestBuilder().setLocation(location)
-        .setDbName(ci.dbname).setFullPartitionName(ci.getFullPartitionName())
-        .setRunAs(ci.runAs).setObsoleteDirs(obsoleteDirs).setPurge(true)
-        .build());
-
-    if (!deleted.isEmpty()) {
-      AcidMetricService.updateMetricsFromCleaner(ci.dbname, ci.tableName, ci.partName, dir.getObsolete(), conf,
-          txnHandler);
-    }
-
-    // Make sure there are no leftovers below the compacted watermark
-    boolean success = false;
-    conf.set(ValidTxnList.VALID_TXNS_KEY, new ValidReadTxnList().toString());
-    dir = AcidUtils.getAcidState(fs, path, conf, new ValidReaderWriteIdList(
-        ci.getFullTableName(), new long[0], new BitSet(), ci.highestWriteId, Long.MAX_VALUE),
-        Ref.from(false), false, dirSnapshots);
+    LOG.debug("Cleaning based on writeIdList: {}", validWriteIdList);
 
-    List<Path> remained = subtract(CompactorUtil.getObsoleteDirs(dir, isDynPartAbort), deleted);
-    if (!remained.isEmpty()) {
-      LOG.warn("{} Remained {} obsolete directories from {}. {}",
-          idWatermark(ci), remained.size(), location, CompactorUtil.getDebugInfo(remained));
-    } else {
-      LOG.debug("{} All cleared below the watermark: {} from {}", idWatermark(ci), ci.highestWriteId, location);
-      success = true;
-    }
+    boolean success = cleanAndVerifyObsoleteDirectories(ci, location, validWriteIdList, table);

Review Comment:
   1 line below no need to check for `isDynPartAbort `
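
The extracted cleanAndVerifyObsoleteDirectories(...) boils down to two phases: delete the obsolete set, then re-scan and report anything left below the watermark. A toy sketch of that shape with directory names as plain strings, not the AcidDirectory machinery:

```java
import java.util.ArrayList;
import java.util.List;

public class CleanAndVerifySketch {
  static boolean cleanAndVerify(List<String> obsoleteDirs, List<String> obsoleteAfterRescan) {
    List<String> deleted = new ArrayList<>(obsoleteDirs);         // phase 1: fsRemover.clean(...)
    List<String> remained = new ArrayList<>(obsoleteAfterRescan); // phase 2: re-list below watermark
    remained.removeAll(deleted);                                  // subtract(obsolete, deleted)
    if (!remained.isEmpty()) {
      System.out.println("Remained " + remained.size() + " obsolete directories");
      return false;
    }
    return true; // all cleared below the watermark
  }

  public static void main(String[] args) {
    System.out.println(cleanAndVerify(List.of("delta_1_1"), List.of("delta_1_1"))); // true
  }
}
```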





Issue Time Tracking
---

Worklog Id: (was: 855785)
Time Spent: 9h 40m  (was: 9.5h)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855783
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:56
Start Date: 10/Apr/23 11:56
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161658033


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/CompactionCleaner.java:
##
@@ -337,18 +287,9 @@ private static String idWatermark(CompactionInfo ci) {
     return " id=" + ci.id;
   }
 
-  private ValidReaderWriteIdList getValidCleanerWriteIdList(CompactionInfo ci, ValidTxnList validTxnList)
+  private ValidReaderWriteIdList getValidCleanerWriteIdListForCompactionCleaner(CompactionInfo ci, ValidTxnList validTxnList)

Review Comment:
   why rename here, just override the parent method and call super?
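
A sketch of the override-and-call-super alternative the reviewer prefers over renaming; the classes and the string payload are stand-ins for the real handler types:

```java
public class OverrideSketch {
  static class TaskHandler {
    String getValidCleanerWriteIdList(String ci) {
      return "base-write-id-list:" + ci;   // shared logic lives in the parent
    }
  }

  static class CompactionCleaner extends TaskHandler {
    @Override
    String getValidCleanerWriteIdList(String ci) {
      // same name, specialized behavior: extend the parent result via super
      return super.getValidCleanerWriteIdList(ci) + ":compaction-adjusted";
    }
  }

  public static void main(String[] args) {
    System.out.println(new CompactionCleaner().getValidCleanerWriteIdList("ci-42"));
  }
}
```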





Issue Time Tracking
---

Worklog Id: (was: 855783)
Time Spent: 9h 20m  (was: 9h 10m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855782
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:54
Start Date: 10/Apr/23 11:54
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161662278


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/TaskHandler.java:
##
@@ -81,4 +102,63 @@ protected Partition resolvePartition(String dbName, String tableName, String par
       return null;
     }
   }
+
+  protected ValidReaderWriteIdList getValidCleanerWriteIdList(AcidTxnInfo acidTxnInfo, ValidTxnList validTxnList)

Review Comment:
   should we rename this method to getValidWriteIdList?





Issue Time Tracking
---

Worklog Id: (was: 855782)
Time Spent: 9h 10m  (was: 9h)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855784
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:58
Start Date: 10/Apr/23 11:58
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161653684


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides implementation of creation of abort clean tasks.
+ */
+class AbortedTxnCleaner extends TaskHandler {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea - 
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible by any open txns. It is only visible while determining the AcidState (which
+      only sees the aborted deltas and does not read the file).
+
+   The following algorithm is used to clean the set of aborted directories - 
+      a. Find the list of entries which are suitable for cleanup (This is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+      b. If the table/partition does not exist, then remove the associated aborted entry in TXN_COMPONENTS table.
+      c. Get the AcidState of the table by using the min open txnID, database name, tableName, partition name, highest write ID
+      d. Fetch the aborted directories and delete the directories.
+      e. Fetch the aborted write IDs from the AcidState and use it to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE, 
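
The archived message is cut off mid-expression above. A minimal sketch of the scheduling shape the truncated code describes: each ready-to-clean entry becomes a task, and when no open txn bounds the cleanup (txnId <= 0) the watermark defaults to Long.MAX_VALUE. AcidTxnInfo and clean() here are stand-ins, not the Hive classes:

```java
import java.util.List;
import java.util.stream.Collectors;

public class AbortCleanTasksSketch {
  static class AcidTxnInfo { long txnId; }

  static void clean(AcidTxnInfo ci, long minOpenTxnWaterMark) {
    System.out.println("cleaning aborted dirs up to watermark " + minOpenTxnWaterMark);
  }

  static List<Runnable> getTasks(List<AcidTxnInfo> readyToCleanAborts) {
    return readyToCleanAborts.stream()
        .map(ci -> (Runnable) () ->
            clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE)) // unbounded when no open txn
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    AcidTxnInfo ci = new AcidTxnInfo();          // txnId defaults to 0
    getTasks(List.of(ci)).forEach(Runnable::run); // watermark = Long.MAX_VALUE
  }
}
```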

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855780
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:46
Start Date: 10/Apr/23 11:46
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161658033


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/CompactionCleaner.java:
##
@@ -337,18 +287,9 @@ private static String idWatermark(CompactionInfo ci) {
     return " id=" + ci.id;
   }
 
-  private ValidReaderWriteIdList getValidCleanerWriteIdList(CompactionInfo ci, ValidTxnList validTxnList)
+  private ValidReaderWriteIdList getValidCleanerWriteIdListForCompactionCleaner(CompactionInfo ci, ValidTxnList validTxnList)

Review Comment:
   why rename here?





Issue Time Tracking
---

Worklog Id: (was: 855780)
Time Spent: 9h  (was: 8h 50m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855777
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:42
Start Date: 10/Apr/23 11:42
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161655781


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides implementation of creation of abort clean tasks.
+ */
+class AbortedTxnCleaner extends TaskHandler {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea - 
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible by any open txns. It is only visible while determining the AcidState (which
+      only sees the aborted deltas and does not read the file).
+
+   The following algorithm is used to clean the set of aborted directories - 
+      a. Find the list of entries which are suitable for cleanup (This is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+      b. If the table/partition does not exist, then remove the associated aborted entry in TXN_COMPONENTS table.
+      c. Get the AcidState of the table by using the min open txnID, database name, tableName, partition name, highest write ID
+      d. Fetch the aborted directories and delete the directories.
+      e. Fetch the aborted write IDs from the AcidState and use it to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE, 

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855776
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:41
Start Date: 10/Apr/23 11:41
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161655435


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides implementation of creation of abort clean tasks.
+ */
+class AbortedTxnCleaner extends TaskHandler {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea - 
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible by any open txns. It is only visible while determining the AcidState (which
+      only sees the aborted deltas and does not read the file).
+
+   The following algorithm is used to clean the set of aborted directories - 
+      a. Find the list of entries which are suitable for cleanup (This is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+      b. If the table/partition does not exist, then remove the associated aborted entry in TXN_COMPONENTS table.
+      c. Get the AcidState of the table by using the min open txnID, database name, tableName, partition name, highest write ID
+      d. Fetch the aborted directories and delete the directories.
+      e. Fetch the aborted write IDs from the AcidState and use it to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE, 

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855775
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:38
Start Date: 10/Apr/23 11:38
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161653684


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides implementation of creation of abort clean tasks.
+ */
+class AbortedTxnCleaner extends TaskHandler {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea - 
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible by any open txns. It is only visible while determining the AcidState (which
+      only sees the aborted deltas and does not read the file).
+
+   The following algorithm is used to clean the set of aborted directories - 
+      a. Find the list of entries which are suitable for cleanup (This is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+      b. If the table/partition does not exist, then remove the associated aborted entry in TXN_COMPONENTS table.
+      c. Get the AcidState of the table by using the min open txnID, database name, tableName, partition name, highest write ID
+      d. Fetch the aborted directories and delete the directories.
+      e. Fetch the aborted write IDs from the AcidState and use it to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE, 

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855774
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:34
Start Date: 10/Apr/23 11:34
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161652023


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides implementation of creation of abort clean tasks.
+ */
+class AbortedTxnCleaner extends TaskHandler {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea - 
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible by any open txns. It is only visible while determining the AcidState (which
+      only sees the aborted deltas and does not read the file).
+
+   The following algorithm is used to clean the set of aborted directories - 
+      a. Find the list of entries which are suitable for cleanup (This is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+      b. If the table/partition does not exist, then remove the associated aborted entry in TXN_COMPONENTS table.
+      c. Get the AcidState of the table by using the min open txnID, database name, tableName, partition name, highest write ID
+      d. Fetch the aborted directories and delete the directories.
+      e. Fetch the aborted write IDs from the AcidState and use it to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE, 

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855773
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:26
Start Date: 10/Apr/23 11:26
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161643395


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides implementation of creation of abort clean tasks.
+ */
+class AbortedTxnCleaner extends AcidTxnCleaner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea -
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible to any open txns. They are only visible while determining the
+      AcidState (which only sees the aborted deltas and does not read the files).
+
+   The following algorithm is used to clean the set of aborted directories -
+   a. Find the list of entries which are suitable for cleanup (this is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+   b. If the table/partition does not exist, then remove the associated aborted entry in the TXN_COMPONENTS table.
+   c. Get the AcidState of the table by using the min open txn ID, database name, table name, partition name and highest write ID.
+   d. Fetch the aborted directories and delete the directories.
+   e. Fetch the aborted write IDs from the AcidState and use them to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE,
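
The excerpt is cut off above, but the pattern it implements is clear: each ready-to-clean entry is wrapped into a plain Runnable via ThrowingRunnable.unchecked, falling back to Long.MAX_VALUE as the cleanup watermark when the entry carries no positive txnId. A minimal, self-contained sketch of that pattern follows; Info, cleanOne and AbortTaskSketch are hypothetical stand-ins, not Hive classes.

    // Hedged sketch (not the Hive source): shows how checked-exception cleanup
    // calls can be wrapped into plain Runnables, as in the quoted getTasks().
    import java.util.List;
    import java.util.stream.Collectors;

    public class AbortTaskSketch {

      // Plays the role of CompactorUtil.ThrowingRunnable in the quoted code.
      @FunctionalInterface
      interface ThrowingRunnable<E extends Exception> {
        void run() throws E;

        static <E extends Exception> Runnable unchecked(ThrowingRunnable<E> r) {
          return () -> {
            try {
              r.run();
            } catch (Exception e) {
              throw new RuntimeException(e); // rethrow checked failures unchecked
            }
          };
        }
      }

      // Stand-in for the per-entry cleanup info (AcidTxnInfo in the PR).
      static class Info {
        final long txnId;
        Info(long txnId) { this.txnId = txnId; }
      }

      static void cleanOne(Info ci, long watermark) throws Exception {
        System.out.println("cleaning txn " + ci.txnId + " up to " + watermark);
      }

      public static void main(String[] args) {
        List<Info> ready = List.of(new Info(42), new Info(-1));
        // Same shape as the quoted stream: one Runnable per entry, with
        // Long.MAX_VALUE as the watermark for non-positive txn IDs.
        List<Runnable> tasks = ready.stream()
            .map(ci -> ThrowingRunnable.unchecked(
                () -> cleanOne(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE)))
            .collect(Collectors.toList());
        tasks.forEach(Runnable::run);
      }
    }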

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855771=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855771
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:25
Start Date: 10/Apr/23 11:25
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161645900


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855772=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855772
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:25
Start Date: 10/Apr/23 11:25
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161643395


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855770
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:22
Start Date: 10/Apr/23 11:22
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161645900


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855769=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855769
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:17
Start Date: 10/Apr/23 11:17
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161643395


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855768=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855768
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:14
Start Date: 10/Apr/23 11:14
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161641913


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##
@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides implementation of creation of abort clean tasks.
+ */
+class AbortedTxnCleaner extends TaskHandler {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea -
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible to any open txns. They are only visible while determining the
+      AcidState (which only sees the aborted deltas and does not read the files).
+
+   The following algorithm is used to clean the set of aborted directories -
+   a. Find the list of entries which are suitable for cleanup (this is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+   b. If the table/partition does not exist, then remove the associated aborted entry in the TXN_COMPONENTS table.
+   c. Get the AcidState of the table by using the min open txn ID, database name, table name, partition name and highest write ID.
+   d. Fetch the aborted directories and delete the directories.
+   e. Fetch the aborted write IDs from the AcidState and use them to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE,

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855764
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:06
Start Date: 10/Apr/23 11:06
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161638142


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855763
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 11:04
Start Date: 10/Apr/23 11:04
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161636770


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##

[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855762
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 10:58
Start Date: 10/Apr/23 10:58
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161633548


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##
@@ -162,31 +162,33 @@ public Set<CompactionInfo> findPotentialCompactions(int abortedThreshold,
 }
 rs.close();
 
-        // Check for aborted txns: number of aborted txns past threshold and age of aborted txns
-        // past time threshold
-        boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
-        String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", " +
-          "MIN(\"TXN_STARTED\"), COUNT(*) FROM \"TXNS\", \"TXN_COMPONENTS\" " +
-          "   WHERE \"TXN_ID\" = \"TC_TXNID\" AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " " +
-          "GROUP BY \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\" " +
-          (checkAbortedTimeThreshold ? "" : " HAVING COUNT(*) > " + abortedThreshold);
-
-        LOG.debug("Going to execute query <{}>", sCheckAborted);
-        rs = stmt.executeQuery(sCheckAborted);
-        long systemTime = System.currentTimeMillis();
-        while (rs.next()) {
-          boolean pastTimeThreshold =
-              checkAbortedTimeThreshold && rs.getLong(4) + abortedTimeThreshold < systemTime;
-          int numAbortedTxns = rs.getInt(5);
-          if (numAbortedTxns > abortedThreshold || pastTimeThreshold) {
-            CompactionInfo info = new CompactionInfo();
-            info.dbname = rs.getString(1);
-            info.tableName = rs.getString(2);
-            info.partName = rs.getString(3);
-            info.tooManyAborts = numAbortedTxns > abortedThreshold;
-            info.hasOldAbort = pastTimeThreshold;
-            LOG.debug("Found potential compaction: {}", info);
-            response.add(info);
+        if (!MetastoreConf.getBoolVar(conf, ConfVars.COMPACTOR_CLEAN_ABORTS_USING_CLEANER)) {

Review Comment:
   no need for that, leads to code duplication



##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##
@@ -464,6 +466,54 @@ public List<CompactionInfo> findReadyToClean(long minOpenTxnWaterMark, long retentionTime) {
 }
   }
 
+  @Override
+  @RetrySemantics.ReadOnly
+  public List<AcidTxnInfo> findReadyToCleanForAborts(long abortedTimeThreshold, int abortedThreshold) throws MetaException {

Review Comment:
   rename to `findReadyToCleanAborts`
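
To make the selection logic in the first hunk concrete: a table or partition qualifies either by count (more aborted txns than hive.compactor.abortedtxn.threshold, default 1000) or by age (oldest aborted txn older than hive.compactor.aborted.txn.time.threshold, default 12h). Below is a small worked example of that predicate with assumed sample values; it mirrors the quoted logic but is not Hive code.

    import java.util.concurrent.TimeUnit;

    public class AbortThresholdExample {
      public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long minTxnStarted = now - TimeUnit.HOURS.toMillis(13); // oldest abort is 13h old
        long abortedTimeThreshold = TimeUnit.HOURS.toMillis(12); // 12h default
        int abortedThreshold = 1000;                             // count default
        int numAbortedTxns = 5;                                  // well under the count gate

        boolean pastTimeThreshold = abortedTimeThreshold >= 0
            && minTxnStarted + abortedTimeThreshold < now;
        boolean tooManyAborts = numAbortedTxns > abortedThreshold;
        // Prints true: the time path alone selects the partition for cleanup.
        System.out.println(tooManyAborts || pastTimeThreshold);
      }
    }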





Issue Time Tracking
---

Worklog Id: (was: 855762)
Time Spent: 6h 50m  (was: 6h 40m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855761
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 10:53
Start Date: 10/Apr/23 10:53
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161630372


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java:
##
@@ -649,6 +649,10 @@ public enum ConfVars {
 
     COMPACTOR_CLEANER_TABLECACHE_ON("metastore.compactor.cleaner.tablecache.on",
         "hive.compactor.cleaner.tablecache.on", true,
         "Enable table caching in the cleaner. Currently the cache is cleaned after each cycle."),
+    COMPACTOR_CLEAN_ABORTS_USING_CLEANER("metastore.compactor.clean.aborts.using.cleaner",
+        "hive.compactor.clean.aborts.using.cleaner", true,

Review Comment:
   no need for extra config
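
Whatever the review outcome, the mechanics of the flag the hunk adds are simple: a boolean config (default true) that routes aborted-txn cleanup either to the new Cleaner handler or back through the compaction cycle. A hedged stand-in illustration of that gate follows; it deliberately avoids the real MetastoreConf API.

    import java.util.Map;

    public class FlagGateSketch {
      // Stand-in for MetastoreConf.getBoolVar: read a boolean with a default.
      static boolean getBool(Map<String, String> conf, String key, boolean dflt) {
        return Boolean.parseBoolean(conf.getOrDefault(key, Boolean.toString(dflt)));
      }

      public static void main(String[] args) {
        Map<String, String> conf = Map.of(); // unset -> default true, as in the hunk
        boolean useCleaner = getBool(conf, "metastore.compactor.clean.aborts.using.cleaner", true);
        System.out.println(useCleaner
            ? "aborted-txn cleanup handled by the Cleaner handler"
            : "aborted-txn cleanup handled in the compaction cycle");
      }
    }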





Issue Time Tracking
---

Worklog Id: (was: 855761)
Time Spent: 6h 40m  (was: 6.5h)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855760
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 10:51
Start Date: 10/Apr/23 10:51
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161629018


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java:
##
@@ -61,12 +60,10 @@ public void init(AtomicBoolean stop) throws Exception {
     cleanerExecutor = CompactorUtil.createExecutorWithThreadFactory(
         conf.getIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_THREADS_NUM),
         COMPACTOR_CLEANER_THREAD_NAME_FORMAT);
-    if (CollectionUtils.isEmpty(cleanupHandlers)) {
-      FSRemover fsRemover = new FSRemover(conf, ReplChangeManager.getInstance(conf), metadataCache);
-      cleanupHandlers = TaskHandlerFactory.getInstance()
-          .getHandlers(conf, txnHandler, metadataCache,
-              metricsEnabled, fsRemover);
-    }
+    FSRemover fsRemover = new FSRemover(conf, ReplChangeManager.getInstance(conf), metadataCache);
+    cleanupHandlers = TaskHandlerFactory.getInstance()
+        .getHandlers(conf, txnHandler, metadataCache,
+            metricsEnabled, fsRemover);

Review Comment:
   could we move this to the line above?
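
For context, the unchanged first statement in the hunk builds a fixed-size pool with named threads. Below is a rough equivalent using only plain java.util.concurrent; CompactorUtil.createExecutorWithThreadFactory is Hive's own helper, and the thread-name prefix here is a stand-in.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicInteger;

    public class CleanerPoolSketch {
      public static void main(String[] args) {
        int threads = 1; // plays the role of HIVE_COMPACTOR_CLEANER_THREADS_NUM
        AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads,
            r -> new Thread(r, "cleaner-executor-" + counter.getAndIncrement()));
        pool.submit(() -> System.out.println(Thread.currentThread().getName()));
        pool.shutdown();
      }
    }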





Issue Time Tracking
---

Worklog Id: (was: 855760)
Time Spent: 6.5h  (was: 6h 20m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855759=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855759
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 10:49
Start Date: 10/Apr/23 10:49
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161627935


##
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorWithAbortCleanupUsingCompactionCycle.java:
##
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor;
+
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf;
+import org.apache.hadoop.hive.ql.txn.compactor.TestCompactor;
+import org.junit.Before;
+
+public class TestCompactorWithAbortCleanupUsingCompactionCycle extends TestCompactor {

Review Comment:
   I don't think that should be supported anymore





Issue Time Tracking
---

Worklog Id: (was: 855759)
Time Spent: 6h 20m  (was: 6h 10m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855758
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 10:48
Start Date: 10/Apr/23 10:48
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161627204


##
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorBase.java:
##
@@ -89,6 +89,7 @@ public void setup() throws Exception {
 hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTIMIZEMETADATAQUERIES, false);
 MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.COMPACTOR_INITIATOR_ON, true);
 MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.COMPACTOR_CLEANER_ON, true);
+MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.COMPACTOR_CLEAN_ABORTS_USING_CLEANER, false);

Review Comment:
   Why do we need this config? Let's keep it simple.





Issue Time Tracking
---

Worklog Id: (was: 855758)
Time Spent: 6h 10m  (was: 6h)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855757
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 10:46
Start Date: 10/Apr/23 10:46
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161625343


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -3273,11 +3273,11 @@ public static enum ConfVars {
 
 HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD("hive.compactor.abortedtxn.threshold", 1000,
     "Number of aborted transactions involving a given table or partition that will trigger\n" +
-    "a major compaction."),
+    "a major compaction / cleanup of aborted directories."),

Review Comment:
   Would it actually trigger compaction?
   PS: we should deprecate this config on the HS2 side





Issue Time Tracking
---

Worklog Id: (was: 855757)
Time Spent: 6h  (was: 5h 50m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855756
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 10:45
Start Date: 10/Apr/23 10:45
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161625472


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -3273,11 +3273,11 @@ public static enum ConfVars {
 
 HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD("hive.compactor.abortedtxn.threshold", 1000,
     "Number of aborted transactions involving a given table or partition that will trigger\n" +
-    "a major compaction."),
+    "a major compaction / cleanup of aborted directories."),

 HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD("hive.compactor.aborted.txn.time.threshold", "12h",
     new TimeValidator(TimeUnit.HOURS),
-    "Age of table/partition's oldest aborted transaction when compaction will be triggered. " +
+    "Age of table/partition's oldest aborted transaction when compaction / cleanup of aborted directories will be triggered. " +

Review Comment:
   same as above





Issue Time Tracking
---

Worklog Id: (was: 855756)
Time Spent: 5h 50m  (was: 5h 40m)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855755
 ]

ASF GitHub Bot logged work on HIVE-27020:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 10:45
Start Date: 10/Apr/23 10:45
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161625343


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -3273,11 +3273,11 @@ public static enum ConfVars {
 
 HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD("hive.compactor.abortedtxn.threshold", 1000,
     "Number of aborted transactions involving a given table or partition that will trigger\n" +
-    "a major compaction."),
+    "a major compaction / cleanup of aborted directories."),

Review Comment:
   Would it actually trigger compaction?





Issue Time Tracking
---

Worklog Id: (was: 855755)
Time Spent: 5h 40m  (was: 5.5h)

> Implement a separate handler to handle aborted transaction cleanup
> --
>
> Key: HIVE-27020
> URL: https://issues.apache.org/jira/browse/HIVE-27020
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different 
> entities, implement a separate handler which can create requests for aborted 
> transactions cleanup. This would move the aborted transaction cleanup 
> exclusively to the cleaner.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855729=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855729
 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 08:13
Start Date: 10/Apr/23 08:13
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161530815


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {
+        try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+          if (fs.exists(new Path(statsPath))) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+    switch (statsSource) {
+      case ICEBERG:
+        // Placeholder for iceberg stats
+        break;
+      case PUFFIN:
+        String snapshotId = table.name() + table.currentSnapshot().snapshotId();
+        String statsPath = table.location() + STATS + snapshotId;
+        LOG.info("Using stats from puffin file at:" + statsPath);
+        try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) {
+          BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+          Map<BlobMetadata, List<ColumnStatistics>> collect =
+              Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,
+                  blobMetadataByteBufferPair -> SerializationUtils.deserialize(
+                      ByteBuffers.toByteArray(blobMetadataByteBufferPair.second()))));
+
+          return collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj();
+        } catch (IOException e) {
+          LOG.info(String.valueOf(e));
+        }
+        break;
+      default:
+        // fall back to metastore
+    }
+    return null;
+  }
+
+
+  @Override
+  public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table,
+      List<ColumnStatistics> colStats) {
+    TableDesc tableDesc = Utilities.getTableDesc(table);
+    Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId();
+    byte[] serializeColStats = SerializationUtils.serialize((Serializable) colStats);

Review Comment:
   We are checking whether colStats is empty here:
   https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L208
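For illustration, such a guard could sit at the top of setColStatistics. This is only a sketch of the idea discussed in this thread; the early-return placement and the true/false contract are assumptions, not the committed code:

    @Override
    public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table,
        List<ColumnStatistics> colStats) {
      // Sketch: skip writing a Puffin blob when there is nothing to store.
      if (colStats == null || colStats.isEmpty()) {
        return false;
      }
      byte[] serializeColStats = SerializationUtils.serialize((Serializable) colStats);
      // ... write serializeColStats into a Puffin file, as in the diff above ...
      return true;
    }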





Issue Time Tracking
---

Worklog Id: (was: 855729)
Time Spent: 7h 20m  (was: 7h 10m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27208) Iceberg: Add support for rename table

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-27208?focusedWorklogId=855728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855728 ]

ASF GitHub Bot logged work on HIVE-27208:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:59
Start Date: 10/Apr/23 07:59
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #4185:
URL: https://github.com/apache/hive/pull/4185#issuecomment-1501521654

   Kudos, SonarCloud Quality Gate passed!
   (Full analysis: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4185)
   
   Bugs: rating A, 0 Bugs
   Vulnerabilities: rating A, 0 Vulnerabilities
   Security Hotspots: rating A, 0 Security Hotspots
   Code Smells: rating A, 4 Code Smells
   No Coverage information
   No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 855728)
Time Spent: 3h 20m  (was: 3h 10m)

> Iceberg: Add support for rename table
> -
>
> Key: HIVE-27208
> URL: https://issues.apache.org/jira/browse/HIVE-27208
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Add support for renaming iceberg tables.
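For context, the user-facing operation is plain HiveQL (for example, ALTER TABLE ice_tbl RENAME TO ice_tbl2). A storage handler can delegate the actual rename to Iceberg's public catalog API; the sketch below shows only that delegation and assumes the catalog instance is obtained elsewhere (it is not taken from this patch):

    import org.apache.iceberg.catalog.Catalog;
    import org.apache.iceberg.catalog.TableIdentifier;

    // Delegates a table rename to the Iceberg catalog; obtaining the catalog
    // is deployment-specific and intentionally not shown here.
    static void renameIcebergTable(Catalog catalog, String db, String from, String to) {
      catalog.renameTable(TableIdentifier.of(db, from), TableIdentifier.of(db, to));
    }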



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855727&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855727 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:53
Start Date: 10/Apr/23 07:53
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161519355


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {
+        try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+          if (fs.exists(new Path(statsPath))) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+    switch (statsSource) {
+      case ICEBERG:
+        // Place holder for iceberg stats
+        break;
+      case PUFFIN:
+        String snapshotId = table.name() + table.currentSnapshot().snapshotId();
+        String statsPath = table.location() + STATS + snapshotId;
+        LOG.info("Using stats from puffin file at:" + statsPath);
+        try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) {
+          BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0);
+          Map<BlobMetadata, List<ColumnStatistics>> collect =
+              Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first,

Review Comment:
   Fixed.





Issue Time Tracking
---

Worklog Id: (was: 855727)
Time Spent: 7h 10m  (was: 7h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855726 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:53
Start Date: 10/Apr/23 07:53
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161519017


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {
+        try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+          if (fs.exists(new Path(statsPath))) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;

Review Comment:
   fixed



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {
+        try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+          if (fs.exists(new Path(statsPath))) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List<ColumnStatisticsObj> getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+    switch (statsSource) {
+      case ICEBERG:
+        // Place holder for iceberg stats
+        break;
+      case PUFFIN:
+        String snapshotId = table.name() + table.currentSnapshot().snapshotId();
+        String statsPath = table.location() + STATS + snapshotId;
Review Comment:
   done





Issue Time Tracking
---

Worklog Id: (was: 855726)
Time Spent: 7h  (was: 6h 50m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855725 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:52
Start Date: 10/Apr/23 07:52
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161518779


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {

Review Comment:
   Fixed.



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {
+        try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+          if (fs.exists(new Path(statsPath))) {

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 855725)
Time Spent: 6h 50m  (was: 6h 40m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855724&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855724 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:52
Start Date: 10/Apr/23 07:52
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161518466


##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);

Review Comment:
   Fixed. Added a default fallback to the metastore.
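The fallback mentioned here would look roughly as follows. This is a sketch of the control flow only, not the committed code; the assumption is that returning false makes Hive fall back to metastore-backed stats:

    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
    switch (statsSource) {
      case PUFFIN:
        return true;   // column stats can come from Puffin files
      case ICEBERG:
        return true;   // or from Iceberg-native stats, once implemented
      default:
        return false;  // anything else: fall back to the metastore
    }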



##
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##
@@ -349,6 +365,96 @@ public Map<String, String> getBasicStatistics(Partish partish) {
     return stats;
   }
 
+
+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;

Review Comment:
   Fixed.





Issue Time Tracking
---

Worklog Id: (was: 855724)
Time Spent: 6h 40m  (was: 6.5h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855723 ]

ASF GitHub Bot logged work on HIVE-27158:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:51
Start Date: 10/Apr/23 07:51
Worklog Time Spent: 10m 
  Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161518101


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -2207,6 +2207,8 @@ public static enum ConfVars {
         "Whether to use codec pool in ORC. Disable if there are bugs with codec reuse."),
     HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from iceberg table snapshot for query " +
         "planning. This has three values metastore, puffin and iceberg"),
+    HIVE_COL_STATS_SOURCE("hive.col.stats.source","metastore","Use stats from puffin file for query " +

Review Comment:
   Fixed; merged the confs into a single conf.
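After such a merge, a single property selects the stats source. The definition below only illustrates that shape; the property name, default, and description in the final patch may differ:

    // Hypothetical merged entry in HiveConf.ConfVars:
    HIVE_ICEBERG_STATS_SOURCE("hive.iceberg.stats.source", "metastore",
        "Source of column statistics used for query planning on Iceberg tables. " +
        "Supported values: metastore, puffin, iceberg"),

    // Session-level usage (HiveQL): SET hive.iceberg.stats.source=puffin;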





Issue Time Tracking
---

Worklog Id: (was: 855723)
Time Spent: 6.5h  (was: 6h 20m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri Govindappa
>Assignee: Simhadri Govindappa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27077) upgrade hive grammar to Antlr4

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-27077?focusedWorklogId=855720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855720 ]

ASF GitHub Bot logged work on HIVE-27077:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:12
Start Date: 10/Apr/23 07:12
Worklog Time Spent: 10m 
  Work Description: zhangbutao commented on PR #4058:
URL: https://github.com/apache/hive/pull/4058#issuecomment-1501487075

   Antlr3 lost support a long time ago, so it would be great to upgrade to 
Antlr4. The Antlr4 grammar is simpler and cleaner than Antlr3's.
   But I think this is not easy to do; there are probably some incompatibilities 
that need to be fixed.
   Please see this ticket: https://issues.apache.org/jira/browse/HIVE-23177
   @mlorek Could you please give more feedback? Thanks.




Issue Time Tracking
---

Worklog Id: (was: 855720)
Time Spent: 2h  (was: 1h 50m)

> upgrade hive grammar to Antlr4
> --
>
> Key: HIVE-27077
> URL: https://issues.apache.org/jira/browse/HIVE-27077
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Michal Lorek
>Assignee: Michal Lorek
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Introducing a new module, parser-v4, that hosts the Hive grammar defined using Antlr4.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-27209) Backport HIVE-24569: LLAP daemon leaks file descriptors/log4j appenders

2023-04-10 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-27209?focusedWorklogId=855697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855697 ]

ASF GitHub Bot logged work on HIVE-27209:
-

Author: ASF GitHub Bot
Created on: 10/Apr/23 06:01
Start Date: 10/Apr/23 06:01
Worklog Time Spent: 10m 
  Work Description: guptanikhil007 commented on PR #4193:
URL: https://github.com/apache/hive/pull/4193#issuecomment-1501431789

   @sankarh This is ported from the original cherry-pick on branch-3.1, not from master.
   




Issue Time Tracking
---

Worklog Id: (was: 855697)
Time Spent: 1h  (was: 50m)

> Backport HIVE-24569: LLAP daemon leaks file descriptors/log4j appenders
> ---
>
> Key: HIVE-27209
> URL: https://issues.apache.org/jira/browse/HIVE-27209
> Project: Hive
>  Issue Type: Sub-task
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)