[jira] [Work logged] (HIVE-27187) Incremental rebuild of materialized view having aggregate and stored by iceberg
[ https://issues.apache.org/jira/browse/HIVE-27187?focusedWorklogId=855998&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855998 ]

ASF GitHub Bot logged work on HIVE-27187:
-
Author: ASF GitHub Bot
Created on: 11/Apr/23 05:33
Start Date: 11/Apr/23 05:33
Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #4166:
URL: https://github.com/apache/hive/pull/4166#issuecomment-1502711234

Kudos, SonarCloud Quality Gate passed!
1 Bug (reliability rating C), 0 Vulnerabilities, 0 Security Hotspots, 6 Code Smells; no coverage information, no duplication information.

Issue Time Tracking
---
Worklog Id: (was: 855998)
Time Spent: 3h 40m (was: 3.5h)

> Incremental rebuild of materialized view having aggregate and stored by iceberg
> ---
>
> Key: HIVE-27187
> URL: https://issues.apache.org/jira/browse/HIVE-27187
> Project: Hive
> Issue Type: Improvement
> Components: Iceberg integration, Materialized views
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> Currently, the incremental rebuild of a materialized view stored by Iceberg whose definition query contains an aggregate operator is transformed into an insert overwrite statement containing a union operator, provided the source tables contain insert operations only. One branch of the union scans the view, the other produces the delta.
> This can be improved further: transform the statement into a multi-insert statement representing a merge, which inserts new aggregations and updates existing ones.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-23567) authorization_disallow_transform.q is unstable
[ https://issues.apache.org/jira/browse/HIVE-23567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23567: -- Labels: pull-request-available (was: ) > authorization_disallow_transform.q is unstable > -- > > Key: HIVE-23567 > URL: https://issues.apache.org/jira/browse/HIVE-23567 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-23567) authorization_disallow_transform.q is unstable
[ https://issues.apache.org/jira/browse/HIVE-23567?focusedWorklogId=855992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855992 ]

ASF GitHub Bot logged work on HIVE-23567:
-
Author: ASF GitHub Bot
Created on: 11/Apr/23 04:18
Start Date: 11/Apr/23 04:18
Worklog Time Spent: 10m

Work Description: rkirtir opened a new pull request, #4215:
URL: https://github.com/apache/hive/pull/4215

### What changes were proposed in this pull request?
HIVE-23567

### Why are the changes needed?
Enabling authorization_disallow_transform.q in the test suite

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
via test

Issue Time Tracking
---
Worklog Id: (was: 855992)
Remaining Estimate: 0h
Time Spent: 10m

> authorization_disallow_transform.q is unstable
> --
>
> Key: HIVE-23567
> URL: https://issues.apache.org/jira/browse/HIVE-23567
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Haindrich
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-23548) TestActivePassiveHA is unstable
[ https://issues.apache.org/jira/browse/HIVE-23548?focusedWorklogId=855991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855991 ]

ASF GitHub Bot logged work on HIVE-23548:
-
Author: ASF GitHub Bot
Created on: 11/Apr/23 04:09
Start Date: 11/Apr/23 04:09
Worklog Time Spent: 10m

Work Description: rkirtir opened a new pull request, #4214:
URL: https://github.com/apache/hive/pull/4214

### What changes were proposed in this pull request?
HIVE-23548

### Why are the changes needed?
Enabling TestActivePassiveHA in the test suite

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
via test

Issue Time Tracking
---
Worklog Id: (was: 855991)
Remaining Estimate: 0h
Time Spent: 10m

> TestActivePassiveHA is unstable
> ---
>
> Key: HIVE-23548
> URL: https://issues.apache.org/jira/browse/HIVE-23548
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Haindrich
> Assignee: KIRTI RUGE
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-23548) TestActivePassiveHA is unstable
[ https://issues.apache.org/jira/browse/HIVE-23548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23548: -- Labels: pull-request-available (was: ) > TestActivePassiveHA is unstable > --- > > Key: HIVE-23548 > URL: https://issues.apache.org/jira/browse/HIVE-23548 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: KIRTI RUGE >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27164) Create Temp Txn Table As Select is failing at tablePath validation
[ https://issues.apache.org/jira/browse/HIVE-27164?focusedWorklogId=855988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855988 ]

ASF GitHub Bot logged work on HIVE-27164:
-
Author: ASF GitHub Bot
Created on: 11/Apr/23 02:24
Start Date: 11/Apr/23 02:24
Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #4176:
URL: https://github.com/apache/hive/pull/4176#issuecomment-1502599259

Kudos, SonarCloud Quality Gate passed!
1 Bug (reliability rating C), 0 Vulnerabilities, 0 Security Hotspots, 11 Code Smells; no coverage information, no duplication information.

Issue Time Tracking
---
Worklog Id: (was: 855988)
Time Spent: 3h (was: 2h 50m)

> Create Temp Txn Table As Select is failing at tablePath validation
> --
>
> Key: HIVE-27164
> URL: https://issues.apache.org/jira/browse/HIVE-27164
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2, Metastore
> Reporter: Naresh P R
> Assignee: Venugopal Reddy K
> Priority: Major
> Labels: pull-request-available
> Attachments: mm_cttas.q
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> After HIVE-25303, every CTAS makes a HiveMetaStore$HMSHandler#translate_table_dryrun() call to fetch the table location, which fails with the following exception for temp tables if MetastoreDefaultTransformer is set.
> {code:java}
> 2023-03-17 16:41:23,390 INFO org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: [pool-6-thread-196]: Starting translation for CreateTable for processor HMSClient-@localhost with [EXTWRITE, EXTREAD, HIVEBUCKET2, HIVEFULLACIDREAD, HIVEFULLACIDWRITE, HIVECACHEINVALIDATE, HIVEMANAGESTATS, HIVEMANAGEDINSERTWRITE, HIVEMANAGEDINSERTREAD, HIVESQL, HIVEMQT, HIVEONLYMQTWRITE] on table test_temp
> 2023-03-17 16:41:23,392 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-6-thread-196]: MetaException(message:Illegal location for managed table, it has to be within database's managed location)
>
[jira] [Resolved] (HIVE-27143) Optimize HCatStorer move task
[ https://issues.apache.org/jira/browse/HIVE-27143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved HIVE-27143.
---
Fix Version/s: 4.0.0
Hadoop Flags: Reviewed
Release Note: PR merged.
Resolution: Fixed

> Optimize HCatStorer move task
> -
>
> Key: HIVE-27143
> URL: https://issues.apache.org/jira/browse/HIVE-27143
> Project: Hive
> Issue Type: Improvement
> Components: HCatalog
> Affects Versions: 3.1.3
> Reporter: Yi Zhang
> Assignee: Yi Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> moveTask in HCatalog is inefficient: it does two iterations, a dry run and the execution, and is sequential. This can be improved.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
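[Editor's note] The issue above describes a sequential per-file move loop. As a rough illustration of the kind of improvement discussed (this is not the HCatalog code; the class and method names below are invented for the sketch), per-file moves can be fanned out over a thread pool so they overlap instead of running one after another:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: move a list of files with a thread pool instead of
// a sequential loop. A real move task would also preserve HCatalog's error
// semantics, permission handling, and dry-run validation.
public class ParallelMove {
    public static void moveAll(List<Path> sources, Path destDir, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (Path src : sources) {
                // Each move runs on its own worker thread.
                futures.add(pool.submit(() -> Files.move(src, destDir.resolve(src.getFileName()))));
            }
            for (Future<Path> f : futures) {
                f.get(); // re-throws the first failure, keeping fail-fast behaviour
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off is the usual one: parallel renames help most on object stores or remote filesystems where each move is a round trip; on a local filesystem the gain is smaller.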
[jira] [Work logged] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.
[ https://issues.apache.org/jira/browse/HIVE-26986?focusedWorklogId=855981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855981 ]

ASF GitHub Bot logged work on HIVE-26986:
-
Author: ASF GitHub Bot
Created on: 11/Apr/23 00:19
Start Date: 11/Apr/23 00:19
Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #3998: HIVE-26986: Fix OperatorGraph when a query plan contains UnionOperator
URL: https://github.com/apache/hive/pull/3998

Issue Time Tracking
---
Worklog Id: (was: 855981)
Time Spent: 50m (was: 40m)

> A DAG created by OperatorGraph is not equal to the Tez DAG.
> ---
>
> Key: HIVE-26986
> URL: https://issues.apache.org/jira/browse/HIVE-26986
> Project: Hive
> Issue Type: Sub-task
> Affects Versions: 4.0.0-alpha-2
> Reporter: Seonggon Namgung
> Assignee: Seonggon Namgung
> Priority: Major
> Labels: hive-4.0.0-must, pull-request-available
> Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> A DAG created by OperatorGraph is not equal to the corresponding DAG that is submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges as a parallel edge.
> We observed this problem by comparing the OperatorGraph and the Tez DAG when running TPC-DS query 71 on a 1TB ORC-format managed table.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
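[Editor's note] For context, a "parallel edge" in this discussion means two edges connecting the same ordered pair of vertices. A minimal, self-contained sketch of that check (the vertex and edge types here are simplified stand-ins, not Hive's OperatorGraph or Tez classes):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the detection ParallelEdgeFixer relies on: if two
// distinct edges connect the same ordered (source, target) pair, they are
// "parallel". If the graph being checked differs from the DAG actually
// submitted to Tez, this check can fire on edges that are not really parallel.
public class ParallelEdgeCheck {
    public static <V> boolean hasParallelEdge(List<Map.Entry<V, V>> edges) {
        Set<String> seen = new HashSet<>();
        for (Map.Entry<V, V> e : edges) {
            String key = e.getKey() + "->" + e.getValue();
            if (!seen.add(key)) {
                return true; // a second edge between the same vertex pair
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> edges = List.of(
                Map.entry("Map 1", "Reducer 2"),
                Map.entry("Map 1", "Reducer 2"),  // duplicate pair -> parallel edge
                Map.entry("Map 3", "Reducer 2"));
        System.out.println(hasParallelEdge(edges));
    }
}
```

This is why the graphs have to agree: the same edge list evaluated against a mismatched DAG yields false positives, which is the symptom the issue reports.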
[jira] [Work logged] (HIVE-26957) Add convertCharset(s, from, to) function
[ https://issues.apache.org/jira/browse/HIVE-26957?focusedWorklogId=855982&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855982 ]

ASF GitHub Bot logged work on HIVE-26957:
-
Author: ASF GitHub Bot
Created on: 11/Apr/23 00:19
Start Date: 11/Apr/23 00:19
Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on PR #3982:
URL: https://github.com/apache/hive/pull/3982#issuecomment-1502504331

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
---
Worklog Id: (was: 855982)
Time Spent: 4h 20m (was: 4h 10m)

> Add convertCharset(s, from, to) function
> 
>
> Key: HIVE-26957
> URL: https://issues.apache.org/jira/browse/HIVE-26957
> Project: Hive
> Issue Type: New Feature
> Reporter: Bingye Chen
> Assignee: Bingye Chen
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> Add a convertCharset(s, from, to) function.
> The function converts the string `s` from the `from` charset to the `to` charset. It is already implemented in ClickHouse.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
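[Editor's note] The described semantics (decode the bytes of `s` using the `from` charset, re-encode them using the `to` charset) can be sketched in plain Java. The method name below mirrors the proposal, but the actual Hive UDF signature may differ; since a Java String is already Unicode, the sketch models the ClickHouse-style behaviour on byte arrays:

```java
import java.nio.charset.Charset;

public class ConvertCharsetSketch {
    // Hypothetical semantics: re-encode raw bytes from charset `from` to charset `to`.
    public static byte[] convertCharset(byte[] s, String from, String to) {
        String decoded = new String(s, Charset.forName(from)); // bytes -> Unicode text
        return decoded.getBytes(Charset.forName(to));          // Unicode text -> target bytes
    }

    public static void main(String[] args) {
        byte[] utf8 = "Gr\u00fc\u00dfe".getBytes(Charset.forName("UTF-8")); // "Grüße"
        byte[] latin1 = convertCharset(utf8, "UTF-8", "ISO-8859-1");
        // "Grüße" is 7 bytes in UTF-8 (ü and ß take two bytes each) but 5 in ISO-8859-1.
        System.out.println(utf8.length + " -> " + latin1.length);
    }
}
```

Bytes that are not representable in the target charset are replaced by the default replacement character in this sketch; a production UDF would need an explicit policy for that case.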
[jira] [Work logged] (HIVE-26985) Create a trackable hive configuration object
[ https://issues.apache.org/jira/browse/HIVE-26985?focusedWorklogId=855980&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855980 ]

ASF GitHub Bot logged work on HIVE-26985:
-
Author: ASF GitHub Bot
Created on: 11/Apr/23 00:19
Start Date: 11/Apr/23 00:19
Worklog Time Spent: 10m

Work Description: github-actions[bot] closed pull request #4002: HIVE-26985: Create a trackable hive configuration object
URL: https://github.com/apache/hive/pull/4002

Issue Time Tracking
---
Worklog Id: (was: 855980)
Time Spent: 1h (was: 50m)

> Create a trackable hive configuration object
> 
>
> Key: HIVE-26985
> URL: https://issues.apache.org/jira/browse/HIVE-26985
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Attachments: hive.log
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> During configuration-related investigations, I want to be able to easily find out when and how a certain configuration value is changed. I'm looking for an improvement that simply logs if "hive.a.b.c" is changed from "hello" to "asdf" (or even to null) and on which thread/codepath.
> Not sure if there is already a trackable configuration object in Hadoop that we can reuse, or whether we need to implement it in Hive.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
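[Editor's note] As an illustration of what such a trackable object could log (a hypothetical sketch, not an existing Hadoop or Hive class), a thin wrapper around a map can report the key, old value, new value, and thread on every change:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a configuration holder that logs every change with
// old value, new value, and the thread that made it. A real implementation
// would wrap Hadoop's Configuration, use a proper logger, and could capture
// a stack trace to show the codepath.
public class TrackedConf {
    private final Map<String, String> values = new ConcurrentHashMap<>();

    /** Sets (or, with a null value, removes) a key; returns the previous value. */
    public String set(String key, String value) {
        String old = (value == null) ? values.remove(key) : values.put(key, value);
        if (!Objects.equals(old, value)) {
            System.out.printf("conf change: %s: %s -> %s [thread=%s]%n",
                    key, old, value, Thread.currentThread().getName());
        }
        return old;
    }

    public String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        TrackedConf conf = new TrackedConf();
        conf.set("hive.a.b.c", "hello");
        conf.set("hive.a.b.c", "asdf"); // logged: hello -> asdf
        conf.set("hive.a.b.c", null);   // logged: asdf -> null
    }
}
```

The null-to-removal case is included because the issue explicitly asks to see changes "to null" as well.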
[jira] [Work logged] (HIVE-27143) Optimize HCatStorer move task
[ https://issues.apache.org/jira/browse/HIVE-27143?focusedWorklogId=855979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855979 ]

ASF GitHub Bot logged work on HIVE-27143:
-
Author: ASF GitHub Bot
Created on: 11/Apr/23 00:19
Start Date: 11/Apr/23 00:19
Worklog Time Spent: 10m

Work Description: daijy merged PR #4177:
URL: https://github.com/apache/hive/pull/4177

Issue Time Tracking
---
Worklog Id: (was: 855979)
Time Spent: 40m (was: 0.5h)

> Optimize HCatStorer move task
> -
>
> Key: HIVE-27143
> URL: https://issues.apache.org/jira/browse/HIVE-27143
> Project: Hive
> Issue Type: Improvement
> Components: HCatalog
> Affects Versions: 3.1.3
> Reporter: Yi Zhang
> Assignee: Yi Zhang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> moveTask in HCatalog is inefficient: it does two iterations, a dry run and the execution, and is sequential. This can be improved.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27032) Introduce liquibase for HMS schema evolution
[ https://issues.apache.org/jira/browse/HIVE-27032?focusedWorklogId=855976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855976 ]

ASF GitHub Bot logged work on HIVE-27032:
-
Author: ASF GitHub Bot
Created on: 11/Apr/23 00:00
Start Date: 11/Apr/23 00:00
Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #4060:
URL: https://github.com/apache/hive/pull/4060#issuecomment-1502487017

Kudos, SonarCloud Quality Gate passed!
0 Bugs, 0 Vulnerabilities, 4 Security Hotspots (review rating E), 204 Code Smells; no coverage information, no duplication information.

Issue Time Tracking
---
Worklog Id: (was: 855976)
Time Spent: 1h 40m (was: 1.5h)

> Introduce liquibase for HMS schema evolution
> 
>
> Key: HIVE-27032
> URL: https://issues.apache.org/jira/browse/HIVE-27032
> Project: Hive
> Issue Type: Improvement
> Reporter: László Végh
> Assignee: László Végh
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Introduce Liquibase, and replace the current upgrade procedure with it.
> The Schematool CLI API should remain untouched, while under the hood, Liquibase should be used for HMS schema evolution.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27150) Drop single partition can also support direct sql
[ https://issues.apache.org/jira/browse/HIVE-27150?focusedWorklogId=855972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855972 ]

ASF GitHub Bot logged work on HIVE-27150:
-
Author: ASF GitHub Bot
Created on: 10/Apr/23 22:55
Start Date: 10/Apr/23 22:55
Worklog Time Spent: 10m

Work Description: saihemanth-cloudera commented on code in PR #4123:
URL: https://github.com/apache/hive/pull/4123#discussion_r1162150844

## standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestObjectStore.java:
## @@ -498,6 +497,68 @@ public void testPartitionOpsWhenTableDoesNotExist() throws InvalidObjectExceptio
     }
   }

+  @Test
+  public void testDropPartitionByName() throws Exception {
+    Database db1 = new DatabaseBuilder()
+        .setName(DB1)
+        .setDescription("description")
+        .setLocation("locationurl")
+        .build(conf);
+    try (AutoCloseable c = deadline()) {
+      objectStore.createDatabase(db1);
+    }
+    StorageDescriptor sd = createFakeSd("location");
+    HashMap<String, String> tableParams = new HashMap<>();
+    tableParams.put("EXTERNAL", "false");
+    FieldSchema partitionKey1 = new FieldSchema("Country", ColumnType.STRING_TYPE_NAME, "");
+    FieldSchema partitionKey2 = new FieldSchema("State", ColumnType.STRING_TYPE_NAME, "");
+    Table tbl1 =
+        new Table(TABLE1, DB1, "owner", 1, 2, 3, sd, Arrays.asList(partitionKey1, partitionKey2),
+            tableParams, null, null, "MANAGED_TABLE");
+    try (AutoCloseable c = deadline()) {
+      objectStore.createTable(tbl1);
+    }
+    HashMap<String, String> partitionParams = new HashMap<>();
+    partitionParams.put("PARTITION_LEVEL_PRIVILEGE", "true");
+    List<String> value1 = Arrays.asList("US", "CA");
+    Partition part1 = new Partition(value1, DB1, TABLE1, 111, 111, sd, partitionParams);
+    part1.setCatName(DEFAULT_CATALOG_NAME);
+    try (AutoCloseable c = deadline()) {
+      objectStore.addPartition(part1);
+    }
+    List<String> value2 = Arrays.asList("US", "MA");
+    Partition part2 = new Partition(value2, DB1, TABLE1, 222, 222, sd, partitionParams);
+    part2.setCatName(DEFAULT_CATALOG_NAME);
+    try (AutoCloseable c = deadline()) {
+      objectStore.addPartition(part2);
+    }
+
+    List<Partition> partitions;
+    try (AutoCloseable c = deadline()) {
+      objectStore.dropPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, "country=US/state=CA");
+      partitions = objectStore.getPartitions(DEFAULT_CATALOG_NAME, DB1, TABLE1, 10);
+    }
+    Assert.assertEquals(1, partitions.size());
+    Assert.assertEquals(222, partitions.get(0).getCreateTime());
+    try (AutoCloseable c = deadline()) {
+      objectStore.dropPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, "country=US/state=MA");
+      partitions = objectStore.getPartitions(DEFAULT_CATALOG_NAME, DB1, TABLE1, 10);
+    }
+    Assert.assertEquals(0, partitions.size());
+
+    try (AutoCloseable c = deadline()) {
+      // Illegal partName will do nothing, it doesn't matter
+      // because the real HMSHandler will guarantee the partName is legal and exists.
+      objectStore.dropPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, "country=US/state=NON_EXIST");
+      objectStore.dropPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, "country=US/st=CA");

Review Comment: If this API is returning false, we could assert false here.

Issue Time Tracking
---
Worklog Id: (was: 855972)
Time Spent: 4h 40m (was: 4.5h)

> Drop single partition can also support direct sql
> -
>
> Key: HIVE-27150
> URL: https://issues.apache.org/jira/browse/HIVE-27150
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Reporter: Wechar
> Assignee: Wechar
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h 40m
> Remaining Estimate: 0h
>
> *Background:*
> [HIVE-6980|https://issues.apache.org/jira/browse/HIVE-6980] supports direct sql for drop_partitions; we can reuse this huge improvement in drop_partition.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27150) Drop single partition can also support direct sql
[ https://issues.apache.org/jira/browse/HIVE-27150?focusedWorklogId=855970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855970 ]

ASF GitHub Bot logged work on HIVE-27150:
-
Author: ASF GitHub Bot
Created on: 10/Apr/23 22:39
Start Date: 10/Apr/23 22:39
Worklog Time Spent: 10m

Work Description: saihemanth-cloudera commented on code in PR #4123:
URL: https://github.com/apache/hive/pull/4123#discussion_r1162143358

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:
## @@ -5026,20 +5026,18 @@ private boolean drop_partition_common(RawStore ms, String catName, String db_nam
         verifyIsWritablePath(partPath);
       }

-      if (!ms.dropPartition(catName, db_name, tbl_name, part_vals)) {
-        throw new MetaException("Unable to drop partition");
-      } else {
-        if (!transactionalListeners.isEmpty()) {
+      String partName = Warehouse.makePartName(tbl.getPartitionKeys(), part_vals);
+      ms.dropPartition(catName, db_name, tbl_name, partName);

Review Comment: If dropPartition in the object store is not successful, then we should throw a meta exception, right?

Issue Time Tracking
---
Worklog Id: (was: 855970)
Time Spent: 4.5h (was: 4h 20m)

> Drop single partition can also support direct sql
> -
>
> Key: HIVE-27150
> URL: https://issues.apache.org/jira/browse/HIVE-27150
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Reporter: Wechar
> Assignee: Wechar
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> *Background:*
> [HIVE-6980|https://issues.apache.org/jira/browse/HIVE-6980] supports direct sql for drop_partitions; we can reuse this huge improvement in drop_partition.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
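[Editor's note] The review threads for HIVE-27150 revolve around partition name strings like country=US/state=CA. A simplified sketch of how such a name is composed from partition keys and values (Hive's actual Warehouse.makePartName additionally escapes special characters, which is omitted here; lowercasing the keys is an assumption made to match the test expectations quoted in the review, since the metastore normalizes column names):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

// Simplified, escaping-free sketch of partition-name composition:
// keys ["Country", "State"] + values ["US", "CA"] -> "country=US/state=CA".
public class PartNameSketch {
    public static String makePartName(List<String> keys, List<String> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values must align");
        }
        StringJoiner joiner = new StringJoiner("/");
        for (int i = 0; i < keys.size(); i++) {
            // Column names are lowercased; values are kept verbatim (no escaping here).
            joiner.add(keys.get(i).toLowerCase(Locale.ROOT) + "=" + values.get(i));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(makePartName(
                Arrays.asList("Country", "State"), Arrays.asList("US", "CA")));
    }
}
```

This also illustrates the point of the first review comment: a malformed name such as country=US/st=CA simply fails to match any stored partition, so the store-level drop silently does nothing unless the caller checks the result.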
[jira] [Work logged] (HIVE-27189) Remove duplicate debug log in Hive.isSubDIr
[ https://issues.apache.org/jira/browse/HIVE-27189?focusedWorklogId=855968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855968 ]

ASF GitHub Bot logged work on HIVE-27189:
-
Author: ASF GitHub Bot
Created on: 10/Apr/23 22:32
Start Date: 10/Apr/23 22:32
Worklog Time Spent: 10m

Work Description: saihemanth-cloudera merged PR #4167:
URL: https://github.com/apache/hive/pull/4167

Issue Time Tracking
---
Worklog Id: (was: 855968)
Time Spent: 1h 50m (was: 1h 40m)

> Remove duplicate debug log in Hive.isSubDIr
> ---
>
> Key: HIVE-27189
> URL: https://issues.apache.org/jira/browse/HIVE-27189
> Project: Hive
> Issue Type: Improvement
> Reporter: shuyouZZ
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> In class {{org.apache.hadoop.hive.ql.metadata.Hive}}, invoking the method {{isSubDir}} prints
> {code:java}
> LOG.debug("The source path is " + fullF1 + " and the destination path is " + fullF2);
> {code}
> twice; we should remove the duplicate debug log.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27164) Create Temp Txn Table As Select is failing at tablePath validation
[ https://issues.apache.org/jira/browse/HIVE-27164?focusedWorklogId=855966=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855966 ] ASF GitHub Bot logged work on HIVE-27164: - Author: ASF GitHub Bot Created on: 10/Apr/23 22:09 Start Date: 10/Apr/23 22:09 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4176: URL: https://github.com/apache/hive/pull/4176#issuecomment-1502392192 Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive=4176) [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=4176=false=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4176=false=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive=4176=false=BUG) [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=4176=false=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4176=false=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=4176=false=VULNERABILITY) [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4176=false=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4176=false=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4176=false=SECURITY_HOTSPOT) [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive=4176=false=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4176=false=CODE_SMELL) [11 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive=4176=false=CODE_SMELL) [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive=4176=coverage=list) No Coverage information [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive=4176=duplicated_lines_density=list) No Duplication information Issue Time Tracking --- Worklog Id: (was: 855966) Time Spent: 2h 50m (was: 2h 40m) > Create Temp Txn Table As Select is failing at tablePath validation > -- > > Key: HIVE-27164 > URL: https://issues.apache.org/jira/browse/HIVE-27164 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Metastore >Reporter: Naresh P R >Assignee: Venugopal Reddy K >Priority: Major > Labels: pull-request-available > Attachments: mm_cttas.q > > Time Spent: 2h 50m > Remaining Estimate: 0h > > After HIVE-25303, every CTAS goes for > HiveMetaStore$HMSHandler#translate_table_dryrun() call to fetch table > location for CTAS queries which fails with following exception for temp > tables if MetastoreDefaultTransformer is set. 
> {code:java} > 2023-03-17 16:41:23,390 INFO > org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: > [pool-6-thread-196]: Starting translation for CreateTable for processor > HMSClient-@localhost with [EXTWRITE, EXTREAD, HIVEBUCKET2, HIVEFULLACIDREAD, > HIVEFULLACIDWRITE, HIVECACHEINVALIDATE, HIVEMANAGESTATS, > HIVEMANAGEDINSERTWRITE, HIVEMANAGEDINSERTREAD, HIVESQL, HIVEMQT, > HIVEONLYMQTWRITE] on table test_temp > 2023-03-17 16:41:23,392 ERROR > org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-6-thread-196]: > MetaException(message:Illegal location for managed table, it has to be within > database's managed location)
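The failure above suggests the dry-run location check is applied to temporary tables even though they live in scratch directories outside the database's managed location. The following is a minimal, hedged sketch of the kind of exemption that would avoid this; all class and method names are illustrative, not Hive's actual MetastoreDefaultTransformer code.

```java
// Hypothetical sketch: exempt temporary tables from the managed-location
// validation performed during a CTAS dry run. Names are illustrative.
public class TempTableLocationCheck {
    static class Table {
        final boolean temporary;
        final String location;
        Table(boolean temporary, String location) {
            this.temporary = temporary;
            this.location = location;
        }
    }

    /** Temp tables are exempt: they live in scratch dirs, not the warehouse. */
    static boolean requiresManagedLocation(Table t) {
        return !t.temporary;
    }

    static void validate(Table t, String managedRoot) {
        if (requiresManagedLocation(t) && !t.location.startsWith(managedRoot)) {
            throw new IllegalStateException(
                "Illegal location for managed table, it has to be within database's managed location");
        }
    }

    public static void main(String[] args) {
        // A temp table outside the managed root passes the check.
        validate(new Table(true, "/tmp/scratch/test_temp"), "/warehouse/managed");
        // A managed table inside the root also passes.
        validate(new Table(false, "/warehouse/managed/db/t"), "/warehouse/managed");
        System.out.println("ok");
    }
}
```

The design point is simply that the temp-table predicate must be consulted before the location comparison, which the stack trace indicates was not happening.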
[jira] [Updated] (HIVE-27240) NPE on Hive Hook Proto Log Writer
[ https://issues.apache.org/jira/browse/HIVE-27240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Sharma updated HIVE-27240: -- Description: After deploying Hive 4.0.0-alpha-1, an NPE blocks the proto logger from serializing JSON in HiveHookEventProtoPartialBuilder: {code:java} 2023-04-10T17:43:44,226 ERROR [Hive Hook Proto Log Writer 0]: hooks.HiveHookEventProtoPartialBuilder (:()) - Unexpected exception while serializing json. java.lang.NullPointerException: null at org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:986) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:908) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:1263) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.exec.ExplainTask.outputStagePlans(ExplainTask.java:1408) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:367) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:268) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.getExplainJSON(HiveHookEventProtoPartialBuilder.java:84) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.addQueryObj(HiveHookEventProtoPartialBuilder.java:75) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.build(HiveHookEventProtoPartialBuilder.java:55) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook$EventLogger.writeEvent(HiveProtoLoggingHook.java:312) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook$EventLogger.lambda$handle$1(HiveProtoLoggingHook.java:274) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_362] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_362] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_362] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_362] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362] at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362] {code} ExplainTask is no longer initialised the way it was before, leaving its query state null. The earlier init code from HiveProtoLoggingHook was: {code:java} explain.initialize(hookContext.getQueryState(), plan, null, null); {code}
[jira] [Created] (HIVE-27240) NPE on Hive Hook Proto Log Writer
Shubham Sharma created HIVE-27240: - Summary: NPE on Hive Hook Proto Log Writer Key: HIVE-27240 URL: https://issues.apache.org/jira/browse/HIVE-27240 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 4.0.0-alpha-2, 4.0.0-alpha-1 Reporter: Shubham Sharma Assignee: Shubham Sharma After deploying Hive 4.0.0-alpha-1, an NPE blocks the proto logger from serializing JSON in HiveHookEventProtoPartialBuilder: {code:java} 2023-04-10T17:43:44,226 ERROR [Hive Hook Proto Log Writer 0]: hooks.HiveHookEventProtoPartialBuilder (:()) - Unexpected exception while serializing json. java.lang.NullPointerException: null at org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:986) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:908) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.exec.ExplainTask.outputPlan(ExplainTask.java:1263) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.exec.ExplainTask.outputStagePlans(ExplainTask.java:1408) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:367) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.exec.ExplainTask.getJSONPlan(ExplainTask.java:268) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.getExplainJSON(HiveHookEventProtoPartialBuilder.java:84) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.addQueryObj(HiveHookEventProtoPartialBuilder.java:75) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.hooks.HiveHookEventProtoPartialBuilder.build(HiveHookEventProtoPartialBuilder.java:55) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at 
org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook$EventLogger.writeEvent(HiveProtoLoggingHook.java:312) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook$EventLogger.lambda$handle$1(HiveProtoLoggingHook.java:274) ~[hive-exec-3.1.4.3.2.2.0-1.jar:3.1.4.3.2.2.0-1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_362] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_362] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_362] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_362] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362] at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
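The report above traces the NPE to an ExplainTask whose query state was never supplied before the plan is serialized. The following is a hedged sketch of that failure mode and a defensive guard; all names are illustrative, not the actual HiveHookEventProtoPartialBuilder API.

```java
// Hypothetical sketch: a builder that serializes an explain plan would NPE
// when its query state was never initialized; a guarded variant omits the
// explain payload instead of failing the whole log writer thread.
public class ProtoBuilderNullGuard {
    static class QueryState {
        final String id;
        QueryState(String id) { this.id = id; }
    }

    // Mirrors the way ExplainTask.outputPlan dereferences query state:
    // throws NullPointerException if state is null.
    static String explainJson(QueryState state) {
        return "{\"queryId\":\"" + state.id + "\"}";
    }

    /** Guarded variant: skip the explain section rather than throw. */
    static String buildEvent(QueryState state) {
        if (state == null) {
            return "{}"; // omit the plan; the event is still written
        }
        return explainJson(state);
    }

    public static void main(String[] args) {
        System.out.println(buildEvent(null));
        System.out.println(buildEvent(new QueryState("q1")));
    }
}
```

The quoted pre-regression code (`explain.initialize(hookContext.getQueryState(), plan, null, null)`) suggests the actual fix is to restore that initialization; the guard here only illustrates why the log writer should not die on a missing plan.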
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855950 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 20:30 Start Date: 10/Apr/23 20:30 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4131: URL: https://github.com/apache/hive/pull/4131#issuecomment-1502282051 Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 12 Code Smells. No Coverage information. No Duplication information.
Issue Time Tracking --- Worklog Id: (was: 855950) Time Spent: 8h 50m (was: 8h 40m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement > Reporter: Simhadri Govindappa > Assignee: Simhadri Govindappa > Priority: Major > Labels: pull-request-available > Time Spent: 8h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26537) Deprecate older APIs in the HMS
[ https://issues.apache.org/jira/browse/HIVE-26537?focusedWorklogId=855937=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855937 ] ASF GitHub Bot logged work on HIVE-26537: - Author: ASF GitHub Bot Created on: 10/Apr/23 18:39 Start Date: 10/Apr/23 18:39 Worklog Time Spent: 10m Work Description: saihemanth-cloudera commented on code in PR #3599: URL: https://github.com/apache/hive/pull/3599#discussion_r1161975796 ## standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift: ## @@ -2679,7 +2742,8 @@ PartitionsResponse get_partitions_req(1:PartitionsRequest req) list<string> get_partition_names(1:string db_name, 2:string tbl_name, 3:i16 max_parts=-1) Review Comment: We'll deprecate the older APIs in the next release. Issue Time Tracking --- Worklog Id: (was: 855937) Time Spent: 7h 10m (was: 7h) > Deprecate older APIs in the HMS > --- > > Key: HIVE-26537 > URL: https://issues.apache.org/jira/browse/HIVE-26537 > Project: Hive > Issue Type: Improvement > Affects Versions: 4.0.0-alpha-1, 4.0.0-alpha-2 > Reporter: Sai Hemanth Gantasala > Assignee: Sai Hemanth Gantasala > Priority: Critical > Labels: hive-4.0.0-must, pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > > This Jira is to track the clean-up (deprecate older APIs and point the HMS > client to the newer APIs) work in the hive metastore server. > More details will be added here soon. -- This message was sent by Atlassian Jira (v8.20.10#820010)
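The review thread above is about deprecating older positional-argument metastore calls in favor of request-object variants. A common way to do that, sketched here under stated assumptions (the request type and method bodies are illustrative stand-ins, not the generated HMS thrift classes), is to keep the old signature, mark it @Deprecated, and delegate to the new API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the deprecation pattern: the old positional API
// survives one more release as a @Deprecated delegate to the request-based API.
public class HmsDeprecationSketch {
    static class GetPartitionNamesRequest {
        final String dbName, tblName;
        final short maxParts;
        GetPartitionNamesRequest(String db, String tbl, short max) {
            dbName = db; tblName = tbl; maxParts = max;
        }
    }

    /** Newer request-based API (illustrative body). */
    static List<String> getPartitionNames(GetPartitionNamesRequest req) {
        return Arrays.asList(req.dbName + "." + req.tblName + "/year=2022");
    }

    /** Older API, kept for compatibility and slated for removal. */
    @Deprecated
    static List<String> getPartitionNames(String db, String tbl, short maxParts) {
        return getPartitionNames(new GetPartitionNamesRequest(db, tbl, maxParts));
    }

    public static void main(String[] args) {
        // Old callers keep working while the compiler flags the deprecation.
        System.out.println(getPartitionNames("default", "src", (short) -1));
    }
}
```

Delegation keeps one implementation path, so behavior cannot drift between the old and new entry points while both exist.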
[jira] [Work logged] (HIVE-26127) INSERT OVERWRITE throws FileNotFound when destination partition is deleted
[ https://issues.apache.org/jira/browse/HIVE-26127?focusedWorklogId=855924=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855924 ] ASF GitHub Bot logged work on HIVE-26127: - Author: ASF GitHub Bot Created on: 10/Apr/23 17:35 Start Date: 10/Apr/23 17:35 Worklog Time Spent: 10m Work Description: vihangk1 opened a new pull request, #3561: URL: https://github.com/apache/hive/pull/3561 …tition is deleted ### What changes were proposed in this pull request? Backports HIVE-26127 to branch-3 from master. ### Why are the changes needed? The issue reported in HIVE-26127 also affects branch-3. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added a q test from the original patch. Issue Time Tracking --- Worklog Id: (was: 855924) Time Spent: 2h 50m (was: 2h 40m) > INSERT OVERWRITE throws FileNotFound when destination partition is deleted > --- > > Key: HIVE-26127 > URL: https://issues.apache.org/jira/browse/HIVE-26127 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Yu-Wen Lai >Assignee: Yu-Wen Lai >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Steps to reproduce: > # create external table src (col int) partitioned by (year int); > # create external table dest (col int) partitioned by (year int); > # insert into src partition (year=2022) values (1); > # insert into dest partition (year=2022) values (2); > # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022 > # insert overwrite table dest select * from src; > We will get FileNotFoundException as below. > {code:java} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory > file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1 > could not be cleaned up. 
> at > org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387) > at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657) > at > org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code} > This is because it calls listStatus on a path that doesn't exist. We should not fail > INSERT OVERWRITE because there is nothing to clean up. > {code:java} > fs.listStatus(path, pathFilter){code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-26127) INSERT OVERWRITE throws FileNotFound when destination partition is deleted
[ https://issues.apache.org/jira/browse/HIVE-26127?focusedWorklogId=855923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855923 ] ASF GitHub Bot logged work on HIVE-26127: - Author: ASF GitHub Bot Created on: 10/Apr/23 17:35 Start Date: 10/Apr/23 17:35 Worklog Time Spent: 10m Work Description: vihangk1 commented on PR #3561: URL: https://github.com/apache/hive/pull/3561#issuecomment-1502096693 Unfortunately, I missed the notification of PR being approved and it was marked stale. Let me reopen this. Issue Time Tracking --- Worklog Id: (was: 855923) Time Spent: 2h 40m (was: 2.5h) > INSERT OVERWRITE throws FileNotFound when destination partition is deleted > --- > > Key: HIVE-26127 > URL: https://issues.apache.org/jira/browse/HIVE-26127 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Yu-Wen Lai >Assignee: Yu-Wen Lai >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Steps to reproduce: > # create external table src (col int) partitioned by (year int); > # create external table dest (col int) partitioned by (year int); > # insert into src partition (year=2022) values (1); > # insert into dest partition (year=2022) values (2); > # hdfs dfs -rm -r ${hive.metastore.warehouse.external.dir}/dest/year=2022 > # insert overwrite table dest select * from src; > We will get FileNotFoundException as below. > {code:java} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Directory > file:/home/yuwen/workdir/upstream/hive/itests/qtest/target/localfs/warehouse/ext_part/par=1 > could not be cleaned up. 
> at > org.apache.hadoop.hive.ql.metadata.Hive.deleteOldPathForReplace(Hive.java:5387) > at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:5282) > at > org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:2657) > at > org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$6(Hive.java:3143) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code} > This is because it calls listStatus on a path that doesn't exist. We should not fail > INSERT OVERWRITE because there is nothing to clean up. > {code:java} > fs.listStatus(path, pathFilter){code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
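The fix described in HIVE-26127, tolerating a destination directory that was deleted out-of-band instead of failing the whole INSERT OVERWRITE, can be sketched as follows. This uses java.nio as a stand-in for Hadoop's FileSystem API, and the method names are illustrative rather than Hive's actual Hive.java helpers.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the guard: listing a partition directory that was
// deleted out-of-band should mean "nothing to clean up", not an exception.
public class CleanupGuard {
    /** Returns the number of entries removed; 0 when the path is gone. */
    static int cleanOldPath(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return 0; // directory already deleted: nothing to clean up
        }
        int removed = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                Files.delete(p);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // A non-existent path is tolerated instead of throwing FileNotFound.
        int n = cleanOldPath(Paths.get("/tmp/does-not-exist-hive26127"));
        System.out.println(n);
    }
}
```

An equivalent alternative is to catch the file-not-found error around the listing call; checking existence first just makes the "nothing to clean" case explicit.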
[jira] [Created] (HIVE-27239) Upgrade async-profiler libs to recent version
Dmitriy Fingerman created HIVE-27239: Summary: Upgrade async-profiler libs to recent version Key: HIVE-27239 URL: https://issues.apache.org/jira/browse/HIVE-27239 Project: Hive Issue Type: Improvement Environment: Apache Hive has a ProfileServlet which uses async-profiler to profile various events. It would be good to upgrade the async-profiler libs to a recent version. Reporter: Dmitriy Fingerman Assignee: Dmitriy Fingerman -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855905=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855905 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 16:14 Start Date: 10/Apr/23 16:14 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161627935 ## itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorWithAbortCleanupUsingCompactionCycle.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.txn.compactor; + +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.ql.txn.compactor.TestCompactor; +import org.junit.Before; + +public class TestCompactorWithAbortCleanupUsingCompactionCycle extends TestCompactor { Review Comment: I don't think that should be supported anymore Issue Time Tracking --- Worklog Id: (was: 855905) Time Spent: 11h 50m (was: 11h 40m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 11h 50m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
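The parent task described above splits the compactor's cleaner into independent handlers, one of which produces cleanup requests for aborted transactions. A hypothetical sketch of that shape (the interfaces are illustrative, not Hive's actual compactor classes):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the cleaner iterates over pluggable handlers, each of
// which knows how to find its own kind of work. Aborted-transaction cleanup
// becomes one handler alongside ordinary post-compaction cleanup.
public class CleanerHandlers {
    interface TaskHandler {
        List<String> findReadyToClean();
    }

    static class CompactionCleaner implements TaskHandler {
        public List<String> findReadyToClean() {
            return Arrays.asList("compaction:db.t1");
        }
    }

    static class AbortedTxnCleaner implements TaskHandler {
        public List<String> findReadyToClean() {
            return Arrays.asList("aborted-txn:db.t2");
        }
    }

    public static void main(String[] args) {
        // The cleaner loop treats every handler uniformly.
        for (TaskHandler h : Arrays.asList(new CompactionCleaner(), new AbortedTxnCleaner())) {
            h.findReadyToClean().forEach(System.out::println);
        }
    }
}
```

Separating the handlers lets aborted-transaction cleanup run without waiting for a compaction cycle, which is the motivation stated in the issue description.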
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855900=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855900 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 16:10 Start Date: 10/Apr/23 16:10 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1161857242 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { return table; } + + @Override + public boolean canSetColStatistics() { +return getStatsSource().equals(ICEBERG); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +if (table.currentSnapshot() != null) { + Path statsPath = getStatsPath(table); + if (getStatsSource().equals(ICEBERG)) { +try (FileSystem fs = statsPath.getFileSystem(conf)) { + if (fs.exists(statsPath)) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +String statsPath = getStatsPath(table).toString(); +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + List blobMetadata = reader.fileMetadata().blobs(); + Map> collect = + Streams.stream(reader.readAll(blobMetadata)).collect(Collectors.toMap(Pair::first, + blobMetadataByteBufferPair -> SerializationUtils.deserialize( + ByteBuffers.toByteArray(blobMetadataByteBufferPair.second(); + return collect.get(blobMetadata.get(0)).get(0).getStatsObj(); +} catch 
(IOException e) { + LOG.error(String.valueOf(e)); +} +return null; + } + + + @Override + public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table, + List colStats) { +TableDesc tableDesc = Utilities.getTableDesc(table); +Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties()); +String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId(); +byte[] serializeColStats = SerializationUtils.serialize((Serializable) colStats); + +try (PuffinWriter writer = Puffin.write(tbl.io().newOutputFile(getStatsPath(tbl).toString())) +.createdBy("Hive").build()) { Review Comment: Done ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { return table; } + + @Override + public boolean canSetColStatistics() { +return getStatsSource().equals(ICEBERG); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +if (table.currentSnapshot() != null) { + Path statsPath = getStatsPath(table); + if (getStatsSource().equals(ICEBERG)) { +try (FileSystem fs = statsPath.getFileSystem(conf)) { + if (fs.exists(statsPath)) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +String statsPath = getStatsPath(table).toString(); +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + List blobMetadata = reader.fileMetadata().blobs(); + Map> collect = + Streams.stream(reader.readAll(blobMetadata)).collect(Collectors.toMap(Pair::first, + 
blobMetadataByteBufferPair -> SerializationUtils.deserialize( + ByteBuffers.toByteArray(blobMetadataByteBufferPair.second(); + return collect.get(blobMetadata.get(0)).get(0).getStatsObj(); +} catch (IOException e) { + LOG.error(String.valueOf(e)); +} +return null; + } + + + @Override + public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table, + List colStats) { +
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855901=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855901 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 16:10 Start Date: 10/Apr/23 16:10 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1161857489 ## ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java: ## @@ -1069,8 +1069,12 @@ public static List getTableColumnStats( } if (fetchColStats && !colStatsToRetrieve.isEmpty()) { try { -List colStat = Hive.get().getTableColumnStatistics( -dbName, tabName, colStatsToRetrieve, false); +List colStat; +if (table != null && table.isNonNative() && table.getStorageHandler().canProvideColStatistics(table)) { Review Comment: Fixed Issue Time Tracking --- Worklog Id: (was: 855901) Time Spent: 8h 40m (was: 8.5h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 8h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855898=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855898 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 16:09 Start Date: 10/Apr/23 16:09 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1161856412 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { return table; } + + @Override + public boolean canSetColStatistics() { +return getStatsSource().equals(ICEBERG); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +if (table.currentSnapshot() != null) { + Path statsPath = getStatsPath(table); + if (getStatsSource().equals(ICEBERG)) { +try (FileSystem fs = statsPath.getFileSystem(conf)) { + if (fs.exists(statsPath)) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +String statsPath = getStatsPath(table).toString(); +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + List blobMetadata = reader.fileMetadata().blobs(); + Map> collect = + Streams.stream(reader.readAll(blobMetadata)).collect(Collectors.toMap(Pair::first, + blobMetadataByteBufferPair -> SerializationUtils.deserialize( + ByteBuffers.toByteArray(blobMetadataByteBufferPair.second(); + return collect.get(blobMetadata.get(0)).get(0).getStatsObj(); +} catch 
(IOException e) { + LOG.error(String.valueOf(e)); +} +return null; Review Comment: Even in the absence of stats, the query can still run successfully. I was thinking it would be better to log an error message rather than fail the entire query. ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { return table; } + + @Override + public boolean canSetColStatistics() { +return getStatsSource().equals(ICEBERG); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +if (table.currentSnapshot() != null) { + Path statsPath = getStatsPath(table); + if (getStatsSource().equals(ICEBERG)) { +try (FileSystem fs = statsPath.getFileSystem(conf)) { + if (fs.exists(statsPath)) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +String statsPath = getStatsPath(table).toString(); +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + List blobMetadata = reader.fileMetadata().blobs(); + Map> collect = + Streams.stream(reader.readAll(blobMetadata)).collect(Collectors.toMap(Pair::first, + blobMetadataByteBufferPair -> SerializationUtils.deserialize( + ByteBuffers.toByteArray(blobMetadataByteBufferPair.second(); + return collect.get(blobMetadata.get(0)).get(0).getStatsObj(); +} catch (IOException e) { + LOG.error(String.valueOf(e)); +} +return null; + } + + + @Override + public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table, + List colStats) { 
+TableDesc tableDesc = Utilities.getTableDesc(table); +Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties()); Review Comment: Done Issue Time Tracking --- Worklog Id: (was: 855898) Time Spent: 8h 10m (was: 8h) > Store hive columns stats in puffin files for iceberg tables > --- > >
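The read path quoted above recovers the column-stats list by Java-deserializing a Puffin blob payload (`SerializationUtils.deserialize(ByteBuffers.toByteArray(...))`). The round-trip that pattern relies on can be sketched with plain JDK streams; the `BlobRoundTrip` class and the string payload below are illustrative stand-ins, not Hive or Iceberg code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the Puffin blob payload handling: the stats list is
// Java-serialized on the write side and deserialized back on the read side
// (commons-lang SerializationUtils performs the same round-trip shown here).
public class BlobRoundTrip {
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj); // write side: bytes that would land in the blob
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject(); // read side: rebuild the stats object
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> statsObjs = Arrays.asList("numRows=100", "numNulls=3"); // illustrative payload
        byte[] blobPayload = serialize(new ArrayList<>(statsObjs));
        List<String> restored = deserialize(blobPayload);
        if (!restored.equals(statsObjs)) {
            throw new AssertionError("round-trip mismatch: " + restored);
        }
        System.out.println("restored " + restored.size() + " stats entries");
    }
}
```

One design consequence worth noting: because the payload is opaque serialized bytes, a missing or unreadable blob can only be detected at read time, which is why the `getColStatistics` code above falls back to returning `null` on `IOException`.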
[jira] [Work logged] (HIVE-27184) Add class name profiling option in ProfileServlet
[ https://issues.apache.org/jira/browse/HIVE-27184?focusedWorklogId=855897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855897 ] ASF GitHub Bot logged work on HIVE-27184: - Author: ASF GitHub Bot Created on: 10/Apr/23 16:09 Start Date: 10/Apr/23 16:09 Worklog Time Spent: 10m Work Description: difin commented on PR #4196: URL: https://github.com/apache/hive/pull/4196#issuecomment-1502008694 > Thanks @difin. LGTM. > > minor comment: Need to check if parameters with "$" (e.g java classnames) should be decoded. It can be a separate ticket. Hi @rbalamohan, I checked what happens when profiling a method with "$". From a command line the profiling command works if you escape the dollar sign by adding "\" before "$" : `curl "http://localhost:10002/prof?output=tree=30=1=java.util.concurrent.locks.AbstractQueuedSynchronizer\$ConditionObject.awaitNanos"` Then it generates an output file with a name like this: `async-prof-pid-73790-java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos-4.tree` To open it in a Linux shell, the file name also needs to be escaped. [async-prof-pid-73790-java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos-4.tree.html.txt](https://github.com/apache/hive/files/11191887/async-prof-pid-73790-java.util.concurrent.locks.AbstractQueuedSynchronizer.ConditionObject.awaitNanos-4.tree.html.txt) The output inside the output file is fine. A sample output file is attached. 
Issue Time Tracking --- Worklog Id: (was: 855897) Time Spent: 1h 10m (was: 1h) > Add class name profiling option in ProfileServlet > - > > Key: HIVE-27184 > URL: https://issues.apache.org/jira/browse/HIVE-27184 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Rajesh Balamohan >Assignee: Dmitriy Fingerman >Priority: Major > Labels: performance, pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > With async-profiler "-e classname.method", it is possible to profile specific > events. Currently profileServlet supports events like cpu, alloc, lock etc. > It would be good to enhance it to support method-name profiling as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
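The backslash in the curl command above is shell escaping, not HTTP escaping: "$" is special to the shell but legal in a URL path segment. Percent-encoding the class name is the alternative the "should be decoded" question hints at, since a servlet that URL-decodes its parameters would receive the original "$" back. A small sketch of that round-trip; the class name and `event=` parameter are illustrative, and this is not the ProfileServlet code:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Percent-encoding "$" avoids backslash escaping on the command line entirely:
// the encoded form is shell-safe, and decoding restores the original name.
public class DollarParam {
    public static void main(String[] args) {
        String method = "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos";
        String encoded = URLEncoder.encode(method, StandardCharsets.UTF_8);
        // "$" becomes "%24"; "." survives unchanged, so the name stays readable
        System.out.println("event=" + encoded);
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        if (!decoded.equals(method)) {
            throw new AssertionError("decode mismatch: " + decoded);
        }
    }
}
```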
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855899 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 16:09 Start Date: 10/Apr/23 16:09 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1161857030 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { return table; } + + @Override + public boolean canSetColStatistics() { +return getStatsSource().equals(ICEBERG); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +if (table.currentSnapshot() != null) { + Path statsPath = getStatsPath(table); + if (getStatsSource().equals(ICEBERG)) { +try (FileSystem fs = statsPath.getFileSystem(conf)) { + if (fs.exists(statsPath)) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +String statsPath = getStatsPath(table).toString(); +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + List blobMetadata = reader.fileMetadata().blobs(); + Map> collect = + Streams.stream(reader.readAll(blobMetadata)).collect(Collectors.toMap(Pair::first, + blobMetadataByteBufferPair -> SerializationUtils.deserialize( + ByteBuffers.toByteArray(blobMetadataByteBufferPair.second(); + return collect.get(blobMetadata.get(0)).get(0).getStatsObj(); +} catch 
(IOException e) { + LOG.error(String.valueOf(e)); +} +return null; + } + + + @Override + public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table, + List colStats) { +TableDesc tableDesc = Utilities.getTableDesc(table); +Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties()); +String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId(); Review Comment: Fixed, the null check is moved to canSetColStatistics. Issue Time Tracking --- Worklog Id: (was: 855899) Time Spent: 8h 20m (was: 8h 10m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 8h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855896 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 16:08 Start Date: 10/Apr/23 16:08 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1161856243 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { return table; } + + @Override + public boolean canSetColStatistics() { +return getStatsSource().equals(ICEBERG); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +if (table.currentSnapshot() != null) { + Path statsPath = getStatsPath(table); + if (getStatsSource().equals(ICEBERG)) { +try (FileSystem fs = statsPath.getFileSystem(conf)) { + if (fs.exists(statsPath)) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); Review Comment: Done. 
## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { return table; } + + @Override + public boolean canSetColStatistics() { +return getStatsSource().equals(ICEBERG); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +if (table.currentSnapshot() != null) { + Path statsPath = getStatsPath(table); + if (getStatsSource().equals(ICEBERG)) { +try (FileSystem fs = statsPath.getFileSystem(conf)) { + if (fs.exists(statsPath)) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); Review Comment: Done ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { return table; } + + @Override + public boolean canSetColStatistics() { +return getStatsSource().equals(ICEBERG); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +if (table.currentSnapshot() != null) { + Path statsPath = getStatsPath(table); + if (getStatsSource().equals(ICEBERG)) { +try (FileSystem fs = statsPath.getFileSystem(conf)) { + if (fs.exists(statsPath)) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, 
Utilities.getTableDesc(hmsTable).getProperties()); +String statsPath = getStatsPath(table).toString(); +LOG.info("Using stats from puffin file at:" + statsPath); Review Comment: Done Issue Time Tracking --- Worklog Id: (was: 855896) Time Spent: 8h (was: 7h 50m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 8h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855895 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 16:08 Start Date: 10/Apr/23 16:08 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1161856054 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { return table; } + + @Override + public boolean canSetColStatistics() { +return getStatsSource().equals(ICEBERG); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); Review Comment: Done ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { return table; } + + @Override + public boolean canSetColStatistics() { +return getStatsSource().equals(ICEBERG); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) { +Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties()); +if (table.currentSnapshot() != null) { + Path statsPath = getStatsPath(table); + if (getStatsSource().equals(ICEBERG)) { Review Comment: Fixed. 
Issue Time Tracking --- Worklog Id: (was: 855895) Time Spent: 7h 50m (was: 7h 40m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 7h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855893=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855893 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 16:08 Start Date: 10/Apr/23 16:08 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1161855860 ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -2205,9 +2205,8 @@ public static enum ConfVars { "padding tolerance config (hive.exec.orc.block.padding.tolerance)."), HIVE_ORC_CODEC_POOL("hive.use.orc.codec.pool", false, "Whether to use codec pool in ORC. Disable if there are bugs with codec reuse."), -HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from iceberg table snapshot for query " + -"planning. This has three values metastore, puffin and iceberg"), - +HIVE_ICEBERG_STATS_SOURCE("hive.iceberg.stats.source","iceberg","Use stats from iceberg table snapshot for query " + +"planning. This has three values metastore and iceberg"), Review Comment: Fixed. There will be only 2 values. Issue Time Tracking --- Worklog Id: (was: 855893) Time Spent: 7h 40m (was: 7.5h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 7h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27238) Avoid Calcite Code generation for RelMetaDataProvider on every query
[ https://issues.apache.org/jira/browse/HIVE-27238?focusedWorklogId=855892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855892 ] ASF GitHub Bot logged work on HIVE-27238: - Author: ASF GitHub Bot Created on: 10/Apr/23 16:06 Start Date: 10/Apr/23 16:06 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4212: URL: https://github.com/apache/hive/pull/4212#issuecomment-1502004561 Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive=4212) [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=BUG) [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=VULNERABILITY) [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4212=false=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4212=false=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4212=false=SECURITY_HOTSPOT) [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=CODE_SMELL) [1 Code Smell](https://sonarcloud.io/project/issues?id=apache_hive=4212=false=CODE_SMELL) [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive=4212=coverage=list) No Coverage information [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive=4212=duplicated_lines_density=list) No Duplication information Issue Time Tracking --- Worklog Id: (was: 855892) Time Spent: 0.5h (was: 20m) > Avoid Calcite Code generation for RelMetaDataProvider on every query > > > Key: HIVE-27238 > URL: https://issues.apache.org/jira/browse/HIVE-27238 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Steve Carlin >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In CalcitePlanner, we are instantiating a new CachingRelMetadataProvider on > every query. Within the Calcite code, they keep the provider key to prevent > a new MetadataHandler class from being created. But by generating a new > provider, the cache never gets a hit so we keep instantiating new > MetadataHandlers. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27200) Backport HIVE-24928 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27200?focusedWorklogId=855878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855878 ] ASF GitHub Bot logged work on HIVE-27200: - Author: ASF GitHub Bot Created on: 10/Apr/23 15:19 Start Date: 10/Apr/23 15:19 Worklog Time Spent: 10m Work Description: sunchao merged PR #4175: URL: https://github.com/apache/hive/pull/4175 Issue Time Tracking --- Worklog Id: (was: 855878) Time Spent: 0.5h (was: 20m) > Backport HIVE-24928 to branch-3 > --- > > Key: HIVE-27200 > URL: https://issues.apache.org/jira/browse/HIVE-27200 > Project: Hive > Issue Type: Improvement > Components: StorageHandler >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Critical > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This is to backport HIVE-24928 so that for HiveStorageHandler table 'ANALYZE > TABLE ... COMPUTE STATISTICS' can use storagehandler to provide basic stats > with BasicStatsNoJobTask -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27200) Backport HIVE-24928 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27200?focusedWorklogId=855879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855879 ] ASF GitHub Bot logged work on HIVE-27200: - Author: ASF GitHub Bot Created on: 10/Apr/23 15:19 Start Date: 10/Apr/23 15:19 Worklog Time Spent: 10m Work Description: sunchao commented on PR #4175: URL: https://github.com/apache/hive/pull/4175#issuecomment-1501946089 Merged, thanks @yigress ! Issue Time Tracking --- Worklog Id: (was: 855879) Time Spent: 40m (was: 0.5h) > Backport HIVE-24928 to branch-3 > --- > > Key: HIVE-27200 > URL: https://issues.apache.org/jira/browse/HIVE-27200 > Project: Hive > Issue Type: Improvement > Components: StorageHandler >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Critical > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > This is to backport HIVE-24928 so that for HiveStorageHandler table 'ANALYZE > TABLE ... COMPUTE STATISTICS' can use storagehandler to provide basic stats > with BasicStatsNoJobTask -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27200) Backport HIVE-24928 to branch-3
[ https://issues.apache.org/jira/browse/HIVE-27200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HIVE-27200. - Fix Version/s: 3.2.0 Resolution: Fixed > Backport HIVE-24928 to branch-3 > --- > > Key: HIVE-27200 > URL: https://issues.apache.org/jira/browse/HIVE-27200 > Project: Hive > Issue Type: Improvement > Components: StorageHandler >Reporter: Yi Zhang >Assignee: Yi Zhang >Priority: Critical > Labels: pull-request-available > Fix For: 3.2.0 > > Time Spent: 40m > Remaining Estimate: 0h > > This is to backport HIVE-24928 so that for HiveStorageHandler table 'ANALYZE > TABLE ... COMPUTE STATISTICS' can use storagehandler to provide basic stats > with BasicStatsNoJobTask -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27173) Add method for Spark to be able to trigger DML events
[ https://issues.apache.org/jira/browse/HIVE-27173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-27173. -- Fix Version/s: 4.0.0 Resolution: Fixed Fix has been merged to master. > Add method for Spark to be able to trigger DML events > - > > Key: HIVE-27173 > URL: https://issues.apache.org/jira/browse/HIVE-27173 > Project: Hive > Issue Type: Improvement >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Spark currently uses Hive.java from Hive as a convenient way to hide from the > having to deal with HMS Client and the thrift objects. Currently, Hive has > support for DML events (being able to generate events on DML operations but > does not expose a public method to do so). It has a private method that takes > in Hive objects like Table etc. Would be nice if we can have something with > more primitive datatypes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-27173) Add method for Spark to be able to trigger DML events
[ https://issues.apache.org/jira/browse/HIVE-27173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam reassigned HIVE-27173: Assignee: Naveen Gangam > Add method for Spark to be able to trigger DML events > - > > Key: HIVE-27173 > URL: https://issues.apache.org/jira/browse/HIVE-27173 > Project: Hive > Issue Type: Improvement >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Spark currently uses Hive.java from Hive as a convenient way to hide from the > having to deal with HMS Client and the thrift objects. Currently, Hive has > support for DML events (being able to generate events on DML operations but > does not expose a public method to do so). It has a private method that takes > in Hive objects like Table etc. Would be nice if we can have something with > more primitive datatypes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27173) Add method for Spark to be able to trigger DML events
[ https://issues.apache.org/jira/browse/HIVE-27173?focusedWorklogId=855861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855861 ] ASF GitHub Bot logged work on HIVE-27173: - Author: ASF GitHub Bot Created on: 10/Apr/23 14:35 Start Date: 10/Apr/23 14:35 Worklog Time Spent: 10m Work Description: nrg4878 merged PR #4201: URL: https://github.com/apache/hive/pull/4201 Issue Time Tracking --- Worklog Id: (was: 855861) Time Spent: 40m (was: 0.5h) > Add method for Spark to be able to trigger DML events > - > > Key: HIVE-27173 > URL: https://issues.apache.org/jira/browse/HIVE-27173 > Project: Hive > Issue Type: Improvement >Reporter: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Spark currently uses Hive.java from Hive as a convenient way to hide from the > having to deal with HMS Client and the thrift objects. Currently, Hive has > support for DML events (being able to generate events on DML operations but > does not expose a public method to do so). It has a private method that takes > in Hive objects like Table etc. Would be nice if we can have something with > more primitive datatypes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27173) Add method for Spark to be able to trigger DML events
[ https://issues.apache.org/jira/browse/HIVE-27173?focusedWorklogId=855862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855862 ] ASF GitHub Bot logged work on HIVE-27173: - Author: ASF GitHub Bot Created on: 10/Apr/23 14:35 Start Date: 10/Apr/23 14:35 Worklog Time Spent: 10m Work Description: nrg4878 commented on PR #4201: URL: https://github.com/apache/hive/pull/4201#issuecomment-1501893908 Thank you for the review @dengzhhu653 Issue Time Tracking --- Worklog Id: (was: 855862) Time Spent: 50m (was: 40m) > Add method for Spark to be able to trigger DML events > - > > Key: HIVE-27173 > URL: https://issues.apache.org/jira/browse/HIVE-27173 > Project: Hive > Issue Type: Improvement >Reporter: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Spark currently uses Hive.java from Hive as a convenient way to hide from the > having to deal with HMS Client and the thrift objects. Currently, Hive has > support for DML events (being able to generate events on DML operations but > does not expose a public method to do so). It has a private method that takes > in Hive objects like Table etc. Would be nice if we can have something with > more primitive datatypes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855860 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 14:34 Start Date: 10/Apr/23 14:34 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161772310 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java: ## @@ -82,7 +82,29 @@ public static ValidTxnList createValidTxnListForCleaner(GetOpenTxnsResponse txns bitSet.set(0, abortedTxns.length); //add ValidCleanerTxnList? - could be problematic for all the places that read it from // string as they'd have to know which object to instantiate -return new ValidReadTxnList(abortedTxns, bitSet, highWaterMark, Long.MAX_VALUE); +return new ValidReadTxnList(abortedTxns, bitSet, highWatermark, Long.MAX_VALUE); + } + + public static ValidTxnList createValidTxnListForAbortedTxnCleaner(GetOpenTxnsResponse txns, long minOpenTxn) { Review Comment: how is that different from `createValidTxnListForCleaner `, everything in Open_txns list `< minOpenTxn - 1` would be aborted Issue Time Tracking --- Worklog Id: (was: 855860) Time Spent: 11h 40m (was: 11.5h) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 11h 40m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27223) Show Compactions failing with NPE
[ https://issues.apache.org/jira/browse/HIVE-27223?focusedWorklogId=855856=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855856 ] ASF GitHub Bot logged work on HIVE-27223: - Author: ASF GitHub Bot Created on: 10/Apr/23 14:28 Start Date: 10/Apr/23 14:28 Worklog Time Spent: 10m Work Description: sonarcloud[bot] commented on PR #4204: URL: https://github.com/apache/hive/pull/4204#issuecomment-1501886365 Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive=4204) [![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=BUG) [![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=VULNERABILITY) [![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4204=false=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 
'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4204=false=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive=4204=false=SECURITY_HOTSPOT) [![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=CODE_SMELL) [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive=4204=false=CODE_SMELL) [![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive=4204=coverage=list) No Coverage information [![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive=4204=duplicated_lines_density=list) No Duplication information Issue Time Tracking --- Worklog Id: (was: 855856) Time Spent: 0.5h (was: 20m) > Show Compactions failing with NPE > - > > Key: HIVE-27223 > URL: https://issues.apache.org/jira/browse/HIVE-27223 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > {noformat} > java.lang.NullPointerException: null > at java.io.DataOutputStream.writeBytes(DataOutputStream.java:274) ~[?:?] 
> at > org.apache.hadoop.hive.ql.ddl.process.show.compactions.ShowCompactionsOperation.writeRow(ShowCompactionsOperation.java:135) > > at > org.apache.hadoop.hive.ql.ddl.process.show.compactions.ShowCompactionsOperation.execute(ShowCompactionsOperation.java:57) > > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
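The stack trace above boils down to `DataOutputStream.writeBytes` being handed a null field. A minimal reproduction with the usual null-safe guard; the `workerId` field name is illustrative, and this is a sketch of the failure mode rather than the actual `ShowCompactionsOperation.writeRow` fix:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// DataOutputStream.writeBytes(String) calls s.length() first, so a null
// argument throws NullPointerException before any bytes are written.
public class WriteRowNpe {
    static String nullToEmpty(String s) {
        return s == null ? "" : s;
    }

    public static void main(String[] args) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        String workerId = null; // e.g. a compaction not yet picked up by any worker
        try {
            out.writeBytes(workerId); // throws NullPointerException
            throw new AssertionError("expected NPE");
        } catch (NullPointerException expected) {
            // this is the crash from the stack trace above
        }
        out.writeBytes(nullToEmpty(workerId)); // guarded write succeeds
        System.out.println("guarded write ok");
    }
}
```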
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855855=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855855 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 14:28 Start Date: 10/Apr/23 14:28 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161767398

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java: ##

@@ -82,7 +82,29 @@ public static ValidTxnList createValidTxnListForCleaner(GetOpenTxnsResponse txns
     bitSet.set(0, abortedTxns.length);
     //add ValidCleanerTxnList? - could be problematic for all the places that read it from
     // string as they'd have to know which object to instantiate
-    return new ValidReadTxnList(abortedTxns, bitSet, highWaterMark, Long.MAX_VALUE);
+    return new ValidReadTxnList(abortedTxns, bitSet, highWatermark, Long.MAX_VALUE);
+  }
+
+  public static ValidTxnList createValidTxnListForAbortedTxnCleaner(GetOpenTxnsResponse txns, long minOpenTxn) {
+    long highWatermark = minOpenTxn - 1;
+    long[] exceptions = new long[txns.getOpen_txnsSize()];
+    int i = 0;
+    BitSet abortedBits = BitSet.valueOf(txns.getAbortedBits());
+    // getOpen_txns() guarantees that the list contains only aborted & open txns.
+    // exceptions list must contain both txn types since validWriteIdList filters out the aborted ones and valid ones for that table.
+    // If a txn is not in exception list, it is considered as a valid one and thought of as an uncompacted write.
+    // See TxnHandler#getValidWriteIdsForTable() for more details.
+    for(long txnId : txns.getOpen_txns()) {
+      if(txnId > highWatermark) {
+        break;
+      }
+      exceptions[i] = txnId;
+      i++;
+    }
+    exceptions = Arrays.copyOf(exceptions, i);
+    //add ValidCleanerTxnList? - could be problematic for all the places that read it from

Review Comment: is this a leftover comment?
Issue Time Tracking --- Worklog Id: (was: 855855) Time Spent: 11.5h (was: 11h 20m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 11.5h > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855852 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 14:27 Start Date: 10/Apr/23 14:27 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161766383

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java: ##

@@ -82,7 +82,29 @@ public static ValidTxnList createValidTxnListForCleaner(GetOpenTxnsResponse txns
     bitSet.set(0, abortedTxns.length);
     //add ValidCleanerTxnList? - could be problematic for all the places that read it from
     // string as they'd have to know which object to instantiate
-    return new ValidReadTxnList(abortedTxns, bitSet, highWaterMark, Long.MAX_VALUE);
+    return new ValidReadTxnList(abortedTxns, bitSet, highWatermark, Long.MAX_VALUE);
+  }
+
+  public static ValidTxnList createValidTxnListForAbortedTxnCleaner(GetOpenTxnsResponse txns, long minOpenTxn) {
+    long highWatermark = minOpenTxn - 1;
+    long[] exceptions = new long[txns.getOpen_txnsSize()];
+    int i = 0;
+    BitSet abortedBits = BitSet.valueOf(txns.getAbortedBits());
+    // getOpen_txns() guarantees that the list contains only aborted & open txns.
+    // exceptions list must contain both txn types since validWriteIdList filters out the aborted ones and valid ones for that table.
+    // If a txn is not in exception list, it is considered as a valid one and thought of as an uncompacted write.
+    // See TxnHandler#getValidWriteIdsForTable() for more details.
+    for(long txnId : txns.getOpen_txns()) {

Review Comment: txns.getOpen_txns() is sorted so no need for whole list scan

Issue Time Tracking --- Worklog Id: (was: 855852) Time Spent: 11h 20m (was: 11h 10m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 11h 20m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
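The review comment above — that `txns.getOpen_txns()` returns a sorted list — means the cutoff below the high watermark can be located without visiting every element. A hypothetical sketch of that idea; the class and method names are illustrative, not the actual `TxnUtils` code:

```java
import java.util.Arrays;

// Sketch of the reviewer's point: on a sorted txn-id array, a binary search
// finds how many leading ids fall at or below the high watermark, replacing
// the element-by-element scan-and-break loop.
public class TxnCutoff {
  // Returns the count of leading ids that are <= highWatermark (upper-bound search).
  static int countBelowWatermark(long[] sortedTxnIds, long highWatermark) {
    int lo = 0, hi = sortedTxnIds.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;           // overflow-safe midpoint
      if (sortedTxnIds[mid] <= highWatermark) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  public static void main(String[] args) {
    long[] txns = {3, 5, 8, 11, 14};
    int n = countBelowWatermark(txns, 10);     // ids 3, 5, 8 qualify
    long[] exceptions = Arrays.copyOf(txns, n);
    System.out.println(Arrays.toString(exceptions)); // [3, 5, 8]
  }
}
```

The `Arrays.copyOf` at the end mirrors the truncation step in the quoted diff; only the search for the cutoff changes.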
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855845 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 14:18 Start Date: 10/Apr/23 14:18 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161759075

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java: ##

@@ -82,7 +82,29 @@ public static ValidTxnList createValidTxnListForCleaner(GetOpenTxnsResponse txns
     bitSet.set(0, abortedTxns.length);
     //add ValidCleanerTxnList? - could be problematic for all the places that read it from
     // string as they'd have to know which object to instantiate
-    return new ValidReadTxnList(abortedTxns, bitSet, highWaterMark, Long.MAX_VALUE);
+    return new ValidReadTxnList(abortedTxns, bitSet, highWatermark, Long.MAX_VALUE);
+  }
+
+  public static ValidTxnList createValidTxnListForAbortedTxnCleaner(GetOpenTxnsResponse txns, long minOpenTxn) {
+    long highWatermark = minOpenTxn - 1;
+    long[] exceptions = new long[txns.getOpen_txnsSize()];
+    int i = 0;
+    BitSet abortedBits = BitSet.valueOf(txns.getAbortedBits());
+    // getOpen_txns() guarantees that the list contains only aborted & open txns.
+    // exceptions list must contain both txn types since validWriteIdList filters out the aborted ones and valid ones for that table.
+    // If a txn is not in exception list, it is considered as a valid one and thought of as an uncompacted write.
+    // See TxnHandler#getValidWriteIdsForTable() for more details.
+    for(long txnId : txns.getOpen_txns()) {

Review Comment: reformat, missing space

Issue Time Tracking --- Worklog Id: (was: 855845) Time Spent: 11h 10m (was: 11h) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 11h 10m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855844=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855844 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 14:18 Start Date: 10/Apr/23 14:18 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161758527

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java: ##

@@ -60,20 +60,20 @@ public class TxnUtils {
   private static final Logger LOG = LoggerFactory.getLogger(TxnUtils.class);

-  public static ValidTxnList createValidTxnListForCleaner(GetOpenTxnsResponse txns, long minOpenTxnGLB) {
-    long highWaterMark = minOpenTxnGLB - 1;
+  public static ValidTxnList createValidTxnListForCompactionCleaner(GetOpenTxnsResponse txns, long minOpenTxn) {
+    long highWatermark = minOpenTxn - 1;
     long[] abortedTxns = new long[txns.getOpen_txnsSize()];
     BitSet abortedBits = BitSet.valueOf(txns.getAbortedBits());
     int i = 0;
     for(long txnId : txns.getOpen_txns()) {
-      if(txnId > highWaterMark) {
+      if(txnId > highWatermark) {
         break;
       }
       if(abortedBits.get(i)) {

Review Comment: space

Issue Time Tracking --- Worklog Id: (was: 855844) Time Spent: 11h (was: 10h 50m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 11h > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855843 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 14:17 Start Date: 10/Apr/23 14:17 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161758279

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnUtils.java: ##

@@ -60,20 +60,20 @@ public class TxnUtils {
   private static final Logger LOG = LoggerFactory.getLogger(TxnUtils.class);

-  public static ValidTxnList createValidTxnListForCleaner(GetOpenTxnsResponse txns, long minOpenTxnGLB) {
-    long highWaterMark = minOpenTxnGLB - 1;
+  public static ValidTxnList createValidTxnListForCompactionCleaner(GetOpenTxnsResponse txns, long minOpenTxn) {
+    long highWatermark = minOpenTxn - 1;
     long[] abortedTxns = new long[txns.getOpen_txnsSize()];
     BitSet abortedBits = BitSet.valueOf(txns.getAbortedBits());
     int i = 0;
     for(long txnId : txns.getOpen_txns()) {
-      if(txnId > highWaterMark) {
+      if(txnId > highWatermark) {

Review Comment: space

Issue Time Tracking --- Worklog Id: (was: 855843) Time Spent: 10h 50m (was: 10h 40m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 10h 50m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855799=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855799 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 12:37 Start Date: 10/Apr/23 12:37 Worklog Time Spent: 10m Work Description: ayushtkn commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1161623163

## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ##

@@ -2205,9 +2205,8 @@ public static enum ConfVars {
         "padding tolerance config (hive.exec.orc.block.padding.tolerance)."),
     HIVE_ORC_CODEC_POOL("hive.use.orc.codec.pool", false,
         "Whether to use codec pool in ORC. Disable if there are bugs with codec reuse."),
-    HIVE_USE_STATS_FROM("hive.use.stats.from","iceberg","Use stats from iceberg table snapshot for query " +
-        "planning. This has three values metastore, puffin and iceberg"),
-
+    HIVE_ICEBERG_STATS_SOURCE("hive.iceberg.stats.source","iceberg","Use stats from iceberg table snapshot for query " +
+        "planning. This has three values metastore and iceberg"),

Review Comment:
> This has three values metastore and iceberg

what is the third value?

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ##

@@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
     return table;
   }
+
+  @Override
+  public boolean canSetColStatistics() {
+    return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+    Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties());
+    if (table.currentSnapshot() != null) {
+      Path statsPath = getStatsPath(table);
+      if (getStatsSource().equals(ICEBERG)) {

Review Comment: can use ```canSetColStatistics()```

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ##

@@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
     return table;
   }
+
+  @Override
+  public boolean canSetColStatistics() {
+    return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+    Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties());
+    if (table.currentSnapshot() != null) {
+      Path statsPath = getStatsPath(table);
+      if (getStatsSource().equals(ICEBERG)) {
+        try (FileSystem fs = statsPath.getFileSystem(conf)) {
+          if (fs.exists(statsPath)) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+    Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties());
+    String statsPath = getStatsPath(table).toString();
+    LOG.info("Using stats from puffin file at:" + statsPath);

Review Comment: Logger format:
```
LOG.info("Using stats from puffin file at: {}", statsPath);
```

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ##

@@ -361,6 +378,83 @@ private Table getTable(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
     return table;
   }
+
+  @Override
+  public boolean canSetColStatistics() {
+    return getStatsSource().equals(ICEBERG);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+    Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties());
+    if (table.currentSnapshot() != null) {
+      Path statsPath = getStatsPath(table);
+      if (getStatsSource().equals(ICEBERG)) {
+        try (FileSystem fs = statsPath.getFileSystem(conf)) {
+          if (fs.exists(statsPath)) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
+    Table table = Catalogs.loadTable(conf, Utilities.getTableDesc(hmsTable).getProperties());
+    String statsPath = getStatsPath(table).toString();
+    LOG.info("Using stats from puffin file at:" + statsPath);
+    try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) {
+      List blobMetadata = reader.fileMetadata().blobs();
+      Map> collect =
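The logger-format review comment above is about SLF4J's parameterized form: with `{}` placeholders the message string is only assembled when the level is actually enabled, while `+` concatenation always pays the formatting cost. A toy stand-in (this `LazyLog` class is illustrative, not SLF4J itself) showing the difference:

```java
// Minimal stand-in for the reviewer's point about parameterized logging.
// With a placeholder template, formatting is skipped entirely when the
// level is disabled; string concatenation would have run unconditionally.
public class LazyLog {
  static boolean infoEnabled = false; // pretend INFO is turned off
  static int formats = 0;             // counts how often we paid for formatting

  static String format(String template, Object arg) {
    formats++;
    return template.replace("{}", String.valueOf(arg));
  }

  static void info(String template, Object arg) {
    if (infoEnabled) {                // parameterized: format only when needed
      System.out.println(format(template, arg));
    }
  }

  public static void main(String[] args) {
    info("Using stats from puffin file at: {}", "/warehouse/tbl/stats");
    System.out.println(formats);      // 0 - nothing was formatted while INFO is off
  }
}
```

In real SLF4J code the equivalent is simply `LOG.info("Using stats from puffin file at: {}", statsPath);`, as the reviewer suggests.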
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855793=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855793 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 12:14 Start Date: 10/Apr/23 12:14 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161672912

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java: ##

@@ -516,6 +516,19 @@ Set findPotentialCompactions(int abortedThreshold, long abortedT
   @RetrySemantics.ReadOnly
   List findReadyToClean(long minOpenTxnWaterMark, long retentionTime) throws MetaException;

+  /**
+   * Find the aborted entries in TXN_COMPONENTS which can be used to
+   * clean directories belonging to transactions in aborted state.
+   * @param abortedTimeThreshold Age of table/partition's oldest aborted transaction involving a given table
+   *                             or partition that will trigger cleanup.
+   * @param abortedThreshold Number of aborted transactions involving a given table or partition
+   *                         that will trigger cleanup.
+   * @return Information of potential abort items that needs to be cleaned.
+   * @throws MetaException
+   */
+  @RetrySemantics.ReadOnly
+  List findReadyToCleanForAborts(long abortedTimeThreshold, int abortedThreshold) throws MetaException;

Review Comment: maybe `findReadyToCleanAborts` ?
Issue Time Tracking --- Worklog Id: (was: 855793) Time Spent: 10.5h (was: 10h 20m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 10.5h > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855794 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 12:14 Start Date: 10/Apr/23 12:14 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161673317

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java: ##

@@ -541,6 +554,15 @@ Set findPotentialCompactions(int abortedThreshold, long abortedT
   @RetrySemantics.CannotRetry
   void markCleaned(CompactionInfo info) throws MetaException;

+  /**
+   * This will remove an aborted entries from TXN_COMPONENTS table after
+   * the aborted directories are removed from the filesystem.
+   * @param info info on the aborted directories cleanup that needs to be removed
+   * @throws MetaException
+   */
+  @RetrySemantics.CannotRetry
+  void markCleanedForAborts(AcidTxnInfo info) throws MetaException;

Review Comment: I wouldn't create a separate API just for that

Issue Time Tracking --- Worklog Id: (was: 855794) Time Spent: 10h 40m (was: 10.5h) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 10h 40m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855792=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855792 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 12:12 Start Date: 10/Apr/23 12:12 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161672116

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java: ##

@@ -516,6 +516,19 @@ Set findPotentialCompactions(int abortedThreshold, long abortedT
   @RetrySemantics.ReadOnly
   List findReadyToClean(long minOpenTxnWaterMark, long retentionTime) throws MetaException;

+  /**
+   * Find the aborted entries in TXN_COMPONENTS which can be used to
+   * clean directories belonging to transactions in aborted state.
+   * @param abortedTimeThreshold Age of table/partition's oldest aborted transaction involving a given table
+   *                             or partition that will trigger cleanup.
+   * @param abortedThreshold Number of aborted transactions involving a given table or partition
+   *                         that will trigger cleanup.
+   * @return Information of potential abort items that needs to be cleaned.
+   * @throws MetaException
+   */
+  @RetrySemantics.ReadOnly
+  List findReadyToCleanForAborts(long abortedTimeThreshold, int abortedThreshold) throws MetaException;

Review Comment: I wouldn't introduce new API for that

Issue Time Tracking --- Worklog Id: (was: 855792) Time Spent: 10h 20m (was: 10h 10m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 10h 20m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855791 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 12:10 Start Date: 10/Apr/23 12:10 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161670816

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java: ##

@@ -702,6 +699,102 @@ public void markCleaned(CompactionInfo info) throws MetaException {
     }
   }

+  @Override
+  public void markCleanedForAborts(AcidTxnInfo info) throws MetaException {
+    // Do cleanup of TXN_COMPONENTS table
+    LOG.debug("Running markCleanedForAborts with CompactionInfo: {}", info);
+    try {
+      Connection dbConn = null;
+      try {
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED, connPoolCompaction);
+        markAbortCleaned(dbConn, info);
+        LOG.debug("Going to commit");
+        dbConn.commit();
+      } catch (SQLException e) {
+        LOG.error("Unable to delete from txn components due to {}", e.getMessage());
+        LOG.debug("Going to rollback");
+        rollbackDBConn(dbConn);
+        checkRetryable(e, "markCleanedForAborts(" + info + ")");
+        throw new MetaException("Unable to connect to transaction database " +
+            e.getMessage());
+      } finally {
+        closeDbConn(dbConn);
+      }
+    } catch (RetryException e) {
+      markCleanedForAborts(info);
+    }
+  }
+
+  private void markAbortCleaned(Connection dbConn, AcidTxnInfo info) throws MetaException, RetryException {

Review Comment: rename to `removeTxnComponents`

Issue Time Tracking --- Worklog Id: (was: 855791) Time Spent: 10h 10m (was: 10h) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels:
pull-request-available > Time Spent: 10h 10m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855789=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855789 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 12:03 Start Date: 10/Apr/23 12:03 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161666811

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/AcidTxnInfo.java: ##

@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.txn;
+
+import org.apache.commons.lang3.builder.ToStringBuilder;
+import org.apache.hadoop.hive.common.ValidCompactorWriteIdList;
+import org.apache.hadoop.hive.metastore.api.TableValidWriteIds;
+
+import java.util.Set;
+
+/**
+ * A class used for encapsulating information of abort-cleanup activities and compaction activities.
+ */
+public class AcidTxnInfo {

Review Comment: Can we reuse CompactionInfo object and not create another entity?
Issue Time Tracking --- Worklog Id: (was: 855789) Time Spent: 10h (was: 9h 50m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 10h > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855788=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855788 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 12:00 Start Date: 10/Apr/23 12:00 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161665575

## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/TaskHandlerFactory.java: ##

@@ -43,7 +44,14 @@ private TaskHandlerFactory() {
   public List getHandlers(HiveConf conf, TxnStore txnHandler, MetadataCache metadataCache,
       boolean metricsEnabled, FSRemover fsRemover) {
-    return Arrays.asList(new CompactionCleaner(conf, txnHandler, metadataCache,
+    boolean useAbortHandler = MetastoreConf.getBoolVar(conf, MetastoreConf.ConfVars.COMPACTOR_CLEAN_ABORTS_USING_CLEANER);
+    List taskHandlers = new ArrayList<>();
+    if (useAbortHandler) {

Review Comment: no need for that check, from now on use Cleaner to handle aborts

Issue Time Tracking --- Worklog Id: (was: 855788) Time Spent: 9h 50m (was: 9h 40m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 9h 50m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
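The reviewer's suggestion above is to register the abort handler unconditionally rather than behind a config flag. A toy sketch of that factory shape; the class, interface, and handler names here are illustrative, not the actual Hive types:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of a handler factory that always registers both cleaners,
// with no feature flag guarding the aborted-txn handler, per the review.
public class HandlerFactory {
  interface TaskHandler {
    String name();
  }

  static List<TaskHandler> getHandlers() {
    TaskHandler compactionCleaner = () -> "compaction-cleaner";
    TaskHandler abortedTxnCleaner = () -> "aborted-txn-cleaner";
    // Unconditional registration: the cleaner always handles aborts.
    return Arrays.asList(compactionCleaner, abortedTxnCleaner);
  }

  public static void main(String[] args) {
    for (TaskHandler h : getHandlers()) {
      System.out.println(h.name());
    }
  }
}
```

Dropping the flag keeps the factory deterministic: callers always receive the same handler set, and the behavioral switch (if one is ever needed) lives in the handler itself rather than in list construction.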
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855785 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:58
            Start Date: 10/Apr/23 11:58
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161664755

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/CompactionCleaner.java:
##########

@@ -259,49 +247,11 @@ private void cleanUsingAcidDir(CompactionInfo ci, String location, long minOpenT
      */
     // Creating 'reader' list since we are interested in the set of 'obsolete' files
-    ValidReaderWriteIdList validWriteIdList = getValidCleanerWriteIdList(ci, validTxnList);
-    LOG.debug("Cleaning based on writeIdList: {}", validWriteIdList);
-
-    Path path = new Path(location);
-    FileSystem fs = path.getFileSystem(conf);
-
-    // Collect all the files/dirs
-    Map dirSnapshots = AcidUtils.getHdfsDirSnapshotsForCleaner(fs, path);
-    AcidDirectory dir = AcidUtils.getAcidState(fs, path, conf, validWriteIdList, Ref.from(false), false,
-        dirSnapshots);
+    ValidReaderWriteIdList validWriteIdList = getValidCleanerWriteIdListForCompactionCleaner(ci, validTxnList);
     Table table = metadataCache.computeIfAbsent(ci.getFullTableName(), () -> resolveTable(ci.dbname, ci.tableName));
-    boolean isDynPartAbort = CompactorUtil.isDynPartAbort(table, ci.partName);
-
-    List obsoleteDirs = CompactorUtil.getObsoleteDirs(dir, isDynPartAbort);
-    if (isDynPartAbort || dir.hasUncompactedAborts()) {
-      ci.setWriteIds(dir.hasUncompactedAborts(), dir.getAbortedWriteIds());
-    }
-
-    List deleted = fsRemover.clean(new CleanupRequestBuilder().setLocation(location)
-        .setDbName(ci.dbname).setFullPartitionName(ci.getFullPartitionName())
-        .setRunAs(ci.runAs).setObsoleteDirs(obsoleteDirs).setPurge(true)
-        .build());
-
-    if (!deleted.isEmpty()) {
-      AcidMetricService.updateMetricsFromCleaner(ci.dbname, ci.tableName, ci.partName, dir.getObsolete(), conf,
-          txnHandler);
-    }
-
-    // Make sure there are no leftovers below the compacted watermark
-    boolean success = false;
-    conf.set(ValidTxnList.VALID_TXNS_KEY, new ValidReadTxnList().toString());
-    dir = AcidUtils.getAcidState(fs, path, conf, new ValidReaderWriteIdList(
-        ci.getFullTableName(), new long[0], new BitSet(), ci.highestWriteId, Long.MAX_VALUE),
-        Ref.from(false), false, dirSnapshots);
+    LOG.debug("Cleaning based on writeIdList: {}", validWriteIdList);
-    List remained = subtract(CompactorUtil.getObsoleteDirs(dir, isDynPartAbort), deleted);
-    if (!remained.isEmpty()) {
-      LOG.warn("{} Remained {} obsolete directories from {}. {}",
-          idWatermark(ci), remained.size(), location, CompactorUtil.getDebugInfo(remained));
-    } else {
-      LOG.debug("{} All cleared below the watermark: {} from {}", idWatermark(ci), ci.highestWriteId, location);
-      success = true;
-    }
+    boolean success = cleanAndVerifyObsoleteDirectories(ci, location, validWriteIdList, table);

Review Comment:
   1 line below no need to check for `isDynPartAbort`

Issue Time Tracking
-------------------

    Worklog Id:     (was: 855785)
    Time Spent: 9h 40m  (was: 9.5h)

> Implement a separate handler to handle aborted transaction cleanup
> ------------------------------------------------------------------
>
>                 Key: HIVE-27020
>                 URL: https://issues.apache.org/jira/browse/HIVE-27020
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different
> entities, implement a separate handler which can create requests for aborted
> transactions cleanup. This would move the aborted transaction cleanup
> exclusively to the cleaner.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855783 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:56
            Start Date: 10/Apr/23 11:56
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161658033

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/CompactionCleaner.java:
##########

@@ -337,18 +287,9 @@ private static String idWatermark(CompactionInfo ci) {
     return " id=" + ci.id;
   }

-  private ValidReaderWriteIdList getValidCleanerWriteIdList(CompactionInfo ci, ValidTxnList validTxnList)
+  private ValidReaderWriteIdList getValidCleanerWriteIdListForCompactionCleaner(CompactionInfo ci, ValidTxnList validTxnList)

Review Comment:
   why rename here, just override the parent method and call super?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 855783)
    Time Spent: 9h 20m  (was: 9h 10m)

> Implement a separate handler to handle aborted transaction cleanup
> ------------------------------------------------------------------
>
>                 Key: HIVE-27020
>                 URL: https://issues.apache.org/jira/browse/HIVE-27020
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different
> entities, implement a separate handler which can create requests for aborted
> transactions cleanup. This would move the aborted transaction cleanup
> exclusively to the cleaner.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
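The reviewer's suggestion here is the standard override-and-delegate pattern. A minimal sketch with stand-in types (none of these are Hive's real signatures; the method name is the only thing taken from the diff) showing an `@Override` that calls `super` and then narrows the result, instead of introducing a renamed sibling method:

```java
class TaskHandler {
    // Base implementation shared by all cleaners.
    protected String getValidCleanerWriteIdList(String validTxnList) {
        return "base(" + validTxnList + ")";
    }
}

class CompactionCleaner extends TaskHandler {
    // Override and delegate to super, then apply the compaction-specific part,
    // rather than renaming to getValidCleanerWriteIdListForCompactionCleaner.
    @Override
    protected String getValidCleanerWriteIdList(String validTxnList) {
        return super.getValidCleanerWriteIdList(validTxnList) + " + compaction-watermark";
    }
}

public class OverrideSketch {
    public static void main(String[] args) {
        TaskHandler handler = new CompactionCleaner();
        // Dynamic dispatch picks the subclass variant even through the base type.
        System.out.println(handler.getValidCleanerWriteIdList("txns"));
    }
}
```

The advantage is that callers keep working against one name on the base type, and each handler subclass layers its own narrowing on top of the shared logic.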
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855782 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:54
            Start Date: 10/Apr/23 11:54
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161662278

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/TaskHandler.java:
##########

@@ -81,4 +102,63 @@ protected Partition resolvePartition(String dbName, String tableName, String par
       return null;
     }
   }
+
+  protected ValidReaderWriteIdList getValidCleanerWriteIdList(AcidTxnInfo acidTxnInfo, ValidTxnList validTxnList)

Review Comment:
   should we rename this method to getValidWriteIdList?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 855782)
    Time Spent: 9h 10m  (was: 9h)

> Implement a separate handler to handle aborted transaction cleanup
> ------------------------------------------------------------------
>
>                 Key: HIVE-27020
>                 URL: https://issues.apache.org/jira/browse/HIVE-27020
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different
> entities, implement a separate handler which can create requests for aborted
> transactions cleanup. This would move the aborted transaction cleanup
> exclusively to the cleaner.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855784 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:58
            Start Date: 10/Apr/23 11:58
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161653684

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##########

@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides implementation of creation of abort clean tasks.
+ */
+class AbortedTxnCleaner extends TaskHandler {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea -
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible by any open txns. It is only visible while determining the AcidState (which
+      only sees the aborted deltas and does not read the file).
+
+   The following algorithm is used to clean the set of aborted directories -
+      a. Find the list of entries which are suitable for cleanup (This is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+      b. If the table/partition does not exist, then remove the associated aborted entry in TXN_COMPONENTS table.
+      c. Get the AcidState of the table by using the min open txnID, database name, tableName, partition name, highest write ID
+      d. Fetch the aborted directories and delete the directories.
+      e. Fetch the aborted write IDs from the AcidState and use it to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE,
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855780 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:46
            Start Date: 10/Apr/23 11:46
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161658033

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/CompactionCleaner.java:
##########

@@ -337,18 +287,9 @@ private static String idWatermark(CompactionInfo ci) {
     return " id=" + ci.id;
   }

-  private ValidReaderWriteIdList getValidCleanerWriteIdList(CompactionInfo ci, ValidTxnList validTxnList)
+  private ValidReaderWriteIdList getValidCleanerWriteIdListForCompactionCleaner(CompactionInfo ci, ValidTxnList validTxnList)

Review Comment:
   why rename here?

Issue Time Tracking
-------------------

    Worklog Id:     (was: 855780)
    Time Spent: 9h  (was: 8h 50m)

> Implement a separate handler to handle aborted transaction cleanup
> ------------------------------------------------------------------
>
>                 Key: HIVE-27020
>                 URL: https://issues.apache.org/jira/browse/HIVE-27020
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 9h
>  Remaining Estimate: 0h
>
> As described in the parent task, once the cleaner is separated into different
> entities, implement a separate handler which can create requests for aborted
> transactions cleanup. This would move the aborted transaction cleanup
> exclusively to the cleaner.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855777 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:42
            Start Date: 10/Apr/23 11:42
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161655781

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##########

@@ -0,0 +1,168 @@
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855776 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:41
            Start Date: 10/Apr/23 11:41
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161655435

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##########

@@ -0,0 +1,168 @@
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855775 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:38
            Start Date: 10/Apr/23 11:38
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161653684

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##########

@@ -0,0 +1,168 @@
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855774 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:34
            Start Date: 10/Apr/23 11:34
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161652023

##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java:
##########

@@ -0,0 +1,168 @@
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855773 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:26
            Start Date: 10/Apr/23 11:26
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161643395

## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java: ##

@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides implementation of creation of abort clean tasks.
+ */
+class AbortedTxnCleaner extends AcidTxnCleaner {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea -
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible by any open txns. It is only visible while determining the AcidState (which
+      only sees the aborted deltas and does not read the file).
+
+   The following algorithm is used to clean the set of aborted directories -
+      a. Find the list of entries which are suitable for cleanup (This is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+      b. If the table/partition does not exist, then remove the associated aborted entry in TXN_COMPONENTS table.
+      c. Get the AcidState of the table by using the min open txnID, database name, tableName, partition name, highest write ID.
+      d. Fetch the aborted directories and delete the directories.
+      e. Fetch the aborted write IDs from the AcidState and use it to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE,
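The stream in `getTasks()` above wraps each cleanup call in `ThrowingRunnable.unchecked(...)` so that a lambda which throws checked exceptions can be exposed as a plain `Runnable`. A self-contained sketch of that idiom follows; this minimal `ThrowingRunnable` and the `cleanEntry` helper are hypothetical stand-ins, not Hive's actual `CompactorUtil.ThrowingRunnable`:

```java
import java.util.List;
import java.util.stream.Collectors;

public class UncheckedDemo {

    // A Runnable look-alike that is allowed to throw a checked exception.
    @FunctionalInterface
    public interface ThrowingRunnable<E extends Exception> {
        void run() throws E;
    }

    // Adapts a checked-exception-throwing lambda into a plain Runnable by
    // rethrowing any checked exception wrapped in a RuntimeException.
    public static <E extends Exception> Runnable unchecked(ThrowingRunnable<E> r) {
        return () -> {
            try {
                r.run();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
    }

    // Hypothetical stand-in for the handler's clean(...) call.
    static void cleanEntry(String name) throws Exception {
        if (name.isEmpty()) throw new Exception("no entry name");
        System.out.println("cleaning " + name);
    }

    public static void main(String[] args) {
        // Without the adapter, `name -> cleanEntry(name)` could not be mapped
        // to Runnable inside a stream pipeline, because Runnable#run declares
        // no checked exceptions.
        List<Runnable> tasks = List.of("db1.t1", "db1.t2").stream()
                .map(name -> unchecked(() -> cleanEntry(name)))
                .collect(Collectors.toList());
        tasks.forEach(Runnable::run); // prints "cleaning db1.t1", "cleaning db1.t2"
    }
}
```

The design choice is the usual one for checked exceptions in streams: defer the failure to run time as an unchecked wrapper, letting whatever executes the task (here, the cleaner's task runner) handle or log it.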
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855771 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:25
            Start Date: 10/Apr/23 11:25
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161645900

## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java: ##
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855772 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:25
            Start Date: 10/Apr/23 11:25
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161643395

## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java: ##
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855770 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:22
            Start Date: 10/Apr/23 11:22
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161645900

## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java: ##
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855769 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:17
            Start Date: 10/Apr/23 11:17
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161643395

## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java: ##
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855768&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855768 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:14
            Start Date: 10/Apr/23 11:14
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161641913

## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java: ##

@@ -0,0 +1,168 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.txn.compactor.handler;
+
+import org.apache.hadoop.hive.common.ValidReaderWriteIdList;
+import org.apache.hadoop.hive.common.ValidTxnList;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.metrics.MetricsConstants;
+import org.apache.hadoop.hive.metastore.metrics.PerfLogger;
+import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil;
+import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable;
+import org.apache.hadoop.hive.ql.txn.compactor.FSRemover;
+import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static java.util.Objects.isNull;
+
+/**
+ * Abort-cleanup based implementation of TaskHandler.
+ * Provides implementation of creation of abort clean tasks.
+ */
+class AbortedTxnCleaner extends TaskHandler {
+
+  private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName());
+
+  public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler,
+                           MetadataCache metadataCache, boolean metricsEnabled,
+                           FSRemover fsRemover) {
+    super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover);
+  }
+
+  /**
+   The following cleanup is based on the following idea -
+   1. Aborted cleanup is independent of compaction. This is because directories which are written by
+      aborted txns are not visible by any open txns. It is only visible while determining the AcidState (which
+      only sees the aborted deltas and does not read the file).
+
+   The following algorithm is used to clean the set of aborted directories -
+      a. Find the list of entries which are suitable for cleanup (This is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+      b. If the table/partition does not exist, then remove the associated aborted entry in TXN_COMPONENTS table.
+      c. Get the AcidState of the table by using the min open txnID, database name, tableName, partition name, highest write ID.
+      d. Fetch the aborted directories and delete the directories.
+      e. Fetch the aborted write IDs from the AcidState and use it to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List<Runnable> getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List<AcidTxnInfo> readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE,
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855764&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855764 ]

ASF GitHub Bot logged work on HIVE-27020:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/23 11:06
            Start Date: 10/Apr/23 11:06
    Worklog Time Spent: 10m

Work Description: deniskuzZ commented on code in PR #4091:
URL: https://github.com/apache/hive/pull/4091#discussion_r1161638142

## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java: ##
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855763 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 11:04 Start Date: 10/Apr/23 11:04 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161636770 ## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/handler/AbortedTxnCleaner.java: ## @@ -0,0 +1,168 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.txn.compactor.handler; + +import org.apache.hadoop.hive.common.ValidReaderWriteIdList; +import org.apache.hadoop.hive.common.ValidTxnList; +import org.apache.hadoop.hive.conf.HiveConf; +import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.metastore.api.Partition; +import org.apache.hadoop.hive.metastore.api.Table; +import org.apache.hadoop.hive.metastore.metrics.MetricsConstants; +import org.apache.hadoop.hive.metastore.metrics.PerfLogger; +import org.apache.hadoop.hive.metastore.txn.AcidTxnInfo; +import org.apache.hadoop.hive.metastore.txn.TxnStore; +import org.apache.hadoop.hive.metastore.txn.TxnUtils; +import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils; +import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil; +import org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil.ThrowingRunnable; +import org.apache.hadoop.hive.ql.txn.compactor.FSRemover; +import org.apache.hadoop.hive.ql.txn.compactor.MetadataCache; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Collections; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; + +import static java.util.Objects.isNull; + +/** + * Abort-cleanup based implementation of TaskHandler. + * Provides implementation of creation of abort clean tasks. + */ +class AbortedTxnCleaner extends TaskHandler { + + private static final Logger LOG = LoggerFactory.getLogger(AbortedTxnCleaner.class.getName()); + + public AbortedTxnCleaner(HiveConf conf, TxnStore txnHandler, + MetadataCache metadataCache, boolean metricsEnabled, + FSRemover fsRemover) { +super(conf, txnHandler, metadataCache, metricsEnabled, fsRemover); + } + + /** + The following cleanup is based on the following idea - + 1. Aborted cleanup is independent of compaction. This is because directories which are written by + aborted txns are not visible by any open txns. 
+   It is only visible while determining the AcidState (which
+   only sees the aborted deltas and does not read the file).
+
+   The following algorithm is used to clean the set of aborted directories -
+   a. Find the list of entries which are suitable for cleanup (this is done in {@link TxnStore#findReadyToCleanForAborts(long, int)}).
+   b. If the table/partition does not exist, then remove the associated aborted entry from the TXN_COMPONENTS table.
+   c. Get the AcidState of the table by using the min open txn ID, database name, table name, partition name and highest write ID.
+   d. Fetch the aborted directories and delete them.
+   e. Fetch the aborted write IDs from the AcidState and use them to delete the associated metadata in the TXN_COMPONENTS table.
+   **/
+  @Override
+  public List getTasks() throws MetaException {
+    int abortedThreshold = HiveConf.getIntVar(conf,
+        HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD);
+    long abortedTimeThreshold = HiveConf
+        .getTimeVar(conf, HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD,
+            TimeUnit.MILLISECONDS);
+    List readyToCleanAborts = txnHandler.findReadyToCleanForAborts(abortedTimeThreshold, abortedThreshold);
+
+    if (!readyToCleanAborts.isEmpty()) {
+      return readyToCleanAborts.stream().map(ci -> ThrowingRunnable.unchecked(() ->
+          clean(ci, ci.txnId > 0 ? ci.txnId : Long.MAX_VALUE,
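The `getTasks()` snippet above gates abort cleanup on two configured thresholds: the number of aborted transactions on a table/partition, and the age of the oldest aborted transaction. A minimal, self-contained sketch of that trigger logic (class and method names here are illustrative, not Hive's):

```java
public class AbortCleanupTrigger {

    /**
     * Mirrors the two-threshold check behind findReadyToCleanForAborts:
     * clean when the number of aborted txns on a table/partition exceeds
     * the count threshold, OR the oldest aborted txn started longer ago
     * than the time threshold (a negative time threshold disables the
     * age check, as in the quoted CompactionTxnHandler code).
     */
    public static boolean shouldClean(int numAbortedTxns, long oldestTxnStartedMs,
                                      long nowMs, int abortedThreshold,
                                      long abortedTimeThresholdMs) {
        boolean tooManyAborts = numAbortedTxns > abortedThreshold;
        boolean pastTimeThreshold = abortedTimeThresholdMs >= 0
                && oldestTxnStartedMs + abortedTimeThresholdMs < nowMs;
        return tooManyAborts || pastTimeThreshold;
    }

    public static void main(String[] args) {
        long now = 100_000_000L;
        long twelveHoursMs = 12L * 60 * 60 * 1000;
        // 1500 aborted txns exceeds the default threshold of 1000: clean.
        System.out.println(shouldClean(1500, now, now, 1000, twelveHoursMs));
        // Few aborts, but the oldest is 13h old: clean.
        System.out.println(shouldClean(3, now - 13 * 60 * 60 * 1000L, now, 1000, twelveHoursMs));
        // Few aborts, all recent: do not clean.
        System.out.println(shouldClean(3, now - 1_000L, now, 1000, twelveHoursMs));
    }
}
```

The defaults assumed here (1000 aborts, 12h) come from the `HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD` and `HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD` descriptions quoted later in this thread.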
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855762 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 10:58 Start Date: 10/Apr/23 10:58 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161633548 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java: ## @@ -162,31 +162,33 @@ public Set findPotentialCompactions(int abortedThreshold, } rs.close(); -// Check for aborted txns: number of aborted txns past threshold and age of aborted txns -// past time threshold -boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0; -String sCheckAborted = "SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", " + - "MIN(\"TXN_STARTED\"), COUNT(*) FROM \"TXNS\", \"TXN_COMPONENTS\" " + - " WHERE \"TXN_ID\" = \"TC_TXNID\" AND \"TXN_STATE\" = " + TxnStatus.ABORTED + " " + - "GROUP BY \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\" " + - (checkAbortedTimeThreshold ? 
"" : " HAVING COUNT(*) > " + abortedThreshold); - -LOG.debug("Going to execute query <{}>", sCheckAborted); -rs = stmt.executeQuery(sCheckAborted); -long systemTime = System.currentTimeMillis(); -while (rs.next()) { - boolean pastTimeThreshold = - checkAbortedTimeThreshold && rs.getLong(4) + abortedTimeThreshold < systemTime; - int numAbortedTxns = rs.getInt(5); - if (numAbortedTxns > abortedThreshold || pastTimeThreshold) { -CompactionInfo info = new CompactionInfo(); -info.dbname = rs.getString(1); -info.tableName = rs.getString(2); -info.partName = rs.getString(3); -info.tooManyAborts = numAbortedTxns > abortedThreshold; -info.hasOldAbort = pastTimeThreshold; -LOG.debug("Found potential compaction: {}", info); -response.add(info); +if (!MetastoreConf.getBoolVar(conf, ConfVars.COMPACTOR_CLEAN_ABORTS_USING_CLEANER)) { Review Comment: no need for that, leads to code duplication ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java: ## @@ -464,6 +466,54 @@ public List findReadyToClean(long minOpenTxnWaterMark, long rete } } + @Override + @RetrySemantics.ReadOnly + public List findReadyToCleanForAborts(long abortedTimeThreshold, int abortedThreshold) throws MetaException { Review Comment: rename `findReadyToCleanAborts` Issue Time Tracking --- Worklog Id: (was: 855762) Time Spent: 6h 50m (was: 6h 40m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 6h 50m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
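The query discussed in the review above groups aborted TXN_COMPONENTS rows by (database, table, partition) and keeps only the groups that are past either threshold. A rough in-memory equivalent of that GROUP BY / HAVING logic (record and method names are illustrative, not Hive's actual types):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AbortedTxnGrouping {

    /** One aborted TXN_COMPONENTS row: the entity it touched and when the txn started. */
    record AbortedComponent(String db, String table, String partition, long txnStartedMs) {}

    /**
     * Groups aborted components by db/table/partition and returns the keys whose
     * group size exceeds abortedThreshold, or whose oldest txn started before
     * now minus the time threshold (a negative threshold disables the age check).
     */
    public static List<String> findPotentialCleanups(List<AbortedComponent> rows, long nowMs,
                                                     int abortedThreshold, long abortedTimeThresholdMs) {
        Map<String, List<Long>> byEntity = new LinkedHashMap<>();
        for (AbortedComponent r : rows) {
            String key = r.db() + "." + r.table() + (r.partition() == null ? "" : "." + r.partition());
            byEntity.computeIfAbsent(key, k -> new ArrayList<>()).add(r.txnStartedMs());
        }
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, List<Long>> e : byEntity.entrySet()) {
            long minStarted = e.getValue().stream().mapToLong(Long::longValue).min().orElse(Long.MAX_VALUE);
            boolean tooManyAborts = e.getValue().size() > abortedThreshold;
            boolean pastTimeThreshold = abortedTimeThresholdMs >= 0 && minStarted + abortedTimeThresholdMs < nowMs;
            if (tooManyAborts || pastTimeThreshold) {
                candidates.add(e.getKey());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<AbortedComponent> rows = List.of(
                new AbortedComponent("default", "t1", null, 1_000L),
                new AbortedComponent("default", "t1", null, 2_000L),
                new AbortedComponent("default", "t2", null, 9_000L));
        // With threshold 1, t1 has two aborts (too many); t2 has one recent abort and is skipped.
        System.out.println(findPotentialCleanups(rows, 10_000L, 1, 60_000L));
    }
}
```

In the real SQL, the count check lives in a HAVING clause while the age check is applied per row against MIN(TXN_STARTED); this sketch collapses both into one pass.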
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855761 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 10:53 Start Date: 10/Apr/23 10:53 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161630372 ## standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java: ## @@ -649,6 +649,10 @@ public enum ConfVars { COMPACTOR_CLEANER_TABLECACHE_ON("metastore.compactor.cleaner.tablecache.on", "hive.compactor.cleaner.tablecache.on", true, "Enable table caching in the cleaner. Currently the cache is cleaned after each cycle."), + COMPACTOR_CLEAN_ABORTS_USING_CLEANER("metastore.compactor.clean.aborts.using.cleaner", "hive.compactor.clean.aborts.using.cleaner", true, Review Comment: no need for extra config Issue Time Tracking --- Worklog Id: (was: 855761) Time Spent: 6h 40m (was: 6.5h) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 6h 40m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855760 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 10:51 Start Date: 10/Apr/23 10:51 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161629018 ## ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java: ## @@ -61,12 +60,10 @@ public void init(AtomicBoolean stop) throws Exception { cleanerExecutor = CompactorUtil.createExecutorWithThreadFactory( conf.getIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_THREADS_NUM), COMPACTOR_CLEANER_THREAD_NAME_FORMAT); -if (CollectionUtils.isEmpty(cleanupHandlers)) { - FSRemover fsRemover = new FSRemover(conf, ReplChangeManager.getInstance(conf), metadataCache); - cleanupHandlers = TaskHandlerFactory.getInstance() - .getHandlers(conf, txnHandler, metadataCache, - metricsEnabled, fsRemover); -} +FSRemover fsRemover = new FSRemover(conf, ReplChangeManager.getInstance(conf), metadataCache); +cleanupHandlers = TaskHandlerFactory.getInstance() +.getHandlers(conf, txnHandler, metadataCache, +metricsEnabled, fsRemover); Review Comment: could we move this to above line Issue Time Tracking --- Worklog Id: (was: 855760) Time Spent: 6.5h (was: 6h 20m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 6.5h > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855759=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855759 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 10:49 Start Date: 10/Apr/23 10:49 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161627935 ## itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorWithAbortCleanupUsingCompactionCycle.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hive.ql.txn.compactor; + +import org.apache.hadoop.hive.metastore.conf.MetastoreConf; +import org.apache.hadoop.hive.ql.txn.compactor.TestCompactor; +import org.junit.Before; + +public class TestCompactorWithAbortCleanupUsingCompactionCycle extends TestCompactor { Review Comment: i don't think that should be supported any more Issue Time Tracking --- Worklog Id: (was: 855759) Time Spent: 6h 20m (was: 6h 10m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 6h 20m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855758 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 10:48 Start Date: 10/Apr/23 10:48 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161627204 ## itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorBase.java: ## @@ -89,6 +89,7 @@ public void setup() throws Exception { hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTIMIZEMETADATAQUERIES, false); MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.COMPACTOR_INITIATOR_ON, true); MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.COMPACTOR_CLEANER_ON, true); +MetastoreConf.setBoolVar(hiveConf, MetastoreConf.ConfVars.COMPACTOR_CLEAN_ABORTS_USING_CLEANER, false); Review Comment: why do we need this config? let's keep it simple Issue Time Tracking --- Worklog Id: (was: 855758) Time Spent: 6h 10m (was: 6h) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 6h 10m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855757 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 10:46 Start Date: 10/Apr/23 10:46 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161625343 ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -3273,11 +3273,11 @@ public static enum ConfVars { HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD("hive.compactor.abortedtxn.threshold", 1000, "Number of aborted transactions involving a given table or partition that will trigger\n" + -"a major compaction."), +"a major compaction / cleanup of aborted directories."), Review Comment: would it actually trigger compaction? PS: we should deprecate. this config on HS2 side Issue Time Tracking --- Worklog Id: (was: 855757) Time Spent: 6h (was: 5h 50m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 6h > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855756 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 10:45 Start Date: 10/Apr/23 10:45 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161625472 ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -3273,11 +3273,11 @@ public static enum ConfVars { HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD("hive.compactor.abortedtxn.threshold", 1000, "Number of aborted transactions involving a given table or partition that will trigger\n" + -"a major compaction."), +"a major compaction / cleanup of aborted directories."), HIVE_COMPACTOR_ABORTEDTXN_TIME_THRESHOLD("hive.compactor.aborted.txn.time.threshold", "12h", new TimeValidator(TimeUnit.HOURS), -"Age of table/partition's oldest aborted transaction when compaction will be triggered. " + +"Age of table/partition's oldest aborted transaction when compaction / cleanup of aborted directories will be triggered. " + Review Comment: same as above Issue Time Tracking --- Worklog Id: (was: 855756) Time Spent: 5h 50m (was: 5h 40m) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 5h 50m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27020) Implement a separate handler to handle aborted transaction cleanup
[ https://issues.apache.org/jira/browse/HIVE-27020?focusedWorklogId=855755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855755 ] ASF GitHub Bot logged work on HIVE-27020: - Author: ASF GitHub Bot Created on: 10/Apr/23 10:45 Start Date: 10/Apr/23 10:45 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #4091: URL: https://github.com/apache/hive/pull/4091#discussion_r1161625343 ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -3273,11 +3273,11 @@ public static enum ConfVars { HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD("hive.compactor.abortedtxn.threshold", 1000, "Number of aborted transactions involving a given table or partition that will trigger\n" + -"a major compaction."), +"a major compaction / cleanup of aborted directories."), Review Comment: would it actually trigger compaction? Issue Time Tracking --- Worklog Id: (was: 855755) Time Spent: 5h 40m (was: 5.5h) > Implement a separate handler to handle aborted transaction cleanup > -- > > Key: HIVE-27020 > URL: https://issues.apache.org/jira/browse/HIVE-27020 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 5h 40m > Remaining Estimate: 0h > > As described in the parent task, once the cleaner is separated into different > entities, implement a separate handler which can create requests for aborted > transactions cleanup. This would move the aborted transaction cleanup > exclusively to the cleaner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855729=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855729 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 08:13 Start Date: 10/Apr/23 08:13 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1161530815 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + 
case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String statsPath = table.location() + STATS + snapshotId; +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0); + Map> collect = + Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first, + blobMetadataByteBufferPair -> SerializationUtils.deserialize( + ByteBuffers.toByteArray(blobMetadataByteBufferPair.second(); + + return collect.entrySet().stream().iterator().next().getValue().get(0).getStatsObj(); +} catch (IOException e) { + LOG.info(String.valueOf(e)); +} +break; + default: +// fall back to metastore +} +return null; + } + + + @Override + public boolean setColStatistics(org.apache.hadoop.hive.ql.metadata.Table table, + List colStats) { +TableDesc tableDesc = Utilities.getTableDesc(table); +Table tbl = Catalogs.loadTable(conf, tableDesc.getProperties()); +String snapshotId = tbl.name() + tbl.currentSnapshot().snapshotId(); +byte[] serializeColStats = SerializationUtils.serialize((Serializable) colStats); Review Comment: We are checking if the colStats is empty here. https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/ColStatsProcessor.java#L208 Issue Time Tracking --- Worklog Id: (was: 855729) Time Spent: 7h 20m (was: 7h 10m) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 7h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
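The Puffin branch above locates the stats file by concatenating the table location, a STATS constant, the table name, and the current snapshot ID. The constant's value is not visible in the quoted diff, so the "/stats/" segment below is an assumed placeholder; the point is only the shape of the path:

```java
public class PuffinStatsPath {

    // The quoted diff references a STATS constant whose value is not shown;
    // "/stats/" here is a hypothetical stand-in for illustration only.
    private static final String STATS = "/stats/";

    /** Builds the per-snapshot stats file location the quoted code reads and writes. */
    public static String statsPath(String tableLocation, String tableName, long snapshotId) {
        return tableLocation + STATS + tableName + snapshotId;
    }

    public static void main(String[] args) {
        // Note the table name and snapshot ID are joined with no separator,
        // exactly as in the quoted concatenation.
        System.out.println(statsPath("s3://bucket/warehouse/db/tbl", "db.tbl", 42L));
    }
}
```

Because the path embeds the snapshot ID, each snapshot gets its own stats file, which is what `canProvideColStatistics` probes for with `fs.exists(...)`.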
[jira] [Work logged] (HIVE-27208) Iceberg: Add support for rename table
[ https://issues.apache.org/jira/browse/HIVE-27208?focusedWorklogId=855728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855728 ]

ASF GitHub Bot logged work on HIVE-27208:
- Author: ASF GitHub Bot
Created on: 10/Apr/23 07:59
Start Date: 10/Apr/23 07:59
Worklog Time Spent: 10m

Work Description: sonarcloud[bot] commented on PR #4185:
URL: https://github.com/apache/hive/pull/4185#issuecomment-1501521654

Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 4 Code Smells, no coverage information, no duplication information.

Issue Time Tracking
---
Worklog Id: (was: 855728)
Time Spent: 3h 20m (was: 3h 10m)

> Iceberg: Add support for rename table
> -
>
> Key: HIVE-27208
> URL: https://issues.apache.org/jira/browse/HIVE-27208
> Project: Hive
> Issue Type: Improvement
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> Add support for renaming iceberg tables.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855727=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855727 ] ASF GitHub Bot logged work on HIVE-27158: - Author: ASF GitHub Bot Created on: 10/Apr/23 07:53 Start Date: 10/Apr/23 07:53 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #4131: URL: https://github.com/apache/hive/pull/4131#discussion_r1161519355 ## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) { return stats; } + + @Override + public boolean canSetColStatistics() { +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase(); +return statsSource.equals(PUFFIN); + } + + @Override + public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +if (table.currentSnapshot() != null) { + String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); + String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId(); + if (statsSource.equals(PUFFIN)) { +try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) { + if (fs.exists(new Path(statsPath))) { +return true; + } +} catch (IOException e) { + LOG.warn(e.getMessage()); +} + } +} +return false; + } + + @Override + public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) { + +org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl; +TableDesc tableDesc = Utilities.getTableDesc(hmsTable); +Table table = Catalogs.loadTable(conf, tableDesc.getProperties()); +String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase(); +switch (statsSource) { + 
case ICEBERG: +// Place holder for iceberg stats +break; + case PUFFIN: +String snapshotId = table.name() + table.currentSnapshot().snapshotId(); +String statsPath = table.location() + STATS + snapshotId; +LOG.info("Using stats from puffin file at:" + statsPath); +try (PuffinReader reader = Puffin.read(table.io().newInputFile(statsPath)).build()) { + BlobMetadata blobMetadata = reader.fileMetadata().blobs().get(0); + Map> collect = + Streams.stream(reader.readAll(ImmutableList.of(blobMetadata))).collect(Collectors.toMap(Pair::first, Review Comment: Fixed. Issue Time Tracking --- Worklog Id: (was: 855727) Time Spent: 7h 10m (was: 7h) > Store hive columns stats in puffin files for iceberg tables > --- > > Key: HIVE-27158 > URL: https://issues.apache.org/jira/browse/HIVE-27158 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri Govindappa >Assignee: Simhadri Govindappa >Priority: Major > Labels: pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855726=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855726 ]

ASF GitHub Bot logged work on HIVE-27158:

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:53
Start Date: 10/Apr/23 07:53
Worklog Time Spent: 10m

Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161519017

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) {
     return stats;
   }

+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {
+        try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+          if (fs.exists(new Path(statsPath))) {
+            return true;
+          }
+        } catch (IOException e) {
+          LOG.warn(e.getMessage());
+        }
+      }
+    }
+    return false;
+  }
+
+  @Override
+  public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;

Review Comment: fixed

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) {

+  @Override
+  public List getColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+    switch (statsSource) {
+      case ICEBERG:
+        // Place holder for iceberg stats
+        break;
+      case PUFFIN:
+        String snapshotId = table.name() + table.currentSnapshot().snapshotId();
+        String statsPath = table.location() + STATS + snapshotId;

Review Comment: done

Issue Time Tracking
---
Worklog Id: (was: 855726)
Time Spent: 7h (was: 6h 50m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 7h
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira (v8.20.10#820010)
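The hunk reviewed above gates column statistics on three conditions: the table has a current snapshot, the configured stats source is puffin, and a snapshot-scoped stats file exists under the table location. A minimal stand-alone sketch of that gate, using java.nio in place of Hadoop's FileSystem API (the class and helper names below are illustrative, not from the PR):

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Simplified, self-contained sketch of the existence check discussed in the
// review. It mirrors the PR's snapshot-scoped path scheme
// (table.location() + STATS + table.name() + snapshotId) but swaps Hadoop's
// FileSystem for java.nio, so this is a sketch, not the actual handler code.
public class PuffinStatsCheck {
  static final String STATS = "/stats/";

  // Mirrors the path concatenation in the reviewed hunk.
  static String statsPath(String tableLocation, String tableName, long snapshotId) {
    return tableLocation + STATS + tableName + snapshotId;
  }

  // Stats can be provided only if a current snapshot exists (snapshotId non-null),
  // the configured source is "puffin", and the stats file is present on disk.
  static boolean canProvideColStatistics(String tableLocation, String tableName,
                                         Long snapshotId, String statsSource) {
    if (snapshotId == null || !"puffin".equals(statsSource.toLowerCase())) {
      return false;
    }
    return Files.exists(Path.of(statsPath(tableLocation, tableName, snapshotId)));
  }
}
```

On a miss the check returns false, which in the PR's design would leave Hive on its metastore-backed stats path, per the metastore fallback mentioned later in this thread.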
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855725=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855725 ]

ASF GitHub Bot logged work on HIVE-27158:

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:52
Start Date: 10/Apr/23 07:52
Worklog Time Spent: 10m

Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161518779

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) {
     return stats;
   }

+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);
+  }
+
+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;
+    TableDesc tableDesc = Utilities.getTableDesc(hmsTable);
+    Table table = Catalogs.loadTable(conf, tableDesc.getProperties());
+    if (table.currentSnapshot() != null) {
+      String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_COL_STATS_SOURCE).toLowerCase();
+      String statsPath = table.location() + STATS + table.name() + table.currentSnapshot().snapshotId();
+      if (statsSource.equals(PUFFIN)) {

Review Comment: Fixed.

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) {

+      if (statsSource.equals(PUFFIN)) {
+        try (FileSystem fs = new Path(table.location()).getFileSystem(conf)) {
+          if (fs.exists(new Path(statsPath))) {

Review Comment: Done

Issue Time Tracking
---
Worklog Id: (was: 855725)
Time Spent: 6h 50m (was: 6h 40m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855724=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855724 ]

ASF GitHub Bot logged work on HIVE-27158:

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:52
Start Date: 10/Apr/23 07:52
Worklog Time Spent: 10m

Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161518466

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) {
     return stats;
   }

+  @Override
+  public boolean canSetColStatistics() {
+    String statsSource = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_USE_STATS_FROM).toLowerCase();
+    return statsSource.equals(PUFFIN);

Review Comment: Fixed. Added a default fall back to metastore.

## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
## @@ -349,6 +365,96 @@ public Map getBasicStatistics(Partish partish) {

+  @Override
+  public boolean canProvideColStatistics(org.apache.hadoop.hive.ql.metadata.Table tbl) {
+    org.apache.hadoop.hive.ql.metadata.Table hmsTable = tbl;

Review Comment: Fixed.

Issue Time Tracking
---
Worklog Id: (was: 855724)
Time Spent: 6h 40m (was: 6.5h)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-27158?focusedWorklogId=855723=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855723 ]

ASF GitHub Bot logged work on HIVE-27158:

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:51
Start Date: 10/Apr/23 07:51
Worklog Time Spent: 10m

Work Description: simhadri-g commented on code in PR #4131:
URL: https://github.com/apache/hive/pull/4131#discussion_r1161518101

## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
## @@ -2207,6 +2207,8 @@ public static enum ConfVars {
       "Whether to use codec pool in ORC. Disable if there are bugs with codec reuse."),
     HIVE_USE_STATS_FROM("hive.use.stats.from", "iceberg", "Use stats from iceberg table snapshot for query " +
         "planning. This has three values metastore, puffin and iceberg"),
+    HIVE_COL_STATS_SOURCE("hive.col.stats.source", "metastore", "Use stats from puffin file for query " +

Review Comment: Fixed, merged the confs to a single conf.

Issue Time Tracking
---
Worklog Id: (was: 855723)
Time Spent: 6.5h (was: 6h 20m)

> Store hive columns stats in puffin files for iceberg tables
> ---
>
> Key: HIVE-27158
> URL: https://issues.apache.org/jira/browse/HIVE-27158
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri Govindappa
> Assignee: Simhadri Govindappa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira (v8.20.10#820010)
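The resolution in this worklog folds the two overlapping settings into one stats-source conf. A minimal sketch, with hypothetical names, of normalizing a single stats-source value with a fall back to the metastore default (the three accepted values come from the conf description quoted above):

```java
import java.util.Set;

// Hypothetical sketch of a single merged stats-source setting: one conf value,
// normalized case-insensitively and validated against the three documented
// sources, falling back to "metastore" when the value is unrecognized.
// This is an illustration of the merged-conf idea, not the PR's actual code.
public class StatsSourceConf {
  static final Set<String> ALLOWED = Set.of("metastore", "puffin", "iceberg");

  static String resolveStatsSource(String raw) {
    String source = raw == null ? "" : raw.trim().toLowerCase();
    return ALLOWED.contains(source) ? source : "metastore";
  }
}
```

With one key there is a single place to validate, and callers such as the storage handler only ever see one of the three canonical values.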
[jira] [Work logged] (HIVE-27077) upgrade hive grammar to Antlr4
[ https://issues.apache.org/jira/browse/HIVE-27077?focusedWorklogId=855720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855720 ]

ASF GitHub Bot logged work on HIVE-27077:

Author: ASF GitHub Bot
Created on: 10/Apr/23 07:12
Start Date: 10/Apr/23 07:12
Worklog Time Spent: 10m

Work Description: zhangbutao commented on PR #4058:
URL: https://github.com/apache/hive/pull/4058#issuecomment-1501487075

Antlr3 lost support long ago, so it would be great to upgrade to Antlr4; the Antlr4 grammar is simpler and cleaner than Antlr3's. But I think this will not be easy to do: there are likely some incompatibilities that need to be fixed. Please see this ticket: https://issues.apache.org/jira/browse/HIVE-23177 @mlorek Could you please give more feedback? Thanks.

Issue Time Tracking
---
Worklog Id: (was: 855720)
Time Spent: 2h (was: 1h 50m)

> upgrade hive grammar to Antlr4
> --
>
> Key: HIVE-27077
> URL: https://issues.apache.org/jira/browse/HIVE-27077
> Project: Hive
> Issue Type: Improvement
> Components: Parser
> Reporter: Michal Lorek
> Assignee: Michal Lorek
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> Introducing a new module, parser-v4, that hosts the Hive grammar defined using Antlr4.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work logged] (HIVE-27209) Backport HIVE-24569: LLAP daemon leaks file descriptors/log4j appenders
[ https://issues.apache.org/jira/browse/HIVE-27209?focusedWorklogId=855697=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-855697 ]

ASF GitHub Bot logged work on HIVE-27209:

Author: ASF GitHub Bot
Created on: 10/Apr/23 06:01
Start Date: 10/Apr/23 06:01
Worklog Time Spent: 10m

Work Description: guptanikhil007 commented on PR #4193:
URL: https://github.com/apache/hive/pull/4193#issuecomment-1501431789

@sankarh This is ported from the original branch-3.1 cherry-pick, not from master.

Issue Time Tracking
---
Worklog Id: (was: 855697)
Time Spent: 1h (was: 50m)

> Backport HIVE-24569: LLAP daemon leaks file descriptors/log4j appenders
> ---
>
> Key: HIVE-27209
> URL: https://issues.apache.org/jira/browse/HIVE-27209
> Project: Hive
> Issue Type: Sub-task
> Components: llap
> Affects Versions: 2.2.0
> Reporter: Nikhil Gupta
> Assignee: Nikhil Gupta
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.2.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira (v8.20.10#820010)