[jira] [Work started] (HIVE-24609) Fix ArrayIndexOutOfBoundsException when execute full outer join
[ https://issues.apache.org/jira/browse/HIVE-24609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24609 started by jufeng li. > Fix ArrayIndexOutOfBoundsException when execute full outer join > --- > > Key: HIVE-24609 > URL: https://issues.apache.org/jira/browse/HIVE-24609 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 3.1.0 >Reporter: jufeng li >Assignee: jufeng li >Priority: Blocker > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Here is my Hive SQL: > {code:java} > select > .. > from A > full outer join B on A.id = B.id > {code} > > It cannot be executed; I get an ArrayIndexOutOfBoundsException. Debugging > HiveServer2, I found that in some situations compiling SQL that contains a full outer > join throws an ArrayIndexOutOfBoundsException. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24609) Fix ArrayIndexOutOfBoundsException when execute full outer join
[ https://issues.apache.org/jira/browse/HIVE-24609?focusedWorklogId=533313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533313 ] ASF GitHub Bot logged work on HIVE-24609: - Author: ASF GitHub Bot Created on: 09/Jan/21 01:39 Start Date: 09/Jan/21 01:39 Worklog Time Spent: 10m Work Description: lijufeng2016 opened a new pull request #1844: URL: https://github.com/apache/hive/pull/1844 What changes were proposed in this pull request? Add a bounds check ("judgement") for the position. Why are the changes needed? To avoid an ArrayIndexOutOfBoundsException. Does this PR introduce any user-facing change? No. How was this patch tested? It's OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533313) Remaining Estimate: 0h Time Spent: 10m
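The PR description above says only that it adds "a judgement for position". As a hedged illustration of that general technique — guarding a positional array access with a bounds check so an ArrayIndexOutOfBoundsException cannot escape — here is a minimal Java sketch. The class and method names are invented for this example and are not Hive's actual join code:

```java
/** Illustrative only: guard a positional array lookup with a bounds check. */
public class PositionGuard {

    /**
     * Returns the element at {@code pos}, or {@code fallback} when the position
     * is outside the array, instead of throwing ArrayIndexOutOfBoundsException.
     */
    public static int getOrDefault(int[] values, int pos, int fallback) {
        if (values == null || pos < 0 || pos >= values.length) {
            return fallback;   // the "judgement for position": refuse bad indexes
        }
        return values[pos];
    }

    public static void main(String[] args) {
        int[] joinKeys = {10, 20, 30};
        System.out.println(getOrDefault(joinKeys, 1, -1));  // in range: 20
        System.out.println(getOrDefault(joinKeys, 7, -1));  // out of range: -1, no exception
    }
}
```

Whether Hive's actual fix returns a default, skips the row, or raises a clearer error is not stated in the PR summary; this only shows the shape of a defensive position check.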
[jira] [Updated] (HIVE-24609) Fix ArrayIndexOutOfBoundsException when execute full outer join
[ https://issues.apache.org/jira/browse/HIVE-24609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24609: -- Labels: pull-request-available (was: )
[jira] [Updated] (HIVE-24609) Fix ArrayIndexOutOfBoundsException when execute full outer join
[ https://issues.apache.org/jira/browse/HIVE-24609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jufeng li updated HIVE-24609: - Target Version/s: 4.0.0
[jira] [Updated] (HIVE-24609) Fix ArrayIndexOutOfBoundsException when execute full outer join
[ https://issues.apache.org/jira/browse/HIVE-24609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jufeng li updated HIVE-24609: - Description: here is my hive-sql: {code:java} select .. from A full outer join B on A.id = B.id {code} It can not be execute,I got an ArrayIndexOutOfBoundsException.Then I debug HiveServer2,found when compile sql in some situation and contains full outer join,there is an ArrayIndexOutOfBoundsException. was: here is my hive-sql: ```sql select .. from A full outer join B on A.id = B.id ``` It can not be execute,I got an ArrayIndexOutOfBoundsException.Then I debug HiveServer2,found when compile sql in some situation and contains full outer join,there is an
[jira] [Updated] (HIVE-24609) Fix ArrayIndexOutOfBoundsException when execute full outer join
[ https://issues.apache.org/jira/browse/HIVE-24609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jufeng li updated HIVE-24609: - Description: here is my hive-sql: ```sql select .. from A full outer join B on A.id = B.id ``` It can not be execute,I got an ArrayIndexOutOfBoundsException.Then I debug HiveServer2,found when compile sql in some situation and contains full outer join,there is an was: here is my hive-sql: ```sql select .. from A full outer join B on A.id = B.id ``` It can not be execute,I got
[jira] [Work logged] (HIVE-24109) Load partitions in batches for managed tables in the bootstrap phase
[ https://issues.apache.org/jira/browse/HIVE-24109?focusedWorklogId=533301=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533301 ] ASF GitHub Bot logged work on HIVE-24109: - Author: ASF GitHub Bot Created on: 09/Jan/21 01:13 Start Date: 09/Jan/21 01:13 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1529: URL: https://github.com/apache/hive/pull/1529 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533301) Time Spent: 1h 50m (was: 1h 40m) > Load partitions in batches for managed tables in the bootstrap phase > > > Key: HIVE-24109 > URL: https://issues.apache.org/jira/browse/HIVE-24109 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24109.01.patch, HIVE-24109.02.patch, > HIVE-24109.03.patch, HIVE-24109.04.patch, Replication Performance > Improvements.pdf > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
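HIVE-24109's goal above — loading partitions in batches rather than all at once during the bootstrap phase — reduces to chunking a large list. A minimal generic sketch of that chunking step, with illustrative names rather than the actual Hive replication code:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: split a large partition list into fixed-size batches. */
public class BatchLoader {

    /** Returns views of {@code items} in consecutive batches of at most {@code batchSize}. */
    public static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // subList is a view; copy it if the source list may change underneath
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> partitions = new ArrayList<>();
        for (int i = 0; i < 10; i++) partitions.add(i);
        System.out.println(toBatches(partitions, 4).size()); // 3 batches: 4 + 4 + 2
    }
}
```

Each batch would then be loaded (and committed) independently, bounding memory and metastore-call size per step.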
[jira] [Work logged] (HIVE-16352) Ability to skip or repair out of sync blocks with HIVE at runtime
[ https://issues.apache.org/jira/browse/HIVE-16352?focusedWorklogId=533302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533302 ] ASF GitHub Bot logged work on HIVE-16352: - Author: ASF GitHub Bot Created on: 09/Jan/21 01:13 Start Date: 09/Jan/21 01:13 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1436: URL: https://github.com/apache/hive/pull/1436#issuecomment-757069854 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533302) Time Spent: 1h 20m (was: 1h 10m) > Ability to skip or repair out of sync blocks with HIVE at runtime > - > > Key: HIVE-16352 > URL: https://issues.apache.org/jira/browse/HIVE-16352 > Project: Hive > Issue Type: New Feature > Components: Avro, File Formats, Reader >Affects Versions: 3.1.2 >Reporter: Navdeep Poonia >Assignee: gabrywu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > When a file is corrupted, Hive raises the error java.io.IOException: Invalid > sync!. > Can we have some functionality to skip or repair such blocks at runtime, to > make Avro more resilient to data corruption? > Error: java.io.IOException: java.io.IOException: java.io.IOException: While > processing file > s3n:///navdeepp/warehouse/avro_test/354dc34474404f4bbc0d8013fc8e6e4b_42. > java.io.IOException: Invalid sync!
> at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:334) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24273) grouping key is case sensitive
[ https://issues.apache.org/jira/browse/HIVE-24273?focusedWorklogId=533300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533300 ] ASF GitHub Bot logged work on HIVE-24273: - Author: ASF GitHub Bot Created on: 09/Jan/21 01:13 Start Date: 09/Jan/21 01:13 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1579: URL: https://github.com/apache/hive/pull/1579#issuecomment-757069838 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533300) Time Spent: 0.5h (was: 20m) > grouping key is case sensitive > --- > > Key: HIVE-24273 > URL: https://issues.apache.org/jira/browse/HIVE-24273 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.0, 4.0.0 >Reporter: zhaolong >Assignee: zhaolong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: 0001-fix-HIVE-24273-grouping-key-is-case-sensitive.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > The grouping key is case sensitive; the following steps reproduce it: > 1. create table testaa(name string, age int); > 2. select GROUPING(name) from testaa group by name; -- This message was sent by Atlassian Jira (v8.3.4#803005)
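The HIVE-24273 report above suggests the group-by key is matched against the GROUPING() argument case-sensitively. As a hedged sketch of the general remedy — resolving column names case-insensitively, since Hive identifiers are case-insensitive — here is a minimal example with invented names, not Hive's actual planner code:

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: look up a group-by key ignoring identifier case. */
public class GroupingKeyResolver {

    /**
     * Returns the index of {@code column} among {@code groupByKeys},
     * comparing case-insensitively, or -1 if it is absent.
     */
    public static int indexOf(List<String> groupByKeys, String column) {
        for (int i = 0; i < groupByKeys.size(); i++) {
            // equalsIgnoreCase instead of equals: "Name" and "name" are the same identifier
            if (groupByKeys.get(i).equalsIgnoreCase(column)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("name", "age");
        System.out.println(indexOf(keys, "NAME")); // 0: matches despite different case
    }
}
```

A case-sensitive `equals` here would return -1 for "NAME", which is the kind of failed lookup the bug report describes.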
[jira] [Work logged] (HIVE-24212) Refactor to take advantage of list* optimisations in cloud storage connectors
[ https://issues.apache.org/jira/browse/HIVE-24212?focusedWorklogId=533299=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533299 ] ASF GitHub Bot logged work on HIVE-24212: - Author: ASF GitHub Bot Created on: 09/Jan/21 01:13 Start Date: 09/Jan/21 01:13 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1538: URL: https://github.com/apache/hive/pull/1538#issuecomment-757069843 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533299) Time Spent: 20m (was: 10m) > Refactor to take advantage of list* optimisations in cloud storage connectors > - > > Key: HIVE-24212 > URL: https://issues.apache.org/jira/browse/HIVE-24212 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > https://issues.apache.org/jira/browse/HADOOP-17022, > https://issues.apache.org/jira/browse/HADOOP-17281, > https://issues.apache.org/jira/browse/HADOOP-16830 etc help in reducing > number of roundtrips to remote systems in cloud storage. > Creating this ticket to do minor refactoring to take advantage of the above > optimizations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24327) AtlasServer entity may not be present during first Atlas metadata dump
[ https://issues.apache.org/jira/browse/HIVE-24327?focusedWorklogId=533297=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533297 ] ASF GitHub Bot logged work on HIVE-24327: - Author: ASF GitHub Bot Created on: 09/Jan/21 01:13 Start Date: 09/Jan/21 01:13 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1623: URL: https://github.com/apache/hive/pull/1623#issuecomment-757069832 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533297) Time Spent: 50m (was: 40m) > AtlasServer entity may not be present during first Atlas metadata dump > -- > > Key: HIVE-24327 > URL: https://issues.apache.org/jira/browse/HIVE-24327 > Project: Hive > Issue Type: Bug >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24327.01.patch, HIVE-24327.02.patch > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24330) Automate setting permissions on cmRoot directories.
[ https://issues.apache.org/jira/browse/HIVE-24330?focusedWorklogId=533298=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533298 ] ASF GitHub Bot logged work on HIVE-24330: - Author: ASF GitHub Bot Created on: 09/Jan/21 01:13 Start Date: 09/Jan/21 01:13 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1630: URL: https://github.com/apache/hive/pull/1630#issuecomment-757069827 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533298) Time Spent: 1h (was: 50m) > Automate setting permissions on cmRoot directories. > --- > > Key: HIVE-24330 > URL: https://issues.apache.org/jira/browse/HIVE-24330 > Project: Hive > Issue Type: Bug >Reporter: Arko Sharma >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24330.01.patch, HIVE-24330.02.patch, > HIVE-24330.03.patch, HIVE-24330.04.patch, HIVE-24330.05.patch, > HIVE-24330.06.patch > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24609) Fix ArrayIndexOutOfBoundsException when execute full outer join
[ https://issues.apache.org/jira/browse/HIVE-24609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jufeng li updated HIVE-24609: - Description: here is my hive-sql: ```sql select .. from A full outer join B on A.id = B.id ``` It can not be execute,I got
[jira] [Assigned] (HIVE-24609) Fix ArrayIndexOutOfBoundsException when execute full outer join
[ https://issues.apache.org/jira/browse/HIVE-24609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jufeng li reassigned HIVE-24609:
[jira] [Work logged] (HIVE-24559) Fix some spelling issues
[ https://issues.apache.org/jira/browse/HIVE-24559?focusedWorklogId=533280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533280 ] ASF GitHub Bot logged work on HIVE-24559: - Author: ASF GitHub Bot Created on: 08/Jan/21 23:27 Start Date: 08/Jan/21 23:27 Worklog Time Spent: 10m Work Description: sunchao merged pull request #1818: URL: https://github.com/apache/hive/pull/1818 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533280) Time Spent: 1h 40m (was: 1.5h) > Fix some spelling issues > > > Key: HIVE-24559 > URL: https://issues.apache.org/jira/browse/HIVE-24559 > Project: Hive > Issue Type: Improvement >Reporter: RickyMa >Assignee: RickyMa >Priority: Trivial > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > There are some minor typos: > [https://github.com/apache/hive/pull/1805/files] and > [https://github.com/apache/hive/blob/branch-2.3/metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L858] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24559) Fix some spelling issues
[ https://issues.apache.org/jira/browse/HIVE-24559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HIVE-24559. - Fix Version/s: 4.0.0 2.3.8 Resolution: Fixed
[jira] [Assigned] (HIVE-24559) Fix some spelling issues
[ https://issues.apache.org/jira/browse/HIVE-24559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned HIVE-24559: --- Assignee: RickyMa
[jira] [Updated] (HIVE-24608) Switch back to get_table in HMS client for Hive 2.3.x
[ https://issues.apache.org/jira/browse/HIVE-24608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-24608: Summary: Switch back to get_table in HMS client for Hive 2.3.x (was: Switch back to get_table in HMS client) > Switch back to get_table in HMS client for Hive 2.3.x > - > > Key: HIVE-24608 > URL: https://issues.apache.org/jira/browse/HIVE-24608 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.7 >Reporter: Chao Sun >Priority: Major > > HIVE-15062 introduced a backward-incompatible change by replacing > {{get_table}} with {{get_table_req}}. As a consequence, when an HMS client with > version > 2.3 talks to an HMS with version < 2.3, it gets an error similar to > the following: > {code} > AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable > to fetch table testpartitiondata. Invalid method name: 'get_table_req'; > {code} > Looking at HIVE-15062, {{get_table_req}} was introduced to add a client-side > check for capabilities. However, in branch-2.3 the check is a no-op, since > there are no capabilities yet (the field is assigned null). Therefore, this JIRA > proposes switching back to {{get_table}} in branch-2.3 to fix the > compatibility issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
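HIVE-24608 above proposes simply reverting branch-2.3 to {{get_table}}. For illustration only, a generic client-side alternative for this kind of Thrift incompatibility is to fall back to the legacy method when the server rejects an unknown one. The sketch below uses plain Suppliers and a stand-in exception rather than the real Thrift client; it is not what the JIRA implements:

```java
import java.util.function.Supplier;

/** Illustrative only: fall back to an older RPC when the newer one is unknown. */
public class CompatLookup {

    /** Stand-in for the server-side "Invalid method name: '...'" rejection. */
    public static class UnknownMethodException extends RuntimeException {
        public UnknownMethodException(String method) {
            super("Invalid method name: '" + method + "'");
        }
    }

    /** Tries the new-style call first; on an unknown-method error, retries the legacy call. */
    public static <T> T callWithFallback(Supplier<T> newCall, Supplier<T> oldCall) {
        try {
            return newCall.get();
        } catch (UnknownMethodException e) {
            return oldCall.get();   // older server: retry with the legacy method
        }
    }

    public static void main(String[] args) {
        String table = callWithFallback(
            () -> { throw new UnknownMethodException("get_table_req"); }, // old HMS rejects
            () -> "testpartitiondata");                                    // legacy path works
        System.out.println(table); // testpartitiondata
    }
}
```

A real implementation would have to map the Thrift-level error to this decision; the JIRA avoids that complexity entirely by reverting the call in branch-2.3.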
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.0
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=533225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533225 ] ASF GitHub Bot logged work on HIVE-24484: - Author: ASF GitHub Bot Created on: 08/Jan/21 21:09 Start Date: 08/Jan/21 21:09 Worklog Time Spent: 10m Work Description: vyaslav commented on pull request #1742: URL: https://github.com/apache/hive/pull/1742#issuecomment-756999283 > @wangyum Stuck. > > There are two big issues here: > > 1. Hive integration tests fire up Druid, Kafka, HDFS, LLAP, etc. all in the same JVM and their 3rd party dependencies are all over the place. Using a higher version of a dependency breaks one product, but using a lower version breaks the other. To make this work well, there probably needs to be a way to launch each service in their own JVM class loader. In lieu of that, I've been trying to move the ball closer to the goal post and getting dependencies closer together. > > [apache/druid#10683](https://github.com/apache/druid/pull/10683) > [HIVE-24542](https://issues.apache.org/jira/browse/HIVE-24542) > > 1. In HDFS 3.3.0, Hadoop team introduced `ProtobufRpcEngine2` in addition to `ProtobufRpcEngine` (sigh). Some of the Hive LLAP stuff is using this Hadoop Protobuf RPC engine (`ProtobufRpcEngine`). There's some `static` logic in the protocol engines that prohibits loading both RPC engines into the same JVM at the same time, I'm not sure why. HDFS was migrated to `ProtobufRpcEngine2`. So, again, in the integration tests, when the HDFS mini cluster is loaded, version 2 of the RPC engine is loaded into the JVM. When LLAP is later loaded, it fails to start because version 1 cannot be registered at the same time. Regarding the 1st, I faced the same issues in my PR for the upgrade to 3.1.3 - https://github.com/apache/hive/pull/1638 But regarding the 2nd, I'm curious whether it would be hard to replace `ProtobufRpcEngine` with `ProtobufRpcEngine2` in Hive itself.
As I understand they have upgraded from PB2 to PB3 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533225) Time Spent: 1h (was: 50m) > Upgrade Hadoop to 3.3.0 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-24509) Move show specific codes under DDL and cut MetaDataFormatter classes to pieces
[ https://issues.apache.org/jira/browse/HIVE-24509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24509 started by Miklos Gergely. - > Move show specific codes under DDL and cut MetaDataFormatter classes to pieces > -- > > Key: HIVE-24509 > URL: https://issues.apache.org/jira/browse/HIVE-24509 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 8h 20m > Remaining Estimate: 0h > > A lot of show-specific code lives under the > org.apache.hadoop.hive.ql.metadata.formatting package and is used only by > these commands. Also, the two MetaDataFormatters (JsonMetaDataFormatter, > TextMetaDataFormatter) try to do everything, while containing a lot > of code duplication. Their functionality should be moved under the > directories of the appropriate show commands. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24509) Move show specific codes under DDL and cut MetaDataFormatter classes to pieces
[ https://issues.apache.org/jira/browse/HIVE-24509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely resolved HIVE-24509. --- Resolution: Fixed Pushed to master, thank you [~belugabehr]!
[jira] [Work logged] (HIVE-24509) Move show specific codes under DDL and cut MetaDataFormatter classes to pieces
[ https://issues.apache.org/jira/browse/HIVE-24509?focusedWorklogId=533217&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533217 ] ASF GitHub Bot logged work on HIVE-24509: - Author: ASF GitHub Bot Created on: 08/Jan/21 20:52 Start Date: 08/Jan/21 20:52 Worklog Time Spent: 10m Work Description: miklosgergely merged pull request #1756: URL: https://github.com/apache/hive/pull/1756 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533217) Time Spent: 8h 20m (was: 8h 10m)
[jira] [Commented] (HIVE-24608) Switch back to get_table in HMS client
[ https://issues.apache.org/jira/browse/HIVE-24608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261527#comment-17261527 ] Chao Sun commented on HIVE-24608: - cc [~sershe], [~thejas]
[jira] [Work started] (HIVE-24603) ALTER TABLE RENAME is not modifying the location of managed table
[ https://issues.apache.org/jira/browse/HIVE-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24603 started by Sai Hemanth Gantasala. > ALTER TABLE RENAME is not modifying the location of managed table > - > > Key: HIVE-24603 > URL: https://issues.apache.org/jira/browse/HIVE-24603 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The location of the managed table is not changing when the table is renamed. > This causes correctness issues as well, like the following - > create table abc (id int); > insert into abc values (1); > alter table abc rename to def; > create table abc (id int); // This should be empty > insert into abc values (2); > select * from abc; // now returns 1 and 2 (i.e., the old results as well) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24607) Add JUnit annotation for running tests only if ports are available
[ https://issues.apache.org/jira/browse/HIVE-24607?focusedWorklogId=533157=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533157 ] ASF GitHub Bot logged work on HIVE-24607: - Author: ASF GitHub Bot Created on: 08/Jan/21 18:10 Start Date: 08/Jan/21 18:10 Worklog Time Spent: 10m Work Description: miklosgergely commented on pull request #1843: URL: https://github.com/apache/hive/pull/1843#issuecomment-756914601 How are you planning to use this annotation? I mean are there going to be tests which will be skipped if the ports are not available? That would mean that patches that are breaking some tests may be merged in the event the test that is broken is just skipped due to the used port! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533157) Time Spent: 20m (was: 10m) > Add JUnit annotation for running tests only if ports are available > -- > > Key: HIVE-24607 > URL: https://issues.apache.org/jira/browse/HIVE-24607 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Some unit tests tend to rely on some specific ports assuming that they are > available. Moreover, in some cases it is necessary to create explicitly a > socket bound to some specific port. > The goal of this Jira is to add a JUnit annotation that will run a test only > if the requested ports are available (skip it otherwise). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24607) Add JUnit annotation for running tests only if ports are available
[ https://issues.apache.org/jira/browse/HIVE-24607?focusedWorklogId=533151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533151 ] ASF GitHub Bot logged work on HIVE-24607: - Author: ASF GitHub Bot Created on: 08/Jan/21 17:51 Start Date: 08/Jan/21 17:51 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request #1843: URL: https://github.com/apache/hive/pull/1843 ### What changes were proposed in this pull request? New JUnit annotation for running/skipping tests when ports are available/taken ### Why are the changes needed? Avoid unexpected failures in tests when ports required in tests are taken. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `mvn test -pl testutils -Dtest=TestEnabledIfPortsAvailableCondition` [WARNING] Tests run: 3, Failures: 0, Errors: 0, Skipped: 1 ``` nc -l 2001 & mvn test -pl testutils -Dtest=TestEnabledIfPortsAvailableCondition ``` [WARNING] Tests run: 3, Failures: 0, Errors: 0, Skipped: 2 ``` nc -l 5050 & mvn test -pl testutils -Dtest=TestEnabledIfPortsAvailableCondition ``` [WARNING] Tests run: 3, Failures: 0, Errors: 0, Skipped: 3 ``` nc -l 2000 & mvn test -pl testutils -Dtest=TestEnabledIfPortsAvailableCondition ``` [WARNING] Tests run: 3, Failures: 0, Errors: 0, Skipped: 3 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533151) Remaining Estimate: 0h Time Spent: 10m > Add JUnit annotation for running tests only if ports are available > -- > > Key: HIVE-24607 > URL: https://issues.apache.org/jira/browse/HIVE-24607 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Some unit tests tend to rely on some specific ports assuming that they are > available. Moreover, in some cases it is necessary to create explicitly a > socket bound to some specific port. > The goal of this Jira is to add a JUnit annotation that will run a test only > if the requested ports are available (skip it otherwise). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
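The PR above adds a JUnit annotation that runs a test only when the ports it needs are free, and the manual verification occupies ports with `nc -l` to show tests being skipped. The annotation and condition class names are not shown in this excerpt, so the following is only a sketch, under the assumption that the condition probes availability by trying to bind a server socket to the port (the `PortProbe` name is hypothetical, not from the PR):

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Sketch of the probe such a JUnit condition would likely perform: a port is
// "available" if we can bind a listening socket to it on the loopback address.
// PortProbe is an illustrative name, not a class from the Hive PR.
public class PortProbe {
    public static boolean isPortAvailable(int port) {
        try (ServerSocket socket = new ServerSocket(port, 1, InetAddress.getLoopbackAddress())) {
            return true;  // bind succeeded, so nothing is listening on this port
        } catch (IOException e) {
            return false; // bind failed: the port is taken (or binding is not permitted)
        }
    }

    public static void main(String[] args) throws IOException {
        // Occupy an ephemeral port, then verify the probe reports it as taken,
        // mirroring what `nc -l <port>` does in the PR's manual test runs.
        try (ServerSocket taken = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            System.out.println("available=" + isPortAvailable(taken.getLocalPort())); // false
        }
    }
}
```

A JUnit 5 `ExecutionCondition` built on such a probe would return a "disabled" result when any requested port is taken, which is exactly the skip behavior questioned in the review comment above.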
[jira] [Updated] (HIVE-24607) Add JUnit annotation for running tests only if ports are available
[ https://issues.apache.org/jira/browse/HIVE-24607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24607: -- Labels: pull-request-available (was: ) > Add JUnit annotation for running tests only if ports are available > -- > > Key: HIVE-24607 > URL: https://issues.apache.org/jira/browse/HIVE-24607 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Some unit tests tend to rely on some specific ports assuming that they are > available. Moreover, in some cases it is necessary to create explicitly a > socket bound to some specific port. > The goal of this Jira is to add a JUnit annotation that will run a test only > if the requested ports are available (skip it otherwise). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24607) Add JUnit annotation for running tests only if ports are available
[ https://issues.apache.org/jira/browse/HIVE-24607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-24607: -- > Add JUnit annotation for running tests only if ports are available > -- > > Key: HIVE-24607 > URL: https://issues.apache.org/jira/browse/HIVE-24607 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > Some unit tests tend to rely on some specific ports assuming that they are > available. Moreover, in some cases it is necessary to create explicitly a > socket bound to some specific port. > The goal of this Jira is to add a JUnit annotation that will run a test only > if the requested ports are available (skip it otherwise). > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24588) Run tests using specific log4j2 configuration conveniently
[ https://issues.apache.org/jira/browse/HIVE-24588?focusedWorklogId=533090=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533090 ] ASF GitHub Bot logged work on HIVE-24588: - Author: ASF GitHub Bot Created on: 08/Jan/21 15:57 Start Date: 08/Jan/21 15:57 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request #1842: URL: https://github.com/apache/hive/pull/1842 ### What changes were proposed in this pull request? Add new Junit Jupiter annotation/extension. ### Why are the changes needed? Run easily unit tests using a specific log4j configuration with minimal side-effects. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? `mvn test -pl testutils -Dtest=TestLog4jConfigExtension` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533090) Remaining Estimate: 0h Time Spent: 10m > Run tests using specific log4j2 configuration conveniently > -- > > Key: HIVE-24588 > URL: https://issues.apache.org/jira/browse/HIVE-24588 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In order to reproduce a problem (e.g., HIVE-24569) or validate that a log4j2 > configuration is working as expected it is necessary to run a test and > explicitly specify which configuration should be used. Moreover, after the > end of the test in question it is desirable to restore the old logging > configuration that was used before launching the test to avoid affecting the > overall logging output. > The goal of this issue is to introduce a convenient & declarative way of > running tests with log4j2 configurations based on Jupiter extensions and > annotations. 
The test could look like the example below: > {code:java} > @Test > @Log4jConfig("test-log4j2.properties") > void testUseExplicitConfig() { > // Do something and assert > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24588) Run tests using specific log4j2 configuration conveniently
[ https://issues.apache.org/jira/browse/HIVE-24588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24588: -- Labels: pull-request-available (was: ) > Run tests using specific log4j2 configuration conveniently > -- > > Key: HIVE-24588 > URL: https://issues.apache.org/jira/browse/HIVE-24588 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In order to reproduce a problem (e.g., HIVE-24569) or validate that a log4j2 > configuration is working as expected it is necessary to run a test and > explicitly specify which configuration should be used. Moreover, after the > end of the test in question it is desirable to restore the old logging > configuration that was used before launching the test to avoid affecting the > overall logging output. > The goal of this issue is to introduce a convenient & declarative way of > running tests with log4j2 configurations based on Jupiter extensions and > annotations. The test could look like the example below: > {code:java} > @Test > @Log4jConfig("test-log4j2.properties") > void testUseExplicitConfig() { > // Do something and assert > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-22318) Java.io.exception:Two readers for
[ https://issues.apache.org/jira/browse/HIVE-22318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261355#comment-17261355 ] Shubham Sharma edited comment on HIVE-22318 at 1/8/21, 3:26 PM: [~kecheung] Faced this issue today with one of our user, here is the workaround to fix this issue: * Connect with beeline * Run below property in session: {code:java} set hive.fetch.task.conversion=none{code} * Now you'll be able to run select statements over the mentioned table. * Run below statement to create a backup for the table {code:java} create table as select * from ;{code} * Once you have the backup ready, logout from session and check the backup without setting any property (check count and table consistency from data quality perspective) {code:java} select * from ;{code} * Now you can drop problem table and replace with backup table {code:java} drop table ; alter table rename to ;{code} *Note:* To avoid this issue in future, create the table with a bucketing column in DDL was (Author: shubh_init): [~kecheung] Faced this issue today with one of our user, here is the workaround to fix this issue: * Connect with beeline * Run below property in session: {code:java} set hive.fetch.task.conversion=none{code} * Now you'll be able to run select statements over the mentioned table. 
* Run below statement to create a backup for the table {code:java} create table as select * from ;{code} * Once you have the backup ready, logout from session and check the backup without setting any property {code:java} select * from ;{code} * Now you can drop problem table and replace with backup table {code:java} drop table ; alter table rename to ;{code} *Note:* To avoid this issue in future, create the table with a bucketing column in DDL > Java.io.exception:Two readers for > - > > Key: HIVE-22318 > URL: https://issues.apache.org/jira/browse/HIVE-22318 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2 >Affects Versions: 3.1.0 >Reporter: max_c >Priority: Major > Attachments: hiveserver2 for exception.log > > > I create a ACID table with ORC format: > > {noformat} > CREATE TABLE `some.TableA`( > >) > ROW FORMAT SERDE >'org.apache.hadoop.hive.ql.io.orc.OrcSerde' > STORED AS INPUTFORMAT >'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > OUTPUTFORMAT >'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' > TBLPROPERTIES ( >'bucketing_version'='2', >'orc.compress'='SNAPPY', >'transactional'='true', >'transactional_properties'='default'){noformat} > After executing merge into operation: > {noformat} > MERGE INTO some.TableA AS a USING (SELECT vend_no FROM some.TableB UNION ALL > SELECT vend_no FROM some.TableC) AS b ON a.vend_no=b.vend_no WHEN MATCHED > THEN DELETE > {noformat} > the problem happend(when selecting the TableA, the exception happens too): > {noformat} > java.io.IOException: java.io.IOException: Two readers for {originalWriteId: > 4, bucket: 536870912(1.0.0), row: 2434, currentWriteId 25}: new > [key={originalWriteId: 4, bucket: 536870912(1.0.0), row: 2434, currentWriteId > 25}, nextRecord={2, 4, 536870912, 2434, 25, null}, reader=Hive ORC > Reader(hdfs://hdpprod/warehouse/tablespace/managed/hive/some.db/tableA/delete_delta_015_026/bucket_1, > 9223372036854775807)], old [key={originalWriteId: 4, bucket: > 536870912(1.0.0), row: 2434, 
currentWriteId 25}, nextRecord={2, 4, 536870912, > 2434, 25, null}, reader=Hive ORC > Reader(hdfs://hdpprod/warehouse/tablespace/managed/hive/some.db/tableA/delete_delta_015_026/bucket_0{noformat} > Through orc_tools I scanned all the > files (bucket_0, bucket_1, bucket_2) under delete_delta and found that all > rows of the files are the same. I think this will cause the same > key (RecordIdentifier) when scanning bucket_1 after bucket_0, but I > don't know why all the rows are the same in these bucket files. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22318) Java.io.exception:Two readers for
[ https://issues.apache.org/jira/browse/HIVE-22318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261355#comment-17261355 ] Shubham Sharma commented on HIVE-22318: --- [~kecheung] Faced this issue today with one of our user, here is the workaround to fix this issue: # Connect with beeline # Run below property in session: {code:java} set hive.fetch.task.conversion=none{code} # Now you'll be able to run select statements over the mentioned table. # Run below statement to create a backup for the table {code:java} create table as select * from ;{code} # Once you have the backup ready, logout from session and check the backup without setting any property {code:java} select * from ;{code} # Now you can drop problem table and replace with backup table {code:java} drop table ; alter table rename to ;{code} # To avoid this issue in future, create the backup table with a bucketing column in DDL > Java.io.exception:Two readers for > - > > Key: HIVE-22318 > URL: https://issues.apache.org/jira/browse/HIVE-22318 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2 >Affects Versions: 3.1.0 >Reporter: max_c >Priority: Major > Attachments: hiveserver2 for exception.log > > > I create a ACID table with ORC format: > > {noformat} > CREATE TABLE `some.TableA`( > >) > ROW FORMAT SERDE >'org.apache.hadoop.hive.ql.io.orc.OrcSerde' > STORED AS INPUTFORMAT >'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' > OUTPUTFORMAT >'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' > TBLPROPERTIES ( >'bucketing_version'='2', >'orc.compress'='SNAPPY', >'transactional'='true', >'transactional_properties'='default'){noformat} > After executing merge into operation: > {noformat} > MERGE INTO some.TableA AS a USING (SELECT vend_no FROM some.TableB UNION ALL > SELECT vend_no FROM some.TableC) AS b ON a.vend_no=b.vend_no WHEN MATCHED > THEN DELETE > {noformat} > the problem happend(when selecting the TableA, the exception happens too): > {noformat} > 
java.io.IOException: java.io.IOException: Two readers for {originalWriteId: > 4, bucket: 536870912(1.0.0), row: 2434, currentWriteId 25}: new > [key={originalWriteId: 4, bucket: 536870912(1.0.0), row: 2434, currentWriteId > 25}, nextRecord={2, 4, 536870912, 2434, 25, null}, reader=Hive ORC > Reader(hdfs://hdpprod/warehouse/tablespace/managed/hive/some.db/tableA/delete_delta_015_026/bucket_1, > 9223372036854775807)], old [key={originalWriteId: 4, bucket: > 536870912(1.0.0), row: 2434, currentWriteId 25}, nextRecord={2, 4, 536870912, > 2434, 25, null}, reader=Hive ORC > Reader(hdfs://hdpprod/warehouse/tablespace/managed/hive/some.db/tableA/delete_delta_015_026/bucket_0{noformat} > Through orc_tools I scanned all the > files (bucket_0, bucket_1, bucket_2) under delete_delta and found that all > rows of the files are the same. I think this will cause the same > key (RecordIdentifier) when scanning bucket_1 after bucket_0, but I > don't know why all the rows are the same in these bucket files. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
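The "Two readers" exception in HIVE-22318 fires because the ACID record merger assumes each (writeId, bucket, rowId) key exists in exactly one delete-delta file; when two bucket files contain identical rows, the same key is reached from two readers. The following is a toy model of that check, not Hive's actual OrcRawRecordMerger — keys are plain longs standing in for the full ROW__ID triple, and all names are illustrative:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of a k-way merge over sorted key streams that fails, as Hive's
// merger does, when the same key arrives from two different readers.
public class TwoReadersDemo {
    static void mergeKeys(List<long[]> readers) throws IOException {
        // Each queue entry is {key, readerIndex, positionInReader}.
        PriorityQueue<long[]> queue = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        for (int r = 0; r < readers.size(); r++) {
            if (readers.get(r).length > 0) queue.add(new long[]{readers.get(r)[0], r, 0});
        }
        long prevKey = Long.MIN_VALUE;
        int prevReader = -1;
        while (!queue.isEmpty()) {
            long[] top = queue.poll();
            if (top[0] == prevKey && (int) top[1] != prevReader) {
                // Same key seen from two readers: the invariant is broken.
                throw new IOException("Two readers for key " + top[0]);
            }
            prevKey = top[0];
            prevReader = (int) top[1];
            long[] reader = readers.get((int) top[1]);
            int next = (int) top[2] + 1;
            if (next < reader.length) queue.add(new long[]{reader[next], (int) top[1], next});
        }
    }

    public static void main(String[] args) throws IOException {
        // Distinct keys across readers: merges fine.
        mergeKeys(Arrays.asList(new long[]{1, 3}, new long[]{2, 4}));
        // Identical delete-delta contents, as reported in this issue: fails.
        try {
            mergeKeys(Arrays.asList(new long[]{1, 2}, new long[]{1, 2}));
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Two readers for key 1
        }
    }
}
```

This matches the reporter's observation: since all rows in bucket_0 and bucket_1 are identical, the first shared key triggers the failure as soon as the second reader is consulted.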
[jira] [Work logged] (HIVE-24581) Remove AcidUtils call from OrcInputformat for non transactional tables
[ https://issues.apache.org/jira/browse/HIVE-24581?focusedWorklogId=533046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533046 ] ASF GitHub Bot logged work on HIVE-24581: - Author: ASF GitHub Bot Created on: 08/Jan/21 14:42 Start Date: 08/Jan/21 14:42 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1826: URL: https://github.com/apache/hive/pull/1826#discussion_r553982610 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java ## @@ -1866,92 +1870,6 @@ private static boolean isDirUsable(Path child, long visibilityTxnId, List return true; } - public static class HdfsFileStatusWithoutId implements HdfsFileStatusWithId { -private final FileStatus fs; - -public HdfsFileStatusWithoutId(FileStatus fs) { - this.fs = fs; -} - -@Override -public FileStatus getFileStatus() { - return fs; -} - -@Override -public Long getFileId() { - return null; -} - } - - /** - * Find the original files (non-ACID layout) recursively under the partition directory. 
- * @param fs the file system - * @param dir the directory to add - * @return the list of original files - * @throws IOException - */ - public static List findOriginals(FileSystem fs, Path dir, Ref useFileIds, - boolean ignoreEmptyFiles, boolean recursive) throws IOException { -List originals = new ArrayList<>(); -List childrenWithId = tryListLocatedHdfsStatus(useFileIds, fs, dir, hiddenFileFilter); -if (childrenWithId != null) { - for (HdfsFileStatusWithId child : childrenWithId) { -if (child.getFileStatus().isDirectory()) { - if (recursive) { -originals.addAll(findOriginals(fs, child.getFileStatus().getPath(), useFileIds, -ignoreEmptyFiles, true)); - } -} else { - if (!ignoreEmptyFiles || child.getFileStatus().getLen() > 0) { -originals.add(child); - } -} - } -} else { - List children = HdfsUtils.listLocatedStatus(fs, dir, hiddenFileFilter); - for (FileStatus child : children) { -if (child.isDirectory()) { - if (recursive) { -originals.addAll(findOriginals(fs, child.getPath(), useFileIds, ignoreEmptyFiles, true)); - } -} else { - if (!ignoreEmptyFiles || child.getLen() > 0) { Review comment: The findoriginals was called with ignoreEmpty = true always, so i removed it. If I understand correctly this parameter could be removed from every other acidutils call. It was there to handle MR, which created empty files for every bucket. https://issues.apache.org/jira/browse/HIVE-13040?focusedCommentId=15159223=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15159223 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533046) Time Spent: 0.5h (was: 20m) > Remove AcidUtils call from OrcInputformat for non transactional tables > -- > > Key: HIVE-24581 > URL: https://issues.apache.org/jira/browse/HIVE-24581 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the split generation in OrcInputformat is tightly coupled with acid > and AcidUtils.getAcidState is called even if the table is not transactional. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24602) Retry compaction after configured time
[ https://issues.apache.org/jira/browse/HIVE-24602?focusedWorklogId=533038=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533038 ] ASF GitHub Bot logged work on HIVE-24602: - Author: ASF GitHub Bot Created on: 08/Jan/21 14:23 Start Date: 08/Jan/21 14:23 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1839: URL: https://github.com/apache/hive/pull/1839#discussion_r553971662 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java ## @@ -1006,16 +1007,23 @@ public boolean checkFailedCompactions(CompactionInfo ci) throws MetaException { rs = pStmt.executeQuery(); int numFailed = 0; int numTotal = 0; +long lastEnqueueTime = -1; int failedThreshold = MetastoreConf.getIntVar(conf, ConfVars.COMPACTOR_INITIATOR_FAILED_THRESHOLD); while(rs.next() && ++numTotal <= failedThreshold) { + long enqueueTime = rs.getLong(2); + if (enqueueTime > lastEnqueueTime) { +lastEnqueueTime = enqueueTime; + } if(rs.getString(1).charAt(0) == FAILED_STATE) { numFailed++; } else { numFailed--; } } -return numFailed == failedThreshold; +// If the last attempt was too long ago, ignore the failed treshold and try compaction again +long retryTime = MetastoreConf.getTimeVar(conf, ConfVars.COMPACTOR_INITIATOR_FAILED_RETRY_TIME, TimeUnit.MILLISECONDS); Review comment: Still, this does not initiate anything in the conf, it is already there in memory. Just run some test on my machine the average time for a getTimeVar call is 0.48 microsecond This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533038) Time Spent: 1.5h (was: 1h 20m) > Retry compaction after configured time > -- > > Key: HIVE-24602 > URL: https://issues.apache.org/jira/browse/HIVE-24602 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently if compaction fails two consecutive times it will stop compaction > forever for the given partition / table unless someone manually intervenes. > See COMPACTOR_INITIATOR_FAILED_THRESHOLD. > The Initiator should retry again after a configurable amount of time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
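The diff under review changes `checkFailedCompactions` so that hitting COMPACTOR_INITIATOR_FAILED_THRESHOLD no longer blocks compaction forever: if the most recent attempt is older than the configured retry time, the Initiator may enqueue compaction again. A minimal sketch of that decision rule, with hypothetical names and the threshold comparison simplified (the actual patch reads these values from MetastoreConf and the COMPLETED_COMPACTIONS history):

```java
// Illustrative model of the retry rule discussed above, not Hive's code.
public class CompactionRetryPolicy {
    static boolean shouldSkipCompaction(int numFailed, int failedThreshold,
                                        long lastEnqueueTimeMs, long retryTimeMs, long nowMs) {
        // Too many consecutive failures would normally stop initiation...
        boolean failedTooOften = numFailed >= failedThreshold;
        // ...unless the newest attempt is older than the retry window.
        // A retry time of 0 disables the retry behavior entirely.
        boolean retryWindowElapsed = retryTimeMs > 0 && lastEnqueueTimeMs + retryTimeMs < nowMs;
        return failedTooOften && !retryWindowElapsed;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long hour = 3_600_000L;
        // Under the threshold: never skipped.
        System.out.println(shouldSkipCompaction(1, 2, now, hour, now));            // false
        // At the threshold with a recent failure: skipped.
        System.out.println(shouldSkipCompaction(2, 2, now, hour, now));            // true
        // At the threshold, but the last attempt was 2h ago with a 1h retry time.
        System.out.println(shouldSkipCompaction(2, 2, now - 2 * hour, hour, now)); // false
    }
}
```

The review thread's performance concern is about the `MetastoreConf.getTimeVar` lookup inside this check; as noted above, the value is already in memory, so the lookup costs well under a microsecond per call.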
[jira] [Work logged] (HIVE-24602) Retry compaction after configured time
[ https://issues.apache.org/jira/browse/HIVE-24602?focusedWorklogId=533033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533033 ] ASF GitHub Bot logged work on HIVE-24602: - Author: ASF GitHub Bot Created on: 08/Jan/21 14:03 Start Date: 08/Jan/21 14:03 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1839: URL: https://github.com/apache/hive/pull/1839#discussion_r553960649

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
##
@@ -1006,16 +1007,23 @@ public boolean checkFailedCompactions(CompactionInfo ci) throws MetaException {
       rs = pStmt.executeQuery();
       int numFailed = 0;
       int numTotal = 0;
+      long lastEnqueueTime = -1;
       int failedThreshold = MetastoreConf.getIntVar(conf, ConfVars.COMPACTOR_INITIATOR_FAILED_THRESHOLD);
       while(rs.next() && ++numTotal <= failedThreshold) {
+        long enqueueTime = rs.getLong(2);
+        if (enqueueTime > lastEnqueueTime) {
+          lastEnqueueTime = enqueueTime;
+        }
         if(rs.getString(1).charAt(0) == FAILED_STATE) {
           numFailed++;
         } else {
           numFailed--;
         }
       }
-      return numFailed == failedThreshold;
+      // If the last attempt was too long ago, ignore the failed threshold and try compaction again
+      long retryTime = MetastoreConf.getTimeVar(conf, ConfVars.COMPACTOR_INITIATOR_FAILED_RETRY_TIME, TimeUnit.MILLISECONDS);

Review comment: I'm not 100% convinced about the time-critical part, since effectively, because of mutexing, only one Initiator thread runs on the whole system. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533033) Time Spent: 1h 20m (was: 1h 10m) > Retry compaction after configured time > -- > > Key: HIVE-24602 > URL: https://issues.apache.org/jira/browse/HIVE-24602 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently if compaction fails two consecutive times it will stop compaction > forever for the given partition / table unless someone manually intervenes. > See COMPACTOR_INITIATOR_FAILED_THRESHOLD. > The Initiator should retry again after a configurable amount of time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24606) Multi-stage materialized CTEs can lose intermediate data
[ https://issues.apache.org/jira/browse/HIVE-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] okumin reassigned HIVE-24606: - > Multi-stage materialized CTEs can lose intermediate data > > > Key: HIVE-24606 > URL: https://issues.apache.org/jira/browse/HIVE-24606 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 3.1.2, 2.3.7, 4.0.0 >Reporter: okumin >Assignee: okumin >Priority: Major > > With complex multi-stage CTEs, Hive can start a later stage before its > previous stage finishes. > That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve > dependencies between multi-stage materialized CTEs when a non-materialized CTE > cuts in. > > [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414] > > For example, when submitting this query, > {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTask` will traverse the CTEs in order of `a1`, `x`, and `a2`. It > means the dependency between `a1` and `a2` will be ignored and `a2` can start > without waiting for `a1`. As a result, the above query returns the following > result.
> {code:java}
> +-----+
> | id  |
> +-----+
> | a1  |
> | x   |
> +-----+
> {code}
> For your information, I ran this test with revision = > 425e1ff7c054f87c4db87e77d004282d529599ae. -- This message was sent by Atlassian Jira (v8.3.4#803005)
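The fix needed here is essentially a topological ordering of the materialized CTE tasks. As an illustration (not Hive's SemanticAnalyzer code; all names are invented), a post-order DFS over the dependency graph keeps `a1` before `a2` even though the non-materialized `x` sits between them in declaration order:

```java
import java.util.*;

// Illustrative sketch: a post-order DFS over the CTE dependency graph yields
// an execution order in which every materialized CTE runs only after the CTEs
// it reads from, regardless of where non-materialized CTEs (which produce no
// task) appear in declaration order.
public final class CteOrderSketch {
    public static List<String> executionOrder(Map<String, List<String>> deps,
                                              Set<String> materialized) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String cte : deps.keySet()) {
            visit(cte, deps, materialized, visited, order);
        }
        return order;
    }

    private static void visit(String cte, Map<String, List<String>> deps,
                              Set<String> materialized, Set<String> visited,
                              List<String> order) {
        if (!visited.add(cte)) return;
        for (String dep : deps.getOrDefault(cte, List.of())) {
            visit(dep, deps, materialized, visited, order);  // dependencies first
        }
        if (materialized.contains(cte)) {
            order.add(cte);  // only materialized CTEs become real tasks
        }
    }
}
```

For the query in the description (`x` depends on nothing and is not materialized, `a2` depends on `a1`), this ordering schedules `a1` strictly before `a2`, which is what the reported traversal fails to guarantee.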
[jira] [Work logged] (HIVE-23553) Upgrade ORC version to 1.6.6
[ https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=533018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533018 ] ASF GitHub Bot logged work on HIVE-23553: - Author: ASF GitHub Bot Created on: 08/Jan/21 13:38 Start Date: 08/Jan/21 13:38 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #1823: URL: https://github.com/apache/hive/pull/1823#issuecomment-756759959 The thing is that before I added that repo, I pressed the test button, which reported the error - because I think it also does the same for blackout detection - it would probably not have worked... I've added the repsy repo to the artifactory and added it to the virtual repo precommit uses. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533018) Time Spent: 2h 20m (was: 2h 10m) > Upgrade ORC version to 1.6.6 > > > Key: HIVE-23553 > URL: https://issues.apache.org/jira/browse/HIVE-23553 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > Apache Hive is currently on 1.5.X version and in order to take advantage of > the latest ORC improvements such as column encryption we have to bump to > 1.6.X. > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288==12318320=Create_token=A5KQ-2QAV-T4JA-FDED_4ae78f19321c7fb1e7f337fba1dd90af751d8810_lin > Even though the ORC reader could work out of the box, HIVE LLAP is heavily > dependent on internal ORC APIs e.g., to retrieve and store File Footers, > Tails, streams – un/compress RG data etc. As there were many internal changes > from 1.5 to 1.6 (Input stream offsets, relative BufferChunks etc.) 
the > upgrade is not straightforward. > This Umbrella Jira tracks this upgrade effort. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24565) Implement standard trim function
[ https://issues.apache.org/jira/browse/HIVE-24565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa resolved HIVE-24565. --- Resolution: Fixed Pushed to master. Thanks [~jcamachorodriguez], [~kgyrtkirk] for review. > Implement standard trim function > > > Key: HIVE-24565 > URL: https://issues.apache.org/jira/browse/HIVE-24565 > Project: Hive > Issue Type: Improvement > Components: Parser, UDF >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > {code}
> <trim function> ::=
>   TRIM <left paren> <trim operands> <right paren>
> <trim operands> ::=
>   [ [ <trim specification> ] [ <trim character> ] FROM ] <trim source>
> <trim source> ::=
>   <character value expression>
> <trim specification> ::=
>   LEADING
>   | TRAILING
>   | BOTH
> <trim character> ::=
>   <character value expression>
> {code}
> Example
> {code}
> SELECT TRIM(LEADING '0' FROM '000123');
> 123
> {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
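The grammar above reads: remove the trim character (space when omitted) from the leading end, the trailing end, or both ends of the source string. A minimal Java sketch of these semantics follows; this is not Hive's GenericUDF implementation, only an illustration of the standard behavior:

```java
// Illustrative sketch of standard SQL TRIM semantics
// (LEADING / TRAILING / BOTH with an explicit trim character).
public final class SqlTrimSketch {
    public enum Spec { LEADING, TRAILING, BOTH }

    public static String trim(Spec spec, char trimChar, String src) {
        int begin = 0;
        int end = src.length();
        if (spec == Spec.LEADING || spec == Spec.BOTH) {
            while (begin < end && src.charAt(begin) == trimChar) begin++;   // strip from the front
        }
        if (spec == Spec.TRAILING || spec == Spec.BOTH) {
            while (end > begin && src.charAt(end - 1) == trimChar) end--;   // strip from the back
        }
        return src.substring(begin, end);
    }
}
```

This reproduces the Jira's example: trimming LEADING '0' from '000123' yields '123', while TRAILING '0' would leave it unchanged.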
[jira] [Work logged] (HIVE-24565) Implement standard trim function
[ https://issues.apache.org/jira/browse/HIVE-24565?focusedWorklogId=533016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533016 ] ASF GitHub Bot logged work on HIVE-24565: - Author: ASF GitHub Bot Created on: 08/Jan/21 13:37 Start Date: 08/Jan/21 13:37 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #1810: URL: https://github.com/apache/hive/pull/1810 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533016) Time Spent: 1h (was: 50m) > Implement standard trim function > > > Key: HIVE-24565 > URL: https://issues.apache.org/jira/browse/HIVE-24565 > Project: Hive > Issue Type: Improvement > Components: Parser, UDF >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > {code}
> <trim function> ::=
>   TRIM <left paren> <trim operands> <right paren>
> <trim operands> ::=
>   [ [ <trim specification> ] [ <trim character> ] FROM ] <trim source>
> <trim source> ::=
>   <character value expression>
> <trim specification> ::=
>   LEADING
>   | TRAILING
>   | BOTH
> <trim character> ::=
>   <character value expression>
> {code}
> Example
> {code}
> SELECT TRIM(LEADING '0' FROM '000123');
> 123
> {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24510) Vectorize compute_bit_vector
[ https://issues.apache.org/jira/browse/HIVE-24510?focusedWorklogId=533014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533014 ] ASF GitHub Bot logged work on HIVE-24510: - Author: ASF GitHub Bot Created on: 08/Jan/21 13:35 Start Date: 08/Jan/21 13:35 Worklog Time Spent: 10m Work Description: abstractdog edited a comment on pull request #1824: URL: https://github.com/apache/hive/pull/1824#issuecomment-756758376

> I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime.
>
> > you can still do something like:
> > if compute_bit_vector: -> handle constant parameter
>
> We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java`. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either.
>
> Do you see any other benefit than preserving the earlier q.out outputs?

no, I'm concerned only about the qout changes

you're right, if compute_bit_vector is a relatively new thing then we can also ignore backward compatibility problems and go on with compute_bit_vector_hll

I would personally keep pursuing a smaller patch as having "compute_bit_vector_hll" has no benefits either, but it's up to you, I think if the default hll algo won't be changed in the near future for stats, we can go with the updated qouts :)

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533014) Time Spent: 3h (was: 2h 50m) > Vectorize compute_bit_vector > > > Key: HIVE-24510 > URL: https://issues.apache.org/jira/browse/HIVE-24510 > Project: Hive > Issue Type: Improvement >Reporter: Mustafa İman >Assignee: Mustafa İman >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > After https://issues.apache.org/jira/browse/HIVE-23530 , almost all compute > stats functions are vectorizable. Only function that is not vectorizable is > "compute_bit_vector" for ndv statistics computation. This causes "create > table as select" and "insert overwrite select" queries to run in > non-vectorized mode. > Even a very naive implementation of vectorized compute_bit_vector gives about > 50% performance improvement on simple "insert overwrite select" queries. That > is because entire mapper or reducer can run in vectorized mode. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
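For intuition about what an ndv "bit vector" computes: each value's hash contributes one bit, indexed by the number of trailing zeros in the hash, and the position of the first unset bit yields a Flajolet-Martin style distinct-count estimate. The sketch below is only a conceptual stand-in; Hive's compute_bit_vector is backed by its own HLL/FM implementations, not this code:

```java
// Naive Flajolet-Martin style ndv sketch (conceptual only, not Hive's UDF).
public final class NdvBitVectorSketch {
    private long bits;  // bit i set => some value's hash had i trailing zeros

    public void add(Object value) {
        int h = scramble(value.hashCode());
        int idx = (h == 0) ? 63 : Integer.numberOfTrailingZeros(h);
        bits |= 1L << idx;  // duplicates set the same bit, so they don't change the vector
    }

    /** FM estimate: 2^r / 0.77351, where r is the index of the first unset bit. */
    public long estimateNdv() {
        int r = Long.numberOfTrailingZeros(~bits);
        return (long) ((1L << r) / 0.77351);
    }

    public long bitVector() { return bits; }

    // Cheap avalanche step so similar inputs get dissimilar trailing-zero counts.
    private static int scramble(int h) {
        h ^= (h >>> 16);
        h *= 0x85EBCA6B;
        h ^= (h >>> 13);
        return h;
    }
}
```

The key property that makes this aggregation vectorizable and mergeable is that `add` is an idempotent, commutative OR over a fixed-size word: feeding a value twice leaves the bit vector unchanged, and partial vectors from different mappers can simply be OR-ed together.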
[jira] [Work logged] (HIVE-24510) Vectorize compute_bit_vector
[ https://issues.apache.org/jira/browse/HIVE-24510?focusedWorklogId=533013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533013 ] ASF GitHub Bot logged work on HIVE-24510: - Author: ASF GitHub Bot Created on: 08/Jan/21 13:35 Start Date: 08/Jan/21 13:35 Worklog Time Spent: 10m Work Description: abstractdog commented on pull request #1824: URL: https://github.com/apache/hive/pull/1824#issuecomment-756758376 > I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime. > > > you can still do something like: > > if compute_bit_vector: -> handle constant parameter > > We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either. > Do you see any other benefit than preserving the earlier q.out outputs? no, I'm concerned only about the qout changes you're right, if compute_bit_vector is a relatively new thing then we can also ignore backward compatibility problems and go on with compute_bit_vector_hll I would personally keep pursuing a smaller patch as having "compute_bit_vector_hll" has no benefits either, but it's up to you, I think if the default hll algo won't be changed in the near future, we can go with the updated qouts :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533013) Time Spent: 2h 50m (was: 2h 40m) > Vectorize compute_bit_vector > > > Key: HIVE-24510 > URL: https://issues.apache.org/jira/browse/HIVE-24510 > Project: Hive > Issue Type: Improvement >Reporter: Mustafa İman >Assignee: Mustafa İman >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > After https://issues.apache.org/jira/browse/HIVE-23530 , almost all compute > stats functions are vectorizable. Only function that is not vectorizable is > "compute_bit_vector" for ndv statistics computation. This causes "create > table as select" and "insert overwrite select" queries to run in > non-vectorized mode. > Even a very naive implementation of vectorized compute_bit_vector gives about > 50% performance improvement on simple "insert overwrite select" queries. That > is because entire mapper or reducer can run in vectorized mode. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24581) Remove AcidUtils call from OrcInputformat for non transactional tables
[ https://issues.apache.org/jira/browse/HIVE-24581?focusedWorklogId=533010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533010 ] ASF GitHub Bot logged work on HIVE-24581: - Author: ASF GitHub Bot Created on: 08/Jan/21 13:31 Start Date: 08/Jan/21 13:31 Worklog Time Spent: 10m Work Description: kuczoram commented on a change in pull request #1826: URL: https://github.com/apache/hive/pull/1826#discussion_r553944336

## File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -1866,92 +1870,6 @@ private static boolean isDirUsable(Path child, long visibilityTxnId, List
     return true;
   }

-  public static class HdfsFileStatusWithoutId implements HdfsFileStatusWithId {
-    private final FileStatus fs;
-
-    public HdfsFileStatusWithoutId(FileStatus fs) {
-      this.fs = fs;
-    }
-
-    @Override
-    public FileStatus getFileStatus() {
-      return fs;
-    }
-
-    @Override
-    public Long getFileId() {
-      return null;
-    }
-  }
-
-  /**
-   * Find the original files (non-ACID layout) recursively under the partition directory.
-   * @param fs the file system
-   * @param dir the directory to add
-   * @return the list of original files
-   * @throws IOException
-   */
-  public static List<HdfsFileStatusWithId> findOriginals(FileSystem fs, Path dir, Ref<Boolean> useFileIds,
-      boolean ignoreEmptyFiles, boolean recursive) throws IOException {
-    List<HdfsFileStatusWithId> originals = new ArrayList<>();
-    List<HdfsFileStatusWithId> childrenWithId = tryListLocatedHdfsStatus(useFileIds, fs, dir, hiddenFileFilter);
-    if (childrenWithId != null) {
-      for (HdfsFileStatusWithId child : childrenWithId) {
-        if (child.getFileStatus().isDirectory()) {
-          if (recursive) {
-            originals.addAll(findOriginals(fs, child.getFileStatus().getPath(), useFileIds,
-                ignoreEmptyFiles, true));
-          }
-        } else {
-          if (!ignoreEmptyFiles || child.getFileStatus().getLen() > 0) {
-            originals.add(child);
-          }
-        }
-      }
-    } else {
-      List<FileStatus> children = HdfsUtils.listLocatedStatus(fs, dir, hiddenFileFilter);
-      for (FileStatus child : children) {
-        if (child.isDirectory()) {
-          if (recursive) {
-            originals.addAll(findOriginals(fs, child.getPath(), useFileIds, ignoreEmptyFiles, true));
-          }
-        } else {
-          if (!ignoreEmptyFiles || child.getLen() > 0) {

Review comment: As far as I can see, the ignoreEmptyFiles parameter is removed and is not present in the newly added methods. Is this parameter not used any more? (Just want to make sure that removing it won't have any side effect.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 533010) Time Spent: 20m (was: 10m) > Remove AcidUtils call from OrcInputformat for non transactional tables > -- > > Key: HIVE-24581 > URL: https://issues.apache.org/jira/browse/HIVE-24581 > Project: Hive > Issue Type: Improvement >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently the split generation in OrcInputformat is tightly coupled with acid > and AcidUtils.getAcidState is called even if the table is not transactional. -- This message was sent by Atlassian Jira (v8.3.4#803005)
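The findOriginals walk being removed boils down to a recursive directory listing that skips hidden entries and, optionally, empty files. A stand-alone java.nio sketch of that shape follows; it is not the Hive implementation (no file IDs, and the hidden-name rule is simplified to names starting with '.' or '_'):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.*;

// Stand-alone sketch of a findOriginals-style walk (not Hive's AcidUtils code).
public final class OriginalsListingSketch {
    /** Recursively list data files, skipping hidden entries and optionally empty files. */
    public static List<Path> findOriginals(Path dir, boolean ignoreEmptyFiles, boolean recursive) {
        List<Path> originals = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                if (name.startsWith(".") || name.startsWith("_")) continue;  // hiddenFileFilter analogue
                if (Files.isDirectory(child)) {
                    if (recursive) originals.addAll(findOriginals(child, ignoreEmptyFiles, true));
                } else if (!ignoreEmptyFiles || Files.size(child) > 0) {
                    originals.add(child);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return originals;
    }

    /** Builds a throwaway layout and returns {allFiles, nonEmptyFiles, topLevelOnly}. */
    public static int[] demo() {
        try {
            Path root = Files.createTempDirectory("originals-sketch");
            Path part = Files.createDirectory(root.resolve("part"));
            Files.write(part.resolve("000000_0"), new byte[]{1, 2, 3});
            Files.createFile(part.resolve("000001_0"));   // empty file
            Files.createFile(root.resolve("_SUCCESS"));   // hidden marker, always skipped
            return new int[] {
                findOriginals(root, false, true).size(),  // both data files
                findOriginals(root, true, true).size(),   // empty file skipped
                findOriginals(root, false, false).size()  // no recursion into part/
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The `ignoreEmptyFiles` flag questioned in the review is the `Files.size(child) > 0` guard here: dropping it means empty originals start showing up in the listing, which is exactly the side effect the reviewer is asking about.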
[jira] [Commented] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261298#comment-17261298 ] Stamatis Zampetakis commented on HIVE-24590: I may have misunderstood what IdlePurgePolicy does. In my mind, I was thinking that if the appender is removed due to inactivity and then at some point in time there is a request to write to the same route then a new appender (pointing to the same file) could open without problem. > Operation Logging still leaks the log4j Appenders > - > > Key: HIVE-24590 > URL: https://issues.apache.org/jira/browse/HIVE-24590 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Eugene Chung >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot > 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen > Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, > Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch > > > I'm using Hive 3.1.2 with options below. > * hive.server2.logging.operation.enabled=true > * hive.server2.logging.operation.level=VERBOSE > * hive.async.log.enabled=false > I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 > but HS2 still leaks log4j RandomAccessFileManager. > !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197! > I checked the operation log file which is not closed/deleted properly. > !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272! > Then there's the log, > {code:java} > client.TezClient: Shutting down Tez Session, sessionName= {code} > !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26! -- This message was sent by Atlassian Jira (v8.3.4#803005)
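The purge-policy idea discussed above amounts to tracking a last-access time per routed appender and closing any route idle beyond a timeout; a later write to the same route can then simply open a fresh appender for the same file. The toy class below only illustrates that bookkeeping; log4j-core ships a real IdlePurgePolicy, and this is not it:

```java
import java.util.*;

// Toy illustration of an idle purge policy for routed appenders
// (conceptual only; log4j's IdlePurgePolicy is the real implementation).
public final class IdlePurgeSketch {
    private final Map<String, Long> lastAccess = new HashMap<>();
    private final long idleTimeoutMs;
    private final List<String> closed = new ArrayList<>();

    public IdlePurgeSketch(long idleTimeoutMs) { this.idleTimeoutMs = idleTimeoutMs; }

    /** Record that a route (e.g. one operation-log file) was just written to. */
    public void touch(String routeKey, long nowMs) { lastAccess.put(routeKey, nowMs); }

    /** Close every route that has been idle longer than the timeout. */
    public void purge(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMs - e.getValue() > idleTimeoutMs) {
                closed.add(e.getKey());  // a real policy would stop the appender here
                it.remove();
            }
        }
    }

    public List<String> closedRoutes() { return closed; }
    public int openRoutes() { return lastAccess.size(); }
}
```

This also frames the concern raised in the thread: a long-running query's route is only safe from purging if every write refreshes its last-access time, so a query that produces no operation-log output for longer than the timeout would see its appender closed and reopened.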
[jira] [Commented] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261280#comment-17261280 ] Eugene Chung commented on HIVE-24590: - > are the log messages directed to the proper files or not? I observed the situation. Sometimes logs for HDFS delegation token generated by other accounts are shown in my operation log. (I am using kerberized Hadoop & Hive.) > Operation Logging still leaks the log4j Appenders > - > > Key: HIVE-24590 > URL: https://issues.apache.org/jira/browse/HIVE-24590 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Eugene Chung >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot > 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen > Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, > Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch > > > I'm using Hive 3.1.2 with options below. > * hive.server2.logging.operation.enabled=true > * hive.server2.logging.operation.level=VERBOSE > * hive.async.log.enabled=false > I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 > but HS2 still leaks log4j RandomAccessFileManager. > !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197! > I checked the operation log file which is not closed/deleted properly. > !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272! > Then there's the log, > {code:java} > client.TezClient: Shutting down Tez Session, sessionName= {code} > !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24510) Vectorize compute_bit_vector
[ https://issues.apache.org/jira/browse/HIVE-24510?focusedWorklogId=532991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-532991 ] ASF GitHub Bot logged work on HIVE-24510: - Author: ASF GitHub Bot Created on: 08/Jan/21 12:15 Start Date: 08/Jan/21 12:15 Worklog Time Spent: 10m Work Description: mustafaiman edited a comment on pull request #1824: URL: https://github.com/apache/hive/pull/1824#issuecomment-756725412 I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime. > you can still do something like: > if compute_bit_vector: -> handle constant parameter We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either. Do you see any other benefit than preserving the earlier q.out outputs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 532991) Time Spent: 2h 40m (was: 2.5h) > Vectorize compute_bit_vector > > > Key: HIVE-24510 > URL: https://issues.apache.org/jira/browse/HIVE-24510 > Project: Hive > Issue Type: Improvement >Reporter: Mustafa İman >Assignee: Mustafa İman >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > After https://issues.apache.org/jira/browse/HIVE-23530 , almost all compute > stats functions are vectorizable. Only function that is not vectorizable is > "compute_bit_vector" for ndv statistics computation. 
This causes "create > table as select" and "insert overwrite select" queries to run in > non-vectorized mode. > Even a very naive implementation of vectorized compute_bit_vector gives about > 50% performance improvement on simple "insert overwrite select" queries. That > is because entire mapper or reducer can run in vectorized mode. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24510) Vectorize compute_bit_vector
[ https://issues.apache.org/jira/browse/HIVE-24510?focusedWorklogId=532990=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-532990 ] ASF GitHub Bot logged work on HIVE-24510: - Author: ASF GitHub Bot Created on: 08/Jan/21 12:15 Start Date: 08/Jan/21 12:15 Worklog Time Spent: 10m Work Description: mustafaiman commented on pull request #1824: URL: https://github.com/apache/hive/pull/1824#issuecomment-756725412 I made a quick fix to allow that in early versions of this patch. Then I decided to not pursue it because I did not see the need for allowing constant argument in runtime. > you can still do something like: > if compute_bit_vector: -> handle constant parameter We do exactly that. Not in vectorizer but earlier in `ColumnStatsSemanticAnalyzer.java `. I am reluctant to implement extra functionality or add special cases unless it is necessary. Note that compute_bit_vector is a newly added UDF in 4.0. So there is no backward compatibility concern either. Do you see any other benefit than preserving the earlier q.out outputs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 532990) Time Spent: 2.5h (was: 2h 20m) > Vectorize compute_bit_vector > > > Key: HIVE-24510 > URL: https://issues.apache.org/jira/browse/HIVE-24510 > Project: Hive > Issue Type: Improvement >Reporter: Mustafa İman >Assignee: Mustafa İman >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > After https://issues.apache.org/jira/browse/HIVE-23530 , almost all compute > stats functions are vectorizable. Only function that is not vectorizable is > "compute_bit_vector" for ndv statistics computation. 
This causes "create > table as select" and "insert overwrite select" queries to run in > non-vectorized mode. > Even a very naive implementation of vectorized compute_bit_vector gives about > 50% performance improvement on simple "insert overwrite select" queries. That > is because entire mapper or reducer can run in vectorized mode. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261266#comment-17261266 ] Eugene Chung edited comment on HIVE-24590 at 1/8/21, 12:11 PM: --- !Screen Shot 2021-01-08 at 21.01.40.png|width=1486,height=34! I have a question. Couldn't operation logs be idle for an hour, as in the screenshot, or even for days if the query runs very long? My concern is that choosing the timeout for the purge policy could be difficult. was (Author: euigeun_chung): !Screen Shot 2021-01-08 at 21.01.40.png|width=1486,height=34! I have a question. Couldn't operation logs be idle for an hour like screenshot or days if the query time is very long? > Operation Logging still leaks the log4j Appenders > - > > Key: HIVE-24590 > URL: https://issues.apache.org/jira/browse/HIVE-24590 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Eugene Chung >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot > 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen > Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, > Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch > > > I'm using Hive 3.1.2 with options below. > * hive.server2.logging.operation.enabled=true > * hive.server2.logging.operation.level=VERBOSE > * hive.async.log.enabled=false > I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 > but HS2 still leaks log4j RandomAccessFileManager. > !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197! > I checked the operation log file which is not closed/deleted properly. > !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272! > Then there's the log, > {code:java} > client.TezClient: Shutting down Tez Session, sessionName= {code} > !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261266#comment-17261266 ] Eugene Chung commented on HIVE-24590: - !Screen Shot 2021-01-08 at 21.01.40.png|width=1486,height=34! I have a question. Couldn't operation logs be idle for an hour like screenshot or days if the query time is very long? > Operation Logging still leaks the log4j Appenders > - > > Key: HIVE-24590 > URL: https://issues.apache.org/jira/browse/HIVE-24590 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Eugene Chung >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot > 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen > Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, > Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch > > > I'm using Hive 3.1.2 with options below. > * hive.server2.logging.operation.enabled=true > * hive.server2.logging.operation.level=VERBOSE > * hive.async.log.enabled=false > I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 > but HS2 still leaks log4j RandomAccessFileManager. > !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197! > I checked the operation log file which is not closed/deleted properly. > !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272! > Then there's the log, > {code:java} > client.TezClient: Shutting down Tez Session, sessionName= {code} > !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Chung updated HIVE-24590: Attachment: Screen Shot 2021-01-08 at 21.01.40.png > Operation Logging still leaks the log4j Appenders > - > > Key: HIVE-24590 > URL: https://issues.apache.org/jira/browse/HIVE-24590 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Eugene Chung >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot > 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen > Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, > Screen Shot 2021-01-08 at 21.01.40.png, add_debug_log_and_trace.patch > > > I'm using Hive 3.1.2 with options below. > * hive.server2.logging.operation.enabled=true > * hive.server2.logging.operation.level=VERBOSE > * hive.async.log.enabled=false > I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 > but HS2 still leaks log4j RandomAccessFileManager. > !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197! > I checked the operation log file which is not closed/deleted properly. > !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272! > Then there's the log, > {code:java} > client.TezClient: Shutting down Tez Session, sessionName= {code} > !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261251#comment-17261251 ] Stamatis Zampetakis commented on HIVE-24590: Thanks for looking into this [~euigeun_chung]. The main problem is the leak of appenders and file descriptors and I think this can be solved by adopting an appropriate purge policy as I wrote earlier. If this happens then we may remove LogUtils.unregisterLoggingContext() altogether. If the leak is solved then the next question is: are the log messages directed to the proper files or not? Depending on the answer we may need to look on how to clear/set the log4j context. > Operation Logging still leaks the log4j Appenders > - > > Key: HIVE-24590 > URL: https://issues.apache.org/jira/browse/HIVE-24590 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Eugene Chung >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: Screen Shot 2021-01-06 at 18.42.05.png, Screen Shot > 2021-01-06 at 18.42.24.png, Screen Shot 2021-01-06 at 18.42.55.png, Screen > Shot 2021-01-06 at 21.38.32.png, Screen Shot 2021-01-06 at 21.47.28.png, > add_debug_log_and_trace.patch > > > I'm using Hive 3.1.2 with options below. > * hive.server2.logging.operation.enabled=true > * hive.server2.logging.operation.level=VERBOSE > * hive.async.log.enabled=false > I already know the ticket, https://issues.apache.org/jira/browse/HIVE-17128 > but HS2 still leaks log4j RandomAccessFileManager. > !Screen Shot 2021-01-06 at 18.42.05.png|width=756,height=197! > I checked the operation log file which is not closed/deleted properly. > !Screen Shot 2021-01-06 at 18.42.24.png|width=603,height=272! > Then there's the log, > {code:java} > client.TezClient: Shutting down Tez Session, sessionName= {code} > !Screen Shot 2021-01-06 at 18.42.55.png|width=1372,height=26! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261235#comment-17261235 ]

Eugene Chung edited comment on HIVE-24590 at 1/8/21, 11:29 AM:

I've been digging into this for days and have found that the MDC is not cleared. Even when I call MDC.clear() in org.apache.hive.service.cli.session.HiveSessionImpl.close(), the MDC context set by LogUtils.registerLoggingContext() still exists in org.apache.hadoop.hive.ql.log.HushableRandomAccessFileAppender.createAppender(). Following the code related to the slf4j MDC, it actually calls Log4jMDCAdapter and ultimately accesses the log4j ThreadContext, which is basically stack-based. I don't know exactly how the log4j ThreadContext works yet, but the log4j MDC stacks at HiveSessionImpl.close() and at HushableRandomAccessFileAppender.createAppender() seem to be different. When I call the log4j ThreadContext.clearAll() instead of MDC.clear() (= ThreadContext.clearMap()), HushableRandomAccessFileAppender.createAppender() is no longer called when the session is closed.
[jira] [Commented] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261248#comment-17261248 ]

Eugene Chung commented on HIVE-24590:

I think:
* we can set disableThreadContextStack to true if the MDC does not need to be stack-based ([https://logging.apache.org/log4j/2.x/manual/thread-context.html]);
* we can call ThreadContext.clearAll() instead of MDC.clear() in LogUtils.unregisterLoggingContext().
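The map-vs-stack distinction behind these two suggestions can be illustrated with a small self-contained sketch. This is a simplified model, not the real log4j classes: log4j's ThreadContext keeps both a per-thread map and a per-thread stack, and slf4j's MDC.clear() corresponds to clearMap() only, so the stack survives it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Simplified model of log4j's ThreadContext: a per-thread map plus a
// per-thread stack. slf4j's MDC.clear() empties only the map, which is
// why state can still be visible afterwards via the stack.
public class ThreadContextModel {
    private static final ThreadLocal<Map<String, String>> map =
        ThreadLocal.withInitial(HashMap::new);
    private static final ThreadLocal<Deque<String>> stack =
        ThreadLocal.withInitial(ArrayDeque::new);

    static void put(String key, String value) { map.get().put(key, value); }
    static void push(String message) { stack.get().push(message); }

    // What slf4j MDC.clear() amounts to: only the map is emptied.
    static void clearMap() { map.get().clear(); }

    // What log4j ThreadContext.clearAll() does: map and stack both emptied.
    static void clearAll() { clearMap(); stack.get().clear(); }

    public static void main(String[] args) {
        put("queryId", "q1");                 // hypothetical context entry
        push("operation-log-context");        // hypothetical stack entry
        clearMap();
        System.out.println("after clearMap, stack empty: " + stack.get().isEmpty());
        clearAll();
        System.out.println("after clearAll, stack empty: " + stack.get().isEmpty());
    }
}
```

In this model, clearMap() leaves "operation-log-context" on the stack, while clearAll() removes it, matching the observation that only ThreadContext.clearAll() stops the stale context from reappearing.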
[jira] [Commented] (HIVE-24590) Operation Logging still leaks the log4j Appenders
[ https://issues.apache.org/jira/browse/HIVE-24590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261235#comment-17261235 ]

Eugene Chung commented on HIVE-24590:

I've been digging into this for days and have found that the MDC is not cleared. Even when I call MDC.clear() in org.apache.hive.service.cli.session.HiveSessionImpl.close(), the MDC context set by LogUtils.registerLoggingContext() still exists in org.apache.hadoop.hive.ql.log.HushableRandomAccessFileAppender.createAppender(). Following the code related to the slf4j MDC, it actually calls Log4jMDCAdapter and finally uses the log4j ThreadContext, which is basically stack-based. I don't know exactly how the log4j ThreadContext works yet, but the log4j MDC stacks at HiveSessionImpl.close() and at HushableRandomAccessFileAppender.createAppender() seem to be different. When I call the log4j ThreadContext.clearAll() instead of MDC.clear() (= ThreadContext.clearMap()), HushableRandomAccessFileAppender.createAppender() is no longer called when the session is closed.
[jira] [Work logged] (HIVE-15820) comment at the head of beeline -e
[ https://issues.apache.org/jira/browse/HIVE-15820?focusedWorklogId=532906=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-532906 ]

ASF GitHub Bot logged work on HIVE-15820:

Author: ASF GitHub Bot
Created on: 08/Jan/21 08:44
Start Date: 08/Jan/21 08:44
Worklog Time Spent: 10m
Work Description: kgyrtkirk merged pull request #1814:
URL: https://github.com/apache/hive/pull/1814

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 532906)
Time Spent: 1h (was: 50m)

> comment at the head of beeline -e
> ---------------------------------
>
>                 Key: HIVE-15820
>                 URL: https://issues.apache.org/jira/browse/HIVE-15820
>             Project: Hive
>          Issue Type: Bug
>          Components: Beeline
>    Affects Versions: 1.2.1, 2.1.1
>            Reporter: muxin
>            Assignee: Robbie Zhang
>            Priority: Major
>              Labels: patch, pull-request-available
>         Attachments: HIVE-15820.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> $ beeline -u jdbc:hive2://localhost:1 -n test -e "
> > --asdfasdfasdfasdf
> > select * from test_table;
> > "
> The expected result of the above command is all rows of test_table (the same as when run in beeline interactive mode), but it does not output anything.
> The cause is that the -e option reads the commands as one string, and the method dispatch(String line) first calls isComment(String line), which uses
> 'lineTrimmed.startsWith("#") || lineTrimmed.startsWith("--")'
> to treat the whole command string as a comment.
> Two ways can be considered to fix this problem:
> 1. in the method initArgs(String[] args), split the command by '\n' into a command list before dispatch when cl.getOptionValues('e') != null
> 2. in the method dispatch(String line), remove comments using this:
> {code:java}
> static String removeComments(String line) {
>   if (line == null || line.isEmpty()) {
>     return line;
>   }
>   StringBuilder builder = new StringBuilder();
>   int escape = -1;
>   for (int index = 0; index < line.length(); index++) {
>     if (index < line.length() - 1 && line.charAt(index) == line.charAt(index + 1)) {
>       if (escape == -1 && line.charAt(index) == '-') {
>         // find \n as the end of comment
>         index = line.indexOf('\n', index + 1);
>         // there is no sql after this comment, so just break out
>         if (-1 == index) {
>           break;
>         }
>       }
>     }
>     char letter = line.charAt(index);
>     if (letter == escape) {
>       escape = -1; // Turn escape off.
>     } else if (escape == -1 && (letter == '\'' || letter == '"')) {
>       escape = letter; // Turn escape on.
>     }
>     builder.append(letter);
>   }
>   return builder.toString();
> }
> {code}
> The second way can be a general solution to remove all comments starting with '--' in a SQL statement.
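The behavior of the proposed removeComments() can be checked with a standalone sketch. The class name and test inputs below are illustrative; the method body is taken from the description above:

```java
// Standalone demo of the removeComments() proposal from the issue
// description: a leading "--" comment is dropped up to its terminating
// newline, while "--" inside a quoted literal is preserved.
public class RemoveCommentsDemo {

    static String removeComments(String line) {
        if (line == null || line.isEmpty()) {
            return line;
        }
        StringBuilder builder = new StringBuilder();
        int escape = -1;
        for (int index = 0; index < line.length(); index++) {
            if (index < line.length() - 1 && line.charAt(index) == line.charAt(index + 1)) {
                if (escape == -1 && line.charAt(index) == '-') {
                    // Find \n as the end of the comment.
                    index = line.indexOf('\n', index + 1);
                    // There is no SQL after this comment, so just break out.
                    if (-1 == index) {
                        break;
                    }
                }
            }
            char letter = line.charAt(index);
            if (letter == escape) {
                escape = -1; // Turn escape off.
            } else if (escape == -1 && (letter == '\'' || letter == '"')) {
                escape = letter; // Turn escape on.
            }
            builder.append(letter);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // The failing case from this issue: a comment line before the statement.
        System.out.println(removeComments("--asdfasdfasdfasdf\nselect * from test_table;").trim());
        // "--" inside a string literal must survive.
        System.out.println(removeComments("select '--not a comment' from t;"));
    }
}
```

The first call strips the whole comment up to the newline, so the statement that follows is no longer swallowed by isComment(); the second shows why the quote-tracking escape state is needed.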
[jira] [Updated] (HIVE-15820) comment at the head of beeline -e
[ https://issues.apache.org/jira/browse/HIVE-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich updated HIVE-15820:
    Fix Version/s: 4.0.0
       Resolution: Fixed
           Status: Resolved (was: Patch Available)

merged into master. Thank you [~robbiezhang] for fixing this and Vihang for reviewing the changes!