[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests
[ https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814329#comment-17814329 ] Zoltan Haindrich commented on HIVE-28013:
-

Oh sorry - I should have noticed your excellent comment there... :) It seems nothing is easy nowadays; even a build-tool dependency upgrade can hold surprises! :)

> No space left on device when running precommit tests
>
> Key: HIVE-28013
> URL: https://issues.apache.org/jira/browse/HIVE-28013
> Project: Hive
> Issue Type: Bug
> Components: Testing Infrastructure
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Blocker
> Fix For: 4.1.0
>
> Attachments: orphaned_item_strategy.png
>
> The Hive precommit tests fail due to lack of space. A few of the most recent failures below:
> * http://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-4744/23/console
> * http://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5005/10/console
> {noformat}
> java.io.IOException: No space left on device
>     at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
>     at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
>     at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
>     at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
>     at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.<init>(RiverWriter.java:109)
>     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:560)
>     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:537)
>     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgramIfPossible(CpsThreadGroup.java:520)
>     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:444)
>     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:97)
>     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:315)
>     at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:279)
>     at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
>     at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
>     at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests
[ https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813797#comment-17813797 ] Zoltan Haindrich commented on HIVE-28013:
-

That's unfortunate; if you look at split-03 in http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5019/1/pipeline/950 it seems like something crashed in the post-processing step:

{code}
[2024-01-22T19:36:53.771Z] ./standalone-metastore/metastore-server/target/surefire-reports/TEST-org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStoreZKBindHost.xml:47233.57: internal error: Huge input lookup
[2024-01-22T19:36:53.771Z] [DEBUG] 2024-01-22 19:14:03.393 [Metastore-Handler-Pool: Thread-84] Persistence
[2024-01-22T19:36:53.771Z] ^
{code}

Not sure what was having a bad day there - `xmlstarlet`? This issue could easily explain why the job started running out of space. I think analyzing and running the post-process script locally on the contents of the tgz could give further details on what went wrong - and could possibly help in creating a fix.
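As a hypothetical starting point for that local analysis: `xmlstarlet` is built on libxml2, and libxml2 refuses inputs with any single text run larger than roughly 10 MB ("Huge input lookup") unless XML_PARSE_HUGE is set. The sketch below (file naming and the threshold are assumptions, and the tag-stripping regex is only a rough heuristic that ignores CDATA) flags surefire reports whose oversized text nodes would trip that limit:

```python
import os
import re

# libxml2's default hard limit on a single text node, in bytes (assumed
# here as the relevant threshold for the "Huge input lookup" error).
XML_MAX_TEXT_LENGTH = 10_000_000

def oversized_text_nodes(path, limit=XML_MAX_TEXT_LENGTH):
    """Return byte lengths of contiguous text runs exceeding `limit`."""
    with open(path, "rb") as fh:
        data = fh.read()
    # Split on tags; each remaining chunk is one contiguous text run.
    runs = re.split(rb"<[^>]*>", data)
    return [len(r) for r in runs if len(r) > limit]

def scan_reports(root, limit=XML_MAX_TEXT_LENGTH):
    """Walk extracted surefire reports, listing files libxml2 would reject."""
    bad = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.startswith("TEST-") and name.endswith(".xml"):
                p = os.path.join(dirpath, name)
                if oversized_text_nodes(p, limit):
                    bad.append(p)
    return bad
```

Running `scan_reports` over the extracted tgz would show whether one test wrote an enormous `<system-out>` section, which would explain both the crash and the disk usage.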
[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests
[ https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809345#comment-17809345 ] Zoltan Haindrich commented on HIVE-28013:
-

I believe the cleanup runs during the daily repo scan.

> I checked the sizes of builds for master from 2021 to now and I didn't see any huge spikes. It was always around 100M as I noted in a comment above.

I think those lines are about the 10 *failed* tests - and on master there aren't supposed to be failed tests :D
[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests
[ https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808666#comment-17808666 ] Zoltan Haindrich commented on HIVE-28013:
-

FYI, the amount of disk used by a build was estimated earlier, and usage matched those estimates until around February 2023; I think there might be some ballast in the builds since then.

| 2021 September | 141G | http://ci.hive.apache.org/job/space-check/100/ |
| 2022 July      | 134G | http://ci.hive.apache.org/job/space-check/400/ |
| 2023 February  | 141G | http://ci.hive.apache.org/job/space-check/600/ |
| 2023 August    | 170G | http://ci.hive.apache.org/job/space-check/800/ |
| 2023 November  | 194G | http://ci.hive.apache.org/job/space-check/900/ |
| 2024 January 19 | 209G | http://ci.hive.apache.org/job/space-check/950/ |
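For a rough sense of the trend in that table: usage was essentially flat until early 2023, then climbed steadily. A back-of-the-envelope calculation over the later measurements (dates approximated to month starts) puts the growth at close to 6 GB per month:

```python
from datetime import date

# The space-check measurements quoted above, from the flat point onward.
measurements = [
    (date(2023, 2, 1), 141),   # size in GB
    (date(2023, 8, 1), 170),
    (date(2023, 11, 1), 194),
    (date(2024, 1, 19), 209),
]

def monthly_growth_gb(samples):
    """Average growth in GB per 30-day month between first and last sample."""
    (d0, s0), (d1, s1) = samples[0], samples[-1]
    months = (d1 - d0).days / 30.0
    return (s1 - s0) / months

print(round(monthly_growth_gb(measurements), 1))  # → 5.8
```

At that rate the jump from 141G to 209G in under a year is consistent with some per-build ballast accumulating, rather than a one-off spike.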
[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests
[ https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808661#comment-17808661 ] Zoltan Haindrich commented on HIVE-28013:
-

By the way, there is a job for checking the disk usage: http://ci.hive.apache.org/job/space-check/
[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests
[ https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808659#comment-17808659 ] Zoltan Haindrich commented on HIVE-28013:
-

15 days is not that much - I would recommend raising it back, and looking around at what the jobs are storing.

I wonder how much these 10 log files cost: https://github.com/apache/hive/blob/9c4eb96f816105560e7d4809f1d608e7eca9e523/Jenkinsfile#L366-L371 - there was this PR: https://github.com/apache/hive/pull/4732

From your notes it seems to me that a build which has those logs has gone from the ~100M a master build usually takes to 1.1G:

1.1G var/jenkins_home/jobs/hive-precommit/branches/PR-4566/builds/8
1.2G var/jenkins_home/jobs/hive-precommit/branches/PR-4566/builds/27

There was a discussion about reverting - but that never landed...
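To find which builds carry that kind of ballast, a small sketch along the lines of `du -s ... | sort -h` can rank build directories by on-disk size (the jobs/.../builds layout is the standard Jenkins home structure; the concrete root path is whatever jenkins_home actually is):

```python
import os

def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for dirpath, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(dirpath, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

def largest_builds(builds_root, top=10):
    """Return (size, name) for the `top` largest build dirs, biggest first."""
    sizes = [
        (dir_size(os.path.join(builds_root, b)), b)
        for b in os.listdir(builds_root)
        if os.path.isdir(os.path.join(builds_root, b))
    ]
    return sorted(sizes, reverse=True)[:top]
```

Pointing `largest_builds` at e.g. `var/jenkins_home/jobs/hive-precommit/branches/PR-4566/builds` would immediately separate the ~100M builds from the 1.1G ones.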
[jira] [Commented] (HIVE-27759) Include docker daemon logs in case of docker issues
[ https://issues.apache.org/jira/browse/HIVE-27759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771465#comment-17771465 ] Zoltan Haindrich commented on HIVE-27759:
-

Hmm... yes - that used to be reported more nicely; I think an `EOFException` is something like a TCP reset happening... I wonder how frequently this is happening?

The other interesting thing is that it happened in 2 different splits, at different times:

split22
* 2023-09-28T12:35:05,587

split7
* 2023-09-28T13:44:09,013
* 2023-09-28T13:44:59,934
* the last one in split7 is even more odd:

{code}
2023-09-28T13:46:17,934 INFO [Listener at 0.0.0.0/44773] externalDB.AbstractExternalDB: Stderr from proc: Unable to find image 'postgres:9.3' locally
docker: Error response from daemon: received unexpected HTTP status: 503 Service Unavailable.
{code}

> Include docker daemon logs in case of docker issues
>
> Key: HIVE-27759
> URL: https://issues.apache.org/jira/browse/HIVE-27759
> Project: Hive
> Issue Type: Sub-task
> Reporter: László Bodor
> Priority: Major
>
> There is a test failure:
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4753/2/tests/
> {code}
> docker: Error response from daemon: Get https://registry-1.docker.io/v2/: EOF.
> See 'docker run --help'.
> {code}
> The root cause of the EOF is unknown; there might be further details somewhere else. Here is a GitHub issue for reference (it's for Mac, but any ideas are welcome): https://github.com/docker/for-mac/issues/6704
[jira] [Commented] (HIVE-27759) Include docker daemon logs in case of docker issues
[ https://issues.apache.org/jira/browse/HIVE-27759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771455#comment-17771455 ] Zoltan Haindrich commented on HIVE-27759:
-

Okay - not sure, but that used to be the case around a year ago, and I don't think it was fixed :)

Whatever causes this issue - if the images used for testing were hosted inside the k8s cluster, that would:
* reduce the dependency on an external service
* reduce external network usage
* and also speed up builds

That's why I think fixing this is not that interesting...
[jira] [Commented] (HIVE-27718) TestMiniTezCliDriver: save application logs for failed tests
[ https://issues.apache.org/jira/browse/HIVE-27718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771402#comment-17771402 ] Zoltan Haindrich commented on HIVE-27718:
-

Don't do this in the main job either - extend the debug job instead.

> TestMiniTezCliDriver: save application logs for failed tests
>
> Key: HIVE-27718
> URL: https://issues.apache.org/jira/browse/HIVE-27718
> Project: Hive
> Issue Type: Sub-task
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
>
> 1. Locate the Tez app logs for a TestMiniTezCliDriver test:
> {code}
> ls -laR itests/qtest/target/tmp/hive/yarn-*/hive-logDir-nm-*
> {code}
> 2. Add them similarly to HIVE-27716.
> It is important to note that the Tez app log files are not specific to a particular test, so we can collect them for the whole module in case of an error.
[jira] [Commented] (HIVE-27759) Include docker daemon logs in case of docker issues
[ https://issues.apache.org/jira/browse/HIVE-27759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771400#comment-17771400 ] Zoltan Haindrich commented on HIVE-27759:
-

The central registry is most likely pushing back because we are hitting its download-count limitations... The fix would be to use a local cache or a separate registry during test execution.
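One common shape for such a local cache - a sketch only, not the actual CI setup; the container name, port, and mirror host are placeholders - is Docker Registry's documented pull-through cache mode, plus a `registry-mirrors` entry on each executor's daemon:

```shell
# Run the standard registry:2 image as a pull-through cache of Docker Hub.
docker run -d --restart=always --name registry-mirror \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  -p 5000:5000 registry:2

# Point each test executor's docker daemon at the mirror by merging this
# into /etc/docker/daemon.json, then restarting dockerd. The hostname is
# a placeholder for wherever the mirror runs inside the cluster.
cat <<'EOF'
{
  "registry-mirrors": ["http://registry-mirror.internal:5000"]
}
EOF
```

With this in place, repeated `postgres:9.3` pulls across splits would hit the in-cluster cache instead of counting against Docker Hub's rate limits.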
[jira] [Commented] (HIVE-27719) Save heapdump in case of OOM
[ https://issues.apache.org/jira/browse/HIVE-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771399#comment-17771399 ] Zoltan Haindrich commented on HIVE-27719:
-

Building something like this into the main job is kind of a recipe for disaster:
* it could possibly take up all the space on the executors
* by exhausting all space it would make all jobs fail - even innocent ones

> Save heapdump in case of OOM
>
> Key: HIVE-27719
> URL: https://issues.apache.org/jira/browse/HIVE-27719
> Project: Hive
> Issue Type: Sub-task
> Reporter: László Bodor
> Assignee: Kokila N
> Priority: Major
>
> This applies to 2 places:
> 1. mini llap tests: 1 single JVM has everything (HS2, AM, tasks)
> 2. mini tez tests: tez app containers
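For reference, the capture itself is just standard HotSpot flags; the risk described above is the dump path filling up, which is why this belongs in a single-test debug job. A sketch (the surefire `argLine` wiring and paths are illustrative, not the actual Hive setup):

```shell
# Standard HotSpot options: write one .hprof on OutOfMemoryError into a
# known directory. In a debug job running a single test, one dump stays
# bounded; in the main job, many multi-GB dumps could fill the executor.
mkdir -p /tmp/heapdumps
mvn test -Dtest=TestExample \
  -DargLine="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdumps"
```

Archiving `/tmp/heapdumps` only when non-empty keeps the cost at zero for passing runs.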
[jira] [Resolved] (HIVE-27758) Precommit: splits are messed up in the folders
[ https://issues.apache.org/jira/browse/HIVE-27758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-27758.
-
Resolution: Not A Problem

> Precommit: splits are messed up in the folders
>
> Key: HIVE-27758
> URL: https://issues.apache.org/jira/browse/HIVE-27758
> Project: Hive
> Issue Type: Sub-task
> Reporter: László Bodor
> Priority: Major
> Attachments: Screenshot 2023-09-29 at 9.15.22.png
>
> E.g. in the screenshot below, the split-07 folder contains logs for other splits - maybe I'm getting something wrong.
> !Screenshot 2023-09-29 at 9.15.22.png!
[jira] [Commented] (HIVE-27719) Save heapdump in case of OOM
[ https://issues.apache.org/jira/browse/HIVE-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771398#comment-17771398 ] Zoltan Haindrich commented on HIVE-27719:
-

Don't do this - only in the debug job, which runs only 1 test.
[jira] [Commented] (HIVE-27717) Improve precommit logging to address flaky tests easier
[ https://issues.apache.org/jira/browse/HIVE-27717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771396#comment-17771396 ] Zoltan Haindrich commented on HIVE-27717:
-

Why would you want to do this? Use the job which reruns the same test multiple times...

> Improve precommit logging to address flaky tests easier
>
> Key: HIVE-27717
> URL: https://issues.apache.org/jira/browse/HIVE-27717
> Project: Hive
> Issue Type: Improvement
> Reporter: László Bodor
> Assignee: László Bodor
> Priority: Major
[jira] [Comment Edited] (HIVE-27758) Precommit: splits are messed up in the folders
[ https://issues.apache.org/jira/browse/HIVE-27758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771395#comment-17771395 ] Zoltan Haindrich edited comment on HIVE-27758 at 10/3/23 10:07 AM:
-

That shouldn't matter much, other than that it might be confusing, because the same word is reused:
* the test case was split up into N parts
* meanwhile the executor used M splits

Edit: I think I've just rephrased what you were already saying :D

> the folders "split-X" belong to the kubernetes pods and "hive.cli.splitY" packages belong to the qsplit profile logic
[jira] [Commented] (HIVE-27758) Precommit: splits are messed up in the folders
[ https://issues.apache.org/jira/browse/HIVE-27758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771395#comment-17771395 ] Zoltan Haindrich commented on HIVE-27758:
-

That shouldn't matter much, other than that it might be confusing, because the same word is reused:
* the test case was split up into N parts
* meanwhile the executor used M splits
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17754910#comment-17754910 ] Zoltan Haindrich commented on HIVE-26806:
-

Made you an admin - I didn't know you weren't one :D

There are 2 ways to upgrade the plugin:
* upgrade it individually on the Jenkins interface
* upgrade by building a new htk-jenkins image (https://hub.docker.com/r/kgyrtkirk/htk-jenkins/tags)

The second could upgrade everything, from the Jenkins version to all plugins - since that hasn't been done for a while, it might be helpful to do it.

Let me know if you need any help with that; I'm also on ASF Slack if you want to chat.

> Precommit tests in CI are timing out after HIVE-26796
>
> Key: HIVE-26806
> URL: https://issues.apache.org/jira/browse/HIVE-26806
> Project: Hive
> Issue Type: Bug
> Components: Testing Infrastructure
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/1506/
> {noformat}
> Cancelling nested steps due to timeout
> 15:22:08 Sending interrupt signal to process
> 15:22:08 Killing processes
> 15:22:09 kill finished with exit code 0
> 15:22:19 Terminated
> 15:22:19 script returned exit code 143
> [Pipeline] }
> [Pipeline] // withEnv
> [Pipeline] }
> 15:22:19 Deleting 1 temporary files
> [Pipeline] // configFileProvider
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (PostProcess)
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] junit
> 15:22:25 Recording test results
> 15:22:32 [Checks API] No suitable checks publisher found.
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] }
> [Pipeline] // container
> [Pipeline] }
> [Pipeline] // node
> [Pipeline] }
> [Pipeline] // timeout
> [Pipeline] }
> [Pipeline] // podTemplate
> [Pipeline] }
> 15:22:32 Failed in branch split-01
> [Pipeline] // parallel
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (Archive)
> [Pipeline] podTemplate
> [Pipeline] {
> [Pipeline] timeout
> 15:22:33 Timeout set to expire in 6 hr 0 min
> {noformat}
[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796
[ https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752828#comment-17752828 ] Zoltan Haindrich commented on HIVE-26806: - seems like there is a helpful feature in the parallel-test-executor https://github.com/jenkinsci/parallel-test-executor-plugin/commit/c9145a5f849f01d6e99c2240eb51d9aaf283ef6a upgrading to >380 could make this go away > Precommit tests in CI are timing out after HIVE-26796 > - > > Key: HIVE-26806 > URL: https://issues.apache.org/jira/browse/HIVE-26806 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/1506/ > {noformat} > ancelling nested steps due to timeout > 15:22:08 Sending interrupt signal to process > 15:22:08 Killing processes > 15:22:09 kill finished with exit code 0 > 15:22:19 Terminated > 15:22:19 script returned exit code 143 > [Pipeline] } > [Pipeline] // withEnv > [Pipeline] } > 15:22:19 Deleting 1 temporary files > [Pipeline] // configFileProvider > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (PostProcess) > [Pipeline] sh > [Pipeline] sh > [Pipeline] sh > [Pipeline] junit > 15:22:25 Recording test results > 15:22:32 [Checks API] No suitable checks publisher found. > [Pipeline] } > [Pipeline] // stage > [Pipeline] } > [Pipeline] // container > [Pipeline] } > [Pipeline] // node > [Pipeline] } > [Pipeline] // timeout > [Pipeline] } > [Pipeline] // podTemplate > [Pipeline] } > 15:22:32 Failed in branch split-01 > [Pipeline] // parallel > [Pipeline] } > [Pipeline] // stage > [Pipeline] stage > [Pipeline] { (Archive) > [Pipeline] podTemplate > [Pipeline] { > [Pipeline] timeout > 15:22:33 Timeout set to expire in 6 hr 0 min > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26605) Remove reviewer pattern
[ https://issues.apache.org/jira/browse/HIVE-26605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-26605. - Resolution: Fixed > Remove reviewer pattern > --- > > Key: HIVE-26605 > URL: https://issues.apache.org/jira/browse/HIVE-26605 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] (HIVE-26978) Stale "Runtime stats" causes poor query planning
[ https://issues.apache.org/jira/browse/HIVE-26978 ] Zoltan Haindrich deleted comment on HIVE-26978: - was (Author: kgyrtkirk): have you restarted the HS2? the runtime stats are cached there; the metastore only stores them > Stale "Runtime stats" causes poor query planning > > > Key: HIVE-26978 > URL: https://issues.apache.org/jira/browse/HIVE-26978 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Rajesh Balamohan >Priority: Major > Labels: performance > Attachments: Screenshot 2023-01-24 at 10.23.16 AM.png > > > * Runtime stats can be stored in hiveserver or in metastore via > "hive.query.reexecution.stats.persist.scope". > * Though the table is dropped and recreated, it ends up showing old stats > via "RUNTIME" stats. Here is an example (note that the table is empty, but > gets datasize and numRows from RUNTIME stats) > * This causes suboptimal plan for "MERGE INTO" queries by creating > CUSTOM_EDGE instead of broadcast edge. > !Screenshot 2023-01-24 at 10.23.16 AM.png|width=2053,height=753! > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26978) Stale "Runtime stats" causes poor query planning
[ https://issues.apache.org/jira/browse/HIVE-26978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726585#comment-17726585 ] Zoltan Haindrich commented on HIVE-26978: - have you restarted the HS2? the runtime stats are cached there; the metastore only stores them > Stale "Runtime stats" causes poor query planning > > > Key: HIVE-26978 > URL: https://issues.apache.org/jira/browse/HIVE-26978 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Rajesh Balamohan >Priority: Major > Labels: performance > Attachments: Screenshot 2023-01-24 at 10.23.16 AM.png > > > * Runtime stats can be stored in hiveserver or in metastore via > "hive.query.reexecution.stats.persist.scope". > * Though the table is dropped and recreated, it ends up showing old stats > via "RUNTIME" stats. Here is an example (note that the table is empty, but > gets datasize and numRows from RUNTIME stats) > * This causes suboptimal plan for "MERGE INTO" queries by creating > CUSTOM_EDGE instead of broadcast edge. > !Screenshot 2023-01-24 at 10.23.16 AM.png|width=2053,height=753! > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-23691) TestMiniLlapLocalCliDriver#testCliDriver[schq_materialized] is flaky
[ https://issues.apache.org/jira/browse/HIVE-23691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-23691: --- Assignee: KIRTI RUGE (was: Zoltan Haindrich) > TestMiniLlapLocalCliDriver#testCliDriver[schq_materialized] is flaky > > > Key: HIVE-23691 > URL: https://issues.apache.org/jira/browse/HIVE-23691 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: KIRTI RUGE >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > http://34.66.156.144:8080/job/hive-precommit/job/master/39/testReport/junit/org.apache.hadoop.hive.cli.split20/TestMiniLlapLocalCliDriver/Testing___split_13___Archive___testCliDriver_schq_materialized_/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26984) Deprecate public HiveConf constructors
[ https://issues.apache.org/jira/browse/HIVE-26984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683356#comment-17683356 ] Zoltan Haindrich commented on HIVE-26984: - I understand that you are after the ultimate comfort...but how often would you need this? you are saying that you want a built-in system which could tell you from "WHERE" conf keys are being altered...if that happens often - I would be interested in the causes of that... but I think you still have alternatives; you could probably: * enable aspectj weaving for the hive-exec module - since we are already shading the module; that's not that big of a change, especially if it could shade+weave at the same time.. * you could build the Traceable part into the main HiveConf object - right now you are returning a different impl if a conf key is set.. ** since this is a conf object - the state of that conf key is a chicken-egg problem: what if for some HiveConf instances you are loading from a different place/etc? and the key is off? you will not see those - but I guess those would be the most interesting ones...when someone just wrote `new HiveConf()`... * as for passing & launching the agent: I'm not sure - but maybe the agent can be placed inside say hive-exec or something; and then tweak the tez launch params (from inside HS2) to add the -agent to the launch cmdline; probably similar for the HS2 startup...but that should be done somewhere in the scripts... ** so I don't see that way as impossible either...have you already explored these paths? > Deprecate public HiveConf constructors > -- > > Key: HIVE-26984 > URL: https://issues.apache.org/jira/browse/HIVE-26984 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > From time to time we investigate configuration object problems that are hard > to investigate. 
We can improve this area, e.g. with HIVE-26985, but first, we > need to introduce a public static factory method to hook into the creation > process. I can see this pattern in other projects as well, like: > HBaseConfiguration. > Creating custom HiveConf subclasses can be useful because putting optional > (say: if else branches or whatever) stuff into the original HiveConf object's > hot codepaths can make it less performant instantly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26984) Deprecate public HiveConf constructors
[ https://issues.apache.org/jira/browse/HIVE-26984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683337#comment-17683337 ] Zoltan Haindrich commented on HIVE-26984: - copied from HIVE-26985: I think you could probably achieve something similar by using AspectJ or Byteman or other java agent stuff; or you could write your own agent: https://stackify.com/what-are-java-agents-and-how-to-profile-with-them/ What's the problem with those approaches? I will leave a -1 here as it makes a significant API change by making the HiveConf constructor protected - which will break all 3rd party extensions which may use `new HiveConf()` > Deprecate public HiveConf constructors > -- > > Key: HIVE-26984 > URL: https://issues.apache.org/jira/browse/HIVE-26984 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > From time to time we investigate configuration object problems that are hard > to investigate. We can improve this area, e.g. with HIVE-26985, but first, we > need to introduce a public static factory method to hook into the creation > process. I can see this pattern in other projects as well, like: > HBaseConfiguration. > Creating custom HiveConf subclasses can be useful because putting optional > (say: if else branches or whatever) stuff into the original HiveConf object's > hot codepaths can make it less performant instantly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26985) Create a trackable hive configuration object
[ https://issues.apache.org/jira/browse/HIVE-26985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683330#comment-17683330 ] Zoltan Haindrich commented on HIVE-26985: - I think you could probably achieve something similar by using AspectJ or Byteman or other java agent stuff; or write your own agent: https://stackify.com/what-are-java-agents-and-how-to-profile-with-them/ I will -1 the current patch because it makes a significant API change by making the HiveConf constructor protected - which will break all 3rd party extensions which may use `new HiveConf()` > Create a trackable hive configuration object > > > Key: HIVE-26985 > URL: https://issues.apache.org/jira/browse/HIVE-26985 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: hive.log > > Time Spent: 10m > Remaining Estimate: 0h > > During configuration-related investigations, I want to be able to easily find > out when and how a certain configuration is changed. I'm looking for an > improvement that simply logs if "hive.a.b.c" is changed from "hello" to > "asdf" or even null and on which thread/codepath. > Not sure if there is already a trackable configuration object in hadoop that > we can reuse, or we need to implement it in hive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
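The behaviour HIVE-26985 asks for - logging that "hive.a.b.c" changed from "hello" to "asdf" and on which codepath - can be prototyped outside Hive. Below is a hypothetical Python analogue (Hive's real HiveConf is Java; the class and method names here are invented purely for illustration of the tracking idea):

```python
import traceback

class TrackedConf(dict):
    """Toy stand-in for a 'trackable' configuration object.

    Every set() records the key, the old value, the new value, and the
    code location that performed the change - the kind of trail the
    ticket asks for during configuration investigations.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.changes = []  # list of (key, old, new, caller) tuples

    def set(self, key, value):
        old = self.get(key)  # None if the key was never set before
        # Capture the frame that called set(); a real implementation
        # would also log the thread name alongside the location.
        caller = traceback.extract_stack(limit=2)[0]
        self.changes.append((key, old, value,
                             f"{caller.filename}:{caller.lineno}"))
        self[key] = value

conf = TrackedConf()
conf.set("hive.a.b.c", "hello")
conf.set("hive.a.b.c", "asdf")
print(conf.changes[1][:3])  # ('hive.a.b.c', 'hello', 'asdf')
```

The same effect is what the comments above propose to obtain in Java via AspectJ/Byteman weaving or a custom java agent instead of changing the HiveConf constructor's visibility.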
[jira] [Commented] (HIVE-26400) Provide docker images for Hive
[ https://issues.apache.org/jira/browse/HIVE-26400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641232#comment-17641232 ] Zoltan Haindrich commented on HIVE-26400: - taking a quick look at the PR I'm not sure about the goal here...note that the following will fire up Hive from public dockerhub images/etc: {code} # To download and start Hive in a docker image docker run --rm -p 1:1 --name hive4 -e HIVE_VERSION=4.0.0-alpha-1 -e TEZ_VERSION=0.10.1 -v hive-dev-box_work:/work kgyrtkirk/hive-dev-box:bazaar # After the previous command is finished (it takes some time to download the image and start Hive) # In another terminal, to connect with BeeLine to Hive docker exec -it hive4 /bin/bash --login -e safe_bl {code} it will also cache downloaded artifacts; and doesn't need to "rebuild" a super-fat image every time a new version is released...of course it might make sense to ditch all those features in case it will be used differently. > Provide docker images for Hive > -- > > Key: HIVE-26400 > URL: https://issues.apache.org/jira/browse/HIVE-26400 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Blocker > Labels: hive-4.0.0-must, pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Make Apache Hive be able to run inside docker container in pseudo-distributed > mode, with MySQL/Derby as its back database, provide the following: > * Quick-start/Debugging/Prepare a test env for Hive; > * Tools to build target image with specified version of Hive and its > dependencies; > * Images can be used as the basis for the Kubernetes operator. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HIVE-26741) Unexpect behavior for insert when table name is like `db.tab`
[ https://issues.apache.org/jira/browse/HIVE-26741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-26741. - Resolution: Duplicate dup of HIVE-16907 > Unexpect behavior for insert when table name is like `db.tab` > - > > Key: HIVE-26741 > URL: https://issues.apache.org/jira/browse/HIVE-26741 > Project: Hive > Issue Type: Bug > Components: SQL >Reporter: luoyuxia >Priority: Major > Attachments: image-2022-11-16-09-57-57-461.png, > image-2022-11-16-10-03-08-559.png, image-2022-11-16-10-08-31-699.png, > image-2022-11-16-10-09-40-766.png > > > Just met a strange problem with the following sql: it'll overwrite the data > instead of appending data. > {code:java} > insert into table `default.t1` values (1, 2){code} > The result is as follows: > !image-2022-11-16-09-57-57-461.png|width=397,height=362! > is it a bug or some other things? -- This message was sent by Atlassian Jira (v8.20.10#820010)
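The surprise in the ticket above comes from quoting: a single pair of backticks produces one quoted identifier, so `` `default.t1` `` (one token containing a dot) and `` `default`.`t1` `` (database qualifier plus table name) are different names. A minimal Python sketch of that distinction - an illustrative toy splitter, not Hive's actual lexer - assuming only that dots inside backticks belong to the identifier:

```python
def split_qualified_name(name: str):
    """Split a SQL qualified name on dots that are OUTSIDE backticks.

    A dot inside a backtick-quoted identifier is part of the identifier,
    which is why `default.t1` is a single name while `default`.`t1`
    qualifies table 't1' with database 'default'.
    """
    parts, buf, quoted = [], [], False
    for ch in name:
        if ch == "`":
            quoted = not quoted      # toggle quoted state, drop the quote
        elif ch == "." and not quoted:
            parts.append("".join(buf))  # dot outside quotes: new part
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts

print(split_qualified_name("`default`.`t1`"))  # ['default', 't1']
print(split_qualified_name("`default.t1`"))    # ['default.t1']
```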
[jira] [Assigned] (HIVE-26605) Remove reviewer pattern
[ https://issues.apache.org/jira/browse/HIVE-26605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-26605: --- > Remove reviewer pattern > --- > > Key: HIVE-26605 > URL: https://issues.apache.org/jira/browse/HIVE-26605 > Project: Hive > Issue Type: Sub-task >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-20607) TxnHandler should use PreparedStatement to execute direct SQL queries.
[ https://issues.apache.org/jira/browse/HIVE-20607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555030#comment-17555030 ] Zoltan Haindrich commented on HIVE-20607: - if it had been on 3.1 - then it would have been released recently; but as of now I don't know about any planned 3.x releases; I guess 4.0 will be next > TxnHandler should use PreparedStatement to execute direct SQL queries. > -- > > Key: HIVE-20607 > URL: https://issues.apache.org/jira/browse/HIVE-20607 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore, Transactions >Affects Versions: 3.1.0, 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, pull-request-available > Fix For: 3.2.0, 4.0.0, 4.0.0-alpha-1 > > Attachments: HIVE-20607.01-branch-3.patch, HIVE-20607.01.patch > > > TxnHandler uses direct SQL queries to operate on Txn related databases/tables > in Hive metastore RDBMS. > Most of the methods are direct calls from the Metastore API which > directly append input string arguments to the SQL string. > We need to use a parameterised PreparedStatement object to set these arguments. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-20607) TxnHandler should use PreparedStatement to execute direct SQL queries.
[ https://issues.apache.org/jira/browse/HIVE-20607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554996#comment-17554996 ] Zoltan Haindrich commented on HIVE-20607: - This patch is on branch-3 via [this commit|https://github.com/apache/hive/commit/09b92d3c864b00df99923f03a843a8179bd874a0]; I don't think we have a 3.2.1 release - or even 3.2.0; I don't see any traces of that; we also don't even have a branch-3.2 right now. 3.2.0 is an [unreleased version|https://issues.apache.org/jira/projects/HIVE/versions/12343559] - I would recommend using 4.0.0-alpha-1 which contains this fix. > TxnHandler should use PreparedStatement to execute direct SQL queries. > -- > > Key: HIVE-20607 > URL: https://issues.apache.org/jira/browse/HIVE-20607 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore, Transactions >Affects Versions: 3.1.0, 4.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, pull-request-available > Fix For: 3.2.0, 4.0.0, 4.0.0-alpha-1 > > Attachments: HIVE-20607.01-branch-3.patch, HIVE-20607.01.patch > > > TxnHandler uses direct SQL queries to operate on Txn related databases/tables > in Hive metastore RDBMS. > Most of the methods are direct calls from the Metastore API which > directly append input string arguments to the SQL string. > We need to use a parameterised PreparedStatement object to set these arguments. -- This message was sent by Atlassian Jira (v8.20.7#820007)
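The fix pattern HIVE-20607 describes - binding values as parameters instead of concatenating them into the SQL text - can be demonstrated with Python's sqlite3 as a stand-in for TxnHandler's JDBC code (illustrative only; the actual patch uses java.sql.PreparedStatement, and the table/column names below are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (txn_id INTEGER, state TEXT)")
conn.execute("INSERT INTO txns VALUES (1, 'open')")

user_input = "open' OR '1'='1"  # hostile value a caller might pass

# Unsafe: string concatenation puts the raw input into the SQL text,
# which is what direct-SQL query building does.
unsafe = "SELECT txn_id FROM txns WHERE state = '" + user_input + "'"
assert conn.execute(unsafe).fetchall() == [(1,)]  # injection succeeded

# Safe: the driver binds the value as a parameter, so the input can
# never change the query's structure (PreparedStatement in JDBC terms).
safe = "SELECT txn_id FROM txns WHERE state = ?"
assert conn.execute(safe, (user_input,)).fetchall() == []  # no match
```

Besides closing the injection hole, parameterized statements also spare the database from re-parsing a new SQL string per call.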
[jira] [Commented] (HIVE-25733) Add check-spelling CI action
[ https://issues.apache.org/jira/browse/HIVE-25733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554542#comment-17554542 ] Zoltan Haindrich commented on HIVE-25733: - thank you [~pvary], the action seems to be checking the PR state and not the one merged with the actual master... thank you very much for fixing it! > Add check-spelling CI action > > > Key: HIVE-25733 > URL: https://issues.apache.org/jira/browse/HIVE-25733 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure >Reporter: Josh Soref >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Add CI to catch spelling errors. See [https://www.check-spelling.dev/] for > information. > Initially this will only check the {{serde}} directory, but the intention is > to expand its coverage as spelling errors in other directories are fixed. > Note that for this to work the action should be made a required check, > otherwise when a typo is added forks from that commit will get complaints. > If a typo is intentional, the action will provide information about how to > add it to {{expect.txt}} such that it will be accepted as an expected item > (i.e. not a typo). > To skip a file/directory entirely, add a matching entry to > {{{}excludes.txt{}}}. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table
[ https://issues.apache.org/jira/browse/HIVE-25879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25879. - Fix Version/s: 4.0.0-alpha-2 Assignee: Miklos Szurap Resolution: Fixed merged into master; Thank you [~mszurap] for fixing this! > MetaStoreDirectSql test query should not query the whole DBS table > -- > > Key: HIVE-25879 > URL: https://issues.apache.org/jira/browse/HIVE-25879 > Project: Hive > Issue Type: Bug >Reporter: Miklos Szurap >Assignee: Miklos Szurap >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The runTestQuery() in the > org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query > {code:java} > select "DB_ID" from "DBS"{code} > to determine whether the direct SQL can be used. > With larger deployments with many (10k+) Hive databases it would be more > efficient to query a small table instead, for example the "VERSION" table > should always have a single row only. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HIVE-26184) COLLECT_SET with GROUP BY is very slow when some keys are highly skewed
[ https://issues.apache.org/jira/browse/HIVE-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-26184. - Fix Version/s: 4.0.0-alpha-2 Resolution: Fixed merged into master. Thank you [~okumin] ! > COLLECT_SET with GROUP BY is very slow when some keys are highly skewed > --- > > Key: HIVE-26184 > URL: https://issues.apache.org/jira/browse/HIVE-26184 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.3.8, 3.1.3 >Reporter: okumin >Assignee: okumin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > I observed some reducers spend 98% of CPU time in invoking > `java.util.HashMap#clear`. > Looking at the details, I found COLLECT_SET reuses a LinkedHashSet and its > `clear` can be quite heavy when a relation has a small number of highly > skewed keys. > > To reproduce the issue, first, we will create rows with a skewed key. > {code:java} > INSERT INTO test_collect_set > SELECT '----' AS key, CAST(UUID() AS VARCHAR) > AS value > FROM table_with_many_rows > LIMIT 10;{code} > Then, we will create many non-skewed rows. > {code:java} > INSERT INTO test_collect_set > SELECT UUID() AS key, UUID() AS value > FROM table_with_many_rows > LIMIT 500;{code} > We can observe the issue when we aggregate values by `key`. > {code:java} > SELECT key, COLLECT_SET(value) FROM group_by_skew GROUP BY key{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
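The pathology behind HIVE-26184 is that java.util.HashMap#clear walks the whole bucket array, so once one skewed key has inflated the reused set's capacity, every subsequent tiny group pays the full-capacity cost on clear(). A toy Python model makes that visible - a hypothetical bucket-array set mirroring HashMap's behaviour (grow, never shrink), not Hive's or Java's actual implementation:

```python
class BucketSet:
    """Toy hash set whose clear() walks every bucket, like
    java.util.HashMap#clear: cost follows capacity, not element count."""

    def __init__(self, capacity=16):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0
        self.buckets_visited = 0  # work counter, for demonstration only

    def add(self, item):
        b = self.buckets[hash(item) % len(self.buckets)]
        if item not in b:
            b.append(item)
            self.size += 1
            if self.size > len(self.buckets) * 3 // 4:  # grow, never shrink
                items = [i for bk in self.buckets for i in bk]
                self.buckets = [[] for _ in range(len(self.buckets) * 2)]
                for i in items:
                    self.buckets[hash(i) % len(self.buckets)].append(i)

    def clear(self):
        for bk in self.buckets:  # every bucket is visited ...
            bk.clear()           # ... even when the set holds one element
        self.buckets_visited += len(self.buckets)
        self.size = 0

s = BucketSet()
for i in range(10_000):  # one highly skewed key inflates the capacity ...
    s.add(i)
s.clear()                # capacity is now 16384 buckets
for _ in range(100):     # ... and every later tiny group pays for it
    s.add("x")
    s.clear()            # walks all 16384 buckets to drop 1 element
print(s.buckets_visited)  # 1654784: ~1.65M buckets walked for ~10k items
```

This is why the skewed key plus many small groups in the ticket's reproduction turns COLLECT_SET's set reuse into a CPU sink.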
[jira] [Commented] (HIVE-26303) [hive on spark] when hive.exec.parallel=true,beeline run sql in script,sometimes app is running but all job finished
[ https://issues.apache.org/jira/browse/HIVE-26303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552177#comment-17552177 ] Zoltan Haindrich commented on HIVE-26303: - please don't turn on hive.exec.parallel for any version of hive - especially for older ones as it may cause issues... > [hive on spark] when hive.exec.parallel=true,beeline run sql in > script,sometimes app is running but all job finished > > > Key: HIVE-26303 > URL: https://issues.apache.org/jira/browse/HIVE-26303 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 2.3.7 >Reporter: lkl >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HIVE-26268) Upgrade snappy-java to 1.1.8.4
[ https://issues.apache.org/jira/browse/HIVE-26268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-26268. - Fix Version/s: 4.0.0-alpha-2 Resolution: Fixed merged into master. Thank you [~slachiewicz]! > Upgrade snappy-java to 1.1.8.4 > -- > > Key: HIVE-26268 > URL: https://issues.apache.org/jira/browse/HIVE-26268 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Sylwester Lachiewicz >Assignee: Sylwester Lachiewicz >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Upgrade to get benefits from performance improvements and bug fixes. > Also to support Apple Silicon (M1, Mac-aarch64) > [https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-1183-2021-01-20] > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HIVE-25635) Upgrade Thrift to 0.16.0
[ https://issues.apache.org/jira/browse/HIVE-25635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25635. - Fix Version/s: 4.0.0-alpha-2 Resolution: Fixed merged into master. Thank you [~slachiewicz]! > Upgrade Thrift to 0.16.0 > > > Key: HIVE-25635 > URL: https://issues.apache.org/jira/browse/HIVE-25635 > Project: Hive > Issue Type: Improvement >Reporter: Yuming Wang >Assignee: Sylwester Lachiewicz >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > To addresses CVEs: > ||Component Name||Component Version Name||Vulnerability||Fixed version|| > |Apache > Thrift|0.11.0-4.|[CVE-2020-13949|https://github.com/advisories/GHSA-g2fg-mr77-6vrm]|0.14.1| -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Assigned] (HIVE-25635) Upgrade Thrift to 0.16.0
[ https://issues.apache.org/jira/browse/HIVE-25635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25635: --- Assignee: Sylwester Lachiewicz > Upgrade Thrift to 0.16.0 > > > Key: HIVE-25635 > URL: https://issues.apache.org/jira/browse/HIVE-25635 > Project: Hive > Issue Type: Improvement >Reporter: Yuming Wang >Assignee: Sylwester Lachiewicz >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > To addresses CVEs: > ||Component Name||Component Version Name||Vulnerability||Fixed version|| > |Apache > Thrift|0.11.0-4.|[CVE-2020-13949|https://github.com/advisories/GHSA-g2fg-mr77-6vrm]|0.14.1| -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HIVE-26148) Keep MetaStoreFilterHook interface compatibility after introducing catalogs
[ https://issues.apache.org/jira/browse/HIVE-26148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-26148: Fix Version/s: 4.0.0-alpha-2 (was: 4.0.0-alpha-1) > Keep MetaStoreFilterHook interface compatibility after introducing catalogs > --- > > Key: HIVE-26148 > URL: https://issues.apache.org/jira/browse/HIVE-26148 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.0.0 >Reporter: Wechar >Assignee: Wechar >Priority: Minor > Labels: pull-request-available > Fix Version/s: 4.0.0-alpha-2 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Hive 3.0 introduced the catalog concept; when we upgrade the hive dependency version > from 2.3 to 3.x, we found some interfaces of *MetaStoreFilterHook* are not > compatible: > {code:bash} > git show ba8a99e115 -- > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreFilterHook.java > {code} > {code:bash} > --- > a/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreFilterHook.java > +++ > b/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreFilterHook.java >/** > * Filter given list of tables > - * @param dbName > - * @param tableList > + * @param catName catalog name > + * @param dbName database name > + * @param tableList list of table returned by the metastore > * @return List of filtered table names > */ > - public List filterTableNames(String dbName, List > tableList) throws MetaException; > + List filterTableNames(String catName, String dbName, List > tableList) > + throws MetaException; > {code} > We can retain the previous interfaces and use the default catalog to > implement them. -- This message was sent by Atlassian Jira (v8.20.7#820007)
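The compatibility approach proposed in HIVE-26148 is a standard pattern: keep the pre-catalog method signature and have it delegate to the catalog-aware one with the default catalog filled in. A hypothetical Python sketch of the idea (Hive's actual code is Java, where the old signature would be kept as an overload/default interface method; all names below beyond filterTableNames are invented for illustration):

```python
DEFAULT_CATALOG = "hive"  # stand-in for Hive's default catalog name

class MetaStoreFilterHook:
    def filter_table_names(self, cat_name, db_name, table_list):
        """Catalog-aware API introduced with Hive 3 catalogs."""
        raise NotImplementedError

    def filter_table_names_compat(self, db_name, table_list):
        """Pre-catalog signature kept for old callers: it delegates to
        the new method with the default catalog, so 2.x-style code
        keeps working unchanged."""
        return self.filter_table_names(DEFAULT_CATALOG, db_name, table_list)

class DenyTmpTables(MetaStoreFilterHook):
    """Example hook: hide tables with a tmp_ prefix."""
    def filter_table_names(self, cat_name, db_name, table_list):
        return [t for t in table_list if not t.startswith("tmp_")]

hook = DenyTmpTables()
# Old-style and new-style calls agree for the default catalog:
print(hook.filter_table_names_compat("db1", ["t1", "tmp_t2"]))  # ['t1']
print(hook.filter_table_names("hive", "db1", ["t1", "tmp_t2"]))  # ['t1']
```

Implementors override only the catalog-aware method; the compat method never needs overriding, which is what makes the interface change source-compatible.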
[jira] [Commented] (HIVE-26224) Add support for ESRI GeoSpatial SERDE formats
[ https://issues.apache.org/jira/browse/HIVE-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544427#comment-17544427 ] Zoltan Haindrich commented on HIVE-26224: - this PR seems to have copied some sources from a different project - why have we done that? I think we have enough problems already...why FLATTEN some 3rd party code directly into the QL module... https://github.com/Esri/spatial-framework-for-hadoop/blob/master/json/src/main/java/com/esri/json/hadoop/UnenclosedGeoJsonRecordReader.java > Add support for ESRI GeoSpatial SERDE formats > - > > Key: HIVE-26224 > URL: https://issues.apache.org/jira/browse/HIVE-26224 > Project: Hive > Issue Type: Sub-task >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 50m > Remaining Estimate: 0h > > Add support to use ESRI geospatial serde formats -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table
[ https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544347#comment-17544347 ] Zoltan Haindrich commented on HIVE-26158: - [~sanguines] I think I've missed your comment; you could probably take a look at tickets marked with the newbie label: https://issues.apache.org/jira/browse/HIVE-25711?jql=project%20%3D%20Hive%20and%20labels%20%3D%20newbie%20%20ORDER%20BY%20id%20DESC let me know if you need more help - you could also reach out to us on the dev-list or on #hive in the asf slack (we don't use that channel for anything...) > TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after > rename table > -- > > Key: HIVE-26158 > URL: https://issues.apache.org/jira/browse/HIVE-26158 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2 >Reporter: tanghui >Assignee: Zoltan Haindrich >Priority: Major > Labels: metastore_translator, pull-request-available > Fix For: 4.0.0, 4.0.0-alpha-2 > > Time Spent: 1h > Remaining Estimate: 0h > > After the patch is updated, the partition table location and hdfs data > directory are displayed normally, but the partition location of the table in > the SDS in the Hive metabase is still displayed as the location of the old > table, resulting in no data in the query partition. > > in beeline: > > set hive.create.as.external.legacy=true; > CREATE TABLE part_test( > c1 string > ,c2 string > )PARTITIONED BY (dat string) > insert into part_test values ("11","th","20220101") > insert into part_test values ("22","th","20220102") > alter table part_test rename to part_test11; > --this result is null. 
> select * from part_test11 where dat="20220101"; > ||part_test.c1||part_test.c2||part_test.dat|| > | | | | > - > SDS in the Hive metabase: > select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND > TBLS.TBL_ID=SDS.CD_ID; > --- > |*LOCATION*| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102| > --- > > We need to modify the partition location of the table in SDS to ensure that > the query results are normal -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HIVE-25285) Retire HiveProjectJoinTransposeRule
[ https://issues.apache.org/jira/browse/HIVE-25285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-25285: Labels: newbie (was: ) > Retire HiveProjectJoinTransposeRule > --- > > Key: HIVE-25285 > URL: https://issues.apache.org/jira/browse/HIVE-25285 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Priority: Major > Labels: newbie > > we don't necessarily need our own rule anymore - a plain > ProjectJoinTransposeRule could probably work -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-26266) Column information is not present in lineage for CTAS when custom location/translated location is used
[ https://issues.apache.org/jira/browse/HIVE-26266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544218#comment-17544218 ] Zoltan Haindrich commented on HIVE-26266: - to update q.out-s you have to run the tests with `-Dtest.output.overwrite` > Column information is not present in lineage for CTAS when custom > location/translated location is used > -- > > Key: HIVE-26266 > URL: https://issues.apache.org/jira/browse/HIVE-26266 > Project: Hive > Issue Type: Bug >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: metastore_translator, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently CTAS considers only default table location when mapping the > location to the FileSinkOperator. This will miss the cases when a custom > location is specified as well as when the table has a translated location. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HIVE-26266) Column information is not present in lineage for CTAS when custom location/translated location is used
[ https://issues.apache.org/jira/browse/HIVE-26266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-26266: Labels: metastore_translator (was: ) > Column information is not present in lineage for CTAS when custom > location/translated location is used > -- > > Key: HIVE-26266 > URL: https://issues.apache.org/jira/browse/HIVE-26266 > Project: Hive > Issue Type: Bug >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: metastore_translator > > Currently CTAS considers only default table location when mapping the > location to the FileSinkOperator. This will miss the cases when a custom > location is specified as well as when the table has a translated location. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-26084) Oracle metastore init tests are flaky
[ https://issues.apache.org/jira/browse/HIVE-26084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542011#comment-17542011 ] Zoltan Haindrich commented on HIVE-26084: - hmm..last time I checked I've only seen oracle-11g in xe - I'm so happy to see 18 and 21 :D > Oracle metastore init tests are flaky > - > > Key: HIVE-26084 > URL: https://issues.apache.org/jira/browse/HIVE-26084 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Peter Vary >Assignee: Stamatis Zampetakis >Priority: Major > > After HIVE-26022 we started to run the oracle metastore init tests, but they > seem to be flaky. > I see this issue quite often: > http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3147/1/pipeline/551 > We might have to increase the timeout, or use another oracle image for more > consistent tests. > The error in the logs for future reference > {code} > [2022-03-28T14:10:07.804Z] + echo 127.0.0.1 dev_oracle > [2022-03-28T14:10:07.804Z] + sudo tee -a /etc/hosts > [2022-03-28T14:10:07.804Z] 127.0.0.1 dev_oracle > [2022-03-28T14:10:07.804Z] + . 
/etc/profile.d/confs.sh > [2022-03-28T14:10:07.804Z] ++ export MAVEN_OPTS=-Xmx2g > [2022-03-28T14:10:07.804Z] ++ MAVEN_OPTS=-Xmx2g > [2022-03-28T14:10:07.804Z] ++ export HADOOP_CONF_DIR=/etc/hadoop > [2022-03-28T14:10:07.804Z] ++ HADOOP_CONF_DIR=/etc/hadoop > [2022-03-28T14:10:07.804Z] ++ export HADOOP_LOG_DIR=/data/log > [2022-03-28T14:10:07.804Z] ++ HADOOP_LOG_DIR=/data/log > [2022-03-28T14:10:07.804Z] ++ export > 'HADOOP_CLASSPATH=/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*' > [2022-03-28T14:10:07.804Z] ++ > HADOOP_CLASSPATH='/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*' > [2022-03-28T14:10:07.804Z] ++ export HIVE_CONF_DIR=/etc/hive/ > [2022-03-28T14:10:07.804Z] ++ HIVE_CONF_DIR=/etc/hive/ > [2022-03-28T14:10:07.804Z] ++ export > PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin > [2022-03-28T14:10:07.804Z] ++ > PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin > [2022-03-28T14:10:07.804Z] ++ . /etc/profile.d/java.sh > [2022-03-28T14:10:07.804Z] +++ export JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/ > [2022-03-28T14:10:07.804Z] +++ JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/ > [2022-03-28T14:10:07.804Z] + sw hive-dev > /home/jenkins/agent/workspace/hive-precommit_PR-3147 > [2022-03-28T14:10:07.804Z] @ activating: > /home/jenkins/agent/workspace/hive-precommit_PR-3147/packaging/target/apache-hive-4.0.0-alpha-1-SNAPSHOT-bin/apache-hive-4.0.0-alpha-1-SNAPSHOT-bin/ > for hive > [2022-03-28T14:10:07.804Z] + ping -c2 dev_oracle > [2022-03-28T14:10:07.804Z] PING dev_oracle (127.0.0.1) 56(84) bytes of data. 
> [2022-03-28T14:10:07.804Z] 64 bytes from localhost (127.0.0.1): icmp_seq=1 > ttl=64 time=0.082 ms > [2022-03-28T14:10:08.795Z] 64 bytes from localhost (127.0.0.1): icmp_seq=2 > ttl=64 time=0.087 ms > [2022-03-28T14:10:08.795Z] > [2022-03-28T14:10:08.795Z] --- dev_oracle ping statistics --- > [2022-03-28T14:10:08.795Z] 2 packets transmitted, 2 received, 0% packet loss, > time 51ms > [2022-03-28T14:10:08.795Z] rtt min/avg/max/mdev = 0.082/0.084/0.087/0.009 ms > [2022-03-28T14:10:08.795Z] + export DOCKER_NETWORK=host > [2022-03-28T14:10:08.795Z] + DOCKER_NETWORK=host > [2022-03-28T14:10:08.795Z] + export DBNAME=metastore > [2022-03-28T14:10:08.795Z] + DBNAME=metastore > [2022-03-28T14:10:08.795Z] + reinit_metastore oracle > [2022-03-28T14:10:08.795Z] @ initializing: oracle > [2022-03-28T14:10:08.795Z] metastore database name: metastore > [2022-03-28T14:10:09.135Z] @ starting dev_oracle... > [2022-03-28T14:10:09.445Z] Unable to find image > 'quay.io/maksymbilenko/oracle-12c:latest' locally > [2022-03-28T14:10:10.407Z] latest: Pulling from maksymbilenko/oracle-12c > [2022-03-28T14:10:10.407Z] 8ba884070f61: Pulling fs layer > [2022-03-28T14:10:10.407Z] ef9513b81046: Pulling fs layer > [2022-03-28T14:10:10.407Z] 6f1de349e202: Pulling fs layer > [2022-03-28T14:10:10.407Z] 5376ebfa0fa3: Pulling fs layer > [2022-03-28T14:10:10.407Z] 5f632c3633d2: Pulling fs layer > [2022-03-28T14:10:10.407Z] 3e74293031d2: Pulling fs layer > [2022-03-28T14:10:10.407Z] 5376ebfa0fa3: Waiting > [2022-03-28T14:10:10.407Z] 5f632c3633d2: Waiting > [2022-03-28T14:10:10.407Z] 3e74293031d2: Waiting > [2022-03-28T14:10:10.407Z] 6f1de349e202: Download complete > [2022-03-28T14:10:11.365Z] ef9513b81046: Download complete > [2022-03-28T14:10:11.365Z]
[jira] [Commented] (HIVE-26263) Mysql metastore init tests are flaky
[ https://issues.apache.org/jira/browse/HIVE-26263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542008#comment-17542008 ] Zoltan Haindrich commented on HIVE-26263: - I've [disabled the mysql/metastore test for now|https://github.com/apache/hive/commit/34b24d55ade393673424f077b69add43bad9f731] it's strange that this happens so frequently and only for this database type... > Mysql metastore init tests are flaky > > > Key: HIVE-26263 > URL: https://issues.apache.org/jira/browse/HIVE-26263 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Zoltan Haindrich >Priority: Major > > Similarly to HIVE-26084 (Oracle tests), also Mysql tests are failing > similarly. > In both cases we use _:latest_ as docker image version, which is probably not > ideal. > Reporting the error for future reference: > {noformat} > [2022-05-24T14:07:52.127Z] + sudo tee -a /etc/hosts > [2022-05-24T14:07:52.127Z] + echo 127.0.0.1 dev_mysql > [2022-05-24T14:07:52.127Z] 127.0.0.1 dev_mysql > [2022-05-24T14:07:52.127Z] + . 
/etc/profile.d/confs.sh > [2022-05-24T14:07:52.127Z] ++ export MAVEN_OPTS=-Xmx2g > [2022-05-24T14:07:52.127Z] ++ MAVEN_OPTS=-Xmx2g > [2022-05-24T14:07:52.127Z] ++ export HADOOP_CONF_DIR=/etc/hadoop > [2022-05-24T14:07:52.127Z] ++ HADOOP_CONF_DIR=/etc/hadoop > [2022-05-24T14:07:52.127Z] ++ export HADOOP_LOG_DIR=/data/log > [2022-05-24T14:07:52.127Z] ++ HADOOP_LOG_DIR=/data/log > [2022-05-24T14:07:52.127Z] ++ export > 'HADOOP_CLASSPATH=/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*' > [2022-05-24T14:07:52.127Z] ++ > HADOOP_CLASSPATH='/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*' > [2022-05-24T14:07:52.127Z] ++ export HIVE_CONF_DIR=/etc/hive/ > [2022-05-24T14:07:52.127Z] ++ HIVE_CONF_DIR=/etc/hive/ > [2022-05-24T14:07:52.127Z] ++ export > PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin > [2022-05-24T14:07:52.127Z] ++ > PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin > [2022-05-24T14:07:52.127Z] ++ . /etc/profile.d/java.sh > [2022-05-24T14:07:52.127Z] +++ export JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/ > [2022-05-24T14:07:52.127Z] +++ JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/ > [2022-05-24T14:07:52.127Z] + sw hive-dev > /home/jenkins/agent/workspace/hive-precommit_PR-3317 > [2022-05-24T14:07:52.127Z] @ activating: > /home/jenkins/agent/workspace/hive-precommit_PR-3317/packaging/target/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/ > for hive > [2022-05-24T14:07:52.127Z] + ping -c2 dev_mysql > [2022-05-24T14:07:52.127Z] PING dev_mysql (127.0.0.1) 56(84) bytes of data. 
> [2022-05-24T14:07:52.127Z] 64 bytes from localhost (127.0.0.1): icmp_seq=1 > ttl=64 time=0.114 ms > [2022-05-24T14:07:53.107Z] 64 bytes from localhost (127.0.0.1): icmp_seq=2 > ttl=64 time=0.123 ms > [2022-05-24T14:07:53.107Z] > [2022-05-24T14:07:53.107Z] --- dev_mysql ping statistics --- > [2022-05-24T14:07:53.107Z] 2 packets transmitted, 2 received, 0% packet loss, > time 49ms > [2022-05-24T14:07:53.107Z] rtt min/avg/max/mdev = 0.114/0.118/0.123/0.011 ms > [2022-05-24T14:07:53.107Z] + export DOCKER_NETWORK=host > [2022-05-24T14:07:53.107Z] + DOCKER_NETWORK=host > [2022-05-24T14:07:53.107Z] + export DBNAME=metastore > [2022-05-24T14:07:53.107Z] + DBNAME=metastore > [2022-05-24T14:07:53.107Z] + reinit_metastore mysql > [2022-05-24T14:07:53.107Z] @ initializing: mysql > [2022-05-24T14:07:53.107Z] metastore database name: metastore > [2022-05-24T14:07:53.381Z] @ starting dev_mysql... > [2022-05-24T14:07:53.382Z] Unable to find image 'mariadb:latest' locally > [2022-05-24T14:07:54.354Z] latest: Pulling from library/mariadb > [2022-05-24T14:07:54.354Z] 125a6e411906: Pulling fs layer > [2022-05-24T14:07:54.354Z] a28b55cc656d: Pulling fs layer > [2022-05-24T14:07:54.354Z] f2325f4e25a1: Pulling fs layer > [2022-05-24T14:07:54.354Z] c6c2d09f748d: Pulling fs layer > [2022-05-24T14:07:54.354Z] af2b4ed853d2: Pulling fs layer > [2022-05-24T14:07:54.354Z] 8394ac6b401e: Pulling fs layer > [2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Pulling fs layer > [2022-05-24T14:07:54.354Z] 1b11b2e20899: Pulling fs layer > [2022-05-24T14:07:54.354Z] 3d35790a91d9: Pulling fs layer > [2022-05-24T14:07:54.354Z] 5e73c7793365: Pulling fs layer > [2022-05-24T14:07:54.354Z] 3d34b9f14ede: Pulling fs layer > [2022-05-24T14:07:54.354Z] c6c2d09f748d: Waiting > [2022-05-24T14:07:54.354Z] 8
[jira] [Assigned] (HIVE-26263) Mysql metastore init tests are flaky
[ https://issues.apache.org/jira/browse/HIVE-26263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-26263: --- Assignee: Zoltan Haindrich > Mysql metastore init tests are flaky > > > Key: HIVE-26263 > URL: https://issues.apache.org/jira/browse/HIVE-26263 > Project: Hive > Issue Type: Test > Components: Testing Infrastructure >Affects Versions: 4.0.0-alpha-2 >Reporter: Alessandro Solimando >Assignee: Zoltan Haindrich >Priority: Major > > Similarly to HIVE-26084 (Oracle tests), also Mysql tests are failing > similarly. > In both cases we use _:latest_ as docker image version, which is probably not > ideal. > Reporting the error for future reference: > {noformat} > [2022-05-24T14:07:52.127Z] + sudo tee -a /etc/hosts > [2022-05-24T14:07:52.127Z] + echo 127.0.0.1 dev_mysql > [2022-05-24T14:07:52.127Z] 127.0.0.1 dev_mysql > [2022-05-24T14:07:52.127Z] + . /etc/profile.d/confs.sh > [2022-05-24T14:07:52.127Z] ++ export MAVEN_OPTS=-Xmx2g > [2022-05-24T14:07:52.127Z] ++ MAVEN_OPTS=-Xmx2g > [2022-05-24T14:07:52.127Z] ++ export HADOOP_CONF_DIR=/etc/hadoop > [2022-05-24T14:07:52.127Z] ++ HADOOP_CONF_DIR=/etc/hadoop > [2022-05-24T14:07:52.127Z] ++ export HADOOP_LOG_DIR=/data/log > [2022-05-24T14:07:52.127Z] ++ HADOOP_LOG_DIR=/data/log > [2022-05-24T14:07:52.127Z] ++ export > 'HADOOP_CLASSPATH=/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*' > [2022-05-24T14:07:52.127Z] ++ > HADOOP_CLASSPATH='/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*' > [2022-05-24T14:07:52.127Z] ++ export HIVE_CONF_DIR=/etc/hive/ > [2022-05-24T14:07:52.127Z] ++ HIVE_CONF_DIR=/etc/hive/ > [2022-05-24T14:07:52.127Z] ++ export > PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin > [2022-05-24T14:07:52.127Z] ++ > 
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin > [2022-05-24T14:07:52.127Z] ++ . /etc/profile.d/java.sh > [2022-05-24T14:07:52.127Z] +++ export JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/ > [2022-05-24T14:07:52.127Z] +++ JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/ > [2022-05-24T14:07:52.127Z] + sw hive-dev > /home/jenkins/agent/workspace/hive-precommit_PR-3317 > [2022-05-24T14:07:52.127Z] @ activating: > /home/jenkins/agent/workspace/hive-precommit_PR-3317/packaging/target/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/ > for hive > [2022-05-24T14:07:52.127Z] + ping -c2 dev_mysql > [2022-05-24T14:07:52.127Z] PING dev_mysql (127.0.0.1) 56(84) bytes of data. > [2022-05-24T14:07:52.127Z] 64 bytes from localhost (127.0.0.1): icmp_seq=1 > ttl=64 time=0.114 ms > [2022-05-24T14:07:53.107Z] 64 bytes from localhost (127.0.0.1): icmp_seq=2 > ttl=64 time=0.123 ms > [2022-05-24T14:07:53.107Z] > [2022-05-24T14:07:53.107Z] --- dev_mysql ping statistics --- > [2022-05-24T14:07:53.107Z] 2 packets transmitted, 2 received, 0% packet loss, > time 49ms > [2022-05-24T14:07:53.107Z] rtt min/avg/max/mdev = 0.114/0.118/0.123/0.011 ms > [2022-05-24T14:07:53.107Z] + export DOCKER_NETWORK=host > [2022-05-24T14:07:53.107Z] + DOCKER_NETWORK=host > [2022-05-24T14:07:53.107Z] + export DBNAME=metastore > [2022-05-24T14:07:53.107Z] + DBNAME=metastore > [2022-05-24T14:07:53.107Z] + reinit_metastore mysql > [2022-05-24T14:07:53.107Z] @ initializing: mysql > [2022-05-24T14:07:53.107Z] metastore database name: metastore > [2022-05-24T14:07:53.381Z] @ starting dev_mysql... 
> [2022-05-24T14:07:53.382Z] Unable to find image 'mariadb:latest' locally > [2022-05-24T14:07:54.354Z] latest: Pulling from library/mariadb > [2022-05-24T14:07:54.354Z] 125a6e411906: Pulling fs layer > [2022-05-24T14:07:54.354Z] a28b55cc656d: Pulling fs layer > [2022-05-24T14:07:54.354Z] f2325f4e25a1: Pulling fs layer > [2022-05-24T14:07:54.354Z] c6c2d09f748d: Pulling fs layer > [2022-05-24T14:07:54.354Z] af2b4ed853d2: Pulling fs layer > [2022-05-24T14:07:54.354Z] 8394ac6b401e: Pulling fs layer > [2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Pulling fs layer > [2022-05-24T14:07:54.354Z] 1b11b2e20899: Pulling fs layer > [2022-05-24T14:07:54.354Z] 3d35790a91d9: Pulling fs layer > [2022-05-24T14:07:54.354Z] 5e73c7793365: Pulling fs layer > [2022-05-24T14:07:54.354Z] 3d34b9f14ede: Pulling fs layer > [2022-05-24T14:07:54.354Z] c6c2d09f748d: Waiting > [2022-05-24T14:07:54.354Z] 8394ac6b401e: Waiting > [2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Waiting > [2022-05-24T14:07:54.354Z] 3d35790a91d9: Waiting > [2022-05-24T14:07:54.354Z] 5e73c7793365: Waiting > [2022-05-24T14:07:54.354Z] 3d34b9f14ede: Waiting > [20
[jira] [Commented] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table
[ https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534954#comment-17534954 ] Zoltan Haindrich commented on HIVE-26158: - [~sanguines] there were some customers reporting this happening for them - and I was already working on a patch...and honestly I could have noticed how this could be done more precisely in the previous ticket... ($#%) let me know if you are looking for some tickets to work on; I could try to find one > TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after > rename table > -- > > Key: HIVE-26158 > URL: https://issues.apache.org/jira/browse/HIVE-26158 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2 >Reporter: tanghui >Assignee: Zoltan Haindrich >Priority: Major > Labels: metastore_translator, pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > After the patch is updated, the partition table location and hdfs data > directory are displayed normally, but the partition location of the table in > the SDS in the Hive metabase is still displayed as the location of the old > table, resulting in no data in the query partition. > > in beeline: > > set hive.create.as.external.legacy=true; > CREATE TABLE part_test( > c1 string > ,c2 string > )PARTITIONED BY (dat string) > insert into part_test values ("11","th","20220101") > insert into part_test values ("22","th","20220102") > alter table part_test rename to part_test11; > --this result is null. 
> select * from part_test11 where dat="20220101"; > ||part_test.c1||part_test.c2||part_test.dat|| > | | | | > - > SDS in the Hive metabase: > select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND > TBLS.TBL_ID=SDS.CD_ID; > --- > |*LOCATION*| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102| > --- > > We need to modify the partition location of the table in SDS to ensure that > the query results are normal -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table
[ https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-26158. - Fix Version/s: 4.0.0 4.0.0-alpha-2 Resolution: Fixed > TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after > rename table > -- > > Key: HIVE-26158 > URL: https://issues.apache.org/jira/browse/HIVE-26158 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2 >Reporter: tanghui >Assignee: Zoltan Haindrich >Priority: Major > Labels: metastore_translator, pull-request-available > Fix For: 4.0.0, 4.0.0-alpha-2 > > Time Spent: 1h > Remaining Estimate: 0h > > After the patch is updated, the partition table location and hdfs data > directory are displayed normally, but the partition location of the table in > the SDS in the Hive metabase is still displayed as the location of the old > table, resulting in no data in the query partition. > > in beeline: > > set hive.create.as.external.legacy=true; > CREATE TABLE part_test( > c1 string > ,c2 string > )PARTITIONED BY (dat string) > insert into part_test values ("11","th","20220101") > insert into part_test values ("22","th","20220102") > alter table part_test rename to part_test11; > --this result is null. > select * from part_test11 where dat="20220101"; > ||part_test.c1||part_test.c2||part_test.dat|| > | | | | > - > SDS in the Hive metabase: > select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND > TBLS.TBL_ID=SDS.CD_ID; > --- > |*LOCATION*| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102| > --- > > We need to modify the partition location of the table in SDS to ensure that > the query results are normal -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table
[ https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534955#comment-17534955 ] Zoltan Haindrich commented on HIVE-26158: - merged into master; Thank you Saihemanth Gantasala for reviewing the changes! > TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after > rename table > -- > > Key: HIVE-26158 > URL: https://issues.apache.org/jira/browse/HIVE-26158 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2 >Reporter: tanghui >Assignee: Zoltan Haindrich >Priority: Major > Labels: metastore_translator, pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > After the patch is updated, the partition table location and hdfs data > directory are displayed normally, but the partition location of the table in > the SDS in the Hive metabase is still displayed as the location of the old > table, resulting in no data in the query partition. > > in beeline: > > set hive.create.as.external.legacy=true; > CREATE TABLE part_test( > c1 string > ,c2 string > )PARTITIONED BY (dat string) > insert into part_test values ("11","th","20220101") > insert into part_test values ("22","th","20220102") > alter table part_test rename to part_test11; > --this result is null. > select * from part_test11 where dat="20220101"; > ||part_test.c1||part_test.c2||part_test.dat|| > | | | | > - > SDS in the Hive metabase: > select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND > TBLS.TBL_ID=SDS.CD_ID; > --- > |*LOCATION*| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102| > --- > > We need to modify the partition location of the table in SDS to ensure that > the query results are normal -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-26220) Shade & relocate dependencies in hive-exec to avoid conflicting with downstream projects
[ https://issues.apache.org/jira/browse/HIVE-26220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534713#comment-17534713 ] Zoltan Haindrich commented on HIVE-26220: - [~csun] have you tried using the current hive-exec from master? the shading was improved some time ago; especially in HIVE-22126. Probably the best would be to provide some use cases for the usage of the artifact - preferably with test cases; so that we don't break it again in the future...but I admit - this might not be a good ask... Correct me if I'm wrong but it sounds a bit unfair to push the task of evaluating and upgrading other projects to run with the next version - just because they might upgrade to it (in my mind fixing this blocker task would mean that). So I think the best middle ground could be to provide support for projects which do "their part" first - and they could link some development branches which are already using a 4.0.0-alpha-X release. > Shade & relocate dependencies in hive-exec to avoid conflicting with > downstream projects > > > Key: HIVE-26220 > URL: https://issues.apache.org/jira/browse/HIVE-26220 > Project: Hive > Issue Type: Improvement >Affects Versions: 4.0.0 >Reporter: Chao Sun >Priority: Blocker > > Currently projects like Spark, Trino/Presto, Iceberg, etc, are depending on > {{hive-exec:core}} which was removed in HIVE-25531. The reason these projects > use {{hive-exec:core}} is because they have the flexibility to exclude, shade > & relocate dependencies in {{hive-exec}} that conflict with the ones they > brought in by themselves. However, with {{hive-exec}} this is no longer > possible, since it is a fat jar that shades those dependencies but does not > relocate many of them. > In order for the downstream projects to consume {{hive-exec}}, we will need > to make sure all the dependencies in {{hive-exec}} are properly shaded and > relocated, so they won't cause conflicts with those from the downstream. 
-- This message was sent by Atlassian Jira (v8.20.7#820007)
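For readers following the shading discussion above: relocation of a bundled dependency is expressed through the maven-shade-plugin. The stanza below is only an illustrative sketch (the guava package names are an assumed example, not Hive's actual packaging configuration):

```xml
<!-- Hypothetical maven-shade-plugin relocation; pattern/shadedPattern
     values are illustrative, not Hive's real configuration. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- rewrite bytecode references from com.google.common.* ... -->
            <pattern>com.google.common</pattern>
            <!-- ...to a Hive-private package, so a downstream project's own
                 guava version cannot clash with the one bundled in the fat jar -->
            <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Shading without such a `<relocation>` entry copies the classes into the fat jar under their original package names, which is exactly the conflict the ticket describes.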
[jira] [Commented] (HIVE-25976) Cleaner may remove files being accessed from a fetch-task-converted reader
[ https://issues.apache.org/jira/browse/HIVE-25976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531212#comment-17531212 ] Zoltan Haindrich commented on HIVE-25976: - attached a unit test which reproduces the behaviour > Cleaner may remove files being accessed from a fetch-task-converted reader > -- > > Key: HIVE-25976 > URL: https://issues.apache.org/jira/browse/HIVE-25976 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Priority: Major > Attachments: fetch_task_conv_compactor_test.patch > > > in a nutshell the following happens: > * query is compiled in fetch-task-converted mode > * no real execution happens, but the locks are released > * the HS2 is communicating with the client and uses the fetch-task to get the > rows - which in this case will directly read files from the table's > directory > * client sleeps between reads - so there is ample time for other events... > * cleaner wakes up and removes some files > * in the next read the fetch-task encounters a read error... -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HIVE-25976) Cleaner may remove files being accessed from a fetch-task-converted reader
[ https://issues.apache.org/jira/browse/HIVE-25976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-25976: Attachment: fetch_task_conv_compactor_test.patch > Cleaner may remove files being accessed from a fetch-task-converted reader > -- > > Key: HIVE-25976 > URL: https://issues.apache.org/jira/browse/HIVE-25976 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Priority: Major > Attachments: fetch_task_conv_compactor_test.patch > > > in a nutshell the following happens: > * query is compiled in fetch-task-converted mode > * no real execution happens, but the locks are released > * the HS2 is communicating with the client and uses the fetch-task to get the > rows - which in this case will directly read files from the table's > directory > * client sleeps between reads - so there is ample time for other events... > * cleaner wakes up and removes some files > * in the next read the fetch-task encounters a read error... -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table
[ https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-26158: Labels: metastore_translator pull-request-available (was: pull-request-available) > TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after > rename table > -- > > Key: HIVE-26158 > URL: https://issues.apache.org/jira/browse/HIVE-26158 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2 >Reporter: tanghui >Assignee: Zoltan Haindrich >Priority: Major > Labels: metastore_translator, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > After the patch is updated, the partition table location and hdfs data > directory are displayed normally, but the partition location of the table in > the SDS in the Hive metabase is still displayed as the location of the old > table, resulting in no data in the query partition. > > in beeline: > > set hive.create.as.external.legacy=true; > CREATE TABLE part_test( > c1 string > ,c2 string > )PARTITIONED BY (dat string) > insert into part_test values ("11","th","20220101") > insert into part_test values ("22","th","20220102") > alter table part_test rename to part_test11; > --this result is null. > select * from part_test11 where dat="20220101"; > ||part_test.c1||part_test.c2||part_test.dat|| > | | | | > - > SDS in the Hive metabase: > select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND > TBLS.TBL_ID=SDS.CD_ID; > --- > |*LOCATION*| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102| > --- > > We need to modify the partition location of the table in SDS to ensure that > the query results are normal -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-26184) COLLECT_SET with GROUP BY is very slow when some keys are highly skewed
[ https://issues.apache.org/jira/browse/HIVE-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17529368#comment-17529368 ] Zoltan Haindrich commented on HIVE-26184: - because the value will be the same - I think collecting any number of them into a SET will not make the key for it overload - unless the hashCode of that UUID value is always the same constant...but in that case we should fix that - because it will slow down all the other operations; including `contains` > COLLECT_SET with GROUP BY is very slow when some keys are highly skewed > --- > > Key: HIVE-26184 > URL: https://issues.apache.org/jira/browse/HIVE-26184 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.3.8, 3.1.3 >Reporter: okumin >Assignee: okumin >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > I observed some reducers spend 98% of CPU time in invoking > `java.util.HashMap#clear`. > Looking the detail, I found COLLECT_SET reuses a LinkedHashSet and its > `clear` can be quite heavy when a relation has a small number of highly > skewed keys. > > To reproduce the issue, first, we will create rows with a skewed key. > {code:java} > INSERT INTO test_collect_set > SELECT '----' AS key, CAST(UUID() AS VARCHAR) > AS value > FROM table_with_many_rows > LIMIT 10;{code} > Then, we will create many non-skewed rows. > {code:java} > INSERT INTO test_collect_set > SELECT UUID() AS key, UUID() AS value > FROM sample_datasets.nasdaq > LIMIT 500;{code} > We can observe the issue when we aggregate values by `key`. > {code:java} > SELECT key, COLLECT_SET(value) FROM group_by_skew GROUP BY key{code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
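The reasoning in the comment above (collecting any number of identical values into a SET does not make it grow) can be sanity-checked with a small sketch. The LinkedHashSet below is a stand-in for COLLECT_SET's per-key aggregation buffer; it is a deliberate simplification, not Hive's actual evaluator code:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CollectSetSketch {

    // Stand-in for COLLECT_SET's per-key buffer: duplicates collapse on insert.
    static Set<String> collectSet(List<String> values) {
        return new LinkedHashSet<>(values);
    }

    public static void main(String[] args) {
        // A heavily skewed key whose value is always the same string:
        List<String> skewed = Collections.nCopies(1_000_000, "same-uuid-for-every-row");
        // One million inserts of the same value still leave a single element,
        // so the set itself stays cheap when the values are identical.
        System.out.println(collectSet(skewed).size()); // prints 1
    }
}
```

So per the issue description the cost is not the set growing under skew; it is the reuse pattern, where `clear()` on the recycled LinkedHashSet is what eats the CPU time.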
[jira] [Resolved] (HIVE-26135) Invalid Anti join conversion may cause missing results
[ https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-26135. - Fix Version/s: 4.0.0-alpha-2 Resolution: Fixed merged into master. Thank you [~kkasa] for reviewing the changes! > Invalid Anti join conversion may cause missing results > -- > > Key: HIVE-26135 > URL: https://issues.apache.org/jira/browse/HIVE-26135 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-2 > > Time Spent: 20m > Remaining Estimate: 0h > > right now I think the following is needed to trigger the issue: > * left outer join > * only select left hand side columns > * conditional which is using some udf > * the nullness of the udf is checked > repro sql; in case the conversion happens the row with 'a' will be missing > {code} > drop table if exists t; > drop table if exists n; > create table t(a string) stored as orc; > create table n(a string) stored as orc; > insert into t values ('a'),('1'),('2'),(null); > insert into n values ('a'),('b'),('1'),('3'),(null); > explain select n.* from n left outer join t on (n.a=t.a) where > assert_true(t.a is null) is null; > explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as > float) is null; > select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is > null; > set hive.auto.convert.anti.join=false; > select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is > null; > {code} > resultset with hive.auto.convert.anti.join enabled: > {code} > +--+ > | n.a | > +--+ > | b| > | 3| > +--+ > {code} > correct resultset with hive.auto.convert.anti.join disabled: > {code} > +---+ > | n.a | > +---+ > | a | > | b | > | 3 | > | NULL | > +---+ > {code} > workaround could be to disable the feature: > {code} > set hive.auto.convert.anti.join=false; > {code} -- This message was sent by Atlassian Jira 
(v8.20.7#820007)
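The hazard behind the repro above can be simulated outside Hive. The sketch below is an illustrative Python model, not Hive code: it shows why filtering a LEFT OUTER JOIN on `cast(t.a as float) is null` is not equivalent to an anti join — the cast also yields NULL for a matched but non-numeric key such as 'a'. (This toy model only demonstrates the loss of row 'a'; real Hive additionally handles NULL join keys differently.)

```python
def to_float(s):
    """Mimic SQL cast(string as float): NULL or non-numeric input -> NULL."""
    if s is None:
        return None
    try:
        return float(s)
    except ValueError:
        return None

n_rows = ['a', 'b', '1', '3', None]
t_rows = ['a', '1', '2', None]

def left_join_then_filter(n, t):
    """LEFT OUTER JOIN n/t on n.a = t.a, then keep rows where cast(t.a as float) IS NULL."""
    out = []
    for na in n:
        # SQL equality involving NULL never matches
        matches = [ta for ta in t if na is not None and ta == na]
        for ta in (matches or [None]):   # unmatched -> NULL-extended row
            if to_float(ta) is None:     # the WHERE predicate
                out.append(na)
    return out

def anti_join(n, t):
    """The (invalid here) rewrite: keep only n rows with no match in t."""
    return [na for na in n
            if not any(na is not None and ta == na for ta in t)]

print(left_join_then_filter(n_rows, t_rows))  # ['a', 'b', '3', None]
print(anti_join(n_rows, t_rows))              # ['b', '3', None] -- 'a' went missing
```

Row 'a' matches in `t`, yet `cast('a' as float)` is NULL, so the outer-join-plus-filter keeps it while the anti join drops it — exactly the missing-results symptom described above.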
[jira] [Updated] (HIVE-26135) Invalid Anti join conversion may cause missing results
[ https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-26135: Description: right now I think the following is needed to trigger the issue: * left outer join * only select left hand side columns * conditional which is using some udf * the nullness of the udf is checked repro sql; in case the conversion happens the row with 'a' will be missing {code} drop table if exists t; drop table if exists n; create table t(a string) stored as orc; create table n(a string) stored as orc; insert into t values ('a'),('1'),('2'),(null); insert into n values ('a'),('b'),('1'),('3'),(null); explain select n.* from n left outer join t on (n.a=t.a) where assert_true(t.a is null) is null; explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is null; select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is null; set hive.auto.convert.anti.join=false; select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is null; {code} resultset with hive.auto.convert.anti.join enabled: {code} +--+ | n.a | +--+ | b| | 3| +--+ {code} correct resultset with hive.auto.convert.anti.join disabled: {code} +---+ | n.a | +---+ | a | | b | | 3 | | NULL | +---+ {code} workaround could be to disable the feature: {code} set hive.auto.convert.anti.join=false; {code} was: right now I think the following is needed to trigger the issue: * left outer join * only select left hand side columns * conditional which is using some udf * the nullness of the udf is checked repro sql; in case the conversion happens the row with 'a' will be missing {code} drop table if exists t; drop table if exists n; create table t(a string) stored as orc; create table n(a string) stored as orc; insert into t values ('a'),('1'),('2'),(null); insert into n values ('a'),('b'),('1'),('3'),(null); explain select n.* from n left outer join t on (n.a=t.a) where assert_true(t.a is null) is 
null; explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is null; select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is null; set hive.auto.convert.anti.join=false; select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is null; {code} workaround could be to disable the feature: {code} set hive.auto.convert.anti.join=false; {code} > Invalid Anti join conversion may cause missing results > -- > > Key: HIVE-26135 > URL: https://issues.apache.org/jira/browse/HIVE-26135 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > right now I think the following is needed to trigger the issue: > * left outer join > * only select left hand side columns > * conditional which is using some udf > * the nullness of the udf is checked > repro sql; in case the conversion happens the row with 'a' will be missing > {code} > drop table if exists t; > drop table if exists n; > create table t(a string) stored as orc; > create table n(a string) stored as orc; > insert into t values ('a'),('1'),('2'),(null); > insert into n values ('a'),('b'),('1'),('3'),(null); > explain select n.* from n left outer join t on (n.a=t.a) where > assert_true(t.a is null) is null; > explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as > float) is null; > select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is > null; > set hive.auto.convert.anti.join=false; > select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is > null; > {code} > resultset with hive.auto.convert.anti.join enabled: > {code} > +--+ > | n.a | > +--+ > | b| > | 3| > +--+ > {code} > correct resultset with hive.auto.convert.anti.join disabled: > {code} > +---+ > | n.a | > +---+ > | a | > | b | > | 3 | > | NULL | > +---+ > {code} > workaround could be to disable the feature: > 
{code} > set hive.auto.convert.anti.join=false; > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table
[ https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527440#comment-17527440 ] Zoltan Haindrich commented on HIVE-26158: - [~sanguines] I've also just bumped into the exact same thing - let me know if you would like to pick this up; otherwise I'll probably post a patch for it in the next couple of days > TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after > rename table > -- > > Key: HIVE-26158 > URL: https://issues.apache.org/jira/browse/HIVE-26158 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2 >Reporter: tanghui >Assignee: Zoltan Haindrich >Priority: Major > > After the patch is updated, the partition table location and hdfs data > directory are displayed normally, but the partition location of the table in > the SDS in the Hive metabase is still displayed as the location of the old > table, resulting in no data in the query partition. > > in beeline: > > set hive.create.as.external.legacy=true; > CREATE TABLE part_test( > c1 string > ,c2 string > )PARTITIONED BY (dat string) > insert into part_test values ("11","th","20220101") > insert into part_test values ("22","th","20220102") > alter table part_test rename to part_test11; > --this result is null. > select * from part_test11 where dat="20220101"; > ||part_test.c1||part_test.c2||part_test.dat|| > | | | | > - > SDS in the Hive metabase: > select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND > TBLS.TBL_ID=SDS.CD_ID; > --- > |*LOCATION*| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102| > --- > > We need to modify the partition location of the table in SDS to ensure that > the query results are normal -- This message was sent by Atlassian Jira (v8.20.7#820007)
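The fix idea implied by the report above — rewriting stale SDS partition locations after a table rename — can be sketched as follows. This is a hypothetical Python illustration of the location rebasing, not actual metastore code; note the `old_root + '/'` guard, which avoids accidentally rewriting a sibling table whose name merely shares the prefix.

```python
# Rebase partition locations from the renamed table's old root to its new root
# (illustrative model of the HIVE-26158 fix idea, not Hive metastore code).
OLD_ROOT = 'hdfs://nameservice1/warehouse/tablespace/external/hive/part_test'
NEW_ROOT = 'hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11'

def rebase_location(location, old_root, new_root):
    """Rewrite a location only if it lives under the old table root."""
    if location == old_root or location.startswith(old_root + '/'):
        return new_root + location[len(old_root):]
    return location  # custom locations outside the table dir are left alone

locations = [
    OLD_ROOT + '/dat=20220101',
    OLD_ROOT + '/dat=20220102',
    'hdfs://nameservice1/elsewhere/custom_location',  # hypothetical custom location
]
for loc in locations:
    print(rebase_location(loc, OLD_ROOT, NEW_ROOT))
```

A bare `startswith(old_root)` check would also match a table named `part_test2`; anchoring on the trailing `/` keeps the rewrite scoped to the renamed table's own partitions.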
[jira] [Assigned] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table
[ https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-26158: --- Assignee: Zoltan Haindrich > TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after > rename table > -- > > Key: HIVE-26158 > URL: https://issues.apache.org/jira/browse/HIVE-26158 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2 >Reporter: tanghui >Assignee: Zoltan Haindrich >Priority: Major > > After the patch is updated, the partition table location and hdfs data > directory are displayed normally, but the partition location of the table in > the SDS in the Hive metabase is still displayed as the location of the old > table, resulting in no data in the query partition. > > in beeline: > > set hive.create.as.external.legacy=true; > CREATE TABLE part_test( > c1 string > ,c2 string > )PARTITIONED BY (dat string) > insert into part_test values ("11","th","20220101") > insert into part_test values ("22","th","20220102") > alter table part_test rename to part_test11; > --this result is null. > select * from part_test11 where dat="20220101"; > ||part_test.c1||part_test.c2||part_test.dat|| > | | | | > - > SDS in the Hive metabase: > select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND > TBLS.TBL_ID=SDS.CD_ID; > --- > |*LOCATION*| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101| > |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102| > --- > > We need to modify the partition location of the table in SDS to ensure that > the query results are normal -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (HIVE-26163) Incorrect format in columnstats_columnname_parse.q's insert statement can cause exceptions
[ https://issues.apache.org/jira/browse/HIVE-26163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526261#comment-17526261 ] Zoltan Haindrich commented on HIVE-26163: - is something going wrong while processing this? {code} insert into table2 values("1","1","1"); {code} Is this problem flaky? but in any case I think this is a serious issue - and we should fix it without altering the qfile > Incorrect format in columnstats_columnname_parse.q's insert statement can > cause exceptions > -- > > Key: HIVE-26163 > URL: https://issues.apache.org/jira/browse/HIVE-26163 > Project: Hive > Issue Type: Improvement >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Minor > > Exception: > {code:java} > 2022-04-20T10:13:06,467 ERROR [016f5292-40a7-4fe6-be58-1c988fa4a6e5 main] > metastore.RetryingHMSHandler: java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:4456) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:9099) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:9054) > at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121) > at com.sun.proxy.$Proxy59.set_aggr_stats_for(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:2974) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.setPartitionColumnStatistics(SessionHiveMetaStoreClient.java:571) > at sun.reflect.GeneratedMethodAccessor192.invoke(Unknown Source) > 
at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:216) > at com.sun.proxy.$Proxy60.setPartitionColumnStatistics(Unknown Source) > at > org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:5583) > at > org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:223) > at > org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:94) > at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:775) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:524) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:518) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256) > at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:421) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:352) > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:853) > at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:823) > at > 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:192) > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) > at > org.apache.hadoop.hive.cli.split4.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62) > at sun.reflect.GeneratedMethodAccessor180.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Meth
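The stack trace above fails at `Collections$EmptyList.get(0)` — an unguarded first-element access on a possibly empty list. A minimal Python sketch of the defect pattern and a guarded variant (illustrative only; the real fix lives in `HiveMetaStore.updatePartColumnStatsWithMerge`):

```python
# Unguarded vs guarded first-element access; in Java this is
# Collections.emptyList().get(0) -> IndexOutOfBoundsException,
# the Python analogue being [][0] -> IndexError.

def first_part_name(part_names):
    """Unsafe: mirrors the failing code path."""
    return part_names[0]

def first_part_name_safe(part_names):
    """Guarded variant: return None so the caller can skip the stats merge."""
    return part_names[0] if part_names else None

try:
    first_part_name([])
except IndexError as exc:
    print('unguarded access failed:', exc)

print(first_part_name_safe([]))               # None -> caller can bail out cleanly
print(first_part_name_safe(['dat=20220101']))
```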
[jira] [Commented] (HIVE-26135) Invalid Anti join conversion may cause missing results
[ https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521205#comment-17521205 ] Zoltan Haindrich commented on HIVE-26135: - wanted to add a check for "Strong"-ness; however, consider: {code} (leftCol + rightCol) IS NULL {code} since we want to deduce that the nullness of the expression strongly depends on that `rightCol` can not be anything else than `null`... like: {code} (a + null) IS NULL {code} however, if the left-hand side is null it could also make the expression null; and in case rightCol is not in the join keys we could lose correct results... > Invalid Anti join conversion may cause missing results > -- > > Key: HIVE-26135 > URL: https://issues.apache.org/jira/browse/HIVE-26135 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > right now I think the following is needed to trigger the issue: > * left outer join > * only select left hand side columns > * conditional which is using some udf > * the nullness of the udf is checked > repro sql; in case the conversion happens the row with 'a' will be missing > {code} > drop table if exists t; > drop table if exists n; > create table t(a string) stored as orc; > create table n(a string) stored as orc; > insert into t values ('a'),('1'),('2'),(null); > insert into n values ('a'),('b'),('1'),('3'),(null); > explain select n.* from n left outer join t on (n.a=t.a) where > assert_true(t.a is null) is null; > explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as > float) is null; > select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is > null; > set hive.auto.convert.anti.join=false; > select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is > null; > {code} > workaround could be to disable the feature: > {code} > set hive.auto.convert.anti.join=false; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
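The "Strong"-ness distinction in the comment above can be checked exhaustively over a tiny domain. This is an illustrative Python sketch, not Calcite/Hive code: an expression is Strong in a column c when it is NULL whenever c is NULL, but the anti-join rewrite would need the converse implication — that the expression being NULL pins c to NULL — and `(leftCol + rightCol)` fails that, since a NULL leftCol also nulls the sum.

```python
def sql_add(a, b):
    """SQL three-valued addition: NULL if either operand is NULL."""
    return None if a is None or b is None else a + b

domain = [None, 0, 1]

# Strong-ness: rightCol IS NULL => (leftCol + rightCol) IS NULL -- holds.
strong_in_right = all(sql_add(l, None) is None for l in domain)

# The needed converse: sum IS NULL => rightCol IS NULL -- fails,
# because (NULL + 1) IS NULL as well.
converse = all(r is None
               for l in domain for r in domain
               if sql_add(l, r) is None)

print('strong in rightCol:', strong_in_right)   # True
print('nullness pins rightCol:', converse)      # False
```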
[jira] [Assigned] (HIVE-26135) Invalid Anti join conversion may cause missing results
[ https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-26135: --- > Invalid Anti join conversion may cause missing results > -- > > Key: HIVE-26135 > URL: https://issues.apache.org/jira/browse/HIVE-26135 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > right now I think the following is needed to trigger the issue: > * left outer join > * only select left hand side columns > * conditional which is using some udf > * the nullness of the udf is checked > repro sql; in case the conversion happens the row with 'a' will be missing > {code} > drop table if exists t; > drop table if exists n; > create table t(a string) stored as orc; > create table n(a string) stored as orc; > insert into t values ('a'),('1'),('2'),(null); > insert into n values ('a'),('b'),('1'),('3'),(null); > explain select n.* from n left outer join t on (n.a=t.a) where > assert_true(t.a is null) is null; > explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as > float) is null; > select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is > null; > set hive.auto.convert.anti.join=false; > select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is > null; > {code} > workaround could be to disable the feature: > {code} > set hive.auto.convert.anti.join=false; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode
[ https://issues.apache.org/jira/browse/HIVE-26117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-26117: --- Assignee: Steve Carlin > Remove 2 superfluous lines of code in genJoinRelNode > > > Key: HIVE-26117 > URL: https://issues.apache.org/jira/browse/HIVE-26117 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Steve Carlin >Assignee: Steve Carlin >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The code was rewritten to associate ASTNodes to RexNodes. Some code was left > behind that doesn't add any value. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-26002) Create db scripts for 4.0.0-alpha-1
[ https://issues.apache.org/jira/browse/HIVE-26002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500921#comment-17500921 ] Zoltan Haindrich commented on HIVE-26002: - schematool does pretty basic comparisons; we could easily get into trouble if we have 2 versions of which one is a prefix of the other (ex: 4.0.0 vs 4.0.0-alpha-1) https://github.com/apache/hive/blob/95c6155677b5a288b6bc571b11caf2c8eb80825f/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java#L106 https://the-asf.slack.com/archives/CFSSP9UPJ/p1646326383307109?thread_ts=1646235033.395189&cid=CFSSP9UPJ > Create db scripts for 4.0.0-alpha-1 > --- > > Key: HIVE-26002 > URL: https://issues.apache.org/jira/browse/HIVE-26002 > Project: Hive > Issue Type: Task >Reporter: Peter Vary >Priority: Major > Fix For: 4.0.0-alpha-1 > > > For the release we need to create the appropriate sql scripts for HMS db > initialization -- This message was sent by Atlassian Jira (v8.20.1#820001)
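The prefix trap the comment warns about is easy to demonstrate. The sketch below is a hypothetical illustration, not the actual `MetaStoreSchemaInfo` logic: a prefix-based match would accept a 4.0.0-alpha-1 schema where 4.0.0 is required.

```python
def naive_matches(db_version, hive_version):
    """Prefix check of the kind the comment warns about (illustrative)."""
    return db_version.startswith(hive_version)

def exact_matches(db_version, hive_version):
    """Safer: compare the full version strings."""
    return db_version == hive_version

print(naive_matches('4.0.0-alpha-1', '4.0.0'))   # True -- the trap
print(exact_matches('4.0.0-alpha-1', '4.0.0'))   # False
print(exact_matches('4.0.0', '4.0.0'))           # True
```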
[jira] [Assigned] (HIVE-25994) Analyze table runs into ClassNotFoundException-s in case binary distribution is used
[ https://issues.apache.org/jira/browse/HIVE-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25994: --- Assignee: Zoltan Haindrich > Analyze table runs into ClassNotFoundException-s in case binary distribution > is used > > > Key: HIVE-25994 > URL: https://issues.apache.org/jira/browse/HIVE-25994 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Fix For: 4.0.0-alpha-1 > > > any nightly release can be used to reproduce this: > {code} > create table t (a integer); insert into t values (1) ; analyze table t > compute statistics for columns; > {code} > results in > {code} > Caused by: java.lang.NoClassDefFoundError: org/antlr/runtime/tree/CommonTree > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:757) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > at java.net.URLClassLoader.access$100(URLClassLoader.java:74) > at java.net.URLClassLoader$1.run(URLClassLoader.java:369) > at java.net.URLClassLoader$1.run(URLClassLoader.java:363) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:362) > at java.lang.ClassLoader.loadClass(ClassLoader.java:419) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:352) > at java.lang.Class.getDeclaredConstructors0(Native Method) > at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) > at java.lang.Class.getConstructor0(Class.java:3075) > at java.lang.Class.getDeclaredConstructor(Class.java:2178) > at > org.apache.hive.com.esotericsoftware.reflectasm.ConstructorAccess.get(ConstructorAccess.java:65) > at > 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultInstantiatorStrategy.newInstantiatorOf(DefaultInstantiatorStrategy.java:60) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1119) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1128) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:153) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:118) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:729) > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:125) > ... 38 more > Caused by: java.lang.ClassNotFoundException: org.antlr.runtime.tree.CommonTree > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:419) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:352) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps
[ https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-23556: Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) merged into master. Thank you [~ibenny]! > Support hive.metastore.limit.partition.request for get_partitions_ps > > > Key: HIVE-23556 > URL: https://issues.apache.org/jira/browse/HIVE-23556 > Project: Hive > Issue Type: Improvement >Reporter: Toshihiko Uchida >Assignee: iBenny >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, > HIVE-23556.4.patch, HIVE-23556.patch > > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-13884 added the configuration hive.metastore.limit.partition.request to > limit the number of partitions that can be requested. > Currently, it takes in effect for the following MetaStore APIs > * get_partitions, > * get_partitions_with_auth, > * get_partitions_by_filter, > * get_partitions_spec_by_filter, > * get_partitions_by_expr, > but not for > * get_partitions_ps, > * get_partitions_ps_with_auth. > This issue proposes to apply the configuration also to get_partitions_ps and > get_partitions_ps_with_auth. -- This message was sent by Atlassian Jira (v8.20.1#820001)
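The limit check that HIVE-23556 extends to `get_partitions_ps` boils down to rejecting oversized requests before serving them. A minimal Python sketch of the idea (names and signatures here are illustrative, not the real HMS API; the assumption that a negative limit disables the check mirrors the config's documented default behaviour):

```python
class MetaException(Exception):
    """Stand-in for the metastore's MetaException."""
    pass

def check_partition_limit(requested, limit):
    """Reject requests fetching more partitions than the configured limit.
    A negative limit disables the check."""
    if 0 <= limit < requested:
        raise MetaException(
            f'Request for {requested} partitions exceeds configured limit {limit}')

check_partition_limit(10, -1)   # limit disabled: allowed
check_partition_limit(10, 10)   # at the limit: allowed
try:
    check_partition_limit(11, 10)
except MetaException as e:
    print('rejected:', e)
```

The point of the ticket is simply that this guard must run on every partition-listing entry point, including `get_partitions_ps` and `get_partitions_ps_with_auth`, not only the five APIs listed above.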
[jira] [Updated] (HIVE-25994) Analyze table runs into ClassNotFoundException-s in case binary distribution is used
[ https://issues.apache.org/jira/browse/HIVE-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-25994: Fix Version/s: 4.0.0-alpha-1 > Analyze table runs into ClassNotFoundException-s in case binary distribution > is used > > > Key: HIVE-25994 > URL: https://issues.apache.org/jira/browse/HIVE-25994 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Priority: Major > Fix For: 4.0.0-alpha-1 > > > any nightly release can be used to reproduce this: > {code} > create table t (a integer); insert into t values (1) ; analyze table t > compute statistics for columns; > {code} > results in > {code} > Caused by: java.lang.NoClassDefFoundError: org/antlr/runtime/tree/CommonTree > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:757) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) > at java.net.URLClassLoader.access$100(URLClassLoader.java:74) > at java.net.URLClassLoader$1.run(URLClassLoader.java:369) > at java.net.URLClassLoader$1.run(URLClassLoader.java:363) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:362) > at java.lang.ClassLoader.loadClass(ClassLoader.java:419) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:352) > at java.lang.Class.getDeclaredConstructors0(Native Method) > at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) > at java.lang.Class.getConstructor0(Class.java:3075) > at java.lang.Class.getDeclaredConstructor(Class.java:2178) > at > org.apache.hive.com.esotericsoftware.reflectasm.ConstructorAccess.get(ConstructorAccess.java:65) > at > org.apache.hive.com.esotericsoftware.kryo.util.DefaultInstantiatorStrategy.newInstantiatorOf(DefaultInstantiatorStrategy.java:60) > at > 
org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1119) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1128) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:153) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:118) > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:729) > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216) > at > org.apache.hive.com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:125) > ... 38 more > Caused by: java.lang.ClassNotFoundException: org.antlr.runtime.tree.CommonTree > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:419) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:352) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries
[ https://issues.apache.org/jira/browse/HIVE-25665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-25665: Fix Version/s: 4.0.0-alpha-1 > Checkstyle LGPL files must not be in the release sources/binaries > - > > Key: HIVE-25665 > URL: https://issues.apache.org/jira/browse/HIVE-25665 > Project: Hive > Issue Type: Task > Components: Build Infrastructure >Affects Versions: 0.6.0 >Reporter: Stamatis Zampetakis >Priority: Blocker > Fix For: 4.0.0-alpha-1 > > > As discussed in the [dev > list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e] > LGPL files must not be present in the Apache released sources/binaries. > The following files must not be present in the release: > https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl > https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl > https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl > There may be other checkstyle LGPL files in the repo. All these should either > be removed entirely from the repository or selectively excluded from the > release. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-25987) Incorrectly formatted pom.xml error in Beeline
[ https://issues.apache.org/jira/browse/HIVE-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498977#comment-17498977 ] Zoltan Haindrich commented on HIVE-25987: - note: for the [PR|https://github.com/apache/hive/pull/2824] in question the "tests passed" label was added in November; roughly 5 months before it was merged. Merging such changes without re-running the CI is really risky... I think these labels should be removed say 15 days after they were given... not sure how that could be done... > Incorrectly formatted pom.xml error in Beeline > -- > > Key: HIVE-25987 > URL: https://issues.apache.org/jira/browse/HIVE-25987 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Abhay >Priority: Major > > After applying the patch [https://github.com/apache/hive/pull/3043,] > HIVE-25750, the precommit tests have started complaining of this > *!!! incorrectly formatted pom.xmls detected; see above!* > The code built fine locally and the pre-commit tests had run fine. Need to > investigate further why this was not caught earlier but the pom.xml file > needs to be fixed. -- This message was sent by Atlassian Jira (v8.20.1#820001)
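The label-expiry idea floated in the comment above can be sketched as a simple freshness check. This is purely hypothetical — no such mechanism exists in the Hive CI as far as this sketch assumes — but it shows how small the logic would be:

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=15)  # the "say 15 days" from the comment

def label_is_fresh(labelled_at, now):
    """Trust a green 'tests passed' label only for a bounded time window."""
    return now - labelled_at < STALE_AFTER

now = datetime(2022, 3, 1)
print(label_is_fresh(datetime(2022, 2, 20), now))   # 9 days old: still fresh
print(label_is_fresh(datetime(2021, 11, 1), now))   # ~4 months old: stale
```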
[jira] [Commented] (HIVE-25970) Missing messages in HS2 operation logs
[ https://issues.apache.org/jira/browse/HIVE-25970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498120#comment-17498120 ] Zoltan Haindrich commented on HIVE-25970: - we just talked with [~zabetak]; and HIVE-24590 makes HIVE-22753 unneccessary - and it may only cause trouble (lost messages) > Missing messages in HS2 operation logs > -- > > Key: HIVE-25970 > URL: https://issues.apache.org/jira/browse/HIVE-25970 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > After HIVE-22753 & HIVE-24590, with some unlucky timing of events, operation > log messages can get lost and never appear in the appropriate files. > The changes in HIVE-22753 will prevent a {{HushableRandomAccessFileAppender}} > from being created if the latter refers to a file that has been closed in the > last second. Preventing the creation of the appender also means that the > message which triggered the creation will be lost forever. In fact any > message (for the same query) that comes in the interval of 1 second will be > lost forever. > Before HIVE-24590 the appender/file was closed only once (explicitly by HS2) > and thus the problem may be very hard to notice in practice. However, with > the arrival of HIVE-24590 appenders may close much more frequently (and not > via HS2) making the issue reproducible rather easily. It suffices to set > _hive.server2.operation.log.purgePolicy.timeToLive_ property very low and > check the operation logs. > The problem was discovered by investigating some intermittent failures in > operation logging tests (e.g., TestOperationLoggingAPIWithTez). -- This message was sent by Atlassian Jira (v8.20.1#820001)
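The unlucky timing described in the issue — appender re-creation suppressed for one second after the file was closed, silently dropping any message in that window — can be modelled in a few lines. This is a hypothetical Python simulation, not Log4j or HS2 code:

```python
CLOSE_SUPPRESS_WINDOW = 1.0  # seconds; the 1s interval from HIVE-22753

class OperationLog:
    """Toy model of an operation-log file whose appender may be refused."""
    def __init__(self):
        self.lines = []
        self.closed_at = None  # set when the purge policy closes the file

    def append(self, now, msg):
        # Re-creating the appender is refused inside the suppress window,
        # so the triggering message is lost forever.
        if self.closed_at is not None and now - self.closed_at < CLOSE_SUPPRESS_WINDOW:
            return False
        self.closed_at = None  # file re-opened
        self.lines.append(msg)
        return True

log = OperationLog()
log.append(0.0, 'query started')
log.closed_at = 0.5                            # purgePolicy.timeToLive fired
lost = not log.append(1.0, 'query progress')   # 0.5s after close: dropped
kept = log.append(2.0, 'query finished')       # outside the window: kept
print('lost a message:', lost, '; log =', log.lines)
```

With HIVE-24590 closing appenders far more often, this window opens frequently enough for the loss to show up in tests — which is why the comment concludes the HIVE-22753 suppression should simply go away.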
[jira] [Assigned] (HIVE-25977) Enhance Compaction Cleaner to skip when there is nothing to do #2
[ https://issues.apache.org/jira/browse/HIVE-25977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25977: --- > Enhance Compaction Cleaner to skip when there is nothing to do #2 > - > > Key: HIVE-25977 > URL: https://issues.apache.org/jira/browse/HIVE-25977 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > initially this was just an addendum to the original patch ; but got delayed > and altered - so it should have its own ticket -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-25874) Slow filter evaluation of nest struct fields in vectorized executions
[ https://issues.apache.org/jira/browse/HIVE-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25874. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you [~kkasa] for reviewing the changes! > Slow filter evaluation of nest struct fields in vectorized executions > - > > Key: HIVE-25874 > URL: https://issues.apache.org/jira/browse/HIVE-25874 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > time is spent at resizing vectors around > [here|https://github.com/apache/hive/blob/200c0bf1feb259f4d95bf065a2ab38fe684383da/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java#L252] > or in some other "ensureSize" method > {code:java} > create table t as > select > named_struct('id',13,'str','string','nest',named_struct('id',12,'str','string','arr',array('value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value'))) > s; > -- go up to 1M rows > insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; > insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; > insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * 
from t union all select * from t union all > select * from t; > insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; > insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; > -- insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; > set hive.fetch.task.conversion=none; > select count(1) from t; > --explain > select s > .id from t > where > s > .nest > .id > 0; > {code} > interestingly; the issue is not present: > * for a query not looking into the nested struct > * and in case the struct with the array is at the top level > {code} > select count(1) from t; > --explain > select s > .id from t > where > s > -- .nest > .id > 0; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately
[ https://issues.apache.org/jira/browse/HIVE-25844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25844. - Fix Version/s: 4.0.0 Resolution: Fixed merged into branch-3. Thank you Krisztian for reviewing the changes! > Exception deserialization error-s may cause beeline to terminate immediately > > > Key: HIVE-25844 > URL: https://issues.apache.org/jira/browse/HIVE-25844 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 3.1.2 >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > the exception on the server side happens when: > * fetch task conversion is on > * there is an exception while reading the table; the error bubbles up > * => it transmits a message to beeline that the error class name is: > "org.apache.phoenix.schema.ColumnNotFoundException" + the message > * beeline tries to reconstruct the exception around HiveSqlException > * but during the constructor call > org.apache.phoenix.exception.SQLExceptionCode is needed, which fails to load > org/apache/hadoop/hbase/shaded/com/google/protobuf/Service > * a > java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/com/google/protobuf/Service is thrown - which > is not handled in that method - so it becomes a real error and shuts down > the client > {code:java} > java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/shaded/com/google/protobuf/Service > [...] > at java.lang.Class.forName(Class.java:264) > at > org.apache.hive.service.cli.HiveSQLException.newInstance(HiveSQLException.java:245) > at > org.apache.hive.service.cli.HiveSQLException.toStackTrace(HiveSQLException.java:211) > [...] > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.shaded.com.google.protobuf.Service > [...] > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
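The failure mode above - rebuilding a server-side exception on the client by class name, and dying when the lookup itself throws - can be modelled in a few lines. This is a hypothetical sketch, not the actual HiveSQLException code: the `REGISTRY` dictionary stands in for `Class.forName`, and the fallback branch shows the safe behaviour of degrading to a generic exception instead of letting the lookup failure escape.

```python
# Toy model of rebuilding a server-side exception client-side by class name.
# REGISTRY is an illustrative stand-in for Class.forName(); names are invented.
REGISTRY = {"KnownException": ValueError}

def reconstruct(class_name, message):
    try:
        cls = REGISTRY[class_name]      # the lookup itself may fail, like the
    except KeyError:                    # NoClassDefFoundError in this report
        # safe behaviour: fall back to a generic exception instead of letting
        # the lookup failure escape and terminate the client
        return RuntimeError(f"{class_name}: {message}")
    return cls(message)

print(type(reconstruct("KnownException", "boom")).__name__)
print(type(reconstruct("org.apache.phoenix.schema.ColumnNotFoundException",
                       "no such column")).__name__)
```

In the Java case the lookup raised an `Error` (not an `Exception`), which the surrounding `catch` did not handle; the sketch's `except KeyError` plays the role of the missing handler.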
[jira] [Updated] (HIVE-21152) Rewrite if expression to case and recognize simple case as an if
[ https://issues.apache.org/jira/browse/HIVE-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-21152: Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) merged into master. Thank you [~kkasa] for reviewing the changes! > Rewrite if expression to case and recognize simple case as an if > > > Key: HIVE-21152 > URL: https://issues.apache.org/jira/browse/HIVE-21152 > Project: Hive > Issue Type: Improvement > Components: CBO >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21152.01.patch, HIVE-21152.02.patch, > HIVE-21152.03.patch, HIVE-21152.04.patch, HIVE-21152.05.patch, > HIVE-21152.06.patch, HIVE-21152.07.patch > > Time Spent: 20m > Remaining Estimate: 0h > > * {{IF}} is not part of the sql standard; however given its special form it's > simpler - and currently in Hive it also has vectorized support > * people writing standard sql may write: {{CASE WHEN member=1 THEN attr+1 > else attr+2 END}} which is essentially an if. > The idea is to rewrite IFs to CASEs for the cbo; and recognize simple > "CASE"-s as IFs to get vectorization on them if possible -- This message was sent by Atlassian Jira (v8.20.1#820001)
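The two rewrites proposed in the ticket - IF(c, a, b) is equivalent to CASE WHEN c THEN a ELSE b END, and a single-branch CASE is just an IF - can be sketched over a toy expression representation. The tuple encoding below is illustrative, not Hive's AST:

```python
# IF -> CASE (for the CBO) and simple CASE -> IF (for vectorization),
# on a toy tuple-based expression encoding (not Hive's actual AST).
def if_to_case(cond, then_val, else_val):
    # IF(c, a, b)  ==>  CASE WHEN c THEN a ELSE b END
    return ("CASE", [(cond, then_val)], else_val)

def case_to_if(case_expr):
    """Recognize a single-WHEN CASE as an IF; return None otherwise."""
    tag, whens, else_val = case_expr
    if tag == "CASE" and len(whens) == 1:
        (cond, then_val), = whens
        return ("IF", cond, then_val, else_val)
    return None

case = if_to_case("member = 1", "attr + 1", "attr + 2")
print(case)
print(case_to_if(case))
```

Round-tripping a simple expression through both rewrites yields the same IF, which is what lets the planner reason over CASE while the vectorized runtime still executes the cheaper IF form.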
[jira] [Resolved] (HIVE-25715) Provide nightly builds
[ https://issues.apache.org/jira/browse/HIVE-25715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25715. - Resolution: Fixed merged into master. Thank you [~kkasa] for reviewing the changes! > Provide nightly builds > -- > > Key: HIVE-25715 > URL: https://issues.apache.org/jira/browse/HIVE-25715 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > provide nightly builds for the master branch -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-25944) Format pom.xml-s
[ https://issues.apache.org/jira/browse/HIVE-25944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25944. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you [~dengzh] for reviewing the changes! > Format pom.xml-s > > > Key: HIVE-25944 > URL: https://issues.apache.org/jira/browse/HIVE-25944 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > at the moment I touch pom.xml-s with xmlstarlet it starts fixing indentation > which makes seeing real diffs harder. > fix and enforce that the pom.xmls are indented correctly -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-25942) Upgrade commons-io to 2.8.0 due to CVE-2021-29425
[ https://issues.apache.org/jira/browse/HIVE-25942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25942. - Resolution: Fixed merged into master. Thank you [~srahman]! > Upgrade commons-io to 2.8.0 due to CVE-2021-29425 > - > > Key: HIVE-25942 > URL: https://issues.apache.org/jira/browse/HIVE-25942 > Project: Hive > Issue Type: Bug >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Due to [CVE-2021-29425|https://nvd.nist.gov/vuln/detail/CVE-2021-29425] all > the commons-io versions below 2.7 are affected. > Tez and Hadoop have upgraded commons-io to 2.8.0 in > [TEZ-4353|https://issues.apache.org/jira/browse/TEZ-4353] and > [HADOOP-17683|https://issues.apache.org/jira/browse/HADOOP-17683] > respectively and it will be good if Hive also follows the same. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HIVE-25944) Format pom.xml-s
[ https://issues.apache.org/jira/browse/HIVE-25944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25944: --- > Format pom.xml-s > > > Key: HIVE-25944 > URL: https://issues.apache.org/jira/browse/HIVE-25944 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > at the moment I touch pom.xml-s with xmlstarlet it starts fixing indentation > which makes seeing real diffs harder. > fix and enforce that the pom.xmls are indented correctly -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps
[ https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489476#comment-17489476 ] Zoltan Haindrich commented on HIVE-23556: - [~touchida] could you open a PR against the hive repo on github? https://github.com/apache/hive/pulls > Support hive.metastore.limit.partition.request for get_partitions_ps > > > Key: HIVE-23556 > URL: https://issues.apache.org/jira/browse/HIVE-23556 > Project: Hive > Issue Type: Improvement >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Minor > Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, > HIVE-23556.4.patch, HIVE-23556.patch > > > HIVE-13884 added the configuration hive.metastore.limit.partition.request to > limit the number of partitions that can be requested. > Currently, it takes in effect for the following MetaStore APIs > * get_partitions, > * get_partitions_with_auth, > * get_partitions_by_filter, > * get_partitions_spec_by_filter, > * get_partitions_by_expr, > but not for > * get_partitions_ps, > * get_partitions_ps_with_auth. > This issue proposes to apply the configuration also to get_partitions_ps and > get_partitions_ps_with_auth. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-24887) getDatabase() to call translation code even if client has no capabilities
[ https://issues.apache.org/jira/browse/HIVE-24887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-24887: Labels: metastore_translator pull-request-available (was: pull-request-available) > getDatabase() to call translation code even if client has no capabilities > - > > Key: HIVE-24887 > URL: https://issues.apache.org/jira/browse/HIVE-24887 > Project: Hive > Issue Type: Sub-task >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Major > Labels: metastore_translator, pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > We do this for other calls that go thru translation layer. For some reason, > the current code only calls it when the client sets the capabilities. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location
[ https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-24920: Labels: metastore_translator pull-request-available (was: pull-request-available) > TRANSLATED_TO_EXTERNAL tables may write to the same location > > > Key: HIVE-24920 > URL: https://issues.apache.org/jira/browse/HIVE-24920 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: metastore_translator, pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > {code} > create table t (a integer); > insert into t values(1); > alter table t rename to t2; > create table t (a integer); -- I expected an exception from this command > (location already exists) but because its an external table no exception > insert into t values(2); > select * from t; -- shows 1 and 2 > drop table t2;-- wipes out data location > select * from t; -- empty resultset > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path
[ https://issues.apache.org/jira/browse/HIVE-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-25303: Labels: metastore_translator pull-request-available (was: pull-request-available) > CTAS hive.create.as.external.legacy tries to place data files in managed WH > path > > > Key: HIVE-25303 > URL: https://issues.apache.org/jira/browse/HIVE-25303 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Standalone Metastore >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: metastore_translator, pull-request-available > Fix For: 4.0.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Under legacy table creation mode (hive.create.as.external.legacy=true), when > a database has been created in a specific LOCATION, in a session where that > database is used, a table created with the following command: > {code:java} > CREATE TABLE AS SELECT {code} > should inherit the HDFS path from the database's location. Instead, Hive is > trying to write the table data into > /warehouse/tablespace/managed/hive// > +Design+: > In the CTAS query, the data is first written into the target directory (which > happens in HS2) and then the table is created (this happens in HMS). So here > two decisions are being made: i) the target directory location and ii) how the > table should be created (table type, sd, etc.). > When HS2 needs the target location to be set, it makes a create-table dry-run > call to HMS (where table translation happens); decisions i) and ii) are made > within HMS, which returns the table object. Then HS2 uses the location set by > HMS for placing the data.
> The patch for this issue addresses the table location being incorrect and table > data being empty for the following cases: 1) when the external legacy config > is set, i.e. hive.create.as.external.legacy=true 2) when the table is > created with the transactional property set to false, i.e. TBLPROPERTIES > ('transactional'='false') -- This message was sent by Atlassian Jira (v8.20.1#820001)
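The dry-run flow in the design note - HS2 asking HMS how the table will be translated before writing any data, then using the location HMS chose - can be sketched as follows. The function and dictionary fields below are illustrative stand-ins for `translate_table_dryrun` and the returned table object, not the real HMS API:

```python
# Sketch of the CTAS flow from the design note (hypothetical names/shapes):
# HS2 asks HMS for the translated table first, then writes data to its location.
def plan_ctas_location(table_spec, translate_table_dryrun):
    # 1) HMS decides table type and location during translation (dry run)
    translated = translate_table_dryrun(table_spec)
    # 2) HS2 writes the query output to the location HMS chose, instead of
    #    assuming the managed warehouse path
    return translated["location"]

# toy translator that honours the database's own location (the fixed behaviour)
def translator(spec):
    return {**spec, "location": spec["db_location"] + "/" + spec["name"]}

print(plan_ctas_location({"name": "t", "db_location": "/data/mydb"},
                         translator))
```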
[jira] [Updated] (HIVE-25782) Create Table As Select fails for managed ACID tables
[ https://issues.apache.org/jira/browse/HIVE-25782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-25782: Labels: metastore_translator pull-request-available (was: pull-request-available) > Create Table As Select fails for managed ACID tables > > > Key: HIVE-25782 > URL: https://issues.apache.org/jira/browse/HIVE-25782 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Reporter: Csaba Juhász >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: metastore_translator, pull-request-available > Attachments: ctas_acid_managed.q > > Time Spent: 1h > Remaining Estimate: 0h > > Create Table As Select fails for managed ACID tables: > *MetaException(message:Processor has no capabilities, cannot create an ACID > table.)* > HMSHandler.translate_table_dryrun invokes > MetastoreDefaultTransformer.transformCreateTable with null > processorCapabilities and processorId. > https://github.com/apache/hive/blob/c7fdd459305f4bf6913dc4bed7e8df8c7bf9e458/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L2251 > {code:java} > Dec 06 05:32:47 Starting translation for CreateTable for processor null with > null on table vectortab10korc > Dec 06 05:32:47 MetaException(message:Processor has no capabilities, cannot > create an ACID table.) 
> at > org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transformCreateTable(MetastoreDefaultTransformer.java:663) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.translate_table_dryrun(HiveMetaStore.java:2159) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) > at com.sun.proxy.$Proxy29.translate_table_dryrun(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$translate_table_dryrun.getResult(ThriftHiveMetastore.java:16981) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$translate_table_dryrun.getResult(ThriftHiveMetastore.java:16965) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643) > at > org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) > at > org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:638) > at > 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > {code} > Reproduction ([^ctas_acid_managed.q]): > {code:java} > set hive.support.concurrency=true; > set hive.exec.dynamic.partition.mode=nonstrict; > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > set > metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer; > create table test stored as orc tblproperties ('transactional'='true') as > select from_unixtime(unix_timestamp("0002-01-01 09:57:21", "-MM-dd > HH:mm:ss")); {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25630) Transformer fixes
[ https://issues.apache.org/jira/browse/HIVE-25630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-25630: Labels: metastore_translator pull-request-available (was: pull-request-available) > Transformer fixes > - > > Key: HIVE-25630 > URL: https://issues.apache.org/jira/browse/HIVE-25630 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: metastore_translator, pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > there are some issues: > * AlreadyExistsException might be suppressed by the translator > * uppercase letter usage may cause problems for some clients > * add a way to suppress location checks for legacy clients -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-24954) MetastoreTransformer is disabled during testing
[ https://issues.apache.org/jira/browse/HIVE-24954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-24954: Labels: metastore_translator pull-request-available (was: pull-request-available) > MetastoreTransformer is disabled during testing > --- > > Key: HIVE-24954 > URL: https://issues.apache.org/jira/browse/HIVE-24954 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: metastore_translator, pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > all calls are fortified with "isInTest" guards to avoid testing those calls > (!@#$#) > https://github.com/apache/hive/blob/86fa9b30fe347c7fc78a2930f4d20ece2e124f03/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L1647 > this causes some weird behaviour: > an out-of-the-box hive installation creates TRANSLATED_TO_EXTERNAL external > tables for plain CREATE TABLE commands, > meanwhile, when most testing is executed, CREATE TABLE creates regular > MANAGED tables... -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-24951) Table created with Uppercase name using CTAS does not produce result for select queries
[ https://issues.apache.org/jira/browse/HIVE-24951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-24951: Labels: metastore_translator pull-request-available (was: pull-request-available) > Table created with Uppercase name using CTAS does not produce result for > select queries > --- > > Key: HIVE-24951 > URL: https://issues.apache.org/jira/browse/HIVE-24951 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 4.0.0 >Reporter: Rajkumar Singh >Assignee: Rajkumar Singh >Priority: Major > Labels: metastore_translator, pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Steps to repro: > {code:java} > CREATE EXTERNAL TABLE MY_TEST AS SELECT * FROM source > Table created with Location but does not have any data moved to it. > /warehouse/tablespace/external/hive/MY_TEST > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-25707) SchemaTool may leave the metastore in-between upgrade steps
[ https://issues.apache.org/jira/browse/HIVE-25707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482378#comment-17482378 ] Zoltan Haindrich commented on HIVE-25707: - [~rahulp] yes; it could probably catch a lot of problematic cases. I wrote a test for it - but we run the sql-s using sqlline; if I disable auto-commit, the file is executed without being committed in the end...unless the jdbc driver autocommit-s it... I leave a reference to my branch here in case someone picks this up later: https://github.com/kgyrtkirk/hive/tree/HIVE-25707-schematool-commit > SchemaTool may leave the metastore in-between upgrade steps > --- > > Key: HIVE-25707 > URL: https://issues.apache.org/jira/browse/HIVE-25707 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Priority: Major > > it seems like: > * schematool runs the sql files via beeline > * autocommit is turned on > * pressing ctrl+c or killing the process will result in an invalid schema > https://github.com/apache/hive/blob/6e02f6164385a370ee8014c795bee1fa423d7937/beeline/src/java/org/apache/hive/beeline/schematool/HiveSchemaTool.java#L79 -- This message was sent by Atlassian Jira (v8.20.1#820001)
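The direction the comment points at - running a whole upgrade script inside a single transaction, so that a ctrl+c mid-way cannot leave the schema between versions - can be illustrated with a standalone sqlite3 sketch. This is not the actual HiveSchemaTool/sqlline code; the statements and the simulated kill are illustrative:

```python
import sqlite3

def run_upgrade(conn, statements, crash_after=None):
    """Apply all upgrade statements atomically; crash_after simulates a kill."""
    cur = conn.cursor()
    cur.execute("BEGIN")                  # explicit transaction, no autocommit
    try:
        for i, stmt in enumerate(statements):
            cur.execute(stmt)
            if crash_after is not None and i == crash_after:
                raise KeyboardInterrupt("process killed mid-upgrade")
        conn.commit()
    except KeyboardInterrupt:
        conn.rollback()                   # schema stays at the old version

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual txn control
run_upgrade(conn, ["CREATE TABLE step1 (a INT)",
                   "CREATE TABLE step2 (a INT)"], crash_after=0)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)   # the half-applied upgrade was rolled back
```

With autocommit on (the behaviour the ticket describes), the first CREATE TABLE would already be committed at the moment of the kill, leaving the metastore between upgrade steps.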
[jira] [Commented] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481814#comment-17481814 ] Zoltan Haindrich commented on HIVE-25883: - {code} // aborted txn: 3881 // com:8020/warehouse/tablespace/managed/hive/test_835163/base_0003209_v0003877 // com:8020/warehouse/tablespace/managed/hive/test_835163/delta_0003561_0003561_000 // @,type:MAJOR,enqueueTime:0,start:0,properties:null,runAs:hive,tooManyAborts:false, // hasOldAbort:false,highestWriteId:3309,errorMessage:null {code} > Enhance Compaction Cleaner to skip when there is nothing to do > -- > > Key: HIVE-25883 > URL: https://issues.apache.org/jira/browse/HIVE-25883 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > the cleaner works the following way: > * it identifies obsolete directories (delta dirs which don't have open > txns) > * removes them, and it is done > if there are no obsolete directories, that is attributed to the possibility > that there are open txns, so the request should be retried later. > however, if for some reason the directory was already cleaned - similarly it > has no obsolete directories; and thus the request is retried forever -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-25672) Hive isn't purging older compaction entries from show compaction command
[ https://issues.apache.org/jira/browse/HIVE-25672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481313#comment-17481313 ] Zoltan Haindrich commented on HIVE-25672: - HIVE-25633 could cause the AcidHouseKeeperService to not run > Hive isn't purging older compaction entries from show compaction command > > > Key: HIVE-25672 > URL: https://issues.apache.org/jira/browse/HIVE-25672 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore, Transactions >Affects Versions: 3.1.1 >Reporter: Rohan Nimmagadda >Priority: Minor > > Added below properties in hive-site, but it's not enforced to auto purging. > When we run show compaction command it takes forever and returns billions of > rows. > Result of show compactions command : > {code:java} > 752,450 rows selected (198.066 seconds) > {code} > {code:java} > hive.compactor.history.retention.succeeded": "10", > "hive.compactor.history.retention.failed": "10", > "hive.compactor.history.retention.attempted": "10", > "hive.compactor.history.reaper.interval": "10m" {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-25672) Hive isn't purging older compaction entries from show compaction command
[ https://issues.apache.org/jira/browse/HIVE-25672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481265#comment-17481265 ] Zoltan Haindrich commented on HIVE-25672: - I tried to reproduce this issue; at first metastore.compactor.initiator.on was disabled on my cluster for some reason; but after turning that on things started working correctly: * a metastore with 52M of heap was able to cleanup 10K of records in no time ** and was OOM-ed for 100K * a metastore with 966K rows in the COMPLETED_COMPACTIONS table ** removed 50773 rows multiple times - and was able to reduce the volume to below 100 in around a minute I don't know if we have an issue here - as it seems like that most likely for some reason either the `AcidHouseKeeperService` is not running - or stopped running for some reason > Hive isn't purging older compaction entries from show compaction command > > > Key: HIVE-25672 > URL: https://issues.apache.org/jira/browse/HIVE-25672 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore, Transactions >Affects Versions: 3.1.1 >Reporter: Rohan Nimmagadda >Priority: Minor > > Added below properties in hive-site, but it's not enforced to auto purging. > When we run show compaction command it takes forever and returns billions of > rows. > Result of show compactions command : > {code:java} > 752,450 rows selected (198.066 seconds) > {code} > {code:java} > hive.compactor.history.retention.succeeded": "10", > "hive.compactor.history.retention.failed": "10", > "hive.compactor.history.retention.attempted": "10", > "hive.compactor.history.reaper.interval": "10m" {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25883. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you Denys for reviewing the changes! > Enhance Compaction Cleaner to skip when there is nothing to do > -- > > Key: HIVE-25883 > URL: https://issues.apache.org/jira/browse/HIVE-25883 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > the cleaner works the following way: > * it identifies obsolete directories (delta dirs which don't have open > txns) > * removes them, and it is done > if there are no obsolete directories, that is attributed to the possibility > that there are open txns, so the request should be retried later. > however, if for some reason the directory was already cleaned - similarly it > has no obsolete directories; and thus the request is retried forever -- This message was sent by Atlassian Jira (v8.20.1#820001)
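The enhancement described in the ticket turns the cleaner's two-way decision (clean or retry) into a three-way one: when there are no obsolete directories *and* no open transaction that could still need them, there is simply nothing to do and the request should be dropped rather than re-queued forever. A minimal sketch, with invented names rather than the actual Cleaner API:

```python
# Hypothetical decision logic for the compaction cleaner (names illustrative).
def cleaner_action(obsolete_dirs, has_blocking_open_txn):
    if obsolete_dirs:
        return "clean"   # remove the obsolete delta/base directories
    if has_blocking_open_txn:
        return "retry"   # an open txn may still read them; try again later
    return "done"        # nothing to remove and nothing pending: stop
                         # re-queueing (before the fix this case kept retrying)

print(cleaner_action(["delta_0003561_0003561_0000"], False))
print(cleaner_action([], True))
print(cleaner_action([], False))
```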
[jira] [Assigned] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do
[ https://issues.apache.org/jira/browse/HIVE-25883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25883: --- > Enhance Compaction Cleaner to skip when there is nothing to do > -- > > Key: HIVE-25883 > URL: https://issues.apache.org/jira/browse/HIVE-25883 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > > the cleaner works the following way: > * it identifies obsolete directories (delta dirs which don't have open > txns) > * removes them, and it is done > if there are no obsolete directories, that is attributed to the possibility > that there are open txns, so the request should be retried later. > however, if for some reason the directory was already cleaned - similarly it > has no obsolete directories; and thus the request is retried forever -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HIVE-25874) Slow filter evaluation of nest struct fields in vectorized executions
[ https://issues.apache.org/jira/browse/HIVE-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477819#comment-17477819 ] Zoltan Haindrich commented on HIVE-25874: - the issue is caused by the fact that VectorStructField doesn't reset the output vector - which causes the array in it to retain all previous elements and to keep expanding the backing vector. It took 21 minutes to execute the query before the patch; after it, 2 seconds > Slow filter evaluation of nest struct fields in vectorized executions > - > > Key: HIVE-25874 > URL: https://issues.apache.org/jira/browse/HIVE-25874 > Project: Hive > Issue Type: Improvement >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > time is spent at resizing vectors around > [here|https://github.com/apache/hive/blob/200c0bf1feb259f4d95bf065a2ab38fe684383da/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java#L252] > or in some other "ensureSize" method > {code:java} > create table t as > select > named_struct('id',13,'str','string','nest',named_struct('id',12,'str','string','arr',array('value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value'))) > s; > -- go up to 1M rows > insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; > insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; 
> insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; > insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; > insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; > -- insert into table t select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t union all select * from t union all select * from t union all > select * from t; > set hive.fetch.task.conversion=none; > select count(1) from t; > --explain > select s > .id from t > where > s > .nest > .id > 0; > {code} > interestingly; the issue is not present: > * for a query not looking into the nested struct > * and in case the struct with the array is at the top level > {code} > select count(1) from t; > --explain > select s > .id from t > where > s > -- .nest > .id > 0; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
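The root cause described in the comment above - an output vector that is never reset between batches, so its child array keeps every earlier batch's elements while the backing buffer keeps expanding - can be reproduced in miniature. The sketch below is a toy Python model with invented names (`ListColumnVector`, `evaluate_batch`), not the actual VectorStructField/ColumnVector code:

```python
class ListColumnVector:
    """Toy stand-in for a reused vectorized list column (names illustrative)."""
    def __init__(self):
        self.child = []     # flattened element buffer, reused across batches
        self.offsets = []   # (start, length) per row

    def reset(self):        # the step the fix effectively restores
        self.child.clear()
        self.offsets.clear()

def evaluate_batch(vec, batch, reset_between_batches):
    if reset_between_batches:
        vec.reset()
    else:
        vec.offsets.clear()  # offsets are rewritten, but child keeps growing
    for row in batch:
        vec.offsets.append((len(vec.child), len(row)))
        vec.child.extend(row)

buggy, fixed = ListColumnVector(), ListColumnVector()
batch = [["value"] * 32 for _ in range(1024)]  # 1024 rows, 32-element arrays
for _ in range(100):                           # 100 batches of the same data
    evaluate_batch(buggy, batch, reset_between_batches=False)
    evaluate_batch(fixed, batch, reset_between_batches=True)

print(len(buggy.child))  # grew with every batch: 100 batches' worth
print(len(fixed.child))  # stays at a single batch's worth
```

The ever-growing buffer means each batch pays to copy all accumulated elements again (the time spent in "ensureSize"-style resizing the description mentions), which is consistent with the 21-minutes-versus-2-seconds difference reported for the patch.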
[jira] [Assigned] (HIVE-25874) Slow filter evaluation of nest struct fields in vectorized executions
[ https://issues.apache.org/jira/browse/HIVE-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25874: --------------------------------------- Assignee: Zoltan Haindrich
[jira] [Updated] (HIVE-25874) Slow filter evaluation of nest struct fields in vectorized executions
[ https://issues.apache.org/jira/browse/HIVE-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-25874: ------------------------------------ Description: updated to note that time is spent at resizing vectors around [here|https://github.com/apache/hive/blob/200c0bf1feb259f4d95bf065a2ab38fe684383da/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java#L252] or in some other "ensureSize" method; the reproduction steps are unchanged from the description quoted earlier in this thread.
[jira] [Commented] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately
[ https://issues.apache.org/jira/browse/HIVE-25844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472679#comment-17472679 ] Zoltan Haindrich commented on HIVE-25844: - makes sense; backported HIVE-24772 instead
> Exception deserialization error-s may cause beeline to terminate immediately
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-25844
>                 URL: https://issues.apache.org/jira/browse/HIVE-25844
>             Project: Hive
>          Issue Type: Bug
>          Components: Beeline
>    Affects Versions: 3.1.2
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The failure happens as follows:
> * fetch task conversion is on
> * there is an exception while reading the table on the server side; the error bubbles up
> * => the server transmits a message to beeline saying that the error class name is "org.apache.phoenix.schema.ColumnNotFoundException", plus the message
> * beeline tries to reconstruct the exception around HiveSqlException
> * but during the constructor call org.apache.phoenix.exception.SQLExceptionCode is needed, which fails to load org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
> * a java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/shaded/com/google/protobuf/Service is thrown - which is not handled in that method - so it becomes a real error and shuts down the client
> {code:java}
> java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
> [...]
> 	at java.lang.Class.forName(Class.java:264)
> 	at org.apache.hive.service.cli.HiveSQLException.newInstance(HiveSQLException.java:245)
> 	at org.apache.hive.service.cli.HiveSQLException.toStackTrace(HiveSQLException.java:211)
> [...]
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.shaded.com.google.protobuf.Service
> [...]
> {code}
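The crux of the issue above is that NoClassDefFoundError is an Error, not an Exception, so a `catch (Exception)` around the reflective exception rebuilding does not stop it. A hedged, self-contained sketch of the defensive pattern (this is not the actual HiveSQLException code; class and method names are illustrative):

```java
// Illustrative sketch: rebuilding a server-side exception from its class name
// on the client. Catching Throwable (which covers LinkageError and its subclass
// NoClassDefFoundError) and falling back to a generic carrier keeps the client
// alive even when the named class, or one of its dependencies, cannot be loaded.
public class ExceptionRebuildSketch {
    static Throwable rebuild(String className, String message) {
        try {
            Class<?> cls = Class.forName(className);
            // Assumes the exception type has a (String) constructor.
            return (Throwable) cls.getConstructor(String.class).newInstance(message);
        } catch (Throwable t) {
            // Reflection failed (ClassNotFoundException, NoClassDefFoundError,
            // missing constructor, ...): degrade gracefully instead of crashing.
            return new RuntimeException(className + ": " + message);
        }
    }

    public static void main(String[] args) {
        // A loadable exception class is reconstructed as-is.
        Throwable ok = rebuild("java.lang.IllegalStateException", "boom");
        System.out.println(ok.getClass().getName()); // java.lang.IllegalStateException

        // A class that is not on the classpath triggers the fallback.
        Throwable fb = rebuild("org.apache.phoenix.schema.ColumnNotFoundException", "missing col");
        System.out.println(fb.getClass().getName()); // java.lang.RuntimeException
    }
}
```

The comment's resolution (backporting HIVE-24772) addresses the same class of problem in the real code path.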
[jira] [Assigned] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately
[ https://issues.apache.org/jira/browse/HIVE-25844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25844: ---------------------------------------
[jira] [Resolved] (HIVE-25820) Provide a way to disable join filters
[ https://issues.apache.org/jira/browse/HIVE-25820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25820. ------------------------------------- Resolution: Won't Fix This is not really a good option, as optionally disabling this feature may result in incorrect results in some form.
> Provide a way to disable join filters
> -------------------------------------
>
>                 Key: HIVE-25820
>                 URL: https://issues.apache.org/jira/browse/HIVE-25820
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
[jira] [Resolved] (HIVE-25822) Incorrect False positive result rows may be outputted in case outer join has conditions only affecting one side
[ https://issues.apache.org/jira/browse/HIVE-25822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25822. ------------------------------------- Resolution: Fixed Merged into master.
> Incorrect False positive result rows may be outputted in case outer join has conditions only affecting one side
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25822
>                 URL: https://issues.apache.org/jira/browse/HIVE-25822
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> To reproduce, the following are needed:
> * an outer join
> * the ON condition has at least one condition for one side of the join
> * in a single reducer:
> ** a right-hand-side-only row outputted right before
> ** >=2 rows on the LHS and 1 on the RHS matching on the join keys, where the first LHS row doesn't satisfy the filter condition
> ** a second LHS row that does satisfy the filter condition
> {code}
> with
> t_y as (select col1 as id,col2 as s from (VALUES(0,'a'),(1,'y')) as c),
> t_xy as (select col1 as id,col2 as s from (VALUES(1,'x'),(1,'y')) as c)
> select * from t_xy l full outer join t_y r on (l.id=r.id and l.s='y');
> {code}
> null,null,1,y is a false positive result
> {code}
> +-------+-------+-------+-------+
> | l.id  | l.s   | r.id  | r.s   |
> +-------+-------+-------+-------+
> | NULL  | NULL  | 0     | a     |
> | 1     | x     | NULL  | NULL  |
> | NULL  | NULL  | 1     | y     |
> | 1     | y     | 1     | y     |
> +-------+-------+-------+-------+
> {code}
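The expected result of the query in HIVE-25822 can be cross-checked with a small reference implementation of full outer join (this is deliberately naive and is not Hive's reducer-side join). The key invariant the bug violated: a right-side row is null-extended only if no left row matched it, so `NULL,NULL,1,y` must not appear when `(1,y)` already joined with `(1,y)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Naive reference full outer join used to check the issue's expected output.
public class FullOuterJoinSketch {
    record Row(Integer id, String s) {}

    static List<String> fullOuterJoin(List<Row> left, List<Row> right,
                                      BiPredicate<Row, Row> on) {
        List<String> out = new ArrayList<>();
        boolean[] rightMatched = new boolean[right.size()];
        for (Row l : left) {
            boolean matched = false;
            for (int i = 0; i < right.size(); i++) {
                Row r = right.get(i);
                if (on.test(l, r)) {
                    out.add(l.id() + "," + l.s() + "," + r.id() + "," + r.s());
                    matched = true;
                    rightMatched[i] = true;
                }
            }
            if (!matched) out.add(l.id() + "," + l.s() + ",NULL,NULL");
        }
        // Null-extend only the right rows that no left row matched.
        for (int i = 0; i < right.size(); i++) {
            if (!rightMatched[i]) {
                out.add("NULL,NULL," + right.get(i).id() + "," + right.get(i).s());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> txy = List.of(new Row(1, "x"), new Row(1, "y")); // left side
        List<Row> ty  = List.of(new Row(0, "a"), new Row(1, "y")); // right side
        // on (l.id = r.id and l.s = 'y')
        List<String> rows = fullOuterJoin(txy, ty,
                (l, r) -> l.id().equals(r.id()) && "y".equals(l.s()));
        rows.forEach(System.out::println);
        // Correct result: 3 rows - 1,x,NULL,NULL ; 1,y,1,y ; NULL,NULL,0,a.
        // The false positive NULL,NULL,1,y does not appear.
    }
}
```

Running this yields three rows, matching the table in the issue minus the false-positive `NULL,NULL,1,y` row.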
[jira] [Resolved] (HIVE-25823) Incorrect false positive results for outer join using non-satisfiable residual filters
[ https://issues.apache.org/jira/browse/HIVE-25823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-25823. ------------------------------------- Resolution: Duplicate This is the same issue as HIVE-25822; I most likely made some mistake when checking the results on a different branch.
> Incorrect false positive results for outer join using non-satisfiable residual filters
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-25823
>                 URL: https://issues.apache.org/jira/browse/HIVE-25823
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>
> Similar to HIVE-25822:
> {code}
> create table t_y (id integer,s string);
> create table t_xy (id integer,s string);
> insert into t_y values(0,'a'),(1,'y'),(1,'x');
> insert into t_xy values(1,'x'),(1,'y');
> select * from t_xy l full outer join t_y r on (l.id=r.id and l.s='y' and l.id+2*r.id=1);
> {code}
> the rows full of NULLs are incorrect
> {code}
> +-------+-------+-------+-------+
> | l.id  | l.s   | r.id  | r.s   |
> +-------+-------+-------+-------+
> | NULL  | NULL  | 0     | a     |
> | NULL  | NULL  | NULL  | NULL  |
> | 1     | y     | NULL  | NULL  |
> | NULL  | NULL  | NULL  | NULL  |
> | NULL  | NULL  | 1     | y     |
> | NULL  | NULL  | 1     | x     |
> +-------+-------+-------+-------+
> {code}
[jira] [Assigned] (HIVE-25823) Incorrect false positive results for outer join using non-satisfiable residual filters
[ https://issues.apache.org/jira/browse/HIVE-25823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-25823: --------------------------------------- Assignee: Zoltan Haindrich