[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests

2024-02-05 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814329#comment-17814329
 ] 

Zoltan Haindrich commented on HIVE-28013:
-

oh sorry - I should have noticed your excellent comment there... :)
seems like nothing is easy nowadays... even a build tool dependency upgrade 
can hold surprises! :)

> No space left on device when running precommit tests
> 
>
> Key: HIVE-28013
> URL: https://issues.apache.org/jira/browse/HIVE-28013
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Blocker
> Fix For: 4.1.0
>
> Attachments: orphaned_item_strategy.png
>
>
> The Hive precommit tests fail due to lack of space. A few of the most recent 
> failures are below:
> * 
> http://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-4744/23/console
> * 
> http://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5005/10/console
> {noformat}
> java.io.IOException: No space left on device
>   at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at 
> java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
>   at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
>   at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
>   at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:280)
>   at 
> org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.<init>(RiverWriter.java:109)
>   at 
> org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:560)
>   at 
> org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:537)
>   at 
> org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgramIfPossible(CpsThreadGroup.java:520)
>   at 
> org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:444)
>   at 
> org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:97)
>   at 
> org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:315)
>   at 
> org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:279)
>   at 
> org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
>   at 
> jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
>   at 
> jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests

2024-02-02 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813797#comment-17813797
 ] 

Zoltan Haindrich commented on HIVE-28013:
-

that's unfortunate; if you look at split-03 in 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-5019/1/pipeline/950
it seems like something has crashed in the post-processing step:
{code}
[2024-01-22T19:36:53.771Z] 
./standalone-metastore/metastore-server/target/surefire-reports/TEST-org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStoreZKBindHost.xml:47233.57:
 internal error: Huge input lookup

[2024-01-22T19:36:53.771Z] [DEBUG] 2024-01-22 19:14:03.393 
[Metastore-Handler-Pool: Thread-84] Persistence 

[2024-01-22T19:36:53.771Z]  
   ^
{code}

not sure what was having a bad day there - `xmlstarlet`?

this issue could easily explain why it has started running out of space; I 
think analyzing and running the post-processing script locally on the contents 
of the tgz could give further details on what went wrong - and could possibly 
help in creating a fix for it
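To make that local analysis concrete, here is a minimal sketch of scanning extracted report contents for files the XML tooling could choke on. It assumes the usual surefire `TEST-*.xml` naming; the size threshold is an arbitrary illustration and is not taken from the actual post-processing script:

```python
# Sketch: flag surefire report XMLs that a post-processing tool
# (e.g. xmlstarlet) might choke on - either unparsable or suspiciously huge.
import os
import xml.etree.ElementTree as ET

def find_suspect_reports(root_dir, size_limit=50 * 1024 * 1024):
    """Return (path, reason) pairs for surefire XML files that look problematic."""
    suspects = []
    for dirpath, _dirs, files in os.walk(root_dir):
        for name in files:
            # Surefire reports follow the TEST-<class>.xml convention.
            if not (name.startswith("TEST-") and name.endswith(".xml")):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > size_limit:
                suspects.append((path, "oversized"))
                continue
            try:
                ET.parse(path)
            except ET.ParseError as e:
                suspects.append((path, "unparsable: %s" % e))
    return suspects
```

Running this over the unpacked tgz should at least show whether the `TestRemoteHiveMetaStoreZKBindHost` report was malformed or just enormous.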






[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests

2024-01-22 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17809345#comment-17809345
 ] 

Zoltan Haindrich commented on HIVE-28013:
-

I believe the cleanup runs during the daily repo scan

> I checked the sizes of builds for master from 2021 to now and I didn't see 
> any huge spikes. It was always around 100M as I noted in a comment above.

I think those lines are about 10 *failed* tests - there aren't supposed to be 
failed tests on master :D 






[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests

2024-01-19 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808666#comment-17808666
 ] 

Zoltan Haindrich commented on HIVE-28013:
-

fyi the amount of disk used by a build was estimated earlier, and usage was 
tracking those estimates until around Feb 2023; I think there might be some 
ballast in the builds since then

| 2021 September | 141G | http://ci.hive.apache.org/job/space-check/100/ |
| 2022 Jul       | 134G | http://ci.hive.apache.org/job/space-check/400/ |
| 2023 Feb       | 141G | http://ci.hive.apache.org/job/space-check/600/ |
| 2023 Aug       | 170G | http://ci.hive.apache.org/job/space-check/800/ |
| 2023 Nov       | 194G | http://ci.hive.apache.org/job/space-check/900/ |
| 2024 Jan 19    | 209G | http://ci.hive.apache.org/job/space-check/950/ |
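Reading the table as arithmetic (figures rounded as quoted; the ~11-month window is my own reading of the dates, not something the job reports): usage was roughly flat around 140G until Feb 2023, then grew about 68G, i.e. roughly 6G per month:

```python
# Space-check figures quoted above, in GB (approximate).
samples = {
    "2021-09": 141, "2022-07": 134, "2023-02": 141,
    "2023-08": 170, "2023-11": 194, "2024-01": 209,
}

# Flat (~140G) until Feb 2023, then steady growth over ~11 months.
growth = samples["2024-01"] - samples["2023-02"]
months = 11  # Feb 2023 -> Jan 2024
rate = growth / months
print("total growth: %dG, ~%.1fG/month" % (growth, rate))
```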










[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests

2024-01-19 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808661#comment-17808661
 ] 

Zoltan Haindrich commented on HIVE-28013:
-

there is by the way a job for checking the disk usage:

[http://ci.hive.apache.org/job/space-check/]






[jira] [Commented] (HIVE-28013) No space left on device when running precommit tests

2024-01-19 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17808659#comment-17808659
 ] 

Zoltan Haindrich commented on HIVE-28013:
-

15 days is not so much - I would recommend raising it back, and looking around 
at what the jobs are storing

I wonder how much these 10 log files cost: 
[https://github.com/apache/hive/blob/9c4eb96f816105560e7d4809f1d608e7eca9e523/Jenkinsfile#L366-L371]

there was this PR: [https://github.com/apache/hive/pull/4732]

 

from your notes it seems to me that one build which has those logs has grown 
from the ~100M a master build usually is, up to 1.1G:

1.1G var/jenkins_home/jobs/hive-precommit/branches/PR-4566/builds/8
1.2G var/jenkins_home/jobs/hive-precommit/branches/PR-4566/builds/27

there was a discussion about reverting - but that never landed...
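A sketch of how one could rank build directories by size to find the ballast; the `branches/<PR>/builds/<N>` layout is taken from the paths quoted above, everything else is illustrative:

```python
# Sketch: rank Jenkins build directories by on-disk size, largest first,
# roughly equivalent to `du -s` over .../branches/*/builds/*.
import os

def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for dirpath, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(dirpath, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

def heaviest_builds(branches_root, top=10):
    """Return (size, path) for the largest <branch>/builds/<N> directories."""
    sizes = []
    for branch in os.listdir(branches_root):
        builds = os.path.join(branches_root, branch, "builds")
        if not os.path.isdir(builds):
            continue
        for build in os.listdir(builds):
            p = os.path.join(builds, build)
            if os.path.isdir(p):
                sizes.append((dir_size(p), p))
    return sorted(sizes, reverse=True)[:top]
```

Pointed at `var/jenkins_home/jobs/hive-precommit/branches`, this would show quickly whether the 1G+ builds are the ones that archived the extra log files.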






[jira] [Commented] (HIVE-27759) Include docker daemon logs in case of docker issues

2023-10-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771465#comment-17771465
 ] 

Zoltan Haindrich commented on HIVE-27759:
-

hmm.. yes - that used to report more nicely; I think an `EOFException` is kinda 
like a tcp reset happening... I wonder how frequently this is happening?

the other interesting thing is that it happened in 2 different splits, at 
different times:
split22
* 2023-09-28T12:35:05,587 
split7
* 2023-09-28T13:44:09,013 
* 2023-09-28T13:44:59,934  
* the last one in split7 is even more odd:
{code}
2023-09-28T13:46:17,934  INFO [Listener at 0.0.0.0/44773] 
externalDB.AbstractExternalDB: Stderr from proc: Unable to find image 
'postgres:9.3' locally
docker: Error response from daemon: received unexpected HTTP status: 503 
Service Unavailable.
{code}
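For what it's worth, transient registry 503s are usually papered over with a retry-and-backoff wrapper around the pull; a generic sketch (this is not something the Hive test harness does today, as far as I can tell):

```python
import time

def retry(fn, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure, back off exponentially and retry.

    Re-raises the last exception if all attempts fail.
    """
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            sleep(base_delay * (2 ** i))  # 1s, 2s, 4s, ...
```

Wrapping the `docker run`/image-pull step in something like this would absorb a one-off 503, though it does not help against hard rate limits.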


> Include docker daemon logs in case of docker issues
> ---
>
> Key: HIVE-27759
> URL: https://issues.apache.org/jira/browse/HIVE-27759
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
>
> there is a test failure:
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4753/2/tests/
> {code}
> docker: Error response from daemon: Get https://registry-1.docker.io/v2/: EOF.
> See 'docker run --help'.
> {code}
> the root cause of EOF is unknown, there might be further details somewhere 
> else, here is a github issue for reference (it's for mac but any ideas are 
> welcome): https://github.com/docker/for-mac/issues/6704





[jira] [Commented] (HIVE-27759) Include docker daemon logs in case of docker issues

2023-10-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771455#comment-17771455
 ] 

Zoltan Haindrich commented on HIVE-27759:
-

okay; not sure - but that used to be the case around a year ago - I don't think 
that was fixed :)

whatever causes this issue - if the images used for testing were hosted 
inside the k8s cluster, that would:
* reduce the dependency on an external service
* reduce external network usage
* and also speed up builds

that's why I think fixing this is not that interesting...







[jira] [Commented] (HIVE-27718) TestMiniTezCliDriver: save application logs for failed tests

2023-10-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771402#comment-17771402
 ] 

Zoltan Haindrich commented on HIVE-27718:
-

don't do this in the main job either - extend the debug job instead

> TestMiniTezCliDriver: save application logs for failed tests
> 
>
> Key: HIVE-27718
> URL: https://issues.apache.org/jira/browse/HIVE-27718
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>
> 1. locate tez app logs for a TestMiniTezCliDriver test
> {code}
> ls -laR itests/qtest/target/tmp/hive/yarn-*/hive-logDir-nm-*
> {code}
> 2. add them similarly to HIVE-27716
> important to note that tez app logs files are not specific to a particular 
> test, so we can collect those for the whole module in case of an error





[jira] [Commented] (HIVE-27759) Include docker daemon logs in case of docker issues

2023-10-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771400#comment-17771400
 ] 

Zoltan Haindrich commented on HIVE-27759:
-

the central registry is most likely pushing back because we are hitting its 
download rate limits...

the fix would be to use a local cache or a separate registry during test 
executions








[jira] [Commented] (HIVE-27719) Save heapdump in case of OOM

2023-10-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771399#comment-17771399
 ] 

Zoltan Haindrich commented on HIVE-27719:
-

building something like this into the main job is kinda like a recipe for 
disaster:
* it could possibly take up all the space on the executors
* by exhausting all space it will make all jobs fail - even innocent ones
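For reference, the standard HotSpot mechanism for this is the flag pair below; pointing the dump path at a bounded, dedicated volume is exactly what would contain the space risk. Where these flags get wired in (e.g. the surefire `argLine` of the debug job) is an assumption on my part, not something from the actual pom:

{code}
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/bounded/volume
{code}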

> Save heapdump in case of OOM
> 
>
> Key: HIVE-27719
> URL: https://issues.apache.org/jira/browse/HIVE-27719
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: Kokila N
>Priority: Major
>
> This applies to 2 places:
> 1. mini llap tests: 1 single JVM has everything (HS2, AM, tasks)
> 2. mini tez test: tez app containers





[jira] [Resolved] (HIVE-27758) Precommit: splits are messed up in the folders

2023-10-03 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-27758.
-
Resolution: Not A Problem

> Precommit: splits are messed up in the folders
> --
>
> Key: HIVE-27758
> URL: https://issues.apache.org/jira/browse/HIVE-27758
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
> Attachments: Screenshot 2023-09-29 at 9.15.22.png
>
>
> e.g. in the screenshot below, split-07 folder contains logs for another 
> splits, maybe I'm getting something wrong
> !Screenshot 2023-09-29 at 9.15.22.png! 





[jira] [Commented] (HIVE-27719) Save heapdump in case of OOM

2023-10-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771398#comment-17771398
 ] 

Zoltan Haindrich commented on HIVE-27719:
-

don't do this - only in the debug job which runs only 1 test






[jira] [Commented] (HIVE-27717) Improve precommit logging to address flaky tests easier

2023-10-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771396#comment-17771396
 ] 

Zoltan Haindrich commented on HIVE-27717:
-

why would you want to do this? use the job which reruns the same test multiple 
times...

> Improve precommit logging to address flaky tests easier
> ---
>
> Key: HIVE-27717
> URL: https://issues.apache.org/jira/browse/HIVE-27717
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>






[jira] [Comment Edited] (HIVE-27758) Precommit: splits are messed up in the folders

2023-10-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771395#comment-17771395
 ] 

Zoltan Haindrich edited comment on HIVE-27758 at 10/3/23 10:07 AM:
---

that shouldn't matter much; other than that it might be confusing - because the 
same word is reused:
* the test case was split up into N parts
* meanwhile the executor used M splits

edit:
I think I've just rephrased what you were already saying :D 
> the folders "split-X" belong to the kubernetes pods and "hive.cli.splitY" 
> packages belong to the qsplit profile logic



was (Author: kgyrtkirk):
that shouldn't matter much; other then that it might be confusing - because the 
same word is reused:
* the test case was split up into N parts
* meanwhile the executor used M splits







[jira] [Commented] (HIVE-27758) Precommit: splits are messed up in the folders

2023-10-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771395#comment-17771395
 ] 

Zoltan Haindrich commented on HIVE-27758:
-

that shouldn't matter much; other than that, it might be confusing because the 
same word is reused:
* the test case was split up into N parts
* meanwhile the executor used M splits


> Precommit: splits are messed up in the folders
> --
>
> Key: HIVE-27758
> URL: https://issues.apache.org/jira/browse/HIVE-27758
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Priority: Major
> Attachments: Screenshot 2023-09-29 at 9.15.22.png
>
>
> e.g. in the screenshot below, the split-07 folder contains logs for other 
> splits; maybe I'm getting something wrong
> !Screenshot 2023-09-29 at 9.15.22.png! 





[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796

2023-08-16 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754910#comment-17754910
 ] 

Zoltan Haindrich commented on HIVE-26806:
-

made you an admin - didn't know you weren't one :D

there are 2 ways to upgrade the plugin:
* upgrade individually on the interface
* upgrade by building a new htk-jenkins image 
(https://hub.docker.com/r/kgyrtkirk/htk-jenkins/tags)
the second option could upgrade everything, from the Jenkins version to all 
plugins - since it hasn't been done for a while, it might be helpful to do that

let me know if you need any help with that; I'm also on asf slack if you want 
to chat




> Precommit tests in CI are timing out after HIVE-26796
> -
>
> Key: HIVE-26806
> URL: https://issues.apache.org/jira/browse/HIVE-26806
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/1506/
> {noformat}
> ancelling nested steps due to timeout
> 15:22:08  Sending interrupt signal to process
> 15:22:08  Killing processes
> 15:22:09  kill finished with exit code 0
> 15:22:19  Terminated
> 15:22:19  script returned exit code 143
> [Pipeline] }
> [Pipeline] // withEnv
> [Pipeline] }
> 15:22:19  Deleting 1 temporary files
> [Pipeline] // configFileProvider
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (PostProcess)
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] junit
> 15:22:25  Recording test results
> 15:22:32  [Checks API] No suitable checks publisher found.
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] }
> [Pipeline] // container
> [Pipeline] }
> [Pipeline] // node
> [Pipeline] }
> [Pipeline] // timeout
> [Pipeline] }
> [Pipeline] // podTemplate
> [Pipeline] }
> 15:22:32  Failed in branch split-01
> [Pipeline] // parallel
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (Archive)
> [Pipeline] podTemplate
> [Pipeline] {
> [Pipeline] timeout
> 15:22:33  Timeout set to expire in 6 hr 0 min
> {noformat}





[jira] [Commented] (HIVE-26806) Precommit tests in CI are timing out after HIVE-26796

2023-08-10 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752828#comment-17752828
 ] 

Zoltan Haindrich commented on HIVE-26806:
-

seems like there is a helpful feature in the parallel-test-executor
https://github.com/jenkinsci/parallel-test-executor-plugin/commit/c9145a5f849f01d6e99c2240eb51d9aaf283ef6a
upgrading to >380 could make this go away

> Precommit tests in CI are timing out after HIVE-26796
> -
>
> Key: HIVE-26806
> URL: https://issues.apache.org/jira/browse/HIVE-26806
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/1506/
> {noformat}
> ancelling nested steps due to timeout
> 15:22:08  Sending interrupt signal to process
> 15:22:08  Killing processes
> 15:22:09  kill finished with exit code 0
> 15:22:19  Terminated
> 15:22:19  script returned exit code 143
> [Pipeline] }
> [Pipeline] // withEnv
> [Pipeline] }
> 15:22:19  Deleting 1 temporary files
> [Pipeline] // configFileProvider
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (PostProcess)
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] sh
> [Pipeline] junit
> 15:22:25  Recording test results
> 15:22:32  [Checks API] No suitable checks publisher found.
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] }
> [Pipeline] // container
> [Pipeline] }
> [Pipeline] // node
> [Pipeline] }
> [Pipeline] // timeout
> [Pipeline] }
> [Pipeline] // podTemplate
> [Pipeline] }
> 15:22:32  Failed in branch split-01
> [Pipeline] // parallel
> [Pipeline] }
> [Pipeline] // stage
> [Pipeline] stage
> [Pipeline] { (Archive)
> [Pipeline] podTemplate
> [Pipeline] {
> [Pipeline] timeout
> 15:22:33  Timeout set to expire in 6 hr 0 min
> {noformat}





[jira] [Resolved] (HIVE-26605) Remove reviewer pattern

2023-08-02 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-26605.
-
Resolution: Fixed

> Remove reviewer pattern
> ---
>
> Key: HIVE-26605
> URL: https://issues.apache.org/jira/browse/HIVE-26605
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] (HIVE-26978) Stale "Runtime stats" causes poor query planning

2023-05-26 Thread Zoltan Haindrich (Jira)


[ https://issues.apache.org/jira/browse/HIVE-26978 ]


Zoltan Haindrich deleted comment on HIVE-26978:
-

was (Author: kgyrtkirk):
have you restarted the HS2?
the runtime stats are cached there; the metastore only stores them

> Stale "Runtime stats" causes poor query planning
> 
>
> Key: HIVE-26978
> URL: https://issues.apache.org/jira/browse/HIVE-26978
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
> Attachments: Screenshot 2023-01-24 at 10.23.16 AM.png
>
>
> * Runtime stats can be stored in hiveserver or in metastore via 
> "hive.query.reexecution.stats.persist.scope".
>  * Though the table is dropped and recreated, it ends up showing old stats 
> via "RUNTIME" stats. Here is an example (note that the table is empty, but 
> gets datasize and numRows from RUNTIME stats)
>  * This causes suboptimal plan for "MERGE INTO" queries by creating 
> CUSTOM_EDGE instead of broadcast edge.
> !Screenshot 2023-01-24 at 10.23.16 AM.png|width=2053,height=753!
>  
>  





[jira] [Commented] (HIVE-26978) Stale "Runtime stats" causes poor query planning

2023-05-26 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17726585#comment-17726585
 ] 

Zoltan Haindrich commented on HIVE-26978:
-

have you restarted the HS2?
the runtime stats are cached there; the metastore only stores them

> Stale "Runtime stats" causes poor query planning
> 
>
> Key: HIVE-26978
> URL: https://issues.apache.org/jira/browse/HIVE-26978
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance
> Attachments: Screenshot 2023-01-24 at 10.23.16 AM.png
>
>
> * Runtime stats can be stored in hiveserver or in metastore via 
> "hive.query.reexecution.stats.persist.scope".
>  * Though the table is dropped and recreated, it ends up showing old stats 
> via "RUNTIME" stats. Here is an example (note that the table is empty, but 
> gets datasize and numRows from RUNTIME stats)
>  * This causes suboptimal plan for "MERGE INTO" queries by creating 
> CUSTOM_EDGE instead of broadcast edge.
> !Screenshot 2023-01-24 at 10.23.16 AM.png|width=2053,height=753!
>  
>  





[jira] [Assigned] (HIVE-23691) TestMiniLlapLocalCliDriver#testCliDriver[schq_materialized] is flaky

2023-04-14 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-23691:
---

Assignee: KIRTI RUGE  (was: Zoltan Haindrich)

> TestMiniLlapLocalCliDriver#testCliDriver[schq_materialized] is flaky
> 
>
> Key: HIVE-23691
> URL: https://issues.apache.org/jira/browse/HIVE-23691
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: KIRTI RUGE
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> http://34.66.156.144:8080/job/hive-precommit/job/master/39/testReport/junit/org.apache.hadoop.hive.cli.split20/TestMiniLlapLocalCliDriver/Testing___split_13___Archive___testCliDriver_schq_materialized_/





[jira] [Commented] (HIVE-26984) Deprecate public HiveConf constructors

2023-02-02 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683356#comment-17683356
 ] 

Zoltan Haindrich commented on HIVE-26984:
-

I understand that you are after the ultimate comfort...but how often would you 
need this? You are saying that you want a built-in system which could tell you 
from "WHERE" conf keys are being altered...if that happens often, I would be 
interested in the causes of that...

but I think you still have alternatives; you could probably:
* enable aspectj weaving for the hive-exec module - since we are already 
shading the module, that's not that big of a change, especially if it could 
shade+weave at the same time..
* you could build the Traceable part into the main HiveConf object - right 
now you are returning a different impl if a conf key is set..
** since this is a conf object - the state of that conf key is a chicken-egg 
problem: what if for some HiveConf instances you are loading from a different 
place/etc and the key is off? you will not see those - but I guess those would 
be the most interesting ones...when someone just wrote `new HiveConf()`...
* as for passing the agent: I'm not sure - but maybe the agent can 
be placed inside, say, hive-exec or something; and then tweak the tez launch 
params (from inside HS2) to add the -agent to the launch cmdline; probably 
similar for the HS2 startup...but that should be done somewhere in the 
scripts...
** so I don't see that path as impossible either...have you already explored 
these paths?

> Deprecate public HiveConf constructors
> --
>
> Key: HIVE-26984
> URL: https://issues.apache.org/jira/browse/HIVE-26984
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> From time to time we investigate configuration object problems that are hard 
> to investigate. We can improve this area, e.g. with HIVE-26985, but first, we 
> need to introduce a public static factory method to hook into the creation 
> process. I can see this pattern in other projects as well, like: 
> HBaseConfiguration.
> Creating custom HiveConf subclasses can be useful because putting optional 
> (say: if else branches or whatever) stuff into the original HiveConf object's 
> hot codepaths can make it less performant instantly.





[jira] [Commented] (HIVE-26984) Deprecate public HiveConf constructors

2023-02-02 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683337#comment-17683337
 ] 

Zoltan Haindrich commented on HIVE-26984:
-

copied from HIVE-26985:

I think you could probably achieve something similar by using AspectJ or 
Byteman or other java agent stuff; or you could write your own agent:
https://stackify.com/what-are-java-agents-and-how-to-profile-with-them/
What's the problem with those approaches?

I will leave a -1 here as it makes a significant API change by making the 
HiveConf constructor protected - which will break all 3rd party extensions 
which may use `new HiveConf()`


> Deprecate public HiveConf constructors
> --
>
> Key: HIVE-26984
> URL: https://issues.apache.org/jira/browse/HIVE-26984
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> From time to time we investigate configuration object problems that are hard 
> to investigate. We can improve this area, e.g. with HIVE-26985, but first, we 
> need to introduce a public static factory method to hook into the creation 
> process. I can see this pattern in other projects as well, like: 
> HBaseConfiguration.
> Creating custom HiveConf subclasses can be useful because putting optional 
> (say: if else branches or whatever) stuff into the original HiveConf object's 
> hot codepaths can make it less performant instantly.





[jira] [Commented] (HIVE-26985) Create a trackable hive configuration object

2023-02-02 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683330#comment-17683330
 ] 

Zoltan Haindrich commented on HIVE-26985:
-

I think you could probably achieve something similar by using  AspectJ or 
Byteman or other java agent stuff; 

or write your own agent:
https://stackify.com/what-are-java-agents-and-how-to-profile-with-them/

I will -1 the current patch because it makes a significant API change by making 
the HiveConf constructor protected - which will break all 3rd party extensions 
which may use `new HiveConf()`

> Create a trackable hive configuration object
> 
>
> Key: HIVE-26985
> URL: https://issues.apache.org/jira/browse/HIVE-26985
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: hive.log
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> During configuration-related investigations, I want to be able to easily find 
> out when and how a certain configuration is changed. I'm looking for an 
> improvement that simply logs if "hive.a.b.c" is changed from "hello" to 
> "asdf" or even null and on which thread/codepath.
> Not sure if there is already a trackable configuration object in hadoop that 
> we can reuse, or we need to implement it in hive.





[jira] [Commented] (HIVE-26400) Provide docker images for Hive

2022-11-30 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17641232#comment-17641232
 ] 

Zoltan Haindrich commented on HIVE-26400:
-

taking a quick look at the PR I'm not sure about the goal here...note that the 
following will fire up Hive from public dockerhub images/etc:

{code}
# To download and start the Hive in a docker image
docker run --rm -p 1:1 --name hive4 -e HIVE_VERSION=4.0.0-alpha-1 -e 
TEZ_VERSION=0.10.1 -v hive-dev-box_work:/work kgyrtkirk/hive-dev-box:bazaar

# After the previous command is finished (it takes some time to download the 
image and start Hive)
# In another terminal, to connect with BeeLine to Hive
docker exec -it hive4 /bin/bash --login -e safe_bl
{code}

it will also cache downloaded artifacts, and doesn't need to "rebuild" a 
super-fat image every time a new version is released...of course it might 
make sense to ditch all those features in case it will be used differently.

> Provide docker images for Hive
> --
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Blocker
>  Labels: hive-4.0.0-must, pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Make Apache Hive able to run inside a docker container in pseudo-distributed 
> mode, with MySQL/Derby as its backend database, and provide the following:
>  * Quick-start/Debugging/Prepare a test env for Hive;
>  * Tools to build target image with specified version of Hive and its 
> dependencies;
>  * Images can be used as the basis for the Kubernetes operator.





[jira] [Resolved] (HIVE-26741) Unexpect behavior for insert when table name is like `db.tab`

2022-11-16 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-26741.
-
Resolution: Duplicate

dup of HIVE-16907

> Unexpect behavior for insert when table name is like `db.tab`
> -
>
> Key: HIVE-26741
> URL: https://issues.apache.org/jira/browse/HIVE-26741
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Reporter: luoyuxia
>Priority: Major
> Attachments: image-2022-11-16-09-57-57-461.png, 
> image-2022-11-16-10-03-08-559.png, image-2022-11-16-10-08-31-699.png, 
> image-2022-11-16-10-09-40-766.png
>
>
> Just met a strange problem with the following SQL: it'll overwrite the data 
> instead of appending data.
> {code:java}
> insert into table `default.t1` values (1, 2){code}
> The result is as follows:
> !image-2022-11-16-09-57-57-461.png|width=397,height=362!
> Is it a bug or something else?
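One source of the confusion above is general to SQL: a quoted identifier is a 
single name, so quoting the whole `db.tab` string yields one table whose name 
literally contains a dot, not table `tab` in database `db`. A sqlite3 sketch of 
that general behaviour (Hive's backtick handling is exactly what HIVE-16907 
covers, and may differ in detail):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Quoting the whole string yields ONE identifier whose name contains a dot -
# not table t1 inside a schema/database named "default".
conn.execute('CREATE TABLE "default.t1" (a INTEGER)')
conn.execute('INSERT INTO "default.t1" VALUES (1)')

names = [r[0] for r in
         conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(names)  # ['default.t1']
```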





[jira] [Assigned] (HIVE-26605) Remove reviewer pattern

2022-10-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-26605:
---


> Remove reviewer pattern
> ---
>
> Key: HIVE-26605
> URL: https://issues.apache.org/jira/browse/HIVE-26605
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>






[jira] [Commented] (HIVE-20607) TxnHandler should use PreparedStatement to execute direct SQL queries.

2022-06-16 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555030#comment-17555030
 ] 

Zoltan Haindrich commented on HIVE-20607:
-

if it had been on 3.1, then it would have been released recently...but 
as of now I don't know about any planned 3.x releases; I guess 4.0 will be next

> TxnHandler should use PreparedStatement to execute direct SQL queries.
> --
>
> Key: HIVE-20607
> URL: https://issues.apache.org/jira/browse/HIVE-20607
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore, Transactions
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, pull-request-available
> Fix For: 3.2.0, 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-20607.01-branch-3.patch, HIVE-20607.01.patch
>
>
> TxnHandler uses direct SQL queries to operate on Txn related databases/tables 
> in Hive metastore RDBMS.
> Most of the methods are direct calls from the Metastore API which directly 
> append input string arguments to the SQL string.
> Need to use a parameterised PreparedStatement object to set these arguments.
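The parameterised-statement pattern the ticket calls for, sketched with 
Python's sqlite3 standing in for JDBC's PreparedStatement (the TXNS table and 
column names here are illustrative only, not the actual metastore schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TXNS (TXN_ID INTEGER, TXN_USER TEXT)")
conn.executemany("INSERT INTO TXNS VALUES (?, ?)", [(1, "alice"), (2, "bob")])

# Hostile input stays data: the placeholder keeps it out of the SQL text
# entirely, instead of being concatenated into the statement string.
user = "alice'; DROP TABLE TXNS; --"
rows = conn.execute(
    "SELECT TXN_ID FROM TXNS WHERE TXN_USER = ?", (user,)).fetchall()
print(rows)  # [] - no match, and TXNS survives intact
```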



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-20607) TxnHandler should use PreparedStatement to execute direct SQL queries.

2022-06-16 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-20607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554996#comment-17554996
 ] 

Zoltan Haindrich commented on HIVE-20607:
-

This patch is on branch-3 via [this 
commit|https://github.com/apache/hive/commit/09b92d3c864b00df99923f03a843a8179bd874a0];
I don't think we have a 3.2.1 release - or even 3.2.0; I don't see any traces 
of that ; we also don't even have a branch-3.2 right now.

3.2.0 is an [unreleased 
version|https://issues.apache.org/jira/projects/HIVE/versions/12343559] - I 
would recommend to use 4.0.0-alpha-1 which contains this fix.

> TxnHandler should use PreparedStatement to execute direct SQL queries.
> --
>
> Key: HIVE-20607
> URL: https://issues.apache.org/jira/browse/HIVE-20607
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore, Transactions
>Affects Versions: 3.1.0, 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, pull-request-available
> Fix For: 3.2.0, 4.0.0, 4.0.0-alpha-1
>
> Attachments: HIVE-20607.01-branch-3.patch, HIVE-20607.01.patch
>
>
> TxnHandler uses direct SQL queries to operate on Txn related databases/tables 
> in Hive metastore RDBMS.
> Most of the methods are direct calls from the Metastore API which directly 
> append input string arguments to the SQL string.
> Need to use a parameterised PreparedStatement object to set these arguments.





[jira] [Commented] (HIVE-25733) Add check-spelling CI action

2022-06-15 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554542#comment-17554542
 ] 

Zoltan Haindrich commented on HIVE-25733:
-

thank you [~pvary], the action seems to be checking the PR state and not the 
one merged with the actual master...

thank you very much for fixing it!

> Add check-spelling CI action
> 
>
> Key: HIVE-25733
> URL: https://issues.apache.org/jira/browse/HIVE-25733
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Josh Soref
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add CI to catch spelling errors. See [https://www.check-spelling.dev/] for 
> information.
> Initially this will only check the {{serde}} directory, but the intention is 
> to expand its coverage as spelling errors in other directories are fixed.
> Note that for this to work the action should be made a required check, 
> otherwise when a typo is added forks from that commit will get complaints.
> If a typo is intentional, the action will provide information about how to 
> add it to {{expect.txt}} such that it will be accepted as an expected item 
> (i.e. not a typo).
> To skip a file/directory entirely, add a matching entry to 
> {{{}excludes.txt{}}}.





[jira] [Resolved] (HIVE-25879) MetaStoreDirectSql test query should not query the whole DBS table

2022-06-14 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25879.
-
Fix Version/s: 4.0.0-alpha-2
 Assignee: Miklos Szurap
   Resolution: Fixed

merged into master; Thank you [~mszurap] for fixing this!

> MetaStoreDirectSql test query should not query the whole DBS table
> --
>
> Key: HIVE-25879
> URL: https://issues.apache.org/jira/browse/HIVE-25879
> Project: Hive
>  Issue Type: Bug
>Reporter: Miklos Szurap
>Assignee: Miklos Szurap
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The runTestQuery() in the 
> org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query
> {code:java}
> select "DB_ID" from "DBS"{code}
> to determine whether the direct SQL can be used.
> With larger deployments with many (10k+) Hive databases it would be more 
> efficient to query a small table instead; for example the "VERSION" table 
> should always have a single row only.
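The fix amounts to probing direct-SQL availability with a query whose cost does 
not grow with the number of databases. A sketch of the idea (sqlite3 stands in 
for the metastore RDBMS; the table and column names mirror the metastore schema 
for illustration only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VERSION (VER_ID INTEGER, SCHEMA_VERSION TEXT)")
conn.execute("INSERT INTO VERSION VALUES (1, '4.0.0')")

def direct_sql_available(conn):
    """Probe with a constant-cost query against a known single-row table,
    instead of scanning DBS, which can hold 10k+ rows."""
    try:
        conn.execute('SELECT "VER_ID" FROM "VERSION"').fetchone()
        return True
    except sqlite3.Error:
        return False

print(direct_sql_available(conn))  # True
```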





[jira] [Resolved] (HIVE-26184) COLLECT_SET with GROUP BY is very slow when some keys are highly skewed

2022-06-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-26184.
-
Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed

merged into master. Thank you [~okumin] !

> COLLECT_SET with GROUP BY is very slow when some keys are highly skewed
> ---
>
> Key: HIVE-26184
> URL: https://issues.apache.org/jira/browse/HIVE-26184
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.8, 3.1.3
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I observed some reducers spend 98% of CPU time in invoking 
> `java.util.HashMap#clear`.
> Looking at the details, I found COLLECT_SET reuses a LinkedHashSet and its 
> `clear` can be quite heavy when a relation has a small number of highly 
> skewed keys.
>  
> To reproduce the issue, first, we will create rows with a skewed key.
> {code:java}
> INSERT INTO test_collect_set
> SELECT '----' AS key, CAST(UUID() AS VARCHAR) 
> AS value
> FROM table_with_many_rows
> LIMIT 10;{code}
> Then, we will create many non-skewed rows.
> {code:java}
> INSERT INTO test_collect_set
> SELECT UUID() AS key, UUID() AS value
> FROM table_with_many_rows
> LIMIT 500;{code}
> We can observe the issue when we aggregate values by `key`.
> {code:java}
> SELECT key, COLLECT_SET(value) FROM group_by_skew GROUP BY key{code}
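The clear() cost described in the report can be modelled with a toy container - 
pure invention for illustration, not Hive code: Java's HashMap/LinkedHashSet 
clear() walks every slot of the backing table, so a table grown once by a 
skewed key stays expensive to clear for every later tiny group.

```python
class FixedCapacitySet:
    """Toy model: capacity only grows, and clear() touches every slot,
    mimicking how Java's HashMap.clear() walks the whole backing table."""

    def __init__(self):
        self.capacity = 16
        self.items = set()
        self.slots_touched = 0  # instrumentation only

    def add(self, x):
        self.items.add(x)
        while len(self.items) > self.capacity * 0.75:
            self.capacity *= 2  # grow like a hash table at 0.75 load factor

    def clear(self):
        self.slots_touched += self.capacity  # pay for the full table
        self.items = set()

shared = FixedCapacitySet()
for i in range(100_000):   # one highly skewed key grows the shared container
    shared.add(i)
shared.clear()
for g in range(1_000):     # every later tiny group still pays the full price
    shared.add(g)
    shared.clear()
print(shared.capacity, shared.slots_touched)
```

Allocating a fresh set per group would touch ~16 slots per clear; the reused 
container pays its peak capacity a thousand times over.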





[jira] [Commented] (HIVE-26303) [hive on spark] when hive.exec.parallel=true,beeline run sql in script,sometimes app is running but all job finished

2022-06-09 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552177#comment-17552177
 ] 

Zoltan Haindrich commented on HIVE-26303:
-

please don't turn on hive.exec.parallel for any version of hive - especially 
for older ones as it may cause issues...

> [hive on spark] when hive.exec.parallel=true,beeline run sql in 
> script,sometimes app is running but all job finished
> 
>
> Key: HIVE-26303
> URL: https://issues.apache.org/jira/browse/HIVE-26303
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 2.3.7
>Reporter: lkl
>Priority: Major
>






[jira] [Resolved] (HIVE-26268) Upgrade snappy-java to 1.1.8.4

2022-06-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-26268.
-
Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed

merged into master. Thank you [~slachiewicz]!

> Upgrade snappy-java to 1.1.8.4
> --
>
> Key: HIVE-26268
> URL: https://issues.apache.org/jira/browse/HIVE-26268
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Sylwester Lachiewicz
>Assignee: Sylwester Lachiewicz
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Upgrade to get benefits from performance improvements and bug fixes. 
> Also to support Apple Silicon (M1, Mac-aarch64)
> [https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-1183-2021-01-20]
>  





[jira] [Resolved] (HIVE-25635) Upgrade Thrift to 0.16.0

2022-06-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25635.
-
Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed

merged into master. Thank you [~slachiewicz]!

> Upgrade Thrift to 0.16.0
> 
>
> Key: HIVE-25635
> URL: https://issues.apache.org/jira/browse/HIVE-25635
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yuming Wang
>Assignee: Sylwester Lachiewicz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> To address CVEs:
> ||Component Name||Component Version Name||Vulnerability||Fixed version||
> |Apache 
> Thrift|0.11.0-4.|[CVE-2020-13949|https://github.com/advisories/GHSA-g2fg-mr77-6vrm]|0.14.1|





[jira] [Assigned] (HIVE-25635) Upgrade Thrift to 0.16.0

2022-06-08 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25635:
---

Assignee: Sylwester Lachiewicz

> Upgrade Thrift to 0.16.0
> 
>
> Key: HIVE-25635
> URL: https://issues.apache.org/jira/browse/HIVE-25635
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yuming Wang
>Assignee: Sylwester Lachiewicz
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> To address CVEs:
> ||Component Name||Component Version Name||Vulnerability||Fixed version||
> |Apache 
> Thrift|0.11.0-4.|[CVE-2020-13949|https://github.com/advisories/GHSA-g2fg-mr77-6vrm]|0.14.1|





[jira] [Updated] (HIVE-26148) Keep MetaStoreFilterHook interface compatibility after introducing catalogs

2022-06-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-26148:

Fix Version/s: 4.0.0-alpha-2
   (was: 4.0.0-alpha-1)

> Keep MetaStoreFilterHook interface compatibility after introducing catalogs
> ---
>
> Key: HIVE-26148
> URL: https://issues.apache.org/jira/browse/HIVE-26148
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Wechar
>Assignee: Wechar
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Hive 3.0 introduce catalog concept, when we upgrade hive dependency version 
> from 2.3 to 3.x, we found some interfaces of *MetaStoreFilterHook* are not 
> compatible:
> {code:bash}
>  git show ba8a99e115 -- 
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreFilterHook.java
> {code}
> {code:bash}
> --- 
> a/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreFilterHook.java
> +++ 
> b/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreFilterHook.java
>/**
> * Filter given list of tables
> -   * @param dbName
> -   * @param tableList
> +   * @param catName catalog name
> +   * @param dbName database name
> +   * @param tableList list of table returned by the metastore
> * @return List of filtered table names
> */
> -  public List filterTableNames(String dbName, List 
> tableList) throws MetaException;
> +  List filterTableNames(String catName, String dbName, List 
> tableList)
> +  throws MetaException;
> {code}
> We can retain the previous interfaces and implement them using the default 
> catalog.
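The compatibility approach the reporter suggests - keep the old two-argument 
signature as a shim that assumes the default catalog - sketched in Python (Java 
would express this as an overload or interface default method; the method and 
parameter names follow the quoted diff, the delegation itself is an 
assumption):

```python
DEFAULT_CATALOG = "hive"  # assumed name of the metastore's default catalog

class MetaStoreFilterHook:
    def filter_table_names(self, cat_name, db_name, table_list):
        """New catalog-aware interface (no-op default filter)."""
        return table_list

    def filter_table_names_compat(self, db_name, table_list):
        """Pre-catalog signature, kept as a shim delegating to the new one."""
        return self.filter_table_names(DEFAULT_CATALOG, db_name, table_list)

class DropTmpTables(MetaStoreFilterHook):
    """Example hook: hide tables with a tmp_ prefix."""
    def filter_table_names(self, cat_name, db_name, table_list):
        return [t for t in table_list if not t.startswith("tmp_")]

hook = DropTmpTables()
print(hook.filter_table_names_compat("default", ["t1", "tmp_scratch"]))
```

Old callers keep working unchanged, and subclasses only need to override the 
new catalog-aware method.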





[jira] [Commented] (HIVE-26224) Add support for ESRI GeoSpatial SERDE formats

2022-05-31 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544427#comment-17544427
 ] 

Zoltan Haindrich commented on HIVE-26224:
-

this PR seems to have copied some sources from a different project - why did we 
do that?
I think we have enough problems already... why flatten some 3rd-party code 
directly into the QL module...

https://github.com/Esri/spatial-framework-for-hadoop/blob/master/json/src/main/java/com/esri/json/hadoop/UnenclosedGeoJsonRecordReader.java

> Add support for ESRI GeoSpatial SERDE formats
> -
>
> Key: HIVE-26224
> URL: https://issues.apache.org/jira/browse/HIVE-26224
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add support to use ESRI geospatial serde formats



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table

2022-05-31 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544347#comment-17544347
 ] 

Zoltan Haindrich commented on HIVE-26158:
-

[~sanguines] I think I've missed your comment;

you could probably take a look at tickets marked with the newbie label:
https://issues.apache.org/jira/browse/HIVE-25711?jql=project%20%3D%20Hive%20and%20labels%20%3D%20newbie%20%20ORDER%20BY%20id%20DESC

let me know if you need more help - you could also reach out to us on the 
dev-list or on #hive in the asf slack (we don't use that channel for 
anything...)

> TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after 
> rename table
> --
>
> Key: HIVE-26158
> URL: https://issues.apache.org/jira/browse/HIVE-26158
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: tanghui
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0, 4.0.0-alpha-2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> After the patch is updated, the partition table location and hdfs data 
> directory are displayed normally, but the partition location of the table in 
> the SDS in the Hive metabase is still displayed as the location of the old 
> table, resulting in no data in the query partition.
>  
> in beeline:
> 
> set hive.create.as.external.legacy=true;
> CREATE TABLE part_test(
> c1 string
> ,c2 string
> )PARTITIONED BY (dat string)
> insert into part_test values ("11","th","20220101")
> insert into part_test values ("22","th","20220102")
> alter table part_test rename to part_test11;
> --this result is null.
> select * from part_test11 where dat="20220101";
> ||part_test.c1||part_test.c2||part_test.dat||
> | | | |
> -
> SDS in the Hive metabase:
> select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND 
> TBLS.TBL_ID=SDS.CD_ID;
> ---
> |*LOCATION*|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102|
> ---
>  
> We need to modify the partition location of the table in SDS to ensure that 
> the query results are normal



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-25285) Retire HiveProjectJoinTransposeRule

2022-05-31 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25285:

Labels: newbie  (was: )

> Retire HiveProjectJoinTransposeRule
> ---
>
> Key: HIVE-25285
> URL: https://issues.apache.org/jira/browse/HIVE-25285
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Priority: Major
>  Labels: newbie
>
> we don't necessarily need our own rule anymore - a plain 
> ProjectJoinTransposeRule could probably work



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26266) Column information is not present in lineage for CTAS when custom location/translated location is used

2022-05-31 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17544218#comment-17544218
 ] 

Zoltan Haindrich commented on HIVE-26266:
-

to update q.out-s you have to run the tests with `-Dtest.output.overwrite`

> Column information is not present in lineage for CTAS when custom 
> location/translated location is used
> --
>
> Key: HIVE-26266
> URL: https://issues.apache.org/jira/browse/HIVE-26266
> Project: Hive
>  Issue Type: Bug
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: metastore_translator, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently CTAS considers only default table location when mapping the 
> location to the FileSinkOperator. This will miss the cases when a custom 
> location is specified as well as when the table has a translated location.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26266) Column information is not present in lineage for CTAS when custom location/translated location is used

2022-05-26 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-26266:

Labels: metastore_translator  (was: )

> Column information is not present in lineage for CTAS when custom 
> location/translated location is used
> --
>
> Key: HIVE-26266
> URL: https://issues.apache.org/jira/browse/HIVE-26266
> Project: Hive
>  Issue Type: Bug
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: metastore_translator
>
> Currently CTAS considers only default table location when mapping the 
> location to the FileSinkOperator. This will miss the cases when a custom 
> location is specified as well as when the table has a translated location.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26084) Oracle metastore init tests are flaky

2022-05-25 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542011#comment-17542011
 ] 

Zoltan Haindrich commented on HIVE-26084:
-

hmm.. last time I checked I had only seen oracle-11g in XE - I'm so happy to see 
18 and 21 :D

> Oracle metastore init tests are flaky
> -
>
> Key: HIVE-26084
> URL: https://issues.apache.org/jira/browse/HIVE-26084
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Peter Vary
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> After HIVE-26022 we started to run the oracle metastore init tests, but they 
> seem to be flaky.
> I see this issue quite often: 
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3147/1/pipeline/551
> We might have to increase the timeout, or use another oracle image for more 
> consistent tests.
> The error in the logs for future reference
> {code}
> [2022-03-28T14:10:07.804Z] + echo 127.0.0.1 dev_oracle
> [2022-03-28T14:10:07.804Z] + sudo tee -a /etc/hosts
> [2022-03-28T14:10:07.804Z] 127.0.0.1 dev_oracle
> [2022-03-28T14:10:07.804Z] + . /etc/profile.d/confs.sh
> [2022-03-28T14:10:07.804Z] ++ export MAVEN_OPTS=-Xmx2g
> [2022-03-28T14:10:07.804Z] ++ MAVEN_OPTS=-Xmx2g
> [2022-03-28T14:10:07.804Z] ++ export HADOOP_CONF_DIR=/etc/hadoop
> [2022-03-28T14:10:07.804Z] ++ HADOOP_CONF_DIR=/etc/hadoop
> [2022-03-28T14:10:07.804Z] ++ export HADOOP_LOG_DIR=/data/log
> [2022-03-28T14:10:07.804Z] ++ HADOOP_LOG_DIR=/data/log
> [2022-03-28T14:10:07.804Z] ++ export 
> 'HADOOP_CLASSPATH=/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-03-28T14:10:07.804Z] ++ 
> HADOOP_CLASSPATH='/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-03-28T14:10:07.804Z] ++ export HIVE_CONF_DIR=/etc/hive/
> [2022-03-28T14:10:07.804Z] ++ HIVE_CONF_DIR=/etc/hive/
> [2022-03-28T14:10:07.804Z] ++ export 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-03-28T14:10:07.804Z] ++ 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-03-28T14:10:07.804Z] ++ . /etc/profile.d/java.sh
> [2022-03-28T14:10:07.804Z] +++ export JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-03-28T14:10:07.804Z] +++ JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-03-28T14:10:07.804Z] + sw hive-dev 
> /home/jenkins/agent/workspace/hive-precommit_PR-3147
> [2022-03-28T14:10:07.804Z] @ activating: 
> /home/jenkins/agent/workspace/hive-precommit_PR-3147/packaging/target/apache-hive-4.0.0-alpha-1-SNAPSHOT-bin/apache-hive-4.0.0-alpha-1-SNAPSHOT-bin/
>  for hive
> [2022-03-28T14:10:07.804Z] + ping -c2 dev_oracle
> [2022-03-28T14:10:07.804Z] PING dev_oracle (127.0.0.1) 56(84) bytes of data.
> [2022-03-28T14:10:07.804Z] 64 bytes from localhost (127.0.0.1): icmp_seq=1 
> ttl=64 time=0.082 ms
> [2022-03-28T14:10:08.795Z] 64 bytes from localhost (127.0.0.1): icmp_seq=2 
> ttl=64 time=0.087 ms
> [2022-03-28T14:10:08.795Z] 
> [2022-03-28T14:10:08.795Z] --- dev_oracle ping statistics ---
> [2022-03-28T14:10:08.795Z] 2 packets transmitted, 2 received, 0% packet loss, 
> time 51ms
> [2022-03-28T14:10:08.795Z] rtt min/avg/max/mdev = 0.082/0.084/0.087/0.009 ms
> [2022-03-28T14:10:08.795Z] + export DOCKER_NETWORK=host
> [2022-03-28T14:10:08.795Z] + DOCKER_NETWORK=host
> [2022-03-28T14:10:08.795Z] + export DBNAME=metastore
> [2022-03-28T14:10:08.795Z] + DBNAME=metastore
> [2022-03-28T14:10:08.795Z] + reinit_metastore oracle
> [2022-03-28T14:10:08.795Z] @ initializing: oracle
> [2022-03-28T14:10:08.795Z] metastore database name: metastore
> [2022-03-28T14:10:09.135Z] @ starting dev_oracle...
> [2022-03-28T14:10:09.445Z] Unable to find image 
> 'quay.io/maksymbilenko/oracle-12c:latest' locally
> [2022-03-28T14:10:10.407Z] latest: Pulling from maksymbilenko/oracle-12c
> [2022-03-28T14:10:10.407Z] 8ba884070f61: Pulling fs layer
> [2022-03-28T14:10:10.407Z] ef9513b81046: Pulling fs layer
> [2022-03-28T14:10:10.407Z] 6f1de349e202: Pulling fs layer
> [2022-03-28T14:10:10.407Z] 5376ebfa0fa3: Pulling fs layer
> [2022-03-28T14:10:10.407Z] 5f632c3633d2: Pulling fs layer
> [2022-03-28T14:10:10.407Z] 3e74293031d2: Pulling fs layer
> [2022-03-28T14:10:10.407Z] 5376ebfa0fa3: Waiting
> [2022-03-28T14:10:10.407Z] 5f632c3633d2: Waiting
> [2022-03-28T14:10:10.407Z] 3e74293031d2: Waiting
> [2022-03-28T14:10:10.407Z] 6f1de349e202: Download complete
> [2022-03-28T14:10:11.365Z] ef9513b81046: Download complete
> [2022-03-28T14:10:11.365Z] 5f632c3633d2: 

[jira] [Commented] (HIVE-26263) Mysql metastore init tests are flaky

2022-05-25 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542008#comment-17542008
 ] 

Zoltan Haindrich commented on HIVE-26263:
-

I've [disabled the mysql/metastore test for 
now|https://github.com/apache/hive/commit/34b24d55ade393673424f077b69add43bad9f731]

it's strange that this happens so frequently and only for this database type...


> Mysql metastore init tests are flaky
> 
>
> Key: HIVE-26263
> URL: https://issues.apache.org/jira/browse/HIVE-26263
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Zoltan Haindrich
>Priority: Major
>
> Similarly to HIVE-26084 (the Oracle tests), the MySQL tests are also failing.
> In both cases we use _:latest_ as docker image version, which is probably not 
> ideal.
> Reporting the error for future reference:
> {noformat}
> [2022-05-24T14:07:52.127Z] + sudo tee -a /etc/hosts
> [2022-05-24T14:07:52.127Z] + echo 127.0.0.1 dev_mysql
> [2022-05-24T14:07:52.127Z] 127.0.0.1 dev_mysql
> [2022-05-24T14:07:52.127Z] + . /etc/profile.d/confs.sh
> [2022-05-24T14:07:52.127Z] ++ export MAVEN_OPTS=-Xmx2g
> [2022-05-24T14:07:52.127Z] ++ MAVEN_OPTS=-Xmx2g
> [2022-05-24T14:07:52.127Z] ++ export HADOOP_CONF_DIR=/etc/hadoop
> [2022-05-24T14:07:52.127Z] ++ HADOOP_CONF_DIR=/etc/hadoop
> [2022-05-24T14:07:52.127Z] ++ export HADOOP_LOG_DIR=/data/log
> [2022-05-24T14:07:52.127Z] ++ HADOOP_LOG_DIR=/data/log
> [2022-05-24T14:07:52.127Z] ++ export 
> 'HADOOP_CLASSPATH=/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-05-24T14:07:52.127Z] ++ 
> HADOOP_CLASSPATH='/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-05-24T14:07:52.127Z] ++ export HIVE_CONF_DIR=/etc/hive/
> [2022-05-24T14:07:52.127Z] ++ HIVE_CONF_DIR=/etc/hive/
> [2022-05-24T14:07:52.127Z] ++ export 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-05-24T14:07:52.127Z] ++ 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-05-24T14:07:52.127Z] ++ . /etc/profile.d/java.sh
> [2022-05-24T14:07:52.127Z] +++ export JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-05-24T14:07:52.127Z] +++ JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-05-24T14:07:52.127Z] + sw hive-dev 
> /home/jenkins/agent/workspace/hive-precommit_PR-3317
> [2022-05-24T14:07:52.127Z] @ activating: 
> /home/jenkins/agent/workspace/hive-precommit_PR-3317/packaging/target/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/
>  for hive
> [2022-05-24T14:07:52.127Z] + ping -c2 dev_mysql
> [2022-05-24T14:07:52.127Z] PING dev_mysql (127.0.0.1) 56(84) bytes of data.
> [2022-05-24T14:07:52.127Z] 64 bytes from localhost (127.0.0.1): icmp_seq=1 
> ttl=64 time=0.114 ms
> [2022-05-24T14:07:53.107Z] 64 bytes from localhost (127.0.0.1): icmp_seq=2 
> ttl=64 time=0.123 ms
> [2022-05-24T14:07:53.107Z] 
> [2022-05-24T14:07:53.107Z] --- dev_mysql ping statistics ---
> [2022-05-24T14:07:53.107Z] 2 packets transmitted, 2 received, 0% packet loss, 
> time 49ms
> [2022-05-24T14:07:53.107Z] rtt min/avg/max/mdev = 0.114/0.118/0.123/0.011 ms
> [2022-05-24T14:07:53.107Z] + export DOCKER_NETWORK=host
> [2022-05-24T14:07:53.107Z] + DOCKER_NETWORK=host
> [2022-05-24T14:07:53.107Z] + export DBNAME=metastore
> [2022-05-24T14:07:53.107Z] + DBNAME=metastore
> [2022-05-24T14:07:53.107Z] + reinit_metastore mysql
> [2022-05-24T14:07:53.107Z] @ initializing: mysql
> [2022-05-24T14:07:53.107Z] metastore database name: metastore
> [2022-05-24T14:07:53.381Z] @ starting dev_mysql...
> [2022-05-24T14:07:53.382Z] Unable to find image 'mariadb:latest' locally
> [2022-05-24T14:07:54.354Z] latest: Pulling from library/mariadb
> [2022-05-24T14:07:54.354Z] 125a6e411906: Pulling fs layer
> [2022-05-24T14:07:54.354Z] a28b55cc656d: Pulling fs layer
> [2022-05-24T14:07:54.354Z] f2325f4e25a1: Pulling fs layer
> [2022-05-24T14:07:54.354Z] c6c2d09f748d: Pulling fs layer
> [2022-05-24T14:07:54.354Z] af2b4ed853d2: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 8394ac6b401e: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 1b11b2e20899: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 3d35790a91d9: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 5e73c7793365: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 3d34b9f14ede: Pulling fs layer
> [2022-05-24T14:07:54.354Z] c6c2d09f748d: Waiting
> [2022-05-24T14:07:54.354Z] 8394ac6b401e: 

[jira] [Assigned] (HIVE-26263) Mysql metastore init tests are flaky

2022-05-25 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-26263:
---

Assignee: Zoltan Haindrich

> Mysql metastore init tests are flaky
> 
>
> Key: HIVE-26263
> URL: https://issues.apache.org/jira/browse/HIVE-26263
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0-alpha-2
>Reporter: Alessandro Solimando
>Assignee: Zoltan Haindrich
>Priority: Major
>
> Similarly to HIVE-26084 (the Oracle tests), the MySQL tests are also failing.
> In both cases we use _:latest_ as docker image version, which is probably not 
> ideal.
> Reporting the error for future reference:
> {noformat}
> [2022-05-24T14:07:52.127Z] + sudo tee -a /etc/hosts
> [2022-05-24T14:07:52.127Z] + echo 127.0.0.1 dev_mysql
> [2022-05-24T14:07:52.127Z] 127.0.0.1 dev_mysql
> [2022-05-24T14:07:52.127Z] + . /etc/profile.d/confs.sh
> [2022-05-24T14:07:52.127Z] ++ export MAVEN_OPTS=-Xmx2g
> [2022-05-24T14:07:52.127Z] ++ MAVEN_OPTS=-Xmx2g
> [2022-05-24T14:07:52.127Z] ++ export HADOOP_CONF_DIR=/etc/hadoop
> [2022-05-24T14:07:52.127Z] ++ HADOOP_CONF_DIR=/etc/hadoop
> [2022-05-24T14:07:52.127Z] ++ export HADOOP_LOG_DIR=/data/log
> [2022-05-24T14:07:52.127Z] ++ HADOOP_LOG_DIR=/data/log
> [2022-05-24T14:07:52.127Z] ++ export 
> 'HADOOP_CLASSPATH=/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-05-24T14:07:52.127Z] ++ 
> HADOOP_CLASSPATH='/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
> [2022-05-24T14:07:52.127Z] ++ export HIVE_CONF_DIR=/etc/hive/
> [2022-05-24T14:07:52.127Z] ++ HIVE_CONF_DIR=/etc/hive/
> [2022-05-24T14:07:52.127Z] ++ export 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-05-24T14:07:52.127Z] ++ 
> PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
> [2022-05-24T14:07:52.127Z] ++ . /etc/profile.d/java.sh
> [2022-05-24T14:07:52.127Z] +++ export JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-05-24T14:07:52.127Z] +++ JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
> [2022-05-24T14:07:52.127Z] + sw hive-dev 
> /home/jenkins/agent/workspace/hive-precommit_PR-3317
> [2022-05-24T14:07:52.127Z] @ activating: 
> /home/jenkins/agent/workspace/hive-precommit_PR-3317/packaging/target/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/
>  for hive
> [2022-05-24T14:07:52.127Z] + ping -c2 dev_mysql
> [2022-05-24T14:07:52.127Z] PING dev_mysql (127.0.0.1) 56(84) bytes of data.
> [2022-05-24T14:07:52.127Z] 64 bytes from localhost (127.0.0.1): icmp_seq=1 
> ttl=64 time=0.114 ms
> [2022-05-24T14:07:53.107Z] 64 bytes from localhost (127.0.0.1): icmp_seq=2 
> ttl=64 time=0.123 ms
> [2022-05-24T14:07:53.107Z] 
> [2022-05-24T14:07:53.107Z] --- dev_mysql ping statistics ---
> [2022-05-24T14:07:53.107Z] 2 packets transmitted, 2 received, 0% packet loss, 
> time 49ms
> [2022-05-24T14:07:53.107Z] rtt min/avg/max/mdev = 0.114/0.118/0.123/0.011 ms
> [2022-05-24T14:07:53.107Z] + export DOCKER_NETWORK=host
> [2022-05-24T14:07:53.107Z] + DOCKER_NETWORK=host
> [2022-05-24T14:07:53.107Z] + export DBNAME=metastore
> [2022-05-24T14:07:53.107Z] + DBNAME=metastore
> [2022-05-24T14:07:53.107Z] + reinit_metastore mysql
> [2022-05-24T14:07:53.107Z] @ initializing: mysql
> [2022-05-24T14:07:53.107Z] metastore database name: metastore
> [2022-05-24T14:07:53.381Z] @ starting dev_mysql...
> [2022-05-24T14:07:53.382Z] Unable to find image 'mariadb:latest' locally
> [2022-05-24T14:07:54.354Z] latest: Pulling from library/mariadb
> [2022-05-24T14:07:54.354Z] 125a6e411906: Pulling fs layer
> [2022-05-24T14:07:54.354Z] a28b55cc656d: Pulling fs layer
> [2022-05-24T14:07:54.354Z] f2325f4e25a1: Pulling fs layer
> [2022-05-24T14:07:54.354Z] c6c2d09f748d: Pulling fs layer
> [2022-05-24T14:07:54.354Z] af2b4ed853d2: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 8394ac6b401e: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 1b11b2e20899: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 3d35790a91d9: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 5e73c7793365: Pulling fs layer
> [2022-05-24T14:07:54.354Z] 3d34b9f14ede: Pulling fs layer
> [2022-05-24T14:07:54.354Z] c6c2d09f748d: Waiting
> [2022-05-24T14:07:54.354Z] 8394ac6b401e: Waiting
> [2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Waiting
> [2022-05-24T14:07:54.354Z] 3d35790a91d9: Waiting
> [2022-05-24T14:07:54.354Z] 5e73c7793365: Waiting
> [2022-05-24T14:07:54.354Z] 3d34b9f14ede: Waiting
> 

[jira] [Commented] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table

2022-05-11 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534954#comment-17534954
 ] 

Zoltan Haindrich commented on HIVE-26158:
-

[~sanguines] there were some customers reporting this happening for them - and 
I was already working on a patch...and honestly I could have noticed how this 
could be done more precisely in the previous ticket... ($#%)

let me know if you are looking for some tickets to work on; I could try to find 
one

> TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after 
> rename table
> --
>
> Key: HIVE-26158
> URL: https://issues.apache.org/jira/browse/HIVE-26158
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: tanghui
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> After the patch is updated, the partition table location and hdfs data 
> directory are displayed normally, but the partition location of the table in 
> the SDS in the Hive metabase is still displayed as the location of the old 
> table, resulting in no data in the query partition.
>  
> in beeline:
> 
> set hive.create.as.external.legacy=true;
> CREATE TABLE part_test(
> c1 string
> ,c2 string
> )PARTITIONED BY (dat string)
> insert into part_test values ("11","th","20220101")
> insert into part_test values ("22","th","20220102")
> alter table part_test rename to part_test11;
> --this result is null.
> select * from part_test11 where dat="20220101";
> ||part_test.c1||part_test.c2||part_test.dat||
> | | | |
> -
> SDS in the Hive metabase:
> select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND 
> TBLS.TBL_ID=SDS.CD_ID;
> ---
> |*LOCATION*|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102|
> ---
>  
> We need to modify the partition location of the table in SDS to ensure that 
> the query results are normal



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table

2022-05-11 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-26158.
-
Fix Version/s: 4.0.0
   4.0.0-alpha-2
   Resolution: Fixed

> TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after 
> rename table
> --
>
> Key: HIVE-26158
> URL: https://issues.apache.org/jira/browse/HIVE-26158
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: tanghui
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0, 4.0.0-alpha-2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> After the patch is updated, the partition table location and hdfs data 
> directory are displayed normally, but the partition location of the table in 
> the SDS in the Hive metabase is still displayed as the location of the old 
> table, resulting in no data in the query partition.
>  
> in beeline:
> 
> set hive.create.as.external.legacy=true;
> CREATE TABLE part_test(
> c1 string
> ,c2 string
> )PARTITIONED BY (dat string)
> insert into part_test values ("11","th","20220101")
> insert into part_test values ("22","th","20220102")
> alter table part_test rename to part_test11;
> --this result is null.
> select * from part_test11 where dat="20220101";
> ||part_test.c1||part_test.c2||part_test.dat||
> | | | |
> -
> SDS in the Hive metabase:
> select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND 
> TBLS.TBL_ID=SDS.CD_ID;
> ---
> |*LOCATION*|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102|
> ---
>  
> We need to modify the partition location of the table in SDS to ensure that 
> the query results are normal



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table

2022-05-11 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534955#comment-17534955
 ] 

Zoltan Haindrich commented on HIVE-26158:
-

merged into master; Thank you Saihemanth Gantasala for reviewing the changes!

> TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after 
> rename table
> --
>
> Key: HIVE-26158
> URL: https://issues.apache.org/jira/browse/HIVE-26158
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: tanghui
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> After the patch is updated, the partition table location and hdfs data 
> directory are displayed normally, but the partition location of the table in 
> the SDS in the Hive metabase is still displayed as the location of the old 
> table, resulting in no data in the query partition.
>  
> in beeline:
> 
> set hive.create.as.external.legacy=true;
> CREATE TABLE part_test(
> c1 string
> ,c2 string
> )PARTITIONED BY (dat string)
> insert into part_test values ("11","th","20220101")
> insert into part_test values ("22","th","20220102")
> alter table part_test rename to part_test11;
> --this result is null.
> select * from part_test11 where dat="20220101";
> ||part_test.c1||part_test.c2||part_test.dat||
> | | | |
> -
> SDS in the Hive metabase:
> select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND 
> TBLS.TBL_ID=SDS.CD_ID;
> ---
> |*LOCATION*|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102|
> ---
>  
> We need to modify the partition location of the table in SDS to ensure that 
> the query results are normal



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26220) Shade & relocate dependencies in hive-exec to avoid conflicting with downstream projects

2022-05-11 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534713#comment-17534713
 ] 

Zoltan Haindrich commented on HIVE-26220:
-

[~csun] have you tried using the current hive-exec from master? 
the shading was improved some time ago; especially in HIVE-22126.

Probably the best would be to provide some use cases for the artifact - 
preferably with test cases, so that we don't break it again in the 
future... but I admit this might not be a good ask...

Correct me if I'm wrong, but it sounds a bit unfair to push onto us the task of 
evaluating and upgrading other projects to run with the next version - just 
because they might upgrade to it (in my mind, fixing this blocker would mean 
that).

So I think the best middle ground could be to provide support for projects 
which do "their part" first - they could link some development branches 
which are already using a 4.0.0-alpha-X release.

> Shade & relocate dependencies in hive-exec to avoid conflicting with 
> downstream projects
> 
>
> Key: HIVE-26220
> URL: https://issues.apache.org/jira/browse/HIVE-26220
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Chao Sun
>Priority: Blocker
>
> Currently projects like Spark, Trino/Presto, Iceberg, etc. depend on 
> {{hive-exec:core}}, which was removed in HIVE-25531. These projects use 
> {{hive-exec:core}} because it gives them the flexibility to exclude, shade, 
> and relocate dependencies in {{hive-exec}} that conflict with the ones they 
> bring in themselves. With {{hive-exec}} this is no longer 
> possible, since it is a fat jar that shades those dependencies but does not 
> relocate many of them.
> In order for the downstream projects to consume {{hive-exec}}, we will need 
> to make sure all the dependencies in {{hive-exec}} are properly shaded and 
> relocated, so they won't cause conflicts with those from the downstream.
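The shade-and-relocate pattern the description asks for can be sketched with a maven-shade-plugin fragment. The snippet below is illustrative only (not copied from Hive's actual pom); the Guava package and the `org.apache.hive.` relocation prefix are assumptions chosen for the example:

```xml
<!-- Illustrative maven-shade-plugin fragment: bundle a dependency but move
     it under a project-specific package so it cannot clash downstream. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- hypothetical example: relocate bundled Guava -->
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

A relocation rewrites both the bundled class files and the bytecode references to them, so a downstream project can bring its own (possibly incompatible) copy of the same library without classpath conflicts.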



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-25976) Cleaner may remove files being accessed from a fetch-task-converted reader

2022-05-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531212#comment-17531212
 ] 

Zoltan Haindrich commented on HIVE-25976:
-

attached a unit test which reproduces the behaviour

> Cleaner may remove files being accessed from a fetch-task-converted reader
> --
>
> Key: HIVE-25976
> URL: https://issues.apache.org/jira/browse/HIVE-25976
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
> Attachments: fetch_task_conv_compactor_test.patch
>
>
> in a nutshell the following happens:
> * query is compiled in fetch-task-converted mode
> * no real execution happens, but the locks are released
> * the HS2 is communicating with the client and uses the fetch-task to get the 
> rows - which in this case will directly read files from the table's 
> directory
> * client sleeps between reads - so there is ample time for other events...
> * cleaner wakes up and removes some files
> * in the next read the fetch-task encounters a read error...
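The race in the steps above can be sketched as a local-filesystem analogy (an assumption for illustration, not Hive code): the "reader" captures a file path up front, the "cleaner" deletes the file during the pause between fetches, and the delayed read fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Local-filesystem sketch of the cleaner race described in the bullet list.
class CleanerRaceDemo {
    static boolean simulateRace() throws IOException {
        Path dir = Files.createTempDirectory("table");
        Path delta = dir.resolve("delta_1");
        Files.write(delta, "row1".getBytes());

        Path planned = delta;            // "fetch task" captured the file list at compile time
        Files.delete(delta);             // "cleaner" removes the file while the client sleeps
        try {
            Files.readAllBytes(planned); // the next fetch now hits a read error
            return false;
        } catch (NoSuchFileException expected) {
            return true;                 // the race manifested as a missing-file error
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read failed after cleanup: " + simulateRace());
    }
}
```

In Hive the window exists because the locks were already released in step two, so nothing stops the compactor's cleaner from removing files the fetch task still plans to read.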



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-25976) Cleaner may remove files being accessed from a fetch-task-converted reader

2022-05-03 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25976:

Attachment: fetch_task_conv_compactor_test.patch

> Cleaner may remove files being accessed from a fetch-task-converted reader
> --
>
> Key: HIVE-25976
> URL: https://issues.apache.org/jira/browse/HIVE-25976
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
> Attachments: fetch_task_conv_compactor_test.patch
>
>
> in a nutshell the following happens:
> * query is compiled in fetch-task-converted mode
> * no real execution happens, but the locks are released
> * the HS2 is communicating with the client and uses the fetch-task to get the 
> rows - which in this case will directly read files from the table's 
> directory
> * client sleeps between reads - so there is ample time for other events...
> * cleaner wakes up and removes some files
> * in the next read the fetch-task encounters a read error...



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table

2022-05-02 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-26158:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after 
> rename table
> --
>
> Key: HIVE-26158
> URL: https://issues.apache.org/jira/browse/HIVE-26158
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: tanghui
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After the patch is updated, the partition table location and hdfs data 
> directory are displayed normally, but the partition location of the table in 
> the SDS in the Hive metabase is still displayed as the location of the old 
> table, resulting in no data in the query partition.
>  
> in beeline:
> 
> set hive.create.as.external.legacy=true;
> CREATE TABLE part_test(
> c1 string
> ,c2 string
> )PARTITIONED BY (dat string)
> insert into part_test values ("11","th","20220101")
> insert into part_test values ("22","th","20220102")
> alter table part_test rename to part_test11;
> --this result is null.
> select * from part_test11 where dat="20220101";
> ||part_test.c1||part_test.c2||part_test.dat||
> | | | |
> -
> SDS in the Hive metabase:
> select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND 
> TBLS.TBL_ID=SDS.CD_ID;
> ---
> |*LOCATION*|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102|
> ---
>  
> We need to modify the partition location of the table in SDS to ensure that 
> the query results are normal
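The reported behaviour can be modeled as a rename that rewrites the table's location but not the partitions' storage-descriptor locations (a toy model for illustration, not the metastore schema or its actual update logic):

```python
# Toy model of the HIVE-26158 report: renaming updates the table location
# but leaves each partition's SDS location pointing at the old directory.
table = {"name": "part_test", "location": "/ext/hive/part_test"}
partitions = {
    "dat=20220101": "/ext/hive/part_test/dat=20220101",
    "dat=20220102": "/ext/hive/part_test/dat=20220102",
}

def rename_table(new_name):
    table["name"] = new_name
    table["location"] = "/ext/hive/" + new_name
    # bug being modeled: partition locations are not rewritten

rename_table("part_test11")
stale = [p for p, loc in partitions.items()
         if not loc.startswith(table["location"])]
print(stale)  # every partition still points under the old table directory
```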



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26184) COLLECT_SET with GROUP BY is very slow when some keys are highly skewed

2022-04-28 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529368#comment-17529368
 ] 

Zoltan Haindrich commented on HIVE-26184:
-

because the value will be the same - I think collecting any number of them into 
a SET will not overload that key - unless the hashCode of that UUID 
value is always the same constant... but in that case we should fix that, 
because it would slow down all the other operations, including `contains`
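The point about identical values can be checked directly: adding the same value to a set any number of times leaves a single element, so the per-key set itself cannot blow up (a minimal illustration of the set semantics, not Hive's COLLECT_SET implementation):

```python
# Adding the same value repeatedly to a set keeps it at one element,
# so a skewed key whose values are all identical does not grow its set.
s = set()
for _ in range(100_000):
    s.add("the-skewed-constant-value")  # hypothetical repeated value
print(len(s))
```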

> COLLECT_SET with GROUP BY is very slow when some keys are highly skewed
> ---
>
> Key: HIVE-26184
> URL: https://issues.apache.org/jira/browse/HIVE-26184
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.8, 3.1.3
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I observed some reducers spend 98% of CPU time in invoking 
> `java.util.HashMap#clear`.
> Looking at the details, I found COLLECT_SET reuses a LinkedHashSet and its 
> `clear` can be quite heavy when a relation has a small number of highly 
> skewed keys.
>  
> To reproduce the issue, first, we will create rows with a skewed key.
> {code:java}
> INSERT INTO test_collect_set
> SELECT '----' AS key, CAST(UUID() AS VARCHAR) 
> AS value
> FROM table_with_many_rows
> LIMIT 10;{code}
> Then, we will create many non-skewed rows.
> {code:java}
> INSERT INTO test_collect_set
> SELECT UUID() AS key, UUID() AS value
> FROM sample_datasets.nasdaq
> LIMIT 500;{code}
> We can observe the issue when we aggregate values by `key`.
> {code:java}
> SELECT key, COLLECT_SET(value) FROM group_by_skew GROUP BY key{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (HIVE-26135) Invalid Anti join conversion may cause missing results

2022-04-26 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-26135.
-
Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed

merged into master. Thank you [~kkasa] for reviewing the changes!

> Invalid Anti join conversion may cause missing results
> --
>
> Key: HIVE-26135
> URL: https://issues.apache.org/jira/browse/HIVE-26135
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> right now I think the following is needed to trigger the issue:
> * left outer join
> * only select left hand side columns
> * conditional which is using some udf
> * the nullness of the udf is checked
> repro sql; in case the conversion happens the row with 'a' will be missing
> {code}
> drop table if exists t;
> drop table if exists n;
> create table t(a string) stored as orc;
> create table n(a string) stored as orc;
> insert into t values ('a'),('1'),('2'),(null);
> insert into n values ('a'),('b'),('1'),('3'),(null);
> explain select n.* from n left outer join t on (n.a=t.a) where 
> assert_true(t.a is null) is null;
> explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
> float) is null;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> set hive.auto.convert.anti.join=false;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> {code}
> resultset with hive.auto.convert.anti.join enabled:
> {code}
> +------+
> | n.a  |
> +------+
> | b    |
> | 3    |
> +------+
> {code}
> correct resultset with hive.auto.convert.anti.join disabled:
> {code}
> +-------+
> |  n.a  |
> +-------+
> | a     |
> | b     |
> | 3     |
> | NULL  |
> +-------+
> {code}
> workaround could be to disable the feature:
> {code}
> set hive.auto.convert.anti.join=false;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HIVE-26135) Invalid Anti join conversion may cause missing results

2022-04-25 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-26135:

Description: 
right now I think the following is needed to trigger the issue:
* left outer join
* only select left hand side columns
* conditional which is using some udf
* the nullness of the udf is checked

repro sql; in case the conversion happens the row with 'a' will be missing
{code}
drop table if exists t;
drop table if exists n;

create table t(a string) stored as orc;
create table n(a string) stored as orc;

insert into t values ('a'),('1'),('2'),(null);
insert into n values ('a'),('b'),('1'),('3'),(null);


explain select n.* from n left outer join t on (n.a=t.a) where assert_true(t.a 
is null) is null;
explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
float) is null;


select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
null;
set hive.auto.convert.anti.join=false;
select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
null;

{code}

resultset with hive.auto.convert.anti.join enabled:
{code}
+------+
| n.a  |
+------+
| b    |
| 3    |
+------+
{code}

correct resultset with hive.auto.convert.anti.join disabled:
{code}
+-------+
|  n.a  |
+-------+
| a     |
| b     |
| 3     |
| NULL  |
+-------+
{code}


workaround could be to disable the feature:
{code}
set hive.auto.convert.anti.join=false;
{code}


  was:
right now I think the following is needed to trigger the issue:
* left outer join
* only select left hand side columns
* conditional which is using some udf
* the nullness of the udf is checked

repro sql; in case the conversion happens the row with 'a' will be missing
{code}
drop table if exists t;
drop table if exists n;

create table t(a string) stored as orc;
create table n(a string) stored as orc;

insert into t values ('a'),('1'),('2'),(null);
insert into n values ('a'),('b'),('1'),('3'),(null);


explain select n.* from n left outer join t on (n.a=t.a) where assert_true(t.a 
is null) is null;
explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
float) is null;


select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
null;
set hive.auto.convert.anti.join=false;
select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
null;

{code}



workaround could be to disable the feature:
{code}
set hive.auto.convert.anti.join=false;
{code}



> Invalid Anti join conversion may cause missing results
> --
>
> Key: HIVE-26135
> URL: https://issues.apache.org/jira/browse/HIVE-26135
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> right now I think the following is needed to trigger the issue:
> * left outer join
> * only select left hand side columns
> * conditional which is using some udf
> * the nullness of the udf is checked
> repro sql; in case the conversion happens the row with 'a' will be missing
> {code}
> drop table if exists t;
> drop table if exists n;
> create table t(a string) stored as orc;
> create table n(a string) stored as orc;
> insert into t values ('a'),('1'),('2'),(null);
> insert into n values ('a'),('b'),('1'),('3'),(null);
> explain select n.* from n left outer join t on (n.a=t.a) where 
> assert_true(t.a is null) is null;
> explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
> float) is null;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> set hive.auto.convert.anti.join=false;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> {code}
> resultset with hive.auto.convert.anti.join enabled:
> {code}
> +------+
> | n.a  |
> +------+
> | b    |
> | 3    |
> +------+
> {code}
> correct resultset with hive.auto.convert.anti.join disabled:
> {code}
> +-------+
> |  n.a  |
> +-------+
> | a     |
> | b     |
> | 3     |
> | NULL  |
> +-------+
> {code}
> workaround could be to disable the feature:
> {code}
> set hive.auto.convert.anti.join=false;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table

2022-04-25 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527440#comment-17527440
 ] 

Zoltan Haindrich commented on HIVE-26158:
-

[~sanguines] I've also just bumped into the exact same thing - let me know if 
you would like to pick this up; otherwise I'll probably post a patch for it in 
the next couple of days.

> TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after 
> rename table
> --
>
> Key: HIVE-26158
> URL: https://issues.apache.org/jira/browse/HIVE-26158
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: tanghui
>Assignee: Zoltan Haindrich
>Priority: Major
>
> After the patch is updated, the partition table location and hdfs data 
> directory are displayed normally, but the partition location of the table in 
> the SDS in the Hive metabase is still displayed as the location of the old 
> table, resulting in no data in the query partition.
>  
> in beeline:
> 
> set hive.create.as.external.legacy=true;
> CREATE TABLE part_test(
> c1 string
> ,c2 string
> )PARTITIONED BY (dat string)
> insert into part_test values ("11","th","20220101")
> insert into part_test values ("22","th","20220102")
> alter table part_test rename to part_test11;
> --this result is null.
> select * from part_test11 where dat="20220101";
> ||part_test.c1||part_test.c2||part_test.dat||
> | | | |
> -
> SDS in the Hive metabase:
> select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND 
> TBLS.TBL_ID=SDS.CD_ID;
> ---
> |*LOCATION*|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102|
> ---
>  
> We need to modify the partition location of the table in SDS to ensure that 
> the query results are normal



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HIVE-26158) TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after rename table

2022-04-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-26158:
---

Assignee: Zoltan Haindrich

> TRANSLATED_TO_EXTERNAL partition tables cannot query partition data after 
> rename table
> --
>
> Key: HIVE-26158
> URL: https://issues.apache.org/jira/browse/HIVE-26158
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 4.0.0-alpha-1, 4.0.0-alpha-2
>Reporter: tanghui
>Assignee: Zoltan Haindrich
>Priority: Major
>
> After the patch is updated, the partition table location and hdfs data 
> directory are displayed normally, but the partition location of the table in 
> the SDS in the Hive metabase is still displayed as the location of the old 
> table, resulting in no data in the query partition.
>  
> in beeline:
> 
> set hive.create.as.external.legacy=true;
> CREATE TABLE part_test(
> c1 string
> ,c2 string
> )PARTITIONED BY (dat string)
> insert into part_test values ("11","th","20220101")
> insert into part_test values ("22","th","20220102")
> alter table part_test rename to part_test11;
> --this result is null.
> select * from part_test11 where dat="20220101";
> ||part_test.c1||part_test.c2||part_test.dat||
> | | | |
> -
> SDS in the Hive metabase:
> select SDS.LOCATION from TBLS,SDS where TBLS.TBL_NAME="part_test11" AND 
> TBLS.TBL_ID=SDS.CD_ID;
> ---
> |*LOCATION*|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test11|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220101|
> |hdfs://nameservice1/warehouse/tablespace/external/hive/part_test/dat=20220102|
> ---
>  
> We need to modify the partition location of the table in SDS to ensure that 
> the query results are normal



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HIVE-26163) Incorrect format in columnstats_columnname_parse.q's insert statement can cause exceptions

2022-04-22 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526261#comment-17526261
 ] 

Zoltan Haindrich commented on HIVE-26163:
-

is something going wrong while processing this?
{code}
insert into table2 values("1","1","1");
{code}

Is this problem flaky?

but in any case I think this is a serious issue - and we should fix it without 
altering the qfile

> Incorrect format in columnstats_columnname_parse.q's insert statement can 
> cause exceptions
> --
>
> Key: HIVE-26163
> URL: https://issues.apache.org/jira/browse/HIVE-26163
> Project: Hive
>  Issue Type: Improvement
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
>
> Exception:
> {code:java}
> 2022-04-20T10:13:06,467 ERROR [016f5292-40a7-4fe6-be58-1c988fa4a6e5 main] 
> metastore.RetryingHMSHandler: java.lang.IndexOutOfBoundsException: Index: 0
>   at java.util.Collections$EmptyList.get(Collections.java:4456)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:9099)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:9054)
>   at sun.reflect.GeneratedMethodAccessor193.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121)
>   at com.sun.proxy.$Proxy59.set_aggr_stats_for(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.setPartitionColumnStatistics(HiveMetaStoreClient.java:2974)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.setPartitionColumnStatistics(SessionHiveMetaStoreClient.java:571)
>   at sun.reflect.GeneratedMethodAccessor192.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:216)
>   at com.sun.proxy.$Proxy60.setPartitionColumnStatistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:5583)
>   at 
> org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:223)
>   at 
> org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:94)
>   at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:107)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:775)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:524)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:518)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:232)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:421)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:352)
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:853)
>   at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:823)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:192)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>   at 
> org.apache.hadoop.hive.cli.split4.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
>   at sun.reflect.GeneratedMethodAccessor180.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   

[jira] [Commented] (HIVE-26135) Invalid Anti join conversion may cause missing results

2022-04-12 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521205#comment-17521205
 ] 

Zoltan Haindrich commented on HIVE-26135:
-

wanted to add a check for "Strong"-ness; however, consider:
{code}
(leftCol + rightCol) IS NULL
{code}
since we want to deduce from the nullness of the expression that `rightCol` can 
not be anything else than `null`... like:
{code}
(a + null) IS NULL
{code}

however, if the left-hand side is null, that could also make the whole 
expression null; and in case rightCol is not in the join keys we could lose 
correct results...
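The three-valued-logic trap can be illustrated with a tiny null-propagation check (a hedged sketch of the "Strong"-ness idea, not Calcite's Strong utility): an expression like `leftCol + rightCol` is null whenever either operand is null, so `expression IS NULL` does not pin down `rightCol`.

```python
# Sketch of why "(leftCol + rightCol) IS NULL" does not prove rightCol
# is NULL: the left operand alone can make the whole sum NULL.
def plus(a, b):
    # SQL-style null propagation: any NULL operand yields NULL
    return None if a is None or b is None else a + b

rows = [(None, 5), (1, None), (2, 3)]
null_rows = [(l, r) for l, r in rows if plus(l, r) is None]
# Both (None, 5) and (1, None) satisfy the IS NULL predicate,
# so the anti-join rewrite would drop rows it should keep.
print(null_rows)
```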



> Invalid Anti join conversion may cause missing results
> --
>
> Key: HIVE-26135
> URL: https://issues.apache.org/jira/browse/HIVE-26135
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> right now I think the following is needed to trigger the issue:
> * left outer join
> * only select left hand side columns
> * conditional which is using some udf
> * the nullness of the udf is checked
> repro sql; in case the conversion happens the row with 'a' will be missing
> {code}
> drop table if exists t;
> drop table if exists n;
> create table t(a string) stored as orc;
> create table n(a string) stored as orc;
> insert into t values ('a'),('1'),('2'),(null);
> insert into n values ('a'),('b'),('1'),('3'),(null);
> explain select n.* from n left outer join t on (n.a=t.a) where 
> assert_true(t.a is null) is null;
> explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
> float) is null;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> set hive.auto.convert.anti.join=false;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> {code}
> workaround could be to disable the feature:
> {code}
> set hive.auto.convert.anti.join=false;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26135) Invalid Anti join conversion may cause missing results

2022-04-12 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-26135:
---


> Invalid Anti join conversion may cause missing results
> --
>
> Key: HIVE-26135
> URL: https://issues.apache.org/jira/browse/HIVE-26135
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> right now I think the following is needed to trigger the issue:
> * left outer join
> * only select left hand side columns
> * conditional which is using some udf
> * the nullness of the udf is checked
> repro sql; in case the conversion happens the row with 'a' will be missing
> {code}
> drop table if exists t;
> drop table if exists n;
> create table t(a string) stored as orc;
> create table n(a string) stored as orc;
> insert into t values ('a'),('1'),('2'),(null);
> insert into n values ('a'),('b'),('1'),('3'),(null);
> explain select n.* from n left outer join t on (n.a=t.a) where 
> assert_true(t.a is null) is null;
> explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
> float) is null;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> set hive.auto.convert.anti.join=false;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
> null;
> {code}
> workaround could be to disable the feature:
> {code}
> set hive.auto.convert.anti.join=false;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-26117) Remove 2 superfluous lines of code in genJoinRelNode

2022-04-12 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-26117:
---

Assignee: Steve Carlin

> Remove 2 superfluous lines of code in genJoinRelNode
> 
>
> Key: HIVE-26117
> URL: https://issues.apache.org/jira/browse/HIVE-26117
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The code was rewritten to associate ASTNodes to RexNodes.  Some code was left 
> behind that doesn't add any value.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-26002) Create db scripts for 4.0.0-alpha-1

2022-03-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500921#comment-17500921
 ] 

Zoltan Haindrich commented on HIVE-26002:
-

schematool is doing pretty basic comparisons; we could easily get into 
trouble if we have 2 versions of which one is the prefix of the other (e.g. 
4.0.0 vs 4.0.0-alpha-1)

https://github.com/apache/hive/blob/95c6155677b5a288b6bc571b11caf2c8eb80825f/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreSchemaInfo.java#L106

https://the-asf.slack.com/archives/CFSSP9UPJ/p1646326383307109?thread_ts=1646235033.395189=CFSSP9UPJ
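The prefix hazard is easy to demonstrate (a hypothetical comparison helper, not the actual MetaStoreSchemaInfo logic): a startsWith-style check cannot tell 4.0.0 apart from 4.0.0-alpha-1.

```python
# A naive prefix comparison treats "4.0.0-alpha-1" as compatible with
# "4.0.0", which is exactly the trap with pre-release version strings.
def naive_compatible(db_version, hive_version):
    return db_version.startswith(hive_version)  # hypothetical check

print(naive_compatible("4.0.0-alpha-1", "4.0.0"))  # wrongly True
print(naive_compatible("4.0.0", "4.0.0"))          # True
```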



> Create db scripts for 4.0.0-alpha-1
> ---
>
> Key: HIVE-26002
> URL: https://issues.apache.org/jira/browse/HIVE-26002
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
>
> For the release we need to create the appropriate sql scripts for HMS db 
> initialization



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25994) Analyze table runs into ClassNotFoundException-s in case binary distribution is used

2022-03-02 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25994:
---

Assignee: Zoltan Haindrich

> Analyze table runs into ClassNotFoundException-s in case binary distribution 
> is used
> 
>
> Key: HIVE-25994
> URL: https://issues.apache.org/jira/browse/HIVE-25994
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
>
> any nightly release can be used to reproduce this:
> {code}
> create table t (a integer); insert into t values (1) ; analyze table t 
> compute statistics for columns;
> {code}
> results in
> {code}
> Caused by: java.lang.NoClassDefFoundError: org/antlr/runtime/tree/CommonTree
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> at java.lang.Class.getDeclaredConstructors0(Native Method)
> at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
> at java.lang.Class.getConstructor0(Class.java:3075)
> at java.lang.Class.getDeclaredConstructor(Class.java:2178)
> at 
> org.apache.hive.com.esotericsoftware.reflectasm.ConstructorAccess.get(ConstructorAccess.java:65)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultInstantiatorStrategy.newInstantiatorOf(DefaultInstantiatorStrategy.java:60)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1119)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1128)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:153)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:118)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:729)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:125)
> ... 38 more
> Caused by: java.lang.ClassNotFoundException: org.antlr.runtime.tree.CommonTree
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2022-03-01 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-23556:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

merged into master. Thank you [~ibenny]!

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: iBenny
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-13884 added the configuration hive.metastore.limit.partition.request to 
> limit the number of partitions that can be requested.
> Currently, it takes in effect for the following MetaStore APIs
> * get_partitions,
> * get_partitions_with_auth,
> * get_partitions_by_filter,
> * get_partitions_spec_by_filter,
> * get_partitions_by_expr,
> but not for
> * get_partitions_ps,
> * get_partitions_ps_with_auth.
> This issue proposes to apply the configuration also to get_partitions_ps and 
> get_partitions_ps_with_auth.
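The proposed behaviour amounts to a guard applied before returning the matched partitions (a hypothetical sketch of the limit check, not the HMS implementation; `LIMIT` stands in for hive.metastore.limit.partition.request):

```python
# Sketch of a metastore-side partition-request limit: reject requests
# whose matched partition count exceeds the configured limit.
LIMIT = 3  # stands in for hive.metastore.limit.partition.request

def get_partitions_ps(matched):
    if LIMIT >= 0 and len(matched) > LIMIT:
        raise RuntimeError(
            f"Number of partitions scanned ({len(matched)}) "
            f"exceeds limit ({LIMIT})")
    return matched

print(len(get_partitions_ps(["p1", "p2"])))   # within the limit
try:
    get_partitions_ps(["p1", "p2", "p3", "p4"])
except RuntimeError:
    print("rejected")                         # over the limit
```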



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25994) Analyze table runs into ClassNotFoundException-s in case binary distribution is used

2022-03-01 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25994:

Fix Version/s: 4.0.0-alpha-1

> Analyze table runs into ClassNotFoundException-s in case binary distribution 
> is used
> 
>
> Key: HIVE-25994
> URL: https://issues.apache.org/jira/browse/HIVE-25994
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
>
> any nightly release can be used to reproduce this:
> {code}
> create table t (a integer); insert into t values (1) ; analyze table t 
> compute statistics for columns;
> {code}
> results in
> {code}
> Caused by: java.lang.NoClassDefFoundError: org/antlr/runtime/tree/CommonTree
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> at java.lang.Class.getDeclaredConstructors0(Native Method)
> at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
> at java.lang.Class.getConstructor0(Class.java:3075)
> at java.lang.Class.getDeclaredConstructor(Class.java:2178)
> at 
> org.apache.hive.com.esotericsoftware.reflectasm.ConstructorAccess.get(ConstructorAccess.java:65)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultInstantiatorStrategy.newInstantiatorOf(DefaultInstantiatorStrategy.java:60)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1119)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1128)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:153)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:118)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:729)
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:125)
> ... 38 more
> Caused by: java.lang.ClassNotFoundException: org.antlr.runtime.tree.CommonTree
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> {code}





[jira] [Updated] (HIVE-25665) Checkstyle LGPL files must not be in the release sources/binaries

2022-03-01 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25665:

Fix Version/s: 4.0.0-alpha-1

> Checkstyle LGPL files must not be in the release sources/binaries
> -
>
> Key: HIVE-25665
> URL: https://issues.apache.org/jira/browse/HIVE-25665
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Stamatis Zampetakis
>Priority: Blocker
> Fix For: 4.0.0-alpha-1
>
>
> As discussed in the [dev 
> list|https://lists.apache.org/thread/r13e3236aa72a070b3267ed95f7cb3b45d3c4783fd4ca35f5376b1a35@%3cdev.hive.apache.org%3e]
>  LGPL files must not be present in the Apache released sources/binaries.
> The following files must not be present in the release:
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/storage-api/checkstyle/checkstyle-noframes-sorted.xsl
> https://github.com/apache/hive/blob/6e152aa28bc5116bf9210f9deb0f95d2d73183f7/standalone-metastore/checkstyle/checkstyle-noframes-sorted.xsl
> There may be other checkstyle LGPL files in the repo. All these should either 
> be removed entirely from the repository or selectively excluded from the 
> release.





[jira] [Commented] (HIVE-25987) Incorrectly formatted pom.xml error in Beeline

2022-02-28 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498977#comment-17498977
 ] 

Zoltan Haindrich commented on HIVE-25987:
-

note: for the [PR|https://github.com/apache/hive/pull/2824] in question the 
"tests passed" label was added in November; roughly 5 months before it was 
merged. Merging such changes without re-running the CI is really risky...

I think these labels should be removed, say, 15 days after they were 
given... not sure how that could be done...

> Incorrectly formatted pom.xml error in Beeline
> --
>
> Key: HIVE-25987
> URL: https://issues.apache.org/jira/browse/HIVE-25987
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Abhay
>Priority: Major
>
> After applying the patch [https://github.com/apache/hive/pull/3043,] 
> HIVE-25750, the precommit tests have started complaining of this 
> *!!! incorrectly formatted pom.xmls detected; see above!*
> The code built fine locally and the pre-commit tests had run fine. Need to 
> investigate further why this was not caught earlier but the pom.xml file 
> needs to be fixed.





[jira] [Commented] (HIVE-25970) Missing messages in HS2 operation logs

2022-02-25 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498120#comment-17498120
 ] 

Zoltan Haindrich commented on HIVE-25970:
-

we just talked with [~zabetak]; HIVE-24590 makes HIVE-22753 unnecessary - 
and it may only cause trouble (lost messages)

> Missing messages in HS2 operation logs
> --
>
> Key: HIVE-25970
> URL: https://issues.apache.org/jira/browse/HIVE-25970
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After HIVE-22753 & HIVE-24590, with some unlucky timing of events, operation 
> log messages can get lost and never appear in the appropriate files.
> The changes in HIVE-22753 will prevent a {{HushableRandomAccessFileAppender}} 
> from being created if the latter refers to a file that has been closed in the 
> last second. Preventing the creation of the appender also means that the 
> message which triggered the creation will be lost forever. In fact, any 
> message (for the same query) that arrives within that 1-second interval will 
> be lost forever.
> Before HIVE-24590 the appender/file was closed only once (explicitly by HS2) 
> and thus the problem may be very hard to notice in practice. However, with 
> the arrival of HIVE-24590 appenders may close much more frequently (and not 
> via HS2) making the issue reproducible rather easily. It suffices to set 
> _hive.server2.operation.log.purgePolicy.timeToLive_ property very low and 
> check the operation logs.
> The problem was discovered by investigating some intermittent failures in 
> operation logging tests (e.g.,  TestOperationLoggingAPIWithTez).
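The timing described above - an appender refused because its file was closed within the last second, silently dropping the triggering message - can be sketched in miniature. This is illustrative Python only, not the actual log4j2/`HushableRandomAccessFileAppender` code; `OperationLogRouter`, `RETRY_WINDOW`, and the injected clock are all inventions for the sketch:

```python
import time

RETRY_WINDOW = 1.0  # seconds; mirrors the 1-second guard described above

class OperationLogRouter:
    """Toy model of routing operation-log messages to per-query files."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.closed_at = {}   # file path -> time its appender was closed
        self.appenders = {}   # file path -> buffered messages
        self.lost = []        # messages that were silently dropped

    def close(self, path):
        # e.g. the purgePolicy.timeToLive expired (the HIVE-24590 scenario)
        self.appenders.pop(path, None)
        self.closed_at[path] = self.clock()

    def append(self, path, message):
        if path not in self.appenders:
            closed = self.closed_at.get(path)
            if closed is not None and self.clock() - closed < RETRY_WINDOW:
                # HIVE-22753-style guard: refuse to recreate the appender,
                # so the triggering message is lost forever
                self.lost.append(message)
                return
            self.appenders[path] = []
        self.appenders[path].append(message)

# simulate frequent closes with a fake clock
t = [0.0]
router = OperationLogRouter(clock=lambda: t[0])
router.append("/tmp/op.log", "line 1")
router.close("/tmp/op.log")     # purge policy closed the appender
t[0] += 0.5                     # a new message arrives 0.5s later
router.append("/tmp/op.log", "line 2")
print(router.lost)              # "line 2" never reaches the file
```

Messages arriving after the window reopens the appender normally; only the unlucky in-window messages disappear, which is why the failures were intermittent.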





[jira] [Assigned] (HIVE-25977) Enhance Compaction Cleaner to skip when there is nothing to do #2

2022-02-23 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25977:
---


> Enhance Compaction Cleaner to skip when there is nothing to do #2
> -
>
> Key: HIVE-25977
> URL: https://issues.apache.org/jira/browse/HIVE-25977
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> initially this was just an addendum to the original patch; but it got delayed 
> and altered - so it should have its own ticket





[jira] [Resolved] (HIVE-25874) Slow filter evaluation of nest struct fields in vectorized executions

2022-02-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25874.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~kkasa] for reviewing the changes!

> Slow filter evaluation of nest struct fields in vectorized executions
> -
>
> Key: HIVE-25874
> URL: https://issues.apache.org/jira/browse/HIVE-25874
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> time is spent resizing vectors around 
> [here|https://github.com/apache/hive/blob/200c0bf1feb259f4d95bf065a2ab38fe684383da/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java#L252]
>  or in some other "ensureSize" method
> {code:java}
> create table t as
> select
> named_struct('id',13,'str','string','nest',named_struct('id',12,'str','string','arr',array('value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value')))
> s;
> -- go up to 1M rows
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> -- insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> set hive.fetch.task.conversion=none;
> select count(1) from t;
> --explain
> select s.id from t
> where s.nest.id > 0;
>  {code}
> interestingly, the issue is not present:
> * for a query not looking into the nested struct
> * or in case the struct with the array is at the top level
> {code}
> select count(1) from t;
> --explain
> select s.id from t
> where s
> -- .nest
> .id  > 0;
> {code}





[jira] [Resolved] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately

2022-02-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25844.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into branch-3. Thank you Krisztian for reviewing the changes!

> Exception deserialization error-s may cause beeline to terminate immediately
> 
>
> Key: HIVE-25844
> URL: https://issues.apache.org/jira/browse/HIVE-25844
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 3.1.2
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> the exception happens on the server side when:
>  * fetch task conversion is on
>  * there is an exception while reading the table; the error bubbles up
>  * => the server transmits a message to beeline saying the error class name is 
> "org.apache.phoenix.schema.ColumnNotFoundException" + the message
>  * beeline tries to reconstruct the exception around HiveSqlException
>  * but during the constructor call 
> org.apache.phoenix.exception.SQLExceptionCode is needed, which fails to load 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
>  * a java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service is thrown - which 
> is not handled in that method - so it becomes a real error and shuts down 
> the client
> {code:java}
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
> [...]
> at java.lang.Class.forName(Class.java:264)
> at 
> org.apache.hive.service.cli.HiveSQLException.newInstance(HiveSQLException.java:245)
> at 
> org.apache.hive.service.cli.HiveSQLException.toStackTrace(HiveSQLException.java:211)
> [...]
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.shaded.com.google.protobuf.Service
> [...]
> {code}
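A defensive shape for the client-side reconstruction - treating any failure while loading or instantiating the server's exception class as non-fatal - could look like the following hedged Python sketch. `reconstruct` and `BrokenRegistry` are illustrative stand-ins for beeline's `HiveSQLException.toStackTrace`/`Class.forName` path, not the actual fix:

```python
class HiveSQLException(Exception):
    """Generic fallback wrapper, standing in for Hive's HiveSQLException."""
    pass

def reconstruct(class_name, message, registry):
    """registry: name -> class; stands in for Class.forName on the client
    classpath. Loading may blow up with arbitrary errors (the
    NoClassDefFoundError case in the stack trace above)."""
    try:
        cls = registry[class_name]          # may raise anything at all
        return cls(message)
    except BaseException:                   # ~ catching Throwable in Java
        # fall back to a generic exception instead of letting an Error
        # propagate and terminate the client
        return HiveSQLException(f"{class_name}: {message}")

class BrokenRegistry(dict):
    """Simulates a classloader whose lookup itself throws an Error."""
    def __getitem__(self, key):
        raise RuntimeError("NoClassDefFoundError: protobuf/Service")

err = reconstruct("org.apache.phoenix.schema.ColumnNotFoundException",
                  "column FOO not found", BrokenRegistry())
print(type(err).__name__)  # HiveSQLException - the client keeps running
```

The key point mirrored here is the breadth of the catch: catching only `ClassNotFoundException` (Python: `KeyError`) would still let a linkage-style error escape.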





[jira] [Updated] (HIVE-21152) Rewrite if expression to case and recognize simple case as an if

2022-02-17 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-21152:

Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

merged into master. Thank you [~kkasa] for reviewing the changes!

> Rewrite if expression to case and recognize simple case as an if
> 
>
> Key: HIVE-21152
> URL: https://issues.apache.org/jira/browse/HIVE-21152
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21152.01.patch, HIVE-21152.02.patch, 
> HIVE-21152.03.patch, HIVE-21152.04.patch, HIVE-21152.05.patch, 
> HIVE-21152.06.patch, HIVE-21152.07.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * {{IF}} is not part of the sql standard; however, given its special form it's 
> simpler - and currently in Hive it also has vectorized support
> * people writing standard sql may write {{CASE WHEN member=1 THEN attr+1 
> ELSE attr+2 END}}, which is essentially an if.
> The idea is to rewrite IFs to CASEs for the cbo, and to recognize simple 
> "CASE"-s as IFs to get vectorization on them where possible
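The two rewrites can be illustrated on a toy tuple-based expression tree. This is purely illustrative Python; `if_to_case`, `case_to_if`, and the tuple encoding are inventions for the sketch, not Hive's actual CBO rules:

```python
def if_to_case(expr):
    """IF(c, a, b)  ->  CASE WHEN c THEN a ELSE b END (for the CBO)."""
    op, cond, then, els = expr
    assert op == "IF"
    return ("CASE", [(cond, then)], els)

def case_to_if(expr):
    """A searched CASE with exactly one WHEN branch is really an IF,
    so it can be handed to the vectorized IF implementation."""
    op, branches, els = expr
    if op == "CASE" and len(branches) == 1:
        cond, then = branches[0]
        return ("IF", cond, then, els)
    return expr  # multi-branch CASEs are left alone

# CASE WHEN member=1 THEN attr+1 ELSE attr+2 END, as in the description
e = ("IF", ("=", "member", 1), ("+", "attr", 1), ("+", "attr", 2))
as_case = if_to_case(e)          # normalized form the CBO reasons about
print(case_to_if(as_case) == e)  # True: round-trips back to the IF form
```

The design point sketched here is that the CBO sees one canonical form (CASE) while execution can still pick the cheaper vectorized IF when the shape allows it.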





[jira] [Resolved] (HIVE-25715) Provide nightly builds

2022-02-17 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25715.
-
Resolution: Fixed

merged into master. Thank you [~kkasa] for reviewing the changes!

> Provide nightly builds
> --
>
> Key: HIVE-25715
> URL: https://issues.apache.org/jira/browse/HIVE-25715
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> provide nightly builds for the master branch





[jira] [Resolved] (HIVE-25944) Format pom.xml-s

2022-02-16 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25944.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~dengzh] for reviewing the changes!

> Format pom.xml-s
> 
>
> Key: HIVE-25944
> URL: https://issues.apache.org/jira/browse/HIVE-25944
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> the moment I touch pom.xml-s with xmlstarlet, it starts fixing indentation, 
> which makes seeing real diffs harder.
> fix and enforce that the pom.xml-s are indented correctly





[jira] [Resolved] (HIVE-25942) Upgrade commons-io to 2.8.0 due to CVE-2021-29425

2022-02-13 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25942.
-
Resolution: Fixed

merged into master. Thank you [~srahman]!

> Upgrade commons-io to 2.8.0 due to CVE-2021-29425
> -
>
> Key: HIVE-25942
> URL: https://issues.apache.org/jira/browse/HIVE-25942
> Project: Hive
>  Issue Type: Bug
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Due to [CVE-2021-29425|https://nvd.nist.gov/vuln/detail/CVE-2021-29425] all 
> the commons-io versions below 2.7 are affected.
> Tez and Hadoop have upgraded commons-io to 2.8.0 in 
> [TEZ-4353|https://issues.apache.org/jira/browse/TEZ-4353] and 
> [HADOOP-17683|https://issues.apache.org/jira/browse/HADOOP-17683] 
> respectively and it will be good if Hive also follows the same.





[jira] [Assigned] (HIVE-25944) Format pom.xml-s

2022-02-09 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25944:
---


> Format pom.xml-s
> 
>
> Key: HIVE-25944
> URL: https://issues.apache.org/jira/browse/HIVE-25944
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> the moment I touch pom.xml-s with xmlstarlet, it starts fixing indentation, 
> which makes seeing real diffs harder.
> fix and enforce that the pom.xml-s are indented correctly





[jira] [Commented] (HIVE-23556) Support hive.metastore.limit.partition.request for get_partitions_ps

2022-02-09 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489476#comment-17489476
 ] 

Zoltan Haindrich commented on HIVE-23556:
-

[~touchida] could you open a PR against the hive repo on github?

https://github.com/apache/hive/pulls

> Support hive.metastore.limit.partition.request for get_partitions_ps
> 
>
> Key: HIVE-23556
> URL: https://issues.apache.org/jira/browse/HIVE-23556
> Project: Hive
>  Issue Type: Improvement
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Minor
> Attachments: HIVE-23556.2.patch, HIVE-23556.3.patch, 
> HIVE-23556.4.patch, HIVE-23556.patch
>
>
> HIVE-13884 added the configuration hive.metastore.limit.partition.request to 
> limit the number of partitions that can be requested.
> Currently, it takes effect for the following MetaStore APIs:
> * get_partitions,
> * get_partitions_with_auth,
> * get_partitions_by_filter,
> * get_partitions_spec_by_filter,
> * get_partitions_by_expr,
> but not for
> * get_partitions_ps,
> * get_partitions_ps_with_auth.
> This issue proposes to apply the configuration also to get_partitions_ps and 
> get_partitions_ps_with_auth.
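The behaviour being requested - one limit check shared by all partition-listing calls - might look roughly like this hedged Python sketch. `check_partition_limit`, `get_partitions_ps`, and the tuple-based partition model are stand-ins for illustration, not HMS code:

```python
class MetaException(Exception):
    """Stand-in for the thrift MetaException raised by the metastore."""
    pass

def check_partition_limit(num_requested, limit):
    """Mirrors hive.metastore.limit.partition.request: a negative limit
    disables the check (the config's default is -1)."""
    if limit >= 0 and num_requested > limit:
        raise MetaException(
            f"Number of partitions scanned ({num_requested}) "
            f"exceeds limit ({limit})")

def get_partitions_ps(partitions, part_vals, limit=-1):
    # hypothetical stand-in for the thrift handler: match the partial
    # partition spec (empty string = wildcard), then enforce the limit
    # on the result size - the step HIVE-23556 asks to add here
    matched = [p for p in partitions
               if all(v in ("", pv) for v, pv in zip(part_vals, p))]
    check_partition_limit(len(matched), limit)
    return matched

parts = [("2022", "01"), ("2022", "02"), ("2023", "01")]
print(len(get_partitions_ps(parts, ("2022",))))  # 2
```

With `limit=1` the same call would raise `MetaException`, matching how the already-covered APIs (get_partitions, get_partitions_by_filter, ...) behave.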





[jira] [Updated] (HIVE-24887) getDatabase() to call translation code even if client has no capabilities

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24887:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> getDatabase() to call translation code even if client has no capabilities
> -
>
> Key: HIVE-24887
> URL: https://issues.apache.org/jira/browse/HIVE-24887
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We do this for other calls that go through the translation layer. For some 
> reason, the current code only calls it when the client sets the capabilities.





[jira] [Updated] (HIVE-24920) TRANSLATED_TO_EXTERNAL tables may write to the same location

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24920:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> TRANSLATED_TO_EXTERNAL tables may write to the same location
> 
>
> Key: HIVE-24920
> URL: https://issues.apache.org/jira/browse/HIVE-24920
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
> create table t (a integer);
> insert into t values(1);
> alter table t rename to t2;
> create table t (a integer); -- I expected an exception from this command 
> (location already exists) but because its an external table no exception
> insert into t values(2);
> select * from t;  -- shows 1 and 2
> drop table t2;-- wipes out data location
> select * from t;  -- empty resultset
> {code}





[jira] [Updated] (HIVE-25303) CTAS hive.create.as.external.legacy tries to place data files in managed WH path

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25303:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> CTAS hive.create.as.external.legacy tries to place data files in managed WH 
> path
> 
>
> Key: HIVE-25303
> URL: https://issues.apache.org/jira/browse/HIVE-25303
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Under legacy table creation mode (hive.create.as.external.legacy=true), when 
> a database has been created in a specific LOCATION, in a session where that 
> database is used, tables created using the following command:
> {code:java}
> CREATE TABLE  AS SELECT {code}
> should inherit the HDFS path from the database's location. Instead, Hive 
> tries to write the table data into 
> /warehouse/tablespace/managed/hive//
> +Design+: 
>  In a CTAS query, the data is first written to the target directory (which 
> happens in HS2) and then the table is created (this happens in HMS). So two 
> decisions are being made: i) the target directory location, and ii) how the 
> table should be created (table type, sd, etc.).
>  When HS2 needs the target location to be set, it makes a create-table dry-run 
> call to HMS (where table translation happens); decisions i) and ii) are made 
> within HMS, which returns the table object. HS2 then uses this location set by 
> HMS for placing the data.
> The patch for this issue addresses the table location being incorrect and the 
> table data being empty in the following cases: 1) when the external legacy 
> config is set, i.e. hive.create.as.external.legacy=true, and 2) when the table 
> is created with the transactional property set to false, i.e. TBLPROPERTIES 
> ('transactional'='false')





[jira] [Updated] (HIVE-25782) Create Table As Select fails for managed ACID tables

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25782:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> Create Table As Select fails for managed ACID tables
> 
>
> Key: HIVE-25782
> URL: https://issues.apache.org/jira/browse/HIVE-25782
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Csaba Juhász
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Attachments: ctas_acid_managed.q
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Create Table As Select fails for managed ACID tables:
> *MetaException(message:Processor has no capabilities, cannot create an ACID 
> table.)*
> HMSHandler.translate_table_dryrun invokes 
> MetastoreDefaultTransformer.transformCreateTable with null 
> processorCapabilities and processorId.
> https://github.com/apache/hive/blob/c7fdd459305f4bf6913dc4bed7e8df8c7bf9e458/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L2251
> {code:java}
> Dec 06 05:32:47 Starting translation for CreateTable for processor null with 
> null on table vectortab10korc
> Dec 06 05:32:47 MetaException(message:Processor has no capabilities, cannot 
> create an ACID table.)
>   at 
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transformCreateTable(MetastoreDefaultTransformer.java:663)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.translate_table_dryrun(HiveMetaStore.java:2159)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>   at com.sun.proxy.$Proxy29.translate_table_dryrun(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$translate_table_dryrun.getResult(ThriftHiveMetastore.java:16981)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$translate_table_dryrun.getResult(ThriftHiveMetastore.java:16965)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643)
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:638)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> Reproduction ([^ctas_acid_managed.q]):
> {code:java}
> set hive.support.concurrency=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set 
> metastore.metadata.transformer.class=org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer;
> create table test stored as orc tblproperties ('transactional'='true') as 
> select from_unixtime(unix_timestamp("0002-01-01 09:57:21", "-MM-dd 
> HH:mm:ss")); {code}





[jira] [Updated] (HIVE-25630) Transformer fixes

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25630:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> Transformer fixes
> -
>
> Key: HIVE-25630
> URL: https://issues.apache.org/jira/browse/HIVE-25630
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> there are some issues:
> * AlreadyExistsException might be suppressed by the translator
> * uppercase letter usage may cause problems for some clients
> * add a way to suppress location checks for legacy clients





[jira] [Updated] (HIVE-24954) MetastoreTransformer is disabled during testing

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24954:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> MetastoreTransformer is disabled during testing
> ---
>
> Key: HIVE-24954
> URL: https://issues.apache.org/jira/browse/HIVE-24954
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> all calls are fortified with "isInTest" guards to avoid testing those calls 
> (!@#$#)
> https://github.com/apache/hive/blob/86fa9b30fe347c7fc78a2930f4d20ece2e124f03/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L1647
> this causes some weird behaviour:
> an out-of-the-box hive installation creates TRANSLATED_TO_EXTERNAL external 
> tables for plain CREATE TABLE commands,
> meanwhile when most tests are executed CREATE TABLE creates regular 
> MANAGED tables...





[jira] [Updated] (HIVE-24951) Table created with Uppercase name using CTAS does not produce result for select queries

2022-02-07 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-24951:

Labels: metastore_translator pull-request-available  (was: 
pull-request-available)

> Table created with Uppercase name using CTAS does not produce result for 
> select queries
> ---
>
> Key: HIVE-24951
> URL: https://issues.apache.org/jira/browse/HIVE-24951
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 4.0.0
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
>  Labels: metastore_translator, pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Steps to repro:
> {code:java}
> CREATE EXTERNAL TABLE MY_TEST AS SELECT * FROM source
> Table created with Location but does not have any data moved to it.
> /warehouse/tablespace/external/hive/MY_TEST
> {code}





[jira] [Commented] (HIVE-25707) SchemaTool may leave the metastore in-between upgrade steps

2022-01-26 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17482378#comment-17482378
 ] 

Zoltan Haindrich commented on HIVE-25707:
-

[~rahulp] yes; it could probably catch a lot of problematic cases

I wrote a test for it - but we run the sql-s using sqlline; if I disable 
auto-commit, the file is executed without being committed at the end... unless 
the jdbc driver auto-commits it...

I leave a reference to my branch here - in case someone picks this up later
https://github.com/kgyrtkirk/hive/tree/HIVE-25707-schematool-commit
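The all-or-nothing behaviour being discussed can be sketched independently of sqlline/JDBC specifics (the helper and its shape are assumptions, not SchemaTool's actual code; over JDBC, `execute` would be `Statement.execute(sql)` and the commit/rollback would be `Connection.commit()`/`rollback()` with auto-commit disabled):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch: run an upgrade script as one unit and "commit" only after
// every statement succeeded, so a killed process cannot leave the
// schema in-between upgrade steps. All names here are illustrative.
public class UpgradeScriptRunner {
    /** Returns the applied statements, or an empty list if rolled back. */
    static List<String> runAtomically(List<String> statements, Consumer<String> execute) {
        List<String> applied = new ArrayList<>();
        try {
            for (String sql : statements) {
                execute.accept(sql);
                applied.add(sql);
            }
            return applied;            // commit point: every step succeeded
        } catch (RuntimeException e) {
            return new ArrayList<>();  // rollback: schema stays at the old version
        }
    }
}
```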

> SchemaTool may leave the metastore in-between upgrade steps
> ---
>
> Key: HIVE-25707
> URL: https://issues.apache.org/jira/browse/HIVE-25707
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> it seems like:
> * schematool runs the sql files via beeline
> * autocommit is turned on
> * pressing ctrl+c or killing the process will result in an invalid schema
> https://github.com/apache/hive/blob/6e02f6164385a370ee8014c795bee1fa423d7937/beeline/src/java/org/apache/hive/beeline/schematool/HiveSchemaTool.java#L79





[jira] [Commented] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do

2022-01-25 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481814#comment-17481814
 ] 

Zoltan Haindrich commented on HIVE-25883:
-

{code}
  //  aborted txn: 3881
  //  
com:8020/warehouse/tablespace/managed/hive/test_835163/base_0003209_v0003877
  //  
com:8020/warehouse/tablespace/managed/hive/test_835163/delta_0003561_0003561_000
  //  
@,type:MAJOR,enqueueTime:0,start:0,properties:null,runAs:hive,tooManyAborts:false,
  //  hasOldAbort:false,highestWriteId:3309,errorMessage:null
{code}

> Enhance Compaction Cleaner to skip when there is nothing to do
> --
>
> Key: HIVE-25883
> URL: https://issues.apache.org/jira/browse/HIVE-25883
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> the cleaner works the following way:
> * it identifies obsolete directories (delta dirs which don't have open 
> txns)
> * it removes them and is done
> if there are no obsolete directories, that is attributed to the possibility 
> that there are open txns, so the request should be retried later.
> however, if for some reason the directory was already cleaned, it similarly 
> has no obsolete directories; and thus the request is retried forever 
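The skip rule being proposed amounts to a small decision function (a sketch with made-up names, not the Cleaner's real code):

```java
import java.util.List;

// Sketch of the cleaner decision described above (illustrative names).
public class CleanerDecision {
    enum Outcome { CLEANED, RETRY_LATER }

    static Outcome decide(List<String> obsoleteDirs, List<String> remainingDeltaDirs) {
        if (!obsoleteDirs.isEmpty()) {
            return Outcome.CLEANED;      // remove the obsolete dirs and finish
        }
        if (remainingDeltaDirs.isEmpty()) {
            return Outcome.CLEANED;      // already cleaned: nothing to do, skip
        }
        return Outcome.RETRY_LATER;      // open txns may still block cleanup
    }
}
```

The second branch is the enhancement: without it, an already-cleaned directory is indistinguishable from one blocked by open txns and gets retried forever.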





[jira] [Commented] (HIVE-25672) Hive isn't purging older compaction entries from show compaction command

2022-01-24 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481313#comment-17481313
 ] 

Zoltan Haindrich commented on HIVE-25672:
-

HIVE-25633 could cause the AcidHouseKeeperService to not run

> Hive isn't purging older compaction entries from show compaction command
> 
>
> Key: HIVE-25672
> URL: https://issues.apache.org/jira/browse/HIVE-25672
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore, Transactions
>Affects Versions: 3.1.1
>Reporter: Rohan Nimmagadda
>Priority: Minor
>
> Added the below properties in hive-site, but auto purging is not enforced.
> When we run the show compactions command it takes forever and returns a huge 
> number of rows.
> Result of show compactions command :
> {code:java}
> 752,450 rows selected (198.066 seconds) 
> {code}
> {code:java}
> hive.compactor.history.retention.succeeded": "10",
> "hive.compactor.history.retention.failed": "10",  
> "hive.compactor.history.retention.attempted": "10",  
> "hive.compactor.history.reaper.interval": "10m" {code}





[jira] [Commented] (HIVE-25672) Hive isn't purging older compaction entries from show compaction command

2022-01-24 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481265#comment-17481265
 ] 

Zoltan Haindrich commented on HIVE-25672:
-

I tried to reproduce this issue; at first metastore.compactor.initiator.on was 
disabled on my cluster for some reason, but after turning that on things 
started working correctly:
* a metastore with 52M of heap was able to clean up 10K records in no time
** but was OOM-ed for 100K
* a metastore with 966K rows in the COMPLETED_COMPACTIONS table
** removed 50773 rows multiple times and was able to reduce the volume to 
below 100 in around a minute

I don't know if we have an issue here - it seems most likely that the 
`AcidHouseKeeperService` either is not running or stopped running for some 
reason




> Hive isn't purging older compaction entries from show compaction command
> 
>
> Key: HIVE-25672
> URL: https://issues.apache.org/jira/browse/HIVE-25672
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore, Transactions
>Affects Versions: 3.1.1
>Reporter: Rohan Nimmagadda
>Priority: Minor
>
> Added the below properties in hive-site, but auto purging is not enforced.
> When we run the show compactions command it takes forever and returns a huge 
> number of rows.
> Result of show compactions command :
> {code:java}
> 752,450 rows selected (198.066 seconds) 
> {code}
> {code:java}
> hive.compactor.history.retention.succeeded": "10",
> "hive.compactor.history.retention.failed": "10",  
> "hive.compactor.history.retention.attempted": "10",  
> "hive.compactor.history.reaper.interval": "10m" {code}





[jira] [Resolved] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do

2022-01-22 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25883.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master; Thank you Denys for reviewing the changes!

> Enhance Compaction Cleaner to skip when there is nothing to do
> --
>
> Key: HIVE-25883
> URL: https://issues.apache.org/jira/browse/HIVE-25883
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> the cleaner works the following way:
> * it identifies obsolete directories (delta dirs which don't have open 
> txns)
> * it removes them and is done
> if there are no obsolete directories, that is attributed to the possibility 
> that there are open txns, so the request should be retried later.
> however, if for some reason the directory was already cleaned, it similarly 
> has no obsolete directories; and thus the request is retried forever 





[jira] [Assigned] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do

2022-01-20 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25883:
---


> Enhance Compaction Cleaner to skip when there is nothing to do
> --
>
> Key: HIVE-25883
> URL: https://issues.apache.org/jira/browse/HIVE-25883
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> the cleaner works the following way:
> * it identifies obsolete directories (delta dirs which don't have open 
> txns)
> * it removes them and is done
> if there are no obsolete directories, that is attributed to the possibility 
> that there are open txns, so the request should be retried later.
> however, if for some reason the directory was already cleaned, it similarly 
> has no obsolete directories; and thus the request is retried forever 





[jira] [Commented] (HIVE-25874) Slow filter evaluation of nest struct fields in vectorized executions

2022-01-18 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17477819#comment-17477819
 ] 

Zoltan Haindrich commented on HIVE-25874:
-

the issue is caused by the fact that VectorStructField doesn't reset the output 
vector - so the array in it retains all previous elements and it keeps 
expanding the backing vector.

it took 21 minutes to execute the query before the patch; after it, 2 seconds
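The retain-and-grow behaviour can be illustrated with a toy vector (this is not Hive's ColumnVector API, just a sketch of the missing-reset bug):

```java
import java.util.ArrayList;
import java.util.List;

// Toy output vector: without reset() between batches it keeps every
// previous batch's elements and the backing storage grows forever.
public class ToyOutputVector {
    private final List<String> elements = new ArrayList<>();

    void append(String value) { elements.add(value); }

    // The fix: clear retained elements before the vector is reused.
    void reset() { elements.clear(); }

    int size() { return elements.size(); }
}
```

Without the `reset()` call between batches, `size()` would grow by the batch size on every batch, which is the ever-expanding backing vector seen in the profile.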

> Slow filter evaluation of nest struct fields in vectorized executions
> -
>
> Key: HIVE-25874
> URL: https://issues.apache.org/jira/browse/HIVE-25874
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> time is spent at resizing vectors around 
> [here|https://github.com/apache/hive/blob/200c0bf1feb259f4d95bf065a2ab38fe684383da/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java#L252]
>  or in some other "ensureSize" method
> {code:java}
> create table t as
> select
> named_struct('id',13,'str','string','nest',named_struct('id',12,'str','string','arr',array('value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value')))
> s;
> -- go up to 1M rows
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> -- insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> set hive.fetch.task.conversion=none;
> select count(1) from t;
> --explain
> select s.id from t
> where s.nest.id > 0;
>  {code}
> interestingly, the issue is not present:
> * for a query not looking into the nested struct
> * and when the struct with the array is at the top level
> {code}
> select count(1) from t;
> --explain
> select s.id from t
> where s
> -- .nest
> .id > 0;
> {code}





[jira] [Assigned] (HIVE-25874) Slow filter evaluation of nest struct fields in vectorized executions

2022-01-18 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25874:
---

Assignee: Zoltan Haindrich

> Slow filter evaluation of nest struct fields in vectorized executions
> -
>
> Key: HIVE-25874
> URL: https://issues.apache.org/jira/browse/HIVE-25874
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> time is spent at resizing vectors around 
> [here|https://github.com/apache/hive/blob/200c0bf1feb259f4d95bf065a2ab38fe684383da/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java#L252]
>  or in some other "ensureSize" method
> {code:java}
> create table t as
> select
> named_struct('id',13,'str','string','nest',named_struct('id',12,'str','string','arr',array('value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value')))
> s;
> -- go up to 1M rows
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> -- insert into table t select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t union all select * from t union all select * from t union all 
> select * from t;
> set hive.fetch.task.conversion=none;
> select count(1) from t;
> --explain
> select s.id from t
> where s.nest.id > 0;
>  {code}
> interestingly, the issue is not present:
> * for a query not looking into the nested struct
> * and when the struct with the array is at the top level
> {code}
> select count(1) from t;
> --explain
> select s.id from t
> where s
> -- .nest
> .id > 0;
> {code}





[jira] [Updated] (HIVE-25874) Slow filter evaluation of nest struct fields in vectorized executions

2022-01-18 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-25874:

Description: 
time is spent at resizing vectors around 
[here|https://github.com/apache/hive/blob/200c0bf1feb259f4d95bf065a2ab38fe684383da/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java#L252]
 or in some other "ensureSize" method

{code:java}

create table t as
select
named_struct('id',13,'str','string','nest',named_struct('id',12,'str','string','arr',array('value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value')))
s;

-- go up to 1M rows
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
-- insert into table t select * from t union all select * from t union all 
select * from t union all select * from t union all select * from t union all 
select * from t union all select * from t union all select * from t union all 
select * from t;


set hive.fetch.task.conversion=none;

select count(1) from t;
--explain
select s.id from t
where s.nest.id > 0;

 {code}


interestingly, the issue is not present:
* for a query not looking into the nested struct
* and when the struct with the array is at the top level

{code}
select count(1) from t;
--explain
select s.id from t
where s
-- .nest
.id > 0;
{code}

  was:
{code:java}

create table t as
select
named_struct('id',13,'str','string','nest',named_struct('id',12,'str','string','arr',array('value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value')))
s;

-- go up to 1M rows
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
-- insert into table t select * from t union all select * from t union all 
select * from t union all select * from t union all select * from t union all 
select * from t union all select * from t union all select * from t union all 
select * from t;


set hive.fetch.task.conversion=none;

select count(1) from t;
--explain
select s.id from t
where s.nest.id > 0;

 {code}


interestingly, the issue is not present:
* for a query not looking into the nested struct
* and when the struct with the array is at the top level

{code}
select count(1) from t;
--explain
select s
.id from 

[jira] [Commented] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately

2022-01-11 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472679#comment-17472679
 ] 

Zoltan Haindrich commented on HIVE-25844:
-

makes sense; backported HIVE-24772 instead

> Exception deserialization error-s may cause beeline to terminate immediately
> 
>
> Key: HIVE-25844
> URL: https://issues.apache.org/jira/browse/HIVE-25844
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 3.1.2
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> the exception on the server side happens when:
>  * fetch task conversion is on
>  * there is an exception while reading the table and the error bubbles up
>  * => a message is transmitted to beeline saying that the error class name is 
> "org.apache.phoenix.schema.ColumnNotFoundException", plus the message
>  * it tries to reconstruct the exception around HiveSQLException
>  * but during the constructor call 
> org.apache.phoenix.exception.SQLExceptionCode is needed, which fails to load 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
>  * a java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service is thrown - which 
> is not handled in that method - so it becomes a real error and shuts down 
> the client
> {code:java}
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
> [...]
> at java.lang.Class.forName(Class.java:264)
> at 
> org.apache.hive.service.cli.HiveSQLException.newInstance(HiveSQLException.java:245)
> at 
> org.apache.hive.service.cli.HiveSQLException.toStackTrace(HiveSQLException.java:211)
> [...]
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.shaded.com.google.protobuf.Service
> [...]
> {code}
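A defensive reconstruction would catch LinkageError as well as Exception, so a missing class degrades to a generic wrapper instead of killing the client. A sketch with assumed names (not the actual HiveSQLException code):

```java
// Sketch: rebuild a server-side exception by class name, but never let
// class-loading failures escape. Names here are illustrative.
public class RemoteExceptionFactory {
    static Throwable reconstruct(String className, String message) {
        try {
            Class<?> clazz = Class.forName(className);
            return (Throwable) clazz.getConstructor(String.class).newInstance(message);
        } catch (Exception | LinkageError e) {
            // NoClassDefFoundError is an Error, so a plain catch(Exception)
            // misses it; catching LinkageError keeps the client alive and
            // falls back to a generic wrapper carrying the original info.
            return new RuntimeException(className + ": " + message);
        }
    }
}
```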





[jira] [Assigned] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately

2022-01-04 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25844:
---


> Exception deserialization error-s may cause beeline to terminate immediately
> 
>
> Key: HIVE-25844
> URL: https://issues.apache.org/jira/browse/HIVE-25844
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 3.1.2
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> the exception on the server side happens when:
>  * fetch task conversion is on
>  * there is an exception while reading the table and the error bubbles up
>  * => a message is transmitted to beeline saying that the error class name is 
> "org.apache.phoenix.schema.ColumnNotFoundException", plus the message
>  * it tries to reconstruct the exception around HiveSQLException
>  * but during the constructor call 
> org.apache.phoenix.exception.SQLExceptionCode is needed, which fails to load 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
>  * a java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service is thrown - which 
> is not handled in that method - so it becomes a real error and shuts down 
> the client
> {code:java}
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
> [...]
> at java.lang.Class.forName(Class.java:264)
> at 
> org.apache.hive.service.cli.HiveSQLException.newInstance(HiveSQLException.java:245)
> at 
> org.apache.hive.service.cli.HiveSQLException.toStackTrace(HiveSQLException.java:211)
> [...]
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.hbase.shaded.com.google.protobuf.Service
> [...]
> {code}





[jira] [Resolved] (HIVE-25820) Provide a way to disable join filters

2022-01-03 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25820.
-
Resolution: Won't Fix

this is not really a good option - optionally disabling this feature could 
lead to incorrect results in some cases

> Provide a way to disable join filters
> -
>
> Key: HIVE-25820
> URL: https://issues.apache.org/jira/browse/HIVE-25820
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (HIVE-25822) Incorrect False positive result rows may be outputted in case outer join has conditions only affecting one side

2021-12-23 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25822.
-
Resolution: Fixed

Merged into master.

> Incorrect False positive result rows may be outputted in case outer join has 
> conditions only affecting one side
> ---
>
> Key: HIVE-25822
> URL: https://issues.apache.org/jira/browse/HIVE-25822
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> needed:
> * an outer join
> * the on condition has at least one condition for one side of the join
> * in a single reducer:
> ** a right-hand-side-only row is outputted right before
> ** >=2 rows on the LHS and 1 on the RHS match in the join keys, but the first 
> LHS row doesn't satisfy the filter condition
> ** a second LHS row with a good filter condition
> {code}
> with
> t_y as (select col1 as id,col2 as s from (VALUES(0,'a'),(1,'y')) as c),
> t_xy as (select col1 as id,col2 as s from (VALUES(1,'x'),(1,'y')) as c) 
> select * from t_xy l full outer join t_y r on (l.id=r.id and l.s='y');
> {code}
> null,null,1,y is a false positive result
> {code}
> +---+---+---+---+
> | l.id  |  l.s  | r.id  |  r.s  |
> +---+---+---+---+
> | NULL  | NULL  | 0 | a |
> | 1 | x | NULL  | NULL  |
> | NULL  | NULL  | 1 | y |
> | 1 | y | 1 | y |
> +---+---+---+---+
> {code}





[jira] [Resolved] (HIVE-25823) Incorrect false positive results for outer join using non-satisfiable residual filters

2021-12-20 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-25823.
-
Resolution: Duplicate

this is the same issue as HIVE-25822; I most likely made some mistake with 
checking the results on a different branch

> Incorrect false positive results for outer join using non-satisfiable 
> residual filters
> --
>
> Key: HIVE-25823
> URL: https://issues.apache.org/jira/browse/HIVE-25823
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> similar to HIVE-25822 
> {code}
> create table t_y (id integer,s string);
> create table t_xy (id integer,s string);
> insert into t_y values(0,'a'),(1,'y'),(1,'x');
> insert into t_xy values(1,'x'),(1,'y');
> select * from t_xy l full outer join t_y r on (l.id=r.id and l.s='y' and 
> l.id+2*r.id=1);
> {code}
> the rows full of NULLs are incorrect
> {code}
> +---+---+---+---+
> | l.id  |  l.s  | r.id  |  r.s  |
> +---+---+---+---+
> | NULL  | NULL  | 0 | a |
> | NULL  | NULL  | NULL  | NULL  |
> | 1 | y | NULL  | NULL  |
> | NULL  | NULL  | NULL  | NULL  |
> | NULL  | NULL  | 1 | y |
> | NULL  | NULL  | 1 | x |
> +---+---+---+---+
> {code}





[jira] [Assigned] (HIVE-25823) Incorrect false positive results for outer join using non-satisfiable residual filters

2021-12-20 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-25823:
---

Assignee: Zoltan Haindrich

> Incorrect false positive results for outer join using non-satisfiable 
> residual filters
> --
>
> Key: HIVE-25823
> URL: https://issues.apache.org/jira/browse/HIVE-25823
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>
> similar to HIVE-25822 
> {code}
> create table t_y (id integer,s string);
> create table t_xy (id integer,s string);
> insert into t_y values(0,'a'),(1,'y'),(1,'x');
> insert into t_xy values(1,'x'),(1,'y');
> select * from t_xy l full outer join t_y r on (l.id=r.id and l.s='y' and 
> l.id+2*r.id=1);
> {code}
> the rows full of NULLs are incorrect
> {code}
> +---+---+---+---+
> | l.id  |  l.s  | r.id  |  r.s  |
> +---+---+---+---+
> | NULL  | NULL  | 0 | a |
> | NULL  | NULL  | NULL  | NULL  |
> | 1 | y | NULL  | NULL  |
> | NULL  | NULL  | NULL  | NULL  |
> | NULL  | NULL  | 1 | y |
> | NULL  | NULL  | 1 | x |
> +---+---+---+---+
> {code}




