[jira] [Commented] (HIVE-22417) Remove stringifyException from MetaStore

2022-07-26 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17571707#comment-17571707
 ] 

David Mollitor commented on HIVE-22417:
---

[~zabetak] Can you please start with a review of this one?  I just put a PR up 
on GitHub.

> Remove stringifyException from MetaStore
> 
>
> Key: HIVE-22417
> URL: https://issues.apache.org/jira/browse/HIVE-22417
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22417.1.patch, HIVE-22417.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-22417) Remove stringifyException from MetaStore

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-22417:
--
Labels: pull-request-available  (was: )

> Remove stringifyException from MetaStore
> 
>
> Key: HIVE-22417
> URL: https://issues.apache.org/jira/browse/HIVE-22417
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-22417.1.patch, HIVE-22417.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-22417) Remove stringifyException from MetaStore

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22417?focusedWorklogId=795506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795506
 ]

ASF GitHub Bot logged work on HIVE-22417:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 03:05
Start Date: 27/Jul/22 03:05
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request, #3478:
URL: https://github.com/apache/hive/pull/3478

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 795506)
Remaining Estimate: 0h
Time Spent: 10m

> Remove stringifyException from MetaStore
> 
>
> Key: HIVE-22417
> URL: https://issues.apache.org/jira/browse/HIVE-22417
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore, Standalone Metastore
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-22417.1.patch, HIVE-22417.2.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26400) Provide a self-contained docker

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=795489&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795489
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 01:54
Start Date: 27/Jul/22 01:54
Worklog Time Spent: 10m 
  Work Description: achennagiri commented on code in PR #3448:
URL: https://github.com/apache/hive/pull/3448#discussion_r930548578


##
dev-support/docker/README.md:
##
@@ -0,0 +1,125 @@
+### Introduction
+
+---
+Run Apache Hive inside docker container in pseudo-distributed mode, with MySQL 
as its back database.
+Provide the following
+- Quick-start/Debugging/Prepare a test env for Hive
+- Images can be used as the basis for the Kubernetes operator
+
+### Overview
+
+---
+ Files
+- docker-compose.yml: Docker compose file
+- Dockerfile-*, scripts/docker-entrypoint.sh: Instructions to build images.
+- conf/hiveserver2-site.xml: Configuration for HiveServer2
+- conf/metastore-site.xml: Configuration for Hive Metastore
+- deploy.sh: Entry point to build images and run them.
+
+### Quickstart
+
+---
+ Build images
+Hive relies on Hadoop, Tez and MySQL to work correctly, and many versions of these dependencies, as well as of Hive itself, have been released.
+Providing a way to build Hive against a specified version of each dependency sounds reasonable. There are some build args for this purpose, as listed below:
+```shell
+--hadoop <version>
+--tez <version>
+--hive <version>
+```
+If the versions are not given during build, the version info is read from the project's top-level `pom.xml`: project.version, hadoop.version, and tez.version
+are used as the Hive, Hadoop, and Tez versions respectively. There are two ways to build the image; the key difference is whether the Hive version is specified.
+- Build remotely
+
+The Hive version is picked up by `--hive <version>`, for example:
+```shell
+sh deploy.sh --hive 3.1.3 
+```
+This command will pull the Hive tar ball from Apache to local, together with 
Hadoop and Tez, while those two versions are defined in top `pom.xml`
+to build the image.
+
+- Build locally
+
+If the Hive version is not specified, then it will search the file: 
`packaging/target/apache-hive-${project.version}-bin.tar.gz` to make sure it 
exists, otherwise it will
+stop building.
+```shell
+sh deploy.sh --hadoop 3.1.0 --tez 0.10.1
+```
+The above example will use the local 
`apache-hive-${project.version}-bin.tar.gz`, Hadoop 3.1.0 and Tez 0.10.1 to 
build the target image.
+
+ Run services
+
+- Launch a single standalone Metastore
+
+If you want to just test Metastore or play around with it, execute the 
following:
+```shell
+sh deploy.sh --metastore
+```
+or run with Docker if the Metastore image has already been built:
+```shell
+docker run --name metastore-standalone hive:metastore-$HIVE_VERSION 
+```
+
+- Launch a single standalone HiveServer2 for a quick start 
+
+The HiveServer2 will be started with an embedded Metastore. To launch it, 
execute the following:
+```shell
+sh deploy.sh --hiveserver2
+```
+Or, if the HiveServer2 image has been built successfully, simply run:
+```shell
+docker run --name hiveserver2-standalone hive:hiveserver2-$HIVE_VERSION 
+```
+Please note that the data of the HiveServer2 would be lost between container restarts.
+To persist the data, bring up Hive in the following way.
+
+- Launch a cluster with HiveServer2, Metastore and MySQL as its back database.
+
+To save data between container restarts, we use Docker's volume to persist 
data to the local disk. Just by executing: 

Review Comment:
   Thank you! This work is really cool and helpful. Thank you for working on 
this!





Issue Time Tracking
---

Worklog Id: (was: 795489)
Time Spent: 2h 40m  (was: 2.5h)

> Provide a self-contained docker
> ---
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26400) Provide a self-contained docker

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=795487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795487
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 01:47
Start Date: 27/Jul/22 01:47
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #3448:
URL: https://github.com/apache/hive/pull/3448#discussion_r930546009


##
dev-support/docker/deploy.sh:
##
@@ -0,0 +1,156 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+usage() {
+cat <<EOF >&2
+Usage: $0 [--help] [--hadoop ] [--tez ] [--hive 
] [--repo ]
+  [--hiveserver2 ] [--metastore ]
+Build the Hive Docker image and Run services
+--help   Help
+--hadoop Hadoop version for building image
+--tezTez version for building image
+--hive   Hive version for building image
+--repo   Docker repository
+--hiveserver2Start HiveServer2 only, with embedded Metastore
+--metastore  Start Metastore only, with embedded derby
+EOF
+}
+
+# components for building image
+MYSQL_VERSION=8.0.27

Review Comment:
   yeah, that's right





Issue Time Tracking
---

Worklog Id: (was: 795487)
Time Spent: 2.5h  (was: 2h 20m)

> Provide a self-contained docker
> ---
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26400) Provide a self-contained docker

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=795486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795486
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 01:46
Start Date: 27/Jul/22 01:46
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #3448:
URL: https://github.com/apache/hive/pull/3448#discussion_r930545582


##
dev-support/docker/deploy.sh:
##
@@ -0,0 +1,156 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+usage() {
+cat <<EOF >&2
+Usage: $0 [--help] [--hadoop ] [--tez ] [--hive 
] [--repo ]
+  [--hiveserver2 ] [--metastore ]
+Build the Hive Docker image and Run services
+--help   Help
+--hadoop Hadoop version for building image
+--tezTez version for building image
+--hive   Hive version for building image
+--repo   Docker repository

Review Comment:
   It's used to tag and push the image to a remote hub repository.





Issue Time Tracking
---

Worklog Id: (was: 795486)
Time Spent: 2h 20m  (was: 2h 10m)

> Provide a self-contained docker
> ---
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26400) Provide a self-contained docker

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=795483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795483
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 01:41
Start Date: 27/Jul/22 01:41
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #3448:
URL: https://github.com/apache/hive/pull/3448#discussion_r930543306


##
dev-support/docker/README.md:
##
@@ -0,0 +1,125 @@
+### Introduction
+
+---
+Run Apache Hive inside docker container in pseudo-distributed mode, with MySQL 
as its back database.
+Provide the following
+- Quick-start/Debugging/Prepare a test env for Hive
+- Images can be used as the basis for the Kubernetes operator
+
+### Overview
+
+---
+ Files
+- docker-compose.yml: Docker compose file
+- Dockerfile-*, scripts/docker-entrypoint.sh: Instructions to build images.
+- conf/hiveserver2-site.xml: Configuration for HiveServer2
+- conf/metastore-site.xml: Configuration for Hive Metastore
+- deploy.sh: Entry point to build images and run them.
+
+### Quickstart
+
+---
+ Build images
+Hive relies on Hadoop, Tez and MySQL to work correctly, and many versions of these dependencies, as well as of Hive itself, have been released.
+Providing a way to build Hive against a specified version of each dependency sounds reasonable. There are some build args for this purpose, as listed below:
+```shell
+--hadoop <version>
+--tez <version>
+--hive <version>
+```
+If the versions are not given during build, the version info is read from the project's top-level `pom.xml`: project.version, hadoop.version, and tez.version
+are used as the Hive, Hadoop, and Tez versions respectively. There are two ways to build the image; the key difference is whether the Hive version is specified.
+- Build remotely
+
+The Hive version is picked up by `--hive <version>`, for example:
+```shell
+sh deploy.sh --hive 3.1.3 
+```
+This command will pull the Hive tar ball from Apache to local, together with 
Hadoop and Tez, while those two versions are defined in top `pom.xml`
+to build the image.
+
+- Build locally
+
+If the Hive version is not specified, then it will search the file: 
`packaging/target/apache-hive-${project.version}-bin.tar.gz` to make sure it 
exists, otherwise it will
+stop building.
+```shell
+sh deploy.sh --hadoop 3.1.0 --tez 0.10.1
+```
+The above example will use the local 
`apache-hive-${project.version}-bin.tar.gz`, Hadoop 3.1.0 and Tez 0.10.1 to 
build the target image.
+
+ Run services
+
+- Launch a single standalone Metastore
+
+If you want to just test Metastore or play around with it, execute the 
following:
+```shell
+sh deploy.sh --metastore
+```
+or run with Docker if the Metastore image has already been built:
+```shell
+docker run --name metastore-standalone hive:metastore-$HIVE_VERSION 
+```
+
+- Launch a single standalone HiveServer2 for a quick start 
+
+The HiveServer2 will be started with an embedded Metastore. To launch it, 
execute the following:
+```shell
+sh deploy.sh --hiveserver2
+```
+Or, if the HiveServer2 image has been built successfully, simply run:
+```shell
+docker run --name hiveserver2-standalone hive:hiveserver2-$HIVE_VERSION 
+```
+Please note that the data of the HiveServer2 would be lost between container restarts.
+To persist the data, bring up Hive in the following way.
+
+- Launch a cluster with HiveServer2, Metastore and MySQL as its back database.
+
+To save data between container restarts, we use Docker's volume to persist 
data to the local disk. Just by executing: 

Review Comment:
   Hi, @achennagiri, thank you for your feedback. The volume is defined here:
   
https://github.com/apache/hive/blob/9cbe6d2a5081c7a72910ac337990bf3defea3101/dev-support/docker/docker-compose.yml#L64-L66
   Docker will create it if it does not exist.





Issue Time Tracking
---

Worklog Id: (was: 795483)
Time Spent: 2h 10m  (was: 2h)

> Provide a self-contained docker
> ---
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26400) Provide a self-contained docker

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=795482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795482
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 01:40
Start Date: 27/Jul/22 01:40
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #3448:
URL: https://github.com/apache/hive/pull/3448#discussion_r930543306


##
dev-support/docker/README.md:
##
@@ -0,0 +1,125 @@
+### Introduction
+
+---
+Run Apache Hive inside docker container in pseudo-distributed mode, with MySQL 
as its back database.
+Provide the following
+- Quick-start/Debugging/Prepare a test env for Hive
+- Images can be used as the basis for the Kubernetes operator
+
+### Overview
+
+---
+ Files
+- docker-compose.yml: Docker compose file
+- Dockerfile-*, scripts/docker-entrypoint.sh: Instructions to build images.
+- conf/hiveserver2-site.xml: Configuration for HiveServer2
+- conf/metastore-site.xml: Configuration for Hive Metastore
+- deploy.sh: Entry point to build images and run them.
+
+### Quickstart
+
+---
+ Build images
+Hive relies on Hadoop, Tez and MySQL to work correctly, and many versions of these dependencies, as well as of Hive itself, have been released.
+Providing a way to build Hive against a specified version of each dependency sounds reasonable. There are some build args for this purpose, as listed below:
+```shell
+--hadoop <version>
+--tez <version>
+--hive <version>
+```
+If the versions are not given during build, the version info is read from the project's top-level `pom.xml`: project.version, hadoop.version, and tez.version
+are used as the Hive, Hadoop, and Tez versions respectively. There are two ways to build the image; the key difference is whether the Hive version is specified.
+- Build remotely
+
+The Hive version is picked up by `--hive <version>`, for example:
+```shell
+sh deploy.sh --hive 3.1.3 
+```
+This command will pull the Hive tar ball from Apache to local, together with 
Hadoop and Tez, while those two versions are defined in top `pom.xml`
+to build the image.
+
+- Build locally
+
+If the Hive version is not specified, then it will search the file: 
`packaging/target/apache-hive-${project.version}-bin.tar.gz` to make sure it 
exists, otherwise it will
+stop building.
+```shell
+sh deploy.sh --hadoop 3.1.0 --tez 0.10.1
+```
+The above example will use the local 
`apache-hive-${project.version}-bin.tar.gz`, Hadoop 3.1.0 and Tez 0.10.1 to 
build the target image.
+
+ Run services
+
+- Launch a single standalone Metastore
+
+If you want to just test Metastore or play around with it, execute the 
following:
+```shell
+sh deploy.sh --metastore
+```
+or run with Docker if the Metastore image has already been built:
+```shell
+docker run --name metastore-standalone hive:metastore-$HIVE_VERSION 
+```
+
+- Launch a single standalone HiveServer2 for a quick start 
+
+The HiveServer2 will be started with an embedded Metastore. To launch it, 
execute the following:
+```shell
+sh deploy.sh --hiveserver2
+```
+Or, if the HiveServer2 image has been built successfully, simply run:
+```shell
+docker run --name hiveserver2-standalone hive:hiveserver2-$HIVE_VERSION 
+```
+Please note that the data of the HiveServer2 would be lost between container restarts.
+To persist the data, bring up Hive in the following way.
+
+- Launch a cluster with HiveServer2, Metastore and MySQL as its back database.
+
+To save data between container restarts, we use Docker's volume to persist 
data to the local disk. Just by executing: 

Review Comment:
   Hi, @achennagiri, thank you for your feedback. The volume is defined here:
   
https://github.com/apache/hive/blob/9cbe6d2a5081c7a72910ac337990bf3defea3101/dev-support/docker/docker-compose.yml#L17-L18
   Docker will create it if it does not exist.





Issue Time Tracking
---

Worklog Id: (was: 795482)
Time Spent: 2h  (was: 1h 50m)

> Provide a self-contained docker
> ---
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26400) Provide a self-contained docker

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=795475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795475
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 01:10
Start Date: 27/Jul/22 01:10
Worklog Time Spent: 10m 
  Work Description: achennagiri commented on code in PR #3448:
URL: https://github.com/apache/hive/pull/3448#discussion_r930532023


##
dev-support/docker/deploy.sh:
##
@@ -0,0 +1,156 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+usage() {
+cat <<EOF >&2
+Usage: $0 [--help] [--hadoop ] [--tez ] [--hive 
] [--repo ]
+  [--hiveserver2 ] [--metastore ]
+Build the Hive Docker image and Run services
+--help   Help
+--hadoop Hadoop version for building image
+--tezTez version for building image
+--hive   Hive version for building image
+--repo   Docker repository
+--hiveserver2Start HiveServer2 only, with embedded Metastore
+--metastore  Start Metastore only, with embedded derby
+EOF
+}
+
+# components for building image
+MYSQL_VERSION=8.0.27

Review Comment:
   As I understand, we can create hive docker images currently with derby and 
Mysql as options for backend DB right? We could probably extend this to the 
other flavors like Oracle, MSSQL, Postgres in the future right?





Issue Time Tracking
---

Worklog Id: (was: 795475)
Time Spent: 1h 50m  (was: 1h 40m)

> Provide a self-contained docker
> ---
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-26400) Provide a self-contained docker

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=795474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795474
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 01:08
Start Date: 27/Jul/22 01:08
Worklog Time Spent: 10m 
  Work Description: achennagiri commented on code in PR #3448:
URL: https://github.com/apache/hive/pull/3448#discussion_r930531218


##
dev-support/docker/deploy.sh:
##
@@ -0,0 +1,156 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+usage() {
+cat <<EOF >&2
+Usage: $0 [--help] [--hadoop ] [--tez ] [--hive 
] [--repo ]
+  [--hiveserver2 ] [--metastore ]
+Build the Hive Docker image and Run services
+--help   Help
+--hadoop Hadoop version for building image
+--tezTez version for building image
+--hive   Hive version for building image
+--repo   Docker repository

Review Comment:
   What does repo signify here?





Issue Time Tracking
---

Worklog Id: (was: 795474)
Time Spent: 1h 40m  (was: 1.5h)

> Provide a self-contained docker
> ---
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-25813) CREATE TABLE x LIKE storagehandler-based-source fails

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25813?focusedWorklogId=795471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795471
 ]

ASF GitHub Bot logged work on HIVE-25813:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 00:25
Start Date: 27/Jul/22 00:25
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #3301: 
HIVE-25813: Create Table x Like commands based storage handlers fail …
URL: https://github.com/apache/hive/pull/3301




Issue Time Tracking
---

Worklog Id: (was: 795471)
Time Spent: 1h 10m  (was: 1h)

> CREATE TABLE x LIKE storagehandler-based-source fails 
> --
>
> Key: HIVE-25813
> URL: https://issues.apache.org/jira/browse/HIVE-25813
> Project: Hive
>  Issue Type: Bug
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {code:java}
> CREATE EXTERNAL TABLE default.dbs (
>   DB_IDbigint,
>   DB_LOCATION_URI  string,
>   NAME string,
>   OWNER_NAME   string,
>   OWNER_TYPE   string )
> STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (
>   'hive.sql.database.type' = 'MYSQL',
>   'hive.sql.jdbc.driver'   = 'com.mysql.jdbc.Driver',
>   'hive.sql.jdbc.url'  = 'jdbc:mysql://localhost:3306/hive1',
>   'hive.sql.dbcp.username' = 'hive1',
>   'hive.sql.dbcp.password' = 'cloudera',
>   'hive.sql.query' = 'SELECT DB_ID, DB_LOCATION_URI, NAME, OWNER_NAME, 
> OWNER_TYPE FROM DBS'
> );
> CREATE TABLE default.dbscopy LIKE default.dbs;
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getFieldsFromDeserializer(HiveMetaStoreUtils.java:186)
>  {code}





[jira] [Work logged] (HIVE-26400) Provide a self-contained docker

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=795469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795469
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 00:06
Start Date: 27/Jul/22 00:06
Worklog Time Spent: 10m 
  Work Description: achennagiri commented on code in PR #3448:
URL: https://github.com/apache/hive/pull/3448#discussion_r930507758


##
dev-support/docker/README.md:
##
@@ -0,0 +1,125 @@
+### Introduction
+
+---
+Run Apache Hive inside docker container in pseudo-distributed mode, with MySQL 
as its back database.
+Provide the following
+- Quick-start/Debugging/Prepare a test env for Hive
+- Images can be used as the basis for the Kubernetes operator
+
+### Overview
+
+---
+ Files
+- docker-compose.yml: Docker compose file
+- Dockerfile-*, scripts/docker-entrypoint.sh: Instructions to build images.
+- conf/hiveserver2-site.xml: Configuration for HiveServer2
+- conf/metastore-site.xml: Configuration for Hive Metastore
+- deploy.sh: Entry point to build images and run them.
+
+### Quickstart
+
+---
+ Build images
+Hive relies on Hadoop, Tez and MySQL to work correctly, and many versions of these dependencies, as well as of Hive itself, have been released.
+Providing a way to build Hive against a specified version of each dependency sounds reasonable. There are some build args for this purpose, as listed below:
+```shell
+--hadoop <version>
+--tez <version>
+--hive <version>
+```
+If the versions are not given during build, the version info is read from the project's top-level `pom.xml`: project.version, hadoop.version, and tez.version
+are used as the Hive, Hadoop, and Tez versions respectively. There are two ways to build the image; the key difference is whether the Hive version is specified.
+- Build remotely
+
+The Hive version is picked up by `--hive <version>`, for example:
+```shell
+sh deploy.sh --hive 3.1.3 
+```
+This command will pull the Hive tar ball from Apache to local, together with 
Hadoop and Tez, while those two versions are defined in top `pom.xml`
+to build the image.
+
+- Build locally
+
+If the Hive version is not specified, then it will search the file: 
`packaging/target/apache-hive-${project.version}-bin.tar.gz` to make sure it 
exists, otherwise it will
+stop building.
+```shell
+sh deploy.sh --hadoop 3.1.0 --tez 0.10.1
+```
+The above example will use the local 
`apache-hive-${project.version}-bin.tar.gz`, Hadoop 3.1.0 and Tez 0.10.1 to 
build the target image.
+
+ Run services
+
+- Launch a single standalone Metastore
+
+If you want to just test Metastore or play around with it, execute the 
following:
+```shell
+sh deploy.sh --metastore
+```
+or run with Docker if the Metastore image has already been built:
+```shell
+docker run --name metastore-standalone hive:metastore-$HIVE_VERSION 
+```
+
+- Launch a single standalone HiveServer2 for a quick start 
+
+The HiveServer2 will be started with an embedded Metastore. To launch it, 
execute the following:
+```shell
+sh deploy.sh --hiveserver2
+```
+Or, if the HiveServer2 image has been built successfully, simply run:
+```shell
+docker run --name hiveserver2-standalone hive:hiveserver2-$HIVE_VERSION 
+```
+Please note that the data of the HiveServer2 would be lost between container restarts.
+To persist the data, bring up Hive in the following way.
+
+- Launch a cluster with HiveServer2, Metastore and MySQL as its back database.
+
+To save data between container restarts, we use Docker's volume to persist 
data to the local disk. Just by executing: 

Review Comment:
   @dengzhhu653 I didn't get this part. Where are we creating the docker volume?





Issue Time Tracking
---

Worklog Id: (was: 795469)
Time Spent: 1.5h  (was: 1h 20m)

> Provide a self-contained docker
> ---
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
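Regarding the reviewer's question above about where the Docker volume is created: attaching a named volume explicitly would look roughly like the sketch below. This is illustration only — the in-container mount path `/opt/hive/data/warehouse` is an assumption, not taken from the documentation quoted above.

```shell
# Sketch: create a named volume and mount it so warehouse data survives
# container restarts. The mount path is an assumed placeholder.
docker volume create hive-warehouse
docker run -d --name hiveserver2-standalone \
  -v hive-warehouse:/opt/hive/data/warehouse \
  hive:hiveserver2-$HIVE_VERSION
```

Data written under the mounted path would then persist across `docker rm`/`docker run` cycles of the container.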


[jira] [Work logged] (HIVE-26426) Avoid StringIndexOutOfBoundsException in canCBOHandleAst() method.

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26426?focusedWorklogId=795339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795339
 ]

ASF GitHub Bot logged work on HIVE-26426:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 16:25
Start Date: 26/Jul/22 16:25
Worklog Time Spent: 10m 
  Work Description: jfsii commented on code in PR #3474:
URL: https://github.com/apache/hive/pull/3474#discussion_r930178880


##
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##
@@ -947,7 +947,7 @@ Pair canCBOHandleAst(ASTNode ast, QB qb, 
PreCboCtx cboCtx) {
 // Now check QB in more detail. canHandleQbForCbo returns null if query can
 // be handled.
 msg = CalcitePlanner.canHandleQbForCbo(queryProperties, conf, true, 
needToLogMessage);
-if (msg == null) {
+if (msg == null || msg.isEmpty()) {
   return Pair.of(true, msg);
 }
 msg = msg.substring(0, msg.length() - 2);

Review Comment:
   I would likely either move the substring call into canHandleQbForCbo, so that 
the method always returns a valid-looking message, or change canHandleQbForCbo 
to construct a List of reasons and then use String.join with "; " (and 
remove the substring code).
   
   I might also change the method to not bother with taking in verbose and to 
always create and return error messages. This is on the error path, not some 
inner loop reading data, so the performance optimization of checking log 
levels etc. is in the camp of unneeded optimizations that make the code ugly.
   
   Though I'd go with what @zabetak prefers since they are the primary reviewer.
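The List-and-join variant suggested in the comment above could look roughly like this — a sketch with a hypothetical helper name, not the actual CalcitePlanner code. It avoids both the trailing-separator `substring` and the `StringIndexOutOfBoundsException` on an empty message:

```java
import java.util.ArrayList;
import java.util.List;

public class CboMsgJoinSketch {
    // Hypothetical helper: collect individual reasons and join them once,
    // instead of appending "; " after each reason and trimming the trailing
    // separator with substring (which throws on an empty message).
    static String buildMessage(List<String> reasons) {
        if (reasons.isEmpty()) {
            return null; // null means the query can be handled by CBO
        }
        return String.join("; ", reasons);
    }

    public static void main(String[] args) {
        List<String> reasons = new ArrayList<>();
        reasons.add("has sort by");
        reasons.add("has script operator");
        System.out.println(buildMessage(reasons)); // has sort by; has script operator
        System.out.println(buildMessage(new ArrayList<>())); // null
    }
}
```

With this shape the caller only needs the existing `msg == null` check, and no length arithmetic on the message.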





Issue Time Tracking
---

Worklog Id: (was: 795339)
Time Spent: 1h 20m  (was: 1h 10m)

> Avoid StringIndexOutOfBoundsException in canCBOHandleAst() method.
> --
>
> Key: HIVE-26426
> URL: https://issues.apache.org/jira/browse/HIVE-26426
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Abhay
>Assignee: Abhay
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The call to canHandleQbForCbo() can result in an 
> StringIndexOutOfBoundsException. The assumption in the code is that the msg 
> can only be null and we handle that but the msg can also be an empty string 
> if the *verbose* is set to false. This can happen if INFO Logging is not 
> enabled. We need to handle that case.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L913]
> Here is the stack trace for reference: 
> {noformat}
> FAILED: StringIndexOutOfBoundsException String index out of range: -2 
> 15:10:24.192 [HiveServer2-Background-Pool: Thread-305] ERROR 
> org.apache.hadoop.hive.ql.Driver - FAILED: StringIndexOutOfBoundsException 
> String index out of range: -2
> java.lang.StringIndexOutOfBoundsException: String index out of range: -2 
> at java.lang.String.substring(String.java:1967)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.canCBOHandleAst(CalcitePlanner.java:996)
>  
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:572)
>  
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13063)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:472)
>  
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:314)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223) 
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:201) 
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:650)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:596)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:590)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:336)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:358)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at 

[jira] [Work logged] (HIVE-26288) NPE in CompactionTxnHandler.markFailed()

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26288?focusedWorklogId=795303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795303
 ]

ASF GitHub Bot logged work on HIVE-26288:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 15:08
Start Date: 26/Jul/22 15:08
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3451:
URL: https://github.com/apache/hive/pull/3451#discussion_r930086597


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java:
##
@@ -600,6 +602,10 @@ public CompactorMR getMrCompactor() {
   }
 
   private void markFailed(CompactionInfo ci, String errorMessage) {
+if (ci == null) {
+  LOG.warn("CompactionInfo client was null. Could not mark failed: {}", 
ci);

Review Comment:
   why are you passing ci to the log if it's null? `Could not mark failed: {}", 
ci` - just remove this part of log





Issue Time Tracking
---

Worklog Id: (was: 795303)
Remaining Estimate: 0h
Time Spent: 10m

> NPE in CompactionTxnHandler.markFailed()
> 
>
> Key: HIVE-26288
> URL: https://issues.apache.org/jira/browse/HIVE-26288
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: Zsolt Miskolczi
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Unhandled exceptions in 
> IMetaStoreClient.findNextCompact(FindNextCompactRequest) are handled 
> incorrectly in the worker. In these cases the CompactionInfo remains null, but 
> the catch block passes it to CompactionTxnHandler.markFailed(), which causes an NPE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26288) NPE in CompactionTxnHandler.markFailed()

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26288:
--
Labels: pull-request-available  (was: )

> NPE in CompactionTxnHandler.markFailed()
> 
>
> Key: HIVE-26288
> URL: https://issues.apache.org/jira/browse/HIVE-26288
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: László Végh
>Assignee: Zsolt Miskolczi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Unhandled exceptions in 
> IMetaStoreClient.findNextCompact(FindNextCompactRequest) are handled 
> incorrectly in the worker. In these cases the CompactionInfo remains null, but 
> the catch block passes it to CompactionTxnHandler.markFailed(), which causes an NPE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26430) hive-metastore partition_keys add switch

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26430:
--
Labels: pull-request-available  (was: )

> hive-metastore partition_keys add switch
> 
>
> Key: HIVE-26430
> URL: https://issues.apache.org/jira/browse/HIVE-26430
> Project: Hive
>  Issue Type: Improvement
>Reporter: huibo.liu
>Assignee: Weison Wei
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After multiple verifications: when Hive drops a table, it performs an 
> associated query on the partitions table regardless of whether the table is 
> partitioned; with a large amount of metadata this puts unnecessary pressure 
> on the metadata database. table.getPartitionKeysSize() can be used to 
> determine whether the associated query on the partitions table is needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
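The check proposed in the description above could be sketched like this, with hypothetical stub types standing in for the real metastore classes — the point being to skip the partitions-table query entirely for unpartitioned tables:

```java
import java.util.ArrayList;
import java.util.List;

public class DropTableSketch {
    // Stub standing in for the real metastore Table class.
    static class Table {
        final List<String> partitionKeys = new ArrayList<>();
        int getPartitionKeysSize() { return partitionKeys.size(); }
    }

    // Sketch of the proposed drop-table path: only query partition
    // metadata when the table actually has partition keys.
    static String dropTable(Table table) {
        if (table.getPartitionKeysSize() == 0) {
            return "dropped without querying PARTITIONS";
        }
        return "dropped after removing partition metadata for "
            + table.getPartitionKeysSize() + " partition key(s)";
    }

    public static void main(String[] args) {
        Table unpartitioned = new Table();
        System.out.println(dropTable(unpartitioned)); // dropped without querying PARTITIONS
    }
}
```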


[jira] [Work logged] (HIVE-26430) hive-metastore partition_keys add switch

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26430?focusedWorklogId=795286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795286
 ]

ASF GitHub Bot logged work on HIVE-26430:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 14:10
Start Date: 26/Jul/22 14:10
Worklog Time Spent: 10m 
  Work Description: WeisonWei opened a new pull request, #3476:
URL: https://github.com/apache/hive/pull/3476

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 795286)
Remaining Estimate: 0h
Time Spent: 10m

> hive-metastore partition_keys add switch
> 
>
> Key: HIVE-26430
> URL: https://issues.apache.org/jira/browse/HIVE-26430
> Project: Hive
>  Issue Type: Improvement
>Reporter: huibo.liu
>Assignee: Weison Wei
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After multiple verifications: when Hive drops a table, it performs an 
> associated query on the partitions table regardless of whether the table is 
> partitioned; with a large amount of metadata this puts unnecessary pressure 
> on the metadata database. table.getPartitionKeysSize() can be used to 
> determine whether the associated query on the partitions table is needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26430) hive-metastore partition_keys add switch

2022-07-26 Thread Weison Wei (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weison Wei reassigned HIVE-26430:
-

Assignee: Weison Wei

> hive-metastore partition_keys add switch
> 
>
> Key: HIVE-26430
> URL: https://issues.apache.org/jira/browse/HIVE-26430
> Project: Hive
>  Issue Type: Improvement
>Reporter: huibo.liu
>Assignee: Weison Wei
>Priority: Critical
>
> After multiple verifications: when Hive drops a table, it performs an 
> associated query on the partitions table regardless of whether the table is 
> partitioned; with a large amount of metadata this puts unnecessary pressure 
> on the metadata database. table.getPartitionKeysSize() can be used to 
> determine whether the associated query on the partitions table is needed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26425) Skip SSL cert verification for downloading JWKS in HS2

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26425?focusedWorklogId=795237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795237
 ]

ASF GitHub Bot logged work on HIVE-26425:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 12:29
Start Date: 26/Jul/22 12:29
Worklog Time Spent: 10m 
  Work Description: hsnusonic commented on code in PR #3473:
URL: https://github.com/apache/hive/pull/3473#discussion_r929902912


##
service/src/java/org/apache/hive/service/auth/jwt/URLBasedJWKSProvider.java:
##
@@ -52,12 +62,42 @@ public URLBasedJWKSProvider(HiveConf conf) throws 
IOException, ParseException {
* Fetches the JWKS and stores into memory. The JWKS are expected to be in 
the standard form as defined here -
* https://datatracker.ietf.org/doc/html/rfc7517#appendix-A.
*/
-  private void loadJWKSets() throws IOException, ParseException {
+  private void loadJWKSets() throws IOException, ParseException, 
GeneralSecurityException {
 String jwksURL = HiveConf.getVar(conf, 
HiveConf.ConfVars.HIVE_SERVER2_AUTHENTICATION_JWT_JWKS_URL);
+if (jwksURL == null || jwksURL.isEmpty()) {
+  throw new IOException("Invalid value of property: " + 
+  HiveConf.ConfVars.HIVE_SERVER2_AUTHENTICATION_JWT_JWKS_URL.varname);
+}
 String[] jwksURLs = jwksURL.split(",");
 for (String urlString : jwksURLs) {
-  URL url = new URL(urlString);
-  jwkSets.add(JWKSet.load(url));
+  SSLContext context = null;
+  if (HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.HIVE_SERVER2_AUTHENTICATION_JWT_JWKS_SKIP_SSL_CERT, false)) {
+context = SSLContext.getInstance("TLS");
+X509TrustManager trustAllManager = new X509TrustManager() {
+  @Override
+  public void checkClientTrusted(X509Certificate[] chain, String 
authType)
+  throws CertificateException {
+  }
+  @Override
+  public void checkServerTrusted(X509Certificate[] chain, String 
authType)
+  throws CertificateException {
+  }
+  @Override
+  public X509Certificate[] getAcceptedIssuers() {
+return new X509Certificate[0];
+  }
+};
+context.init(null, new X509TrustManager[]{trustAllManager}, new 
SecureRandom());
+  }
+  HttpGet get = new HttpGet(urlString);
+  try (CloseableHttpClient httpClient = (context == null) ?

Review Comment:
   Thanks for the suggestion!





Issue Time Tracking
---

Worklog Id: (was: 795237)
Time Spent: 1h 10m  (was: 1h)

> Skip SSL cert verification for downloading JWKS in HS2
> --
>
> Key: HIVE-26425
> URL: https://issues.apache.org/jira/browse/HIVE-26425
> Project: Hive
>  Issue Type: New Feature
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In a dev/test/staging environment, we would probably use a letsencrypt staging 
> certificate for a token generation service. However, its certificate is not 
> accepted by the JVM by default. To ease JWT testing in those kinds of 
> environments, we can introduce a property to disable the certificate 
> verification just for JWKS downloads.
> Ref: https://letsencrypt.org/docs/staging-environment/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
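For reference, the property added in the quoted diff (`hive.server2.authentication.jwt.jwks.skip.ssl.cert`, default `false`) could be enabled like this in `hive-site.xml` — a dev/test-only fragment, never for production, since it disables certificate verification for JWKS downloads:

```xml
<!-- dev/test only: skip SSL cert verification for JWKS downloads (HIVE-26425) -->
<property>
  <name>hive.server2.authentication.jwt.jwks.skip.ssl.cert</name>
  <value>true</value>
</property>
```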


[jira] [Updated] (HIVE-26429) Set default value of hive.txn.xlock.ctas to true and update lineage info for CTAS queries.

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26429:
--
Labels: pull-request-available  (was: )

> Set default value of hive.txn.xlock.ctas to true and update lineage info for 
> CTAS queries.
> --
>
> Key: HIVE-26429
> URL: https://issues.apache.org/jira/browse/HIVE-26429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26429) Set default value of hive.txn.xlock.ctas to true and update lineage info for CTAS queries.

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26429?focusedWorklogId=795235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795235
 ]

ASF GitHub Bot logged work on HIVE-26429:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 12:11
Start Date: 26/Jul/22 12:11
Worklog Time Spent: 10m 
  Work Description: simhadri-g opened a new pull request, #3475:
URL: https://github.com/apache/hive/pull/3475

   …te lineage info for CTAS queries.
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




Issue Time Tracking
---

Worklog Id: (was: 795235)
Remaining Estimate: 0h
Time Spent: 10m

> Set default value of hive.txn.xlock.ctas to true and update lineage info for 
> CTAS queries.
> --
>
> Key: HIVE-26429
> URL: https://issues.apache.org/jira/browse/HIVE-26429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26429) Set default value of hive.txn.xlock.ctas to true and update lineage info for CTAS queries.

2022-07-26 Thread Simhadri G (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simhadri G reassigned HIVE-26429:
-


> Set default value of hive.txn.xlock.ctas to true and update lineage info for 
> CTAS queries.
> --
>
> Key: HIVE-26429
> URL: https://issues.apache.org/jira/browse/HIVE-26429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Simhadri G
>Assignee: Simhadri G
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26213) "hive.limit.pushdown.memory.usage" better not be equal to 1.0, otherwise it will raise an error

2022-07-26 Thread Jingxuan Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingxuan Fu updated HIVE-26213:
---
Issue Type: Improvement  (was: Bug)

> "hive.limit.pushdown.memory.usage" better not be equal to 1.0, otherwise it 
> will raise an error
> ---
>
> Key: HIVE-26213
> URL: https://issues.apache.org/jira/browse/HIVE-26213
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.2
> Environment: Hive 3.1.2
> os.name=Linux
> os.arch=amd64
> os.version=5.4.0-72-generic
> java.version=1.8.0_162
> java.vendor=Oracle Corporation
>Reporter: Jingxuan Fu
>Assignee: Jingxuan Fu
>Priority: Major
>
> In hive-default.xml.template
> {code:java}
> 
>   hive.limit.pushdown.memory.usage
>   0.1
>   
>     Expects value between 0.0f and 1.0f.
>     The fraction of available memory to be used for buffering rows in 
> Reducesink operator for limit pushdown optimization.
>   
> {code}
> Based on the description in hive-default.xml.template, 
> hive.limit.pushdown.memory.usage expects a value between 0.0 and 1.0. Setting 
> hive.limit.pushdown.memory.usage to 1.0 means that all of the available memory 
> may be used for buffering rows for the limit pushdown optimization, and 
> HiveServer2 starts successfully with this setting.
> Then, a program using the Java API establishes a JDBC connection as a client 
> to access Hive, using JDBCDemo as an example.
> {code:java}
> import demo.utils.JDBCUtils;
> public class JDBCDemo{
> public static void main(String[] args) throws Exception
> {   JDBCUtils.init();   JDBCUtils.createDatabase();   
> JDBCUtils.showDatabases();   JDBCUtils.createTable();   
> JDBCUtils.showTables();   JDBCUtils.descTable();   JDBCUtils.loadData();   
> JDBCUtils.selectData();   JDBCUtils.countData();   JDBCUtils.dropDatabase();  
>  JDBCUtils.dropTable();   JDBCUtils.destory(); }
> }
> {code}
> After running the client program, both the client and the hiveserver throw 
> exceptions.
> {code:java}
> 2022-05-09 19:05:36: Starting HiveServer2
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Hive Session ID = 67a6db8d-f957-4d5d-ac18-28403adab7f3
> Hive Session ID = f9f8772c-5765-4c3e-bcff-ca605c667be7
> OK
> OK
> OK
> OK
> OK
> OK
> OK
> Loading data to table default.emp
> OK
> FAILED: SemanticException Invalid memory usage value 1.0 for 
> hive.limit.pushdown.memory.usage{code}
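The boundary mismatch reported above can be illustrated with a small standalone sketch. This is not Hive's actual validation code; it just reproduces the strict upper-bound check implied by the error message, which rejects exactly 1.0 even though the template text says values "between 0.0f and 1.0f" are expected:

```java
public class MemoryUsageCheckSketch {
    // Strict-bounds check mimicking the reported behavior: 1.0 is rejected.
    static void validate(float memUsage) {
        if (memUsage <= 0f || memUsage >= 1f) {
            throw new IllegalArgumentException(
                "Invalid memory usage value " + memUsage
                + " for hive.limit.pushdown.memory.usage");
        }
    }

    public static void main(String[] args) {
        validate(0.1f); // accepted (the default)
        try {
            validate(1.0f); // rejected, matching the reported error
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```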
> {code:java}
> liky@ljq1:~/hive_jdbc_test$ ./startJDBC_0.sh 
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/liky/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.17.1/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/liky/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Running: drop database if exists hive_jdbc_test
> Running: create database hive_jdbc_test
> Running: show databases
> default
> hive_jdbc_test
> Running: drop table if exists emp
> Running: create table emp(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> sal double,
> comm double,
> deptno int
> )
> row format delimited fields terminated by '\t'
> Running: show tables
> emp
> Running: desc emp
> empno   int
> ename   string
> job     string
> mgr     int
> hiredate       string
> sal     double
> comm   double
> deptno int
> Running: load data local inpath '/home/liky/hiveJDBCTestData/data.txt' 
> overwrite into table emp
> Running: select * from emp
> Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: 
> Error while compiling statement: FAILED: SemanticException Invalid memory 
> usage value 1.0 for hive.limit.pushdown.memory.usage
>       at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:380)
>       at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:366)
>       at 
> org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:354)
>       at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:293)
>       at 
> org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:509)
>       at 

[jira] [Updated] (HIVE-26213) "hive.limit.pushdown.memory.usage" better not be equal to 1.0, otherwise it will raise an error

2022-07-26 Thread Jingxuan Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingxuan Fu updated HIVE-26213:
---
Priority: Minor  (was: Major)

> "hive.limit.pushdown.memory.usage" better not be equal to 1.0, otherwise it 
> will raise an error
> ---
>
> Key: HIVE-26213
> URL: https://issues.apache.org/jira/browse/HIVE-26213
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.2
> Environment: Hive 3.1.2
> os.name=Linux
> os.arch=amd64
> os.version=5.4.0-72-generic
> java.version=1.8.0_162
> java.vendor=Oracle Corporation
>Reporter: Jingxuan Fu
>Assignee: Jingxuan Fu
>Priority: Minor
>
> In hive-default.xml.template
> {code:java}
> 
>   hive.limit.pushdown.memory.usage
>   0.1
>   
>     Expects value between 0.0f and 1.0f.
>     The fraction of available memory to be used for buffering rows in 
> Reducesink operator for limit pushdown optimization.
>   
> {code}
> Based on the description in hive-default.xml.template, 
> hive.limit.pushdown.memory.usage expects a value between 0.0 and 1.0. Setting 
> hive.limit.pushdown.memory.usage to 1.0 means that all of the available memory 
> may be used for buffering rows for the limit pushdown optimization, and 
> HiveServer2 starts successfully with this setting.
> Then, a program using the Java API establishes a JDBC connection as a client 
> to access Hive, using JDBCDemo as an example.
> {code:java}
> import demo.utils.JDBCUtils;
> public class JDBCDemo{
> public static void main(String[] args) throws Exception
> {   JDBCUtils.init();   JDBCUtils.createDatabase();   
> JDBCUtils.showDatabases();   JDBCUtils.createTable();   
> JDBCUtils.showTables();   JDBCUtils.descTable();   JDBCUtils.loadData();   
> JDBCUtils.selectData();   JDBCUtils.countData();   JDBCUtils.dropDatabase();  
>  JDBCUtils.dropTable();   JDBCUtils.destory(); }
> }
> {code}
> After running the client program, both the client and the hiveserver throw 
> exceptions.
> {code:java}
> 2022-05-09 19:05:36: Starting HiveServer2
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Hive Session ID = 67a6db8d-f957-4d5d-ac18-28403adab7f3
> Hive Session ID = f9f8772c-5765-4c3e-bcff-ca605c667be7
> OK
> OK
> OK
> OK
> OK
> OK
> OK
> Loading data to table default.emp
> OK
> FAILED: SemanticException Invalid memory usage value 1.0 for 
> hive.limit.pushdown.memory.usage{code}
> {code:java}
> liky@ljq1:~/hive_jdbc_test$ ./startJDBC_0.sh 
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/home/liky/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.17.1/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/home/liky/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Running: drop database if exists hive_jdbc_test
> Running: create database hive_jdbc_test
> Running: show databases
> default
> hive_jdbc_test
> Running: drop table if exists emp
> Running: create table emp(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> sal double,
> comm double,
> deptno int
> )
> row format delimited fields terminated by '\t'
> Running: show tables
> emp
> Running: desc emp
> empno   int
> ename   string
> job     string
> mgr     int
> hiredate       string
> sal     double
> comm   double
> deptno int
> Running: load data local inpath '/home/liky/hiveJDBCTestData/data.txt' 
> overwrite into table emp
> Running: select * from emp
> Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: 
> Error while compiling statement: FAILED: SemanticException Invalid memory 
> usage value 1.0 for hive.limit.pushdown.memory.usage
>       at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:380)
>       at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:366)
>       at 
> org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:354)
>       at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:293)
>       at 
> org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:509)
>       at 

[jira] [Work logged] (HIVE-26425) Skip SSL cert verification for downloading JWKS in HS2

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26425?focusedWorklogId=795168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795168
 ]

ASF GitHub Bot logged work on HIVE-26425:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 07:58
Start Date: 26/Jul/22 07:58
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #3473:
URL: https://github.com/apache/hive/pull/3473#discussion_r929646063


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -4250,6 +4250,9 @@ public static enum ConfVars {
 
HIVE_SERVER2_AUTHENTICATION_JWT_JWKS_URL("hive.server2.authentication.jwt.jwks.url",
 "",
 "URL of the file from where URLBasedJWKSProvider will try to load JWKS 
if JWT is enabled for the\n" +
 "authentication mode."),
+
HIVE_SERVER2_AUTHENTICATION_JWT_JWKS_SKIP_SSL_CERT("hive.server2.authentication.jwt.jwks.skip.ssl.cert",
 false,

Review Comment:
   I see, thanks for the explanation





Issue Time Tracking
---

Worklog Id: (was: 795168)
Time Spent: 1h  (was: 50m)

> Skip SSL cert verification for downloading JWKS in HS2
> --
>
> Key: HIVE-26425
> URL: https://issues.apache.org/jira/browse/HIVE-26425
> Project: Hive
>  Issue Type: New Feature
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In a dev/test/staging environment, we would probably use a letsencrypt staging 
> certificate for a token generation service. However, its certificate is not 
> accepted by the JVM by default. To ease JWT testing in those kinds of 
> environments, we can introduce a property to disable the certificate 
> verification just for JWKS downloads.
> Ref: https://letsencrypt.org/docs/staging-environment/



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26428) Limit usage of LLAP BPWrapper to threads of IO threadpools

2022-07-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26428 started by Ádám Szita.
-
> Limit usage of LLAP BPWrapper to threads of IO threadpools
> --
>
> Key: HIVE-26428
> URL: https://issues.apache.org/jira/browse/HIVE-26428
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> BPWrapper is used in the LRFU cache eviction policy to decrease the time spent 
> waiting for the lock on the heap. This is done by adding a buffer as a 
> threadlocal and accumulating CacheableBuffer instances there before trying to 
> acquire a lock. This works well when we have threads from pools such as 
> IO-Elevator threads or OrcEncode threads.
> For ephemeral threads there's no advantage in doing this, as the buffers in 
> threadlocals may never reach the heap or list structures of LRFU, thereby 
> also making evictions less efficient. This can happen e.g. when 
> LLAPCacheAwareFS is used with Parquet, where we're using the Tez threads for 
> both execution and IO.
> We should disable BPWrappers for such cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
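The thread-local batching idea described above can be sketched as follows — hypothetical names and a toy batch size, not the actual LRFU code. Each thread accumulates entries locally and takes the shared lock only once per batch, which pays off for long-lived IO-pool threads but not for ephemeral threads whose buffers may never be flushed:

```java
import java.util.ArrayList;
import java.util.List;

public class BPWrapperSketch {
    static final int BATCH = 4;
    static final List<String> shared = new ArrayList<>();
    // Per-thread staging buffer, flushed to the shared structure in batches.
    static final ThreadLocal<List<String>> local =
        ThreadLocal.withInitial(ArrayList::new);

    static void offer(String buffer) {
        List<String> buf = local.get();
        buf.add(buffer);
        if (buf.size() >= BATCH) {
            flush(buf);
        }
    }

    static void flush(List<String> buf) {
        synchronized (shared) { // one lock acquisition per batch
            shared.addAll(buf);
        }
        buf.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            offer("buffer-" + i);
        }
        // 10 offers with batch size 4 -> two flushes; 2 entries stay
        // thread-local, never reaching the shared structure (the
        // ephemeral-thread problem described above).
        System.out.println(shared.size()); // prints 8
    }
}
```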


[jira] [Assigned] (HIVE-26428) Limit usage of LLAP BPWrapper to threads of IO threadpools

2022-07-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita reassigned HIVE-26428:
-


> Limit usage of LLAP BPWrapper to threads of IO threadpools
> --
>
> Key: HIVE-26428
> URL: https://issues.apache.org/jira/browse/HIVE-26428
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ádám Szita
>Assignee: Ádám Szita
>Priority: Major
>
> BPWrapper is used in the LRFU cache eviction policy to decrease the time spent 
> waiting for the lock on the heap. This is done by adding a buffer as a 
> threadlocal and accumulating CacheableBuffer instances there before trying to 
> acquire a lock. This works well when we have threads from pools such as 
> IO-Elevator threads or OrcEncode threads.
> For ephemeral threads there's no advantage in doing this, as the buffers in 
> threadlocals may never reach the heap or list structures of LRFU, thereby 
> also making evictions less efficient. This can happen e.g. when 
> LLAPCacheAwareFS is used with Parquet, where we're using the Tez threads for 
> both execution and IO.
> We should disable BPWrappers for such cases.





[jira] [Work logged] (HIVE-26425) Skip SSL cert verification for downloading JWKS in HS2

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26425?focusedWorklogId=795153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795153
 ]

ASF GitHub Bot logged work on HIVE-26425:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 07:22
Start Date: 26/Jul/22 07:22
Worklog Time Spent: 10m 
  Work Description: hsnusonic commented on code in PR #3473:
URL: https://github.com/apache/hive/pull/3473#discussion_r929612792


##
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##
@@ -4250,6 +4250,9 @@ public static enum ConfVars {
 
HIVE_SERVER2_AUTHENTICATION_JWT_JWKS_URL("hive.server2.authentication.jwt.jwks.url",
 "",
 "URL of the file from where URLBasedJWKSProvider will try to load JWKS 
if JWT is enabled for the\n" +
 "authentication mode."),
+
HIVE_SERVER2_AUTHENTICATION_JWT_JWKS_SKIP_SSL_CERT("hive.server2.authentication.jwt.jwks.skip.ssl.cert",
 false,

Review Comment:
   I feel `hive.in.test` is used in unit tests only, and some server behaviors 
are changed by it. Won't `hive.in.test` interfere with other functionalities when we 
spin up a cluster?





Issue Time Tracking
---

Worklog Id: (was: 795153)
Time Spent: 50m  (was: 40m)

> Skip SSL cert verification for downloading JWKS in HS2
> --
>
> Key: HIVE-26425
> URL: https://issues.apache.org/jira/browse/HIVE-26425
> Project: Hive
>  Issue Type: New Feature
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In a dev/test/staging environment, we would probably use letsencrypt staging 
> certificate for a token generation service. However, its certificate is not 
> accepted by JVM by default. To ease JWT testing in those kind of 
> environments, we can introduce a property to disable the certificate 
> verification just for JWKS downloads.
> Ref: https://letsencrypt.org/docs/staging-environment/
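A minimal hive-site.xml sketch of the toggle described above. The property names are taken from the HiveConf entries quoted in the review thread; the JWKS URL is a hypothetical placeholder:

```xml
<property>
  <name>hive.server2.authentication.jwt.jwks.url</name>
  <!-- Hypothetical staging endpoint serving a letsencrypt-staging certificate. -->
  <value>https://jwt-issuer.example.test/jwks.json</value>
</property>
<property>
  <!-- Dev/test only: skip certificate verification for the JWKS download.
       Defaults to false; do not enable in production. -->
  <name>hive.server2.authentication.jwt.jwks.skip.ssl.cert</name>
  <value>true</value>
</property>
```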





[jira] [Resolved] (HIVE-26419) Use a different pool for DataNucleus' secondary connection factory

2022-07-26 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai resolved HIVE-26419.
---
Resolution: Fixed

Thanks [~jfs] and [~dkuzmenko] for reviewing the patch.

> Use a different pool for DataNucleus' secondary connection factory
> --
>
> Key: HIVE-26419
> URL: https://issues.apache.org/jira/browse/HIVE-26419
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Quote from DataNucleus documentation:
> {quote}The secondary connection factory is used for schema generation, and 
> for value generation operations (unless specified to use primary).
> {quote}
> We should not use the same connection pool for DataNucleus' primary and secondary 
> connection factories. In the worst case, each thread holds one 
> connection and requests another connection for value generation, but no 
> connection is available in the pool. It will keep retrying and fail in the 
> end.
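The failure mode can be illustrated with a semaphore standing in for a fixed-size connection pool. This is purely illustrative; `PoolExhaustionDemo` is not Hive or DataNucleus code:

```java
import java.util.concurrent.Semaphore;

// A Semaphore models a JDBC connection pool with a fixed number of connections.
class PoolExhaustionDemo {
    // Can a thread that already holds one connection obtain a second one
    // (e.g. for value generation) without blocking?
    static boolean secondConnectionAvailable(int poolSize, int activeThreads) {
        Semaphore pool = new Semaphore(poolSize);
        // Each active thread holds one connection from the shared pool.
        if (!pool.tryAcquire(activeThreads)) {
            throw new IllegalStateException("pool smaller than active thread count");
        }
        // The value-generation request: fails when the pool is exhausted.
        return pool.tryAcquire();
    }
}
```

Giving the secondary connection factory its own pool removes this circular wait, because the value-generation request no longer competes with the connections the worker threads already hold.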





[jira] [Resolved] (HIVE-26384) Compactor worker should not stop heartbeat for TXN 0

2022-07-26 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai resolved HIVE-26384.
---
Resolution: Fixed

Thanks [~dkuzmenko] for reviewing the patch.

> Compactor worker should not stop heartbeat for TXN 0
> 
>
> Key: HIVE-26384
> URL: https://issues.apache.org/jira/browse/HIVE-26384
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When there is no compaction to execute, the worker still calls stopHeartbeat, 
> which throws an IllegalStateException as below.
> {code:java}
> 2022-07-01T10:18:55,273 ERROR 
> [impala-ec2-centos74-m5-4xlarge-ondemand-09b3.vpc.cloudera.com-44_executor] 
> compactor.Worker: Caught an exception in the main loop of compactor worker 
> impala-ec2-centos74-m5-4xlarge-ondemand-09b3.vpc.cloudera.com-44, 
> java.lang.IllegalStateException: No registered heartbeat found for TXN 0
>         at 
> org.apache.hadoop.hive.ql.txn.compactor.CompactionHeartbeatService.stopHeartbeat(CompactionHeartbeatService.java:108)
>         at 
> org.apache.hadoop.hive.ql.txn.compactor.Worker$CompactionTxn.close(Worker.java:692)
>         at 
> org.apache.hadoop.hive.ql.txn.compactor.Worker.findNextCompactionAndExecute(Worker.java:516)
>         at 
> org.apache.hadoop.hive.ql.txn.compactor.Worker.lambda$run$0(Worker.java:115)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>  {code}
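A minimal sketch of the fix implied by the ticket: treat TXN 0 as "no transaction opened" and skip heartbeat teardown, instead of asking the service to stop a heartbeat that was never registered. All names are illustrative; Hive's `CompactionHeartbeatService` is more involved:

```java
import java.util.Map;

// Illustrative model of the compaction worker's transaction wrapper.
class CompactionTxnSketch {
    static final long NO_TXN = 0L;
    long txnId = NO_TXN; // stays 0 when the worker found no compaction to run

    void close(Map<Long, Object> registeredHeartbeats) {
        if (txnId == NO_TXN) {
            return; // nothing was opened, so there is no heartbeat to stop
        }
        if (registeredHeartbeats.remove(txnId) == null) {
            throw new IllegalStateException(
                "No registered heartbeat found for TXN " + txnId);
        }
    }
}
```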





[jira] [Commented] (HIVE-26408) Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output

2022-07-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-26408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571227#comment-17571227
 ] 

László Bodor commented on HIVE-26408:
-

merged to master, thanks [~ayushtkn] for the review!

> Vectorization: Fix deallocation of scratch columns, don't reuse a child 
> ConstantVectorExpression as an output
> -
>
> Key: HIVE-26408
> URL: https://issues.apache.org/jira/browse/HIVE-26408
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is similar to HIVE-15588. With a customer query, I reproduced a 
> vectorized expression tree like the below one (I'll attach a simple repro 
> query when it's possible):
> {code}
> selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 
> 61:string)(children: StringColumnInList(col 13, values TermDeposit, 
> RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns 
> [61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( 
> _col1 AS DATE)), 'MM-dd-'))(children: VectorUDFUnixTimeStampDate(col 
> 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 
> 61:string, ConstantVectorExpression(val  ) -> 62:string) -> 63:string, 
> ConstantVectorExpression(val ) -> 61:string) -> 62:string
> {code}
> query part was:
> {code}
>   CASE WHEN DLY_BAL.PDELP_VALUE in (
> 'TermDeposit', 'RecurringDeposit',
> 'CertificateOfDeposit'
>   ) THEN NVL(
> (
>   from_unixtime(
> unix_timestamp(
>   cast(DLY_BAL.APATD_MTRTY_DATE as date)
> ),
> 'MM-dd-'
>   )
> ),
> ' '
>   ) ELSE '' END AS MAT_DTE
> {code}
> Here is the problem described:
> 1. IfExprCondExprColumn has 62:string as its 
> [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64],
>  which is a reused scratch column (see 5) )
> 2. in evaluation time, [isRepeating is 
> reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68]
> 3. in order to evaluate IfExprCondExprColumn, the conditional evaluation of 
> children is required, so we go to 
> [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95]
> 4. one of the children is ConstantVectorExpression(val  ) -> 62:string, which 
> belongs to the second branch of VectorCoalesce, so to the '' empty string in 
> NVL's second argument
> 5. in 4), the 62:string column is set to an isRepeating column (and it's released 
> by 
> [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459]),
>  so it's marked as a reusable scratch column
> 6. after the conditional evaluation in 3), the final output of 
> IfExprCondExprColumn is set 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99],
>  but here we get an exception 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
> {code}
> 2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: 
> java.lang.AssertionError: Output column number expected to be 0 when 
> isRepeating
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
>   at 
> 
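The column reuse in steps 1–6 above can be modeled with a toy scratch-column allocator. This is illustrative only (Hive's VectorizationContext also tracks types and ownership): once the constant child's column is released, the next allocation hands out the very same column as the parent's output, so the output aliases a vector still marked isRepeating.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model: scratch columns are handed out from a free list first,
// so a released column is immediately reused by the next allocation.
class ScratchColumnPool {
    private final Deque<Integer> free = new ArrayDeque<>();
    private int next = 61; // first scratch column index in the example tree

    int allocate() {
        return free.isEmpty() ? next++ : free.pop();
    }

    void release(int col) {
        free.push(col);
    }
}
```

Per the ticket title, the fix is twofold: correct the deallocation of scratch columns, and never hand a child ConstantVectorExpression's column back as the parent expression's output.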

[jira] [Updated] (HIVE-26408) Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output

2022-07-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-26408:

Fix Version/s: 4.0.0-alpha-2

> Vectorization: Fix deallocation of scratch columns, don't reuse a child 
> ConstantVectorExpression as an output
> -
>
> Key: HIVE-26408
> URL: https://issues.apache.org/jira/browse/HIVE-26408
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is similar to HIVE-15588. With a customer query, I reproduced a 
> vectorized expression tree like the below one (I'll attach a simple repro 
> query when it's possible):
> {code}
> selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 
> 61:string)(children: StringColumnInList(col 13, values TermDeposit, 
> RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns 
> [61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( 
> _col1 AS DATE)), 'MM-dd-'))(children: VectorUDFUnixTimeStampDate(col 
> 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 
> 61:string, ConstantVectorExpression(val  ) -> 62:string) -> 63:string, 
> ConstantVectorExpression(val ) -> 61:string) -> 62:string
> {code}
> query part was:
> {code}
>   CASE WHEN DLY_BAL.PDELP_VALUE in (
> 'TermDeposit', 'RecurringDeposit',
> 'CertificateOfDeposit'
>   ) THEN NVL(
> (
>   from_unixtime(
> unix_timestamp(
>   cast(DLY_BAL.APATD_MTRTY_DATE as date)
> ),
> 'MM-dd-'
>   )
> ),
> ' '
>   ) ELSE '' END AS MAT_DTE
> {code}
> Here is the problem described:
> 1. IfExprCondExprColumn has 62:string as its 
> [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64],
>  which is a reused scratch column (see 5) )
> 2. in evaluation time, [isRepeating is 
> reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68]
> 3. in order to evaluate IfExprCondExprColumn, the conditional evaluation of 
> children is required, so we go to 
> [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95]
> 4. one of the children is ConstantVectorExpression(val  ) -> 62:string, which 
> belongs to the second branch of VectorCoalesce, so to the '' empty string in 
> NVL's second argument
> 5. in 4), the 62:string column is set to an isRepeating column (and it's released 
> by 
> [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459]),
>  so it's marked as a reusable scratch column
> 6. after the conditional evaluation in 3), the final output of 
> IfExprCondExprColumn is set 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99],
>  but here we get an exception 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
> {code}
> 2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: 
> java.lang.AssertionError: Output column number expected to be 0 when 
> isRepeating
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
> 

[jira] [Resolved] (HIVE-26408) Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output

2022-07-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-26408.
-
Resolution: Fixed

> Vectorization: Fix deallocation of scratch columns, don't reuse a child 
> ConstantVectorExpression as an output
> -
>
> Key: HIVE-26408
> URL: https://issues.apache.org/jira/browse/HIVE-26408
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is similar to HIVE-15588. With a customer query, I reproduced a 
> vectorized expression tree like the below one (I'll attach a simple repro 
> query when it's possible):
> {code}
> selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 
> 61:string)(children: StringColumnInList(col 13, values TermDeposit, 
> RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns 
> [61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( 
> _col1 AS DATE)), 'MM-dd-'))(children: VectorUDFUnixTimeStampDate(col 
> 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 
> 61:string, ConstantVectorExpression(val  ) -> 62:string) -> 63:string, 
> ConstantVectorExpression(val ) -> 61:string) -> 62:string
> {code}
> query part was:
> {code}
>   CASE WHEN DLY_BAL.PDELP_VALUE in (
> 'TermDeposit', 'RecurringDeposit',
> 'CertificateOfDeposit'
>   ) THEN NVL(
> (
>   from_unixtime(
> unix_timestamp(
>   cast(DLY_BAL.APATD_MTRTY_DATE as date)
> ),
> 'MM-dd-'
>   )
> ),
> ' '
>   ) ELSE '' END AS MAT_DTE
> {code}
> Here is the problem described:
> 1. IfExprCondExprColumn has 62:string as its 
> [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64],
>  which is a reused scratch column (see 5) )
> 2. in evaluation time, [isRepeating is 
> reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68]
> 3. in order to evaluate IfExprCondExprColumn, the conditional evaluation of 
> children is required, so we go to 
> [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95]
> 4. one of the children is ConstantVectorExpression(val  ) -> 62:string, which 
> belongs to the second branch of VectorCoalesce, so to the '' empty string in 
> NVL's second argument
> 5. in 4), the 62:string column is set to an isRepeating column (and it's released 
> by 
> [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459]),
>  so it's marked as a reusable scratch column
> 6. after the conditional evaluation in 3), the final output of 
> IfExprCondExprColumn is set 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99],
>  but here we get an exception 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
> {code}
> 2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: 
> java.lang.AssertionError: Output column number expected to be 0 when 
> isRepeating
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinInnerBigOnlyStringOperator.processBatch(VectorMapJoinInnerBigOnlyStringOperator.java:371)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinCommonOperator.process(VectorMapJoinCommonOperator.java:839)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
> {code}
> this 

[jira] [Work logged] (HIVE-26408) Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26408?focusedWorklogId=795141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795141
 ]

ASF GitHub Bot logged work on HIVE-26408:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 06:40
Start Date: 26/Jul/22 06:40
Worklog Time Spent: 10m 
  Work Description: abstractdog merged PR #3452:
URL: https://github.com/apache/hive/pull/3452




Issue Time Tracking
---

Worklog Id: (was: 795141)
Time Spent: 20m  (was: 10m)

> Vectorization: Fix deallocation of scratch columns, don't reuse a child 
> ConstantVectorExpression as an output
> -
>
> Key: HIVE-26408
> URL: https://issues.apache.org/jira/browse/HIVE-26408
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This is similar to HIVE-15588. With a customer query, I reproduced a 
> vectorized expression tree like the below one (I'll attach a simple repro 
> query when it's possible):
> {code}
> selectExpressions: IfExprCondExprColumn(col 67:boolean, col 63:string, col 
> 61:string)(children: StringColumnInList(col 13, values TermDeposit, 
> RecurringDeposit, CertificateOfDeposit) -> 67:boolean, VectorCoalesce(columns 
> [61, 62])(children: VectorUDFAdaptor(from_unixtime(to_unix_timestamp(CAST( 
> _col1 AS DATE)), 'MM-dd-'))(children: VectorUDFUnixTimeStampDate(col 
> 68)(children: CastStringToDate(col 33:string) -> 68:date) -> 69:bigint) -> 
> 61:string, ConstantVectorExpression(val  ) -> 62:string) -> 63:string, 
> ConstantVectorExpression(val ) -> 61:string) -> 62:string
> {code}
> query part was:
> {code}
>   CASE WHEN DLY_BAL.PDELP_VALUE in (
> 'TermDeposit', 'RecurringDeposit',
> 'CertificateOfDeposit'
>   ) THEN NVL(
> (
>   from_unixtime(
> unix_timestamp(
>   cast(DLY_BAL.APATD_MTRTY_DATE as date)
> ),
> 'MM-dd-'
>   )
> ),
> ' '
>   ) ELSE '' END AS MAT_DTE
> {code}
> Here is the problem described:
> 1. IfExprCondExprColumn has 62:string as its 
> [outputColumn|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L64],
>  which is a reused scratch column (see 5) )
> 2. in evaluation time, [isRepeating is 
> reset|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L68]
> 3. in order to evaluate IfExprCondExprColumn, the conditional evaluation of 
> children is required, so we go to 
> [conditionalEvaluate|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L95]
> 4. one of the children is ConstantVectorExpression(val  ) -> 62:string, which 
> belongs to the second branch of VectorCoalesce, so to the '' empty string in 
> NVL's second argument
> 5. in 4), the 62:string column is set to an isRepeating column (and it's released 
> by 
> [freeNonColumns|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java#L2459]),
>  so it's marked as a reusable scratch column
> 6. after the conditional evaluation in 3), the final output of 
> IfExprCondExprColumn is set 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/IfExprCondExprColumn.java#L99],
>  but here we get an exception 
> [here|https://github.com/apache/hive/blob/d3309c0ea9da907af4d27427805084b7331a6c24/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java#L484]:
> {code}
> 2022-07-01T04:26:24,567 ERROR [TezTR-745267_1_35_6_0_0] tez.MapRecordSource: 
> java.lang.AssertionError: Output column number expected to be 0 when 
> isRepeating
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setElement(BytesColumnVector.java:494)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.IfExprCondExprColumn.evaluate(IfExprCondExprColumn.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:146)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.VectorMapJoinGenerateResultOperator.forwardBigTableBatch(VectorMapJoinGenerateResultOperator.java:694)
>   at 
> 

[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=795137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795137
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 06:33
Start Date: 26/Jul/22 06:33
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r929574342


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java:
##
@@ -161,7 +161,7 @@ void replTableWriteIdState(String validWriteIdList, String 
dbName, String tableN
* @throws LockException if there is no current transaction or the
* transaction has already been committed or aborted.
*/
-  void rollbackTxn() throws LockException;
+  void rollbackTxn(Context ctx) throws LockException;

Review Comment:
   Added `void rollbackTxn(Context ctx)` and retained the old signature. Done. 
   
   However, most call sites will use `void rollbackTxn(Context ctx)` from now 
on.





Issue Time Tracking
---

Worklog Id: (was: 795137)
Time Spent: 5.5h  (was: 5h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table, 
> the data remains in the directory and is currently not cleaned up by the 
> cleaner or any other mechanism. This is because the cleaner 
> requires a table corresponding to what it is cleaning. To handle such a 
> situation, we can pass the relevant information directly to the cleaner so 
> that such uncommitted data is deleted.
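A minimal sketch of the approach under discussion. Names like `cleanerQueue` and the plain String location are illustrative; the actual PR threads a `Context` through `rollbackTxn` while retaining the old signature:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative txn manager: rolling back can hand the cleaner the
// uncommitted CTAS location directly, without a table entry to look up.
class TxnManagerSketch {
    final List<String> cleanerQueue = new ArrayList<>();

    void rollbackTxn() {
        rollbackTxn(null); // old signature retained, delegates to the new one
    }

    void rollbackTxn(String ctasLocation) {
        // ... abort the transaction here ...
        if (ctasLocation != null) {
            cleanerQueue.add(ctasLocation); // cleaner later deletes the data
        }
    }
}
```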


