[jira] [Work logged] (HIVE-25986) Statement id is incorrect in case of load in path to MM table

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25986?focusedWorklogId=733482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733482
 ]

ASF GitHub Bot logged work on HIVE-25986:
-

Author: ASF GitHub Bot
Created on: 26/Feb/22 07:04
Start Date: 26/Feb/22 07:04
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3055:
URL: https://github.com/apache/hive/pull/3055#discussion_r815279139



##
File path: ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
##
@@ -226,7 +226,7 @@ public Integer getStatementIdForAcidWriteType(long writeId, 
String moveTaskId, A
 if (result != null) {
   return result.getStatementId();
 } else {
-  return -1;
+  return 0;

Review comment:
   Please add some comments here 
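   For reference, a hedged sketch of the kind of comment that could go here (illustrative only; it assumes the usual ACID delta directory naming `delta_<writeId>_<writeId>_<stmtId>`, and the exact wording is up to the patch author):
   
   ```java
   // Sketch, not the actual patch: explanatory comment for the new default value.
   if (result != null) {
     return result.getStatementId();
   } else {
     // No matching FileSinkDesc was found for this write. Default to statement id 0
     // (the first statement) instead of -1, so the write still lands in a
     // well-formed delta directory such as delta_0000001_0000001_0000; a statement
     // id of -1 would typically drop the statement-id suffix from the directory
     // name, which is not what MM readers expect.
     return 0;
   }
   ```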




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 733482)
Time Spent: 0.5h  (was: 20m)

> Statement id is incorrect in case of load in path to MM table
> -
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: ACID, pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25986) Statement id is incorrect in case of load in path to MM table

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25986?focusedWorklogId=733480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733480
 ]

ASF GitHub Bot logged work on HIVE-25986:
-

Author: ASF GitHub Bot
Created on: 26/Feb/22 07:03
Start Date: 26/Feb/22 07:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #3055:
URL: https://github.com/apache/hive/pull/3055#issuecomment-1051721851


   Please add a test case which checks the dir name 
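   A minimal sketch of such a test (hypothetical code, not from the PR; it assumes the table location is known to the test and that the MM load should produce a delta directory named `delta_<writeId>_<writeId>_<stmtId>`):
   
   ```java
   // Hypothetical JUnit-style check, not part of the actual patch.
   import java.io.File;
   import java.util.Arrays;
   import static org.junit.Assert.assertTrue;
   
   public class LoadInPathDeltaDirCheck {
     static void assertDeltaDirCarriesStatementId(String tableLocation) {
       File[] children = new File(tableLocation).listFiles();
       // Expect at least one delta dir with a statement-id suffix,
       // e.g. delta_0000001_0000001_0000 rather than delta_0000001_0000001.
       boolean found = children != null && Arrays.stream(children)
           .anyMatch(f -> f.getName().matches("delta_\\d+_\\d+_\\d+"));
       assertTrue("expected a delta dir with a statement id under " + tableLocation, found);
     }
   }
   ```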


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 733480)
Time Spent: 20m  (was: 10m)

> Statement id is incorrect in case of load in path to MM table
> -
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: ACID, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25750) Beeline: Creating a standalone tarball by isolating dependencies

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25750?focusedWorklogId=733427&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733427
 ]

ASF GitHub Bot logged work on HIVE-25750:
-

Author: ASF GitHub Bot
Created on: 26/Feb/22 00:23
Start Date: 26/Feb/22 00:23
Worklog Time Spent: 10m 
  Work Description: achennagiri commented on pull request #3043:
URL: https://github.com/apache/hive/pull/3043#issuecomment-1051392895


   retest


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 733427)
Time Spent: 3h 20m  (was: 3h 10m)

> Beeline: Creating a standalone tarball by isolating dependencies
> 
>
> Key: HIVE-25750
> URL: https://issues.apache.org/jira/browse/HIVE-25750
> Project: Hive
>  Issue Type: Bug
>Reporter: Abhay
>Assignee: Abhay
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The code to create a standalone beeline tarball was created as part of 
> https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was reported 
> when beeline is installed without Hadoop being installed: the beeline script 
> complains of missing dependencies when it is run.
> The ask as part of this ticket is to fix that bug.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25750) Beeline: Creating a standalone tarball by isolating dependencies

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25750?focusedWorklogId=733360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733360
 ]

ASF GitHub Bot logged work on HIVE-25750:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 21:36
Start Date: 25/Feb/22 21:36
Worklog Time Spent: 10m 
  Work Description: achennagiri commented on pull request #3043:
URL: https://github.com/apache/hive/pull/3043#issuecomment-1051288787


   retest


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 733360)
Time Spent: 3h 10m  (was: 3h)

> Beeline: Creating a standalone tarball by isolating dependencies
> 
>
> Key: HIVE-25750
> URL: https://issues.apache.org/jira/browse/HIVE-25750
> Project: Hive
>  Issue Type: Bug
>Reporter: Abhay
>Assignee: Abhay
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The code to create a standalone beeline tarball was created as part of 
> https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was reported 
> when beeline is installed without Hadoop being installed: the beeline script 
> complains of missing dependencies when it is run.
> The ask as part of this ticket is to fix that bug.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25750) Beeline: Creating a standalone tarball by isolating dependencies

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25750?focusedWorklogId=733354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733354
 ]

ASF GitHub Bot logged work on HIVE-25750:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 21:28
Start Date: 25/Feb/22 21:28
Worklog Time Spent: 10m 
  Work Description: achennagiri opened a new pull request #3043:
URL: https://github.com/apache/hive/pull/3043


   The code to create a standalone beeline tarball was created as part of 
https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was reported 
when beeline is installed without Hadoop being installed: the beeline script 
complains of missing dependencies when it is run. 
   
   Update:
   Was running into the below error with the file mode on in Beeline
   
   ```
   Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.JobConf
   at 
java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
   at 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
   ```
   Added a fix to resolve this.
   
   ### What changes were proposed in this pull request?
   The beeline script can be run with/without hadoop installed. All the 
required dependencies are bundled into a single downloadable tar file. 
   `mvn clean package install -Pdist -Pitests -DskipTests -Denforcer.skip=true` 
generates something along the lines of 
   **apache-hive-beeline-4.0.0-SNAPSHOT.tar.gz** in the **packaging/target** 
folder.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   
   Created a docker container using the command
   `sudo docker run  --rm -it -v /Users/achennagiri/Downloads:/container --user 
root docker-private.infra.cloudera.com/cloudera_base/ubi8/python-38:1-68 
/bin/bash`
   
   Need to install Java in the container: `yum install -y java-11-openjdk`. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 733354)
Time Spent: 3h  (was: 2h 50m)

> Beeline: Creating a standalone tarball by isolating dependencies
> 
>
> Key: HIVE-25750
> URL: https://issues.apache.org/jira/browse/HIVE-25750
> Project: Hive
>  Issue Type: Bug
>Reporter: Abhay
>Assignee: Abhay
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The code to create a standalone beeline tarball was created as part of this 
> ticket https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was 
> reported in the case when the beeline is tried to install without the hadoop 
> installed. 
> The beeline script complains of missing dependencies when it is run.
> The ask as part of this ticket is to fix that bug. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25988:
--
Labels: pull-request-available  (was: )

> CreateTableEvent should have database object as one of the hive privilege 
> object.
> -
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the 
> HivePrivilegeObjects so that it is consistent with HS2's CreateTable event.
> Also, we need to move the DFS_URI object into the InputList so that this is 
> also consistent with HS2's behavior.
> Having the database object among the CreateTableEvent's HivePrivilegeObjects 
> helps determine whether a user has the right permissions to create a table in 
> a particular database via Ranger/Sentry.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25988?focusedWorklogId=733250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733250
 ]

ASF GitHub Bot logged work on HIVE-25988:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 18:47
Start Date: 25/Feb/22 18:47
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request #3057:
URL: https://github.com/apache/hive/pull/3057


   …e hive privilege object
   
   
   
   ### What changes were proposed in this pull request?
   Included Database object in the HivePrivilegeObjects for the CreateTableEvent
   
   
   
   ### Why are the changes needed?
   Ranger/sentry can use this information to evaluate if the user has the right 
permissions to create a table or not.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Local machine, Remote cluster
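   
   For illustration, a rough sketch of what adding the database to the event's privilege objects can look like (assumptions: the `HivePrivilegeObject` class and its `(type, dbName, objectName)` constructor from `org.apache.hadoop.hive.ql.security.authorization.plugin`; this is not the actual patch, only the general shape):
   
   ```java
   // Illustrative only -- not the HIVE-25988 change itself.
   import java.util.ArrayList;
   import java.util.List;
   import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject;
   import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject.HivePrivilegeObjectType;
   
   public class CreateTablePrivObjectsSketch {
     // Output objects for a CREATE TABLE authorization check: the target
     // database plus the table being created.
     static List<HivePrivilegeObject> outputs(String dbName, String tableName) {
       List<HivePrivilegeObject> outputs = new ArrayList<>();
       outputs.add(new HivePrivilegeObject(HivePrivilegeObjectType.DATABASE, dbName, dbName));
       outputs.add(new HivePrivilegeObject(HivePrivilegeObjectType.TABLE_OR_VIEW, dbName, tableName));
       return outputs;
     }
   }
   ```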
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 733250)
Remaining Estimate: 0h
Time Spent: 10m

> CreateTableEvent should have database object as one of the hive privilege 
> object.
> -
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the 
> HivePrivilegeObjects so that it is consistent with HS2's CreateTable event.
> Also, we need to move the DFS_URI object into the InputList so that this is 
> also consistent with HS2's behavior.
> Having the database object among the CreateTableEvent's HivePrivilegeObjects 
> helps determine whether a user has the right permissions to create a table in 
> a particular database via Ranger/Sentry.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.

2022-02-25 Thread Sai Hemanth Gantasala (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sai Hemanth Gantasala reassigned HIVE-25988:



> CreateTableEvent should have database object as one of the hive privilege 
> object.
> -
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>
> The CreateTableEvent in HMS should have a database object as one of the 
> HivePrivilegeObjects so that it is consistent with HS2's CreateTable event.
> Also, we need to move the DFS_URI object into the InputList so that this is 
> also consistent with HS2's behavior.
> Having the database object among the CreateTableEvent's HivePrivilegeObjects 
> helps determine whether a user has the right permissions to create a table in 
> a particular database via Ranger/Sentry.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25970) Missing messages in HS2 operation logs

2022-02-25 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-25970.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/d3cd596aa15ebedd58f99628d43a03eb2f5f3909. 
Thanks for the review [~kgyrtkirk]!

> Missing messages in HS2 operation logs
> --
>
> Key: HIVE-25970
> URL: https://issues.apache.org/jira/browse/HIVE-25970
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After HIVE-22753 & HIVE-24590, with some unlucky timing of events, operation 
> log messages can get lost and never appear in the appropriate files.
> The changes in HIVE-22753 will prevent a {{HushableRandomAccessFileAppender}} 
> from being created if the latter refers to a file that has been closed in the 
> last second. Preventing the creation of the appender also means that the 
> message which triggered the creation will be lost forever. In fact any 
> message (for the same query) that comes in the interval of 1 second will be 
> lost forever.
> Before HIVE-24590 the appender/file was closed only once (explicitly by HS2) 
> and thus the problem may be very hard to notice in practice. However, with 
> the arrival of HIVE-24590 appenders may close much more frequently (and not 
> via HS2) making the issue reproducible rather easily. It suffices to set 
> _hive.server2.operation.log.purgePolicy.timeToLive_ property very low and 
> check the operation logs.
> The problem was discovered by investigating some intermittent failures in 
> operation logging tests (e.g.,  TestOperationLoggingAPIWithTez).
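
A schematic of the race described above, for readers who want the shape of it without digging into the appender code (this is not Hive's actual {{HushableRandomAccessFileAppender}} implementation, only an illustration of the guard that ends up dropping messages):

{code:java}
// Schematic only -- not Hive's appender code.
public class AppenderCreationGuardSketch {
  private long lastClosedAtMillis; // when the appender for this log file was last closed

  Object maybeCreateAppender(String logFile, String message) {
    long now = System.currentTimeMillis();
    if (now - lastClosedAtMillis < 1000L) {
      // HIVE-22753 behaviour: creation is refused for a file closed within the
      // last second, so `message` (and anything else arriving in that window)
      // is silently dropped -- the missing-operation-log symptom described above.
      return null;
    }
    return new Object(); // stand-in for creating the real file appender
  }
}
{code}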



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25970) Missing messages in HS2 operation logs

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25970?focusedWorklogId=733189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733189
 ]

ASF GitHub Bot logged work on HIVE-25970:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 17:30
Start Date: 25/Feb/22 17:30
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #3048:
URL: https://github.com/apache/hive/pull/3048


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 733189)
Time Spent: 40m  (was: 0.5h)

> Missing messages in HS2 operation logs
> --
>
> Key: HIVE-25970
> URL: https://issues.apache.org/jira/browse/HIVE-25970
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After HIVE-22753 & HIVE-24590, with some unlucky timing of events, operation 
> log messages can get lost and never appear in the appropriate files.
> The changes in HIVE-22753 will prevent a {{HushableRandomAccessFileAppender}} 
> from being created if the latter refers to a file that has been closed in the 
> last second. Preventing the creation of the appender also means that the 
> message which triggered the creation will be lost forever. In fact any 
> message (for the same query) that comes in the interval of 1 second will be 
> lost forever.
> Before HIVE-24590 the appender/file was closed only once (explicitly by HS2) 
> and thus the problem may be very hard to notice in practice. However, with 
> the arrival of HIVE-24590 appenders may close much more frequently (and not 
> via HS2) making the issue reproducible rather easily. It suffices to set 
> _hive.server2.operation.log.purgePolicy.timeToLive_ property very low and 
> check the operation logs.
> The problem was discovered by investigating some intermittent failures in 
> operation logging tests (e.g.,  TestOperationLoggingAPIWithTez).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25896) Remove getThreadId from IHMSHandler

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25896?focusedWorklogId=733114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733114
 ]

ASF GitHub Bot logged work on HIVE-25896:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 15:34
Start Date: 25/Feb/22 15:34
Worklog Time Spent: 10m 
  Work Description: klcopp commented on pull request #3017:
URL: https://github.com/apache/hive/pull/3017#issuecomment-1050957427


   LGTM. Compaction does log/store the thread ids, but it gets those directly 
from Thread#getId.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 733114)
Time Spent: 1h 20m  (was: 1h 10m)

> Remove getThreadId from IHMSHandler
> ---
>
> Key: HIVE-25896
> URL: https://issues.apache.org/jira/browse/HIVE-25896
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In IHMSHandler, which is annotated as 'InterfaceAudience.Private', we currently 
> use getThreadId to log the thread information. The threadId can be logged 
> automatically if we configure the logger properly, so the method can be removed 
> for better maintainability of IHMSHandler.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25986) Statement id is incorrect in case of load in path to MM table

2022-02-25 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-25986:
---
Status: Patch Available  (was: Open)

> Statement id is incorrect in case of load in path to MM table
> -
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: ACID, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25986) Statement id is incorrect in case of load in path to MM table

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25986:
--
Labels: ACID pull-request-available  (was: ACID)

> Statement id is incorrect in case of load in path to MM table
> -
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: ACID, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25986) Statement id is incorrect in case of load in path to MM table

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25986?focusedWorklogId=733107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733107
 ]

ASF GitHub Bot logged work on HIVE-25986:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 15:20
Start Date: 25/Feb/22 15:20
Worklog Time Spent: 10m 
  Work Description: asinkovits opened a new pull request #3055:
URL: https://github.com/apache/hive/pull/3055


   
   
   ### What changes were proposed in this pull request?
   
   The statement id is incorrect if the table is an insert-only (MM) ACID table 
and load inpath is used to load the data.
   
   ### Why are the changes needed?
   
   the format of the delta directory is incorrect because of the wrong 
statement id
   
   ### Does this PR introduce _any_ user-facing change?
   
   no
   
   
   ### How was this patch tested?
   
   manual testing


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 733107)
Remaining Estimate: 0h
Time Spent: 10m

> Statement id is incorrect in case of load in path to MM table
> -
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: ACID
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25986) Statement id is incorrect in case of load in path to MM table

2022-02-25 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-25986:
---
Summary: Statement id is incorrect in case of load in path to MM table  
(was: statement id is incorrect in case of load in path to MM table)

> Statement id is incorrect in case of load in path to MM table
> -
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: ACID
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25986) statement id is incorrect in case of load in path to MM table

2022-02-25 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-25986:
---
Summary: statement id is incorrect in case of load in path to MM table  
(was: statement id in incorrect in case of load in path to MM table)

> statement id is incorrect in case of load in path to MM table
> -
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: ACID
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25986) statement id in incorrect in case of load in path to MM table

2022-02-25 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-25986:
---
Labels: ACID  (was: )

> statement id in incorrect in case of load in path to MM table
> -
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: ACID
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25986) statement id in incorrect in case of load in path to MM table

2022-02-25 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits updated HIVE-25986:
---
Affects Version/s: 4.0.0

> statement id in incorrect in case of load in path to MM table
> -
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HIVE-25986) statement id in incorrect in case of load in path to MM table

2022-02-25 Thread Antal Sinkovits (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Sinkovits reassigned HIVE-25986:
--


> statement id in incorrect in case of load in path to MM table
> -
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25985) Estimate stats gives out incorrect number of columns during query planning when using predicates like c=22

2022-02-25 Thread Sindhu Subhas (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sindhu Subhas updated HIVE-25985:
-
Summary: Estimate stats gives out incorrect number of columns during query 
planning when using predicates like c=22  (was: Estimate stats gives out 
incorrect number of columns when using predicates like c=22)

> Estimate stats gives out incorrect number of columns during query planning 
> when using predicates like c=22
> --
>
> Key: HIVE-25985
> URL: https://issues.apache.org/jira/browse/HIVE-25985
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
> Environment: Hive 3
>Reporter: Sindhu Subhas
>Priority: Major
>
> Table type: External 
> Stats: No stats collected.
> A filter operator appeared in the plan and the row estimates went bad. I 
> changed the original query on the table, modifying the filter predicate form, 
> with the following results:
>  
> |*predicate form*|*optimised as*|*filter Op rows out*|*estimate quality*|
> |prd_i_tmp.type = '22'|predicate:(type = '22')|Filter Operator [FIL_12] (rows=5 width=3707)|bad|
> |prd_i_tmp.type in ('22')|predicate:(type = '22')|Filter Operator [FIL_12] (rows=5 width=3707)|bad|
> |prd_i_tmp.type < '23' and prd_i_tmp.type > '21'|predicate:((type < '23') and (type > '21'))|Filter Operator [FIL_12] (rows=8706269 width=3707)|good|
> |prd_i_tmp.type like '22'|predicate:(type like '22')|Filter Operator [FIL_12] (rows=39178213 width=3707)|best|
> |prd_i_tmp.type in ('22','AA','BB')|predicate:(type) IN ('22', 'AA', 'BB')|Filter Operator [FIL_12] (rows=15 width=3707)|bad|
> |prd_i_tmp.type rlike '22'|predicate:type regexp '22'|Filter Operator [FIL_12] (rows=39178213 width=3707)|good|



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HIVE-25979) Order of Lineage is flaky in qtest output

2022-02-25 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-25979.
---
Resolution: Fixed

Pushed to master. Thanks [~kgyrtkirk] for review.

[~ayushtkn] this patch should fix {{stats_part_multi_insert_acid}} flakyness.

> Order of Lineage is flaky in qtest output
> -
>
> Key: HIVE-25979
> URL: https://issues.apache.org/jira/browse/HIVE-25979
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When running
> {code:java}
> mvn test -Dtest=TestMiniLlapLocalCliDriver 
> -Dqfile=stats_part_multi_insert_acid.q -pl itests/qtest -Pitests
> {code}
> The lineage output of statement:
> {code:java}
> from source
> insert into stats_part select key, value, p
> insert into stats_part select key, value, p
> {code}
> is expected to be
> {code:java}
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> {code}
> but sometimes it is
> {code:java}
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25979) Order of Lineage is flaky in qtest output

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25979?focusedWorklogId=733039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733039
 ]

ASF GitHub Bot logged work on HIVE-25979:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 13:10
Start Date: 25/Feb/22 13:10
Worklog Time Spent: 10m 
  Work Description: kasakrisz merged pull request #3050:
URL: https://github.com/apache/hive/pull/3050


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 733039)
Time Spent: 20m  (was: 10m)

> Order of Lineage is flaky in qtest output
> -
>
> Key: HIVE-25979
> URL: https://issues.apache.org/jira/browse/HIVE-25979
> Project: Hive
>  Issue Type: Bug
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When running
> {code:java}
> mvn test -Dtest=TestMiniLlapLocalCliDriver 
> -Dqfile=stats_part_multi_insert_acid.q -pl itests/qtest -Pitests
> {code}
> The lineage output of statement:
> {code:java}
> from source
> insert into stats_part select key, value, p
> insert into stats_part select key, value, p
> {code}
> is expected to be
> {code:java}
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> {code}
> but sometimes it is
> {code:java}
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE 
> [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE 
> [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25984) TTTT

2022-02-25 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25984:
---
Summary:   (was: when set hive.auto.convert.join=true; and set 
hive.exec.parallel=true; in the case cause error)

> 
> 
>
> Key: HIVE-25984
> URL: https://issues.apache.org/jira/browse/HIVE-25984
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0, 3.1.1, 3.1.2
>Reporter: lkl
>Assignee: lkl
>Priority: Major
> Fix For: All Versions
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25984) TTTT

2022-02-25 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25984:
---
  Component/s: (was: Hive)
Fix Version/s: (was: All Versions)
Affects Version/s: (was: 3.0.0)
   (was: 3.1.1)
   (was: 3.1.2)
   Issue Type: Test  (was: Improvement)
 Priority: Trivial  (was: Major)

> 
> 
>
> Key: HIVE-25984
> URL: https://issues.apache.org/jira/browse/HIVE-25984
> Project: Hive
>  Issue Type: Test
>Reporter: lkl
>Assignee: lkl
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] (HIVE-25984) TTTT

2022-02-25 Thread lkl (Jira)


[ https://issues.apache.org/jira/browse/HIVE-25984 ]


lkl deleted comment on HIVE-25984:


was (Author: JIRAUSER284773):
set hive.auto.convert.join=false;

set hive.exec.parallel=true;

 

Changing the param value makes the query run successfully.

> 
> 
>
> Key: HIVE-25984
> URL: https://issues.apache.org/jira/browse/HIVE-25984
> Project: Hive
>  Issue Type: Test
>Reporter: lkl
>Assignee: lkl
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25984) when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the case cause error

2022-02-25 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25984:
---
Fix Version/s: All Versions
  Description: (was: {code:java}
    > set hive.exec.parallel=true;
hive> set hive.exec.parallel.thread.number=16;
Query ID = hadoop_20220225202936_1afb51d0-ce67-4bc2-9794-8c82b32efe99
Total jobs = 11
Launching Job 1 out of 11
Launching Job 2 out of 11
Launching Job 3 out of 11
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
Number of reduce tasks not specified. Estimated from input data size: 1
  set hive.exec.reducers.max=
In order to change the average load for a reducer (in bytes):
In order to set a constant number of reducers:
  set hive.exec.reducers.bytes.per.reducer=
  set mapreduce.job.reduces=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Launching Job 4 out of 11
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1645755235953_36462, Tracking URL = 
http://172.21.126.228:5004/proxy/application_1645755235953_36462/
Kill Command = /usr/local/service/hadoop/bin/mapred job  -kill 
job_1645755235953_36462
Starting Job = job_1645755235953_36460, Tracking URL = 
http://172.21.126.228:5004/proxy/application_1645755235953_36460/
Starting Job = job_1645755235953_36463, Tracking URL = 
http://172.21.126.228:5004/proxy/application_1645755235953_36463/
Kill Command = /usr/local/service/hadoop/bin/mapred job  -kill 
job_1645755235953_36460
Kill Command = /usr/local/service/hadoop/bin/mapred job  -kill 
job_1645755235953_36463
Starting Job = job_1645755235953_36461, Tracking URL = 
http://172.21.126.228:5004/proxy/application_1645755235953_36461/
Kill Command = /usr/local/service/hadoop/bin/mapred job  -kill 
job_1645755235953_36461
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2022-02-25 20:29:43,598 Stage-3 map = 0%,  reduce = 0%
Hadoop job information for Stage-9: number of mappers: 1; number of reducers: 1
2022-02-25 20:29:43,634 Stage-9 map = 0%,  reduce = 0%
Hadoop job information for Stage-7: number of mappers: 1; number of reducers: 1
2022-02-25 20:29:43,658 Stage-7 map = 0%,  reduce = 0%
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-02-25 20:29:44,646 Stage-1 map = 0%,  reduce = 0%
2022-02-25 20:29:51,767 Stage-9 map = 100%,  reduce = 0%, Cumulative CPU 5.29 
sec
2022-02-25 20:29:51,782 Stage-7 map = 100%,  reduce = 0%, Cumulative CPU 5.45 
sec
2022-02-25 20:29:52,750 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.06 
sec
2022-02-25 20:29:54,835 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.76 
sec
2022-02-25 20:29:58,872 Stage-9 map = 100%,  reduce = 100%, Cumulative CPU 7.49 
sec
2022-02-25 20:29:58,883 Stage-7 map = 100%,  reduce = 100%, Cumulative CPU 8.86 
sec
2022-02-25 20:29:59,868 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 9.96 
sec
MapReduce Total cumulative CPU time: 7 seconds 490 msec
Ended Job = job_1645755235953_36463
MapReduce Total cumulative CPU time: 8 seconds 860 msec
Ended Job = job_1645755235953_36461
Stage-15 is selected by condition resolver.
Stage-8 is filtered out by condition resolver.
MapReduce Total cumulative CPU time: 9 seconds 960 msec
Ended Job = job_1645755235953_36462
Launching Job 6 out of 11
FAILED: Hive Internal Error: java.util.ConcurrentModificationException(null)
java.util.ConcurrentModificationException
    at java.util.Hashtable$Enumerator.next(Hashtable.java:1387)
    at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:2910)
    at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.initialize(ExecDriver.java:178)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2649)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
    at 

[jira] [Assigned] (HIVE-25984) when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the case cause error

2022-02-25 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl reassigned HIVE-25984:
--

Assignee: lkl

> when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the 
> case cause error
> --
>
> Key: HIVE-25984
> URL: https://issues.apache.org/jira/browse/HIVE-25984
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0, 3.1.1, 3.1.2
>Reporter: lkl
>Assignee: lkl
>Priority: Major
> Fix For: All Versions
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HIVE-25970) Missing messages in HS2 operation logs

2022-02-25 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498120#comment-17498120
 ] 

Zoltan Haindrich commented on HIVE-25970:
-

we just talked with [~zabetak]; and HIVE-24590 makes HIVE-22753 unneccessary - 
and it may only cause trouble (lost messages)

> Missing messages in HS2 operation logs
> --
>
> Key: HIVE-25970
> URL: https://issues.apache.org/jira/browse/HIVE-25970
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After HIVE-22753 & HIVE-24590, with some unlucky timing of events, operation 
> log messages can get lost and never appear in the appropriate files.
> The changes in HIVE-22753 will prevent a {{HushableRandomAccessFileAppender}} 
> from being created if the latter refers to a file that has been closed in the 
> last second. Preventing the creation of the appender also means that the 
> message which triggered the creation will be lost forever. In fact any 
> message (for the same query) that comes in the interval of 1 second will be 
> lost forever.
> Before HIVE-24590 the appender/file was closed only once (explicitly by HS2) 
> and thus the problem may be very hard to notice in practice. However, with 
> the arrival of HIVE-24590 appenders may close much more frequently (and not 
> via HS2) making the issue reproducible rather easily. It suffices to set 
> _hive.server2.operation.log.purgePolicy.timeToLive_ property very low and 
> check the operation logs.
> The problem was discovered by investigating some intermittent failures in 
> operation logging tests (e.g.,  TestOperationLoggingAPIWithTez).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25984) when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the case cause error

2022-02-25 Thread lkl (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lkl updated HIVE-25984:
---
Description: 
{code:java}
    > set hive.exec.parallel=true;
hive> set hive.exec.parallel.thread.number=16;
Query ID = hadoop_20220225202936_1afb51d0-ce67-4bc2-9794-8c82b32efe99
Total jobs = 11
Launching Job 1 out of 11
Launching Job 2 out of 11
Launching Job 3 out of 11
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
Number of reduce tasks not specified. Estimated from input data size: 1
  set hive.exec.reducers.max=
In order to change the average load for a reducer (in bytes):
In order to set a constant number of reducers:
  set hive.exec.reducers.bytes.per.reducer=
  set mapreduce.job.reduces=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Launching Job 4 out of 11
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1645755235953_36462, Tracking URL = 
http://172.21.126.228:5004/proxy/application_1645755235953_36462/
Kill Command = /usr/local/service/hadoop/bin/mapred job  -kill 
job_1645755235953_36462
Starting Job = job_1645755235953_36460, Tracking URL = 
http://172.21.126.228:5004/proxy/application_1645755235953_36460/
Starting Job = job_1645755235953_36463, Tracking URL = 
http://172.21.126.228:5004/proxy/application_1645755235953_36463/
Kill Command = /usr/local/service/hadoop/bin/mapred job  -kill 
job_1645755235953_36460
Kill Command = /usr/local/service/hadoop/bin/mapred job  -kill 
job_1645755235953_36463
Starting Job = job_1645755235953_36461, Tracking URL = 
http://172.21.126.228:5004/proxy/application_1645755235953_36461/
Kill Command = /usr/local/service/hadoop/bin/mapred job  -kill 
job_1645755235953_36461
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2022-02-25 20:29:43,598 Stage-3 map = 0%,  reduce = 0%
Hadoop job information for Stage-9: number of mappers: 1; number of reducers: 1
2022-02-25 20:29:43,634 Stage-9 map = 0%,  reduce = 0%
Hadoop job information for Stage-7: number of mappers: 1; number of reducers: 1
2022-02-25 20:29:43,658 Stage-7 map = 0%,  reduce = 0%
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-02-25 20:29:44,646 Stage-1 map = 0%,  reduce = 0%
2022-02-25 20:29:51,767 Stage-9 map = 100%,  reduce = 0%, Cumulative CPU 5.29 
sec
2022-02-25 20:29:51,782 Stage-7 map = 100%,  reduce = 0%, Cumulative CPU 5.45 
sec
2022-02-25 20:29:52,750 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.06 
sec
2022-02-25 20:29:54,835 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.76 
sec
2022-02-25 20:29:58,872 Stage-9 map = 100%,  reduce = 100%, Cumulative CPU 7.49 
sec
2022-02-25 20:29:58,883 Stage-7 map = 100%,  reduce = 100%, Cumulative CPU 8.86 
sec
2022-02-25 20:29:59,868 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 9.96 
sec
MapReduce Total cumulative CPU time: 7 seconds 490 msec
Ended Job = job_1645755235953_36463
MapReduce Total cumulative CPU time: 8 seconds 860 msec
Ended Job = job_1645755235953_36461
Stage-15 is selected by condition resolver.
Stage-8 is filtered out by condition resolver.
MapReduce Total cumulative CPU time: 9 seconds 960 msec
Ended Job = job_1645755235953_36462
Launching Job 6 out of 11
FAILED: Hive Internal Error: java.util.ConcurrentModificationException(null)
java.util.ConcurrentModificationException
    at java.util.Hashtable$Enumerator.next(Hashtable.java:1387)
    at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:2910)
    at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.initialize(ExecDriver.java:178)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2649)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
    at 

[jira] [Commented] (HIVE-25984) when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the case cause error

2022-02-25 Thread lkl (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498113#comment-17498113
 ] 

lkl commented on HIVE-25984:


set hive.auto.convert.join=false;

set hive.exec.parallel=true;

 

Changing the param value makes the query run successfully.
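
For context, a standalone sketch of how the ConcurrentModificationException in the stack trace can arise: with hive.exec.parallel=true several tasks may share one Hadoop Configuration, and iterating it while another thread modifies it can fail (illustrative code against the public Configuration API, not Hive's internals; the failure is timing-dependent):

{code:java}
// Illustration of the underlying Configuration/Hashtable behaviour, not Hive code.
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class ConfIterationRace {
  public static void main(String[] args) throws InterruptedException {
    Configuration conf = new Configuration();
    for (int i = 0; i < 10_000; i++) {
      conf.set("k" + i, "v" + i);
    }
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 10_000; i++) {
        conf.set("extra" + i, "v"); // concurrent modification while the main thread iterates
      }
    });
    writer.start();
    // Configuration implements Iterable<Map.Entry<String, String>>; iterating the
    // shared properties while the writer mutates them may throw
    // java.util.ConcurrentModificationException, as in the stack trace above.
    for (Map.Entry<String, String> e : conf) {
      e.getKey();
    }
    writer.join();
  }
}
{code}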

> when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the 
> case cause error
> --
>
> Key: HIVE-25984
> URL: https://issues.apache.org/jira/browse/HIVE-25984
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 3.0.0, 3.1.1, 3.1.2
>Reporter: lkl
>Priority: Major
>
> {code:java}
>     > set hive.exec.parallel=true;
> hive> set hive.exec.parallel.thread.number=16;
> hive> ADD JAR 
> ofs://f4muzj1eelr-SyDy.chdfs.ap-beijing.myqcloud.com/datam/dota-archive-ningxia/dota/emr-steps/bigdata-dw-udf-0.0.1-SNAPSHOT-jar-with-dependencies.jar;
> Added 
> [/data/emr/hive/tmp/2fbfd169-5bd0-4a63-922a-a25e88737375_resources/bigdata-dw-udf-0.0.1-SNAPSHOT-jar-with-dependencies.jar]
>  to class path
> Added resources: 
> [ofs://f4muzj1eelr-SyDy.chdfs.ap-beijing.myqcloud.com/datam/dota-archive-ningxia/dota/emr-steps/bigdata-dw-udf-0.0.1-SNAPSHOT-jar-with-dependencies.jar]
> hive> 
>     > --INSERT OVERWRITE TABLE mgdm.dm_log_weixin_sdk_playtime_hour 
> PARTITION(pday=20220212,phour='08',pbid='weixin')
>     > select
>     >  a.ip               as ip,       -- ip
>     >  a.isp_id           as isp_id,   -- 运营商ID
>     >  a.isp_id           as isp_id,   -- carrier ID
>     >  a.isp              as isp,      -- carrier name
>     >  a.country_id       as country_id,  -- country ID
>     >  a.country          as country,     -- country name
>     >  a.is_domestic      as is_domestic, -- 
>     >  a.province_id      as province_id, -- province ID
>     >  a.province         as province,    -- province name
>     >  a.city_id          as city_id,     -- city ID
>     >  a.city             as city,        -- city name
>     >  a.sessionid , 
>     >  a.uuid , 
>     >  a.uvip , 
>     >  a.url , 
>     >  a.ver , 
>     >  a.suuid , 
>     >  a.termid ,  
>     >  a.pix , 
>     >  a.bid , 
>     >  a.sdkver , 
>     >  a.`from` , 
>     >  a.pay , 
>     >  a.pt , 
>     >  a.cpt , 
>     >  a.plid , 
>     >  a.istry , 
>     >  a.def , 
>     >  a.ap , 
>     >  a.pstatus , 
>     >  a.cdnip , 
>     >  a.cp , 
>     >  a.bdid , 
>     >  a.bsid , 
>     >  a.cf , 
>     >  a.cid , 
>     >  a.idx , 
>     >  a.vts , 
>     >  a.td , 
>     >  a.unionid , 
>     >  a.src , 
>     >  a.ct , 
>     >  a.ht , 
>     >  a.clip_id , 
>     >  a.part_id , 
>     >  a.class_id , 
>     >  a.is_full , 
>     >  a.duration , 
>     >  IF(b.play_time>4000, 4000, IF(b.play_time > 0, b.play_time, 0))
>     >                           as playtime,   -- playback duration
>     >  current_timestamp()      as fetl_time   -- ETL time
>     >   from (select a.*
>     >           from (select a.*
>     >                   from (select a.*,
>     >                                row_number() over(partition by suuid, 
> pday, phour order by event_time desc) rn
>     >                           from mgdw.dw_log_weixin_sdk_hb_hour a
>     >                          where pday = 20220212
>     >                            and phour = '08'
>     >                            and pbid = 'weixin'
>     >                            and suuid is not null
>     >                            and logtype='hb') a
>     >                  where rn = 1) a) a
>     >   left join (select a.pday,
>     >                     a.phour,
>     >                     a.suuid,
>     >                     ceil(a.play_hb_time - coalesce(buffer_play_time, 
> 0)) as play_time
>     >                from (select a.pday,
>     >                             a.phour,
>     >                             a.suuid,
>     >                             sum(play_hb_time) as play_hb_time
>     >                        from (select a.pday,
>     >                                     a.phour,
>     >                                     a.suuid,
>     >                                     case
>     >                                       when idx = min_idx then
>     >                                        if(unix_timestamp(event_time) -
>     >                                           unix_timestamp(min_stime) > 
> hb_time,
>     >                                           hb_time,
>     >                                           unix_timestamp(event_time) -
>     >                                           unix_timestamp(min_stime))
>     >                                       when idx = max_idx then
>     >                                        if(unix_timestamp(event_time) -
>     >                                           unix_timestamp(pre_time) > 
> hb_time,
>     >                                           hb_time,
>     >            

[jira] [Commented] (HIVE-24905) only CURRENT ROW end frame is supported for RANGE

2022-02-25 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498059#comment-17498059
 ] 

Stamatis Zampetakis commented on HIVE-24905:


Since {{RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING}} is not 
vectorized at the moment there is a hack in 
[ASTConverter|https://github.com/apache/hive/blob/2a1a73f665eee497ebdb0745ab2c31c1614de017/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L724]
 to transform RANGE to ROWS (i.e., {{ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING}}) when the window is unbounded since they are equivalent. 
When this issue is resolved we could remove the respective code in ASTConverter.


> only CURRENT ROW end frame is supported for RANGE
> -
>
> Key: HIVE-24905
> URL: https://issues.apache.org/jira/browse/HIVE-24905
> Project: Hive
>  Issue Type: Sub-task
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> This one is about taking care of vectorizing the FOLLOWING rows case:
> {code}
> avg(p_retailprice) over(partition by p_mfgr order by p_date range between 1 
> preceding and 3 following) as avg1,
> {code}
> {code}
> Reduce Vectorization:
> enabled: true
> enableConditionsMet: hive.vectorized.execution.reduce.enabled 
> IS true, hive.execution.engine tez IN [tez, spark] IS true
> notVectorizedReason: PTF operator: count only CURRENT ROW end 
> frame is supported for RANGE
> vectorized: false
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25981) Avoid checking for archived parts in analyze table

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25981?focusedWorklogId=732984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732984
 ]

ASF GitHub Bot logged work on HIVE-25981:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 10:03
Start Date: 25/Feb/22 10:03
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3052:
URL: https://github.com/apache/hive/pull/3052#discussion_r814635583



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -13131,7 +13131,8 @@ public void validate() throws SemanticException {
   LOG.debug("validated " + usedp.getName());
   LOG.debug(usedp.getTable().getTableName());
   WriteEntity.WriteType writeType = writeEntity.getWriteType();
-  if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE) {
+  if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE 
&& writeType != WriteType.DDL_NO_LOCK

Review comment:
   Yeah, checking `AcidUtils.isTransactionalTable(tbl)` seems more 
reasonable than depending on the `WriteType`
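
   A minimal sketch of that direction (assuming the `AcidUtils.isTransactionalTable` 
overload that accepts a `ql.metadata.Table`; the helper class below is hypothetical 
and not part of the patch):

```java
import org.apache.hadoop.hive.ql.io.AcidUtils;
import org.apache.hadoop.hive.ql.metadata.Table;

// Sketch only: gate the expensive archive-conflict lookup on the table being
// non-transactional, rather than on the WriteType of the WriteEntity.
final class ArchiveCheckSketch {
  static boolean needsArchiveConflictCheck(Table tbl) {
    // Per the discussion above, transactional tables would skip
    // conflictingArchiveNameOrNull(); only non-transactional tables keep the check.
    return !AcidUtils.isTransactionalTable(tbl);
  }
}
```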




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 732984)
Time Spent: 50m  (was: 40m)

> Avoid checking for archived parts in analyze table
> --
>
> Key: HIVE-25981
> URL: https://issues.apache.org/jira/browse/HIVE-25981
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Analyze table on a large partitioned table is expensive due to unwanted checks 
> on archived data.
>  
> {noformat}
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3908)
>     - locked <0x0003d4c4c070> (a 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler)
>     at com.sun.proxy.$Proxy56.listPartitionsWithAuthInfo(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:3845)
>     at 
> org.apache.hadoop.hive.ql.exec.ArchiveUtils.conflictingArchiveNameOrNull(ArchiveUtils.java:299)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:13579)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:241)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:196)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:615)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:561)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:555)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265)
>     at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:285)  
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25981) Avoid checking for archived parts in analyze table

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25981?focusedWorklogId=732983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732983
 ]

ASF GitHub Bot logged work on HIVE-25981:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 10:00
Start Date: 25/Feb/22 10:00
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #3052:
URL: https://github.com/apache/hive/pull/3052#discussion_r814632395



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -13131,7 +13131,8 @@ public void validate() throws SemanticException {
   LOG.debug("validated " + usedp.getName());
   LOG.debug(usedp.getTable().getTableName());
   WriteEntity.WriteType writeType = writeEntity.getWriteType();
-  if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE) {
+  if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE 
&& writeType != WriteType.DDL_NO_LOCK

Review comment:
   I am fine with removing this completely for "transactionalInQuery", 
given this is never used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 732983)
Time Spent: 40m  (was: 0.5h)

> Avoid checking for archived parts in analyze table
> --
>
> Key: HIVE-25981
> URL: https://issues.apache.org/jira/browse/HIVE-25981
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Analyze table on a large partitioned table is expensive due to unwanted checks 
> on archived data.
>  
> {noformat}
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3908)
>     - locked <0x0003d4c4c070> (a 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler)
>     at com.sun.proxy.$Proxy56.listPartitionsWithAuthInfo(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:3845)
>     at 
> org.apache.hadoop.hive.ql.exec.ArchiveUtils.conflictingArchiveNameOrNull(ArchiveUtils.java:299)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:13579)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:241)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:196)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:615)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:561)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:555)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265)
>     at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:285)  
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25981) Avoid checking for archived parts in analyze table

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25981?focusedWorklogId=732947=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732947
 ]

ASF GitHub Bot logged work on HIVE-25981:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 08:57
Start Date: 25/Feb/22 08:57
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3052:
URL: https://github.com/apache/hive/pull/3052#discussion_r814586334



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -13131,7 +13131,8 @@ public void validate() throws SemanticException {
   LOG.debug("validated " + usedp.getName());
   LOG.debug(usedp.getTable().getTableName());
   WriteEntity.WriteType writeType = writeEntity.getWriteType();
-  if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE) {
+  if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE 
&& writeType != WriteType.DDL_NO_LOCK

Review comment:
   Why do we think that the WriteType defines whether we need to check for 
archived parts or not?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 732947)
Time Spent: 0.5h  (was: 20m)

> Avoid checking for archived parts in analyze table
> --
>
> Key: HIVE-25981
> URL: https://issues.apache.org/jira/browse/HIVE-25981
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Analyze table on a large partitioned table is expensive due to unwanted checks 
> on archived data.
>  
> {noformat}
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3908)
>     - locked <0x0003d4c4c070> (a 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler)
>     at com.sun.proxy.$Proxy56.listPartitionsWithAuthInfo(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:3845)
>     at 
> org.apache.hadoop.hive.ql.exec.ArchiveUtils.conflictingArchiveNameOrNull(ArchiveUtils.java:299)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:13579)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:241)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:196)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:615)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:561)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:555)
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127)
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265)
>     at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:285)  
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded

2022-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=732944=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732944
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 25/Feb/22 08:54
Start Date: 25/Feb/22 08:54
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r814583976



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -303,56 +304,132 @@ void checkTable(Table table, PartitionIterable parts, 
byte[] filterExp, CheckRes
 if (tablePath == null) {
   return;
 }
-FileSystem fs = tablePath.getFileSystem(conf);
-if (!fs.exists(tablePath)) {
+final FileSystem[] fs = {tablePath.getFileSystem(conf)};
+if (!fs[0].exists(tablePath)) {
   result.getTablesNotOnFs().add(table.getTableName());
   return;
 }
 
 Set partPaths = new HashSet<>();
 
-// check that the partition folders exist on disk
-for (Partition partition : parts) {
-  if (partition == null) {
-// most likely the user specified an invalid partition
-continue;
-  }
-  Path partPath = getDataLocation(table, partition);
-  if (partPath == null) {
-continue;
-  }
-  fs = partPath.getFileSystem(conf);
+int threadCount = MetastoreConf.getIntVar(conf, 
MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT);
+
+final ExecutorService pool = (threadCount > 1) ?
+Executors.newFixedThreadPool(threadCount,
+new ThreadFactoryBuilder()
+.setDaemon(true)
+.setNameFormat("CheckTable-PartitionOptimizer-%d").build()) : 
null;
 
-  CheckResult.PartitionResult prFromMetastore = new 
CheckResult.PartitionResult();
-  prFromMetastore.setPartitionName(getPartitionName(table, partition));
-  prFromMetastore.setTableName(partition.getTableName());
-  if (!fs.exists(partPath)) {
-result.getPartitionsNotOnFs().add(prFromMetastore);
+try {
+  Queue> futures = new LinkedList<>();
+  if (pool != null) {
+// check that the partition folders exist on disk using multi-thread
+for (Partition partition : parts) {

Review comment:
   I think this will fetch all of the partitions from the partition 
iterator immediately and keep them in memory.
   
   The goal of the partition iterator was to prevent OOM when there are big 
tables with a huge number of partitions. We do not want every partition in 
memory at once, so the iterator fetched them in batches, and once we were done 
with a batch we let the GC take care of it.
   
   With this change I expect that we create a `Future` immediately for all of 
the partitions, and we will keep all of the partitions in memory until all of 
the checks are finished.
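
   One possible shape for keeping that batching benefit (a generic sketch under the 
assumption that we only want a bounded number of partitions referenced at a time; it 
is not the actual patch) is to cap the number of in-flight futures:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Generic illustration: submit work for each element of a lazily-batched iterable,
// but never hold more than maxInFlight futures (and thus elements) at once, so
// earlier batches can be garbage collected while later ones are still being fetched.
final class BoundedSubmitSketch {
  static <T> void forEachBounded(Iterable<T> items, ExecutorService pool,
                                 int maxInFlight, Consumer<T> work)
      throws InterruptedException, ExecutionException {
    Queue<Future<?>> inFlight = new ArrayDeque<>();
    for (T item : items) {
      inFlight.add(pool.submit(() -> work.accept(item)));
      if (inFlight.size() >= maxInFlight) {
        inFlight.poll().get(); // wait for the oldest task before pulling more items
      }
    }
    while (!inFlight.isEmpty()) {
      inFlight.poll().get(); // drain the remaining tasks
    }
  }
}
```

   The in-flight cap could be derived from the same 
`METASTORE_MSCK_FS_HANDLER_THREADS_COUNT` setting (or a separate limit), so the 
partition iterator's batching stays intact while the FS checks still run in parallel.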




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 732944)
Time Spent: 20m  (was: 10m)

> Support HiveMetaStoreChecker.checkTable operation with multi-threaded
> -
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE on a table with many partitions can perform slowly on cloud 
> storage such as S3; one case where we observed slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
>