[jira] [Work logged] (HIVE-25986) Statement id is incorrect in case of load in path to MM table
[ https://issues.apache.org/jira/browse/HIVE-25986?focusedWorklogId=733482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733482 ]

ASF GitHub Bot logged work on HIVE-25986:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Feb/22 07:04
Start Date: 26/Feb/22 07:04
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #3055:
URL: https://github.com/apache/hive/pull/3055#discussion_r815279139

## File path: ql/src/java/org/apache/hadoop/hive/ql/QueryPlan.java
##
@@ -226,7 +226,7 @@ public Integer getStatementIdForAcidWriteType(long writeId, String moveTaskId, A
     if (result != null) {
       return result.getStatementId();
     } else {
-      return -1;
+      return 0;

Review comment:
Please add some comments here

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 733482)
Time Spent: 0.5h (was: 20m)

> Statement id is incorrect in case of load in path to MM table
> -------------------------------------------------------------
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Antal Sinkovits
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: ACID, pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira (v8.20.1#820001)
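Why the fallback value in the diff above matters: for ACID/MM tables the statement id is baked into the delta directory name, so a bogus fallback yields a malformed directory name (which is also why the reviewer later asks for a test that checks the dir name). A minimal, hypothetical sketch of the naming convention - the `deltaSubdir` helper below only mirrors the zero-padded `delta_<writeId>_<writeId>_<stmtId>` pattern and is not Hive's actual code:

```java
public class DeltaDirSketch {
    // Hypothetical helper mirroring the ACID delta directory naming pattern:
    // delta_<minWriteId>_<maxWriteId>_<statementId>, write ids padded to 7
    // digits and the statement id to 4.
    static String deltaSubdir(long writeId, int statementId) {
        return String.format("delta_%07d_%07d_%04d", writeId, writeId, statementId);
    }

    public static void main(String[] args) {
        // With a fallback of -1 the suffix is malformed; with 0 it is well-formed.
        System.out.println(deltaSubdir(1, -1)); // delta_0000001_0000001_-001
        System.out.println(deltaSubdir(1, 0));  // delta_0000001_0000001_0000
    }
}
```
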
[jira] [Work logged] (HIVE-25986) Statement id is incorrect in case of load in path to MM table
[ https://issues.apache.org/jira/browse/HIVE-25986?focusedWorklogId=733480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733480 ]

ASF GitHub Bot logged work on HIVE-25986:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Feb/22 07:03
Start Date: 26/Feb/22 07:03
Worklog Time Spent: 10m

Work Description: pvary commented on pull request #3055:
URL: https://github.com/apache/hive/pull/3055#issuecomment-1051721851

Please add a test case which checks the dir name

Issue Time Tracking
-------------------
Worklog Id: (was: 733480)
Time Spent: 20m (was: 10m)

> Statement id is incorrect in case of load in path to MM table
> -------------------------------------------------------------
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Antal Sinkovits
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: ACID, pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-25750) Beeline: Creating a standalone tarball by isolating dependencies
[ https://issues.apache.org/jira/browse/HIVE-25750?focusedWorklogId=733427&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733427 ]

ASF GitHub Bot logged work on HIVE-25750:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Feb/22 00:23
Start Date: 26/Feb/22 00:23
Worklog Time Spent: 10m

Work Description: achennagiri commented on pull request #3043:
URL: https://github.com/apache/hive/pull/3043#issuecomment-1051392895

retest

Issue Time Tracking
-------------------
Worklog Id: (was: 733427)
Time Spent: 3h 20m (was: 3h 10m)

> Beeline: Creating a standalone tarball by isolating dependencies
> ----------------------------------------------------------------
>
> Key: HIVE-25750
> URL: https://issues.apache.org/jira/browse/HIVE-25750
> Project: Hive
> Issue Type: Bug
> Reporter: Abhay
> Assignee: Abhay
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> The code to create a standalone beeline tarball was created as part of ticket https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was reported when beeline is installed without Hadoop: the beeline script complains of missing dependencies when it is run. The ask in this ticket is to fix that bug.
[jira] [Work logged] (HIVE-25750) Beeline: Creating a standalone tarball by isolating dependencies
[ https://issues.apache.org/jira/browse/HIVE-25750?focusedWorklogId=733360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733360 ]

ASF GitHub Bot logged work on HIVE-25750:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Feb/22 21:36
Start Date: 25/Feb/22 21:36
Worklog Time Spent: 10m

Work Description: achennagiri commented on pull request #3043:
URL: https://github.com/apache/hive/pull/3043#issuecomment-1051288787

retest

Issue Time Tracking
-------------------
Worklog Id: (was: 733360)
Time Spent: 3h 10m (was: 3h)

> Beeline: Creating a standalone tarball by isolating dependencies
> ----------------------------------------------------------------
>
> Key: HIVE-25750
> URL: https://issues.apache.org/jira/browse/HIVE-25750
> Project: Hive
> Issue Type: Bug
> Reporter: Abhay
> Assignee: Abhay
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> The code to create a standalone beeline tarball was created as part of ticket https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was reported when beeline is installed without Hadoop: the beeline script complains of missing dependencies when it is run. The ask in this ticket is to fix that bug.
[jira] [Work logged] (HIVE-25750) Beeline: Creating a standalone tarball by isolating dependencies
[ https://issues.apache.org/jira/browse/HIVE-25750?focusedWorklogId=733354&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733354 ]

ASF GitHub Bot logged work on HIVE-25750:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Feb/22 21:28
Start Date: 25/Feb/22 21:28
Worklog Time Spent: 10m

Work Description: achennagiri opened a new pull request #3043:
URL: https://github.com/apache/hive/pull/3043

The code to create a standalone beeline tarball was created as part of ticket https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was reported when beeline is installed without Hadoop: the beeline script complains of missing dependencies when it is run.

Update: Was running into the below error with the file mode on in Beeline
```
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.JobConf
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
```
Added a fix to resolve this.

### What changes were proposed in this pull request?
The beeline script can be run with/without Hadoop installed. All the required dependencies are bundled into a single downloadable tar file. `mvn clean package install -Pdist -Pitests -DskipTests -Denforcer.skip=true` generates something along the lines of **apache-hive-beeline-4.0.0-SNAPSHOT.tar.gz** in the **packaging/target** folder.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Created a docker container using the command `sudo docker run --rm -it -v /Users/achennagiri/Downloads:/container --user root docker-private.infra.cloudera.com/cloudera_base/ubi8/python-38:1-68 /bin/bash`. Java needs to be installed in the container: `yum install -y java-11-openjdk`.

Issue Time Tracking
-------------------
Worklog Id: (was: 733354)
Time Spent: 3h (was: 2h 50m)

> Beeline: Creating a standalone tarball by isolating dependencies
> ----------------------------------------------------------------
>
> Key: HIVE-25750
> URL: https://issues.apache.org/jira/browse/HIVE-25750
> Project: Hive
> Issue Type: Bug
> Reporter: Abhay
> Assignee: Abhay
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 3h
> Remaining Estimate: 0h
>
> The code to create a standalone beeline tarball was created as part of ticket https://issues.apache.org/jira/browse/HIVE-24348. However, a bug was reported when beeline is installed without Hadoop: the beeline script complains of missing dependencies when it is run. The ask in this ticket is to fix that bug.
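The `ClassNotFoundException: org.apache.hadoop.mapred.JobConf` above is exactly what a Hadoop-free classpath produces. As an illustration only (this is not the actual change in PR #3043), a standalone launcher could probe the classpath before enabling Hadoop-dependent features instead of crashing at first use:

```java
public class HadoopProbe {
    // Hypothetical helper: report whether a class can be loaded from the
    // current classpath - the same lookup that throws ClassNotFoundException
    // when the Hadoop jars are absent from a standalone beeline tarball.
    static boolean classPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A launcher could branch on this and disable Hadoop-only features.
        String cls = "org.apache.hadoop.mapred.JobConf";
        System.out.println(cls + " present: " + classPresent(cls));
    }
}
```
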
[jira] [Updated] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.
[ https://issues.apache.org/jira/browse/HIVE-25988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25988:
----------------------------------
Labels: pull-request-available (was: )

> CreateTableEvent should have database object as one of the hive privilege object.
> ---------------------------------------------------------------------------------
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
> Issue Type: Bug
> Components: Hive, Standalone Metastore
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the HivePrivilege Objects so that it is consistent with HS2's CreateTable event. Also, we need to move the DFS_URI object into the InputList so that this is also consistent with HS2's behavior.
> Having database objects in the create table event's hive privilege objects helps to determine if a user has the right permissions to create a table in a particular database via ranger/sentry.
[jira] [Work logged] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.
[ https://issues.apache.org/jira/browse/HIVE-25988?focusedWorklogId=733250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733250 ]

ASF GitHub Bot logged work on HIVE-25988:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Feb/22 18:47
Start Date: 25/Feb/22 18:47
Worklog Time Spent: 10m

Work Description: saihemanth-cloudera opened a new pull request #3057:
URL: https://github.com/apache/hive/pull/3057

…e hive privilege object

### What changes were proposed in this pull request?
Included Database object in the HivePrivilegeObjects for the CreateTableEvent

### Why are the changes needed?
Ranger/sentry can use this information to evaluate if the user has the right permissions to create a table or not.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Local machine, Remote cluster

Issue Time Tracking
-------------------
Worklog Id: (was: 733250)
Remaining Estimate: 0h
Time Spent: 10m

> CreateTableEvent should have database object as one of the hive privilege object.
> ---------------------------------------------------------------------------------
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
> Issue Type: Bug
> Components: Hive, Standalone Metastore
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The CreateTableEvent in HMS should have a database object as one of the HivePrivilege Objects so that it is consistent with HS2's CreateTable event. Also, we need to move the DFS_URI object into the InputList so that this is also consistent with HS2's behavior.
> Having database objects in the create table event's hive privilege objects helps to determine if a user has the right permissions to create a table in a particular database via ranger/sentry.
[jira] [Assigned] (HIVE-25988) CreateTableEvent should have database object as one of the hive privilege object.
[ https://issues.apache.org/jira/browse/HIVE-25988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sai Hemanth Gantasala reassigned HIVE-25988:
--------------------------------------------

> CreateTableEvent should have database object as one of the hive privilege object.
> ---------------------------------------------------------------------------------
>
> Key: HIVE-25988
> URL: https://issues.apache.org/jira/browse/HIVE-25988
> Project: Hive
> Issue Type: Bug
> Components: Hive, Standalone Metastore
> Reporter: Sai Hemanth Gantasala
> Assignee: Sai Hemanth Gantasala
> Priority: Major
>
> The CreateTableEvent in HMS should have a database object as one of the HivePrivilege Objects so that it is consistent with HS2's CreateTable event. Also, we need to move the DFS_URI object into the InputList so that this is also consistent with HS2's behavior.
> Having database objects in the create table event's hive privilege objects helps to determine if a user has the right permissions to create a table in a particular database via ranger/sentry.
[jira] [Resolved] (HIVE-25970) Missing messages in HS2 operation logs
[ https://issues.apache.org/jira/browse/HIVE-25970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis resolved HIVE-25970.
----------------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Fixed in https://github.com/apache/hive/commit/d3cd596aa15ebedd58f99628d43a03eb2f5f3909. Thanks for the review [~kgyrtkirk]!

> Missing messages in HS2 operation logs
> --------------------------------------
>
> Key: HIVE-25970
> URL: https://issues.apache.org/jira/browse/HIVE-25970
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> After HIVE-22753 & HIVE-24590, with some unlucky timing of events, operation log messages can get lost and never appear in the appropriate files.
> The changes in HIVE-22753 prevent a {{HushableRandomAccessFileAppender}} from being created if the latter refers to a file that was closed within the last second. Preventing the creation of the appender also means that the message which triggered the creation is lost forever; in fact, any message (for the same query) that arrives within that one-second interval is lost.
> Before HIVE-24590 the appender/file was closed only once (explicitly by HS2), so the problem was very hard to notice in practice. However, with HIVE-24590 appenders may close much more frequently (and not via HS2), making the issue easily reproducible: it suffices to set the _hive.server2.operation.log.purgePolicy.timeToLive_ property very low and check the operation logs.
> The problem was discovered while investigating intermittent failures in operation logging tests (e.g., TestOperationLoggingAPIWithTez).
[jira] [Work logged] (HIVE-25970) Missing messages in HS2 operation logs
[ https://issues.apache.org/jira/browse/HIVE-25970?focusedWorklogId=733189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733189 ]

ASF GitHub Bot logged work on HIVE-25970:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Feb/22 17:30
Start Date: 25/Feb/22 17:30
Worklog Time Spent: 10m

Work Description: zabetak closed pull request #3048:
URL: https://github.com/apache/hive/pull/3048

Issue Time Tracking
-------------------
Worklog Id: (was: 733189)
Time Spent: 40m (was: 0.5h)

> Missing messages in HS2 operation logs
> --------------------------------------
>
> Key: HIVE-25970
> URL: https://issues.apache.org/jira/browse/HIVE-25970
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> After HIVE-22753 & HIVE-24590, with some unlucky timing of events, operation log messages can get lost and never appear in the appropriate files.
> The changes in HIVE-22753 prevent a {{HushableRandomAccessFileAppender}} from being created if the latter refers to a file that was closed within the last second. Preventing the creation of the appender also means that the message which triggered the creation is lost forever; in fact, any message (for the same query) that arrives within that one-second interval is lost.
> Before HIVE-24590 the appender/file was closed only once (explicitly by HS2), so the problem was very hard to notice in practice. However, with HIVE-24590 appenders may close much more frequently (and not via HS2), making the issue easily reproducible: it suffices to set the _hive.server2.operation.log.purgePolicy.timeToLive_ property very low and check the operation logs.
> The problem was discovered while investigating intermittent failures in operation logging tests (e.g., TestOperationLoggingAPIWithTez).
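The one-second guard described in HIVE-25970 can be modeled as follows. This is a hypothetical simplification, not the real `HushableRandomAccessFileAppender` or Log4j2 manager code; it only shows why any message arriving inside the grace window after a close is silently dropped:

```java
import java.util.concurrent.ConcurrentHashMap;

public class AppenderGateSketch {
    // Toy model of the guard: refuse to (re)create an appender for a file
    // that was closed less than CLOSE_GRACE_MS ago, dropping the message.
    static final ConcurrentHashMap<String, Long> lastClosed = new ConcurrentHashMap<>();
    static final long CLOSE_GRACE_MS = 1000;

    static boolean accept(String file, long nowMs) {
        Long closed = lastClosed.get(file);
        // Messages arriving within the grace window after a close are lost.
        return closed == null || nowMs - closed >= CLOSE_GRACE_MS;
    }

    public static void main(String[] args) {
        lastClosed.put("op.log", 10_000L);
        System.out.println(accept("op.log", 10_500L)); // false: within 1s of close, dropped
        System.out.println(accept("op.log", 11_200L)); // true: past the grace window, written
    }
}
```

A lower `purgePolicy.timeToLive` means appenders close more often, so more messages land inside this window, which matches the reproduction recipe in the issue.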
[jira] [Work logged] (HIVE-25896) Remove getThreadId from IHMSHandler
[ https://issues.apache.org/jira/browse/HIVE-25896?focusedWorklogId=733114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733114 ]

ASF GitHub Bot logged work on HIVE-25896:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Feb/22 15:34
Start Date: 25/Feb/22 15:34
Worklog Time Spent: 10m

Work Description: klcopp commented on pull request #3017:
URL: https://github.com/apache/hive/pull/3017#issuecomment-1050957427

LGTM. Compaction does log/store the thread ids, but it gets those directly from Thread#getId.

Issue Time Tracking
-------------------
Worklog Id: (was: 733114)
Time Spent: 1h 20m (was: 1h 10m)

> Remove getThreadId from IHMSHandler
> -----------------------------------
>
> Key: HIVE-25896
> URL: https://issues.apache.org/jira/browse/HIVE-25896
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Zhihua Deng
> Assignee: Zhihua Deng
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> In IHMSHandler, which is annotated as 'InterfaceAudience.Private', we currently use getThreadId to log the thread information. The thread id can be logged automatically if we configure the logger properly, so the method can be removed for better maintainability of IHMSHandler.
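The "configure the logger properly" remark can be illustrated with a Log4j2 `PatternLayout`: the `%tid` converter emits the thread id (and `%t` the thread name) with no `getThreadId()` support needed from IHMSHandler. A minimal sketch; the appender name and pattern below are placeholders, not Hive's shipped configuration:

```properties
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
# %tid prints the thread id, %t the thread name - no getThreadId() call needed
appender.console.layout.pattern = %d{ISO8601} %-5p [%t] (tid=%tid) %c{2}: %m%n

rootLogger.level = info
rootLogger.appenderRef.console.ref = console
```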
[jira] [Updated] (HIVE-25986) Statement id is incorrect in case of load in path to MM table
[ https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Sinkovits updated HIVE-25986:
-----------------------------------
Status: Patch Available (was: Open)

> Statement id is incorrect in case of load in path to MM table
> -------------------------------------------------------------
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Antal Sinkovits
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: ACID, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Updated] (HIVE-25986) Statement id is incorrect in case of load in path to MM table
[ https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25986:
----------------------------------
Labels: ACID pull-request-available (was: ACID)

> Statement id is incorrect in case of load in path to MM table
> -------------------------------------------------------------
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Antal Sinkovits
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: ACID, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-25986) Statement id is incorrect in case of load in path to MM table
[ https://issues.apache.org/jira/browse/HIVE-25986?focusedWorklogId=733107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733107 ]

ASF GitHub Bot logged work on HIVE-25986:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Feb/22 15:20
Start Date: 25/Feb/22 15:20
Worklog Time Spent: 10m

Work Description: asinkovits opened a new pull request #3055:
URL: https://github.com/apache/hive/pull/3055

### What changes were proposed in this pull request?
The statement id is incorrect if the table is an insert-only ACID table and load inpath is used to load the data.

### Why are the changes needed?
The format of the delta directory is incorrect because of the wrong statement id.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual testing

Issue Time Tracking
-------------------
Worklog Id: (was: 733107)
Remaining Estimate: 0h
Time Spent: 10m

> Statement id is incorrect in case of load in path to MM table
> -------------------------------------------------------------
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Antal Sinkovits
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: ACID
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Updated] (HIVE-25986) Statement id is incorrect in case of load in path to MM table
[ https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Sinkovits updated HIVE-25986:
-----------------------------------
Summary: Statement id is incorrect in case of load in path to MM table (was: statement id is incorrect in case of load in path to MM table)

> Statement id is incorrect in case of load in path to MM table
> -------------------------------------------------------------
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Antal Sinkovits
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: ACID
[jira] [Updated] (HIVE-25986) statement id is incorrect in case of load in path to MM table
[ https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Sinkovits updated HIVE-25986:
-----------------------------------
Summary: statement id is incorrect in case of load in path to MM table (was: statement id in incorrect in case of load in path to MM table)

> statement id is incorrect in case of load in path to MM table
> -------------------------------------------------------------
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Antal Sinkovits
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: ACID
[jira] [Updated] (HIVE-25986) statement id in incorrect in case of load in path to MM table
[ https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Sinkovits updated HIVE-25986:
-----------------------------------
Labels: ACID (was: )

> statement id in incorrect in case of load in path to MM table
> -------------------------------------------------------------
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Antal Sinkovits
> Assignee: Antal Sinkovits
> Priority: Major
> Labels: ACID
[jira] [Updated] (HIVE-25986) statement id in incorrect in case of load in path to MM table
[ https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Sinkovits updated HIVE-25986:
-----------------------------------
Affects Version/s: 4.0.0

> statement id in incorrect in case of load in path to MM table
> -------------------------------------------------------------
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Antal Sinkovits
> Assignee: Antal Sinkovits
> Priority: Major
[jira] [Assigned] (HIVE-25986) statement id in incorrect in case of load in path to MM table
[ https://issues.apache.org/jira/browse/HIVE-25986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Sinkovits reassigned HIVE-25986:
--------------------------------------

> statement id in incorrect in case of load in path to MM table
> -------------------------------------------------------------
>
> Key: HIVE-25986
> URL: https://issues.apache.org/jira/browse/HIVE-25986
> Project: Hive
> Issue Type: Bug
> Reporter: Antal Sinkovits
> Assignee: Antal Sinkovits
> Priority: Major
[jira] [Updated] (HIVE-25985) Estimate stats gives out incorrect number of columns during query planning when using predicates like c=22
[ https://issues.apache.org/jira/browse/HIVE-25985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sindhu Subhas updated HIVE-25985:
---------------------------------
Summary: Estimate stats gives out incorrect number of columns during query planning when using predicates like c=22 (was: Estimate stats gives out incorrect number of columns when using predicates like c=22)

> Estimate stats gives out incorrect number of columns during query planning when using predicates like c=22
> -----------------------------------------------------------------------------------------------------------
>
> Key: HIVE-25985
> URL: https://issues.apache.org/jira/browse/HIVE-25985
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 3.0.0
> Environment: Hive 3
> Reporter: Sindhu Subhas
> Priority: Major
>
> Table type: External
> Stats: No stats collected.
> Row estimates go bad when a Filter operator appears in the plan. The table below shows the original query on the table with the filter predicate changed to different forms:
>
> |*predicate form*|*optimised as*|*filter Op rows out*|*estimate quality*|
> |prd_i_tmp.type = '22'|predicate:(type = '22')|Filter Operator [FIL_12] (rows=5 width=3707)|bad|
> |prd_i_tmp.type in ('22')|predicate:(type = '22')|Filter Operator [FIL_12] (rows=5 width=3707)|bad|
> |prd_i_tmp.type < '23' and prd_i_tmp.type > '21'|predicate:((type < '23') and (type > '21'))|Filter Operator [FIL_12] (rows=8706269 width=3707)|good|
> |prd_i_tmp.type like '22'|predicate:(type like '22')|Filter Operator [FIL_12] (rows=39178213 width=3707)|best|
> |prd_i_tmp.type in ('22','AA','BB')|predicate:(type) IN ('22', 'AA', 'BB')|Filter Operator [FIL_12] (rows=15 width=3707)|bad|
> |prd_i_tmp.type rlike '22'|predicate:type regexp '22'|Filter Operator [FIL_12] (rows=39178213 width=3707)|good|
[jira] [Resolved] (HIVE-25979) Order of Lineage is flaky in qtest output
[ https://issues.apache.org/jira/browse/HIVE-25979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa resolved HIVE-25979.
-----------------------------------
Resolution: Fixed

Pushed to master. Thanks [~kgyrtkirk] for the review. [~ayushtkn] this patch should fix the {{stats_part_multi_insert_acid}} flakiness.

> Order of Lineage is flaky in qtest output
> -----------------------------------------
>
> Key: HIVE-25979
> URL: https://issues.apache.org/jira/browse/HIVE-25979
> Project: Hive
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When running
> {code:java}
> mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=stats_part_multi_insert_acid.q -pl itests/qtest -Pitests
> {code}
> the lineage output of the statement
> {code:java}
> from source
> insert into stats_part select key, value, p
> insert into stats_part select key, value, p
> {code}
> is expected to be
> {code:java}
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> {code}
> but sometimes it is
> {code:java}
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE [(source)source.FieldSchema(name:key, type:int, comment:null), ]
> POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE [(source)source.FieldSchema(name:value, type:string, comment:null), ]
> {code}
[jira] [Work logged] (HIVE-25979) Order of Lineage is flaky in qtest output
[ https://issues.apache.org/jira/browse/HIVE-25979?focusedWorklogId=733039=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733039 ] ASF GitHub Bot logged work on HIVE-25979: - Author: ASF GitHub Bot Created on: 25/Feb/22 13:10 Start Date: 25/Feb/22 13:10 Worklog Time Spent: 10m Work Description: kasakrisz merged pull request #3050: URL: https://github.com/apache/hive/pull/3050 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 733039) Time Spent: 20m (was: 10m) > Order of Lineage is flaky in qtest output > - > > Key: HIVE-25979 > URL: https://issues.apache.org/jira/browse/HIVE-25979 > Project: Hive > Issue Type: Bug >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When running > {code:java} > mvn test -Dtest=TestMiniLlapLocalCliDriver > -Dqfile=stats_part_multi_insert_acid.q -pl itests/qtest -Pitests > {code} > The lineage output of statement: > {code:java} > from source > insert into stats_part select key, value, p > insert into stats_part select key, value, p > {code} > is expected to be > {code:java} > POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE > [(source)source.FieldSchema(name:key, type:int, comment:null), ] > POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE > [(source)source.FieldSchema(name:key, type:int, comment:null), ] > POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE > [(source)source.FieldSchema(name:value, type:string, comment:null), ] > POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE > [(source)source.FieldSchema(name:value, type:string, comment:null), ] > {code} > but sometimes it 
is > {code:java} > POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE > [(source)source.FieldSchema(name:key, type:int, comment:null), ] > POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE > [(source)source.FieldSchema(name:value, type:string, comment:null), ] > POSTHOOK: Lineage: stats_part PARTITION(p=101).key SIMPLE > [(source)source.FieldSchema(name:key, type:int, comment:null), ] > POSTHOOK: Lineage: stats_part PARTITION(p=101).value SIMPLE > [(source)source.FieldSchema(name:value, type:string, comment:null), ] > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
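Editor's note: the flaky ordering above comes from the two insert branches of the multi-insert registering lineage in a nondeterministic order. A minimal sketch of the obvious remedy (illustrative only — class and method names here are not Hive's actual qtest code) is to sort the POSTHOOK lineage lines before emitting them, so both interleavings shown in the ticket normalize to the same deterministic output:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hedged sketch: sorting the lineage lines lexicographically before they are
// written to the .q.out file makes the qtest output independent of which
// insert branch of the multi-insert statement finished first.
public class DeterministicLineage {
    public static List<String> normalize(List<String> lineageLines) {
        List<String> sorted = new ArrayList<>(lineageLines);
        Collections.sort(sorted);  // stable, lexicographic order
        return sorted;
    }
}
```

Both orderings reported in the ticket (key/value/key/value vs. key/key/value/value) sort to the same list, which is why a sort-based normalization removes the flakiness.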
[jira] [Updated] (HIVE-25984) TTTT
[ https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lkl updated HIVE-25984: --- Summary: (was: when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the case cause error) > > > > Key: HIVE-25984 > URL: https://issues.apache.org/jira/browse/HIVE-25984 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.0.0, 3.1.1, 3.1.2 >Reporter: lkl >Assignee: lkl >Priority: Major > Fix For: All Versions > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25984) TTTT
[ https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lkl updated HIVE-25984: --- Component/s: (was: Hive) Fix Version/s: (was: All Versions) Affects Version/s: (was: 3.0.0) (was: 3.1.1) (was: 3.1.2) Issue Type: Test (was: Improvement) Priority: Trivial (was: Major) > > > > Key: HIVE-25984 > URL: https://issues.apache.org/jira/browse/HIVE-25984 > Project: Hive > Issue Type: Test >Reporter: lkl >Assignee: lkl >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] (HIVE-25984) TTTT
[ https://issues.apache.org/jira/browse/HIVE-25984 ] lkl deleted comment on HIVE-25984: was (Author: JIRAUSER284773): set hive.auto.convert.join=false; set hive.exec.parallel=true; change param value can run success. > > > > Key: HIVE-25984 > URL: https://issues.apache.org/jira/browse/HIVE-25984 > Project: Hive > Issue Type: Test >Reporter: lkl >Assignee: lkl >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HIVE-25984) when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the case cause error
[ https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lkl updated HIVE-25984: --- Fix Version/s: All Versions Description: (was: {code:java} > set hive.exec.parallel=true; hive> set hive.exec.parallel.thread.number=16; Query ID = hadoop_20220225202936_1afb51d0-ce67-4bc2-9794-8c82b32efe99 Total jobs = 11 Launching Job 1 out of 11 Launching Job 2 out of 11 Launching Job 3 out of 11 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: Number of reduce tasks not specified. Estimated from input data size: 1 set hive.exec.reducers.max= In order to change the average load for a reducer (in bytes): In order to set a constant number of reducers: set hive.exec.reducers.bytes.per.reducer= set mapreduce.job.reduces= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Launching Job 4 out of 11 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Number of reduce tasks not specified. 
Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Starting Job = job_1645755235953_36462, Tracking URL = http://172.21.126.228:5004/proxy/application_1645755235953_36462/ Kill Command = /usr/local/service/hadoop/bin/mapred job -kill job_1645755235953_36462 Starting Job = job_1645755235953_36460, Tracking URL = http://172.21.126.228:5004/proxy/application_1645755235953_36460/ Starting Job = job_1645755235953_36463, Tracking URL = http://172.21.126.228:5004/proxy/application_1645755235953_36463/ Kill Command = /usr/local/service/hadoop/bin/mapred job -kill job_1645755235953_36460 Kill Command = /usr/local/service/hadoop/bin/mapred job -kill job_1645755235953_36463 Starting Job = job_1645755235953_36461, Tracking URL = http://172.21.126.228:5004/proxy/application_1645755235953_36461/ Kill Command = /usr/local/service/hadoop/bin/mapred job -kill job_1645755235953_36461 Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1 2022-02-25 20:29:43,598 Stage-3 map = 0%, reduce = 0% Hadoop job information for Stage-9: number of mappers: 1; number of reducers: 1 2022-02-25 20:29:43,634 Stage-9 map = 0%, reduce = 0% Hadoop job information for Stage-7: number of mappers: 1; number of reducers: 1 2022-02-25 20:29:43,658 Stage-7 map = 0%, reduce = 0% Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2022-02-25 20:29:44,646 Stage-1 map = 0%, reduce = 0% 2022-02-25 20:29:51,767 Stage-9 map = 100%, reduce = 0%, Cumulative CPU 5.29 sec 2022-02-25 20:29:51,782 Stage-7 map = 100%, reduce = 0%, Cumulative CPU 5.45 sec 2022-02-25 20:29:52,750 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 6.06 sec 2022-02-25 20:29:54,835 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.76 sec 2022-02-25 
20:29:58,872 Stage-9 map = 100%, reduce = 100%, Cumulative CPU 7.49 sec 2022-02-25 20:29:58,883 Stage-7 map = 100%, reduce = 100%, Cumulative CPU 8.86 sec 2022-02-25 20:29:59,868 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 9.96 sec MapReduce Total cumulative CPU time: 7 seconds 490 msec Ended Job = job_1645755235953_36463 MapReduce Total cumulative CPU time: 8 seconds 860 msec Ended Job = job_1645755235953_36461 Stage-15 is selected by condition resolver. Stage-8 is filtered out by condition resolver. MapReduce Total cumulative CPU time: 9 seconds 960 msec Ended Job = job_1645755235953_36462 Launching Job 6 out of 11 FAILED: Hive Internal Error: java.util.ConcurrentModificationException(null) java.util.ConcurrentModificationException at java.util.Hashtable$Enumerator.next(Hashtable.java:1387) at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:2910) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.initialize(ExecDriver.java:178) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2649) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) at
[jira] [Assigned] (HIVE-25984) when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the case cause error
[ https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lkl reassigned HIVE-25984: -- Assignee: lkl > when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the > case cause error > -- > > Key: HIVE-25984 > URL: https://issues.apache.org/jira/browse/HIVE-25984 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.0.0, 3.1.1, 3.1.2 >Reporter: lkl >Assignee: lkl >Priority: Major > Fix For: All Versions > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
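Editor's note: the stack trace in the HIVE-25984 description bottoms out in `java.util.Hashtable$Enumerator.next` — with `hive.exec.parallel=true`, one task thread iterates the Hashtable-backed Hadoop `Configuration` while another task thread mutates it. The failure mode can be reproduced in isolation (single-threaded here for determinism; the mutation inside the loop stands in for the concurrent writer):

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Map;

// Hashtable iterators are fail-fast: any structural modification (adding a
// new key) after the iterator is created makes the next call to next() throw
// ConcurrentModificationException, exactly as in the reported stack trace.
public class CmeSketch {
    public static boolean iterateWhileMutating() {
        Hashtable<String, String> props = new Hashtable<>();
        for (int i = 0; i < 100; i++) {
            props.put("k" + i, "v");
        }
        try {
            for (Map.Entry<String, String> e : props.entrySet()) {
                props.put("new-" + e.getKey(), "v");  // stands in for the parallel writer
            }
            return false;
        } catch (ConcurrentModificationException cme) {
            return true;  // the usual fix is to iterate over a snapshot copy instead
        }
    }
}
```

This is why flipping either `hive.auto.convert.join` or `hive.exec.parallel` off makes the query succeed, as the reporter's (later deleted) comment notes: without parallel task launch, no second thread touches the `Configuration` during iteration.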
[jira] [Commented] (HIVE-25970) Missing messages in HS2 operation logs
[ https://issues.apache.org/jira/browse/HIVE-25970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498120#comment-17498120 ] Zoltan Haindrich commented on HIVE-25970: - we just talked with [~zabetak]; and HIVE-24590 makes HIVE-22753 unneccessary - and it may only cause trouble (lost messages) > Missing messages in HS2 operation logs > -- > > Key: HIVE-25970 > URL: https://issues.apache.org/jira/browse/HIVE-25970 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > After HIVE-22753 & HIVE-24590, with some unlucky timing of events, operation > log messages can get lost and never appear in the appropriate files. > The changes in HIVE-22753 will prevent a {{HushableRandomAccessFileAppender}} > from being created if the latter refers to a file that has been closed in the > last second. Preventing the creation of the appender also means that the > message which triggered the creation will be lost forever. In fact any > message (for the same query) that comes in the interval of 1 second will be > lost forever. > Before HIVE-24590 the appender/file was closed only once (explicitly by HS2) > and thus the problem may be very hard to notice in practice. However, with > the arrival of HIVE-24590 appenders may close much more frequently (and not > via HS2) making the issue reproducible rather easily. It suffices to set > _hive.server2.operation.log.purgePolicy.timeToLive_ property very low and > check the operation logs. > The problem was discovered by investigating some intermittent failures in > operation logging tests (e.g., TestOperationLoggingAPIWithTez). -- This message was sent by Atlassian Jira (v8.20.1#820001)
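Editor's note: the interaction described above can be modeled in a few lines. This is an assumed, simplified stand-in — not the real `HushableRandomAccessFileAppender` code — showing why refusing to recreate an appender for a file closed within the last second (HIVE-22753) drops every event that arrives in that window, and why HIVE-24590's purge policy, by closing appenders far more often, makes the loss easy to observe:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the ticket's race: a closed-file cache with a
// one-second cool-down during which appender creation is refused.
public class OperationLogModel {
    private final Map<String, Long> recentlyClosed = new HashMap<>();
    private int lostMessages = 0;

    public void closeAppender(String file, long nowMs) {
        recentlyClosed.put(file, nowMs);
    }

    /** Returns true if the event reached the file, false if it was lost. */
    public boolean append(String file, long nowMs) {
        Long closedAt = recentlyClosed.get(file);
        if (closedAt != null && nowMs - closedAt < 1000L) {
            lostMessages++;           // appender creation refused -> event lost forever
            return false;
        }
        recentlyClosed.remove(file);  // recreate the appender and write normally
        return true;
    }

    public int getLostMessages() { return lostMessages; }
}
```

Lowering `hive.server2.operation.log.purgePolicy.timeToLive`, as the ticket suggests, is equivalent to calling `closeAppender` very frequently in this model, which maximizes the chance that a log event lands inside the one-second refusal window.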
[jira] [Updated] (HIVE-25984) when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the case cause error
[ https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lkl updated HIVE-25984: --- Description: {code:java} > set hive.exec.parallel=true; hive> set hive.exec.parallel.thread.number=16; Query ID = hadoop_20220225202936_1afb51d0-ce67-4bc2-9794-8c82b32efe99 Total jobs = 11 Launching Job 1 out of 11 Launching Job 2 out of 11 Launching Job 3 out of 11 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: Number of reduce tasks not specified. Estimated from input data size: 1 set hive.exec.reducers.max= In order to change the average load for a reducer (in bytes): In order to set a constant number of reducers: set hive.exec.reducers.bytes.per.reducer= set mapreduce.job.reduces= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Launching Job 4 out of 11 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Number of reduce tasks not specified. 
Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Starting Job = job_1645755235953_36462, Tracking URL = http://172.21.126.228:5004/proxy/application_1645755235953_36462/ Kill Command = /usr/local/service/hadoop/bin/mapred job -kill job_1645755235953_36462 Starting Job = job_1645755235953_36460, Tracking URL = http://172.21.126.228:5004/proxy/application_1645755235953_36460/ Starting Job = job_1645755235953_36463, Tracking URL = http://172.21.126.228:5004/proxy/application_1645755235953_36463/ Kill Command = /usr/local/service/hadoop/bin/mapred job -kill job_1645755235953_36460 Kill Command = /usr/local/service/hadoop/bin/mapred job -kill job_1645755235953_36463 Starting Job = job_1645755235953_36461, Tracking URL = http://172.21.126.228:5004/proxy/application_1645755235953_36461/ Kill Command = /usr/local/service/hadoop/bin/mapred job -kill job_1645755235953_36461 Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1 2022-02-25 20:29:43,598 Stage-3 map = 0%, reduce = 0% Hadoop job information for Stage-9: number of mappers: 1; number of reducers: 1 2022-02-25 20:29:43,634 Stage-9 map = 0%, reduce = 0% Hadoop job information for Stage-7: number of mappers: 1; number of reducers: 1 2022-02-25 20:29:43,658 Stage-7 map = 0%, reduce = 0% Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2022-02-25 20:29:44,646 Stage-1 map = 0%, reduce = 0% 2022-02-25 20:29:51,767 Stage-9 map = 100%, reduce = 0%, Cumulative CPU 5.29 sec 2022-02-25 20:29:51,782 Stage-7 map = 100%, reduce = 0%, Cumulative CPU 5.45 sec 2022-02-25 20:29:52,750 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 6.06 sec 2022-02-25 20:29:54,835 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.76 sec 2022-02-25 
20:29:58,872 Stage-9 map = 100%, reduce = 100%, Cumulative CPU 7.49 sec 2022-02-25 20:29:58,883 Stage-7 map = 100%, reduce = 100%, Cumulative CPU 8.86 sec 2022-02-25 20:29:59,868 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 9.96 sec MapReduce Total cumulative CPU time: 7 seconds 490 msec Ended Job = job_1645755235953_36463 MapReduce Total cumulative CPU time: 8 seconds 860 msec Ended Job = job_1645755235953_36461 Stage-15 is selected by condition resolver. Stage-8 is filtered out by condition resolver. MapReduce Total cumulative CPU time: 9 seconds 960 msec Ended Job = job_1645755235953_36462 Launching Job 6 out of 11 FAILED: Hive Internal Error: java.util.ConcurrentModificationException(null) java.util.ConcurrentModificationException at java.util.Hashtable$Enumerator.next(Hashtable.java:1387) at org.apache.hadoop.conf.Configuration.iterator(Configuration.java:2910) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.initialize(ExecDriver.java:178) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2649) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157) at
[jira] [Commented] (HIVE-25984) when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the case cause error
[ https://issues.apache.org/jira/browse/HIVE-25984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498113#comment-17498113 ] lkl commented on HIVE-25984: set hive.auto.convert.join=false; set hive.exec.parallel=true; change param value can run success. > when set hive.auto.convert.join=true; and set hive.exec.parallel=true; in the > case cause error > -- > > Key: HIVE-25984 > URL: https://issues.apache.org/jira/browse/HIVE-25984 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 3.0.0, 3.1.1, 3.1.2 >Reporter: lkl >Priority: Major > > {code:java} > > set hive.exec.parallel=true; > hive> set hive.exec.parallel.thread.number=16; > hive> ADD JAR > ofs://f4muzj1eelr-SyDy.chdfs.ap-beijing.myqcloud.com/datam/dota-archive-ningxia/dota/emr-steps/bigdata-dw-udf-0.0.1-SNAPSHOT-jar-with-dependencies.jar; > Added > [/data/emr/hive/tmp/2fbfd169-5bd0-4a63-922a-a25e88737375_resources/bigdata-dw-udf-0.0.1-SNAPSHOT-jar-with-dependencies.jar] > to class path > Added resources: > [ofs://f4muzj1eelr-SyDy.chdfs.ap-beijing.myqcloud.com/datam/dota-archive-ningxia/dota/emr-steps/bigdata-dw-udf-0.0.1-SNAPSHOT-jar-with-dependencies.jar] > hive> > > --INSERT OVERWRITE TABLE mgdm.dm_log_weixin_sdk_playtime_hour > PARTITION(pday=20220212,phour='08',pbid='weixin') > > select > > a.ip as ip, -- ip > > a.isp_id as isp_id, -- 运营商ID > > a.isp as isp, -- 运营商名称 > > a.country_id as country_id, -- 国家id > > a.country as country, -- 国家名称 > > a.is_domestic as is_domestic, -- > > a.province_id as province_id, -- 省份ID > > a.province as province, -- 省份名称 > > a.city_id as city_id, -- 城市ID > > a.city as city, -- 城市名称 > > a.did , > > a.sessionid , > > a.uuid , > > a.uvip , > > a.url , > > a.ver , > > a.suuid , > > a.termid , > > a.pix , > > a.bid , > > a.sdkver , > > a.`from` , > > a.pay , > > a.pt , > > a.cpt , > > a.plid , > > a.istry , > > a.def , > > a.ap , > > a.pstatus , > > a.cdnip , > > a.cp , > > a.bdid , > > a.bsid , > > a.cf , > > a.cid , > > a.idx 
, > > a.vts , > > a.td , > > a.unionid , > > a.src , > > a.ct , > > a.ht , > > a.clip_id , > > a.part_id , > > a.class_id , > > a.is_full , > > a.duration , > > IF(b.play_time>4000, 4000, IF(b.play_time > 0, b.play_time, 0)) > > as playtime, -- 播放时长 > > current_timestamp() as fetl_time -- etl时间 > > from (select a.* > > from (select a.* > > from (select a.*, > > row_number() over(partition by suuid, > pday, phour order by event_time desc) rn > > from mgdw.dw_log_weixin_sdk_hb_hour a > > where pday = 20220212 > > and phour = '08' > > and pbid = 'weixin' > > and suuid is not null > > and logtype='hb') a > > where rn = 1) a) a > > left join (select a.pday, > > a.phour, > > a.suuid, > > ceil(a.play_hb_time - coalesce(buffer_play_time, > 0)) as play_time > > from (select a.pday, > > a.phour, > > a.suuid, > > sum(play_hb_time) as play_hb_time > > from (select a.pday, > > a.phour, > > a.suuid, > > case > > when idx = min_idx then > > if(unix_timestamp(event_time) - > > unix_timestamp(min_stime) > > hb_time, > > hb_time, > > unix_timestamp(event_time) - > > unix_timestamp(min_stime)) > > when idx = max_idx then > > if(unix_timestamp(event_time) - > > unix_timestamp(pre_time) > > hb_time, > > hb_time, > >
[jira] [Commented] (HIVE-24905) only CURRENT ROW end frame is supported for RANGE
[ https://issues.apache.org/jira/browse/HIVE-24905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498059#comment-17498059 ] Stamatis Zampetakis commented on HIVE-24905: Since {{RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING}} is not vectorized at the moment there is a hack in [ASTConverter|https://github.com/apache/hive/blob/2a1a73f665eee497ebdb0745ab2c31c1614de017/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java#L724] to transform RANGE to ROWS (i.e.,g {{ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING}}) when the window is unbounded since they are equivalent. When this issue is resolved we could remove the respective code in ASTConverter. > only CURRENT ROW end frame is supported for RANGE > - > > Key: HIVE-24905 > URL: https://issues.apache.org/jira/browse/HIVE-24905 > Project: Hive > Issue Type: Sub-task >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > > This one is about to take care of vectorizing the FOLLOWING rows case: > {code} > avg(p_retailprice) over(partition by p_mfgr order by p_date range between 1 > preceding and 3 following) as avg1, > {code} > {code} > Reduce Vectorization: > enabled: true > enableConditionsMet: hive.vectorized.execution.reduce.enabled > IS true, hive.execution.engine tez IN [tez, spark] IS true > notVectorizedReason: PTF operator: count only CURRENT ROW end > frame is supported for RANGE > vectorized: false > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
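Editor's note: the safety of the ASTConverter rewrite mentioned above rests on standard SQL frame semantics — with ties in the ORDER BY column, `RANGE ... CURRENT ROW` and `ROWS ... CURRENT ROW` produce different results (RANGE includes all peer rows), but the fully unbounded frames both cover the entire partition, so rewriting RANGE to ROWS there cannot change the answer. A small computational sketch (assumed semantics, not Hive's vectorized code):

```java
import java.util.Arrays;

// Running SUM over one partition under the different frame clauses.
public class FrameEquivalence {
    // ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: physical prefix sum.
    public static double[] rowsCurrentRow(double[] orderKeys, double[] vals) {
        double[] out = new double[vals.length];
        double sum = 0;
        for (int i = 0; i < vals.length; i++) {
            sum += vals[i];
            out[i] = sum;
        }
        return out;
    }

    // RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: includes every peer
    // row whose order key equals the current row's key.
    public static double[] rangeCurrentRow(double[] orderKeys, double[] vals) {
        double[] out = new double[vals.length];
        for (int i = 0; i < vals.length; i++) {
            double sum = 0;
            for (int j = 0; j < vals.length; j++) {
                if (orderKeys[j] <= orderKeys[i]) {
                    sum += vals[j];
                }
            }
            out[i] = sum;
        }
        return out;
    }

    // UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: whole-partition total for
    // every row, identical whether the frame is declared as ROWS or RANGE.
    public static double[] unbounded(double[] vals) {
        double total = 0;
        for (double v : vals) {
            total += v;
        }
        double[] out = new double[vals.length];
        Arrays.fill(out, total);
        return out;
    }
}
```

On a partition with tied order keys, `rowsCurrentRow` and `rangeCurrentRow` disagree (which is why the CURRENT ROW case needs real RANGE support), while `unbounded` is frame-kind-independent — the equivalence the ASTConverter hack relies on.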
[jira] [Work logged] (HIVE-25981) Avoid checking for archived parts in analyze table
[ https://issues.apache.org/jira/browse/HIVE-25981?focusedWorklogId=732984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732984 ] ASF GitHub Bot logged work on HIVE-25981: - Author: ASF GitHub Bot Created on: 25/Feb/22 10:03 Start Date: 25/Feb/22 10:03 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #3052: URL: https://github.com/apache/hive/pull/3052#discussion_r814635583 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13131,7 +13131,8 @@ public void validate() throws SemanticException { LOG.debug("validated " + usedp.getName()); LOG.debug(usedp.getTable().getTableName()); WriteEntity.WriteType writeType = writeEntity.getWriteType(); - if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE) { + if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE && writeType != WriteType.DDL_NO_LOCK Review comment: Yeah, checking `AcidUtils.isTransactionalTable(tbl)` seems more reasonable than depending on the `WriteType` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 732984) Time Spent: 50m (was: 40m) > Avoid checking for archived parts in analyze table > -- > > Key: HIVE-25981 > URL: https://issues.apache.org/jira/browse/HIVE-25981 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Analyze table on large partitioned table is expensive due to unwanted checks > on archived data. 
> > {noformat} > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3908) > - locked <0x0003d4c4c070> (a > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler) > at com.sun.proxy.$Proxy56.listPartitionsWithAuthInfo(Unknown Source) > at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:3845) > at > org.apache.hadoop.hive.ql.exec.ArchiveUtils.conflictingArchiveNameOrNull(ArchiveUtils.java:299) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:13579) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:241) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:196) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:615) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:561) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:555) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204) > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265) > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:285) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25981) Avoid checking for archived parts in analyze table
[ https://issues.apache.org/jira/browse/HIVE-25981?focusedWorklogId=732983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732983 ] ASF GitHub Bot logged work on HIVE-25981: - Author: ASF GitHub Bot Created on: 25/Feb/22 10:00 Start Date: 25/Feb/22 10:00 Worklog Time Spent: 10m Work Description: rbalamohan commented on a change in pull request #3052: URL: https://github.com/apache/hive/pull/3052#discussion_r814632395 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13131,7 +13131,8 @@ public void validate() throws SemanticException { LOG.debug("validated " + usedp.getName()); LOG.debug(usedp.getTable().getTableName()); WriteEntity.WriteType writeType = writeEntity.getWriteType(); - if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE) { + if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE && writeType != WriteType.DDL_NO_LOCK Review comment: I am fine with removing this completely for "transactionalInQuery", given this is never used. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 732983) Time Spent: 40m (was: 0.5h) > Avoid checking for archived parts in analyze table > -- > > Key: HIVE-25981 > URL: https://issues.apache.org/jira/browse/HIVE-25981 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Analyze table on large partitioned table is expensive due to unwanted checks > on archived data. 
> > {noformat} > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3908) > - locked <0x0003d4c4c070> (a > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler) > at com.sun.proxy.$Proxy56.listPartitionsWithAuthInfo(Unknown Source) > at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:3845) > at > org.apache.hadoop.hive.ql.exec.ArchiveUtils.conflictingArchiveNameOrNull(ArchiveUtils.java:299) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:13579) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:241) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:196) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:615) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:561) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:555) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204) > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265) > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:285) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work logged] (HIVE-25981) Avoid checking for archived parts in analyze table
[ https://issues.apache.org/jira/browse/HIVE-25981?focusedWorklogId=732947=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732947 ] ASF GitHub Bot logged work on HIVE-25981: - Author: ASF GitHub Bot Created on: 25/Feb/22 08:57 Start Date: 25/Feb/22 08:57 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #3052: URL: https://github.com/apache/hive/pull/3052#discussion_r814586334 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ## @@ -13131,7 +13131,8 @@ public void validate() throws SemanticException { LOG.debug("validated " + usedp.getName()); LOG.debug(usedp.getTable().getTableName()); WriteEntity.WriteType writeType = writeEntity.getWriteType(); - if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE) { + if (writeType != WriteType.UPDATE && writeType != WriteType.DELETE && writeType != WriteType.DDL_NO_LOCK Review comment: Why we think that the WriteType defines whether we need to check for archived parts, or not? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 732947) Time Spent: 0.5h (was: 20m) > Avoid checking for archived parts in analyze table > -- > > Key: HIVE-25981 > URL: https://issues.apache.org/jira/browse/HIVE-25981 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Rajesh Balamohan >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Analyze table on large partitioned table is expensive due to unwanted checks > on archived data. 
> > {noformat} > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3908) > - locked <0x0003d4c4c070> (a > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler) > at com.sun.proxy.$Proxy56.listPartitionsWithAuthInfo(Unknown Source) > at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:3845) > at > org.apache.hadoop.hive.ql.exec.ArchiveUtils.conflictingArchiveNameOrNull(ArchiveUtils.java:299) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.validate(SemanticAnalyzer.java:13579) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:241) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:196) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:615) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:561) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:555) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:127) > at > org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204) > at > org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265) > at > org.apache.hive.service.cli.operation.Operation.run(Operation.java:285) > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
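Editor's note: the reviewers in the thread above converge on gating the expensive archived-partition validation on whether the table is transactional, rather than on the `WriteEntity`'s `WriteType`. A hedged sketch of that shape — Hive's real check is `AcidUtils.isTransactionalTable`; this stand-alone version only mimics its table-property lookup, under the assumption (implicit in the PR discussion) that archiving does not apply to ACID tables:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative gate for SemanticAnalyzer.validate(): skip the costly
// listPartitionsWithAuthInfo round-trip for transactional tables.
public class ArchiveCheckGate {
    public static boolean isTransactionalTable(Map<String, String> tableParams) {
        return Boolean.parseBoolean(tableParams.getOrDefault("transactional", "false"));
    }

    public static boolean shouldCheckArchivedPartitions(Map<String, String> tableParams) {
        // Assumption from the ticket: ACID tables cannot hold archived
        // partitions, so conflictingArchiveNameOrNull need not run for them.
        return !isTransactionalTable(tableParams);
    }
}
```

Compared with enumerating `WriteType` values to exclude, keying the decision off a table property is robust to new write types being added later — the design point `pvary` raises in the review.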
[jira] [Work logged] (HIVE-25980) Support HiveMetaStoreChecker.checkTable operation with multi-threaded
[ https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=732944=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-732944 ] ASF GitHub Bot logged work on HIVE-25980: - Author: ASF GitHub Bot Created on: 25/Feb/22 08:54 Start Date: 25/Feb/22 08:54 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #3053: URL: https://github.com/apache/hive/pull/3053#discussion_r814583976 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java ## @@ -303,56 +304,132 @@ void checkTable(Table table, PartitionIterable parts, byte[] filterExp, CheckRes if (tablePath == null) { return; } -FileSystem fs = tablePath.getFileSystem(conf); -if (!fs.exists(tablePath)) { +final FileSystem[] fs = {tablePath.getFileSystem(conf)}; +if (!fs[0].exists(tablePath)) { result.getTablesNotOnFs().add(table.getTableName()); return; } Set partPaths = new HashSet<>(); -// check that the partition folders exist on disk -for (Partition partition : parts) { - if (partition == null) { -// most likely the user specified an invalid partition -continue; - } - Path partPath = getDataLocation(table, partition); - if (partPath == null) { -continue; - } - fs = partPath.getFileSystem(conf); +int threadCount = MetastoreConf.getIntVar(conf, MetastoreConf.ConfVars.METASTORE_MSCK_FS_HANDLER_THREADS_COUNT); + +final ExecutorService pool = (threadCount > 1) ? 
+Executors.newFixedThreadPool(threadCount, +new ThreadFactoryBuilder() +.setDaemon(true) +.setNameFormat("CheckTable-PartitionOptimizer-%d").build()) : null; - CheckResult.PartitionResult prFromMetastore = new CheckResult.PartitionResult(); - prFromMetastore.setPartitionName(getPartitionName(table, partition)); - prFromMetastore.setTableName(partition.getTableName()); - if (!fs.exists(partPath)) { -result.getPartitionsNotOnFs().add(prFromMetastore); +try { + Queue> futures = new LinkedList<>(); + if (pool != null) { +// check that the partition folders exist on disk using multi-thread +for (Partition partition : parts) { Review comment: I think this will fetch all of the partitions from the partition iterator immediately and keep them in memory. The goal was with the partition iterator to prevent OOM when there are big tables with huge number of partitions. We do not want every partition in the memory once, so the iterator fetched them in batches, and after we did not use them we let the GC take care of the batch. With this change I expect that we create a `Future` immediately for all of the partitions and we will keep all of the partitions in memory until all of the checks are finished. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 732944) Time Spent: 20m (was: 10m) > Support HiveMetaStoreChecker.checkTable operation with multi-threaded > - > > Key: HIVE-25980 > URL: https://issues.apache.org/jira/browse/HIVE-25980 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Affects Versions: 3.1.2, 4.0.0 >Reporter: Chiran Ravani >Assignee: Chiran Ravani >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > MSCK Repair table for high partition table can perform slow on Cloud Storage > such as S3, one of the case we found where slowness was observed in > HiveMetaStoreChecker.checkTable. > {code:java} > "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 > tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000] >java.lang.Thread.State: RUNNABLE > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at > sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464) > at > sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68) > at >
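Editor's note: one way to address the review concern above — that submitting a `Future` per partition up front pins every partition from the iterator in memory — is to drain the iterator in bounded batches, so at most `batchSize` partitions (and their futures) are live at any time. This is an assumed sketch, not the PR's code; the `String` stand-in replaces Hive's `Partition`, and the predicate stands in for the `fs.exists(partPath)` check:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Bounded-batch parallel check: preserves PartitionIterable's memory benefit
// (batches are released before more partitions are fetched) while still
// parallelizing the per-partition filesystem existence checks.
public class BoundedPartitionChecker {
    public static int countMissing(Iterator<String> partitions, int batchSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int missing = 0;
        try {
            List<Future<Boolean>> batch = new ArrayList<>(batchSize);
            while (partitions.hasNext()) {
                String part = partitions.next();
                // Stand-in for fs.exists(getDataLocation(table, partition)).
                batch.add(pool.submit(() -> part.startsWith("ok")));
                if (batch.size() == batchSize || !partitions.hasNext()) {
                    for (Future<Boolean> f : batch) {
                        if (!f.get()) {
                            missing++;  // would go into result.getPartitionsNotOnFs()
                        }
                    }
                    batch.clear();      // release the batch before fetching more
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return missing;
    }
}
```

Draining each batch before pulling more from the iterator caps live memory at `batchSize` partitions, which is the property the original single-threaded loop had and the naive submit-everything-first approach loses.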