[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459665 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 16/Jul/20 05:24 Start Date: 16/Jul/20 05:24 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r455506934 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java ## @@ -274,6 +277,120 @@ public void testAcidTablesBootstrapWithOpenTxnsTimeout() throws Throwable { verifyCompactionQueue(tables, replicatedDbName, replicaConf); } + @Test + public void testAcidTablesBootstrapWithOpenTxnsDiffDb() throws Throwable { Review comment: Yes, the cases are getting tested in isolation. We should also have them in the one test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459665) Time Spent: 2h (was: 1h 50m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize > bootstrap dump to avoid aborting all transactions.pdf > > Time Spent: 2h > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. 
> This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
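The behavior proposed above (abort only write transactions of the db under replication, leaving read-only and repl-created txns open) can be sketched as a filter over open transactions. This is a minimal, self-contained model: the `TxnType` enum values mirror Hive's, but the `OpenTxn` record and `txnsToAbort` helper are illustrative stand-ins, not the actual metastore API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AbortFilterSketch {
    // Simplified stand-in for org.apache.hadoop.hive.metastore.api.TxnType.
    enum TxnType { DEFAULT, REPL_CREATED, READ_ONLY, COMPACTION }

    // Hypothetical view of an open transaction: id, type, and target db.
    record OpenTxn(long id, TxnType type, String dbName) {}

    // Keep only write transactions touching the db under replication;
    // read-only and repl-created txns are left open, per the proposal.
    static List<Long> txnsToAbort(List<OpenTxn> open, String replDb) {
        return open.stream()
            .filter(t -> t.type() != TxnType.READ_ONLY
                      && t.type() != TxnType.REPL_CREATED
                      && replDb.equalsIgnoreCase(t.dbName()))
            .map(OpenTxn::id)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<OpenTxn> open = Arrays.asList(
            new OpenTxn(1, TxnType.DEFAULT, "salesdb"),
            new OpenTxn(2, TxnType.READ_ONLY, "salesdb"),
            new OpenTxn(3, TxnType.REPL_CREATED, "salesdb"),
            new OpenTxn(4, TxnType.DEFAULT, "otherdb"));
        // Only txn 1 is a write txn on the db under replication.
        System.out.println(txnsToAbort(open, "salesdb")); // [1]
    }
}
```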
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459663=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459663 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 16/Jul/20 05:14 Start Date: 16/Jul/20 05:14 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r455514405 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java ## @@ -274,6 +277,120 @@ public void testAcidTablesBootstrapWithOpenTxnsTimeout() throws Throwable { verifyCompactionQueue(tables, replicatedDbName, replicaConf); } + @Test Review comment: testAcidTablesBootstrapDuringIncrementalWithOpenTxnsTimeout this test is already added. Is there anything else missing? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459663) Time Spent: 1h 50m (was: 1h 40m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize > bootstrap dump to avoid aborting all transactions.pdf > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=459662=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459662 ] ASF GitHub Bot logged work on HIVE-23727: - Author: ASF GitHub Bot Created on: 16/Jul/20 05:13 Start Date: 16/Jul/20 05:13 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1149: URL: https://github.com/apache/hive/pull/1149 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459662) Time Spent: 3h 20m (was: 3h 10m) > Improve SQLOperation log handling when cancel background > > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Minor > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
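The dead-branch argument in the issue description can be demonstrated with a toy version of the guard. This is not the actual `SQLOperation` code, just a hypothetical model showing that a log statement conditioned on `state == CANCELED` inside a block guarded by `state != CANCELED` can never execute.

```java
public class CancelGuardSketch {
    // Simplified stand-in for Hive's OperationState.
    enum OperationState { RUNNING, CANCELED, TIMEDOUT, FINISHED }

    // Mirrors the guard quoted in the issue: the body only runs when the
    // state is neither CANCELED nor TIMEDOUT, so a CANCELED-only branch
    // inside it is unreachable.
    static String cancelBackground(boolean shouldRunAsync, OperationState state) {
        if (shouldRunAsync && state != OperationState.CANCELED
                           && state != OperationState.TIMEDOUT) {
            if (state == OperationState.CANCELED) {
                return "logged-cancel"; // dead branch: guard already excluded CANCELED
            }
            return "background-cancelled";
        }
        return "no-op";
    }

    public static void main(String[] args) {
        System.out.println(cancelBackground(true, OperationState.RUNNING));  // background-cancelled
        System.out.println(cancelBackground(true, OperationState.CANCELED)); // no-op
    }
}
```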
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=459661=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459661 ] ASF GitHub Bot logged work on HIVE-23727: - Author: ASF GitHub Bot Created on: 16/Jul/20 05:13 Start Date: 16/Jul/20 05:13 Worklog Time Spent: 10m Work Description: dengzhhu653 closed pull request #1149: URL: https://github.com/apache/hive/pull/1149 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459661) Time Spent: 3h 10m (was: 3h) > Improve SQLOperation log handling when cancel background > > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Minor > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459654 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 16/Jul/20 04:44 Start Date: 16/Jul/20 04:44 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r455506934 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java ## @@ -274,6 +277,120 @@ public void testAcidTablesBootstrapWithOpenTxnsTimeout() throws Throwable { verifyCompactionQueue(tables, replicatedDbName, replicaConf); } + @Test + public void testAcidTablesBootstrapWithOpenTxnsDiffDb() throws Throwable { Review comment: Yes, the cases are getting tested in isolation. We should also have them in the one test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459654) Time Spent: 1h 40m (was: 1.5h) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize > bootstrap dump to avoid aborting all transactions.pdf > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. 
> This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459641=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459641 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 16/Jul/20 04:19 Start Date: 16/Jul/20 04:19 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r455500572 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java ## @@ -274,6 +277,120 @@ public void testAcidTablesBootstrapWithOpenTxnsTimeout() throws Throwable { verifyCompactionQueue(tables, replicatedDbName, replicaConf); } + @Test + public void testAcidTablesBootstrapWithOpenTxnsDiffDb() throws Throwable { Review comment: for primary db abort txns there is already a different test. The txns gets aborted. This test is for secondary db which is not under replication This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459641) Time Spent: 1.5h (was: 1h 20m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize > bootstrap dump to avoid aborting all transactions.pdf > > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. 
> This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
[ https://issues.apache.org/jira/browse/HIVE-23726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam resolved HIVE-23726. -- Fix Version/s: 4.0.0 Resolution: Fixed Fix has been committed to master. Thanks for the review [~samuelan] > Create table may throw > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > -- > > Key: HIVE-23726 > URL: https://issues.apache.org/jira/browse/HIVE-23726 > Project: Hive > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > - Given: > metastore.warehouse.tenant.colocation is set to true > a test database was created as {{create database test location '/data'}} > - When: > I try to create a table as {{create table t1 (a int) location '/data/t1'}} > - Then: > The create table fails with the following exception: > {code} > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at 
org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: > java.lang.IllegalArgumentException: Can not create a Path from a null string > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293) > at > 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63219) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1780) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1767) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3518) > at >
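The failure mode above (a table path derived from a database whose managed location is null when the db was created with only a custom location) can be modeled without Hadoop. This is a hypothetical sketch of the guard-plus-fallback idea, not the actual metastore fix; `tablePath` and its parameters are illustrative, and the null check stands in for Hadoop's `new Path(null)` throwing `IllegalArgumentException`.

```java
import java.util.Objects;

public class TableLocationSketch {
    // Hypothetical model: derive a table path from the db's managed
    // location, falling back to its external location when the managed
    // one is null (as after: create database test location '/data').
    static String tablePath(String dbManagedLocation, String dbExternalLocation,
                            String tableName) {
        String base = dbManagedLocation != null ? dbManagedLocation
                                                : dbExternalLocation;
        // Stand-in for the "Can not create a Path from a null string" failure.
        Objects.requireNonNull(base, "Can not create a Path from a null string");
        return base + "/" + tableName;
    }

    public static void main(String[] args) {
        // Managed location is null; the fallback keeps table creation working.
        System.out.println(tablePath(null, "/data", "t1")); // /data/t1
    }
}
```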
[jira] [Work logged] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
[ https://issues.apache.org/jira/browse/HIVE-23726?focusedWorklogId=459632=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459632 ] ASF GitHub Bot logged work on HIVE-23726: - Author: ASF GitHub Bot Created on: 16/Jul/20 03:47 Start Date: 16/Jul/20 03:47 Worklog Time Spent: 10m Work Description: nrg4878 closed pull request #1198: URL: https://github.com/apache/hive/pull/1198 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459632) Time Spent: 20m (was: 10m) > Create table may throw > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > -- > > Key: HIVE-23726 > URL: https://issues.apache.org/jira/browse/HIVE-23726 > Project: Hive > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > - Given: > metastore.warehouse.tenant.colocation is set to true > a test database was created as {{create database test location '/data'}} > - When: > I try to create a table as {{create table t1 (a int) location '/data/t1'}} > - Then: > The create table fails with the following exception: > {code} > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98) > at 
org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: > java.lang.IllegalArgumentException: Can not create a Path from a null string > at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293) > at >
[jira] [Work logged] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
[ https://issues.apache.org/jira/browse/HIVE-23726?focusedWorklogId=459633=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459633 ] ASF GitHub Bot logged work on HIVE-23726: - Author: ASF GitHub Bot Created on: 16/Jul/20 03:47 Start Date: 16/Jul/20 03:47 Worklog Time Spent: 10m Work Description: nrg4878 commented on pull request #1198: URL: https://github.com/apache/hive/pull/1198#issuecomment-659140632 committed to master This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459633) Time Spent: 0.5h (was: 20m) > Create table may throw > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > -- > > Key: HIVE-23726 > URL: https://issues.apache.org/jira/browse/HIVE-23726 > Project: Hive > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > - Given: > metastore.warehouse.tenant.colocation is set to true > a test database was created as {{create database test location '/data'}} > - When: > I try to create a table as {{create table t1 (a int) location '/data/t1'}} > - Then: > The create table fails with the following exception: > {code} > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148) > at > 
org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: 
org.apache.hadoop.hive.metastore.api.MetaException: > java.lang.IllegalArgumentException: Can not create a Path from a null string > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293) > at
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459630 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 16/Jul/20 03:40 Start Date: 16/Jul/20 03:40 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r455491023 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -986,7 +1019,8 @@ String getValidTxnListForReplDump(Hive hiveDb, long waitUntilTime) throws HiveEx // phase won't be able to replicate those txns. So, the logic is to wait for the given amount // of time to see if all open txns < current txn is getting aborted/committed. If not, then // we forcefully abort those txns just like AcidHouseKeeperService. -ValidTxnList validTxnList = getTxnMgr().getValidTxns(); +//Exclude readonly and repl created transactions +ValidTxnList validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED)); Review comment: Sorry, this comment was for Line number:1036. And I am referring to the list of excludes: Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459630) Time Spent: 1h 20m (was: 1h 10m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize > bootstrap dump to avoid aborting all transactions.pdf > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459628 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 16/Jul/20 03:38 Start Date: 16/Jul/20 03:38 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r455490444 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/BaseReplicationScenariosAcidTables.java ## @@ -336,18 +346,30 @@ void verifyInc2Load(String dbName, String lastReplId) return txns; } - void allocateWriteIdsForTables(String primaryDbName, Map tables, - TxnStore txnHandler, - List txns, HiveConf primaryConf) throws Throwable { + List allocateWriteIdsForTablesAndAquireLocks(String primaryDbName, Map tables, + TxnStore txnHandler, + List txns, HiveConf primaryConf) throws Throwable { AllocateTableWriteIdsRequest rqst = new AllocateTableWriteIdsRequest(); rqst.setDbName(primaryDbName); - +List lockIds = new ArrayList<>(); for(Map.Entry entry : tables.entrySet()) { rqst.setTableName(entry.getKey()); rqst.setTxnIds(txns); txnHandler.allocateTableWriteIds(rqst); + for (long txnId : txns) { +LockComponent comp = new LockComponent(LockType.SHARED_WRITE, LockLevel.TABLE, + primaryDbName); +comp.setTablename(entry.getKey()); +comp.setOperationType(DataOperationType.UPDATE); +List components = new ArrayList(1); +components.add(comp); +LockRequest lockRequest = new LockRequest(components, "u1", "hostname"); +lockRequest.setTxnid(txnId); +lockIds.add(txnHandler.lock(lockRequest).getLockid()); Review comment: because if you just do a open txn, it will not acquire a lock This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459628) Time Spent: 1h 10m (was: 1h) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize > bootstrap dump to avoid aborting all transactions.pdf > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
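The point in the review reply above — opening a transaction by itself records no lock, so the test must explicitly request a SHARED_WRITE table lock before the txn counts as a write txn on that db — can be modeled in isolation. The classes and method names below are illustrative stand-ins, not Hive's `TxnStore`/`LockRequest` API.

```java
import java.util.ArrayList;
import java.util.List;

public class TxnLockSketch {
    // Hypothetical lock record: which txn holds a (write) lock on which table.
    record Lock(long txnId, String db, String table, boolean write) {}

    private final List<Lock> locks = new ArrayList<>();

    // Opening a txn alone records no lock, mirroring why the test has to
    // call txnHandler.lock(...) after allocating write ids.
    long openTxn(long id) { return id; }

    void lockSharedWrite(long txnId, String db, String table) {
        locks.add(new Lock(txnId, db, table, true));
    }

    boolean isWriteTxnOnDb(long txnId, String db) {
        return locks.stream().anyMatch(l ->
            l.txnId() == txnId && l.db().equals(db) && l.write());
    }

    public static void main(String[] args) {
        TxnLockSketch s = new TxnLockSketch();
        long t = s.openTxn(42);
        System.out.println(s.isWriteTxnOnDb(t, "primarydb")); // false: no lock yet
        s.lockSharedWrite(t, "primarydb", "t1");
        System.out.println(s.isWriteTxnOnDb(t, "primarydb")); // true
    }
}
```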
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459627 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 16/Jul/20 03:37 Start Date: 16/Jul/20 03:37 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r455490260 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -999,20 +1033,27 @@ String getValidTxnListForReplDump(Hive hiveDb, long waitUntilTime) throws HiveEx } catch (InterruptedException e) { LOG.info("REPL DUMP thread sleep interrupted", e); } - validTxnList = getTxnMgr().getValidTxns(); + validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED)); } // After the timeout just force abort the open txns -List openTxns = getOpenTxns(validTxnList); -if (!openTxns.isEmpty()) { - hiveDb.abortTransactions(openTxns); - validTxnList = getTxnMgr().getValidTxns(); - if (validTxnList.getMinOpenTxn() != null) { -openTxns = getOpenTxns(validTxnList); -LOG.warn("REPL DUMP unable to force abort all the open txns: {} after timeout due to unknown reasons. " + -"However, this is rare case that shouldn't happen.", openTxns); -throw new IllegalStateException("REPL DUMP triggered abort txns failed for unknown reasons."); +if (conf.getBoolVar(REPL_BOOTSTRAP_DUMP_ABORT_WRITE_TXN_AFTER_TIMEOUT)) { + List openTxns = getOpenTxns(validTxnList, work.dbNameOrPattern); + if (!openTxns.isEmpty()) { +//abort only write transactions for the db under replication if abort transactions is enabled. +hiveDb.abortTransactions(openTxns); +validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED)); Review comment: If we use the already obtained validTxnList we won't know if there are still open txns. 
This is to check that all txns that were previously open have been aborted and are no longer part of the invalid txn list. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459627) Time Spent: 1h (was: 50m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize > bootstrap dump to avoid aborting all transactions.pdf > > Time Spent: 1h > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
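The behaviour the reviewers are discussing above — abort only the write transactions of the db under replication, leaving READ_ONLY and REPL_CREATED txns open — boils down to a type-and-db filter over the open txns. A minimal standalone sketch follows; the enum mirrors the five TxnType values mentioned in the review (names approximate), and the Txn record and helper method are illustrative, not Hive's actual API:

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.stream.Collectors;

public class TxnFilterSketch {
    // Stand-in for Hive's TxnType enum ("set of 5 enum values" per the review);
    // names are approximate.
    enum TxnType { DEFAULT, REPL_CREATED, READ_ONLY, COMPACTION, MATER_VIEW_REBUILD }

    // Hypothetical view of an open transaction: id, type, and target db.
    record Txn(long id, TxnType type, String dbName) {}

    // Keep only open txns that are writes against the db under replication;
    // READ_ONLY and REPL_CREATED txns are left open, as the patch proposes.
    static List<Long> abortCandidates(List<Txn> openTxns, String dbName) {
        EnumSet<TxnType> excluded = EnumSet.of(TxnType.READ_ONLY, TxnType.REPL_CREATED);
        return openTxns.stream()
                .filter(t -> !excluded.contains(t.type()))
                .filter(t -> t.dbName().equalsIgnoreCase(dbName))
                .map(Txn::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Txn> open = Arrays.asList(
                new Txn(1, TxnType.DEFAULT, "repl_db"),      // write txn on the replicated db
                new Txn(2, TxnType.READ_ONLY, "repl_db"),    // left open
                new Txn(3, TxnType.REPL_CREATED, "repl_db"), // left open
                new Txn(4, TxnType.DEFAULT, "other_db"));    // different db, left open
        System.out.println(abortCandidates(open, "repl_db")); // prints [1]
    }
}
```

In the actual ReplDumpTask the db filter additionally has to consult the lock manager to map txn ids to databases; the stream above only shows the exclusion logic.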
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459626 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 16/Jul/20 03:36 Start Date: 16/Jul/20 03:36 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r455489836 ## File path: ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java ## @@ -186,6 +186,18 @@ void replTableWriteIdState(String validWriteIdList, String dbName, String tableN */ ValidTxnList getValidTxns() throws LockException; + /** + * Get the transactions that are currently valid. The resulting + * {@link ValidTxnList} object can be passed as string to the processing + * tasks for use in the reading the data. This call should be made once up + * front by the planner and should never be called on the backend, + * as this will violate the isolation level semantics. + * @return list of valid transactions. + * @param txnTypes list of transaction types that should be excluded. + * @throws LockException + */ + ValidTxnList getValidTxns(List txnTypes) throws LockException; Review comment: txnTypes is a set of 5 enum values. It's not a free-flowing list, so a filter/regex is not required. The same interface can be used to exclude any txn type by passing in that list. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459626) Time Spent: 50m (was: 40m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize > bootstrap dump to avoid aborting all transactions.pdf > > Time Spent: 50m > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459625 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 16/Jul/20 03:35 Start Date: 16/Jul/20 03:35 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r455489488 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -986,7 +1019,8 @@ String getValidTxnListForReplDump(Hive hiveDb, long waitUntilTime) throws HiveEx // phase won't be able to replicate those txns. So, the logic is to wait for the given amount // of time to see if all open txns < current txn is getting aborted/committed. If not, then // we forcefully abort those txns just like AcidHouseKeeperService. -ValidTxnList validTxnList = getTxnMgr().getValidTxns(); +//Exclude readonly and repl created tranasactions +ValidTxnList validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED)); Review comment: It is outside the loop. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459625) Time Spent: 40m (was: 0.5h) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize > bootstrap dump to avoid aborting all transactions.pdf > > Time Spent: 40m > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459623=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459623 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 16/Jul/20 03:32 Start Date: 16/Jul/20 03:32 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r455488814 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -550,6 +550,11 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal "Indicates the timeout for all transactions which are opened before triggering bootstrap REPL DUMP. " + "If these open transactions are not closed within the timeout value, then REPL DUMP will " + "forcefully abort those transactions and continue with bootstrap dump."), + REPL_BOOTSTRAP_DUMP_ABORT_WRITE_TXN_AFTER_TIMEOUT("hive.repl.bootstrap.dump.abort.write.txn.after.timeout", Review comment: this flag indicates whether or not to abort the txn after the timeout; it is not the timeout value itself, which is why 'after' is part of the name. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459623) Time Spent: 0.5h (was: 20m) > Optimize bootstrap dump to abort only write Transactions > > > Key: HIVE-23560 > URL: https://issues.apache.org/jira/browse/HIVE-23560 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize > bootstrap dump to avoid aborting all transactions.pdf > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently before doing a bootstrap dump, we abort all open transactions after > waiting for a configured time. 
We are proposing to abort only write > transactions for the db under replication and leave the read and repl created > transactions as is. > This doc attached talks about it in detail -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459595=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459595 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 16/Jul/20 01:33 Start Date: 16/Jul/20 01:33 Worklog Time Spent: 10m Work Description: xinghuayu007 opened a new pull request #1227: URL: https://github.com/apache/hive/pull/1227 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459595) Time Spent: 2h 50m (was: 2h 40m) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459594 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 16/Jul/20 01:32 Start Date: 16/Jul/20 01:32 Worklog Time Spent: 10m Work Description: xinghuayu007 closed pull request #1227: URL: https://github.com/apache/hive/pull/1227 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459594) Time Spent: 2h 40m (was: 2.5h) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459591 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 16/Jul/20 01:01 Start Date: 16/Jul/20 01:01 Worklog Time Spent: 10m Work Description: xinghuayu007 opened a new pull request #1227: URL: https://github.com/apache/hive/pull/1227 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459591) Time Spent: 2.5h (was: 2h 20m) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23244) Extract Create View analyzer from SemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely resolved HIVE-23244. --- Resolution: Fixed > Extract Create View analyzer from SemanticAnalyzer > -- > > Key: HIVE-23244 > URL: https://issues.apache.org/jira/browse/HIVE-23244 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23244.01.patch, HIVE-23244.02.patch, > HIVE-23244.03.patch, HIVE-23244.04.patch, HIVE-23244.05.patch, > HIVE-23244.06.patch, HIVE-23244.07.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Create View commands are not queries, but commands which have queries as a > part of them. Therefore a separate CreateViewAnalyzer is needed which uses > SemanticAnalyzer to analyze its query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23857) Fix HiveParser "code too large" problem
[ https://issues.apache.org/jira/browse/HIVE-23857?focusedWorklogId=459584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459584 ] ASF GitHub Bot logged work on HIVE-23857: - Author: ASF GitHub Bot Created on: 16/Jul/20 00:40 Start Date: 16/Jul/20 00:40 Worklog Time Spent: 10m Work Description: miklosgergely merged pull request #1258: URL: https://github.com/apache/hive/pull/1258 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459584) Time Spent: 50m (was: 40m) > Fix HiveParser "code too large" problem > --- > > Key: HIVE-23857 > URL: https://issues.apache.org/jira/browse/HIVE-23857 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > HiveParser.g cannot be extended anymore, as adding any more tokens leads to > a "code too large" problem, because the compiled code size would exceed 65536 > bytes. The real solution would be to introduce antlr4; in the meantime it can > be fixed by moving the tokenNames variable into a separate file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23857) Fix HiveParser "code too large" problem
[ https://issues.apache.org/jira/browse/HIVE-23857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely resolved HIVE-23857. --- Resolution: Fixed Merged to master, thank you [~belugabehr] and [~gopalv] > Fix HiveParser "code too large" problem > --- > > Key: HIVE-23857 > URL: https://issues.apache.org/jira/browse/HIVE-23857 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > HiveParser.g cannot be extended anymore, as adding any more tokens leads to > a "code too large" problem, because the compiled code size would exceed 65536 > bytes. The real solution would be to introduce antlr4; in the meantime it can > be fixed by moving the tokenNames variable into a separate file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23857) Fix HiveParser "code too large" problem
[ https://issues.apache.org/jira/browse/HIVE-23857?focusedWorklogId=459530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459530 ] ASF GitHub Bot logged work on HIVE-23857: - Author: ASF GitHub Bot Created on: 15/Jul/20 21:14 Start Date: 15/Jul/20 21:14 Worklog Time Spent: 10m Work Description: miklosgergely commented on a change in pull request #1258: URL: https://github.com/apache/hive/pull/1258#discussion_r455352739 ## File path: parser/bin/fixHiveParser.sh ## @@ -0,0 +1,44 @@ +#!/bin/bash + +# This is a temporary solution for the issue of the "code too large" problem related to HiveParser.java +# We got to a point where adding anything to the antlr files lead to an issue about having a HiveParser.java that can not be compiled due to the compiled code size limitation in java (maximum 65536 bytes), so to avoid it we temorarly add this script to remove the huge tokenNames array into a separate file. +# The real solution would be to switch to antlr 4 + +tokenFile="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParserTokens.java" +input="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java" +output="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java-fixed" + +rm $tokenFile > /dev/null 2>&1 +rm $output > /dev/null 2>&1 + +echo "package org.apache.hadoop.hive.ql.parse;" >> $tokenFile +echo "" >> $tokenFile +echo "public class HiveParserTokens {" >> $tokenFile + +state="STAY" +while IFS= read -r line Review comment: Thank you @t3rmin4t0r, I've modified the patch using awk. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459530) Time Spent: 40m (was: 0.5h) > Fix HiveParser "code too large" problem > --- > > Key: HIVE-23857 > URL: https://issues.apache.org/jira/browse/HIVE-23857 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > HiveParser.g cannot be extended anymore, as adding any more tokens leads to > a "code too large" problem, because the compiled code size would exceed 65536 > bytes. The real solution would be to introduce antlr4; in the meantime it can > be fixed by moving the tokenNames variable into a separate file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present
[ https://issues.apache.org/jira/browse/HIVE-23850?focusedWorklogId=459527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459527 ] ASF GitHub Bot logged work on HIVE-23850: - Author: ASF GitHub Bot Created on: 15/Jul/20 21:07 Start Date: 15/Jul/20 21:07 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on pull request #1255: URL: https://github.com/apache/hive/pull/1255#issuecomment-659010617 ...Sorry for this, @jcamachor could you please take a look? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459527) Time Spent: 20m (was: 10m) > Allow PPD when subject is not a column with grouping sets present > - > > Key: HIVE-23850 > URL: https://issues.apache.org/jira/browse/HIVE-23850 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters > with only columns and constants are pushed down, but in some cases, this may > not work, for example: > SET hive.cbo.enable=false; > SELECT a, b, sum(s) > FROM T1 > GROUP BY a, b GROUPING SETS ((a), (a, b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > SELECT upper(a), b, sum(s) > FROM T1 > GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > The filters pushed down to GBY can be f(gbyKey) or a gbyKey with a UDF, not > only the plain group-by key columns. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23857) Fix HiveParser "code too large" problem
[ https://issues.apache.org/jira/browse/HIVE-23857?focusedWorklogId=459505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459505 ] ASF GitHub Bot logged work on HIVE-23857: - Author: ASF GitHub Bot Created on: 15/Jul/20 19:50 Start Date: 15/Jul/20 19:50 Worklog Time Spent: 10m Work Description: t3rmin4t0r commented on a change in pull request #1258: URL: https://github.com/apache/hive/pull/1258#discussion_r455302352 ## File path: parser/bin/fixHiveParser.sh ## @@ -0,0 +1,44 @@ +#!/bin/bash + +# This is a temporary solution for the issue of the "code too large" problem related to HiveParser.java +# We got to a point where adding anything to the antlr files lead to an issue about having a HiveParser.java that can not be compiled due to the compiled code size limitation in java (maximum 65536 bytes), so to avoid it we temorarly add this script to remove the huge tokenNames array into a separate file. +# The real solution would be to switch to antlr 4 + +tokenFile="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParserTokens.java" +input="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java" +output="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java-fixed" + +rm $tokenFile > /dev/null 2>&1 +rm $output > /dev/null 2>&1 + +echo "package org.apache.hadoop.hive.ql.parse;" >> $tokenFile +echo "" >> $tokenFile +echo "public class HiveParserTokens {" >> $tokenFile + +state="STAY" +while IFS= read -r line Review comment: Looks like AWK reinvented in bash This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459505) Time Spent: 0.5h (was: 20m) > Fix HiveParser "code too large" problem > --- > > Key: HIVE-23857 > URL: https://issues.apache.org/jira/browse/HIVE-23857 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > HiveParser.g cannot be extended anymore, as adding any more tokens leads to > a "code too large" problem, because the compiled code size would exceed 65536 > bytes. The real solution would be to introduce antlr4; in the meantime it can > be fixed by moving the tokenNames variable into a separate file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23856) Beeline Should Print Binary Data in Base64
[ https://issues.apache.org/jira/browse/HIVE-23856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23856: -- Labels: pull-request-available (was: ) > Beeline Should Print Binary Data in Base64 > -- > > Key: HIVE-23856 > URL: https://issues.apache.org/jira/browse/HIVE-23856 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: Hunter Logan >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Make binary data formatted as Base64 to make it more parse-able by external > applications and easier for humans to convert using a Base64 tool. > https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L165 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23856) Beeline Should Print Binary Data in Base64
[ https://issues.apache.org/jira/browse/HIVE-23856?focusedWorklogId=459503=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459503 ] ASF GitHub Bot logged work on HIVE-23856: - Author: ASF GitHub Bot Created on: 15/Jul/20 19:39 Start Date: 15/Jul/20 19:39 Worklog Time Spent: 10m Work Description: HunterL opened a new pull request #1261: URL: https://github.com/apache/hive/pull/1261 Fixed Binary data type in beeline rows to encode to Base64 https://issues.apache.org/jira/browse/HIVE-23856 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459503) Remaining Estimate: 0h Time Spent: 10m > Beeline Should Print Binary Data in Base64 > -- > > Key: HIVE-23856 > URL: https://issues.apache.org/jira/browse/HIVE-23856 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: Hunter Logan >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Make binary data formatted as Base64 to make it more parse-able by external > applications and easier for humans to convert using a Base64 tool. > https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L165 -- This message was sent by Atlassian Jira (v8.3.4#803005)
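The change this PR proposes — printing binary column values as Base64 in beeline's Rows formatting — can be done with the JDK's built-in codec. A minimal sketch; the class and method names are illustrative, only the java.util.Base64 calls are real JDK API:

```java
import java.util.Base64;

public class Base64Display {
    // Render a binary column value as Base64 text so external applications
    // (or a human with a Base64 tool) can decode it reliably.
    static String toDisplayString(byte[] binaryValue) {
        return Base64.getEncoder().encodeToString(binaryValue);
    }

    public static void main(String[] args) {
        byte[] raw = {0x48, 0x69, 0x76, 0x65}; // the bytes of "Hive"
        System.out.println(toDisplayString(raw)); // prints SGl2ZQ==
    }
}
```

Base64 output is unambiguous to parse, unlike printing raw bytes through a platform charset, which can silently corrupt non-text data.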
[jira] [Updated] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pravin Sinha updated HIVE-23069: Attachment: HIVE-23069.03.patch > Memory efficient iterator should be used during replication. > > > Key: HIVE-23069 > URL: https://issues.apache.org/jira/browse/HIVE-23069 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch, > HIVE-23069.03.patch > > Time Spent: 6h > Remaining Estimate: 0h > > Currently the iterator used while copying table data is memory based. In case > of a database with a very large number of tables/partitions, such an iterator may > cause the HS2 process to go OOM. > Also introduces a config option to run data copy tasks during repl load > operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
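The OOM risk described above — materializing every table/partition name in memory before copying — is avoided by an iterator that pulls bounded batches on demand. A rough standalone sketch of that shape; BatchSource and all names here are illustrative, not the patch's actual classes:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LazyBatchIterator implements Iterator<String> {
    // Hypothetical source that pages results, e.g. backed by metastore calls.
    interface BatchSource { List<String> nextBatch(int maxSize); }

    private final BatchSource source;
    private final int batchSize;
    private Iterator<String> current = Collections.emptyIterator();
    private boolean exhausted = false;

    LazyBatchIterator(BatchSource source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override public boolean hasNext() {
        // Refill from the source only when the current batch is drained,
        // so at most batchSize names are held in memory at a time.
        while (!current.hasNext() && !exhausted) {
            List<String> batch = source.nextBatch(batchSize);
            exhausted = batch.isEmpty();
            current = batch.iterator();
        }
        return current.hasNext();
    }

    @Override public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }

    public static void main(String[] args) {
        List<String> all = List.of("t1", "t2", "t3", "t4", "t5");
        int[] pos = {0};
        BatchSource src = max -> {
            int end = Math.min(pos[0] + max, all.size());
            List<String> batch = all.subList(pos[0], end);
            pos[0] = end;
            return batch;
        };
        LazyBatchIterator it = new LazyBatchIterator(src, 2);
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) sb.append(it.next()).append(' ');
        System.out.println(sb.toString().trim()); // prints t1 t2 t3 t4 t5
    }
}
```

The demo source is an in-memory list for brevity; the point is that the consumer's memory footprint is bounded by batchSize rather than by the database's table count.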
[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions
[ https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459458 ] ASF GitHub Bot logged work on HIVE-23560: - Author: ASF GitHub Bot Created on: 15/Jul/20 18:16 Start Date: 15/Jul/20 18:16 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1232: URL: https://github.com/apache/hive/pull/1232#discussion_r454865275 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -986,7 +1019,8 @@ String getValidTxnListForReplDump(Hive hiveDb, long waitUntilTime) throws HiveEx // phase won't be able to replicate those txns. So, the logic is to wait for the given amount // of time to see if all open txns < current txn is getting aborted/committed. If not, then // we forcefully abort those txns just like AcidHouseKeeperService. -ValidTxnList validTxnList = getTxnMgr().getValidTxns(); +//Exclude readonly and repl created tranasactions +ValidTxnList validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED)); Review comment: Create the list once , outside the loop. ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -550,6 +550,11 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal "Indicates the timeout for all transactions which are opened before triggering bootstrap REPL DUMP. " + "If these open transactions are not closed within the timeout value, then REPL DUMP will " + "forcefully abort those transactions and continue with bootstrap dump."), + REPL_BOOTSTRAP_DUMP_ABORT_WRITE_TXN_AFTER_TIMEOUT("hive.repl.bootstrap.dump.abort.write.txn.after.timeout", Review comment: nit: remove 'after' from the config name. 
## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -999,20 +1033,27 @@ String getValidTxnListForReplDump(Hive hiveDb, long waitUntilTime) throws HiveEx } catch (InterruptedException e) { LOG.info("REPL DUMP thread sleep interrupted", e); } - validTxnList = getTxnMgr().getValidTxns(); + validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED)); } // After the timeout just force abort the open txns -List openTxns = getOpenTxns(validTxnList); -if (!openTxns.isEmpty()) { - hiveDb.abortTransactions(openTxns); - validTxnList = getTxnMgr().getValidTxns(); - if (validTxnList.getMinOpenTxn() != null) { -openTxns = getOpenTxns(validTxnList); -LOG.warn("REPL DUMP unable to force abort all the open txns: {} after timeout due to unknown reasons. " + -"However, this is rare case that shouldn't happen.", openTxns); -throw new IllegalStateException("REPL DUMP triggered abort txns failed for unknown reasons."); +if (conf.getBoolVar(REPL_BOOTSTRAP_DUMP_ABORT_WRITE_TXN_AFTER_TIMEOUT)) { + List openTxns = getOpenTxns(validTxnList, work.dbNameOrPattern); + if (!openTxns.isEmpty()) { +//abort only write transactions for the db under replication if abort transactions is enabled. +hiveDb.abortTransactions(openTxns); +validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED)); Review comment: Shouldn't already obtained validTxnList be used here? ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java ## @@ -975,6 +981,33 @@ private String getValidWriteIdList(String dbName, String tblName, String validTx return openTxns; } + List getOpenTxns(ValidTxnList validTxnList, String dbName) throws LockException { +HiveLockManager lockManager = getTxnMgr().getLockManager(); +long[] invalidTxns = validTxnList.getInvalidTransactions(); +List openTxns = new ArrayList<>(); +List dbTxns = new ArrayList<>(); Review comment: Can be replaced with a HashSet for faster lookup. 
## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java ## @@ -274,6 +277,120 @@ public void testAcidTablesBootstrapWithOpenTxnsTimeout() throws Throwable { verifyCompactionQueue(tables, replicatedDbName, replicaConf); } + @Test + public void testAcidTablesBootstrapWithOpenTxnsDiffDb() throws Throwable { Review comment: Also add a test for a case where few open txn from primary db and few from secondary. During dump txn ids from primary gets aborted but for secondary they are not. ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java ## @@ -274,6 +277,120 @@ public void
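The HashSet suggestion in the review above targets the membership test inside the loop over invalid txns: with a HashSet each contains() is O(1), versus the O(n) scan a List would do per element. A minimal sketch of that lookup shape (method and variable names are illustrative, not the patch's exact code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TxnLookupSketch {
    // Collect the invalid (still-open) txn ids that belong to the db under
    // replication; dbTxns as a HashSet keeps each lookup constant-time.
    static List<Long> openTxnsForDb(long[] invalidTxns, Set<Long> dbTxns) {
        List<Long> openTxns = new ArrayList<>();
        for (long txnId : invalidTxns) {
            if (dbTxns.contains(txnId)) {
                openTxns.add(txnId);
            }
        }
        return openTxns;
    }

    public static void main(String[] args) {
        Set<Long> dbTxns = new HashSet<>(Arrays.asList(3L, 5L, 8L));
        System.out.println(openTxnsForDb(new long[] {1, 3, 5, 7}, dbTxns)); // prints [3, 5]
    }
}
```

With a List the loop would be O(m·n) over m invalid txns and n db txns; the set makes it O(m), which matters when many txns are open at dump time.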
[jira] [Work logged] (HIVE-23857) Fix HiveParser "code too large" problem
[ https://issues.apache.org/jira/browse/HIVE-23857?focusedWorklogId=459431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459431 ] ASF GitHub Bot logged work on HIVE-23857: - Author: ASF GitHub Bot Created on: 15/Jul/20 17:32 Start Date: 15/Jul/20 17:32 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1258: URL: https://github.com/apache/hive/pull/1258#issuecomment-658902129 I once looked at this and was trying to avoid this route, but I guess it's all we can do unless someone wants to take up the mantle of going to ANTLR4. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459431) Time Spent: 20m (was: 10m) > Fix HiveParser "code too large" problem > --- > > Key: HIVE-23857 > URL: https://issues.apache.org/jira/browse/HIVE-23857 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > HiveParser.g cannot be extended anymore, as adding any more tokens leads to > a "code too large" problem, because the compiled code size would exceed 65536 > bytes. The real solution would be to introduce ANTLR4; in the meantime it can > be fixed by moving the tokenNames variable into a separate file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata
[ https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459428 ] ASF GitHub Bot logged work on HIVE-23853: - Author: ASF GitHub Bot Created on: 15/Jul/20 17:27 Start Date: 15/Jul/20 17:27 Worklog Time Spent: 10m Work Description: pvary commented on pull request #1256: URL: https://github.com/apache/hive/pull/1256#issuecomment-658899132 Had to recreate the patch request, since I was not able to push my changes after manually merging stuff here :( Please check https://github.com/apache/hive/pull/1259 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459428) Time Spent: 1h 10m (was: 1h) > CRUD based compaction also should update ACID file version metadata > --- > > Key: HIVE-23853 > URL: https://issues.apache.org/jira/browse/HIVE-23853 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Current CRUD compaction does not update the file metadata to contain the ACID > version. Also the {{_orc_acid_version}} version file is not created. > We should do this to be consistent across the board. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata
[ https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459427 ] ASF GitHub Bot logged work on HIVE-23853: - Author: ASF GitHub Bot Created on: 15/Jul/20 17:27 Start Date: 15/Jul/20 17:27 Worklog Time Spent: 10m Work Description: pvary closed pull request #1256: URL: https://github.com/apache/hive/pull/1256 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459427) Time Spent: 1h (was: 50m) > CRUD based compaction also should update ACID file version metadata > --- > > Key: HIVE-23853 > URL: https://issues.apache.org/jira/browse/HIVE-23853 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Current CRUD compaction does not update the file metadata to contain the ACID > version. Also the {{_orc_acid_version}} version file is not created. > We should do this to be consistent across the board. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata
[ https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459425=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459425 ] ASF GitHub Bot logged work on HIVE-23853: - Author: ASF GitHub Bot Created on: 15/Jul/20 17:26 Start Date: 15/Jul/20 17:26 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1256: URL: https://github.com/apache/hive/pull/1256#discussion_r455213413 ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -4206,6 +4206,10 @@ private static void copyFiles(final HiveConf conf, final FileSystem destFs, } } files = fileStatuses.toArray(new FileStatus[files.length]); + + if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_WRITE_ACID_VERSION_FILE)) { +AcidUtils.OrcAcidVersion.writeVersionFile(destf, destFs); + } Review comment: It was a my first try to have it in the FSOperator which I have forgot to remove. After FSO we have like 3 moves which I wanted to avoid, so since we already have a Compactor specific change in copyFiles, I have decided to reuse it ## File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java ## @@ -25,9 +25,11 @@ import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; +import java.util.HashSet; Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459425) Time Spent: 50m (was: 40m) > CRUD based compaction also should update ACID file version metadata > --- > > Key: HIVE-23853 > URL: https://issues.apache.org/jira/browse/HIVE-23853 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Current CRUD compaction does not update the file metadata to contain the ACID > version. Also the {{_orc_acid_version}} version file is not created. > We should do this to be consistent across the board. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata
[ https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459423 ] ASF GitHub Bot logged work on HIVE-23853: - Author: ASF GitHub Bot Created on: 15/Jul/20 17:24 Start Date: 15/Jul/20 17:24 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1256: URL: https://github.com/apache/hive/pull/1256#discussion_r455212090 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java ## @@ -161,6 +163,7 @@ private static synchronized MemoryManager getThreadLocalOrcLlapMemoryManager(fin LlapProxy.isDaemon()) { memory(getThreadLocalOrcLlapMemoryManager(conf)); } + isCompaction = tableProperties != null && "true".equals(tableProperties.getProperty(AcidUtils.COMPACTOR_TABLE_PROPERTY)); Review comment: Done. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459423) Time Spent: 40m (was: 0.5h) > CRUD based compaction also should update ACID file version metadata > --- > > Key: HIVE-23853 > URL: https://issues.apache.org/jira/browse/HIVE-23853 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Current CRUD compaction does not update the file metadata to contain the ACID > version. Also the {{_orc_acid_version}} version file is not created. > We should do this to be consistent across the board. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata
[ https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459421 ] ASF GitHub Bot logged work on HIVE-23853: - Author: ASF GitHub Bot Created on: 15/Jul/20 17:22 Start Date: 15/Jul/20 17:22 Worklog Time Spent: 10m Work Description: pvary opened a new pull request #1259: URL: https://github.com/apache/hive/pull/1259 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459421) Time Spent: 0.5h (was: 20m) > CRUD based compaction also should update ACID file version metadata > --- > > Key: HIVE-23853 > URL: https://issues.apache.org/jira/browse/HIVE-23853 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Current CRUD compaction does not update the file metadata to contain the ACID > version. Also the {{_orc_acid_version}} version file is not created. > We should do this to be consistent across the board. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23857) Fix HiveParser "code too large" problem
[ https://issues.apache.org/jira/browse/HIVE-23857?focusedWorklogId=459402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459402 ] ASF GitHub Bot logged work on HIVE-23857: - Author: ASF GitHub Bot Created on: 15/Jul/20 16:58 Start Date: 15/Jul/20 16:58 Worklog Time Spent: 10m Work Description: miklosgergely opened a new pull request #1258: URL: https://github.com/apache/hive/pull/1258 HiveParser.g cannot be extended anymore, as adding any more tokens leads to a "code too large" problem, because the compiled code size would exceed 65536 bytes. The real solution would be to introduce ANTLR4; in the meantime it can be fixed by moving the tokenNames variable into a separate file. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459402) Remaining Estimate: 0h Time Spent: 10m > Fix HiveParser "code too large" problem > --- > > Key: HIVE-23857 > URL: https://issues.apache.org/jira/browse/HIVE-23857 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > HiveParser.g cannot be extended anymore, as adding any more tokens leads to > a "code too large" problem, because the compiled code size would exceed 65536 > bytes. The real solution would be to introduce ANTLR4; in the meantime it can > be fixed by moving the tokenNames variable into a separate file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
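The workaround described above addresses the JVM's 65535-byte limit on a single method's bytecode, which the generated parser's static initializer exceeds. The general pattern is to hoist the huge static array into a companion class so its initializer compiles into a separate `<clinit>` method. A minimal, hypothetical illustration (the real change moves ANTLR's generated `tokenNames` array, with hundreds of entries):

```java
// Before: HiveParser's own static initializer carried the giant tokenNames
// array, pushing that one generated method past the 65535-byte limit.
// After: the data lives in a companion class, so the big initializer is
// compiled into that class's (separate, small enough) <clinit> method.
class ParserTokenNames {                 // hypothetical companion class
    static final String[] TOKEN_NAMES = {
        "<invalid>", "<EOR>", "<DOWN>", "<UP>", "KW_SELECT", "KW_FROM"
        // ...hundreds more entries in the real generated parser
    };
}

class ParserSketch {                     // stands in for the generated parser
    static String tokenName(int type) {
        return ParserTokenNames.TOKEN_NAMES[type];
    }
}
```

The same trick applies to any generated code that hits "code too large": the class-file limit is per method, so splitting initialization across classes (or methods) sidesteps it without changing behavior.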
[jira] [Updated] (HIVE-23857) Fix HiveParser "code too large" problem
[ https://issues.apache.org/jira/browse/HIVE-23857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23857: -- Labels: pull-request-available (was: ) > Fix HiveParser "code too large" problem > --- > > Key: HIVE-23857 > URL: https://issues.apache.org/jira/browse/HIVE-23857 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HiveParser.g cannot be extended anymore, as adding any more tokens leads to > a "code too large" problem, because the compiled code size would exceed 65536 > bytes. The real solution would be to introduce ANTLR4; in the meantime it can > be fixed by moving the tokenNames variable into a separate file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23857) Fix HiveParser "code too large" problem
[ https://issues.apache.org/jira/browse/HIVE-23857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely reassigned HIVE-23857: - > Fix HiveParser "code too large" problem > --- > > Key: HIVE-23857 > URL: https://issues.apache.org/jira/browse/HIVE-23857 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > > HiveParser.g cannot be extended anymore, as adding any more tokens leads to > a "code too large" problem, because the compiled code size would exceed 65536 > bytes. The real solution would be to introduce ANTLR4; in the meantime it can > be fixed by moving the tokenNames variable into a separate file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23856) Beeline Should Print Binary Data in Base64
[ https://issues.apache.org/jira/browse/HIVE-23856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hunter Logan reassigned HIVE-23856: --- Assignee: Hunter Logan > Beeline Should Print Binary Data in Base64 > -- > > Key: HIVE-23856 > URL: https://issues.apache.org/jira/browse/HIVE-23856 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: Hunter Logan >Priority: Minor > > Make binary data formatted as Base64 to make it more parse-able by external > applications and easier for humans to convert using a Base64 tool. > https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L165 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky
[ https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=459384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459384 ] ASF GitHub Bot logged work on HIVE-23838: - Author: ASF GitHub Bot Created on: 15/Jul/20 16:38 Start Date: 15/Jul/20 16:38 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1245: URL: https://github.com/apache/hive/pull/1245#issuecomment-658871179 Godspeed @klcopp ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459384) Time Spent: 1.5h (was: 1h 20m) > KafkaRecordIteratorTest is flaky > > > Key: HIVE-23838 > URL: https://issues.apache.org/jira/browse/HIVE-23838 > Project: Hive > Issue Type: Bug > Components: kafka integration >Affects Versions: 4.0.0 >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Failed on [4th run of flaky test > checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 1milliseconds while awaiting InitProducerId -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible
[ https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158515#comment-17158515 ] Abhishek Somani edited comment on HIVE-23804 at 7/15/20, 4:33 PM: -- [~ngangam] the issue exists in all metastore schemas of 3.0 and above. When any such schema is used with an HMS version < 3.0, one will hit the issue. So for example, if you use metastore db schema of 4.0.0, but HMS version 2.3, one will hit the issue. So basically the backward compatibility of metastore db with older HMS versions is broken. was (Author: asomani): [~ngangam] the issue exists in all metastore schemas of 3.0 and above. When any such schema is used with an HMS version <= 3.0, one will hit the issue. So for example, if you use metastore db schema of 4.0.0, but HMS version 2.3, one will hit the issue. So basically the backward compatibility of metastore db with older HMS versions is broken. > Adding defaults for Columns Stats table in the schema to make them backward > compatible > -- > > Key: HIVE-23804 > URL: https://issues.apache.org/jira/browse/HIVE-23804 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.1, 2.3.7 >Reporter: Aditya Shah >Assignee: Aditya Shah >Priority: Major > Attachments: HIVE-23804-1.patch, HIVE-23804.patch > > > Since the table/part column statistics tables have added a new `CAT_NAME` > column with `NOT NULL` constraint in version >3.0.0, queries to analyze > statistics break for Hive versions <3.0.0 when used against an upgraded DB. > One such miss is handled in HIVE-21739. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible
[ https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158515#comment-17158515 ] Abhishek Somani commented on HIVE-23804: [~ngangam] the issue exists in all metastore schemas of 3.0 and above. When any such schema is used with an HMS version <= 3.0, one will hit the issue. So for example, if you use metastore db schema of 4.0.0, but HMS version 2.3, one will hit the issue. So basically the backward compatibility of metastore db with older HMS versions is broken. > Adding defaults for Columns Stats table in the schema to make them backward > compatible > -- > > Key: HIVE-23804 > URL: https://issues.apache.org/jira/browse/HIVE-23804 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.1.1, 2.3.7 >Reporter: Aditya Shah >Assignee: Aditya Shah >Priority: Major > Attachments: HIVE-23804-1.patch, HIVE-23804.patch > > > Since the table/part column statistics tables have added a new `CAT_NAME` > column with `NOT NULL` constraint in version >3.0.0, queries to analyze > statistics break for Hive versions <3.0.0 when used against an upgraded DB. > One such miss is handled in HIVE-21739. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23852) Natively support Date type in ReduceSink operator
[ https://issues.apache.org/jira/browse/HIVE-23852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23852: -- Labels: pull-request-available (was: ) > Natively support Date type in ReduceSink operator > - > > Key: HIVE-23852 > URL: https://issues.apache.org/jira/browse/HIVE-23852 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > There is no native support currently meaning that these types end up being > serialized as multi-key columns which is much slower (iterating through batch > columns instead of writing a value directly) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23852) Natively support Date type in ReduceSink operator
[ https://issues.apache.org/jira/browse/HIVE-23852?focusedWorklogId=459377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459377 ] ASF GitHub Bot logged work on HIVE-23852: - Author: ASF GitHub Bot Created on: 15/Jul/20 16:23 Start Date: 15/Jul/20 16:23 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1257: URL: https://github.com/apache/hive/pull/1257 Adding support for Date type in ReduceSink operator for native vector sink Change-Id: I0b151b72d70f3f57278144def5b64a063cd77623 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459377) Remaining Estimate: 0h Time Spent: 10m > Natively support Date type in ReduceSink operator > - > > Key: HIVE-23852 > URL: https://issues.apache.org/jira/browse/HIVE-23852 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > There is no native support currently meaning that these types end up being > serialized as multi-key columns which is much slower (iterating through batch > columns instead of writing a value directly) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23855) TestQueryShutdownHooks is flaky
[ https://issues.apache.org/jira/browse/HIVE-23855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mustafa Iman reassigned HIVE-23855: --- Assignee: Mustafa Iman > TestQueryShutdownHooks is flaky > --- > > Key: HIVE-23855 > URL: https://issues.apache.org/jira/browse/HIVE-23855 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Mustafa Iman >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/100/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23852) Natively support Date type in ReduceSink operator
[ https://issues.apache.org/jira/browse/HIVE-23852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23852: -- Summary: Natively support Date type in ReduceSink operator (was: Natively support Date and Timestamp types in ReduceSink operator) > Natively support Date type in ReduceSink operator > - > > Key: HIVE-23852 > URL: https://issues.apache.org/jira/browse/HIVE-23852 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > There is no native support currently meaning that these types end up being > serialized as multi-key columns which is much slower (iterating through batch > columns instead of writing a value directly) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata
[ https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459362 ] ASF GitHub Bot logged work on HIVE-23853: - Author: ASF GitHub Bot Created on: 15/Jul/20 16:04 Start Date: 15/Jul/20 16:04 Worklog Time Spent: 10m Work Description: klcopp commented on a change in pull request #1256: URL: https://github.com/apache/hive/pull/1256#discussion_r455104994 ## File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java ## @@ -161,6 +163,7 @@ private static synchronized MemoryManager getThreadLocalOrcLlapMemoryManager(fin LlapProxy.isDaemon()) { memory(getThreadLocalOrcLlapMemoryManager(conf)); } + isCompaction = tableProperties != null && "true".equals(tableProperties.getProperty(AcidUtils.COMPACTOR_TABLE_PROPERTY)); Review comment: can use AcidUtils#isCompactionTable here ## File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java ## @@ -25,9 +25,11 @@ import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; +import java.util.HashSet; Review comment: unused import ## File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java ## @@ -4206,6 +4206,10 @@ private static void copyFiles(final HiveConf conf, final FileSystem destFs, } } files = fileStatuses.toArray(new FileStatus[files.length]); + + if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_WRITE_ACID_VERSION_FILE)) { +AcidUtils.OrcAcidVersion.writeVersionFile(destf, destFs); + } Review comment: Why write the version file to the destination file during MoveTask, if we already wrote one in FSOperator and we move everything that was written in FSOp to the destination file? 
## File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java ## @@ -25,9 +25,11 @@ import java.util.ArrayList; import java.util.Arrays; import java.util.Collections; +import java.util.HashSet; import java.util.LinkedList; import java.util.List; import java.util.Optional; +import java.util.Set; Review comment: unused import This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459362) Time Spent: 20m (was: 10m) > CRUD based compaction also should update ACID file version metadata > --- > > Key: HIVE-23853 > URL: https://issues.apache.org/jira/browse/HIVE-23853 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Current CRUD compaction does not update the file metadata to contain the ACID > version. Also the {{_orc_acid_version}} version file is not created. > We should do this to be consistent across the board. -- This message was sent by Atlassian Jira (v8.3.4#803005)
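The `AcidUtils#isCompactionTable` suggestion in the review above replaces an inline null guard plus string comparison with a shared helper. A minimal sketch of what such a helper typically looks like; the property key value here is an assumption for illustration, not necessarily Hive's actual constant:

```java
import java.util.Properties;

// Hypothetical sketch of a helper in the spirit of AcidUtils#isCompactionTable:
// a null-safe check of a boolean-valued marker property, so call sites such as
// OrcFile don't repeat "tableProperties != null && "true".equals(...)" inline.
class CompactionCheck {
    // Assumed key name; the real constant is AcidUtils.COMPACTOR_TABLE_PROPERTY.
    static final String COMPACTOR_TABLE_PROPERTY = "compactiontable";

    static boolean isCompactionTable(Properties tableProperties) {
        return tableProperties != null
            && Boolean.parseBoolean(tableProperties.getProperty(COMPACTOR_TABLE_PROPERTY));
    }
}
```

Centralizing the check also means a later change (say, accepting "TRUE" or an enum value) touches one method rather than every call site.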
[jira] [Updated] (HIVE-23492) Remove unnecessary FileSystem#exists calls from ql module
[ https://issues.apache.org/jira/browse/HIVE-23492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HIVE-23492: -- Description: Wherever there is an exists() call before open() or delete(), remove it and infer from the FileNotFoundException raised in open/delete that the file does not exist. Exists() just checks for a FileNotFoundException so it's a waste of time, especially on higher-latency stores (was: Wherever there is an exists() call before open() or delete(), remove it and infer from the FileNotFoundException raised in open/delete that the file does not exist. Exists() just checks for a FileNotFoundException so it's a waste of time, especially on clunkier FSes) > Remove unnecessary FileSystem#exists calls from ql module > - > > Key: HIVE-23492 > URL: https://issues.apache.org/jira/browse/HIVE-23492 > Project: Hive > Issue Type: Improvement >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23492.01.patch, HIVE-23492.02.patch, > HIVE-23492.03.patch, HIVE-23492.04.patch, HIVE-23492.05.patch > > > Wherever there is an exists() call before open() or delete(), remove it and > infer from the FileNotFoundException raised in open/delete that the file does > not exist. Exists() just checks for a FileNotFoundException so it's a waste > of time, especially on higher-latency stores -- This message was sent by Atlassian Jira (v8.3.4#803005)
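The pattern HIVE-23492 describes, dropping the redundant `exists()` probe and treating the not-found exception from `open()` as the answer, can be sketched as below. `java.nio` stands in here so the example is self-contained; the Hive patch applies the same idea to Hadoop's `FileSystem`, where the signal is `FileNotFoundException` from `open()`:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Sketch of the optimization: rather than paying a separate metadata round
// trip for exists() (which is itself implemented by swallowing "not found"),
// attempt the open directly and catch the not-found exception. On
// higher-latency stores (S3, ABFS) this halves the remote calls on the
// common path.
class OpenWithoutExists {
    static boolean tryRead(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            in.read(); // real code would process the stream here
            return true;
        } catch (NoSuchFileException e) {
            return false; // absent: exactly what exists() would have told us
        }
    }
}
```

The pre-check is also racy (the file can disappear between `exists()` and `open()`), so the exception still has to be handled; removing the probe deletes the redundant call without losing any safety.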
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459342 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 15/Jul/20 15:22 Start Date: 15/Jul/20 15:22 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r455136416 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpWork.java ## @@ -207,4 +217,20 @@ public ReplicationMetricCollector getMetricCollector() { public void setMetricCollector(ReplicationMetricCollector metricCollector) { this.metricCollector = metricCollector; } + + public ReplicationSpec getReplicationSpec() { +return replicationSpec; + } + + public void setReplicationSpec(ReplicationSpec replicationSpec) { +this.replicationSpec = replicationSpec; + } + + public FileList getFileList(Path backingFile, int cacheSize, HiveConf conf, boolean b) throws IOException { Review comment: removed the change altogether This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459342) Time Spent: 6h (was: 5h 50m) > Memory efficient iterator should be used during replication. > > > Key: HIVE-23069 > URL: https://issues.apache.org/jira/browse/HIVE-23069 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch > > Time Spent: 6h > Remaining Estimate: 0h > > Currently the iterator used while copying table data is memory based. In case > of a database with very large number of table/partitions, such iterator may > cause HS2 process to go OOM. 
> Also introduces a config option to run data copy tasks during repl load > operation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
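The `FileList` change discussed above replaces an in-memory list of copy entries with a file-backed one that is iterated lazily. The idea can be sketched with a plain `Iterable`; this is a hypothetical simplification, and Hive's actual `FileList` API (backing path, cache size, threading) differs:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

// Hypothetical sketch: entries are appended to a backing file as they are
// discovered, then streamed back one line at a time. Memory stays bounded
// regardless of how many tables/partitions the database has, instead of
// growing O(n) and risking an HS2 OOM.
class BackedList implements Iterable<String>, AutoCloseable {
    private final Path backingFile;
    private final BufferedWriter writer;

    BackedList(Path backingFile) throws IOException {
        this.backingFile = backingFile;
        this.writer = Files.newBufferedWriter(backingFile);
    }

    void add(String entry) throws IOException {
        writer.write(entry);
        writer.newLine();
    }

    @Override
    public Iterator<String> iterator() {
        try {
            writer.flush(); // make appended entries visible to the reader
            return Files.lines(backingFile).iterator(); // lazy, line-by-line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }
}
```

The trade-off is extra I/O per entry in exchange for constant memory, which is the right direction for a dump that may enumerate millions of partitions.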
[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky
[ https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=459340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459340 ] ASF GitHub Bot logged work on HIVE-23838: - Author: ASF GitHub Bot Created on: 15/Jul/20 15:18 Start Date: 15/Jul/20 15:18 Worklog Time Spent: 10m Work Description: klcopp commented on pull request #1245: URL: https://github.com/apache/hive/pull/1245#issuecomment-658829863 @belugabehr Thanks! I haven't found a real solution to this and it's getting worse. So apparently Z. Haindrich is disabling the test until I fix it.. or someone fixes it.. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459340) Time Spent: 1h 20m (was: 1h 10m) > KafkaRecordIteratorTest is flaky > > > Key: HIVE-23838 > URL: https://issues.apache.org/jira/browse/HIVE-23838 > Project: Hive > Issue Type: Bug > Components: kafka integration >Affects Versions: 4.0.0 >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Failed on [4th run of flaky test > checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 1milliseconds while awaiting InitProducerId -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23847) Extracting hive-parser module broke exec jar upload in tez
[ https://issues.apache.org/jira/browse/HIVE-23847?focusedWorklogId=459334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459334 ] ASF GitHub Bot logged work on HIVE-23847: - Author: ASF GitHub Bot Created on: 15/Jul/20 15:02 Start Date: 15/Jul/20 15:02 Worklog Time Spent: 10m Work Description: asinkovits commented on a change in pull request #1252: URL: https://github.com/apache/hive/pull/1252#discussion_r455122449 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java ## @@ -109,7 +112,7 @@ private final HiveConf conf; private Path tezScratchDir; - private LocalResource appJarLr; + private Collection appJarLrs; Review comment: Thanks for the suggestion. Unfortunately it seems that the generated HiveParser.class has some huge methods, which causes the shade plugin to fail. Error creating shaded jar: Problem shading JAR /home/antals/.m2/repository/org/apache/hive/hive-parser/4.0.0-SNAPSHOT/hive-parser-4.0.0-SNAPSHOT.jar entry org/apache/hadoop/hive/ql/parse/HiveParser.class: java.lang.RuntimeException: Method code too large! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459334) Time Spent: 0.5h (was: 20m) > Extracting hive-parser module broke exec jar upload in tez > -- > > Key: HIVE-23847 > URL: https://issues.apache.org/jira/browse/HIVE-23847 > Project: Hive > Issue Type: Bug >Reporter: Antal Sinkovits >Assignee: Antal Sinkovits >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > 2020-07-13 16:53:50,551 [INFO] [Dispatcher thread {Central}] > |HistoryEventHandler.criticalEvents|: > [HISTORY][DAG:dag_1594632473849_0001_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Map 1, taskAttemptId=attempt_1594632473849_0001_1_00_00_0, > creationTime=1594652027059, allocationTime=1594652028460, > startTime=1594652029356, finishTime=1594652030546, timeTaken=1190, > status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR, > diagnostics=Error: Error while running task ( failure ) : > attempt_1594632473849_0001_1_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: Map operator initialization failed > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) > at > 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: Map operator initialization failed > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:340) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266) > ... 16 more > Caused by: java.lang.NoClassDefFoundError: > org/apache/hadoop/hive/ql/parse/ParseException > at java.lang.Class.getDeclaredConstructors0(Native Method) > at
[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support
[ https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=459315=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459315 ] ASF GitHub Bot logged work on HIVE-20447: - Author: ASF GitHub Bot Created on: 15/Jul/20 14:17 Start Date: 15/Jul/20 14:17 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1169: URL: https://github.com/apache/hive/pull/1169#discussion_r455083837 ## File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * This source file is based on code taken from SQLLine 1.9 + * See SQLLine notice in LICENSE + */ +package org.apache.hive.beeline; + +import java.sql.SQLException; +import java.sql.Types; +import java.io.ByteArrayOutputStream; +import java.io.IOException; + +import com.fasterxml.jackson.core.JsonEncoding; +import com.fasterxml.jackson.core.JsonFactory; +import com.fasterxml.jackson.core.JsonGenerator; + +/** + * OutputFormat for standard JSON format. 
+ * + */ +public class JSONOutputFormat extends AbstractOutputFormat { + protected final BeeLine beeLine; + protected JsonGenerator generator; + + + /** + * @param beeLine + */ + JSONOutputFormat(BeeLine beeLine){ +this.beeLine = beeLine; +ByteArrayOutputStream buf = new ByteArrayOutputStream(); +try { + this.generator = new JsonFactory().createGenerator(buf, JsonEncoding.UTF8); +} catch(IOException e) { + beeLine.handleException(e); +} + } + + @Override + void printHeader(Rows.Row header) { +try { + generator.writeStartObject(); + generator.writeArrayFieldStart("resultset"); +} catch(IOException e) { + beeLine.handleException(e); +} + } + + @Override + void printFooter(Rows.Row header) { +try { + generator.writeEndArray(); + generator.writeEndObject(); + generator.flush(); + String out = ((ByteArrayOutputStream) generator.getOutputTarget()).toString("UTF-8"); + beeLine.output(out); +} catch(IOException e) { + beeLine.handleException(e); +} + } + + @Override + void printRow(Rows rows, Rows.Row header, Rows.Row row) { +String[] head = header.values; +String[] vals = row.values; + +try { + int colCount = rows.rsMeta.getColumnCount(); + boolean objStartFlag = true; + + for (int i = 0; (i < head.length) && (i < vals.length); i++) { +if (objStartFlag) { + generator.writeStartObject(); + objStartFlag = false; +} + +generator.writeFieldName(head[i]); +switch(rows.rsMeta.getColumnType(i+1)) { + case Types.TINYINT: + case Types.SMALLINT: + case Types.INTEGER: + case Types.BIGINT: + case Types.REAL: + case Types.FLOAT: + case Types.DOUBLE: + case Types.DECIMAL: + case Types.NUMERIC: + case Types.ROWID: +generator.writeNumber(vals[i]); +break; + case Types.NULL: +generator.writeNull(); +break; + case Types.BOOLEAN: +generator.writeBoolean(Boolean.parseBoolean(vals[i])); +break; + default: Review comment: OK. I follow now. Sorry I didn't read your comment close enough. :) So, I think for now, what you should so is... 
Create a new `case` for BINARY data and then check that this is set to "true". Throw an Exception otherwise. ``` beeLine.getOpts().getConvertBinaryArrayToString(); ``` At least this way, we can have a separate ticket about making this base-64 instead of the current implementation. Then, just use `writeString`. If Google Guava is included, use the `Preconditions` class to check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please
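The reviewer's suggestion — render BINARY columns as plain strings only when a convert-binary-to-string option is enabled, with Base64 as the possible follow-up behavior — can be roughly illustrated as follows. The names here mirror the accessor mentioned in the review but are illustrative, not BeeLine's real code path:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: decides how a BINARY column value would be rendered in JSON
// output. The boolean stands in for the getConvertBinaryArrayToString()
// option referenced in the review; the class itself is hypothetical.
class BinaryColumnJson {
  static String render(byte[] raw, boolean convertBinaryToString) {
    if (convertBinaryToString) {
      // Behavior discussed as the interim approach: emit the bytes as text.
      return new String(raw, StandardCharsets.UTF_8);
    }
    // Possible follow-up (separate ticket): emit a Base64 encoding instead.
    return Base64.getEncoder().encodeToString(raw);
  }
}
```

Either result could then be written with `JsonGenerator.writeString`, keeping the switch statement's other cases unchanged.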
[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class
[ https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=459304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459304 ] ASF GitHub Bot logged work on HIVE-23793: - Author: ASF GitHub Bot Created on: 15/Jul/20 14:03 Start Date: 15/Jul/20 14:03 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on pull request #1197: URL: https://github.com/apache/hive/pull/1197#issuecomment-658787201 OK, changes look good to me This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459304) Time Spent: 2h 20m (was: 2h 10m) > Review of QueryInfo Class > - > > Key: HIVE-23793 > URL: https://issues.apache.org/jira/browse/HIVE-23793 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HIVE-23244) Extract Create View analyzer from SemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely reopened HIVE-23244: --- Had to revert it as the new ANTLR token made the generated class too big, as in the meantime some other token was added too by another patch. Must solve that issue first. > Extract Create View analyzer from SemanticAnalyzer > -- > > Key: HIVE-23244 > URL: https://issues.apache.org/jira/browse/HIVE-23244 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23244.01.patch, HIVE-23244.02.patch, > HIVE-23244.03.patch, HIVE-23244.04.patch, HIVE-23244.05.patch, > HIVE-23244.06.patch, HIVE-23244.07.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Create View commands are not queries, but commands which have queries as a > part of them. Therefore a separate CreateViewAnalyzer is needed which uses > SemanticAnalyzer to analyze its query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class
[ https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=459300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459300 ] ASF GitHub Bot logged work on HIVE-23793: - Author: ASF GitHub Bot Created on: 15/Jul/20 13:55 Start Date: 15/Jul/20 13:55 Worklog Time Spent: 10m Work Description: belugabehr commented on a change in pull request #1197: URL: https://github.com/apache/hive/pull/1197#discussion_r455070373 ## File path: ql/src/java/org/apache/hadoop/hive/ql/QueryInfo.java ## @@ -70,36 +80,57 @@ public String getExecutionEngine() { return executionEngine; } - public synchronized String getState() { + public String getState() { return state; } + /** + * The time the query began in milliseconds. + * + * @return The time the query began + */ public long getBeginTime() { -return beginTime; +return TimeUnit.NANOSECONDS.toMillis(beginTime); } - public synchronized Long getEndTime() { -return endTime; + /** + * Get the end time in milliseconds. Only valid if {@link #isRunning()} + * returns false. + * + * @return Query end time + */ + public long getEndTime() { +return TimeUnit.NANOSECONDS.toMillis(endTime); } - public synchronized void updateState(String state) { + public void updateState(String state) { this.state = state; } public String getOperationId() { return operationId; } - public synchronized void setEndTime() { -this.endTime = System.currentTimeMillis(); + public void setEndTime() { +this.endTime = System.nanoTime(); } - public synchronized void setRuntime(long runtime) { -this.runtime = runtime; + /** + * Set the amount of time the query spent actually running in milliseconds. + * + * @param runtime The amount of time this query spent running + */ + public void setRuntime(long runtime) { +this.runtime = TimeUnit.MILLISECONDS.toNanos(runtime); Review comment: For simplicity sake, I wanted to keep all of the internal time values the same precision (nano): ``` /* * Times are stored internally with nanosecond precision. 
*/ ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459300) Time Spent: 2h 10m (was: 2h) > Review of QueryInfo Class > - > > Key: HIVE-23793 > URL: https://issues.apache.org/jira/browse/HIVE-23793 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
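The convention discussed in the review thread — keep every stored time in a single internal precision (nanoseconds) and convert only at the getter/setter boundary — can be sketched like this. The class and method names are illustrative, not Hive's actual `QueryInfo` API:

```java
import java.util.concurrent.TimeUnit;

// Sketch: durations are stored internally in nanoseconds and converted at
// the API boundary, so callers that speak milliseconds stay consistent
// with internal System.nanoTime()-based fields.
class TimingInfo {
  private long runtimeNanos;

  // Accepts milliseconds from the caller, stores nanoseconds internally.
  void setRuntimeMillis(long runtimeMillis) {
    runtimeNanos = TimeUnit.MILLISECONDS.toNanos(runtimeMillis);
  }

  // Converts back to milliseconds on the way out.
  long getRuntimeMillis() {
    return TimeUnit.NANOSECONDS.toMillis(runtimeNanos);
  }
}
```

Keeping one internal unit avoids subtle bugs where some fields hold `System.currentTimeMillis()` values and others hold `System.nanoTime()` values.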
[jira] [Work logged] (HIVE-23836) Make "cols" dependent so that it cascade deletes
[ https://issues.apache.org/jira/browse/HIVE-23836?focusedWorklogId=459298=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459298 ] ASF GitHub Bot logged work on HIVE-23836: - Author: ASF GitHub Bot Created on: 15/Jul/20 13:53 Start Date: 15/Jul/20 13:53 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1239: URL: https://github.com/apache/hive/pull/1239#issuecomment-658781200 @ashutoshc You were correct about marking the association for deletion. I'm not sure if this is already the default value, but better safe than sorry. Could use a review. Thanks :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459298) Time Spent: 20m (was: 10m) > Make "cols" dependent so that it cascade deletes > > > Key: HIVE-23836 > URL: https://issues.apache.org/jira/browse/HIVE-23836 > Project: Hive > Issue Type: Bug >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > {quote} > If you want the deletion of a persistent object to cause the deletion of > related objects then you need to mark the related fields in the mapping to be > "dependent". 
> {quote} > http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields > http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object > The database won't do it: > {code:sql|title=Derby Schema} > ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY > ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO > ACTION; > {code} > https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452 -- This message was sent by Atlassian Jira (v8.3.4#803005)
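The "dependent" marking referenced in the DataNucleus links above is declared in JDO metadata. A hypothetical `package.jdo` fragment might look like the following; the class, field, and table names are illustrative and Hive's actual mapping file may differ:

```xml
<class name="MColumnDescriptor" table="CDS">
  <field name="cols" table="COLUMNS_V2">
    <!-- dependent-element="true" asks DataNucleus to delete each element
         object when the owning descriptor is deleted, instead of relying
         on the database FK, which (per the Derby schema above) is declared
         ON DELETE NO ACTION. -->
    <collection element-type="MFieldSchema" dependent-element="true"/>
    <join/>
  </field>
</class>
```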
[jira] [Updated] (HIVE-23244) Extract Create View analyzer from SemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely updated HIVE-23244: -- Resolution: Fixed Status: Resolved (was: Patch Available) Merged to master, thank you [~belugabehr] > Extract Create View analyzer from SemanticAnalyzer > -- > > Key: HIVE-23244 > URL: https://issues.apache.org/jira/browse/HIVE-23244 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23244.01.patch, HIVE-23244.02.patch, > HIVE-23244.03.patch, HIVE-23244.04.patch, HIVE-23244.05.patch, > HIVE-23244.06.patch, HIVE-23244.07.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Create View commands are not queries, but commands which have queries as a > part of them. Therefore a separate CreateViewAnalyzer is needed which uses > SemanticAnalyzer to analyze its query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Gergely resolved HIVE-23814. --- Resolution: Fixed Merged to master, thank you [~pvary] > Clean up Driver > --- > > Key: HIVE-23814 > URL: https://issues.apache.org/jira/browse/HIVE-23814 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > Driver is now cut down to its minimal size by extracting all of its subtasks to separate classes. The rest should be cleaned up by > * moving out some smaller parts of the code to subtask and utility classes > wherever it is still possible > * fixing checkstyle issues > * adding missing javadoc > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23244) Extract Create View analyzer from SemanticAnalyzer
[ https://issues.apache.org/jira/browse/HIVE-23244?focusedWorklogId=459284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459284 ] ASF GitHub Bot logged work on HIVE-23244: - Author: ASF GitHub Bot Created on: 15/Jul/20 13:13 Start Date: 15/Jul/20 13:13 Worklog Time Spent: 10m Work Description: miklosgergely merged pull request #1125: URL: https://github.com/apache/hive/pull/1125 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459284) Time Spent: 40m (was: 0.5h) > Extract Create View analyzer from SemanticAnalyzer > -- > > Key: HIVE-23244 > URL: https://issues.apache.org/jira/browse/HIVE-23244 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23244.01.patch, HIVE-23244.02.patch, > HIVE-23244.03.patch, HIVE-23244.04.patch, HIVE-23244.05.patch, > HIVE-23244.06.patch, HIVE-23244.07.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Create View commands are not queries, but commands which have queries as a > part of them. Therefore a separate CreateViewAnalyzer is needed which uses > SemanticAnalyzer to analyze its query. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459283 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 15/Jul/20 13:12 Start Date: 15/Jul/20 13:12 Worklog Time Spent: 10m Work Description: xinghuayu007 opened a new pull request #1227: URL: https://github.com/apache/hive/pull/1227 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459283) Time Spent: 2h 20m (was: 2h 10m) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459282 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 15/Jul/20 13:09 Start Date: 15/Jul/20 13:09 Worklog Time Spent: 10m Work Description: xinghuayu007 closed pull request #1227: URL: https://github.com/apache/hive/pull/1227 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459282) Time Spent: 2h 10m (was: 2h) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158137#comment-17158137 ] Syed Shameerur Rahman edited comment on HIVE-23851 at 7/15/20, 1:02 PM: [~asinkovits] I guess HIVE-23808 might be due to misconfiguration, may be the value of metastore.expression.proxy is forcefully set to org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore. Check qtest msck_repair_drop.q In this case HIVE-23851 , We set metastore.expression.proxy to org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore for partition filtering and hence hitting this issue was (Author: srahman): [~asinkovits] I guess HIVE-23808 might be due to misconfiguration, may be the value of metastore.expression.proxy is forcefully set to org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore. Check qtest msck_repair_drop.q > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 4.0.0 > > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > In case of msck repair with partition filtering we expect expression proxy > class to be set as 
PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 > ), While dropping partition we serialize the drop partition filter > expression as ( > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589 > ) which is incompatible during deserialization happening in > PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52 > ) hence the query fails with Failed to deserialize the expression. > *Solutions*: > I
[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158137#comment-17158137 ] Syed Shameerur Rahman commented on HIVE-23851: -- [~asinkovits] I guess HIVE-23808 might be due to misconfiguration, may be the value of metastore.expression.proxy is forcefully set to org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore. Check qtest msck_repair_drop.q > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 4.0.0 > > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > In case of msck repair with partition filtering we expect expression proxy > class to be set as PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 > ), While dropping partition we serialize the drop partition filter > expression as ( > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589 > ) which is incompatible during deserializtion happening in > 
PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52 > ) hence the query fails with Failed to deserialize the expression. > *Solutions*: > I could think of two approaches to this problem > # Since PartitionExpressionForMetastore is required only during partition > pruning step, We can switch back the expression proxy class to > MsckPartitionExpressionProxy once the partition pruning step is done. > # The other solution is to make serialization process in msck drop partition > filter expression compatible with the one with > PartitionExpressionForMetastore, We can do this via Reflection since the drop > partition serialization happens in Msck
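The root cause described in this report — the drop-partition path serializing the filter expression in one format while the Kryo-based PartitionExpressionForMetastore deserializer expects another — is a writer/reader format mismatch. A minimal, hypothetical sketch in Python (not Hive code; the two serializers merely stand in for the incompatible proxy classes) shows why such a mismatch surfaces as a deserialization failure:

```python
# Hypothetical illustration (not Hive code): simulate HIVE-23851's failure
# mode with two incompatible Python serializers. The "writer" side stands in
# for the Msck drop-partition path; the "reader" side stands in for the
# Kryo-based deserializer in PartitionExpressionForMetastore.
import json
import pickle

expr = {"partition_key": "dt", "op": ">", "value": "2020-07-01"}

# Writer side: serializes the filter expression with pickle.
payload = pickle.dumps(expr)

# Reader side: expects JSON, so feeding it the pickle bytes fails outright,
# just as Kryo fails to read an expression written in a different format.
try:
    json.loads(payload)
    deserialized_cleanly = True
except Exception:
    deserialized_cleanly = False

print("deserialized cleanly:", deserialized_cleanly)  # False
```

Either proposed solution removes the mismatch by making both sides agree on a single format (or a single proxy class), rather than by making the reader tolerant of arbitrary payloads.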
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459275=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459275 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 15/Jul/20 12:47 Start Date: 15/Jul/20 12:47 Worklog Time Spent: 10m Work Description: xinghuayu007 opened a new pull request #1227: URL: https://github.com/apache/hive/pull/1227 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459275) Time Spent: 2h (was: 1h 50m) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
[ https://issues.apache.org/jira/browse/HIVE-23726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158124#comment-17158124 ] Naveen Gangam commented on HIVE-23726: -- [~sankarh] [~samuelan] Could you please review ? Its a one-line fix. Thanks > Create table may throw > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > -- > > Key: HIVE-23726 > URL: https://issues.apache.org/jira/browse/HIVE-23726 > Project: Hive > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > - Given: > metastore.warehouse.tenant.colocation is set to true > a test database was created as {{create database test location '/data'}} > - When: > I try to create a table as {{create table t1 (a int) location '/data/t1'}} > - Then: > The create table fails with the following exception: > {code} > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138) > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148) > at > org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98) > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359) > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) > at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225) > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: > java.lang.IllegalArgumentException: Can not create a Path from a null string > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63219) > at 
org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1780) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1767) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3518) > at >
[jira] [Commented] (HIVE-23855) TestQueryShutdownHooks is flaky
[ https://issues.apache.org/jira/browse/HIVE-23855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158129#comment-17158129 ] Zoltan Haindrich commented on HIVE-23855: - fyi [~mustafaiman] > TestQueryShutdownHooks is flaky > --- > > Key: HIVE-23855 > URL: https://issues.apache.org/jira/browse/HIVE-23855 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Priority: Major > > http://ci.hive.apache.org/job/hive-precommit/job/master/100/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459273=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459273 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 15/Jul/20 12:46 Start Date: 15/Jul/20 12:46 Worklog Time Spent: 10m Work Description: xinghuayu007 closed pull request #1227: URL: https://github.com/apache/hive/pull/1227 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459273) Time Spent: 1h 50m (was: 1h 40m) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158126#comment-17158126 ] Antal Sinkovits commented on HIVE-23851: [~srahman] I'm a bit confused now. Is this the same issue as https://issues.apache.org/jira/browse/HIVE-23808 ? > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 4.0.0 > > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > In case of msck repair with partition filtering we expect expression proxy > class to be set as PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 > ), While dropping partition we serialize the drop partition filter > expression as ( > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589 > ) which is incompatible during deserializtion happening in > PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52 > 
) hence the query fails with Failed to deserialize the expression. > *Solutions*: > I could think of two approaches to this problem > # Since PartitionExpressionForMetastore is required only during partition > pruning step, We can switch back the expression proxy class to > MsckPartitionExpressionProxy once the partition pruning step is done. > # The other solution is to make serialization process in msck drop partition > filter expression compatible with the one with > PartitionExpressionForMetastore, We can do this via Reflection since the drop > partition serialization happens in Msck class (standalone-metastore) by this > way we can completely remove the need for class MsckPartitionExpressionProxy > and this also helps to
[jira] [Work logged] (HIVE-23814) Clean up Driver
[ https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=459274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459274 ] ASF GitHub Bot logged work on HIVE-23814: - Author: ASF GitHub Bot Created on: 15/Jul/20 12:46 Start Date: 15/Jul/20 12:46 Worklog Time Spent: 10m Work Description: miklosgergely merged pull request #1222: URL: https://github.com/apache/hive/pull/1222 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459274) Time Spent: 2h 40m (was: 2.5h) > Clean up Driver > --- > > Key: HIVE-23814 > URL: https://issues.apache.org/jira/browse/HIVE-23814 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > > Driver is now cut down to it's minimal size by extracting all of it's sub > tasks to separate classes. The rest should be cleaned up by > * moving out some smaller parts of the code to sub task and utility classes > wherever it is still possible > * fix checkstyle issues > * add missing javadoc > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23853) CRUD based compaction also should update ACID file version metadata
[ https://issues.apache.org/jira/browse/HIVE-23853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23853: -- Labels: pull-request-available (was: ) > CRUD based compaction also should update ACID file version metadata > --- > > Key: HIVE-23853 > URL: https://issues.apache.org/jira/browse/HIVE-23853 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Current CRUD compaction does not update the file metadata to contain the ACID > version. Also the {{_orc_acid_version}} version file is not created. > We should do this to be consistent across the board. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata
[ https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459270 ] ASF GitHub Bot logged work on HIVE-23853: - Author: ASF GitHub Bot Created on: 15/Jul/20 12:28 Start Date: 15/Jul/20 12:28 Worklog Time Spent: 10m Work Description: pvary opened a new pull request #1256: URL: https://github.com/apache/hive/pull/1256 Made sure that the version metadata is updated at file creation, and the version file is created. Updated tests as well This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459270) Remaining Estimate: 0h Time Spent: 10m > CRUD based compaction also should update ACID file version metadata > --- > > Key: HIVE-23853 > URL: https://issues.apache.org/jira/browse/HIVE-23853 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Current CRUD compaction does not update the file metadata to contain the ACID > version. Also the {{_orc_acid_version}} version file is not created. > We should do this to be consistent across the board. -- This message was sent by Atlassian Jira (v8.3.4#803005)
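The intent of HIVE-23853 — having the compactor drop a {{_orc_acid_version}} marker into each directory it writes, so readers can detect the ACID file-format version without opening ORC footers — can be sketched as follows. This is an illustrative Python stand-in, not the actual Hive/ORC writer API; the directory name and the version string "2" are assumptions:

```python
# Minimal sketch (not Hive code) of writing the ACID version marker file
# alongside a freshly compacted base/delta directory.
import os

ACID_VERSION_FILE = "_orc_acid_version"
ACID_VERSION = "2"  # assumed version marker value for illustration

def mark_acid_version(acid_dir: str) -> str:
    """Create the version marker inside a newly written base/delta dir
    and return the marker's path."""
    os.makedirs(acid_dir, exist_ok=True)
    marker = os.path.join(acid_dir, ACID_VERSION_FILE)
    with open(marker, "w") as f:
        f.write(ACID_VERSION)
    return marker

if __name__ == "__main__":
    # Hypothetical compactor output directory.
    print("wrote", mark_acid_version("/tmp/hive_acid_demo/base_0000005"))
```

The consistency argument in the issue is that every code path producing ACID files (writers and compactors alike) should leave the same marker, so readers never have to special-case compacted directories.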
[jira] [Assigned] (HIVE-23854) Natively support Double and Decimal CVs in ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-23854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-23854: - > Natively support Double and Decimal CVs in ReduceSinkOperator > - > > Key: HIVE-23854 > URL: https://issues.apache.org/jira/browse/HIVE-23854 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459267=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459267 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 15/Jul/20 12:26 Start Date: 15/Jul/20 12:26 Worklog Time Spent: 10m Work Description: xinghuayu007 opened a new pull request #1227: URL: https://github.com/apache/hive/pull/1227 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459267) Time Spent: 1h 40m (was: 1.5h) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23853) CRUD based compaction also should update ACID file version metadata
[ https://issues.apache.org/jira/browse/HIVE-23853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary reassigned HIVE-23853: - > CRUD based compaction also should update ACID file version metadata > --- > > Key: HIVE-23853 > URL: https://issues.apache.org/jira/browse/HIVE-23853 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Major > > Current CRUD compaction does not update the file metadata to contain the ACID > version. Also the {{_orc_acid_version}} version file is not created. > We should do this to be consistent across the board. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459264 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 15/Jul/20 12:22 Start Date: 15/Jul/20 12:22 Worklog Time Spent: 10m Work Description: xinghuayu007 closed pull request #1227: URL: https://github.com/apache/hive/pull/1227 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459264) Time Spent: 1.5h (was: 1h 20m) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459261 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 15/Jul/20 12:17 Start Date: 15/Jul/20 12:17 Worklog Time Spent: 10m Work Description: xinghuayu007 opened a new pull request #1227: URL: https://github.com/apache/hive/pull/1227 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459261) Time Spent: 1h 20m (was: 1h 10m) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23852) Natively support Date and Timestamp types in ReduceSink operator
[ https://issues.apache.org/jira/browse/HIVE-23852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-23852: - > Natively support Date and Timestamp types in ReduceSink operator > > > Key: HIVE-23852 > URL: https://issues.apache.org/jira/browse/HIVE-23852 > Project: Hive > Issue Type: Improvement >Reporter: Panagiotis Garefalakis >Assignee: Panagiotis Garefalakis >Priority: Major > > There is no native support currently meaning that these types end up being > serialized as multi-key columns which is much slower (iterating through batch > columns instead of writing a value directly) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore
[ https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459254 ] ASF GitHub Bot logged work on HIVE-23815: - Author: ASF GitHub Bot Created on: 15/Jul/20 11:57 Start Date: 15/Jul/20 11:57 Worklog Time Spent: 10m Work Description: xinghuayu007 closed pull request #1227: URL: https://github.com/apache/hive/pull/1227 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459254) Time Spent: 1h 10m (was: 1h) > output statistics of underlying datastore > -- > > Key: HIVE-23815 > URL: https://issues.apache.org/jira/browse/HIVE-23815 > Project: Hive > Issue Type: Improvement >Reporter: Rossetti Wong >Assignee: Rossetti Wong >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > This patch provides a way to get the statistics data of metastore's > underlying datastore, like MySQL, Oracle and so on. You can get the number > of datastore reads and writes, the average time of transaction execution, the > total active connection and so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23540) Fix Findbugs Warnings in EncodedColumnBatch
[ https://issues.apache.org/jira/browse/HIVE-23540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis updated HIVE-23540: -- Resolution: Duplicate Status: Resolved (was: Patch Available) > Fix Findbugs Warnings in EncodedColumnBatch > --- > > Key: HIVE-23540 > URL: https://issues.apache.org/jira/browse/HIVE-23540 > Project: Hive > Issue Type: Improvement > Components: storage-api >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HIVE-23540.1.patch, HIVE-23540.2.patch > > > bq. Strings should not be concatenated using '+' in a loop -- This message was sent by Atlassian Jira (v8.3.4#803005)
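The Findbugs warning quoted above ("Strings should not be concatenated using '+' in a loop") flags a pattern where each `+=` copies the growing prefix into a new String, making the loop quadratic in total length; a StringBuilder accumulates in place and copies once. A minimal illustration of the flagged pattern and its fix (illustrative only, not the EncodedColumnBatch code itself):

```java
public class ConcatDemo {
    // Flagged pattern: each += allocates a new String and copies the
    // accumulated prefix, so the loop does O(n^2) character copies.
    static String slowJoin(String[] parts) {
        String s = "";
        for (String p : parts) {
            s += p;
        }
        return s;
    }

    // Fix: accumulate in a StringBuilder and materialize the String once.
    static String fastJoin(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        // Both produce "abc"; only the allocation behavior differs.
        System.out.println(slowJoin(parts).equals(fastJoin(parts))); // true
    }
}
```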
[jira] [Work logged] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present
[ https://issues.apache.org/jira/browse/HIVE-23850?focusedWorklogId=459239=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459239 ] ASF GitHub Bot logged work on HIVE-23850: - Author: ASF GitHub Bot Created on: 15/Jul/20 11:19 Start Date: 15/Jul/20 11:19 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1255: URL: https://github.com/apache/hive/pull/1255 … present ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459239) Remaining Estimate: 0h Time Spent: 10m > Allow PPD when subject is not a column with grouping sets present > - > > Key: HIVE-23850 > URL: https://issues.apache.org/jira/browse/HIVE-23850 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Reporter: Zhihua Deng >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters > with only columns and constants are pushed down, but in some cases, this may > not work as well, for example: > SET hive.cbo.enable=false; > SELECT a, b, sum(s) > FROM T1 > GROUP BY a, b GROUPING SETS ((a), (a, b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > SELECT upper(a), b, sum(s) > FROM T1 > GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b)) > HAVING upper(a) = "AAA" AND sum(s) > 100; > > The filters pushed down to GBY can be f(gbyKey) or gbyKey with udf , not > only the column groupby keys. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present
[ https://issues.apache.org/jira/browse/HIVE-23850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23850: -- Labels: pull-request-available (was: ) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HIVE-23851: - Fix Version/s: 4.0.0 > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Fix For: 4.0.0 > > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) > [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_192] > {code} > *Cause:* > In case of msck repair with partition filtering we expect expression proxy > class to be set as PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78 > ), While dropping partition we serialize the drop partition filter > expression as ( > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589 > ) which is incompatible during deserializtion happening in > PartitionExpressionForMetastore ( > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52 > ) hence the query fails with Failed to deserialize the expression. 
> *Solutions*: > I could think of two approaches to this problem > # Since PartitionExpressionForMetastore is required only during the partition > pruning step, we can switch the expression proxy class back to > MsckPartitionExpressionProxy once the partition pruning step is done. > # The other solution is to make the serialization process for the msck drop > partition filter expression compatible with the one in > PartitionExpressionForMetastore. We can do this via Reflection, since the drop > partition serialization happens in the Msck class (standalone-metastore). This > way we can completely remove the need for the class MsckPartitionExpressionProxy, > which also helps reduce the complexity of the MSCK REPAIR command with > partition filtering (no need to set the expression proxyClass config).
[jira] [Updated] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman updated HIVE-23851: - Target Version/s: (was: 4.0.0)
[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158086#comment-17158086 ] Syed Shameerur Rahman commented on HIVE-23851: -- [~kgyrtkirk] [~prasanth_j] Any thoughts on the above issue?
[jira] [Assigned] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Shameerur Rahman reassigned HIVE-23851:
[jira] [Resolved] (HIVE-23848) TestHiveMetaStoreChecker and TestMiniLlapLocalCliDriver tests are failing in master
[ https://issues.apache.org/jira/browse/HIVE-23848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary resolved HIVE-23848. --- Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master. Thanks for the patch [~kishendas]! > TestHiveMetaStoreChecker and TestMiniLlapLocalCliDriver tests are failing in > master > --- > > Key: HIVE-23848 > URL: https://issues.apache.org/jira/browse/HIVE-23848 > Project: Hive > Issue Type: Test > Components: HiveServer2 >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > Fix For: 4.0.0 > > > Below tests are failing after HIVE-23767 landed in master. > testAddPartitionNormalDeltas – > org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker > testCliDriver[show_partitions2] – > org.apache.hadoop.hive.cli.split21.TestMiniLlapLocalCliDriver > testAddPartitionMMBase – > org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker > testAddPartitionCompactedDeltas – > org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker > testAddPartitionCompactedBase – > org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background
[ https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=459216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459216 ] ASF GitHub Bot logged work on HIVE-23727: - Author: ASF GitHub Bot Created on: 15/Jul/20 10:17 Start Date: 15/Jul/20 10:17 Worklog Time Spent: 10m Work Description: dengzhhu653 opened a new pull request #1149: URL: https://github.com/apache/hive/pull/1149 ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY) For more details, please see https://cwiki.apache.org/confluence/display/Hive/HowToContribute This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 459216) Time Spent: 3h (was: 2h 50m) > Improve SQLOperation log handling when cancel background > > > Key: HIVE-23727 > URL: https://issues.apache.org/jira/browse/HIVE-23727 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Minor > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > The SQLOperation checks _if (shouldRunAsync() && state != > OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the > background task. If true, the state should not be OperationState.CANCELED, so > logging under the state == OperationState.CANCELED should never happen. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
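The HIVE-23727 description above makes a small control-flow argument: inside a branch entered only when `state != CANCELED`, a nested `state == CANCELED` check can never be true (absent a concurrent state change), so any logging under it is dead. A condensed hypothetical sketch of that guard shape (names illustrative, not the actual SQLOperation code):

```java
public class CancelGuard {
    enum OperationState { RUNNING, CANCELED, TIMEDOUT }

    // Mirrors the guard shape described in the issue: the outer condition
    // already excludes CANCELED and TIMEDOUT, so the inner CANCELED branch
    // is unreachable unless another thread mutates the state in between.
    static String cancel(boolean shouldRunAsync, OperationState state) {
        if (shouldRunAsync && state != OperationState.CANCELED
                && state != OperationState.TIMEDOUT) {
            // ... cancel the background task here ...
            if (state == OperationState.CANCELED) {
                return "unreachable"; // dead branch: guard above excludes CANCELED
            }
            return "canceled-background";
        }
        return "no-op";
    }

    public static void main(String[] args) {
        System.out.println(cancel(true, OperationState.RUNNING));  // canceled-background
        System.out.println(cancel(true, OperationState.CANCELED)); // no-op
    }
}
```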
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459212=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459212 ] ASF GitHub Bot logged work on HIVE-23069: - Author: ASF GitHub Bot Created on: 15/Jul/20 10:12 Start Date: 15/Jul/20 10:12 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1225: URL: https://github.com/apache/hive/pull/1225#discussion_r454943283 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements AutoCloseable, Iterator<String> {
+  private static final Logger LOG = LoggerFactory.getLogger(FileList.class);
+  private static int fileListStreamerID = 0;
+  private static final String FILE_LIST_STREAMER_PREFIX = "file-list-streamer-";
+
+  private LinkedBlockingQueue<String> cache;
+  private volatile boolean thresholdHit = false;
+  private int thresholdPoint;
+  private float thresholdFactor = 0.9f;
+  private Path backingFile;
+  private FileListStreamer fileListStreamer;
+  private String nextElement;
+  private boolean noMoreElement;
+  private HiveConf conf;
+  private BufferedReader backingFileReader;
+
+
+  public FileList(Path backingFile, int cacheSize, HiveConf conf) throws IOException {
+    this.backingFile = backingFile;
+    if (cacheSize > 0) {
+      // Cache size must be > 0 for this list to be used for the write operation.
+      this.cache = new LinkedBlockingQueue<>(cacheSize);
+      fileListStreamer = new FileListStreamer(cache, backingFile, conf);
+      LOG.debug("File list backed by {} can be used for write operation.", backingFile);
+    } else {
+      thresholdHit = true;
+    }
+    this.conf = conf;
+    thresholdPoint = getThreshold(cacheSize);
+  }
+
+  /**
+   * Only add operation is safe for concurrent operations.
+   */
+  public void add(String entry) throws SemanticException {
+    if (thresholdHit && !fileListStreamer.isAlive()) {
+      throw new SemanticException("List is not getting saved anymore to file " + backingFile.toString());
+    }
+    try {
+      cache.put(entry);
+    } catch (InterruptedException e) {
+      throw new SemanticException(e);
+    }
+    if (!thresholdHit && cache.size() >= thresholdPoint) {
+      initStoreToFile(cache.size());
+    }
+  }
+
+  @Override
+  public boolean hasNext() {
+    if (!thresholdHit) {
+      return (cache != null && !cache.isEmpty());
+    }
+    if (nextElement != null) {
+      return true;
+    }
+    if (noMoreElement) {
+      return false;
+    }
+    nextElement = readNextLine();
+    if (nextElement == null) {
+      noMoreElement = true;
+    }
+    return !noMoreElement;
+  }
+
+  @Override
+  public String next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException("No more element in the list backed by " + backingFile);
+    }
+    String retVal = nextElement;
+    nextElement = null;
+    return thresholdHit ? retVal : cache.poll();
+  }
+
+  private synchronized void initStoreToFile(int cacheSize) {
+    if (!thresholdHit) {
+      fileListStreamer.setName(getNextID());
+      fileListStreamer.setDaemon(true);
+      fileListStreamer.start();
+      thresholdHit = true;
+      LOG.info("Started streaming the list elements to file: {}, cache size {}", backingFile, cacheSize);
+    }
+  }
+
+  private String readNextLine() {
+    String nextElement = null;
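The FileList diff quoted above implements a spill-at-threshold pattern: entries live in a bounded in-memory queue until a threshold is crossed, then overflow is streamed to a backing file, and iteration drains whichever store holds the data. The core idea can be sketched in isolation as follows; this is a hypothetical, simplified, single-threaded analog with illustrative names, not the actual Hive class (the patch instead hands spilling to a background FileListStreamer daemon thread so that add() does not block on disk I/O):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simplified sketch of the spill-to-file list idea:
// keep entries in memory until a threshold, then write overflow to a file.
public class SpillList {
    private final Deque<String> cache = new ArrayDeque<>();
    private final int threshold;
    private final Path backingFile;
    private BufferedWriter writer; // created lazily on first spill

    public SpillList(Path backingFile, int threshold) {
        this.backingFile = backingFile;
        this.threshold = threshold;
    }

    public void add(String entry) throws IOException {
        if (cache.size() < threshold) {
            cache.add(entry);            // cheap in-memory path
        } else {
            if (writer == null) {
                writer = Files.newBufferedWriter(backingFile);
            }
            writer.write(entry);         // spill once the threshold is hit
            writer.newLine();
        }
    }

    // Drains memory first, then the backing file (one-shot for this sketch).
    public List<String> drain() throws IOException {
        List<String> all = new ArrayList<>(cache);
        cache.clear();
        if (writer != null) {
            writer.close();
            all.addAll(Files.readAllLines(backingFile));
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("spill", ".txt");
        SpillList list = new SpillList(tmp, 2);
        for (int i = 0; i < 5; i++) {
            list.add("e" + i);           // e0, e1 stay in memory; e2..e4 spill
        }
        System.out.println(list.drain()); // [e0, e1, e2, e3, e4]
        Files.deleteIfExists(tmp);
    }
}
```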
[jira] [Updated] (HIVE-23474) Deny Repl Dump if the database is a target of replication
[ https://issues.apache.org/jira/browse/HIVE-23474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23474: --- Status: In Progress (was: Patch Available) > Deny Repl Dump if the database is a target of replication > - > > Key: HIVE-23474 > URL: https://issues.apache.org/jira/browse/HIVE-23474 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23474.01.patch, HIVE-23474.02.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23474) Deny Repl Dump if the database is a target of replication
[ https://issues.apache.org/jira/browse/HIVE-23474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aasha Medhi updated HIVE-23474:
-------------------------------
    Attachment: HIVE-23474.02.patch
        Status: Patch Available  (was: In Progress)

> Deny Repl Dump if the database is a target of replication
> ---------------------------------------------------------
>
>           Key: HIVE-23474
>           URL: https://issues.apache.org/jira/browse/HIVE-23474
>       Project: Hive
>    Issue Type: Task
>      Reporter: Aasha Medhi
>      Assignee: Aasha Medhi
>      Priority: Major
>        Labels: pull-request-available
>   Attachments: HIVE-23474.01.patch, HIVE-23474.02.patch
>
>    Time Spent: 10m
>    Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459208 ]

ASF GitHub Bot logged work on HIVE-23069:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 16/Jul/20 05:24 (logged 15/Jul/20 10:06)
        Start Date: 15/Jul/20 10:06
    Worklog Time Spent: 10m
    Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454940146

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java ##

@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements AutoCloseable, Iterator<String> {
+  private static final Logger LOG = LoggerFactory.getLogger(FileList.class);
+  private static int fileListStreamerID = 0;
+  private static final String FILE_LIST_STREAMER_PREFIX = "file-list-streamer-";
+
+  private LinkedBlockingQueue<String> cache;
+  private volatile boolean thresholdHit = false;
+  private int thresholdPoint;
+  private float thresholdFactor = 0.9f;
+  private Path backingFile;
+  private FileListStreamer fileListStreamer;
+  private String nextElement;
+  private boolean noMoreElement;
+  private HiveConf conf;
+  private BufferedReader backingFileReader;
+
+  public FileList(Path backingFile, int cacheSize, HiveConf conf) throws IOException {
+    this.backingFile = backingFile;
+    if (cacheSize > 0) {
+      // Cache size must be > 0 for this list to be used for the write operation.
+      this.cache = new LinkedBlockingQueue<>(cacheSize);
+      fileListStreamer = new FileListStreamer(cache, backingFile, conf);
+      LOG.debug("File list backed by {} can be used for write operation.", backingFile);
+    } else {
+      thresholdHit = true;
+    }
+    this.conf = conf;
+    thresholdPoint = getThreshold(cacheSize);
+  }
+
+  /**
+   * Only add operation is safe for concurrent operations.
+   */
+  public void add(String entry) throws SemanticException {
+    if (thresholdHit && !fileListStreamer.isAlive()) {
+      throw new SemanticException("List is not getting saved anymore to file " + backingFile.toString());
+    }
+    try {
+      cache.put(entry);
+    } catch (InterruptedException e) {
+      throw new SemanticException(e);
+    }
+    if (!thresholdHit && cache.size() >= thresholdPoint) {
+      initStoreToFile(cache.size());
+    }
+  }
+
+  @Override
+  public boolean hasNext() {
+    if (!thresholdHit) {
+      return (cache != null && !cache.isEmpty());
+    }
+    if (nextElement != null) {
+      return true;
+    }
+    if (noMoreElement) {
+      return false;
+    }
+    nextElement = readNextLine();
+    if (nextElement == null) {
+      noMoreElement = true;
+    }
+    return !noMoreElement;
+  }
+
+  @Override
+  public String next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException("No more element in the list backed by " + backingFile);
+    }
+    String retVal = nextElement;
+    nextElement = null;
+    return thresholdHit ? retVal : cache.poll();
+  }
+
+  private synchronized void initStoreToFile(int cacheSize) {
+    if (!thresholdHit) {
+      fileListStreamer.setName(getNextID());
+      fileListStreamer.setDaemon(true);
+      fileListStreamer.start();
+      thresholdHit = true;
+      LOG.info("Started streaming the list elements to file: {}, cache size {}", backingFile, cacheSize);
+    }
+  }
+
+  private String readNextLine() {
+    String nextElement =
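The hybrid behaviour quoted above — buffer entries in memory, then switch to file-backed storage once a threshold is crossed — can be sketched without the Hive plumbing. A minimal, hypothetical analogue (`ThresholdList` is illustrative, not the class under review; it spills synchronously rather than via the background streamer thread the patch uses):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.Deque;

/** Buffers entries in memory; past the threshold, appends everything to a backing file. */
public class ThresholdList {
    private final Deque<String> cache = new ArrayDeque<>();
    private final Path backingFile;
    private final int threshold;
    private boolean thresholdHit;

    public ThresholdList(Path backingFile, int threshold) {
        this.backingFile = backingFile;
        this.threshold = threshold;
    }

    public void add(String entry) throws IOException {
        if (thresholdHit) {
            // Already in file-backed mode: append directly, keep memory flat.
            Files.write(backingFile, (entry + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            return;
        }
        cache.add(entry);
        if (cache.size() >= threshold) {
            // Spill the in-memory entries and switch modes, analogous to initStoreToFile().
            thresholdHit = true;
            while (!cache.isEmpty()) {
                Files.write(backingFile, (cache.poll() + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("threshold-list", ".txt");
        Files.delete(tmp);  // start fresh; CREATE+APPEND recreates it on first spill
        ThresholdList list = new ThresholdList(tmp, 3);
        for (int i = 0; i < 5; i++) {
            list.add("e" + i);
        }
        System.out.println(Files.readAllLines(tmp).size());  // all 5 entries ended up in the file
        Files.delete(tmp);
    }
}
```

The point of the threshold is that small replication dumps never touch the filesystem at all, while large ones hold at most `threshold` entries in memory.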
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459203 ]

ASF GitHub Bot logged work on HIVE-23069:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 15/Jul/20 10:03
        Start Date: 15/Jul/20 10:03
    Worklog Time Spent: 10m
    Work Description: aasha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454938290

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java ##

@@ -0,0 +1,181 @@
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459191 ]

ASF GitHub Bot logged work on HIVE-23069:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 15/Jul/20 09:49
        Start Date: 15/Jul/20 09:49
    Worklog Time Spent: 10m
    Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454929778

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java ##

@@ -0,0 +1,181 @@
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459183 ]

ASF GitHub Bot logged work on HIVE-23069:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 15/Jul/20 09:46
        Start Date: 15/Jul/20 09:46
    Worklog Time Spent: 10m
    Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454928086

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java ##

@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.TimeUnit;
+
+public class FileListStreamer extends Thread implements Closeable {
+  private static final Logger LOG = LoggerFactory.getLogger(FileListStreamer.class);
+  private static BufferedWriter backingFileWriterInTest;
+  private static final long TIMEOUT_IN_SECS = 5L;
+  private volatile boolean signalTostop;
+  private final LinkedBlockingQueue<String> cache;
+  private Path backingFile;
+  private Configuration conf;
+  private BufferedWriter backingFileWriter;
+  private volatile boolean valid = true;
+  private final Object COMPLETION_LOCK = new Object();
+  private volatile boolean completed = false;
+  private volatile boolean initialized = false;
+
+  public FileListStreamer(LinkedBlockingQueue<String> cache, Path backingFile, Configuration conf) throws IOException {
+    this.cache = cache;
+    this.backingFile = backingFile;
+    this.conf = conf;
+  }
+
+  private void lazyInit() throws IOException {
+    if (backingFileWriterInTest == null) {
+      FileSystem fs = FileSystem.get(backingFile.toUri(), conf);
+      backingFileWriter = new BufferedWriter(new OutputStreamWriter(fs.create(backingFile)));
+    } else {
+      backingFileWriter = backingFileWriterInTest;
+    }
+    initialized = true;
+    LOG.info("Initialized a file based store to save a list at: {}", backingFile);
+  }
+
+  public boolean isValid() {
+    return valid;
+  }
+
+  // Blocks for remaining entries to be flushed to file.
+  @Override
+  public void close() throws IOException {
+    signalTostop = true;
+    synchronized (COMPLETION_LOCK) {
+      while (motiveToWait()) {
+        try {
+          COMPLETION_LOCK.wait(TimeUnit.SECONDS.toMillis(TIMEOUT_IN_SECS));
+        } catch (InterruptedException e) {
+          // no-op
+        }
+      }
+    }
+    if (!isValid()) {
+      throw new IOException("File list is not in a valid state:" + backingFile);
+    }
+  }
+
+  private boolean motiveToWait() {
+    return !completed && valid;
+  }
+
+  @Override
+  public void run() {
+    try {
+      lazyInit();
+    } catch (IOException e) {
+      valid = false;
+      throw new RuntimeException("Unable to initialize the file list streamer", e);
+    }
+    boolean exThrown = false;
+    while (!exThrown && (!signalTostop || !cache.isEmpty())) {
+      try {
+        String nextEntry = cache.poll(TIMEOUT_IN_SECS, TimeUnit.SECONDS);
+        if (nextEntry != null) {
+          backingFileWriter.write(nextEntry);
+          backingFileWriter.newLine();
+          LOG.debug("Writing entry {} to file list backed by {}", nextEntry, backingFile);
+        }
+      } catch (Exception iEx) {
+        if (!(iEx instanceof InterruptedException)) {
+          // not draining any more. Inform the producer to avoid OOM.
+          valid = false;
+          LOG.error("Exception while saving the list to file " + backingFile, iEx);
+          exThrown = true;
+        }
+      }
+    }
+    try {
+      closeBackingFile();
+      completed = true;
+    } finally {
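The close() handshake quoted above — the producer sets a stop flag, the streamer thread drains whatever is still queued, then flips a completion flag under a lock — can be sketched in isolation. A minimal, hypothetical analogue (`QueueDrainer` and its in-memory `sink` are illustrative stand-ins, not Hive's API; the sink replaces the HDFS-backed writer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueDrainer extends Thread {
    private final LinkedBlockingQueue<String> queue;
    private final List<String> sink = new ArrayList<>();   // stands in for the backing file
    private volatile boolean signalToStop;
    private volatile boolean completed;
    private final Object completionLock = new Object();

    QueueDrainer(LinkedBlockingQueue<String> queue) { this.queue = queue; }

    @Override
    public void run() {
        // Keep draining until asked to stop AND the queue is empty.
        while (!signalToStop || !queue.isEmpty()) {
            try {
                String next = queue.poll(100, TimeUnit.MILLISECONDS);
                if (next != null) {
                    sink.add(next);
                }
            } catch (InterruptedException e) {
                // ignore and re-check the loop condition
            }
        }
        synchronized (completionLock) {
            completed = true;
            completionLock.notifyAll();   // wake any waiter in close()
        }
    }

    // Blocks until the remaining entries are flushed, like FileListStreamer.close().
    public List<String> close() throws InterruptedException {
        signalToStop = true;
        synchronized (completionLock) {
            while (!completed) {
                completionLock.wait(TimeUnit.SECONDS.toMillis(1));
            }
        }
        return sink;
    }

    public static void main(String[] args) throws Exception {
        LinkedBlockingQueue<String> q = new LinkedBlockingQueue<>(10);
        QueueDrainer drainer = new QueueDrainer(q);
        drainer.setDaemon(true);
        drainer.start();
        for (int i = 0; i < 5; i++) {
            q.put("entry-" + i);
        }
        List<String> flushed = drainer.close();
        System.out.println(flushed.size());
    }
}
```

The bounded queue is what makes the producer back-pressure safe: `put` blocks when the consumer falls behind, so memory stays capped even if the writer is slow.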
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459181 ]

ASF GitHub Bot logged work on HIVE-23069:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 15/Jul/20 09:44
        Start Date: 15/Jul/20 09:44
    Worklog Time Spent: 10m
    Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454926767

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java ##

@@ -1559,6 +1645,76 @@ public void testIncrementalLoad() throws IOException {
     verifyRun("SELECT a from " + replDbName + ".ptned WHERE b=2", ptnData2, driverMirror);
   }
 
+  @Test
+  public void testIncrementalLoadLazyCopy() throws IOException {

Review comment:
    There are many existing tests with lazy load false. For external table which uses mini hdfs we already have a test for lazy load true. I will add one for other table as well.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 459181)
    Time Spent: 5h  (was: 4h 50m)

> Memory efficient iterator should be used during replication.
> ------------------------------------------------------------
>
>           Key: HIVE-23069
>           URL: https://issues.apache.org/jira/browse/HIVE-23069
>       Project: Hive
>    Issue Type: Improvement
>      Reporter: Pravin Sinha
>      Assignee: Pravin Sinha
>      Priority: Major
>        Labels: pull-request-available
>   Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>    Time Spent: 5h
>    Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case of a database with very large number of table/partitions, such iterator may cause HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load operation.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459176 ]

ASF GitHub Bot logged work on HIVE-23069:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 15/Jul/20 09:38
        Start Date: 15/Jul/20 09:38
    Worklog Time Spent: 10m
    Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454923611

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java ##

@@ -165,4 +178,83 @@ private void validateSrcPathListExists() throws IOException, LoginException {
       throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
     }
   }
+
+  /**
+   * This needs the root data directory to which the data needs to be exported to.
+   * The data export here is a list of files either in table/partition that are written to the _files
+   * in the exportRootDataDir provided.
+   */
+  void exportFilesAsList() throws SemanticException, IOException, LoginException {
+    if (dataPathList.isEmpty()) {
+      return;
+    }
+    Retry<Void> retryable = new Retry<Void>(IOException.class) {
+      @Override
+      public Void execute() throws Exception {
+        try (BufferedWriter writer = writer()) {
+          for (Path dataPath : dataPathList) {
+            writeFilesList(listFilesInDir(dataPath), writer, AcidUtils.getAcidSubDir(dataPath));
+          }
+        } catch (IOException e) {
+          if (e instanceof FileNotFoundException) {

Review comment:
    Shouldn't this suffice?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 459176)
    Time Spent: 4h 50m  (was: 4h 40m)

> Memory efficient iterator should be used during replication.
> ------------------------------------------------------------
>
>           Key: HIVE-23069
>           URL: https://issues.apache.org/jira/browse/HIVE-23069
>       Project: Hive
>    Issue Type: Improvement
>      Reporter: Pravin Sinha
>      Assignee: Pravin Sinha
>      Priority: Major
>        Labels: pull-request-available
>   Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>    Time Spent: 4h 50m
>    Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case of a database with very large number of table/partitions, such iterator may cause HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load operation.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
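The fix the issue describes replaces an in-memory list of copy targets with an iterator that reads the backing file one line at a time, so memory stays constant regardless of how many tables or partitions the database has. A minimal sketch of that read path, outside Hive (`LazyFileIterator` is an illustrative name, not the patch's class):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Lazily iterates over the lines of a file, holding one line in memory at a time. */
public class LazyFileIterator implements Iterator<String>, AutoCloseable {
    private final BufferedReader reader;
    private String nextLine;
    private boolean done;

    public LazyFileIterator(Path file) throws IOException {
        this.reader = Files.newBufferedReader(file);
    }

    @Override
    public boolean hasNext() {
        if (nextLine != null) return true;   // already buffered one line ahead
        if (done) return false;
        try {
            nextLine = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (nextLine == null) done = true;   // EOF reached
        return !done;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String ret = nextLine;
        nextLine = null;
        return ret;
    }

    @Override
    public void close() throws IOException { reader.close(); }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("filelist", ".txt");
        Files.write(tmp, java.util.Arrays.asList("a", "b", "c"));
        int count = 0;
        try (LazyFileIterator it = new LazyFileIterator(tmp)) {
            while (it.hasNext()) { it.next(); count++; }
        }
        System.out.println(count);
        Files.delete(tmp);
    }
}
```

The one-line-lookahead in hasNext() mirrors the nextElement/noMoreElement bookkeeping in the FileList code quoted earlier in this thread.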
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459175 ]

ASF GitHub Bot logged work on HIVE-23069:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 15/Jul/20 09:37
        Start Date: 15/Jul/20 09:37
    Worklog Time Spent: 10m
    Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454923124

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java ##

@@ -165,4 +178,83 @@

Review comment:
    if (e instanceof FileNotFoundException) {
      logger.error("exporting data files in dir : " + dataPathList + " to " + exportRootDataDir + " failed");
      throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
    }

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 459175)
    Time Spent: 4h 40m  (was: 4.5h)

> Memory efficient iterator should be used during replication.
> ------------------------------------------------------------
>
>           Key: HIVE-23069
>           URL: https://issues.apache.org/jira/browse/HIVE-23069
>       Project: Hive
>    Issue Type: Improvement
>      Reporter: Pravin Sinha
>      Assignee: Pravin Sinha
>      Priority: Major
>        Labels: pull-request-available
>   Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>    Time Spent: 4h 40m
>    Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case of a database with very large number of table/partitions, such iterator may cause HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load operation.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
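The two comments above debate how the `Retry`-wrapped export should treat a FileNotFoundException: since FNFE is an IOException subtype, a naive retry-on-IOException loop would retry it pointlessly, so the reviewer suggests rethrowing it immediately. A minimal sketch of that pattern, assuming a generic retry helper (`SimpleRetry` and `withRetries` are illustrative, not Hive's Retry class):

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.concurrent.Callable;

/** Retries an action on transient IOExceptions, but fails fast on FileNotFoundException. */
public class SimpleRetry {
    public static <T> T withRetries(Callable<T> action, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (FileNotFoundException e) {
                throw e;        // non-retriable: the file will not appear on retry
            } catch (IOException e) {
                last = e;       // retriable: loop and try again
            }
        }
        throw last;             // exhausted attempts; surface the last failure
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice with a transient IOException, succeeds on the third attempt.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new IOException("transient");
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Ordering the catch clauses from most to least specific is what lets the wrapper distinguish a permanently missing file from a transient I/O hiccup.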
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459172 ]

ASF GitHub Bot logged work on HIVE-23069:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 15/Jul/20 09:34
        Start Date: 15/Jul/20 09:34
    Worklog Time Spent: 10m
    Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454921294

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpWork.java ##

@@ -207,4 +217,20 @@ public ReplicationMetricCollector getMetricCollector() {
   public void setMetricCollector(ReplicationMetricCollector metricCollector) {
     this.metricCollector = metricCollector;
   }
+
+  public ReplicationSpec getReplicationSpec() {
+    return replicationSpec;
+  }
+
+  public void setReplicationSpec(ReplicationSpec replicationSpec) {
+    this.replicationSpec = replicationSpec;
+  }
+
+  public FileList getFileList(Path backingFile, int cacheSize, HiveConf conf, boolean b) throws IOException {

Review comment:
    This is required for some old test which isn't mockito based.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 459172)
    Time Spent: 4.5h  (was: 4h 20m)

> Memory efficient iterator should be used during replication.
> ------------------------------------------------------------
>
>           Key: HIVE-23069
>           URL: https://issues.apache.org/jira/browse/HIVE-23069
>       Project: Hive
>    Issue Type: Improvement
>      Reporter: Pravin Sinha
>      Assignee: Pravin Sinha
>      Priority: Major
>        Labels: pull-request-available
>   Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>    Time Spent: 4.5h
>    Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case of a database with very large number of table/partitions, such iterator may cause HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load operation.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.
[ https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459169 ]

ASF GitHub Bot logged work on HIVE-23069:
-----------------------------------------
            Author: ASF GitHub Bot
        Created on: 15/Jul/20 09:32
        Start Date: 15/Jul/20 09:32
    Worklog Time Spent: 10m
    Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454919825

## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java ##

@@ -210,6 +210,66 @@ public void externalTableReplicationWithDefaultPaths() throws Throwable {
     assertExternalFileInfo(Arrays.asList("t2", "t3", "t4"), tuple.dumpLocation, true);
   }
 
+  @Test
+  public void externalTableReplicationWithDefaultPathsLazyCopy() throws Throwable {
+    List<String> lazyCopyClause = Arrays.asList("'" + HiveConf.ConfVars.REPL_DATA_COPY_LAZY.varname + "'='true'");
+    //creates external tables with partitions
+    WarehouseInstance.Tuple tuple = primary
+        .run("use " + primaryDbName)
+        .run("create external table t1 (id int)")
+        .run("insert into table t1 values (1)")
+        .run("insert into table t1 values (2)")
+        .run("create external table t2 (place string) partitioned by (country string)")
+        .run("insert into table t2 partition(country='india') values ('bangalore')")
+        .run("insert into table t2 partition(country='us') values ('austin')")
+        .run("insert into table t2 partition(country='france') values ('paris')")
+        .dump(primaryDbName, lazyCopyClause);
+
+    // verify that the external table info is written correctly for bootstrap
+    assertExternalFileInfo(Arrays.asList("t1", "t2"), tuple.dumpLocation, primaryDbName, false);
+
+    replica.load(replicatedDbName, primaryDbName, lazyCopyClause)
+        .run("use " + replicatedDbName)
+        .run("show tables like 't1'")
+        .verifyResult("t1")
+        .run("show tables like 't2'")
+        .verifyResult("t2")
+        .run("repl status " + replicatedDbName)
+        .verifyResult(tuple.lastReplicationId)
+        .run("select country from t2 where country = 'us'")
+        .verifyResult("us")
+        .run("select country from t2 where country = 'france'")
+        .verifyResult("france")
+        .run("show partitions t2").verifyResults(new String[] {"country=france", "country=india", "country=us"});
+
+    String hiveDumpLocation = tuple.dumpLocation + File.separator + ReplUtils.REPL_HIVE_BASE_DIR;
+    // Ckpt should be set on bootstrapped db.
+    replica.verifyIfCkptSet(replicatedDbName, hiveDumpLocation);
+
+    assertTablePartitionLocation(primaryDbName + ".t1", replicatedDbName + ".t1");
+    assertTablePartitionLocation(primaryDbName + ".t2", replicatedDbName + ".t2");
+
+    tuple = primary.run("use " + primaryDbName)
+        .run("create external table t3 (id int)")
+        .run("insert into table t3 values (10)")
+        .run("create external table t4 as select id from t3")

Review comment:
    How is that related to this patch?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 459169)
    Time Spent: 4h 20m  (was: 4h 10m)

> Memory efficient iterator should be used during replication.
> ------------------------------------------------------------
>
>           Key: HIVE-23069
>           URL: https://issues.apache.org/jira/browse/HIVE-23069
>       Project: Hive
>    Issue Type: Improvement
>      Reporter: Pravin Sinha
>      Assignee: Pravin Sinha
>      Priority: Major
>        Labels: pull-request-available
>   Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>    Time Spent: 4h 20m
>    Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case of a database with very large number of table/partitions, such iterator may cause HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load operation.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)