[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459665
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 05:24
Start Date: 16/Jul/20 05:24
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r455506934



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -274,6 +277,120 @@ public void testAcidTablesBootstrapWithOpenTxnsTimeout() throws Throwable {
 verifyCompactionQueue(tables, replicatedDbName, replicaConf);
   }
 
+  @Test
+  public void testAcidTablesBootstrapWithOpenTxnsDiffDb() throws Throwable {

Review comment:
   Yes, the cases are getting tested in isolation. We should also have them in one test.







Issue Time Tracking
---

Worklog Id: (was: 459665)
Time Spent: 2h  (was: 1h 50m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize 
> bootstrap dump to avoid aborting all transactions.pdf
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.
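For context, a minimal Java sketch of the proposed flow, assembled from the PR diff quoted later in this thread (getTxnMgr, getOpenTxns, work.dbNameOrPattern, and hiveDb come from that diff; this is a sketch, not the final committed code):

    import java.util.Arrays;
    import java.util.List;

    // 1. Snapshot valid txns while excluding read-only and repl-created ones.
    ValidTxnList validTxnList =
        getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED));
    // 2. Collect only the open write txns for the db under replication.
    List<Long> openWriteTxns = getOpenTxns(validTxnList, work.dbNameOrPattern);
    // 3. Abort just those; read and repl-created txns are left as is.
    if (!openWriteTxns.isEmpty()) {
      hiveDb.abortTransactions(openWriteTxns);
    }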





[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459663
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 05:14
Start Date: 16/Jul/20 05:14
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r455514405



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -274,6 +277,120 @@ public void testAcidTablesBootstrapWithOpenTxnsTimeout() throws Throwable {
 verifyCompactionQueue(tables, replicatedDbName, replicaConf);
   }
 
+  @Test

Review comment:
   The test testAcidTablesBootstrapDuringIncrementalWithOpenTxnsTimeout is already added. Is there anything else missing?







Issue Time Tracking
---

Worklog Id: (was: 459663)
Time Spent: 1h 50m  (was: 1h 40m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize 
> bootstrap dump to avoid aborting all transactions.pdf
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.





[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=459662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459662
 ]

ASF GitHub Bot logged work on HIVE-23727:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 05:13
Start Date: 16/Jul/20 05:13
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1149:
URL: https://github.com/apache/hive/pull/1149


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   





Issue Time Tracking
---

Worklog Id: (was: 459662)
Time Spent: 3h 20m  (was: 3h 10m)

> Improve SQLOperation log handling when cancel background
> 
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If that condition holds, the state cannot be 
> OperationState.CANCELED, so logging under state == OperationState.CANCELED 
> should never happen.
>  
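A short Java sketch of the dead branch described above (shouldRunAsync, state, and LOG are stand-ins modeled on the description, not the actual SQLOperation source):

    if (shouldRunAsync() && state != OperationState.CANCELED
        && state != OperationState.TIMEDOUT) {
      // Within this block, state is provably not CANCELED, so a branch like
      // the following can never execute:
      if (state == OperationState.CANCELED) {
        LOG.info("The background operation was cancelled"); // unreachable
      }
    }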





[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=459661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459661
 ]

ASF GitHub Bot logged work on HIVE-23727:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 05:13
Start Date: 16/Jul/20 05:13
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 closed pull request #1149:
URL: https://github.com/apache/hive/pull/1149


   





Issue Time Tracking
---

Worklog Id: (was: 459661)
Time Spent: 3h 10m  (was: 3h)

> Improve SQLOperation log handling when cancel background
> 
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ to cancel the 
> background task. If that condition holds, the state cannot be 
> OperationState.CANCELED, so logging under state == OperationState.CANCELED 
> should never happen.
>  





[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459654
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 04:44
Start Date: 16/Jul/20 04:44
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r455506934



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -274,6 +277,120 @@ public void testAcidTablesBootstrapWithOpenTxnsTimeout() throws Throwable {
 verifyCompactionQueue(tables, replicatedDbName, replicaConf);
   }
 
+  @Test
+  public void testAcidTablesBootstrapWithOpenTxnsDiffDb() throws Throwable {

Review comment:
   Yes, the cases are getting tested in isolation. We should also have them in one test.







Issue Time Tracking
---

Worklog Id: (was: 459654)
Time Spent: 1h 40m  (was: 1.5h)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize 
> bootstrap dump to avoid aborting all transactions.pdf
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.





[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459641
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 04:19
Start Date: 16/Jul/20 04:19
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r455500572



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -274,6 +277,120 @@ public void testAcidTablesBootstrapWithOpenTxnsTimeout() throws Throwable {
 verifyCompactionQueue(tables, replicatedDbName, replicaConf);
   }
 
+  @Test
+  public void testAcidTablesBootstrapWithOpenTxnsDiffDb() throws Throwable {

Review comment:
   For the primary db there is already a different test that covers aborting txns; those txns get aborted. This test is for a secondary db which is not under replication.







Issue Time Tracking
---

Worklog Id: (was: 459641)
Time Spent: 1.5h  (was: 1h 20m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize 
> bootstrap dump to avoid aborting all transactions.pdf
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.





[jira] [Resolved] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)

2020-07-15 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-23726.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

The fix has been committed to master. Thanks for the review, [~samuelan].

> Create table may throw 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
> --
>
> Key: HIVE-23726
> URL: https://issues.apache.org/jira/browse/HIVE-23726
> Project: Hive
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> - Given:
>  metastore.warehouse.tenant.colocation is set to true
>  a test database was created as {{create database test location '/data'}}
>  - When:
>  I try to create a table as {{create table t1 (a int) location '/data/t1'}}
>  - Then:
> The create table fails with the following exception:
> {code}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: 
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63219)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1780)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1767)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3518)
>   at 
> 

[jira] [Work logged] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23726?focusedWorklogId=459632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459632
 ]

ASF GitHub Bot logged work on HIVE-23726:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 03:47
Start Date: 16/Jul/20 03:47
Worklog Time Spent: 10m 
  Work Description: nrg4878 closed pull request #1198:
URL: https://github.com/apache/hive/pull/1198


   





Issue Time Tracking
---

Worklog Id: (was: 459632)
Time Spent: 20m  (was: 10m)

> Create table may throw 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
> --
>
> Key: HIVE-23726
> URL: https://issues.apache.org/jira/browse/HIVE-23726
> Project: Hive
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> - Given:
>  metastore.warehouse.tenant.colocation is set to true
>  a test database was created as {{create database test location '/data'}}
>  - When:
>  I try to create a table as {{create table t1 (a int) location '/data/t1'}}
>  - Then:
> The create table fails with the following exception:
> {code}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: 
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293)
>   at 
> 

[jira] [Work logged] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23726?focusedWorklogId=459633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459633
 ]

ASF GitHub Bot logged work on HIVE-23726:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 03:47
Start Date: 16/Jul/20 03:47
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #1198:
URL: https://github.com/apache/hive/pull/1198#issuecomment-659140632


   committed to master





Issue Time Tracking
---

Worklog Id: (was: 459633)
Time Spent: 0.5h  (was: 20m)

> Create table may throw 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
> --
>
> Key: HIVE-23726
> URL: https://issues.apache.org/jira/browse/HIVE-23726
> Project: Hive
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> - Given:
>  metastore.warehouse.tenant.colocation is set to true
>  a test database was created as {{create database test location '/data'}}
>  - When:
>  I try to create a table as {{create table t1 (a int) location '/data/t1'}}
>  - Then:
> The create table fails with the following exception:
> {code}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: 
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293)
>   at 

[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459630
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 03:40
Start Date: 16/Jul/20 03:40
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r455491023



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -986,7 +1019,8 @@ String getValidTxnListForReplDump(Hive hiveDb, long waitUntilTime) throws HiveEx
     // phase won't be able to replicate those txns. So, the logic is to wait for the given amount
     // of time to see if all open txns < current txn is getting aborted/committed. If not, then
     // we forcefully abort those txns just like AcidHouseKeeperService.
-    ValidTxnList validTxnList = getTxnMgr().getValidTxns();
+    //Exclude readonly and repl created transactions
+    ValidTxnList validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED));

Review comment:
   Sorry, this comment was for line number 1036. I am referring to the list of excludes: Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED)







Issue Time Tracking
---

Worklog Id: (was: 459630)
Time Spent: 1h 20m  (was: 1h 10m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize 
> bootstrap dump to avoid aborting all transactions.pdf
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.





[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459628
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 03:38
Start Date: 16/Jul/20 03:38
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r455490444



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/BaseReplicationScenariosAcidTables.java
##
@@ -336,18 +346,30 @@ void verifyInc2Load(String dbName, String lastReplId)
     return txns;
   }
 
-  void allocateWriteIdsForTables(String primaryDbName, Map<String, Long> tables,
-                                 TxnStore txnHandler,
-                                 List<Long> txns, HiveConf primaryConf) throws Throwable {
+  List<Long> allocateWriteIdsForTablesAndAquireLocks(String primaryDbName, Map<String, Long> tables,
+                                                     TxnStore txnHandler,
+                                                     List<Long> txns, HiveConf primaryConf) throws Throwable {
     AllocateTableWriteIdsRequest rqst = new AllocateTableWriteIdsRequest();
     rqst.setDbName(primaryDbName);
-
+    List<Long> lockIds = new ArrayList<>();
     for (Map.Entry<String, Long> entry : tables.entrySet()) {
       rqst.setTableName(entry.getKey());
       rqst.setTxnIds(txns);
       txnHandler.allocateTableWriteIds(rqst);
+      for (long txnId : txns) {
+        LockComponent comp = new LockComponent(LockType.SHARED_WRITE, LockLevel.TABLE,
+          primaryDbName);
+        comp.setTablename(entry.getKey());
+        comp.setOperationType(DataOperationType.UPDATE);
+        List<LockComponent> components = new ArrayList<>(1);
+        components.add(comp);
+        LockRequest lockRequest = new LockRequest(components, "u1", "hostname");
+        lockRequest.setTxnid(txnId);
+        lockIds.add(txnHandler.lock(lockRequest).getLockid());

Review comment:
   Because if you just open a txn, it will not acquire a lock.







Issue Time Tracking
---

Worklog Id: (was: 459628)
Time Spent: 1h 10m  (was: 1h)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize 
> bootstrap dump to avoid aborting all transactions.pdf
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.





[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459627
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 03:37
Start Date: 16/Jul/20 03:37
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r455490260



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -999,20 +1033,27 @@ String getValidTxnListForReplDump(Hive hiveDb, long waitUntilTime) throws HiveEx
       } catch (InterruptedException e) {
         LOG.info("REPL DUMP thread sleep interrupted", e);
       }
-      validTxnList = getTxnMgr().getValidTxns();
+      validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED));
     }
 
     // After the timeout just force abort the open txns
-    List<Long> openTxns = getOpenTxns(validTxnList);
-    if (!openTxns.isEmpty()) {
-      hiveDb.abortTransactions(openTxns);
-      validTxnList = getTxnMgr().getValidTxns();
-      if (validTxnList.getMinOpenTxn() != null) {
-        openTxns = getOpenTxns(validTxnList);
-        LOG.warn("REPL DUMP unable to force abort all the open txns: {} after timeout due to unknown reasons. " +
-            "However, this is rare case that shouldn't happen.", openTxns);
-        throw new IllegalStateException("REPL DUMP triggered abort txns failed for unknown reasons.");
+    if (conf.getBoolVar(REPL_BOOTSTRAP_DUMP_ABORT_WRITE_TXN_AFTER_TIMEOUT)) {
+      List<Long> openTxns = getOpenTxns(validTxnList, work.dbNameOrPattern);
+      if (!openTxns.isEmpty()) {
+        //abort only write transactions for the db under replication if abort transactions is enabled.
+        hiveDb.abortTransactions(openTxns);
+        validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED));

Review comment:
   If we use the already obtained validTxnList, we won't know whether there are still open txns. This re-fetch checks that all previously open txns were aborted and do not show up in the invalid txn list again.







Issue Time Tracking
---

Worklog Id: (was: 459627)
Time Spent: 1h  (was: 50m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize 
> bootstrap dump to avoid aborting all transactions.pdf
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.





[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459626
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 03:36
Start Date: 16/Jul/20 03:36
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r455489836



##
File path: ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java
##
@@ -186,6 +186,18 @@ void replTableWriteIdState(String validWriteIdList, String dbName, String tableN
   */
   ValidTxnList getValidTxns() throws LockException;
 
+ /**
+  * Get the transactions that are currently valid.  The resulting
+  * {@link ValidTxnList} object can be passed as a string to the processing
+  * tasks for use in reading the data.  This call should be made once up
+  * front by the planner and should never be called on the backend,
+  * as this would violate the isolation level semantics.
+  * @return list of valid transactions.
+  * @param txnTypes list of transaction types that should be excluded.
+  * @throws LockException
+  */
+  ValidTxnList getValidTxns(List<TxnType> txnTypes) throws LockException;

Review comment:
   txnTypes is a set of 5 enum values; it's not a free-flowing list, so a filter/regex is not required. The same interface can be used to exclude any txn type by passing in that list.
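As a usage note, a minimal sketch of the overload being discussed (the import locations are my assumption based on the Hive codebase; the exclusion list mirrors the PR diff):

    import java.util.Arrays;
    import org.apache.hadoop.hive.common.ValidTxnList;
    import org.apache.hadoop.hive.metastore.api.TxnType;

    // Exclude read-only and repl-created txns from the snapshot; passing any
    // other subset of the TxnType enum values excludes those types instead.
    ValidTxnList validTxnList =
        getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED));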







Issue Time Tracking
---

Worklog Id: (was: 459626)
Time Spent: 50m  (was: 40m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize 
> bootstrap dump to avoid aborting all transactions.pdf
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.





[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459625
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 03:35
Start Date: 16/Jul/20 03:35
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r455489488



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -986,7 +1019,8 @@ String getValidTxnListForReplDump(Hive hiveDb, long waitUntilTime) throws HiveEx
     // phase won't be able to replicate those txns. So, the logic is to wait for the given amount
     // of time to see if all open txns < current txn is getting aborted/committed. If not, then
     // we forcefully abort those txns just like AcidHouseKeeperService.
-    ValidTxnList validTxnList = getTxnMgr().getValidTxns();
+    //Exclude readonly and repl created transactions
+    ValidTxnList validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED));

Review comment:
   It is outside the loop.







Issue Time Tracking
---

Worklog Id: (was: 459625)
Time Spent: 40m  (was: 0.5h)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize 
> bootstrap dump to avoid aborting all transactions.pdf
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.





[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459623
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 03:32
Start Date: 16/Jul/20 03:32
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r455488814



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -550,6 +550,11 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "Indicates the timeout for all transactions which are opened before triggering bootstrap REPL DUMP. "
         + "If these open transactions are not closed within the timeout value, then REPL DUMP will "
         + "forcefully abort those transactions and continue with bootstrap dump."),
+    REPL_BOOTSTRAP_DUMP_ABORT_WRITE_TXN_AFTER_TIMEOUT("hive.repl.bootstrap.dump.abort.write.txn.after.timeout",

Review comment:
   This flag controls whether or not to abort txns after the timeout; it is not the timeout value flag. That is why "after" is part of the name.
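For illustration, a hedged sketch of how the flag gates the behavior, using names from the ReplDumpTask diff quoted earlier in this thread (a sketch, not the committed code):

    // When the flag is true, REPL DUMP force-aborts the remaining open write
    // txns for the db under replication after the timeout; when false, the
    // open txns are left alone.
    if (conf.getBoolVar(REPL_BOOTSTRAP_DUMP_ABORT_WRITE_TXN_AFTER_TIMEOUT)) {
      hiveDb.abortTransactions(openTxns);
    }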







Issue Time Tracking
---

Worklog Id: (was: 459623)
Time Spent: 0.5h  (was: 20m)

> Optimize bootstrap dump to abort only write Transactions
> 
>
> Key: HIVE-23560
> URL: https://issues.apache.org/jira/browse/HIVE-23560
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23560.01.patch, HIVE-23560.02.patch, Optimize 
> bootstrap dump to avoid aborting all transactions.pdf
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, before doing a bootstrap dump, we abort all open transactions after 
> waiting for a configured time. We are proposing to abort only write 
> transactions for the db under replication and leave the read and repl-created 
> transactions as is.
> The attached doc talks about it in detail.





[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459595
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 01:33
Start Date: 16/Jul/20 01:33
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 opened a new pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   





Issue Time Tracking
---

Worklog Id: (was: 459595)
Time Spent: 2h 50m  (was: 2h 40m)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of the metastore's 
> underlying datastore, like MySQL, Oracle and so on. You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> number of active connections and so on.





[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459594
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 01:32
Start Date: 16/Jul/20 01:32
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 closed pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   





Issue Time Tracking
---

Worklog Id: (was: 459594)
Time Spent: 2h 40m  (was: 2.5h)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of the metastore's 
> underlying datastore, like MySQL, Oracle and so on. You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> number of active connections and so on.





[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459591
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 01:01
Start Date: 16/Jul/20 01:01
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 opened a new pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   





Issue Time Tracking
---

Worklog Id: (was: 459591)
Time Spent: 2.5h  (was: 2h 20m)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of the metastore's 
> underlying datastore, like MySQL, Oracle and so on. You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> number of active connections and so on.





[jira] [Resolved] (HIVE-23244) Extract Create View analyzer from SemanticAnalyzer

2020-07-15 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely resolved HIVE-23244.
---
Resolution: Fixed

> Extract Create View analyzer from SemanticAnalyzer
> --
>
> Key: HIVE-23244
> URL: https://issues.apache.org/jira/browse/HIVE-23244
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23244.01.patch, HIVE-23244.02.patch, 
> HIVE-23244.03.patch, HIVE-23244.04.patch, HIVE-23244.05.patch, 
> HIVE-23244.06.patch, HIVE-23244.07.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Create View commands are not queries, but commands which have queries as a 
> part of them. Therefore a separate CreateViewAnalyzer is needed, which uses 
> SemanticAnalyzer to analyze its query.





[jira] [Work logged] (HIVE-23857) Fix HiveParser "code too large" problem

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23857?focusedWorklogId=459584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459584
 ]

ASF GitHub Bot logged work on HIVE-23857:
-

Author: ASF GitHub Bot
Created on: 16/Jul/20 00:40
Start Date: 16/Jul/20 00:40
Worklog Time Spent: 10m 
  Work Description: miklosgergely merged pull request #1258:
URL: https://github.com/apache/hive/pull/1258


   





Issue Time Tracking
---

Worklog Id: (was: 459584)
Time Spent: 50m  (was: 40m)

> Fix HiveParser "code too large" problem
> ---
>
> Key: HIVE-23857
> URL: https://issues.apache.org/jira/browse/HIVE-23857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HiveParser.g cannot be extended anymore, as adding any more tokens leads to 
> a "code too large" problem because the compiled code size would exceed 65536 
> bytes. The real solution would be to introduce antlr4; in the meantime it can 
> be fixed by moving the tokenNames variable into a separate file.
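To make the workaround concrete, a hedged sketch of what the extracted file could look like (the package and class name come from the fixHiveParser.sh diff quoted later in this digest; the array contents here are illustrative ANTLR3-style placeholders):

    package org.apache.hadoop.hive.ql.parse;

    // Holds the huge generated tokenNames array so that HiveParser's own
    // compiled code stays under the 65536-byte limit.
    public class HiveParserTokens {
      public static final String[] tokenNames = new String[] {
        "<invalid>", "<EOR>", "<DOWN>", "<UP>", "KW_SELECT" /* , ... */
      };
    }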





[jira] [Resolved] (HIVE-23857) Fix HiveParser "code too large" problem

2020-07-15 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely resolved HIVE-23857.
---
Resolution: Fixed

Merged to master, thank you [~belugabehr] and [~gopalv]

> Fix HiveParser "code too large" problem
> ---
>
> Key: HIVE-23857
> URL: https://issues.apache.org/jira/browse/HIVE-23857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HiveParser.g cannot be extended anymore, as adding any more tokens leads to 
> a "code too large" problem because the compiled code size would exceed 65536 
> bytes. The real solution would be to introduce antlr4; in the meantime it can 
> be fixed by moving the tokenNames variable into a separate file.





[jira] [Work logged] (HIVE-23857) Fix HiveParser "code too large" problem

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23857?focusedWorklogId=459530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459530
 ]

ASF GitHub Bot logged work on HIVE-23857:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 21:14
Start Date: 15/Jul/20 21:14
Worklog Time Spent: 10m 
  Work Description: miklosgergely commented on a change in pull request 
#1258:
URL: https://github.com/apache/hive/pull/1258#discussion_r455352739



##
File path: parser/bin/fixHiveParser.sh
##
@@ -0,0 +1,44 @@
+#!/bin/bash
+
+# This is a temporary solution for the "code too large" problem related to HiveParser.java
+# We got to a point where adding anything to the antlr files leads to a HiveParser.java that cannot be compiled due to the compiled code size limitation in java (maximum 65536 bytes), so to avoid it we temporarily add this script to move the huge tokenNames array into a separate file.
+# The real solution would be to switch to antlr 4
+
+tokenFile="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParserTokens.java"
+input="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java"
+output="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java-fixed"
+
+rm $tokenFile > /dev/null 2>&1
+rm $output > /dev/null 2>&1
+
+echo "package org.apache.hadoop.hive.ql.parse;" >> $tokenFile
+echo "" >> $tokenFile
+echo "public class HiveParserTokens {" >> $tokenFile
+
+state="STAY"
+while IFS= read -r line

Review comment:
   Thank you @t3rmin4t0r, I've modified the patch using awk.







Issue Time Tracking
---

Worklog Id: (was: 459530)
Time Spent: 40m  (was: 0.5h)

> Fix HiveParser "code too large" problem
> ---
>
> Key: HIVE-23857
> URL: https://issues.apache.org/jira/browse/HIVE-23857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HiveParser.g cannot be extended anymore, as adding any more tokens leads to 
> a "code too large" problem because the compiled code size would exceed 65536 
> bytes. The real solution would be to introduce antlr4; in the meantime it can 
> be fixed by moving the tokenNames variable into a separate file.





[jira] [Work logged] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23850?focusedWorklogId=459527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459527
 ]

ASF GitHub Bot logged work on HIVE-23850:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 21:07
Start Date: 15/Jul/20 21:07
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1255:
URL: https://github.com/apache/hive/pull/1255#issuecomment-659010617


   ...Sorry for this, @jcamachor could you please take a look? Thanks.





Issue Time Tracking
---

Worklog Id: (was: 459527)
Time Spent: 20m  (was: 10m)

> Allow PPD when subject is not a column with grouping sets present
> -
>
> Key: HIVE-23850
> URL: https://issues.apache.org/jira/browse/HIVE-23850
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters 
> with only columns and constants are pushed down, but in some cases this may 
> not work well, for example:
> SET hive.cbo.enable=false;
> SELECT a, b, sum(s)
> FROM T1
> GROUP BY a, b GROUPING SETS ((a), (a, b))
> HAVING upper(a) = "AAA" AND sum(s) > 100;
>  
> SELECT upper(a), b, sum(s)
> FROM T1
> GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b))
> HAVING upper(a) = "AAA" AND sum(s) > 100;
>  
> The filters pushed down to the GBY can be f(gbyKey) or a gbyKey with a UDF, 
> not only the plain column group-by keys.





[jira] [Work logged] (HIVE-23857) Fix HiveParser "code too large" problem

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23857?focusedWorklogId=459505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459505
 ]

ASF GitHub Bot logged work on HIVE-23857:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 19:50
Start Date: 15/Jul/20 19:50
Worklog Time Spent: 10m 
  Work Description: t3rmin4t0r commented on a change in pull request #1258:
URL: https://github.com/apache/hive/pull/1258#discussion_r455302352



##
File path: parser/bin/fixHiveParser.sh
##
@@ -0,0 +1,44 @@
+#!/bin/bash
+
+# This is a temporary solution for the "code too large" problem related to HiveParser.java
+# We got to a point where adding anything to the antlr files leads to a HiveParser.java that can not be compiled due to the compiled code size limitation in java (maximum 65536 bytes), so to avoid it we temporarily add this script to move the huge tokenNames array into a separate file.
+# The real solution would be to switch to antlr 4
+
+tokenFile="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParserTokens.java"
+input="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java"
+output="target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java-fixed"
+
+rm $tokenFile > /dev/null 2>&1
+rm $output > /dev/null 2>&1
+
+echo "package org.apache.hadoop.hive.ql.parse;" >> $tokenFile
+echo "" >> $tokenFile
+echo "public class HiveParserTokens {" >> $tokenFile
+
+state="STAY"
+while IFS= read -r line

Review comment:
   Looks like AWK reinvented in bash





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459505)
Time Spent: 0.5h  (was: 20m)

> Fix HiveParser "code too large" problem
> ---
>
> Key: HIVE-23857
> URL: https://issues.apache.org/jira/browse/HIVE-23857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HiveParser.g cannot be extended anymore, as adding any more tokens leads to 
> a "code too large" problem, because the compiled code size would exceed 65536 
> bytes. The real solution would be to introduce antlr4; in the meantime it can 
> be fixed by moving the tokenNames variable into a separate file.
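
For illustration, the extracted companion class the script drives toward would look roughly like this (a hedged sketch; the real array contents are copied verbatim out of the generated HiveParser.java, and the entries shown are placeholders):

```java
package org.apache.hadoop.hive.ql.parse;

// Sketch only: the build script above moves the ANTLR3 tokenNames array out of
// HiveParser.java, shrinking HiveParser's static initializer below the 64 KB
// per-method bytecode limit.
public class HiveParserTokens {
  public static final String[] tokenNames = new String[] {
    "<invalid>", "<EOR>", "<DOWN>", "<UP>", "KW_SELECT" // ...truncated
  };
}
```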



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23856) Beeline Should Print Binary Data in Base64

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23856:
--
Labels: pull-request-available  (was: )

> Beeline Should Print Binary Data in Base64
> --
>
> Key: HIVE-23856
> URL: https://issues.apache.org/jira/browse/HIVE-23856
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Hunter Logan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Format binary data as Base64 to make it easier to parse for external 
> applications and easier for humans to convert using a Base64 tool.
> https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L165
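
A minimal sketch of the proposed rendering, assuming the raw column bytes are available (java.util.Base64 is JDK-standard; BinaryColumnFormatter is a hypothetical name, not BeeLine's API):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BinaryColumnFormatter {
  // Render a binary cell as Base64 text instead of raw bytes.
  static String formatBinary(byte[] value) {
    return value == null ? "NULL" : Base64.getEncoder().encodeToString(value);
  }

  public static void main(String[] args) {
    byte[] cell = "hello".getBytes(StandardCharsets.UTF_8);
    System.out.println(formatBinary(cell)); // prints aGVsbG8=
  }
}
```

External consumers can then decode the cell with any Base64 decoder instead of guessing at terminal byte mangling.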



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23856) Beeline Should Print Binary Data in Base64

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23856?focusedWorklogId=459503=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459503
 ]

ASF GitHub Bot logged work on HIVE-23856:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 19:39
Start Date: 15/Jul/20 19:39
Worklog Time Spent: 10m 
  Work Description: HunterL opened a new pull request #1261:
URL: https://github.com/apache/hive/pull/1261


   Fixed Binary data type in beeline rows to encode to Base64
   
   https://issues.apache.org/jira/browse/HIVE-23856
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459503)
Remaining Estimate: 0h
Time Spent: 10m

> Beeline Should Print Binary Data in Base64
> --
>
> Key: HIVE-23856
> URL: https://issues.apache.org/jira/browse/HIVE-23856
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Hunter Logan
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Format binary data as Base64 to make it easier to parse for external 
> applications and easier for humans to convert using a Base64 tool.
> https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L165



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-23069:

Attachment: HIVE-23069.03.patch

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch, 
> HIVE-23069.03.patch
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.
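
A hedged sketch of the idea (not Hive's actual FileList class): spill entries to a backing file and stream them back, so memory stays bounded regardless of how many tables/partitions the dump enumerates:

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

public class FileBackedList implements Iterable<String>, Closeable {
  private final Path backingFile;
  private final BufferedWriter writer;

  FileBackedList(Path backingFile) throws IOException {
    this.backingFile = backingFile;
    this.writer = Files.newBufferedWriter(backingFile, StandardCharsets.UTF_8);
  }

  void add(String entry) throws IOException {
    writer.write(entry);
    writer.newLine();
  }

  @Override
  public Iterator<String> iterator() {
    try {
      writer.flush(); // make pending entries visible to the reader
      return Files.lines(backingFile).iterator(); // streams lines, never materializes a List
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public void close() throws IOException {
    writer.close();
  }
}
```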



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23560) Optimize bootstrap dump to abort only write Transactions

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23560?focusedWorklogId=459458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459458
 ]

ASF GitHub Bot logged work on HIVE-23560:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 18:16
Start Date: 15/Jul/20 18:16
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1232:
URL: https://github.com/apache/hive/pull/1232#discussion_r454865275



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -986,7 +1019,8 @@ String getValidTxnListForReplDump(Hive hiveDb, long waitUntilTime) throws HiveEx
 // phase won't be able to replicate those txns. So, the logic is to wait for the given amount
 // of time to see if all open txns < current txn is getting aborted/committed. If not, then
 // we forcefully abort those txns just like AcidHouseKeeperService.
-ValidTxnList validTxnList = getTxnMgr().getValidTxns();
+// Exclude read-only and repl-created transactions
+ValidTxnList validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED));

Review comment:
   Create the list once, outside the loop.
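
A toy illustration of the point, with a stand-in enum (the real TxnType lives in the metastore API):

```java
import java.util.Arrays;
import java.util.List;

public class HoistExample {
  enum TxnType { READ_ONLY, REPL_CREATED } // stand-in for the metastore enum

  public static void main(String[] args) {
    // Build the excluded-type list once and reuse it on every retry, instead
    // of re-creating it with Arrays.asList(...) inside the wait loop.
    List<TxnType> excluded = Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED);
    for (int attempt = 0; attempt < 3; attempt++) {
      // validTxnList = txnMgr.getValidTxns(excluded);  // hypothetical call site
      System.out.println("attempt " + attempt + " reuses " + excluded);
    }
  }
}
```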

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -550,6 +550,11 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 "Indicates the timeout for all transactions which are opened before triggering bootstrap REPL DUMP. "
 + "If these open transactions are not closed within the timeout value, then REPL DUMP will "
 + "forcefully abort those transactions and continue with bootstrap dump."),
+REPL_BOOTSTRAP_DUMP_ABORT_WRITE_TXN_AFTER_TIMEOUT("hive.repl.bootstrap.dump.abort.write.txn.after.timeout",

Review comment:
   nit: remove 'after' from the config name.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -999,20 +1033,27 @@ String getValidTxnListForReplDump(Hive hiveDb, long waitUntilTime) throws HiveEx
   } catch (InterruptedException e) {
 LOG.info("REPL DUMP thread sleep interrupted", e);
   }
-  validTxnList = getTxnMgr().getValidTxns();
+  validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED));
 }
 
 // After the timeout just force abort the open txns
-List openTxns = getOpenTxns(validTxnList);
-if (!openTxns.isEmpty()) {
-  hiveDb.abortTransactions(openTxns);
-  validTxnList = getTxnMgr().getValidTxns();
-  if (validTxnList.getMinOpenTxn() != null) {
-openTxns = getOpenTxns(validTxnList);
-LOG.warn("REPL DUMP unable to force abort all the open txns: {} after timeout due to unknown reasons. " +
-"However, this is rare case that shouldn't happen.", openTxns);
-throw new IllegalStateException("REPL DUMP triggered abort txns failed for unknown reasons.");
+if (conf.getBoolVar(REPL_BOOTSTRAP_DUMP_ABORT_WRITE_TXN_AFTER_TIMEOUT)) {
+  List openTxns = getOpenTxns(validTxnList, work.dbNameOrPattern);
+  if (!openTxns.isEmpty()) {
+// abort only write transactions for the db under replication if abort transactions is enabled.
+hiveDb.abortTransactions(openTxns);
+validTxnList = getTxnMgr().getValidTxns(Arrays.asList(TxnType.READ_ONLY, TxnType.REPL_CREATED));

Review comment:
   Shouldn't the already obtained validTxnList be used here? 

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java
##
@@ -975,6 +981,33 @@ private String getValidWriteIdList(String dbName, String tblName, String validTx
 return openTxns;
   }
 
+  List getOpenTxns(ValidTxnList validTxnList, String dbName) throws LockException {
+HiveLockManager lockManager = getTxnMgr().getLockManager();
+long[] invalidTxns = validTxnList.getInvalidTransactions();
+List openTxns = new ArrayList<>();
+List dbTxns = new ArrayList<>();

Review comment:
   Can be replaced with a HashSet for faster lookup.
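
A small illustration of why (hypothetical txn ids; the gain is O(1) hash lookups versus an O(n) scan per contains call):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupSketch {
  public static void main(String[] args) {
    List<Long> dbTxnsList = Arrays.asList(3L, 7L, 42L);
    Set<Long> dbTxnsSet = new HashSet<>(dbTxnsList);
    long candidate = 42L;
    System.out.println(dbTxnsList.contains(candidate)); // linear scan
    System.out.println(dbTxnsSet.contains(candidate));  // hash lookup
  }
}
```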

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -274,6 +277,120 @@ public void testAcidTablesBootstrapWithOpenTxnsTimeout() 
throws Throwable {
 verifyCompactionQueue(tables, replicatedDbName, replicaConf);
   }
 
+  @Test
+  public void testAcidTablesBootstrapWithOpenTxnsDiffDb() throws Throwable {

Review comment:
   Also add a test for a case where a few open txns are from the primary db and 
a few from the secondary. During dump, the txn ids from the primary get aborted 
but those from the secondary are not.

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcidTables.java
##
@@ -274,6 +277,120 @@ public void 

[jira] [Work logged] (HIVE-23857) Fix HiveParser "code too large" problem

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23857?focusedWorklogId=459431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459431
 ]

ASF GitHub Bot logged work on HIVE-23857:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 17:32
Start Date: 15/Jul/20 17:32
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1258:
URL: https://github.com/apache/hive/pull/1258#issuecomment-658902129


   I once looked at this and was trying to avoid this route, but I guess it's 
all we can do unless someone wants to take up the mantle of going to ANTLR4.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459431)
Time Spent: 20m  (was: 10m)

> Fix HiveParser "code too large" problem
> ---
>
> Key: HIVE-23857
> URL: https://issues.apache.org/jira/browse/HIVE-23857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HiveParser.g cannot be extended anymore, as adding any more tokens leads to 
> a "code too large" problem, because the compiled code size would exceed 65536 
> bytes. The real solution would be to introduce antlr4; in the meantime it can 
> be fixed by moving the tokenNames variable into a separate file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459428
 ]

ASF GitHub Bot logged work on HIVE-23853:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 17:27
Start Date: 15/Jul/20 17:27
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1256:
URL: https://github.com/apache/hive/pull/1256#issuecomment-658899132


   Had to recreate the patch request, since I was not able to push my changes 
after manually merging stuff here :(
   Please check https://github.com/apache/hive/pull/1259
   Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459428)
Time Spent: 1h 10m  (was: 1h)

> CRUD based compaction also should update ACID file version metadata
> ---
>
> Key: HIVE-23853
> URL: https://issues.apache.org/jira/browse/HIVE-23853
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Current CRUD compaction does not update the file metadata to contain the ACID 
> version. Also the {{_orc_acid_version}} version file is not created.
> We should do this to be consistent across the board.
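
A rough sketch of the missing step (illustrative only; the real logic lives in AcidUtils.OrcAcidVersion, and writing the version as plain text is an assumption):

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AcidVersionMarkerSketch {
  // Drop an _orc_acid_version marker into the compacted directory so readers
  // can tell which ACID file-format version produced it.
  static void writeVersionFile(FileSystem fs, Path dir, int version) throws Exception {
    Path marker = new Path(dir, "_orc_acid_version");
    try (FSDataOutputStream out = fs.create(marker, true)) {
      out.write(String.valueOf(version).getBytes(StandardCharsets.UTF_8));
    }
  }
}
```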



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459427
 ]

ASF GitHub Bot logged work on HIVE-23853:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 17:27
Start Date: 15/Jul/20 17:27
Worklog Time Spent: 10m 
  Work Description: pvary closed pull request #1256:
URL: https://github.com/apache/hive/pull/1256


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459427)
Time Spent: 1h  (was: 50m)

> CRUD based compaction also should update ACID file version metadata
> ---
>
> Key: HIVE-23853
> URL: https://issues.apache.org/jira/browse/HIVE-23853
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Current CRUD compaction does not update the file metadata to contain the ACID 
> version. Also the {{_orc_acid_version}} version file is not created.
> We should do this to be consistent across the board.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459425=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459425
 ]

ASF GitHub Bot logged work on HIVE-23853:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 17:26
Start Date: 15/Jul/20 17:26
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1256:
URL: https://github.com/apache/hive/pull/1256#discussion_r455213413



##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -4206,6 +4206,10 @@ private static void copyFiles(final HiveConf conf, final 
FileSystem destFs,
 }
   }
   files = fileStatuses.toArray(new FileStatus[files.length]);
+
+  if (HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.HIVE_WRITE_ACID_VERSION_FILE)) {
+AcidUtils.OrcAcidVersion.writeVersionFile(destf, destFs);
+  }

Review comment:
   It was my first try to have it in the FSOperator, which I forgot 
to remove. After FSO we have like 3 moves, which I wanted to avoid; since we 
already have a Compactor-specific change in copyFiles, I decided to reuse 
it

##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java
##
@@ -25,9 +25,11 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
+import java.util.HashSet;

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459425)
Time Spent: 50m  (was: 40m)

> CRUD based compaction also should update ACID file version metadata
> ---
>
> Key: HIVE-23853
> URL: https://issues.apache.org/jira/browse/HIVE-23853
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Current CRUD compaction does not update the file metadata to contain the ACID 
> version. Also the {{_orc_acid_version}} version file is not created.
> We should do this to be consistent across the board.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459423
 ]

ASF GitHub Bot logged work on HIVE-23853:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 17:24
Start Date: 15/Jul/20 17:24
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1256:
URL: https://github.com/apache/hive/pull/1256#discussion_r455212090



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
##
@@ -161,6 +163,7 @@ private static synchronized MemoryManager 
getThreadLocalOrcLlapMemoryManager(fin
 LlapProxy.isDaemon()) {
 memory(getThreadLocalOrcLlapMemoryManager(conf));
   }
+  isCompaction = tableProperties != null && 
"true".equals(tableProperties.getProperty(AcidUtils.COMPACTOR_TABLE_PROPERTY));

Review comment:
   Done. Thanks!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459423)
Time Spent: 40m  (was: 0.5h)

> CRUD based compaction also should update ACID file version metadata
> ---
>
> Key: HIVE-23853
> URL: https://issues.apache.org/jira/browse/HIVE-23853
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Current CRUD compaction does not update the file metadata to contain the ACID 
> version. Also the {{_orc_acid_version}} version file is not created.
> We should do this to be consistent across the board.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459421
 ]

ASF GitHub Bot logged work on HIVE-23853:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 17:22
Start Date: 15/Jul/20 17:22
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #1259:
URL: https://github.com/apache/hive/pull/1259


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459421)
Time Spent: 0.5h  (was: 20m)

> CRUD based compaction also should update ACID file version metadata
> ---
>
> Key: HIVE-23853
> URL: https://issues.apache.org/jira/browse/HIVE-23853
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Current CRUD compaction does not update the file metadata to contain the ACID 
> version. Also the {{_orc_acid_version}} version file is not created.
> We should do this to be consistent across the board.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23857) Fix HiveParser "code too large" problem

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23857?focusedWorklogId=459402=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459402
 ]

ASF GitHub Bot logged work on HIVE-23857:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 16:58
Start Date: 15/Jul/20 16:58
Worklog Time Spent: 10m 
  Work Description: miklosgergely opened a new pull request #1258:
URL: https://github.com/apache/hive/pull/1258


   HiveParser.g cannot be extended anymore, as adding any more tokens leads to 
a "code too large" problem, because the compiled code size would exceed 65536 
bytes. The real solution would be to introduce antlr4; in the meantime it can 
be fixed by moving the tokenNames variable into a separate file.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459402)
Remaining Estimate: 0h
Time Spent: 10m

> Fix HiveParser "code too large" problem
> ---
>
> Key: HIVE-23857
> URL: https://issues.apache.org/jira/browse/HIVE-23857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveParser.g cannot be extended anymore, as adding any more tokens leads to 
> a "code too large" problem, because the compiled code size would exceed 65536 
> bytes. The real solution would be to introduce antlr4; in the meantime it can 
> be fixed by moving the tokenNames variable into a separate file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23857) Fix HiveParser "code too large" problem

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23857:
--
Labels: pull-request-available  (was: )

> Fix HiveParser "code too large" problem
> ---
>
> Key: HIVE-23857
> URL: https://issues.apache.org/jira/browse/HIVE-23857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveParser.g cannot be extended anymore, as adding any more tokens leads to 
> a "code too large" problem, because the compiled code size would exceed 65536 
> bytes. The real solution would be to introduce antlr4; in the meantime it can 
> be fixed by moving the tokenNames variable into a separate file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23857) Fix HiveParser "code too large" problem

2020-07-15 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely reassigned HIVE-23857:
-


> Fix HiveParser "code too large" problem
> ---
>
> Key: HIVE-23857
> URL: https://issues.apache.org/jira/browse/HIVE-23857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>
> HiveParser.g cannot be extended anymore, as adding any more tokens leads to 
> a "code too large" problem, because the compiled code size would exceed 65536 
> bytes. The real solution would be to introduce antlr4; in the meantime it can 
> be fixed by moving the tokenNames variable into a separate file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23856) Beeline Should Print Binary Data in Base64

2020-07-15 Thread Hunter Logan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hunter Logan reassigned HIVE-23856:
---

Assignee: Hunter Logan

> Beeline Should Print Binary Data in Base64
> --
>
> Key: HIVE-23856
> URL: https://issues.apache.org/jira/browse/HIVE-23856
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Hunter Logan
>Priority: Minor
>
> Format binary data as Base64 to make it easier to parse for external 
> applications and easier for humans to convert using a Base64 tool.
> https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L165



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=459384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459384
 ]

ASF GitHub Bot logged work on HIVE-23838:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 16:38
Start Date: 15/Jul/20 16:38
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1245:
URL: https://github.com/apache/hive/pull/1245#issuecomment-658871179


   Godspeed @klcopp !



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459384)
Time Spent: 1.5h  (was: 1h 20m)

> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-15 Thread Abhishek Somani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158515#comment-17158515
 ] 

Abhishek Somani edited comment on HIVE-23804 at 7/15/20, 4:33 PM:
--

[~ngangam] the issue exists in all metastore schemas of 3.0 and above. When any 
such schema is used with an HMS version < 3.0, one will hit the issue.

 

So for example, if you use metastore db schema of 4.0.0, but HMS version 2.3, 
one will hit the issue.

So basically the backward compatibility of metastore db with older HMS versions 
is broken.


was (Author: asomani):
[~ngangam] the issue exists in all metastore schemas of 3.0 and above. When any 
such schema is used with an HMS version <= 3.0, one will hit the issue.

 

So for example, if you use metastore db schema of 4.0.0, but HMS version 2.3, 
one will hit the issue.

So basically the backward compatibility of metastore db with older HMS versions 
is broken.

> Adding defaults for Columns Stats table in the schema to make them backward 
> compatible
> --
>
> Key: HIVE-23804
> URL: https://issues.apache.org/jira/browse/HIVE-23804
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.3.7
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23804-1.patch, HIVE-23804.patch
>
>
> Since the table/part column statistics tables have added a new `CAT_NAME` 
> column with `NOT NULL` constraint in version >3.0.0, queries to analyze 
> statistics break for Hive versions <3.0.0 when used against an upgraded DB. 
> One such miss is handled in HIVE-21739.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23804) Adding defaults for Columns Stats table in the schema to make them backward compatible

2020-07-15 Thread Abhishek Somani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158515#comment-17158515
 ] 

Abhishek Somani commented on HIVE-23804:


[~ngangam] the issue exists in all metastore schemas of 3.0 and above. When any 
such schema is used with an HMS version <= 3.0, one will hit the issue.

 

So for example, if you use metastore db schema of 4.0.0, but HMS version 2.3, 
one will hit the issue.

So basically the backward compatibility of metastore db with older HMS versions 
is broken.

> Adding defaults for Columns Stats table in the schema to make them backward 
> compatible
> --
>
> Key: HIVE-23804
> URL: https://issues.apache.org/jira/browse/HIVE-23804
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.3.7
>Reporter: Aditya Shah
>Assignee: Aditya Shah
>Priority: Major
> Attachments: HIVE-23804-1.patch, HIVE-23804.patch
>
>
> Since the table/part column statistics tables have added a new `CAT_NAME` 
> column with `NOT NULL` constraint in version >3.0.0, queries to analyze 
> statistics break for Hive versions <3.0.0 when used against an upgraded DB. 
> One such miss is handled in HIVE-21739.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23852) Natively support Date type in ReduceSink operator

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23852:
--
Labels: pull-request-available  (was: )

> Natively support Date type in ReduceSink operator
> -
>
> Key: HIVE-23852
> URL: https://issues.apache.org/jira/browse/HIVE-23852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is currently no native support, meaning that these types end up being 
> serialized as multi-key columns, which is much slower (iterating through 
> batch columns instead of writing a value directly).
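
To make the cost concrete, a date key has a natural single-integer encoding; a toy sketch using the plain JDK (not Hive's serializer API):

```java
import java.time.LocalDate;

public class DateKeySketch {
  // A date can be written as one epoch-day integer per row, instead of being
  // routed through the generic multi-key path that iterates over batch columns.
  static int toNativeKey(LocalDate d) {
    return (int) d.toEpochDay();
  }

  public static void main(String[] args) {
    System.out.println(toNativeKey(LocalDate.of(2020, 7, 15))); // 18458
  }
}
```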



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23852) Natively support Date type in ReduceSink operator

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23852?focusedWorklogId=459377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459377
 ]

ASF GitHub Bot logged work on HIVE-23852:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 16:23
Start Date: 15/Jul/20 16:23
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1257:
URL: https://github.com/apache/hive/pull/1257


   Adding support for Date type in ReduceSink operator for native vector sink
   
   Change-Id: I0b151b72d70f3f57278144def5b64a063cd77623
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459377)
Remaining Estimate: 0h
Time Spent: 10m

> Natively support Date type in ReduceSink operator
> -
>
> Key: HIVE-23852
> URL: https://issues.apache.org/jira/browse/HIVE-23852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is currently no native support, meaning that these types end up being 
> serialized as multi-key columns, which is much slower (iterating through 
> batch columns instead of writing a value directly).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23855) TestQueryShutdownHooks is flaky

2020-07-15 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman reassigned HIVE-23855:
---

Assignee: Mustafa Iman

> TestQueryShutdownHooks is flaky
> ---
>
> Key: HIVE-23855
> URL: https://issues.apache.org/jira/browse/HIVE-23855
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Mustafa Iman
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/100/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23852) Natively support Date type in ReduceSink operator

2020-07-15 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-23852:
--
Summary: Natively support Date type in ReduceSink operator  (was: Natively 
support Date and Timestamp types in ReduceSink operator)

> Natively support Date type in ReduceSink operator
> -
>
> Key: HIVE-23852
> URL: https://issues.apache.org/jira/browse/HIVE-23852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>
> There is currently no native support, meaning that these types end up being 
> serialized as multi-key columns, which is much slower (iterating through 
> batch columns instead of writing a value directly).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459362
 ]

ASF GitHub Bot logged work on HIVE-23853:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 16:04
Start Date: 15/Jul/20 16:04
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1256:
URL: https://github.com/apache/hive/pull/1256#discussion_r455104994



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
##
@@ -161,6 +163,7 @@ private static synchronized MemoryManager 
getThreadLocalOrcLlapMemoryManager(fin
 LlapProxy.isDaemon()) {
 memory(getThreadLocalOrcLlapMemoryManager(conf));
   }
+  isCompaction = tableProperties != null && 
"true".equals(tableProperties.getProperty(AcidUtils.COMPACTOR_TABLE_PROPERTY));

Review comment:
   can use AcidUtils#isCompactionTable here
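
A self-contained sketch of the suggested refactor; "compactiontable" is a stand-in value, and the real constant and helper live in org.apache.hadoop.hive.ql.io.AcidUtils:

```java
import java.util.Properties;

public class CompactionFlagSketch {
  static final String COMPACTOR_TABLE_PROPERTY = "compactiontable"; // stand-in

  // The reviewer's point: fold the null check and "true" comparison into one
  // helper instead of inlining them at each call site.
  static boolean isCompactionTable(Properties tableProperties) {
    return tableProperties != null
        && Boolean.parseBoolean(tableProperties.getProperty(COMPACTOR_TABLE_PROPERTY));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(COMPACTOR_TABLE_PROPERTY, "true");
    System.out.println(isCompactionTable(props)); // true
    System.out.println(isCompactionTable(null));  // false
  }
}
```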

##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java
##
@@ -25,9 +25,11 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
+import java.util.HashSet;

Review comment:
   unused import

##
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##
@@ -4206,6 +4206,10 @@ private static void copyFiles(final HiveConf conf, final 
FileSystem destFs,
 }
   }
   files = fileStatuses.toArray(new FileStatus[files.length]);
+
+  if (HiveConf.getBoolVar(conf, 
HiveConf.ConfVars.HIVE_WRITE_ACID_VERSION_FILE)) {
+AcidUtils.OrcAcidVersion.writeVersionFile(destf, destFs);
+  }

Review comment:
   Why write the version file to the destination file during MoveTask, if 
we already wrote one in FSOperator and we move everything that was written in 
FSOp to the destination file?

##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java
##
@@ -25,9 +25,11 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
+import java.util.HashSet;
 import java.util.LinkedList;
 import java.util.List;
 import java.util.Optional;
+import java.util.Set;

Review comment:
   unused import





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459362)
Time Spent: 20m  (was: 10m)

> CRUD based compaction also should update ACID file version metadata
> ---
>
> Key: HIVE-23853
> URL: https://issues.apache.org/jira/browse/HIVE-23853
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Current CRUD compaction does not update the file metadata to contain the ACID 
> version. Also the {{_orc_acid_version}} version file is not created.
> We should do this to be consistent across the board.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23492) Remove unnecessary FileSystem#exists calls from ql module

2020-07-15 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HIVE-23492:
--
Description: Wherever there is an exists() call before open() or delete(), 
remove it and infer from the FileNotFoundException raised in open/delete that 
the file does not exist. Exists() just checks for a FileNotFoundException so 
it's a waste of time, especially on higher-latency stores  (was: Wherever there 
is an exists() call before open() or delete(), remove it and infer from the 
FileNotFoundException raised in open/delete that the file does not exist. 
Exists() just checks for a FileNotFoundException so it's a waste of time, 
especially on clunkier FSes)

> Remove unnecessary FileSystem#exists calls from ql module
> -
>
> Key: HIVE-23492
> URL: https://issues.apache.org/jira/browse/HIVE-23492
> Project: Hive
>  Issue Type: Improvement
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23492.01.patch, HIVE-23492.02.patch, 
> HIVE-23492.03.patch, HIVE-23492.04.patch, HIVE-23492.05.patch
>
>
> Wherever there is an exists() call before open() or delete(), remove it and 
> infer from the FileNotFoundException raised in open/delete that the file does 
> not exist. Exists() just checks for a FileNotFoundException so it's a waste 
> of time, especially on higher-latency stores
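
The pattern in question, as a minimal sketch against the Hadoop FileSystem API (openIfPresent is a made-up name for illustration):

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenWithoutExists {
  // Skip the extra exists() round trip and treat FileNotFoundException from
  // open() as "file does not exist" -- one RPC instead of two.
  static FSDataInputStream openIfPresent(FileSystem fs, Path p) throws IOException {
    try {
      return fs.open(p);
    } catch (FileNotFoundException e) {
      return null; // same signal exists() would have given
    }
  }
}
```

On object stores where each metadata call is a network round trip, halving the calls on this path is a real saving.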



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459342
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 15:22
Start Date: 15/Jul/20 15:22
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r455136416



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpWork.java
##
@@ -207,4 +217,20 @@ public ReplicationMetricCollector getMetricCollector() {
   public void setMetricCollector(ReplicationMetricCollector metricCollector) {
 this.metricCollector = metricCollector;
   }
+
+  public ReplicationSpec getReplicationSpec() {
+return replicationSpec;
+  }
+
+  public void setReplicationSpec(ReplicationSpec replicationSpec) {
+this.replicationSpec = replicationSpec;
+  }
+
+  public FileList getFileList(Path backingFile, int cacheSize, HiveConf conf, 
boolean b) throws IOException {

Review comment:
   removed the change altogether





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459342)
Time Spent: 6h  (was: 5h 50m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In case 
> of a database with a very large number of tables/partitions, such an iterator 
> may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23838) KafkaRecordIteratorTest is flaky

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23838?focusedWorklogId=459340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459340
 ]

ASF GitHub Bot logged work on HIVE-23838:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 15:18
Start Date: 15/Jul/20 15:18
Worklog Time Spent: 10m 
  Work Description: klcopp commented on pull request #1245:
URL: https://github.com/apache/hive/pull/1245#issuecomment-658829863


   @belugabehr Thanks! I haven't found a real solution to this and it's getting 
worse. So apparently Z. Haindrich is disabling the test until I fix it.. or 
someone fixes it..



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459340)
Time Spent: 1h 20m  (was: 1h 10m)

> KafkaRecordIteratorTest is flaky
> 
>
> Key: HIVE-23838
> URL: https://issues.apache.org/jira/browse/HIVE-23838
> Project: Hive
>  Issue Type: Bug
>  Components: kafka integration
>Affects Versions: 4.0.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Failed on [4th run of flaky test 
> checker|http://ci.hive.apache.org/job/hive-flaky-check/69/] with
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 1milliseconds while awaiting InitProducerId



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23847) Extracting hive-parser module broke exec jar upload in tez

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23847?focusedWorklogId=459334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459334
 ]

ASF GitHub Bot logged work on HIVE-23847:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 15:02
Start Date: 15/Jul/20 15:02
Worklog Time Spent: 10m 
  Work Description: asinkovits commented on a change in pull request #1252:
URL: https://github.com/apache/hive/pull/1252#discussion_r455122449



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java
##
@@ -109,7 +112,7 @@
 
   private final HiveConf conf;
   private Path tezScratchDir;
-  private LocalResource appJarLr;
+  private Collection appJarLrs;

Review comment:
   Thanks for the suggestion. Unfortunately it seems that the generated 
HiveParser.class has some huge methods, which causes the shade plugin to fail.
   
   Error creating shaded jar: Problem shading JAR 
/home/antals/.m2/repository/org/apache/hive/hive-parser/4.0.0-SNAPSHOT/hive-parser-4.0.0-SNAPSHOT.jar
 entry org/apache/hadoop/hive/ql/parse/HiveParser.class: 
java.lang.RuntimeException: Method code too large!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459334)
Time Spent: 0.5h  (was: 20m)

> Extracting hive-parser module broke exec jar upload in tez
> --
>
> Key: HIVE-23847
> URL: https://issues.apache.org/jira/browse/HIVE-23847
> Project: Hive
>  Issue Type: Bug
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> 2020-07-13 16:53:50,551 [INFO] [Dispatcher thread {Central}] 
> |HistoryEventHandler.criticalEvents|: 
> [HISTORY][DAG:dag_1594632473849_0001_1][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Map 1, taskAttemptId=attempt_1594632473849_0001_1_00_00_0, 
> creationTime=1594652027059, allocationTime=1594652028460, 
> startTime=1594652029356, finishTime=1594652030546, timeTaken=1190, 
> status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR, 
> diagnostics=Error: Error while running task ( failure ) : 
> attempt_1594632473849_0001_1_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:340)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
>   ... 16 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/hadoop/hive/ql/parse/ParseException
>   at java.lang.Class.getDeclaredConstructors0(Native Method)
>   at 

[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=459315=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459315
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 14:17
Start Date: 15/Jul/20 14:17
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r455083837



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) 
generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+String[] head = header.values;
+String[] vals = row.values;
+
+try {
+  int colCount = rows.rsMeta.getColumnCount();
+  boolean objStartFlag = true;
+
+  for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+if (objStartFlag) {
+  generator.writeStartObject();
+  objStartFlag = false;
+}
+
+generator.writeFieldName(head[i]);
+switch(rows.rsMeta.getColumnType(i+1)) {
+  case Types.TINYINT:
+  case Types.SMALLINT:
+  case Types.INTEGER:
+  case Types.BIGINT:
+  case Types.REAL:
+  case Types.FLOAT:
+  case Types.DOUBLE:
+  case Types.DECIMAL:
+  case Types.NUMERIC:
+  case Types.ROWID:
+generator.writeNumber(vals[i]);
+break;
+  case Types.NULL:
+generator.writeNull();
+break;
+  case Types.BOOLEAN:
+generator.writeBoolean(Boolean.parseBoolean(vals[i]));
+break;
+  default:

Review comment:
   OK.  I follow now.  Sorry I didn't read your comment close enough. :)
   
   So, I think for now, what you should do is... 
   
   Create a new `case` for BINARY data and then check that this is set to 
"true".  Throw an Exception otherwise.
   
   ```
   beeLine.getOpts().getConvertBinaryArrayToString();
   ```
   
   At least this way, we can have a separate ticket about making this base-64 
instead of the current implementation.  Then, just use `writeString`.
   
   If Google Guava is included, use the `Preconditions` class to check
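
A self-contained sketch of that suggestion (convertBinaryArrayToString stands in for the BeeLineOpts getter named in this thread; it is an assumption, not a verified BeeLine API):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import java.io.StringWriter;

public class BinaryCaseSketch {
  public static void main(String[] args) throws Exception {
    boolean convertBinaryArrayToString = true; // stand-in for beeLine.getOpts() getter
    String binaryAsString = "abc"; // stand-in for vals[i] after conversion

    if (!convertBinaryArrayToString) {
      // Guava's Preconditions.checkState would do the same job here.
      throw new IllegalStateException("binary columns require convertBinaryArrayToString=true");
    }

    StringWriter out = new StringWriter();
    JsonGenerator gen = new JsonFactory().createGenerator(out);
    gen.writeStartObject();
    gen.writeFieldName("col"); // the BINARY column
    gen.writeString(binaryAsString);
    gen.writeEndObject();
    gen.close();
    System.out.println(out); // {"col":"abc"}
  }
}
```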





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please 

[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=459311=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459311
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 14:14
Start Date: 15/Jul/20 14:14
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r455083837



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, 
JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+String[] head = header.values;
+String[] vals = row.values;
+
+try {
+  int colCount = rows.rsMeta.getColumnCount();
+  boolean objStartFlag = true;
+
+  for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+if (objStartFlag) {
+  generator.writeStartObject();
+  objStartFlag = false;
+}
+
+generator.writeFieldName(head[i]);
+switch(rows.rsMeta.getColumnType(i+1)) {
+  case Types.TINYINT:
+  case Types.SMALLINT:
+  case Types.INTEGER:
+  case Types.BIGINT:
+  case Types.REAL:
+  case Types.FLOAT:
+  case Types.DOUBLE:
+  case Types.DECIMAL:
+  case Types.NUMERIC:
+  case Types.ROWID:
+generator.writeNumber(vals[i]);
+break;
+  case Types.NULL:
+generator.writeNull();
+break;
+  case Types.BOOLEAN:
+generator.writeBoolean(Boolean.parseBoolean(vals[i]));
+break;
+  default:

Review comment:
   OK.  I follow now.  Sorry I didn't read your comment closely enough. :)
   
   So, I think for now, what you should do is... 
   
   Create a new `case` for BINARY data and then check that this option is set to 
"true".  Throw an Exception otherwise.
   
   ```
   beeLine.getOpts().getConvertBinaryArrayToString();
   ```
   
   At least this way, we can have a separate ticket about making this base-64 
instead of the current implementation.
   
   If Google Guava is included, use the `Preconditions` class to perform this check.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:

[jira] [Work logged] (HIVE-20447) Add JSON Outputformat support

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20447?focusedWorklogId=459310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459310
 ]

ASF GitHub Bot logged work on HIVE-20447:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 14:13
Start Date: 15/Jul/20 14:13
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1169:
URL: https://github.com/apache/hive/pull/1169#discussion_r455083837



##
File path: beeline/src/java/org/apache/hive/beeline/JSONOutputFormat.java
##
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * This source file is based on code taken from SQLLine 1.9
+ * See SQLLine notice in LICENSE
+ */
+package org.apache.hive.beeline;
+
+import java.sql.SQLException;
+import java.sql.Types;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import com.fasterxml.jackson.core.JsonEncoding;
+import com.fasterxml.jackson.core.JsonFactory;
+import com.fasterxml.jackson.core.JsonGenerator;
+
+/**
+ * OutputFormat for standard JSON format.
+ * 
+ */ 
+public class JSONOutputFormat extends AbstractOutputFormat {
+  protected final BeeLine beeLine;
+  protected JsonGenerator generator;
+  
+
+  /**
+   * @param beeLine
+   */
+  JSONOutputFormat(BeeLine beeLine){ 
+this.beeLine = beeLine;
+ByteArrayOutputStream buf = new ByteArrayOutputStream();
+try {
+  this.generator = new JsonFactory().createGenerator(buf, JsonEncoding.UTF8);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printHeader(Rows.Row header) {
+try {
+  generator.writeStartObject();
+  generator.writeArrayFieldStart("resultset");
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printFooter(Rows.Row header) {
+try {
+  generator.writeEndArray();
+  generator.writeEndObject();
+  generator.flush();
+  String out = ((ByteArrayOutputStream) generator.getOutputTarget()).toString("UTF-8");
+  beeLine.output(out);
+} catch(IOException e) {
+  beeLine.handleException(e);
+}
+  }
+
+  @Override 
+  void printRow(Rows rows, Rows.Row header, Rows.Row row) {
+String[] head = header.values;
+String[] vals = row.values;
+
+try {
+  int colCount = rows.rsMeta.getColumnCount();
+  boolean objStartFlag = true;
+
+  for (int i = 0; (i < head.length) && (i < vals.length); i++) {
+if (objStartFlag) {
+  generator.writeStartObject();
+  objStartFlag = false;
+}
+
+generator.writeFieldName(head[i]);
+switch(rows.rsMeta.getColumnType(i+1)) {
+  case Types.TINYINT:
+  case Types.SMALLINT:
+  case Types.INTEGER:
+  case Types.BIGINT:
+  case Types.REAL:
+  case Types.FLOAT:
+  case Types.DOUBLE:
+  case Types.DECIMAL:
+  case Types.NUMERIC:
+  case Types.ROWID:
+generator.writeNumber(vals[i]);
+break;
+  case Types.NULL:
+generator.writeNull();
+break;
+  case Types.BOOLEAN:
+generator.writeBoolean(Boolean.parseBoolean(vals[i]));
+break;
+  default:

Review comment:
   OK.  I follow now.  Sorry I didn't read your comment closely enough. :)
   
   So, I think for now, what you should do is... 
   
   Create a new `case` for BINARY data and then check that this option is set to 
"true".  Throw an Exception otherwise.
   
   ```
   beeLine.getOpts().getConvertBinaryArrayToString();
   ```
   
   At least this way, we can have a separate ticket about making this base-64 
instead of the current implementation.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog 

[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=459304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459304
 ]

ASF GitHub Bot logged work on HIVE-23793:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 14:03
Start Date: 15/Jul/20 14:03
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1197:
URL: https://github.com/apache/hive/pull/1197#issuecomment-658787201


   OK, changes look good to me



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459304)
Time Spent: 2h 20m  (was: 2h 10m)

> Review of QueryInfo Class
> -
>
> Key: HIVE-23793
> URL: https://issues.apache.org/jira/browse/HIVE-23793
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HIVE-23244) Extract Create View analyzer from SemanticAnalyzer

2020-07-15 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely reopened HIVE-23244:
---

Had to revert it, as the new ANTLR token made the generated class too big; in 
the meantime some other token had been added by another patch as well. Must 
solve that issue first.

> Extract Create View analyzer from SemanticAnalyzer
> --
>
> Key: HIVE-23244
> URL: https://issues.apache.org/jira/browse/HIVE-23244
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23244.01.patch, HIVE-23244.02.patch, 
> HIVE-23244.03.patch, HIVE-23244.04.patch, HIVE-23244.05.patch, 
> HIVE-23244.06.patch, HIVE-23244.07.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Create View commands are not queries, but commands which have queries as a 
> part of them. Therefore a separate CreateViewAnalyzer is needed which uses 
> SemanticAnalyzer to analyze its query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23793) Review of QueryInfo Class

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23793?focusedWorklogId=459300=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459300
 ]

ASF GitHub Bot logged work on HIVE-23793:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 13:55
Start Date: 15/Jul/20 13:55
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1197:
URL: https://github.com/apache/hive/pull/1197#discussion_r455070373



##
File path: ql/src/java/org/apache/hadoop/hive/ql/QueryInfo.java
##
@@ -70,36 +80,57 @@ public String getExecutionEngine() {
 return executionEngine;
   }
 
-  public synchronized String getState() {
+  public String getState() {
 return state;
   }
 
+  /**
+   * The time the query began in milliseconds.
+   *
+   * @return The time the query began
+   */
   public long getBeginTime() {
-return beginTime;
+return TimeUnit.NANOSECONDS.toMillis(beginTime);
   }
 
-  public synchronized Long getEndTime() {
-return endTime;
+  /**
+   * Get the end time in milliseconds. Only valid if {@link #isRunning()}
+   * returns false.
+   *
+   * @return Query end time
+   */
+  public long getEndTime() {
+return TimeUnit.NANOSECONDS.toMillis(endTime);
   }
 
-  public synchronized void updateState(String state) {
+  public void updateState(String state) {
 this.state = state;
   }
 
   public String getOperationId() {
 return operationId;
   }
 
-  public synchronized void setEndTime() {
-this.endTime = System.currentTimeMillis();
+  public void setEndTime() {
+this.endTime = System.nanoTime();
   }
 
-  public synchronized void setRuntime(long runtime) {
-this.runtime = runtime;
+  /**
+   * Set the amount of time the query spent actually running in milliseconds.
+   *
+   * @param runtime The amount of time this query spent running
+   */
+  public void setRuntime(long runtime) {
+this.runtime = TimeUnit.MILLISECONDS.toNanos(runtime);

Review comment:
   For simplicity's sake, I wanted to keep all of the internal time values at 
the same precision (nano):
   
   ```
/*
 * Times are stored internally with nanosecond precision.
 */
   ```
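   
   A minimal standalone sketch of that convention, assuming the same 
store-in-nanos/convert-at-the-boundary pattern as the diff above (the class 
and method names are stand-ins, not the real QueryInfo):
   
   ```
   import java.util.concurrent.TimeUnit;
   
   public class TimePrecisionSketch {
     private long runtime; // stored in nanoseconds, like the other time fields
   
     public void setRuntime(long runtimeMillis) {
       this.runtime = TimeUnit.MILLISECONDS.toNanos(runtimeMillis);
     }
   
     public long getRuntime() {
       return TimeUnit.NANOSECONDS.toMillis(runtime); // millis at the boundary
     }
   
     public static void main(String[] args) {
       TimePrecisionSketch q = new TimePrecisionSketch();
       q.setRuntime(1500);
       System.out.println(q.getRuntime()); // 1500: the round trip is lossless
     }
   }
   ```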





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459300)
Time Spent: 2h 10m  (was: 2h)

> Review of QueryInfo Class
> -
>
> Key: HIVE-23793
> URL: https://issues.apache.org/jira/browse/HIVE-23793
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23836) Make "cols" dependent so that it cascade deletes

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23836?focusedWorklogId=459298=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459298
 ]

ASF GitHub Bot logged work on HIVE-23836:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 13:53
Start Date: 15/Jul/20 13:53
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1239:
URL: https://github.com/apache/hive/pull/1239#issuecomment-658781200


   @ashutoshc You were correct about marking the association for deletion.  I'm 
not sure if this is already the default value, but better safe than sorry.  
Could use a review. Thanks :)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459298)
Time Spent: 20m  (was: 10m)

> Make "cols" dependent so that it cascade deletes
> 
>
> Key: HIVE-23836
> URL: https://issues.apache.org/jira/browse/HIVE-23836
> Project: Hive
>  Issue Type: Bug
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {quote}
> If you want the deletion of a persistent object to cause the deletion of 
> related objects then you need to mark the related fields in the mapping to be 
> "dependent".
> {quote}
> http://www.datanucleus.org/products/accessplatform/jdo/persistence.html#dependent_fields
> http://www.datanucleus.org/products/datanucleus/jdo/persistence.html#_deleting_an_object
> The database won't do it:
> {code:sql|title=Derby Schema}
> ALTER TABLE "APP"."COLUMNS_V2" ADD CONSTRAINT "COLUMNS_V2_FK1" FOREIGN KEY 
> ("CD_ID") REFERENCES "APP"."CDS" ("CD_ID") ON DELETE NO ACTION ON UPDATE NO 
> ACTION;
> {code}
> https://github.com/apache/hive/blob/65cf6957cf9432277a096f91b40985237274579f/standalone-metastore/metastore-server/src/main/sql/derby/hive-schema-4.0.0.derby.sql#L452
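
For illustration, the same "dependent" idea in JDO's annotation form. Hive's 
metastore declares its mappings in package.jdo XML, so the classes below are 
stand-ins for the concept, not the real MColumnDescriptor mapping:

{code:java}
import java.util.List;
import javax.jdo.annotations.Element;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;

@PersistenceCapable
class Descriptor {
  @Persistent
  @Element(dependent = "true") // DataNucleus deletes elements with their owner
  List<Column> cols;
}

@PersistenceCapable
class Column {
  @Persistent
  String name;
}
{code}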



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23244) Extract Create View analyzer from SemanticAnalyzer

2020-07-15 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-23244:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Merged to master, thank you [~belugabehr]

> Extract Create View analyzer from SemanticAnalyzer
> --
>
> Key: HIVE-23244
> URL: https://issues.apache.org/jira/browse/HIVE-23244
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23244.01.patch, HIVE-23244.02.patch, 
> HIVE-23244.03.patch, HIVE-23244.04.patch, HIVE-23244.05.patch, 
> HIVE-23244.06.patch, HIVE-23244.07.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Create View commands are not queries, but commands which have queries as a 
> part of them. Therefore a separate CreateViewAnalyzer is needed which uses 
> SemanticAnalyzer to analyze its query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23814) Clean up Driver

2020-07-15 Thread Miklos Gergely (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely resolved HIVE-23814.
---
Resolution: Fixed

Merged to master, thank you [~pvary]

> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Driver is now cut down to its minimal size by extracting all of its sub-tasks 
> to separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code to sub-task and utility classes 
> wherever it is still possible
>  * fixing checkstyle issues
>  * adding missing javadoc
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23244) Extract Create View analyzer from SemanticAnalyzer

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23244?focusedWorklogId=459284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459284
 ]

ASF GitHub Bot logged work on HIVE-23244:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 13:13
Start Date: 15/Jul/20 13:13
Worklog Time Spent: 10m 
  Work Description: miklosgergely merged pull request #1125:
URL: https://github.com/apache/hive/pull/1125


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459284)
Time Spent: 40m  (was: 0.5h)

> Extract Create View analyzer from SemanticAnalyzer
> --
>
> Key: HIVE-23244
> URL: https://issues.apache.org/jira/browse/HIVE-23244
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23244.01.patch, HIVE-23244.02.patch, 
> HIVE-23244.03.patch, HIVE-23244.04.patch, HIVE-23244.05.patch, 
> HIVE-23244.06.patch, HIVE-23244.07.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Create View commands are not queries, but commands which have queries as a 
> part of them. Therefore a separate CreateViewAnalyzer is needed which uses 
> SemanticAnalyzer to analyze its query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459283
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 13:12
Start Date: 15/Jul/20 13:12
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 opened a new pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459283)
Time Spent: 2h 20m  (was: 2h 10m)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> total number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459282
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 13:09
Start Date: 15/Jul/20 13:09
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 closed pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459282)
Time Spent: 2h 10m  (was: 2h)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> total number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-07-15 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158137#comment-17158137
 ] 

Syed Shameerur Rahman edited comment on HIVE-23851 at 7/15/20, 1:02 PM:


[~asinkovits] I guess HIVE-23808 might be due to misconfiguration; maybe the 
value of metastore.expression.proxy is forcefully set to 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore. Check 
qtest msck_repair_drop.q.

In this case (HIVE-23851) we set metastore.expression.proxy to 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore for 
partition filtering, and hence hit this issue.


was (Author: srahman):
[~asinkovits] I guess HIVE-23808 might be due to misconfiguration; maybe the 
value of metastore.expression.proxy is forcefully set to 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore. Check 
qtest msck_repair_drop.q.

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering we expect expression proxy 
> class to be set as PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ); while dropping a partition we serialize the drop partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) which is incompatible during deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I 

[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-07-15 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158137#comment-17158137
 ] 

Syed Shameerur Rahman commented on HIVE-23851:
--

[~asinkovits] I guess HIVE-23808 might be due to misconfiguration; maybe the 
value of metastore.expression.proxy is forcefully set to 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore. Check 
qtest msck_repair_drop.q.

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering we expect expression proxy 
> class to be set as PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ); while dropping a partition we serialize the drop partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) which is incompatible during deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch back the expression proxy class to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization process of the msck drop 
> partition filter expression compatible with the one in 
> PartitionExpressionForMetastore; we can do this via Reflection, since the drop 
> partition serialization happens in Msck 

[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459275=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459275
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 12:47
Start Date: 15/Jul/20 12:47
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 opened a new pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459275)
Time Spent: 2h  (was: 1h 50m)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> total number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)

2020-07-15 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158124#comment-17158124
 ] 

Naveen Gangam commented on HIVE-23726:
--

[~sankarh] [~samuelan] Could you please review? It's a one-line fix. Thanks

> Create table may throw 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
> --
>
> Key: HIVE-23726
> URL: https://issues.apache.org/jira/browse/HIVE-23726
> Project: Hive
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> - Given:
>  metastore.warehouse.tenant.colocation is set to true
>  a test database was created as {{create database test location '/data'}}
>  - When:
>  I try to create a table as {{create table t1 (a int) location '/data/t1'}}
>  - Then:
> The create table fails with the following exception:
> {code}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148)
>   at 
> org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: 
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63219)
>   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1780)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1767)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3518)
>   at 
> 

[jira] [Commented] (HIVE-23855) TestQueryShutdownHooks is flaky

2020-07-15 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158129#comment-17158129
 ] 

Zoltan Haindrich commented on HIVE-23855:
-

fyi [~mustafaiman]

> TestQueryShutdownHooks is flaky
> ---
>
> Key: HIVE-23855
> URL: https://issues.apache.org/jira/browse/HIVE-23855
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Priority: Major
>
> http://ci.hive.apache.org/job/hive-precommit/job/master/100/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459273=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459273
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 12:46
Start Date: 15/Jul/20 12:46
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 closed pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459273)
Time Spent: 1h 50m  (was: 1h 40m)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> total number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-07-15 Thread Antal Sinkovits (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158126#comment-17158126
 ] 

Antal Sinkovits commented on HIVE-23851:


[~srahman] I'm a bit confused now. Is this the same issue as 
https://issues.apache.org/jira/browse/HIVE-23808 ?

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In case of msck repair with partition filtering we expect expression proxy 
> class to be set as PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ); while dropping a partition we serialize the drop partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) which is incompatible during deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch back the expression proxy class to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization process of the msck drop 
> partition filter expression compatible with the one in 
> PartitionExpressionForMetastore; we can do this via Reflection, since the drop 
> partition serialization happens in the Msck class (standalone-metastore). This 
> way we can completely remove the need for the class MsckPartitionExpressionProxy 
> and this also helps to 

[jira] [Work logged] (HIVE-23814) Clean up Driver

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23814?focusedWorklogId=459274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459274
 ]

ASF GitHub Bot logged work on HIVE-23814:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 12:46
Start Date: 15/Jul/20 12:46
Worklog Time Spent: 10m 
  Work Description: miklosgergely merged pull request #1222:
URL: https://github.com/apache/hive/pull/1222


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459274)
Time Spent: 2h 40m  (was: 2.5h)

> Clean up Driver
> ---
>
> Key: HIVE-23814
> URL: https://issues.apache.org/jira/browse/HIVE-23814
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Driver is now cut down to its minimal size by extracting all of its sub-tasks 
> to separate classes. The rest should be cleaned up by
>  * moving out some smaller parts of the code to sub-task and utility classes 
> wherever it is still possible
>  * fixing checkstyle issues
>  * adding missing javadoc
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23853) CRUD based compaction also should update ACID file version metadata

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23853:
--
Labels: pull-request-available  (was: )

> CRUD based compaction also should update ACID file version metadata
> ---
>
> Key: HIVE-23853
> URL: https://issues.apache.org/jira/browse/HIVE-23853
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Current CRUD compaction does not update the file metadata to contain the ACID 
> version. Also the {{_orc_acid_version}} version file is not created.
> We should do this to be consistent across the board.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23853) CRUD based compaction also should update ACID file version metadata

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23853?focusedWorklogId=459270=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459270
 ]

ASF GitHub Bot logged work on HIVE-23853:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 12:28
Start Date: 15/Jul/20 12:28
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #1256:
URL: https://github.com/apache/hive/pull/1256


   Made sure that the version metadata is updated at file creation, and the 
version file is created.
   Updated tests as well.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459270)
Remaining Estimate: 0h
Time Spent: 10m

> CRUD based compaction also should update ACID file version metadata
> ---
>
> Key: HIVE-23853
> URL: https://issues.apache.org/jira/browse/HIVE-23853
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Current CRUD compaction does not update the file metadata to contain the ACID 
> version. Also the {{_orc_acid_version}} version file is not created.
> We should do this to be consistent across the board.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23854) Natively support Double and Decimal CVs in ReduceSinkOperator

2020-07-15 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23854:
-


> Natively support Double and Decimal CVs in ReduceSinkOperator
> -
>
> Key: HIVE-23854
> URL: https://issues.apache.org/jira/browse/HIVE-23854
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459267=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459267
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 12:26
Start Date: 15/Jul/20 12:26
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 opened a new pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459267)
Time Spent: 1h 40m  (was: 1.5h)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> total number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23853) CRUD based compaction also should update ACID file version metadata

2020-07-15 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-23853:
-


> CRUD based compaction also should update ACID file version metadata
> ---
>
> Key: HIVE-23853
> URL: https://issues.apache.org/jira/browse/HIVE-23853
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> Current CRUD compaction does not update the file metadata to contain the ACID 
> version. Also the {{_orc_acid_version}} version file is not created.
> We should do this to be consistent across the board.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459264
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 12:22
Start Date: 15/Jul/20 12:22
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 closed pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459264)
Time Spent: 1.5h  (was: 1h 20m)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> total number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459261
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 12:17
Start Date: 15/Jul/20 12:17
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 opened a new pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459261)
Time Spent: 1h 20m  (was: 1h 10m)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> total number of active connections, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23852) Natively support Date and Timestamp types in ReduceSink operator

2020-07-15 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23852:
-


> Natively support Date and Timestamp types in ReduceSink operator
> 
>
> Key: HIVE-23852
> URL: https://issues.apache.org/jira/browse/HIVE-23852
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>
> There is currently no native support, meaning that these types end up being 
> serialized as multi-key columns, which is much slower (iterating through batch 
> columns instead of writing a value directly).
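
A hedged sketch of the difference, assuming a DATE batch held in a 
LongColumnVector (days since epoch); both write paths below are invented for 
illustration and are not Hive's actual serializer:

{code:java}
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;

public class DirectWriteSketch {
  // Native path: read the primitive straight out of the vector.
  static void writeDirect(LongColumnVector col, int row, StringBuilder out) {
    out.append(col.vector[row]);
  }

  // Generic multi-key path: every value takes a detour through an Object.
  static void writeGeneric(LongColumnVector col, int row, StringBuilder out) {
    Object boxed = Long.valueOf(col.vector[row]); // per-row allocation
    out.append(boxed);
  }

  public static void main(String[] args) {
    LongColumnVector dates = new LongColumnVector(4);
    for (int i = 0; i < 4; i++) {
      dates.vector[i] = 18458 + i; // days since epoch
    }
    StringBuilder out = new StringBuilder();
    for (int row = 0; row < 4; row++) {
      writeDirect(dates, row, out);
      out.append(' ');
    }
    System.out.println(out.toString().trim());
  }
}
{code}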



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23815) output statistics of underlying datastore

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23815?focusedWorklogId=459254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459254
 ]

ASF GitHub Bot logged work on HIVE-23815:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 11:57
Start Date: 15/Jul/20 11:57
Worklog Time Spent: 10m 
  Work Description: xinghuayu007 closed pull request #1227:
URL: https://github.com/apache/hive/pull/1227


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459254)
Time Spent: 1h 10m  (was: 1h)

> output statistics of underlying datastore 
> --
>
> Key: HIVE-23815
> URL: https://issues.apache.org/jira/browse/HIVE-23815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rossetti Wong
>Assignee: Rossetti Wong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This patch provides a way to get the statistics data of metastore's 
> underlying datastore, like MySQL, Oracle and so on.  You can get the number 
> of datastore reads and writes, the average time of transaction execution, the 
> total active connection and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23540) Fix Findbugs Warnings in EncodedColumnBatch

2020-07-15 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-23540:
--
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> Fix Findbugs Warnings in EncodedColumnBatch
> ---
>
> Key: HIVE-23540
> URL: https://issues.apache.org/jira/browse/HIVE-23540
> Project: Hive
>  Issue Type: Improvement
>  Components: storage-api
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-23540.1.patch, HIVE-23540.2.patch
>
>
> bq. Strings should not be concatenated using '+' in a loop
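
For reference, the pattern behind that warning and its usual fix (illustrative code, not the EncodedColumnBatch source):

{code:java}
public class StringConcatSketch {
  static String joinSlow(int[] ids) {
    String s = "";
    for (int id : ids) {
      s += id + ",";           // allocates a new String on every iteration: O(n^2)
    }
    return s;
  }

  static String joinFast(int[] ids) {
    StringBuilder sb = new StringBuilder();
    for (int id : ids) {
      sb.append(id).append(','); // amortized O(n)
    }
    return sb.toString();
  }
}
{code}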



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23850?focusedWorklogId=459239=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459239
 ]

ASF GitHub Bot logged work on HIVE-23850:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 11:19
Start Date: 15/Jul/20 11:19
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1255:
URL: https://github.com/apache/hive/pull/1255


   … present
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459239)
Remaining Estimate: 0h
Time Spent: 10m

> Allow PPD when subject is not a column with grouping sets present
> -
>
> Key: HIVE-23850
> URL: https://issues.apache.org/jira/browse/HIVE-23850
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters 
> referencing only columns and constants are pushed down, but in some cases 
> this does not work well, for example:
> SET hive.cbo.enable=false;
> SELECT a, b, sum(s)
> FROM T1
> GROUP BY a, b GROUPING SETS ((a), (a, b))
> HAVING upper(a) = "AAA" AND sum(s) > 100;
>  
> SELECT upper(a), b, sum(s)
> FROM T1
> GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b))
> HAVING upper(a) = "AAA" AND sum(s) > 100;
>  
> The filters pushed down to the GBY operator can be f(gbyKey), or a gbyKey 
> wrapped in a UDF, not only plain column group-by keys.
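
To make the condition concrete, here is a small self-contained sketch of the check the description implies, using a toy expression model; ExprNode and its methods are hypothetical stand-ins for Hive's ExprNodeDesc, not the actual fix:

{code:java}
import java.util.List;

interface ExprNode {
  List<ExprNode> children();
  boolean sameAs(ExprNode other); // structural equality
  boolean isConstant();
}

final class PpdCheck {
  // A predicate can be pushed below the group-by if every subtree bottoms out
  // in a group-by key expression (e.g. upper(a)) or a constant -- not merely
  // when its leaves are bare group-by columns.
  static boolean pushableBelowGroupBy(ExprNode pred, List<ExprNode> gbyKeys) {
    for (ExprNode key : gbyKeys) {
      if (pred.sameAs(key)) {
        return true;            // whole subtree is itself a group-by key
      }
    }
    if (pred.isConstant()) {
      return true;
    }
    if (pred.children().isEmpty()) {
      return false;             // a bare column that is not itself a key
    }
    for (ExprNode child : pred.children()) {
      if (!pushableBelowGroupBy(child, gbyKeys)) {
        return false;
      }
    }
    return true;
  }
}
{code}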



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23850) Allow PPD when subject is not a column with grouping sets present

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23850:
--
Labels: pull-request-available  (was: )

> Allow PPD when subject is not a column with grouping sets present
> -
>
> Key: HIVE-23850
> URL: https://issues.apache.org/jira/browse/HIVE-23850
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> After [HIVE-19653|https://issues.apache.org/jira/browse/HIVE-19653], filters 
> referencing only columns and constants are pushed down, but in some cases 
> this does not work well, for example:
> SET hive.cbo.enable=false;
> SELECT a, b, sum(s)
> FROM T1
> GROUP BY a, b GROUPING SETS ((a), (a, b))
> HAVING upper(a) = "AAA" AND sum(s) > 100;
>  
> SELECT upper(a), b, sum(s)
> FROM T1
> GROUP BY upper(a), b GROUPING SETS ((upper(a)), (upper(a), b))
> HAVING upper(a) = "AAA" AND sum(s) > 100;
>  
> The filters pushed down to the GBY operator can be f(gbyKey), or a gbyKey 
> wrapped in a UDF, not only plain column group-by keys.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-07-15 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23851:
-
Fix Version/s: 4.0.0

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
> Fix For: 4.0.0
>
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In the case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping a partition, we serialize the drop-partition filter 
> expression ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) in a way that is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done (a 
> sketch of this follows below).
> # The other solution is to make the serialization of the msck drop-partition 
> filter expression compatible with PartitionExpressionForMetastore. We can do 
> this via reflection, since the drop-partition serialization happens in the 
> Msck class (standalone-metastore). This way we can completely remove the 
> MsckPartitionExpressionProxy class, which also reduces the complexity of 
> getting the MSCK REPAIR command with partition filtering to work with ease 
> (no need to set the 
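
A minimal sketch of the first approach (switching the proxy class back after the pruning step); the config key and helper names here are assumptions for illustration, not the committed fix:

{code:java}
import org.apache.hadoop.conf.Configuration;

final class ProxySwitchSketch {
  static final String PROXY_KEY = "metastore.expression.proxy"; // assumed key name

  static void withPartitionPruning(Configuration conf, Runnable pruningStep) {
    String previous = conf.get(PROXY_KEY);
    conf.set(PROXY_KEY,
        "org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore");
    try {
      pruningStep.run(); // pruning needs PartitionExpressionForMetastore
    } finally {
      if (previous != null) {
        conf.set(PROXY_KEY, previous); // switch back, e.g. to MsckPartitionExpressionProxy
      }
    }
  }
}
{code}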

[jira] [Updated] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-07-15 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23851:
-
Target Version/s:   (was: 4.0.0)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In the case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping a partition, we serialize the drop-partition filter 
> expression ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) in a way that is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop-partition 
> filter expression compatible with PartitionExpressionForMetastore. We can do 
> this via reflection, since the drop-partition serialization happens in the 
> Msck class (standalone-metastore). This way we can completely remove the 
> MsckPartitionExpressionProxy class, which also reduces the complexity of 
> getting the MSCK REPAIR command with partition filtering to work with ease 
> (no need to set the expression 
> proxyClass 

[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-07-15 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158086#comment-17158086
 ] 

Syed Shameerur Rahman commented on HIVE-23851:
--

[~kgyrtkirk] [~prasanth_j] Any thoughts on the above issue?

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In the case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping a partition, we serialize the drop-partition filter 
> expression ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) in a way that is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop-partition 
> filter expression compatible with PartitionExpressionForMetastore. We can do 
> this via reflection, since the drop-partition serialization happens in the 
> Msck class (standalone-metastore). This way we can completely remove the 
> MsckPartitionExpressionProxy class, which also reduces the complexity of the 
> MSCK REPAIR command with 
> partition filtering 

[jira] [Assigned] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-07-15 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman reassigned HIVE-23851:



> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In the case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ). While dropping a partition, we serialize the drop-partition filter 
> expression ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ) in a way that is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
>  ), hence the query fails with "Failed to deserialize the expression".
> *Solutions*:
> I could think of two approaches to this problem:
> # Since PartitionExpressionForMetastore is required only during the partition 
> pruning step, we can switch the expression proxy class back to 
> MsckPartitionExpressionProxy once the partition pruning step is done.
> # The other solution is to make the serialization of the msck drop-partition 
> filter expression compatible with PartitionExpressionForMetastore. We can do 
> this via reflection, since the drop-partition serialization happens in the 
> Msck class (standalone-metastore). This way we can completely remove the 
> MsckPartitionExpressionProxy class, which also reduces the complexity of 
> getting the MSCK REPAIR command with partition filtering to work with ease 
> (no need to set the expression 
> proxyClass config).
> I am personally 

[jira] [Resolved] (HIVE-23848) TestHiveMetaStoreChecker and TestMiniLlapLocalCliDriver tests are failing in master

2020-07-15 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-23848.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.

Thanks for the patch [~kishendas]!

> TestHiveMetaStoreChecker and TestMiniLlapLocalCliDriver tests are failing in 
> master
> ---
>
> Key: HIVE-23848
> URL: https://issues.apache.org/jira/browse/HIVE-23848
> Project: Hive
>  Issue Type: Test
>  Components: HiveServer2
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
> Fix For: 4.0.0
>
>
> The tests below are failing after HIVE-23767 landed in master.
> testAddPartitionNormalDeltas – 
> org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker
>  testCliDriver[show_partitions2] – 
> org.apache.hadoop.hive.cli.split21.TestMiniLlapLocalCliDriver
>  testAddPartitionMMBase – 
> org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker
>  testAddPartitionCompactedDeltas – 
> org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker
>  testAddPartitionCompactedBase – 
> org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23727) Improve SQLOperation log handling when cancel background

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23727?focusedWorklogId=459216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459216
 ]

ASF GitHub Bot logged work on HIVE-23727:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 10:17
Start Date: 15/Jul/20 10:17
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #1149:
URL: https://github.com/apache/hive/pull/1149


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HIVE-X: Fix a typo in YYY)
   For more details, please see 
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459216)
Time Spent: 3h  (was: 2h 50m)

> Improve SQLOperation log handling when cancel background
> 
>
> Key: HIVE-23727
> URL: https://issues.apache.org/jira/browse/HIVE-23727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The SQLOperation checks _if (shouldRunAsync() && state != 
> OperationState.CANCELED && state != OperationState.TIMEDOUT)_ before 
> cancelling the background task. If the condition is true, the state cannot be 
> OperationState.CANCELED, so logging under state == OperationState.CANCELED 
> should never happen.
>  
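
A condensed, self-contained sketch of that control flow, with hypothetical names mirroring SQLOperation, showing why the CANCELED logging branch is dead:

{code:java}
enum OperationState { RUNNING, CANCELED, TIMEDOUT, FINISHED }

final class CancelSketch {
  OperationState state = OperationState.RUNNING;

  boolean shouldRunAsync() { return true; }

  void cancelBackground() {
    if (shouldRunAsync() && state != OperationState.CANCELED
        && state != OperationState.TIMEDOUT) {
      // Inside this block state can never be CANCELED, so a branch like
      // this one never fires:
      if (state == OperationState.CANCELED) {
        System.out.println("unreachable log line");
      }
      // ... actual background-task cancellation ...
    }
  }
}
{code}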



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459212=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459212
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 10:12
Start Date: 15/Jul/20 10:12
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454943283



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements AutoCloseable, Iterator<String> {
+  private static final Logger LOG = LoggerFactory.getLogger(FileList.class);
+  private static int fileListStreamerID = 0;
+  private static final String FILE_LIST_STREAMER_PREFIX = "file-list-streamer-";
+
+  private LinkedBlockingQueue<String> cache;
+  private volatile boolean thresholdHit = false;
+  private int thresholdPoint;
+  private float thresholdFactor = 0.9f;
+  private Path backingFile;
+  private FileListStreamer fileListStreamer;
+  private String nextElement;
+  private boolean noMoreElement;
+  private HiveConf conf;
+  private BufferedReader backingFileReader;
+
+
+  public FileList(Path backingFile, int cacheSize, HiveConf conf) throws IOException {
+    this.backingFile = backingFile;
+    if (cacheSize > 0) {
+      // Cache size must be > 0 for this list to be used for the write operation.
+      this.cache = new LinkedBlockingQueue<>(cacheSize);
+      fileListStreamer = new FileListStreamer(cache, backingFile, conf);
+      LOG.debug("File list backed by {} can be used for write operation.", backingFile);
+    } else {
+      thresholdHit = true;
+    }
+    this.conf = conf;
+    thresholdPoint = getThreshold(cacheSize);
+  }
+
+  /**
+   * Only add operation is safe for concurrent operations.
+   */
+  public void add(String entry) throws SemanticException {
+    if (thresholdHit && !fileListStreamer.isAlive()) {
+      throw new SemanticException("List is not getting saved anymore to file " + backingFile.toString());
+    }
+    try {
+      cache.put(entry);
+    } catch (InterruptedException e) {
+      throw new SemanticException(e);
+    }
+    if (!thresholdHit && cache.size() >= thresholdPoint) {
+      initStoreToFile(cache.size());
+    }
+  }
+
+  @Override
+  public boolean hasNext() {
+    if (!thresholdHit) {
+      return (cache != null && !cache.isEmpty());
+    }
+    if (nextElement != null) {
+      return true;
+    }
+    if (noMoreElement) {
+      return false;
+    }
+    nextElement = readNextLine();
+    if (nextElement == null) {
+      noMoreElement = true;
+    }
+    return !noMoreElement;
+  }
+
+  @Override
+  public String next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException("No more element in the list backed by " + backingFile);
+    }
+    String retVal = nextElement;
+    nextElement = null;
+    return thresholdHit ? retVal : cache.poll();
+  }
+
+  private synchronized void initStoreToFile(int cacheSize) {
+    if (!thresholdHit) {
+      fileListStreamer.setName(getNextID());
+      fileListStreamer.setDaemon(true);
+      fileListStreamer.start();
+      thresholdHit = true;
+      LOG.info("Started streaming the list elements to file: {}, cache size {}", backingFile, cacheSize);
+    }
+  }
+
+  private String readNextLine() {
+    String nextElement = null;
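
A hedged usage sketch of the FileList quoted above: writes buffer in memory and spill to the backing file once the threshold is crossed, while reads go through the Iterator view. The paths and entry strings here are placeholders:

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.exec.repl.util.FileList;

final class FileListUsage {
  static void demo() throws Exception {
    HiveConf conf = new HiveConf();
    Path backing = new Path("/tmp/repl/_file_list");   // placeholder path
    try (FileList list = new FileList(backing, 10_000, conf)) {
      list.add("hdfs://nn/warehouse/db/t1/000000_0");  // add() is safe to call concurrently
      list.add("hdfs://nn/warehouse/db/t1/000001_0");
      // Later, single-threaded read-back through the Iterator view:
      while (list.hasNext()) {
        System.out.println(list.next());
      }
    }
  }
}
{code}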

[jira] [Updated] (HIVE-23474) Deny Repl Dump if the database is a target of replication

2020-07-15 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23474:
---
Status: In Progress  (was: Patch Available)

> Deny Repl Dump if the database is a target of replication
> -
>
> Key: HIVE-23474
> URL: https://issues.apache.org/jira/browse/HIVE-23474
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23474.01.patch, HIVE-23474.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23474) Deny Repl Dump if the database is a target of replication

2020-07-15 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23474:
---
Attachment: HIVE-23474.02.patch
Status: Patch Available  (was: In Progress)

> Deny Repl Dump if the database is a target of replication
> -
>
> Key: HIVE-23474
> URL: https://issues.apache.org/jira/browse/HIVE-23474
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23474.01.patch, HIVE-23474.02.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459208=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459208
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 10:06
Start Date: 15/Jul/20 10:06
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454940146



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements AutoCloseable, Iterator<String> {
+  private static final Logger LOG = LoggerFactory.getLogger(FileList.class);
+  private static int fileListStreamerID = 0;
+  private static final String FILE_LIST_STREAMER_PREFIX = "file-list-streamer-";
+
+  private LinkedBlockingQueue<String> cache;
+  private volatile boolean thresholdHit = false;
+  private int thresholdPoint;
+  private float thresholdFactor = 0.9f;
+  private Path backingFile;
+  private FileListStreamer fileListStreamer;
+  private String nextElement;
+  private boolean noMoreElement;
+  private HiveConf conf;
+  private BufferedReader backingFileReader;
+
+
+  public FileList(Path backingFile, int cacheSize, HiveConf conf) throws IOException {
+    this.backingFile = backingFile;
+    if (cacheSize > 0) {
+      // Cache size must be > 0 for this list to be used for the write operation.
+      this.cache = new LinkedBlockingQueue<>(cacheSize);
+      fileListStreamer = new FileListStreamer(cache, backingFile, conf);
+      LOG.debug("File list backed by {} can be used for write operation.", backingFile);
+    } else {
+      thresholdHit = true;
+    }
+    this.conf = conf;
+    thresholdPoint = getThreshold(cacheSize);
+  }
+
+  /**
+   * Only add operation is safe for concurrent operations.
+   */
+  public void add(String entry) throws SemanticException {
+    if (thresholdHit && !fileListStreamer.isAlive()) {
+      throw new SemanticException("List is not getting saved anymore to file " + backingFile.toString());
+    }
+    try {
+      cache.put(entry);
+    } catch (InterruptedException e) {
+      throw new SemanticException(e);
+    }
+    if (!thresholdHit && cache.size() >= thresholdPoint) {
+      initStoreToFile(cache.size());
+    }
+  }
+
+  @Override
+  public boolean hasNext() {
+    if (!thresholdHit) {
+      return (cache != null && !cache.isEmpty());
+    }
+    if (nextElement != null) {
+      return true;
+    }
+    if (noMoreElement) {
+      return false;
+    }
+    nextElement = readNextLine();
+    if (nextElement == null) {
+      noMoreElement = true;
+    }
+    return !noMoreElement;
+  }
+
+  @Override
+  public String next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException("No more element in the list backed by " + backingFile);
+    }
+    String retVal = nextElement;
+    nextElement = null;
+    return thresholdHit ? retVal : cache.poll();
+  }
+
+  private synchronized void initStoreToFile(int cacheSize) {
+    if (!thresholdHit) {
+      fileListStreamer.setName(getNextID());
+      fileListStreamer.setDaemon(true);
+      fileListStreamer.start();
+      thresholdHit = true;
+      LOG.info("Started streaming the list elements to file: {}, cache size {}", backingFile, cacheSize);
+    }
+  }
+
+  private String readNextLine() {
+    String nextElement = null;

[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459203
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 10:03
Start Date: 15/Jul/20 10:03
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454938290



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements AutoCloseable, Iterator<String> {
+  private static final Logger LOG = LoggerFactory.getLogger(FileList.class);
+  private static int fileListStreamerID = 0;
+  private static final String FILE_LIST_STREAMER_PREFIX = "file-list-streamer-";
+
+  private LinkedBlockingQueue<String> cache;
+  private volatile boolean thresholdHit = false;
+  private int thresholdPoint;
+  private float thresholdFactor = 0.9f;
+  private Path backingFile;
+  private FileListStreamer fileListStreamer;
+  private String nextElement;
+  private boolean noMoreElement;
+  private HiveConf conf;
+  private BufferedReader backingFileReader;
+
+
+  public FileList(Path backingFile, int cacheSize, HiveConf conf) throws IOException {
+    this.backingFile = backingFile;
+    if (cacheSize > 0) {
+      // Cache size must be > 0 for this list to be used for the write operation.
+      this.cache = new LinkedBlockingQueue<>(cacheSize);
+      fileListStreamer = new FileListStreamer(cache, backingFile, conf);
+      LOG.debug("File list backed by {} can be used for write operation.", backingFile);
+    } else {
+      thresholdHit = true;
+    }
+    this.conf = conf;
+    thresholdPoint = getThreshold(cacheSize);
+  }
+
+  /**
+   * Only add operation is safe for concurrent operations.
+   */
+  public void add(String entry) throws SemanticException {
+    if (thresholdHit && !fileListStreamer.isAlive()) {
+      throw new SemanticException("List is not getting saved anymore to file " + backingFile.toString());
+    }
+    try {
+      cache.put(entry);
+    } catch (InterruptedException e) {
+      throw new SemanticException(e);
+    }
+    if (!thresholdHit && cache.size() >= thresholdPoint) {
+      initStoreToFile(cache.size());
+    }
+  }
+
+  @Override
+  public boolean hasNext() {
+    if (!thresholdHit) {
+      return (cache != null && !cache.isEmpty());
+    }
+    if (nextElement != null) {
+      return true;
+    }
+    if (noMoreElement) {
+      return false;
+    }
+    nextElement = readNextLine();
+    if (nextElement == null) {
+      noMoreElement = true;
+    }
+    return !noMoreElement;
+  }
+
+  @Override
+  public String next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException("No more element in the list backed by " + backingFile);
+    }
+    String retVal = nextElement;
+    nextElement = null;
+    return thresholdHit ? retVal : cache.poll();
+  }
+
+  private synchronized void initStoreToFile(int cacheSize) {
+    if (!thresholdHit) {
+      fileListStreamer.setName(getNextID());
+      fileListStreamer.setDaemon(true);
+      fileListStreamer.start();
+      thresholdHit = true;
+      LOG.info("Started streaming the list elements to file: {}, cache size {}", backingFile, cacheSize);
+    }
+  }
+
+  private String readNextLine() {
+    String nextElement = null;

[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459191
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 09:49
Start Date: 15/Jul/20 09:49
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454929778



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileList.java
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.ql.parse.SemanticException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.LinkedBlockingQueue;
+
+
+/**
+ * A file backed list of Strings which is in-memory till the threshold.
+ */
+public class FileList implements AutoCloseable, Iterator<String> {
+  private static final Logger LOG = LoggerFactory.getLogger(FileList.class);
+  private static int fileListStreamerID = 0;
+  private static final String FILE_LIST_STREAMER_PREFIX = "file-list-streamer-";
+
+  private LinkedBlockingQueue<String> cache;
+  private volatile boolean thresholdHit = false;
+  private int thresholdPoint;
+  private float thresholdFactor = 0.9f;
+  private Path backingFile;
+  private FileListStreamer fileListStreamer;
+  private String nextElement;
+  private boolean noMoreElement;
+  private HiveConf conf;
+  private BufferedReader backingFileReader;
+
+
+  public FileList(Path backingFile, int cacheSize, HiveConf conf) throws IOException {
+    this.backingFile = backingFile;
+    if (cacheSize > 0) {
+      // Cache size must be > 0 for this list to be used for the write operation.
+      this.cache = new LinkedBlockingQueue<>(cacheSize);
+      fileListStreamer = new FileListStreamer(cache, backingFile, conf);
+      LOG.debug("File list backed by {} can be used for write operation.", backingFile);
+    } else {
+      thresholdHit = true;
+    }
+    this.conf = conf;
+    thresholdPoint = getThreshold(cacheSize);
+  }
+
+  /**
+   * Only add operation is safe for concurrent operations.
+   */
+  public void add(String entry) throws SemanticException {
+    if (thresholdHit && !fileListStreamer.isAlive()) {
+      throw new SemanticException("List is not getting saved anymore to file " + backingFile.toString());
+    }
+    try {
+      cache.put(entry);
+    } catch (InterruptedException e) {
+      throw new SemanticException(e);
+    }
+    if (!thresholdHit && cache.size() >= thresholdPoint) {
+      initStoreToFile(cache.size());
+    }
+  }
+
+  @Override
+  public boolean hasNext() {
+    if (!thresholdHit) {
+      return (cache != null && !cache.isEmpty());
+    }
+    if (nextElement != null) {
+      return true;
+    }
+    if (noMoreElement) {
+      return false;
+    }
+    nextElement = readNextLine();
+    if (nextElement == null) {
+      noMoreElement = true;
+    }
+    return !noMoreElement;
+  }
+
+  @Override
+  public String next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException("No more element in the list backed by " + backingFile);
+    }
+    String retVal = nextElement;
+    nextElement = null;
+    return thresholdHit ? retVal : cache.poll();
+  }
+
+  private synchronized void initStoreToFile(int cacheSize) {
+    if (!thresholdHit) {
+      fileListStreamer.setName(getNextID());
+      fileListStreamer.setDaemon(true);
+      fileListStreamer.start();
+      thresholdHit = true;
+      LOG.info("Started streaming the list elements to file: {}, cache size {}", backingFile, cacheSize);
+    }
+  }
+
+  private String readNextLine() {
+    String nextElement = null;

[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459183
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 09:46
Start Date: 15/Jul/20 09:46
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454928086



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/FileListStreamer.java
##
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.exec.repl.util;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.TimeUnit;
+
+public class FileListStreamer extends Thread implements Closeable {
+  private static final Logger LOG = LoggerFactory.getLogger(FileListStreamer.class);
+  private static BufferedWriter backingFileWriterInTest;
+  private static final long TIMEOUT_IN_SECS = 5L;
+  private volatile boolean signalTostop;
+  private final LinkedBlockingQueue<String> cache;
+  private Path backingFile;
+  private Configuration conf;
+  private BufferedWriter backingFileWriter;
+  private volatile boolean valid = true;
+  private final Object COMPLETION_LOCK = new Object();
+  private volatile boolean completed = false;
+  private volatile boolean initialized = false;
+
+
+
+  public FileListStreamer(LinkedBlockingQueue<String> cache, Path backingFile, Configuration conf) throws IOException {
+    this.cache = cache;
+    this.backingFile = backingFile;
+    this.conf = conf;
+  }
+
+  private void lazyInit() throws IOException {
+    if (backingFileWriterInTest == null) {
+      FileSystem fs = FileSystem.get(backingFile.toUri(), conf);
+      backingFileWriter = new BufferedWriter(new OutputStreamWriter(fs.create(backingFile)));
+    } else {
+      backingFileWriter = backingFileWriterInTest;
+    }
+    initialized = true;
+    LOG.info("Initialized a file based store to save a list at: {}", backingFile);
+  }
+
+  public boolean isValid() {
+    return valid;
+  }
+
+  // Blocks for remaining entries to be flushed to file.
+  @Override
+  public void close() throws IOException {
+    signalTostop = true;
+    synchronized (COMPLETION_LOCK) {
+      while (motiveToWait()) {
+        try {
+          COMPLETION_LOCK.wait(TimeUnit.SECONDS.toMillis(TIMEOUT_IN_SECS));
+        } catch (InterruptedException e) {
+          // no-op
+        }
+      }
+    }
+    if (!isValid()) {
+      throw new IOException("File list is not in a valid state:" + backingFile);
+    }
+  }
+
+  private boolean motiveToWait() {
+    return !completed && valid;
+  }
+
+  @Override
+  public void run() {
+    try {
+      lazyInit();
+    } catch (IOException e) {
+      valid = false;
+      throw new RuntimeException("Unable to initialize the file list streamer", e);
+    }
+    boolean exThrown = false;
+    while (!exThrown && (!signalTostop || !cache.isEmpty())) {
+      try {
+        String nextEntry = cache.poll(TIMEOUT_IN_SECS, TimeUnit.SECONDS);
+        if (nextEntry != null) {
+          backingFileWriter.write(nextEntry);
+          backingFileWriter.newLine();
+          LOG.debug("Writing entry {} to file list backed by {}", nextEntry, backingFile);
+        }
+      } catch (Exception iEx) {
+        if (!(iEx instanceof InterruptedException)) {
+          // not draining any more. Inform the producer to avoid OOM.
+          valid = false;
+          LOG.error("Exception while saving the list to file " + backingFile, iEx);
+          exThrown = true;
+        }
+      }
+    }
+    try {
+      closeBackingFile();
+      completed = true;
+    } finally {
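
The streamer above is a bounded producer/consumer drain; here is a self-contained sketch of that pattern with illustrative names (not Hive's classes):

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class DrainSketch extends Thread {
  private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
  private volatile boolean stopRequested;

  @Override
  public void run() {
    // Keep draining until a stop is requested AND the queue is empty, so no
    // queued entry is lost at shutdown.
    while (!stopRequested || !queue.isEmpty()) {
      try {
        String next = queue.poll(5, TimeUnit.SECONDS);
        if (next != null) {
          System.out.println(next); // stand-in for writing to the backing file
        }
      } catch (InterruptedException ignored) {
        // poll timeout / interrupt: the loop re-checks the exit condition
      }
    }
  }

  void requestStop() { stopRequested = true; }
}
{code}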

[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459181=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459181
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 09:44
Start Date: 15/Jul/20 09:44
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454926767



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java
##
@@ -1559,6 +1645,76 @@ public void testIncrementalLoad() throws IOException {
 verifyRun("SELECT a from " + replDbName + ".ptned WHERE b=2", ptnData2, 
driverMirror);
   }
 
+  @Test
+  public void testIncrementalLoadLazyCopy() throws IOException {

Review comment:
   There are many existing tests with lazy load false. For external tables, 
which use mini HDFS, we already have a test for lazy load true. I will add one 
for the other tables as well.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459181)
Time Spent: 5h  (was: 4h 50m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. In the 
> case of a database with a very large number of tables/partitions, such an 
> iterator may cause the HS2 process to go OOM.
> Also introduces a config option to run data copy tasks during the repl load 
> operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459176=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459176
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 09:38
Start Date: 15/Jul/20 09:38
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454923611



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java
##
@@ -165,4 +178,83 @@ private void validateSrcPathListExists() throws IOException, LoginException {
       throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
     }
   }
+
+  /**
+   * Takes the root data directory to which the data is to be exported.
+   * The export is the list of files in each table/partition data path,
+   * written to the _files file under the provided exportRootDataDir.
+   */
+  void exportFilesAsList() throws SemanticException, IOException, LoginException {
+    if (dataPathList.isEmpty()) {
+      return;
+    }
+    Retry<Void> retryable = new Retry<Void>(IOException.class) {
+      @Override
+      public Void execute() throws Exception {
+        try (BufferedWriter writer = writer()) {
+          for (Path dataPath : dataPathList) {
+            writeFilesList(listFilesInDir(dataPath), writer, AcidUtils.getAcidSubDir(dataPath));
+          }
+        } catch (IOException e) {
+          if (e instanceof FileNotFoundException) {

Review comment:
   Shouldn't this suffice?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459176)
Time Spent: 4h 50m  (was: 4h 40m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. For a
> database with a very large number of tables/partitions, such an iterator may
> cause the HS2 process to go OOM.
> It also introduces a config option to run data copy tasks during the repl
> load operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459175=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459175
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 09:37
Start Date: 15/Jul/20 09:37
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454923124



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/FileOperations.java
##
@@ -165,4 +178,83 @@ private void validateSrcPathListExists() throws IOException, LoginException {
       throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
     }
   }
+
+  /**
+   * Takes the root data directory to which the data is to be exported.
+   * The export is the list of files in each table/partition data path,
+   * written to the _files file under the provided exportRootDataDir.
+   */
+  void exportFilesAsList() throws SemanticException, IOException, LoginException {
+    if (dataPathList.isEmpty()) {
+      return;
+    }
+    Retry<Void> retryable = new Retry<Void>(IOException.class) {
+      @Override
+      public Void execute() throws Exception {
+        try (BufferedWriter writer = writer()) {
+          for (Path dataPath : dataPathList) {
+            writeFilesList(listFilesInDir(dataPath), writer, AcidUtils.getAcidSubDir(dataPath));
+          }
+        } catch (IOException e) {
+          if (e instanceof FileNotFoundException) {

Review comment:
   if (e instanceof FileNotFoundException) {
     logger.error("exporting data files in dir : " + dataPathList + " to " + exportRootDataDir + " failed");
     throw new FileNotFoundException(FILE_NOT_FOUND.format(e.getMessage()));
   }





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459175)
Time Spent: 4h 40m  (was: 4.5h)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. For a
> database with a very large number of tables/partitions, such an iterator may
> cause the HS2 process to go OOM.
> It also introduces a config option to run data copy tasks during the repl
> load operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
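
The snippet above proposes exempting FileNotFoundException from the retry: a
missing source path will not reappear on its own, so the operation should fail
fast while transient I/O errors are still retried. A self-contained sketch of
that pattern, assuming a hypothetical RetryingIo helper (Hive's own Retry
class has a different API):

import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.concurrent.Callable;

final class RetryingIo {
  // Retries transient IOExceptions up to maxAttempts, but rethrows
  // FileNotFoundException immediately instead of retrying it.
  static <T> T callWithRetry(Callable<T> action, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (FileNotFoundException fnf) {
        throw fnf; // non-retryable: the file will not appear on its own
      } catch (IOException io) {
        last = io; // transient: try again
      }
    }
    throw last; // all attempts exhausted
  }
}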


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459172=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459172
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 09:34
Start Date: 15/Jul/20 09:34
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454921294



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpWork.java
##
@@ -207,4 +217,20 @@ public ReplicationMetricCollector getMetricCollector() {
   public void setMetricCollector(ReplicationMetricCollector metricCollector) {
     this.metricCollector = metricCollector;
   }
+
+  public ReplicationSpec getReplicationSpec() {
+    return replicationSpec;
+  }
+
+  public void setReplicationSpec(ReplicationSpec replicationSpec) {
+    this.replicationSpec = replicationSpec;
+  }
+
+  public FileList getFileList(Path backingFile, int cacheSize, HiveConf conf, boolean b) throws IOException {

Review comment:
   This is required for an old test which isn't Mockito-based.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459172)
Time Spent: 4.5h  (was: 4h 20m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. For a
> database with a very large number of tables/partitions, such an iterator may
> cause the HS2 process to go OOM.
> It also introduces a config option to run data copy tasks during the repl
> load operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
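
The getFileList factory under discussion is a test seam: production code
obtains its FileList through an overridable method, so a legacy test that
predates Mockito can subclass the work object and plant its own instance. A
minimal sketch of the pattern; EntrySink, DumpWork, and the other names are
illustrative, not Hive classes.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

interface EntrySink extends Iterable<String> {
  void add(String entry);
}

class InMemorySink implements EntrySink {
  private final List<String> entries = new ArrayList<>();
  @Override public void add(String entry) { entries.add(entry); }
  @Override public Iterator<String> iterator() { return entries.iterator(); }
}

// Production class routes construction through an overridable factory method,
// so the collaborator is created via a seam rather than with `new` inline.
class DumpWork {
  protected EntrySink newEntrySink() {
    return new InMemorySink(); // the real class would return a disk-backed sink
  }

  int dumpAll(Iterable<String> tables) {
    EntrySink sink = newEntrySink();
    int count = 0;
    for (String t : tables) {
      sink.add(t);
      count++;
    }
    return count;
  }
}

// A legacy test subclasses the work object to capture the sink it can then
// inspect, instead of using a mocking framework.
class DumpWorkForTest extends DumpWork {
  final InMemorySink captured = new InMemorySink();

  @Override
  protected EntrySink newEntrySink() {
    return captured;
  }
}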


[jira] [Work logged] (HIVE-23069) Memory efficient iterator should be used during replication.

2020-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23069?focusedWorklogId=459169=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-459169
 ]

ASF GitHub Bot logged work on HIVE-23069:
-

Author: ASF GitHub Bot
Created on: 15/Jul/20 09:32
Start Date: 15/Jul/20 09:32
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1225:
URL: https://github.com/apache/hive/pull/1225#discussion_r454919825



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
##
@@ -210,6 +210,66 @@ public void externalTableReplicationWithDefaultPaths() throws Throwable {
     assertExternalFileInfo(Arrays.asList("t2", "t3", "t4"), tuple.dumpLocation, true);
   }
 
+  @Test
+  public void externalTableReplicationWithDefaultPathsLazyCopy() throws Throwable {
+    List<String> lazyCopyClause = Arrays.asList("'" + HiveConf.ConfVars.REPL_DATA_COPY_LAZY.varname + "'='true'");
+    // creates external tables with partitions
+    WarehouseInstance.Tuple tuple = primary
+        .run("use " + primaryDbName)
+        .run("create external table t1 (id int)")
+        .run("insert into table t1 values (1)")
+        .run("insert into table t1 values (2)")
+        .run("create external table t2 (place string) partitioned by (country string)")
+        .run("insert into table t2 partition(country='india') values ('bangalore')")
+        .run("insert into table t2 partition(country='us') values ('austin')")
+        .run("insert into table t2 partition(country='france') values ('paris')")
+        .dump(primaryDbName, lazyCopyClause);
+
+    // verify that the external table info is written correctly for bootstrap
+    assertExternalFileInfo(Arrays.asList("t1", "t2"), tuple.dumpLocation, primaryDbName, false);
+
+    replica.load(replicatedDbName, primaryDbName, lazyCopyClause)
+        .run("use " + replicatedDbName)
+        .run("show tables like 't1'")
+        .verifyResult("t1")
+        .run("show tables like 't2'")
+        .verifyResult("t2")
+        .run("repl status " + replicatedDbName)
+        .verifyResult(tuple.lastReplicationId)
+        .run("select country from t2 where country = 'us'")
+        .verifyResult("us")
+        .run("select country from t2 where country = 'france'")
+        .verifyResult("france")
+        .run("show partitions t2")
+        .verifyResults(new String[] {"country=france", "country=india", "country=us"});
+
+    String hiveDumpLocation = tuple.dumpLocation + File.separator + ReplUtils.REPL_HIVE_BASE_DIR;
+    // Ckpt should be set on bootstrapped db.
+    replica.verifyIfCkptSet(replicatedDbName, hiveDumpLocation);
+
+    assertTablePartitionLocation(primaryDbName + ".t1", replicatedDbName + ".t1");
+    assertTablePartitionLocation(primaryDbName + ".t2", replicatedDbName + ".t2");
+
+    tuple = primary.run("use " + primaryDbName)
+        .run("create external table t3 (id int)")
+        .run("insert into table t3 values (10)")
+        .run("create external table t4 as select id from t3")

Review comment:
   How is that related to this patch?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 459169)
Time Spent: 4h 20m  (was: 4h 10m)

> Memory efficient iterator should be used during replication.
> 
>
> Key: HIVE-23069
> URL: https://issues.apache.org/jira/browse/HIVE-23069
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23069.01.patch, HIVE-23069.02.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently the iterator used while copying table data is memory based. For a
> database with a very large number of tables/partitions, such an iterator may
> cause the HS2 process to go OOM.
> It also introduces a config option to run data copy tasks during the repl
> load operation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
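
For context on the test above: lazyCopyClause is passed into the WITH clause
of the REPL statements the harness issues, overriding the config for those
statements only. A hedged sketch of how such commands could be assembled,
assuming the "REPL DUMP ... WITH" / "REPL LOAD ... INTO ... WITH" syntax this
test suite exercises; ReplCommands is an illustrative helper, not part of Hive.

import org.apache.hadoop.hive.conf.HiveConf;

final class ReplCommands {
  // The property name is taken from the HiveConf constant rather than
  // hard-coded, since only ConfVars.REPL_DATA_COPY_LAZY appears in the test.
  private static String lazyCopy() {
    return "'" + HiveConf.ConfVars.REPL_DATA_COPY_LAZY.varname + "'='true'";
  }

  static String dumpWithLazyCopy(String primaryDbName) {
    return "REPL DUMP " + primaryDbName + " WITH (" + lazyCopy() + ")";
  }

  static String loadWithLazyCopy(String primaryDbName, String replicatedDbName) {
    return "REPL LOAD " + primaryDbName + " INTO " + replicatedDbName + " WITH (" + lazyCopy() + ")";
  }
}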

