[jira] [Updated] (HIVE-24473) Update HBase version to 2.1.10

2020-12-03 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HIVE-24473:
---
Attachment: (was: HIVE-24473.patch)

> Update HBase version to 2.1.10
> --
>
> Key: HIVE-24473
> URL: https://issues.apache.org/jira/browse/HIVE-24473
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 4.0.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive currently builds with a 2.0.0 pre-release of HBase.
> Update HBase to a more recent version.
> We cannot use anything later than 2.2.4 because of HBASE-22394, so the
> options are 2.1.10 and 2.2.4.
> I suggest 2.1.10 because it is a chronologically later release and it
> maximises compatibility with HBase server deployments.
>  
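For reference, Hive pins its HBase dependency through a Maven property in the root pom. Below is a minimal sketch of the proposed bump, assuming the property is named "hbase.version"; the exact property name and surrounding pom content are not shown in this thread.

```xml
<!-- Sketch of the proposed change in Hive's root pom.xml; assumes the
     HBase dependency version is pinned via an "hbase.version" property. -->
<properties>
  <!-- was a 2.0.0 pre-release -->
  <hbase.version>2.1.10</hbase.version>
</properties>
```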



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24473) Update HBase version to 2.1.10

2020-12-03 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth updated HIVE-24473:
---
Attachment: (was: HIVE-24473.02.patch)

> Update HBase version to 2.1.10
> --
>
> Key: HIVE-24473
> URL: https://issues.apache.org/jira/browse/HIVE-24473
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Affects Versions: 4.0.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=520054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520054
 ]

ASF GitHub Bot logged work on HIVE-24433:
-

Author: ASF GitHub Bot
Created on: 04/Dec/20 07:49
Start Date: 04/Dec/20 07:49
Worklog Time Spent: 10m 
  Work Description: nareshpr commented on a change in pull request #1712:
URL: https://github.com/apache/hive/pull/1712#discussion_r535899436



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2725,7 +2725,7 @@ private void insertTxnComponents(long txnid, LockRequest 
rqst, Connection dbConn
   }
   String dbName = normalizeCase(lc.getDbname());
   String tblName = normalizeCase(lc.getTablename());
-  String partName = normalizeCase(lc.getPartitionname());
+  String partName = lc.getPartitionname();

Review comment:
   I changed it to split the partition name and convert only the partition key to lowercase.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 520054)
Time Spent: 2h  (was: 1h 50m)

> AutoCompaction is not getting triggered for CamelCase Partition Values
> --
>
> Key: HIVE-24433
> URL: https://issues.apache.org/jira/browse/HIVE-24433
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The partition key value is converted to lowercase in the 2 places below:
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> As a result, the TXN_COMPONENTS and HIVE_LOCKS tables do not contain the
> proper partition values.
> When the query completes, the entry moves from TXN_COMPONENTS to
> COMPLETED_TXN_COMPONENTS. Hive AutoCompaction then fails to recognize the
> partition and considers it an invalid partition.
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc 
> tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> |         2 | default      | abc       | city=bangalore | 2020-11-25 09:26:59 |           1 | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
>  
> AutoCompaction fails to trigger, with the error below:
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator 
> (Initiator.java:run(98)) - Checking to see if we should compact 
> default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator 
> (Initiator.java:run(155)) - Can't find partition 
> default.compaction_test.city=bangalore, assuming it has been dropped and 
> moving on{code}
> I verified the 4 SQL statements below with my PR; all of them produced the
> correct partition key value, i.e.,
> COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore"
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}
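A minimal sketch of the normalization discussed above: lowercase only the partition keys while preserving the case of the values, so that "CitY=Bangalore" becomes "city=Bangalore" rather than "city=bangalore". The class and method names here are illustrative, not the identifiers used in the actual patch.

```java
// Illustrative sketch only: normalize partition keys, keep value case intact.
// A partition name is a '/'-separated list of key=value pairs.
class PartNameNormalizer {
    static String normalizePartName(String partName) {
        if (partName == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        String[] pairs = partName.split("/");
        for (int i = 0; i < pairs.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            int eq = pairs[i].indexOf('=');
            if (eq < 0) {
                // No key=value structure; leave the component unchanged.
                sb.append(pairs[i]);
            } else {
                // Lowercase only the key; keep '=' and the value as-is.
                sb.append(pairs[i].substring(0, eq).toLowerCase())
                  .append(pairs[i].substring(eq));
            }
        }
        return sb.toString();
    }
}
```

With this scheme, COMPLETED_TXN_COMPONENTS would record "city=Bangalore" for the example insert above, and the Initiator could find the partition again.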





[jira] [Assigned] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem

2020-12-03 Thread guojh (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guojh reassigned HIVE-24467:


Assignee: guojh

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---
>
> Key: HIVE-24467
> URL: https://issues.apache.org/jira/browse/HIVE-24467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When Hive executes jobs in parallel (controlled by the “hive.exec.parallel”
> parameter), ConditionalTasks remove the tasks that were not selected
> concurrently. Because of a thread-safety issue, some tasks may not be removed
> from the dependency tree. This is a serious bug: it causes some stage tasks
> to never trigger execution.
> In our production cluster, a query ran three conditional tasks in parallel.
> After applying the patch from HIVE-21638, we found that Stage-3 was missing
> and never submitted to the runnable list because its parent Stage-31 was not
> done. But Stage-31 should have been removed, since it was not selected.
> The stage dependencies are below:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-41 is a root stage
>   Stage-26 depends on stages: Stage-41
>   Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, 
> Stage-2
>   Stage-39 has a backup stage: Stage-2
>   Stage-23 depends on stages: Stage-39
>   Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, 
> Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
>   Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
>   Stage-5
>   Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
>   Stage-51 depends on stages: Stage-0
>   Stage-4
>   Stage-6
>   Stage-7 depends on stages: Stage-6
>   Stage-40 has a backup stage: Stage-2
>   Stage-24 depends on stages: Stage-40
>   Stage-2
>   Stage-44 is a root stage
>   Stage-30 depends on stages: Stage-44
>   Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, 
> Stage-12
>   Stage-42 has a backup stage: Stage-12
>   Stage-27 depends on stages: Stage-42
>   Stage-43 has a backup stage: Stage-12
>   Stage-28 depends on stages: Stage-43
>   Stage-12
>   Stage-47 is a root stage
>   Stage-34 depends on stages: Stage-47
>   Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, 
> Stage-16
>   Stage-45 has a backup stage: Stage-16
>   Stage-31 depends on stages: Stage-45
>   Stage-46 has a backup stage: Stage-16
>   Stage-32 depends on stages: Stage-46
>   Stage-16
>   Stage-50 is a root stage
>   Stage-38 depends on stages: Stage-50
>   Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, 
> Stage-20
>   Stage-48 has a backup stage: Stage-20
>   Stage-35 depends on stages: Stage-48
>   Stage-49 has a backup stage: Stage-20
>   Stage-36 depends on stages: Stage-49
>   Stage-20
> {code}
> The stage task execution log is below. Stage-33 is a conditional task
> consisting of Stage-45, Stage-46, and Stage-16. Stage-16 was launched, so
> Stage-45 and Stage-46 should be removed from the dependency tree. Stage-31 is
> a child of Stage-45 and a parent of Stage-3, so Stage-31 should be removed
> too. Yet as the log below shows, Stage-31 is still in the parent list of
> Stage-3, which should not happen.
> {code:java}
> 2020-12-03T01:09:50,939  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 1 out of 17
> 2020-12-03T01:09:50,940  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-26:MAPRED] in parallel
> 2020-12-03T01:09:50,941  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 2 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-30:MAPRED] in parallel
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 3 out of 17
> 2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-34:MAPRED] in parallel
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Launching Job 4 out of 17
> 2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-38:MAPRED] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
> ql.Driver: Starting 
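The race described in this issue can be sketched as follows, using illustrative names rather than Hive's actual Task API: when several ConditionalTasks prune a shared dependency graph concurrently, the parent lists must be safe for concurrent mutation (or the removals must be synchronized), otherwise a stale parent such as Stage-31 can be left behind and its child never becomes runnable.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative sketch of the dependency graph, not Hive's real Task class.
// Using a thread-safe list means concurrent removeParent() calls from
// several resolving ConditionalTasks cannot corrupt the parent set.
class Task {
    final String name;
    final List<Task> parents = new CopyOnWriteArrayList<>();

    Task(String name) { this.name = name; }

    // Called when a non-selected branch is pruned from the graph.
    void removeParent(Task p) { parents.remove(p); }

    // A task becomes runnable once all of its parents are done or pruned.
    boolean isRunnable() { return parents.isEmpty(); }
}
```

With a plain ArrayList and no synchronization, concurrent removals can interleave and lose an update, which matches the symptom in the log: Stage-31 remains in Stage-3's parent list even though its branch was not selected.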

[jira] [Work logged] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24467?focusedWorklogId=519995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519995
 ]

ASF GitHub Bot logged work on HIVE-24467:
-

Author: ASF GitHub Bot
Created on: 04/Dec/20 04:29
Start Date: 04/Dec/20 04:29
Worklog Time Spent: 10m 
  Work Description: anishek commented on pull request #1743:
URL: https://github.com/apache/hive/pull/1743#issuecomment-738557480


   Maybe someone with more experience on the execution side should look at
this. @maheshk114, can you help here?





Issue Time Tracking
---

Worklog Id: (was: 519995)
Time Spent: 0.5h  (was: 20m)

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---
>
> Key: HIVE-24467
> URL: https://issues.apache.org/jira/browse/HIVE-24467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
>Reporter: guojh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h

[jira] [Work logged] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24467?focusedWorklogId=519985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519985
 ]

ASF GitHub Bot logged work on HIVE-24467:
-

Author: ASF GitHub Bot
Created on: 04/Dec/20 03:38
Start Date: 04/Dec/20 03:38
Worklog Time Spent: 10m 
  Work Description: gjhkael commented on pull request #1743:
URL: https://github.com/apache/hive/pull/1743#issuecomment-738545155


   @anishek Please review this pr. Thanks.





Issue Time Tracking
---

Worklog Id: (was: 519985)
Time Spent: 20m  (was: 10m)

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---
>
> Key: HIVE-24467
> URL: https://issues.apache.org/jira/browse/HIVE-24467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
>Reporter: guojh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h

[jira] [Updated] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24467:
--
Labels: pull-request-available  (was: )

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---
>
> Key: HIVE-24467
> URL: https://issues.apache.org/jira/browse/HIVE-24467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
>Reporter: guojh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h

[jira] [Work logged] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24467?focusedWorklogId=519984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519984
 ]

ASF GitHub Bot logged work on HIVE-24467:
-

Author: ASF GitHub Bot
Created on: 04/Dec/20 03:37
Start Date: 04/Dec/20 03:37
Worklog Time Spent: 10m 
  Work Description: gjhkael opened a new pull request #1743:
URL: https://github.com/apache/hive/pull/1743


   





Issue Time Tracking
---

Worklog Id: (was: 519984)
Remaining Estimate: 0h
Time Spent: 10m

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---
>
> Key: HIVE-24467
> URL: https://issues.apache.org/jira/browse/HIVE-24467
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
>Reporter: guojh
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h

[jira] [Updated] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem

2020-12-03 Thread guojh (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guojh updated HIVE-24467:
-
Description: 
When Hive executes jobs in parallel (controlled by the “hive.exec.parallel”
parameter), ConditionalTasks remove the tasks that were not selected
concurrently. Because of a thread-safety issue, some tasks may not be removed
from the dependency tree. This is a serious bug: it causes some stage tasks to
never trigger execution.

In our production cluster, a query ran three conditional tasks in parallel.
After applying the patch from HIVE-21638, we found that Stage-3 was missing and
never submitted to the runnable list because its parent Stage-31 was not done.
But Stage-31 should have been removed, since it was not selected.

The stage dependencies are below:
{code:java}
STAGE DEPENDENCIES:
  Stage-41 is a root stage
  Stage-26 depends on stages: Stage-41
  Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, Stage-2
  Stage-39 has a backup stage: Stage-2
  Stage-23 depends on stages: Stage-39
  Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, 
Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
  Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
  Stage-5
  Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
  Stage-51 depends on stages: Stage-0
  Stage-4
  Stage-6
  Stage-7 depends on stages: Stage-6
  Stage-40 has a backup stage: Stage-2
  Stage-24 depends on stages: Stage-40
  Stage-2
  Stage-44 is a root stage
  Stage-30 depends on stages: Stage-44
  Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, 
Stage-12
  Stage-42 has a backup stage: Stage-12
  Stage-27 depends on stages: Stage-42
  Stage-43 has a backup stage: Stage-12
  Stage-28 depends on stages: Stage-43
  Stage-12
  Stage-47 is a root stage
  Stage-34 depends on stages: Stage-47
  Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, 
Stage-16
  Stage-45 has a backup stage: Stage-16
  Stage-31 depends on stages: Stage-45
  Stage-46 has a backup stage: Stage-16
  Stage-32 depends on stages: Stage-46
  Stage-16
  Stage-50 is a root stage
  Stage-38 depends on stages: Stage-50
  Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, 
Stage-20
  Stage-48 has a backup stage: Stage-20
  Stage-35 depends on stages: Stage-48
  Stage-49 has a backup stage: Stage-20
  Stage-36 depends on stages: Stage-49
  Stage-20
{code}
The stage task execution log is below. Stage-33 is a conditional task 
consisting of Stage-45, Stage-46, and Stage-16. Stage-16 is launched, so 
Stage-45 and Stage-46 should be removed from the dependency tree. Stage-31 is a 
child of Stage-45 and a parent of Stage-3, so Stage-31 should be removed too. 
As seen in the log below, Stage-31 is still in the parent list of Stage-3, 
which should not happen.
{code:java}
2020-12-03T01:09:50,939  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 1 out of 17
2020-12-03T01:09:50,940  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-26:MAPRED] in parallel
2020-12-03T01:09:50,941  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 2 out of 17
2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-30:MAPRED] in parallel
2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 3 out of 17
2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-34:MAPRED] in parallel
2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 4 out of 17
2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-38:MAPRED] in parallel
2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
2020-12-03T01:10:34,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 5 out of 17
2020-12-03T01:10:34,947  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-16:MAPRED] in parallel
2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 6 out of 17
2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-12:MAPRED] in parallel
2020-12-03T01:10:34,949  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 7 out of 17
2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task 

[jira] [Updated] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem

2020-12-03 Thread guojh (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guojh updated HIVE-24467:
-
Description: 
When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” 
parameter), ConditionalTasks remove their unselected tasks in parallel as well. 
Because of thread-safety issues, some tasks may not be removed from the 
dependency task tree. This is a serious bug: it can leave some stage tasks 
never triggered for execution.

In our production cluster, a query ran three conditional tasks in parallel. 
After applying the patch from HIVE-21638, we found that Stage-3 was missing and 
was never submitted to the runnable list because its parent Stage-31 was not 
done. Stage-31, however, should have been removed because it was not selected.

The stage dependencies are as follows:
{code:java}
STAGE DEPENDENCIES:
  Stage-41 is a root stage
  Stage-26 depends on stages: Stage-41
  Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, Stage-2
  Stage-39 has a backup stage: Stage-2
  Stage-23 depends on stages: Stage-39
  Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, 
Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
  Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
  Stage-5
  Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
  Stage-51 depends on stages: Stage-0
  Stage-4
  Stage-6
  Stage-7 depends on stages: Stage-6
  Stage-40 has a backup stage: Stage-2
  Stage-24 depends on stages: Stage-40
  Stage-2
  Stage-44 is a root stage
  Stage-30 depends on stages: Stage-44
  Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, 
Stage-12
  Stage-42 has a backup stage: Stage-12
  Stage-27 depends on stages: Stage-42
  Stage-43 has a backup stage: Stage-12
  Stage-28 depends on stages: Stage-43
  Stage-12
  Stage-47 is a root stage
  Stage-34 depends on stages: Stage-47
  Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, 
Stage-16
  Stage-45 has a backup stage: Stage-16
  Stage-31 depends on stages: Stage-45
  Stage-46 has a backup stage: Stage-16
  Stage-32 depends on stages: Stage-46
  Stage-16
  Stage-50 is a root stage
  Stage-38 depends on stages: Stage-50
  Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, 
Stage-20
  Stage-48 has a backup stage: Stage-20
  Stage-35 depends on stages: Stage-48
  Stage-49 has a backup stage: Stage-20
  Stage-36 depends on stages: Stage-49
  Stage-20
{code}
The stage task execution log is below. Stage-33 is a conditional task 
consisting of Stage-45, Stage-46, and Stage-16. Stage-16 is launched, so 
Stage-45 and Stage-46 should be removed from the dependency tree. Stage-31 is a 
child of Stage-45 and a parent of Stage-3, so Stage-31 should be removed too.
{code:java}
2020-12-03T01:09:50,939  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 1 out of 17
2020-12-03T01:09:50,940  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-26:MAPRED] in parallel
2020-12-03T01:09:50,941  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 2 out of 17
2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-30:MAPRED] in parallel
2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 3 out of 17
2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-34:MAPRED] in parallel
2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 4 out of 17
2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-38:MAPRED] in parallel
2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
2020-12-03T01:10:34,946  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 5 out of 17
2020-12-03T01:10:34,947  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-16:MAPRED] in parallel
2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 6 out of 17
2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-12:MAPRED] in parallel
2020-12-03T01:10:34,949  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Launching Job 7 out of 17
2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: Starting task [Stage-20:MAPRED] in parallel
2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] 
ql.Driver: 

[jira] [Updated] (HIVE-24484) Upgrade Hadoop to 3.2.1

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24484:
--
Labels: pull-request-available  (was: )

> Upgrade Hadoop to 3.2.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.2.1

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=519956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519956
 ]

ASF GitHub Bot logged work on HIVE-24484:
-

Author: ASF GitHub Bot
Created on: 04/Dec/20 00:48
Start Date: 04/Dec/20 00:48
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1742:
URL: https://github.com/apache/hive/pull/1742


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519956)
Remaining Estimate: 0h
Time Spent: 10m

> Upgrade Hadoop to 3.2.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23891) Using UNION sql clause and speculative execution can cause file duplication in Tez

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23891?focusedWorklogId=519955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519955
 ]

ASF GitHub Bot logged work on HIVE-23891:
-

Author: ASF GitHub Bot
Created on: 04/Dec/20 00:47
Start Date: 04/Dec/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1294:
URL: https://github.com/apache/hive/pull/1294


   





Issue Time Tracking
---

Worklog Id: (was: 519955)
Time Spent: 2h 20m  (was: 2h 10m)

> Using UNION sql clause and speculative execution can cause file duplication 
> in Tez
> --
>
> Key: HIVE-23891
> URL: https://issues.apache.org/jira/browse/HIVE-23891
> Project: Hive
>  Issue Type: Bug
>Reporter: George Pachitariu
>Assignee: George Pachitariu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23891.1.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Hello, 
> the specific scenario when this can happen:
>  - the execution engine is Tez;
>  - speculative execution is on;
>  - the query inserts into a table and the last step is a UNION sql clause;
> The problem is that Tez creates an extra layer of subdirectories when there 
> is a UNION. Later, when deduplicating, Hive doesn't take that into account 
> and only deduplicates folders but not the files inside.
> So for a query like this:
> {code:sql}
> insert overwrite table union_all
> select * from union_first_part
> union all
> select * from union_second_part;
> {code}
> The folder structure afterwards will be like this (a possible example):
> {code:java}
> .../union_all/HIVE_UNION_SUBDIR_1/00_0
> .../union_all/HIVE_UNION_SUBDIR_1/00_1
> .../union_all/HIVE_UNION_SUBDIR_2/00_1
> {code}
> The attached patch increases the number of folder levels that Hive will check 
> recursively for duplicates when we have a UNION in Tez.
> Feel free to reach out if you have any questions :).
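The deduplication the patch describes can be illustrated with a sketch (a hypothetical helper, not Hive's actual file-cleanup logic): treat each file name as `<taskId>_<attempt>`, group files per directory, and keep only the highest attempt, which handles duplicates inside the extra `HIVE_UNION_SUBDIR_*` level as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of per-directory speculative-attempt deduplication (hypothetical
// helper, not Hive's actual implementation): files are named
// <taskId>_<attempt>, and with a UNION under Tez they sit one
// HIVE_UNION_SUBDIR_* level deeper, so duplicates must be resolved per
// directory, keeping the latest attempt for each task.
public class UnionDedup {
  static List<String> dedup(List<String> paths) {
    Map<String, String> best = new HashMap<>();   // (dir, taskId) -> path
    for (String path : paths) {
      int slash = path.lastIndexOf('/');
      String file = path.substring(slash + 1);    // e.g. "00_1"
      String taskId = file.substring(0, file.lastIndexOf('_'));
      String key = path.substring(0, slash) + "/" + taskId;
      String current = best.get(key);
      if (current == null || attempt(current) < attempt(path)) {
        best.put(key, path);                      // keep the highest attempt
      }
    }
    List<String> kept = new ArrayList<>(best.values());
    Collections.sort(kept);
    return kept;
  }

  static int attempt(String path) {
    return Integer.parseInt(path.substring(path.lastIndexOf('_') + 1));
  }

  public static void main(String[] args) {
    // The folder structure from the description: the two files under
    // HIVE_UNION_SUBDIR_1 are attempts 0 and 1 of the same task.
    List<String> kept = dedup(Arrays.asList(
        "union_all/HIVE_UNION_SUBDIR_1/00_0",
        "union_all/HIVE_UNION_SUBDIR_1/00_1",
        "union_all/HIVE_UNION_SUBDIR_2/00_1"));
    System.out.println(kept);
    // prints [union_all/HIVE_UNION_SUBDIR_1/00_1, union_all/HIVE_UNION_SUBDIR_2/00_1]
  }
}
```

The key point is that the grouping key includes the full directory path, so checking one extra folder level (as the patch does) is what makes the per-task grouping see the files inside the union subdirectories.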





[jira] [Assigned] (HIVE-24484) Upgrade Hadoop to 3.2.1

2020-12-03 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-24484:
-


> Upgrade Hadoop to 3.2.1
> ---
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>






[jira] [Work logged] (HIVE-21588) Remove HBase dependency from hive-metastore

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21588?focusedWorklogId=519923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519923
 ]

ASF GitHub Bot logged work on HIVE-21588:
-

Author: ASF GitHub Bot
Created on: 04/Dec/20 00:02
Start Date: 04/Dec/20 00:02
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1723:
URL: https://github.com/apache/hive/pull/1723#issuecomment-738461352


   Merged. Thanks @wangyum !





Issue Time Tracking
---

Worklog Id: (was: 519923)
Time Spent: 1h 40m  (was: 1.5h)

> Remove HBase dependency from hive-metastore
> ---
>
> Key: HIVE-21588
> URL: https://issues.apache.org/jira/browse/HIVE-21588
> Project: Hive
>  Issue Type: Task
>  Components: HBase Metastore
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21588.01.patch, HIVE-21588.02.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> HIVE-17234 removed the HBase metastore from master, but the Maven dependency 
> has not been removed. We should remove it.





[jira] [Work logged] (HIVE-21588) Remove HBase dependency from hive-metastore

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21588?focusedWorklogId=519922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519922
 ]

ASF GitHub Bot logged work on HIVE-21588:
-

Author: ASF GitHub Bot
Created on: 04/Dec/20 00:01
Start Date: 04/Dec/20 00:01
Worklog Time Spent: 10m 
  Work Description: sunchao merged pull request #1723:
URL: https://github.com/apache/hive/pull/1723


   





Issue Time Tracking
---

Worklog Id: (was: 519922)
Time Spent: 1.5h  (was: 1h 20m)

> Remove HBase dependency from hive-metastore
> ---
>
> Key: HIVE-21588
> URL: https://issues.apache.org/jira/browse/HIVE-21588
> Project: Hive
>  Issue Type: Task
>  Components: HBase Metastore
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21588.01.patch, HIVE-21588.02.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> HIVE-17234 removed the HBase metastore from master, but the Maven dependency 
> has not been removed. We should remove it.





[jira] [Work logged] (HIVE-21588) Remove HBase dependency from hive-metastore

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21588?focusedWorklogId=519921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519921
 ]

ASF GitHub Bot logged work on HIVE-21588:
-

Author: ASF GitHub Bot
Created on: 04/Dec/20 00:01
Start Date: 04/Dec/20 00:01
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #1723:
URL: https://github.com/apache/hive/pull/1723#discussion_r535735693



##
File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java
##
@@ -17,7 +17,6 @@
  */
 package org.apache.hadoop.hive.ql.txn.compactor;
 
-import it.unimi.dsi.fastutil.booleans.AbstractBooleanBidirectionalIterator;

Review comment:
   Gotcha, cool. 







Issue Time Tracking
---

Worklog Id: (was: 519921)
Time Spent: 1h 20m  (was: 1h 10m)

> Remove HBase dependency from hive-metastore
> ---
>
> Key: HIVE-21588
> URL: https://issues.apache.org/jira/browse/HIVE-21588
> Project: Hive
>  Issue Type: Task
>  Components: HBase Metastore
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21588.01.patch, HIVE-21588.02.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HIVE-17234 removed the HBase metastore from master, but the Maven dependency 
> has not been removed. We should remove it.





[jira] [Work logged] (HIVE-21588) Remove HBase dependency from hive-metastore

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21588?focusedWorklogId=519918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519918
 ]

ASF GitHub Bot logged work on HIVE-21588:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 23:36
Start Date: 03/Dec/20 23:36
Worklog Time Spent: 10m 
  Work Description: wangyum commented on a change in pull request #1723:
URL: https://github.com/apache/hive/pull/1723#discussion_r535725208



##
File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java
##
@@ -17,7 +17,6 @@
  */
 package org.apache.hadoop.hive.ql.txn.compactor;
 
-import it.unimi.dsi.fastutil.booleans.AbstractBooleanBidirectionalIterator;

Review comment:
   Yes, this needs `it.unimi.dsi:fastutil`, and we have removed that dependency:
   
![image](https://user-images.githubusercontent.com/5399861/100756200-f6910480-3427-11eb-9919-af782b870b9e.png)







Issue Time Tracking
---

Worklog Id: (was: 519918)
Time Spent: 1h 10m  (was: 1h)

> Remove HBase dependency from hive-metastore
> ---
>
> Key: HIVE-21588
> URL: https://issues.apache.org/jira/browse/HIVE-21588
> Project: Hive
>  Issue Type: Task
>  Components: HBase Metastore
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21588.01.patch, HIVE-21588.02.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HIVE-17234 removed the HBase metastore from master, but the Maven dependency 
> has not been removed. We should remove it.





[jira] [Resolved] (HIVE-24220) Unable to reopen a closed bug report

2020-12-03 Thread Ankur Tagra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Tagra resolved HIVE-24220.

Resolution: Won't Fix

> Unable to reopen a closed bug report
> 
>
> Key: HIVE-24220
> URL: https://issues.apache.org/jira/browse/HIVE-24220
> Project: Hive
>  Issue Type: Bug
>Reporter: Ankur Tagra
>Assignee: Ankur Tagra
>Priority: Major
>






[jira] [Reopened] (HIVE-24220) Unable to reopen a closed bug report

2020-12-03 Thread Ankur Tagra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Tagra reopened HIVE-24220:


> Unable to reopen a closed bug report
> 
>
> Key: HIVE-24220
> URL: https://issues.apache.org/jira/browse/HIVE-24220
> Project: Hive
>  Issue Type: Bug
>Reporter: Ankur Tagra
>Assignee: Ankur Tagra
>Priority: Major
>






[jira] [Resolved] (HIVE-24220) Unable to reopen a closed bug report

2020-12-03 Thread Ankur Tagra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Tagra resolved HIVE-24220.

Resolution: Fixed

> Unable to reopen a closed bug report
> 
>
> Key: HIVE-24220
> URL: https://issues.apache.org/jira/browse/HIVE-24220
> Project: Hive
>  Issue Type: Bug
>Reporter: Ankur Tagra
>Assignee: Ankur Tagra
>Priority: Major
>






[jira] [Reopened] (HIVE-24220) Unable to reopen a closed bug report

2020-12-03 Thread Ankur Tagra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Tagra reopened HIVE-24220:


> Unable to reopen a closed bug report
> 
>
> Key: HIVE-24220
> URL: https://issues.apache.org/jira/browse/HIVE-24220
> Project: Hive
>  Issue Type: Bug
>Reporter: Ankur Tagra
>Assignee: Ankur Tagra
>Priority: Major
>






[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=519874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519874
 ]

ASF GitHub Bot logged work on HIVE-24432:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 21:21
Start Date: 03/Dec/20 21:21
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1710:
URL: https://github.com/apache/hive/pull/1710#issuecomment-738326173


   @nrg4878 Review please for HMS work?





Issue Time Tracking
---

Worklog Id: (was: 519874)
Time Spent: 1h 20m  (was: 1h 10m)

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Notification events are loaded in batches (reduces memory pressure on the 
> HMS), but all of the deletes happen under a single transaction and, when 
> deleting many records, can put a lot of pressure on the backend database.
> Instead, delete events in batches (in different transactions) as well.
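The batching idea can be sketched as follows (a hypothetical helper, not the actual HMS ObjectStore code): split the expired event ids into fixed-size chunks and commit each chunk in its own transaction, so the backend database never has to hold one huge delete transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of batched deletion (hypothetical helper, not HMS code): partition
// the ids to delete into fixed-size chunks; the caller would then run one
// short transaction per chunk instead of one large transaction for all ids.
public class EventBatcher {
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += batchSize) {
      batches.add(new ArrayList<>(
          items.subList(i, Math.min(i + batchSize, items.size()))));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Long> eventIds = new ArrayList<>();
    for (long id = 1; id <= 10; id++) eventIds.add(id);
    for (List<Long> batch : partition(eventIds, 4)) {
      // In the real fix, each iteration would be its own transaction:
      // begin; DELETE FROM NOTIFICATION_LOG WHERE EVENT_ID IN (batch); commit.
      System.out.println(batch);
    }
    // prints [1, 2, 3, 4] then [5, 6, 7, 8] then [9, 10]
  }
}
```

The trade-off is that a failure mid-way leaves some batches deleted and others not, which is acceptable here because event cleanup is idempotent and retried.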





[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.1

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519872
 ]

ASF GitHub Bot logged work on HIVE-21737:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 21:12
Start Date: 03/Dec/20 21:12
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #1635:
URL: https://github.com/apache/hive/pull/1635#issuecomment-738315389


   Thanks @sunchao eager to see this finally happening !





Issue Time Tracking
---

Worklog Id: (was: 519872)
Time Spent: 3h 40m  (was: 3.5h)

> Upgrade Avro to version 1.10.1
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API and Guava as a dependency. Worth the update.





[jira] [Resolved] (HIVE-24220) Unable to reopen a closed bug report

2020-12-03 Thread Ankur Tagra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Tagra resolved HIVE-24220.

Resolution: Fixed

> Unable to reopen a closed bug report
> 
>
> Key: HIVE-24220
> URL: https://issues.apache.org/jira/browse/HIVE-24220
> Project: Hive
>  Issue Type: Bug
>Reporter: Ankur Tagra
>Assignee: Ankur Tagra
>Priority: Major
>






[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519861
 ]

ASF GitHub Bot logged work on HIVE-24468:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 20:39
Start Date: 03/Dec/20 20:39
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1728:
URL: https://github.com/apache/hive/pull/1728#issuecomment-738293741


   @pvary Just so we are clear, `now()` is not an SQL function.  It's 
implemented in this DBListener class:
   
   ```
 private int now() {
   long millis = System.currentTimeMillis();
   millis /= 1000;
   if (millis > Integer.MAX_VALUE) {
 LOG.warn("We've passed max int value in seconds since the epoch, " +
 "all notification times will be the same!");
 return Integer.MAX_VALUE;
   }
   return (int)millis;
 }
   ```
   
   
https://github.com/apache/hive/blob/master/hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/DbNotificationListener.java#L941-L950





Issue Time Tracking
---

Worklog Id: (was: 519861)
Time Spent: 1.5h  (was: 1h 20m)

> Use Event Time instead of Current Time in Notification Log DB Entry
> ---
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519849
 ]

ASF GitHub Bot logged work on HIVE-24468:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 20:01
Start Date: 03/Dec/20 20:01
Worklog Time Spent: 10m 
  Work Description: belugabehr edited a comment on pull request #1728:
URL: https://github.com/apache/hive/pull/1728#issuecomment-738274162


   @pvary 
   
   I agree with your understanding that the SELECT FOR UPDATE is a lock, and 
therefore the timestamps should always be increasing, but imagine if the HMS 
clocks on two instances were off by 5s (or more).  The HMS with the slower clock 
would generate events that were earlier in time, but with a higher ID.  So 
there is no strong enforcement of the time being sequential.  It's all based 
on the HMS clocks being in sync and trusting those clocks.
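The skew argument reduces to a toy check (invented numbers; `timesMonotone` is a hypothetical helper, not HMS code): ids are allocated in order under the lock, but each instance stamps its own clock, so a higher id can carry an earlier time.

```java
// Toy illustration of the clock-skew argument (invented numbers, not HMS
// code): ids are handed out in order, but each instance stamps its own clock,
// so event time need not be monotone in event id.
public class ClockSkew {
  // True iff event times never decrease when walked in event-id order.
  static boolean timesMonotone(long[] timesInIdOrder) {
    for (int i = 1; i < timesInIdOrder.length; i++) {
      if (timesInIdOrder[i] < timesInIdOrder[i - 1]) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    long clockA = 1_000_000L;          // instance A's clock, in seconds
    long clockB = clockA - 5;          // instance B runs 5 seconds slow
    // Event id 1 is stamped by A, event id 2 by B: higher id, earlier time.
    long[] times = { clockA, clockB };
    System.out.println(timesMonotone(times));   // prints false
  }
}
```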





Issue Time Tracking
---

Worklog Id: (was: 519849)
Time Spent: 1h 20m  (was: 1h 10m)

> Use Event Time instead of Current Time in Notification Log DB Entry
> ---
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519847
 ]

ASF GitHub Bot logged work on HIVE-24468:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 20:00
Start Date: 03/Dec/20 20:00
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1728:
URL: https://github.com/apache/hive/pull/1728#issuecomment-738274162


   @pvary 
   
   I agree with your understanding that the SELECT FOR UPDATE is a lock, and 
therefore the time is the same, but imagine if the HMS clocks on two instances 
were off by 5s (or more).  The HMS with the slower clock would generate events 
that were earlier in time, but with a higher ID.  So there is no strong 
enforcement of the time being sequential.  It's all based on the HMS clocks 
being in sync and trusting those clocks.





Issue Time Tracking
---

Worklog Id: (was: 519847)
Time Spent: 1h 10m  (was: 1h)

> Use Event Time instead of Current Time in Notification Log DB Entry
> ---
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-21588) Remove HBase dependency from hive-metastore

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21588?focusedWorklogId=519825&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519825
 ]

ASF GitHub Bot logged work on HIVE-21588:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 19:03
Start Date: 03/Dec/20 19:03
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #1723:
URL: https://github.com/apache/hive/pull/1723#discussion_r535502356



##
File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java
##
@@ -17,7 +17,6 @@
  */
 package org.apache.hadoop.hive.ql.txn.compactor;
 
-import it.unimi.dsi.fastutil.booleans.AbstractBooleanBidirectionalIterator;

Review comment:
   is this related?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519825)
Time Spent: 1h  (was: 50m)

> Remove HBase dependency from hive-metastore
> ---
>
> Key: HIVE-21588
> URL: https://issues.apache.org/jira/browse/HIVE-21588
> Project: Hive
>  Issue Type: Task
>  Components: HBase Metastore
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21588.01.patch, HIVE-21588.02.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HIVE-17234 removed the HBase metastore from master, but the Maven dependencies 
> have not been removed. We should remove them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=519813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519813
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 18:44
Start Date: 03/Dec/20 18:44
Worklog Time Spent: 10m 
  Work Description: fenglu-g commented on pull request #1740:
URL: https://github.com/apache/hive/pull/1740#issuecomment-738212050


   @nrg4878 and others, PTAL, thanks. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519813)
Time Spent: 20m  (was: 10m)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java, the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should be 
> moved out into a separate file to clean it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519812&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519812
 ]

ASF GitHub Bot logged work on HIVE-24468:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 18:42
Start Date: 03/Dec/20 18:42
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1728:
URL: https://github.com/apache/hive/pull/1728#issuecomment-738211024


   > So, the answer is yes. The timestamps could be out of order.
   
   Before this patch the timestamps were in order, as we locked the 
NEXT_EVENT_ID table with SELECT FOR UPDATE, so the timestamp was aligned with 
the EVENT_ID. (There might be some exceptions if a backend RDBMS reuses the 
value returned by now() within a single transaction, but I think we should 
overlook that for now.)
   
   After this PR the timestamps could become out of order, which is IMHO an API 
change even if the ordering requirement is not documented. So users should be 
aware of this change, and we should seriously consider it before proceeding.
   
   Good to have you back and starting to clean up this stuff!
   Thanks,
   Peter
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519812)
Time Spent: 1h  (was: 50m)

> Use Event Time instead of Current Time in Notification Log DB Entry
> ---
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24470:
--
Labels: pull-request-available  (was: )

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java, the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should be 
> moved out into a separate file to clean it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=519811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519811
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 18:41
Start Date: 03/Dec/20 18:41
Worklog Time Spent: 10m 
  Work Description: Noremac201 opened a new pull request #1740:
URL: https://github.com/apache/hive/pull/1740


   ### What changes were proposed in this pull request?
   
   1. Refactor HiveMetastore.HMSHandler into its own class
   
   ### Why are the changes needed?
   
   This will pave the way for cleaner changes: the driver class is no longer 
nested within the 10,000-line HMSHandler file, so there is a clearer 
separation of duties.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Existing unit tests, building/running manually.
   No additional tests were added, since this was a pure refactoring.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519811)
Remaining Estimate: 0h
Time Spent: 10m

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java, the majority of the code is a Thrift interface 
> rather than the actual logic behind starting the Hive metastore; this should be 
> moved out into a separate file to clean it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24281) Unable to reopen a closed bug report

2020-12-03 Thread Ankur Tagra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Tagra resolved HIVE-24281.

Resolution: Fixed

> Unable to reopen a closed bug report
> 
>
> Key: HIVE-24281
> URL: https://issues.apache.org/jira/browse/HIVE-24281
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 1.2.0
>Reporter: Ankur Tagra
>Assignee: Ankur Tagra
>Priority: Trivial
>
> Unable to reopen a closed bug report



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24394) Enable printing explain to console at query start

2020-12-03 Thread Johan Gustavsson (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johan Gustavsson reassigned HIVE-24394:
---

Assignee: Jesus Camacho Rodriguez

> Enable printing explain to console at query start
> -
>
> Key: HIVE-24394
> URL: https://issues.apache.org/jira/browse/HIVE-24394
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor
>Affects Versions: 2.3.7, 3.1.2
>Reporter: Johan Gustavsson
>Assignee: Jesus Camacho Rodriguez
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently there is a hive.log.explain.output option that prints the extended 
> explain to the log. While this is helpful for internal investigations, it limits 
> the information that is available to users. So we should add options to print 
> the non-extended explain to the console, for general user consumption, to 
> make it easier for users to debug queries and workflows without having to 
> resubmit queries with explain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HIVE-24281) Unable to reopen a closed bug report

2020-12-03 Thread Ankur Tagra (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Tagra reopened HIVE-24281:


> Unable to reopen a closed bug report
> 
>
> Key: HIVE-24281
> URL: https://issues.apache.org/jira/browse/HIVE-24281
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 1.2.0
>Reporter: Ankur Tagra
>Assignee: Ankur Tagra
>Priority: Trivial
> Fix For: 0.11.1
>
>
> Unable to reopen a closed bug report



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17709?focusedWorklogId=519785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519785
 ]

ASF GitHub Bot logged work on HIVE-17709:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 17:29
Start Date: 03/Dec/20 17:29
Worklog Time Spent: 10m 
  Work Description: abstractdog opened a new pull request #1739:
URL: https://github.com/apache/hive/pull/1739


   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519785)
Remaining Estimate: 0h
Time Spent: 10m

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long term fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-17709:
--
Labels: pull-request-available  (was: )

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long term fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24481) Skipped compaction can cause data corruption with streaming

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24481?focusedWorklogId=519781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519781
 ]

ASF GitHub Bot logged work on HIVE-24481:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 17:13
Start Date: 03/Dec/20 17:13
Worklog Time Spent: 10m 
  Work Description: pvargacl opened a new pull request #1738:
URL: https://github.com/apache/hive/pull/1738


   
   ### What changes were proposed in this pull request?
   See the details in  HIVE-24481
   
   
   ### Why are the changes needed?
   Fix the data corruption issue.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Unit test
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519781)
Remaining Estimate: 0h
Time Spent: 10m

> Skipped compaction can cause data corruption with streaming
> ---
>
> Key: HIVE-24481
> URL: https://issues.apache.org/jira/browse/HIVE-24481
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: Compaction
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1 and aborts
> 3. create a streaming connection with batch size 3 and withStaticPartitionValues 
> on the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close the connection; the count of the table is 2
> 8. run a manual minor compaction on the partition: it will skip compaction 
> because the delta count is 1, but it will clean, because aborted txn 1 exists
> 9. the cleaner will remove both aborted records from TXN_COMPONENTS
> 10. wait for the acid housekeeper to remove the empty aborted txns
> 11. select * from the table returns *3* records, reading the aborted record
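A toy model of that timeline (plain Java, not Hive's actual compactor/cleaner code; all names are illustrative) shows why dropping the abort metadata without rewriting the multi-transaction streaming delta exposes the aborted row:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: deltas hold rows tagged with their writer txn; readers filter out
// rows whose txn is known-aborted. The cleaner may delete a fully aborted delta,
// but a streaming delta shared by several txns can only be fixed by compaction.
public final class SkippedCompactionDemo {

    static final class Row {
        final long txnId;
        Row(long txnId) { this.txnId = txnId; }
    }

    /** Rows visible to a reader: everything not written by a known-aborted txn. */
    static long countVisible(List<List<Row>> deltas, List<Long> abortedTxns) {
        return deltas.stream().flatMap(List::stream)
                     .filter(r -> !abortedTxns.contains(r.txnId)).count();
    }

    public static void main(String[] args) {
        List<Long> abortedTxns = new ArrayList<>();   // stands in for TXN_COMPONENTS
        List<Row> delta1 = List.of(new Row(1));       // step 2: txn 1 writes and aborts
        abortedTxns.add(1L);
        // steps 4-6: the streaming batch writes ONE delta shared by txns 2, 3
        // and 4; txn 3 aborts but its row stays in the shared delta file.
        List<Row> delta24 = List.of(new Row(2), new Row(3), new Row(4));
        abortedTxns.add(3L);

        // step 7: the aborted rows are still filtered via the metadata.
        System.out.println(countVisible(List.of(delta1, delta24), abortedTxns)); // 2

        // steps 8-10: compaction is skipped, yet the cleaner deletes the fully
        // aborted delta_1 AND drops all abort metadata for the partition.
        List<List<Row>> afterClean = List.of(delta24);
        abortedTxns.clear();

        // step 11: txn 3's row is no longer filtered out -> data corruption.
        System.out.println(countVisible(afterClean, abortedTxns)); // 3
    }
}
```

The invariant the fix restores: abort metadata must outlive every delta that still physically contains the aborted rows, which is why the compactor should not "mark cleaned" while such a delta remains.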



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24481) Skipped compaction can cause data corruption with streaming

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24481:
--
Labels: Compaction pull-request-available  (was: Compaction)

> Skipped compaction can cause data corruption with streaming
> ---
>
> Key: HIVE-24481
> URL: https://issues.apache.org/jira/browse/HIVE-24481
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: Compaction, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1 and aborts
> 3. create a streaming connection with batch size 3 and withStaticPartitionValues 
> on the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close the connection; the count of the table is 2
> 8. run a manual minor compaction on the partition: it will skip compaction 
> because the delta count is 1, but it will clean, because aborted txn 1 exists
> 9. the cleaner will remove both aborted records from TXN_COMPONENTS
> 10. wait for the acid housekeeper to remove the empty aborted txns
> 11. select * from the table returns *3* records, reading the aborted record



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243363#comment-17243363
 ] 

László Bodor commented on HIVE-17709:
-

super-cool, let me pick it from there! That's urgent for the java11 LLAP runtime, 
and you can take care of the rest in HIVE-22415. Does that make sense to you?

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long term fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243358#comment-17243358
 ] 

David Mollitor commented on HIVE-17709:
---

https://github.com/apache/hive/pull/1624/commits/efbdf2ab17d6f1504cfadd2a02ac9b53673b83a6

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long term fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.1

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519778
 ]

ASF GitHub Bot logged work on HIVE-21737:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 17:02
Start Date: 03/Dec/20 17:02
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1635:
URL: https://github.com/apache/hive/pull/1635#issuecomment-738140417


   > Do you have an estimate of when a possible vote to get the previous 
   > changes (and hopefully this one) released as 2.3.8 might happen? 
   > Thinking about Spark could take it in.
   
   I think @wangyum is still testing the combination in 
https://github.com/apache/spark/pull/30517, and we are also doing testing 
internally. Once that is done, I'll prepare the release and start a vote.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519778)
Time Spent: 3.5h  (was: 3h 20m)

> Upgrade Avro to version 1.10.1
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API and without Guava as a dependency. Worth the update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243356#comment-17243356
 ] 

László Bodor commented on HIVE-17709:
-

thanks, let me check! There are a bunch of commits; which one contains the 
Cleaner-related stuff?

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long term fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243355#comment-17243355
 ] 

David Mollitor commented on HIVE-17709:
---

[~abstractdog] Take a look at my PR where I have already done this work:

 

https://github.com/apache/hive/pull/1624/files

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long term fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243347#comment-17243347
 ] 

László Bodor commented on HIVE-17709:
-

thanks [~belugabehr], glad to hear that!
Please let's do them independently. I'm actively working on JDK11 in-house 
(CLDR) and I don't want to be blocked anymore; I'm open to dirty solutions 
(copying a util class) at the moment, and we'll clean up later :) 

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long term fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243346#comment-17243346
 ] 

David Mollitor commented on HIVE-17709:
---

[~abstractdog] If I recall correctly, I already dealt with this issue in my PR. 
The issue at hand is that Hadoop 3.x does not itself support JDK 11 until 3.2.

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long term fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL

2020-12-03 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243343#comment-17243343
 ] 

Kishen Das edited comment on HIVE-24482 at 12/3/20, 4:51 PM:
-

[~dkuzmenko] That's the idea, once we implement all the subtasks. Rather than 
going directly to the DB, CachedStore is supposed to refresh the latest data 
from the DB before serving. [~ashish-kumar-sharma] is driving the CachedStore 
changes. 


was (Author: kishendas):
[~dkuzmenko] That's the idea, once we implement all the subtasks. 
[~ashish-kumar-sharma] is driving the CachedStore changes. 

> Advance write Id during AlterTableAddConstraint DDL
> ---
>
> Key: HIVE-24482
> URL: https://issues.apache.org/jira/browse/HIVE-24482
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableAddConstraint related DDL tasks, although we might be advancing 
> the write ID, looks like it's not updated correctly during the Analyzer 
> phase. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL

2020-12-03 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243343#comment-17243343
 ] 

Kishen Das commented on HIVE-24482:
---

[~dkuzmenko] That's the idea, once we implement all the subtasks. 
[~ashish-kumar-sharma] is driving the CachedStore changes. 

> Advance write Id during AlterTableAddConstraint DDL
> ---
>
> Key: HIVE-24482
> URL: https://issues.apache.org/jira/browse/HIVE-24482
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableAddConstraint related DDL tasks, although we might be advancing 
> the write ID, looks like it's not updated correctly during the Analyzer 
> phase. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243338#comment-17243338
 ] 

László Bodor commented on HIVE-17709:
-

I think we should not block the jdk11 effort on the hadoop 3.2 upgrade just 
because hadoop introduced a CleanerUtil class. Let's create a copy of that and 
use it, and then we'll switch back to hadoop's implementation once we have 
upgraded.
We have only 2 class references to Cleaner at the moment:
{code}
grep -iRH "import sun.misc.Cleaner"
llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java:import
 sun.misc.Cleaner;
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java:import
 sun.misc.Cleaner;
{code}
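For context, the JDK 9+ replacement for sun.misc.Cleaner is sun.misc.Unsafe.invokeCleaner, which is what Hadoop's CleanerUtil (HADOOP-12760) wraps. A minimal reflective sketch of the idea (illustrative only, not the Hive or Hadoop implementation; the class and method names here are assumptions for the example):

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Minimal sketch of freeing a direct ByteBuffer eagerly on JDK 9+ via
// sun.misc.Unsafe.invokeCleaner, the API that replaced sun.misc.Cleaner.
public final class BufferCleaner {

    /** Returns true if the buffer's native memory was released eagerly. */
    static boolean freeBuffer(ByteBuffer buffer) {
        if (!buffer.isDirect()) {
            return false; // heap buffers are reclaimed by the GC as usual
        }
        try {
            // Look up the singleton Unsafe instance reflectively so the code
            // also compiles on JDKs where sun.misc types are discouraged.
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, buffer); // frees the off-heap allocation now
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false; // e.g. JDK 8, where invokeCleaner does not exist
        }
    }

    public static void main(String[] args) {
        System.out.println(freeBuffer(ByteBuffer.allocateDirect(1 << 20)));
    }
}
```

A copied util class along these lines would let the two call sites above drop their `import sun.misc.Cleaner;` lines without waiting for the Hadoop 3.2 upgrade.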

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long term fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL

2020-12-03 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243335#comment-17243335
 ] 

Denys Kuzmenko commented on HIVE-24482:
---

Does it mean that AlterTableAddConstraint on HMS1 is going to invalidate the 
ValidWriteIdList in the CachedStore on HMS 2, and the following select statement 
from this table should go to the DB directly?

> Advance write Id during AlterTableAddConstraint DDL
> ---
>
> Key: HIVE-24482
> URL: https://issues.apache.org/jira/browse/HIVE-24482
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableAddConstraint related DDL tasks, although we might be advancing 
> the write ID, looks like it's not updated correctly during the Analyzer 
> phase. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL

2020-12-03 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243335#comment-17243335
 ] 

Denys Kuzmenko edited comment on HIVE-24482 at 12/3/20, 4:45 PM:
-

Does it mean that AlterTableAddConstraint on HMS1 is going to invalidate the 
ValidWriteIdList in the CachedStore on HMS2, and the following select statement 
from this table should go to the DB directly?


was (Author: dkuzmenko):
Does it mean that AlterTableAddConstraint on HMS1 is going to invalidate 
ValidWriteIdList in the CachedStore on HMS 2 and following select statement 
from this table should go db directly?

> Advance write Id during AlterTableAddConstraint DDL
> ---
>
> Key: HIVE-24482
> URL: https://issues.apache.org/jira/browse/HIVE-24482
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableAddConstraint related DDL tasks, although we might be advancing 
> the write ID, looks like it's not updated correctly during the Analyzer 
> phase. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243330#comment-17243330
 ] 

David Mollitor commented on HIVE-17709:
---

I am looking at Hadoop 3.2 upgrade in Hive right now actually.  Working on a PR.

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long term fix



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-17709) remove sun.misc.Cleaner references

2020-12-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-17709:
---

Assignee: László Bodor

> remove sun.misc.Cleaner references
> --
>
> Key: HIVE-17709
> URL: https://issues.apache.org/jira/browse/HIVE-17709
> Project: Hive
>  Issue Type: Sub-task
>  Components: Build Infrastructure
>Reporter: Zoltan Haindrich
>Assignee: László Bodor
>Priority: Major
>
> according to: 
> https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36
> HADOOP-12760 will be the long-term fix





[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=519762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519762
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 16:12
Start Date: 03/Dec/20 16:12
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r535372633



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       }
       fs.delete(dead, true);
     }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map dirSnapshots)
+      throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
+    // save it so that getAcidState() sees it
+    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+    ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+    Path locPath = new Path(location);
+    AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,
+        Ref.from(false), false, dirSnapshots);
+    return dir.getObsolete().size();

Review comment:
   if it's only for versions without HIVE-23107, should it be behind the 
feature flag/schema version check? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519762)
Time Spent: 7h 10m  (was: 7h)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say that for table_1, compaction1's cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in the "ready for cleaning" state at the same time. By this 
> time the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The cleaner deletes all 
> obsolete files. Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files, and compaction1 stays in the queue in the "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue, but if it isn't usable (for example, if 
> HMS schema changes are out of the question), then HIVE-24314 plus this change 
> will fix the issue of the cleaner not removing all obsolete files.
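The difference between marking cleaned on "any files deleted" (the HIVE-24314 rule) and on "no eventually-obsolete files remain" (this change) can be illustrated with a toy model. This is not the Cleaner's real code; the class name `CleanerQueueSketch`, the method `stranded`, and the entry names are illustrative assumptions.

```java
import java.util.*;

public class CleanerQueueSketch {
    // Returns the entries still stuck in "ready for cleaning" after one cleaner pass
    // over a partition whose obsolete files can all be removed by the first entry.
    // newRule=false: mark cleaned only if this entry deleted files (HIVE-24314).
    // newRule=true:  mark cleaned whenever no obsolete files remain (this change).
    static List<String> stranded(boolean newRule) {
        Set<String> obsolete = new HashSet<>(Set.of("delta_1", "delta_2"));
        List<String> left = new ArrayList<>();
        for (String entry : List.of("compaction2", "compaction1")) {
            boolean deletedAny = !obsolete.isEmpty();
            obsolete.clear();                        // the cleaner removes what it can
            boolean done = newRule ? obsolete.isEmpty() : deletedAny;
            if (!done) left.add(entry);
        }
        return left;
    }

    public static void main(String[] args) {
        System.out.println(stranded(false)); // [compaction1] stays queued forever
        System.out.println(stranded(true));  // [] both entries retire
    }
}
```

Under the old rule the second entry finds nothing to delete and is never marked cleaned; under the new rule the empty directory snapshot is enough to retire it.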





[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL

2020-12-03 Thread Kishen Das (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243303#comment-17243303
 ] 

Kishen Das commented on HIVE-24482:
---

[~dkuzmenko] Please go through -> 
[https://cwiki.apache.org/confluence/display/Hive/Synchronized+Metastore+Cache] 
. 

> Advance write Id during AlterTableAddConstraint DDL
> ---
>
> Key: HIVE-24482
> URL: https://issues.apache.org/jira/browse/HIVE-24482
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableAddConstraint-related DDL tasks, although we might be advancing 
> the write ID, it looks like it is not updated correctly during the Analyzer 
> phase. 





[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL

2020-12-03 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243302#comment-17243302
 ] 

Denys Kuzmenko commented on HIVE-24482:
---

[~kishendas], quick question: why should we advance the write ID in this case? We 
are not changing the data. 

> Advance write Id during AlterTableAddConstraint DDL
> ---
>
> Key: HIVE-24482
> URL: https://issues.apache.org/jira/browse/HIVE-24482
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableAddConstraint-related DDL tasks, although we might be advancing 
> the write ID, it looks like it is not updated correctly during the Analyzer 
> phase. 





[jira] [Updated] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24471:
--
Labels: pull-request-available  (was: )

> Add support for combiner in hash mode group aggregation 
> 
>
> Key: HIVE-24471
> URL: https://issues.apache.org/jira/browse/HIVE-24471
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In map-side group aggregation, a partial grouped aggregation is calculated to 
> reduce the data written to disk by the map task. In the case of hash 
> aggregation, where the input data is not sorted, a hash table is used. If the 
> hash table size grows beyond a configurable limit, the data is flushed to disk 
> and a new hash table is created. If the reduction achieved by the hash table 
> is less than the minimum hash aggregation reduction calculated at compile 
> time, map-side aggregation is converted to streaming mode. So if the first few 
> batches of records do not yield a significant reduction, the mode is switched 
> to streaming, which may hurt performance if subsequent batches of records have 
> fewer distinct values. To mitigate this situation, a combiner can be added to 
> the map task after the keys are sorted. This makes sure that the aggregation 
> is done where possible and reduces the data written to disk.
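The flush-versus-streaming decision and the sorted-key combiner described above can be sketched in plain Java. This is a hedged toy model, not Hive's GroupByOperator or Tez combiner: the names (`HashAggSketch`, `keepHashMode`, `combine`) and the flush threshold are illustrative assumptions.

```java
import java.util.*;

public class HashAggSketch {
    // Hash-aggregate counts until the table would flush, then compare the
    // observed reduction against the configured minimum; returning false
    // models the fallback to streaming mode.
    static boolean keepHashMode(List<String> keys, int flushSize, double minReduction) {
        Map<String, Long> counts = new HashMap<>();
        int seen = 0;
        for (String k : keys) {
            counts.merge(k, 1L, Long::sum);
            seen++;
            if (counts.size() >= flushSize) break; // table full: evaluate reduction
        }
        double reduction = 1.0 - (double) counts.size() / seen;
        return reduction >= minReduction;
    }

    // A combiner over already-sorted map-output keys: merge adjacent equal
    // keys, recovering the aggregation even if hash mode was abandoned.
    static List<Map.Entry<String, Long>> combine(List<String> sortedKeys) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String k : sortedKeys) {
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).getKey().equals(k)) {
                out.get(last).setValue(out.get(last).getValue() + 1);
            } else {
                out.add(new AbstractMap.SimpleEntry<>(k, 1L));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> skewed = Arrays.asList("a", "a", "a", "b", "a", "a");
        List<String> distinct = Arrays.asList("a", "b", "c", "d", "e", "f");
        System.out.println(keepHashMode(skewed, 4, 0.5));    // true
        System.out.println(keepHashMode(distinct, 4, 0.5));  // false: falls back to streaming
        System.out.println(combine(Arrays.asList("a", "a", "b"))); // [a=2, b=1]
    }
}
```

The point of the proposal is the second path: even when `keepHashMode` returns false on an early unlucky batch, a combiner applied after the sort still collapses duplicate keys before they hit the reducer.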





[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=519750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519750
 ]

ASF GitHub Bot logged work on HIVE-24471:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 15:57
Start Date: 03/Dec/20 15:57
Worklog Time Spent: 10m 
  Work Description: maheshk114 opened a new pull request #1736:
URL: https://github.com/apache/hive/pull/1736


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 519750)
Remaining Estimate: 0h
Time Spent: 10m

> Add support for combiner in hash mode group aggregation 
> 
>
> Key: HIVE-24471
> URL: https://issues.apache.org/jira/browse/HIVE-24471
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In map-side group aggregation, a partial grouped aggregation is calculated to 
> reduce the data written to disk by the map task. In the case of hash 
> aggregation, where the input data is not sorted, a hash table is used. If the 
> hash table size grows beyond a configurable limit, the data is flushed to disk 
> and a new hash table is created. If the reduction achieved by the hash table 
> is less than the minimum hash aggregation reduction calculated at compile 
> time, map-side aggregation is converted to streaming mode. So if the first few 
> batches of records do not yield a significant reduction, the mode is switched 
> to streaming, which may hurt performance if subsequent batches of records have 
> fewer distinct values. To mitigate this situation, a combiner can be added to 
> the map task after the keys are sorted. This makes sure that the aggregation 
> is done where possible and reduces the data written to disk.





[jira] [Updated] (HIVE-24481) Skipped compaction can cause data corruption with streaming

2020-12-03 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-24481:
--
Labels: Compaction  (was: )

> Skipped compaction can cause data corruption with streaming
> ---
>
> Key: HIVE-24481
> URL: https://issues.apache.org/jira/browse/HIVE-24481
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: Compaction
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1, and aborts
> 3. create a streaming connection with batch size 3, withStaticPartitionValues 
> pointing to the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close the connection; the count of the table is 2
> 8. run manual minor compaction on the partition: it will skip compaction 
> because the delta count is 1, but it will clean, because aborted txn1 exists
> 9. the cleaner will remove both aborted records from txn_components
> 10. wait for the AcidHouseKeeper service to remove the empty aborted txns
> 11. select * from the table returns *3* records, reading the aborted record
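Under the assumption that a reader hides rows written by txns it knows are aborted, the corruption in steps 8-11 can be modeled in a few lines of plain Java. This is a toy illustration, not Hive's ACID reader; `SkippedCompactionSketch` and `visibleRows` are hypothetical names.

```java
import java.util.*;

public class SkippedCompactionSketch {
    // A delta file is modeled by the id of the txn that wrote its row; a
    // reader filters out rows whose writer txn is known to be aborted.
    static int visibleRows(List<Long> deltaWriterTxns, Set<Long> abortedTxns) {
        int n = 0;
        for (long txn : deltaWriterTxns) {
            if (!abortedTxns.contains(txn)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Long> deltas = List.of(2L, 3L, 4L);   // steps 4-6: commit, abort, commit
        Set<Long> aborted = new HashSet<>(Set.of(3L));

        System.out.println(visibleRows(deltas, aborted)); // 2, matching step 7

        // Steps 8-10: compaction is skipped, so the delta files stay on disk,
        // but the cleaner and the housekeeper drop the record of the aborted
        // txn anyway.
        aborted.clear();

        System.out.println(visibleRows(deltas, aborted)); // 3: the aborted row resurfaces
    }
}
```

The bug is precisely that the metadata about the abort is deleted while the file it was filtering is not.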





[jira] [Updated] (HIVE-21737) Upgrade Avro to version 1.10.1

2020-12-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated HIVE-21737:

Summary: Upgrade Avro to version 1.10.1  (was: Upgrade Avro to version 
1.10.0)

> Upgrade Avro to version 1.10.1
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API or Guava as a dependency. Worth the update.





[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519740
 ]

ASF GitHub Bot logged work on HIVE-21737:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 15:52
Start Date: 03/Dec/20 15:52
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #1635:
URL: https://github.com/apache/hive/pull/1635#discussion_r535356472



##
File path: llap-tez/pom.xml
##
@@ -104,6 +104,11 @@
       <artifactId>hadoop-yarn-registry</artifactId>
       <optional>true</optional>
     </dependency>
+    <dependency>
+      <groupId>org.xerial.snappy</groupId>
+      <artifactId>snappy-java</artifactId>

Review comment:
   Snappy and zstd are now optional in Avro. You probably have these 
already defined somewhere else, but the tests in this module were complaining. I 
don't know if this could imply some runtime issue in other parts of the 
codebase that the tests may not catch, so it is worth thinking about whether 
that is the case.







Issue Time Tracking
---

Worklog Id: (was: 519740)
Time Spent: 3h 20m  (was: 3h 10m)

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API or Guava as a dependency. Worth the update.





[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519736
 ]

ASF GitHub Bot logged work on HIVE-21737:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 15:50
Start Date: 03/Dec/20 15:50
Worklog Time Spent: 10m 
  Work Description: iemejia commented on a change in pull request #1635:
URL: https://github.com/apache/hive/pull/1635#discussion_r535354656



##
File path: ql/pom.xml
##
@@ -220,7 +220,7 @@
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro-mapred</artifactId>
-      <classifier>hadoop2</classifier>
+      <version>${avro.version}</version>

Review comment:
   Updated thx







Issue Time Tracking
---

Worklog Id: (was: 519736)
Time Spent: 3h 10m  (was: 3h)

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API or Guava as a dependency. Worth the update.





[jira] [Assigned] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL

2020-12-03 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das reassigned HIVE-24482:
-

Assignee: Kishen Das

> Advance write Id during AlterTableAddConstraint DDL
> ---
>
> Key: HIVE-24482
> URL: https://issues.apache.org/jira/browse/HIVE-24482
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableAddConstraint-related DDL tasks, although we might be advancing 
> the write ID, it looks like it is not updated correctly during the Analyzer 
> phase. 





[jira] [Work started] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL

2020-12-03 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24482 started by Kishen Das.
-
> Advance write Id during AlterTableAddConstraint DDL
> ---
>
> Key: HIVE-24482
> URL: https://issues.apache.org/jira/browse/HIVE-24482
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> For AlterTableAddConstraint-related DDL tasks, although we might be advancing 
> the write ID, it looks like it is not updated correctly during the Analyzer 
> phase. 





[jira] [Assigned] (HIVE-24481) Skipped compaction can cause data corruption with streaming

2020-12-03 Thread Peter Varga (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Varga reassigned HIVE-24481:
--


> Skipped compaction can cause data corruption with streaming
> ---
>
> Key: HIVE-24481
> URL: https://issues.apache.org/jira/browse/HIVE-24481
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>
> Timeline:
> 1. create a partitioned table, add one static partition
> 2. transaction 1 writes delta_1, and aborts
> 3. create a streaming connection with batch size 3, withStaticPartitionValues 
> pointing to the existing partition
> 4. beginTransaction, write, commitTransaction
> 5. beginTransaction, write, abortTransaction
> 6. beginTransaction, write, commitTransaction
> 7. close the connection; the count of the table is 2
> 8. run manual minor compaction on the partition: it will skip compaction 
> because the delta count is 1, but it will clean, because aborted txn1 exists
> 9. the cleaner will remove both aborted records from txn_components
> 10. wait for the AcidHouseKeeper service to remove the empty aborted txns
> 11. select * from the table returns *3* records, reading the aborted record





[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519730
 ]

ASF GitHub Bot logged work on HIVE-21737:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 15:36
Start Date: 03/Dec/20 15:36
Worklog Time Spent: 10m 
  Work Description: wangyum commented on a change in pull request #1635:
URL: https://github.com/apache/hive/pull/1635#discussion_r535340015



##
File path: ql/pom.xml
##
@@ -220,7 +220,7 @@
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro-mapred</artifactId>
-      <classifier>hadoop2</classifier>
+      <version>${avro.version}</version>

Review comment:
   Do not need version here.







Issue Time Tracking
---

Worklog Id: (was: 519730)
Time Spent: 3h  (was: 2h 50m)

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API or Guava as a dependency. Worth the update.





[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519718
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 15:19
Start Date: 03/Dec/20 15:19
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1714:
URL: https://github.com/apache/hive/pull/1714#issuecomment-738072615


   > @kgyrtkirk The previous issue was not due to flakiness. The schema of the 
metastore changed between the time that pre-commit tests were run and the time 
that this PR was merged to master. To avoid a similar situation the PR should 
be merged ASAP after running the pre-commit with the tip of the master.
   
   okay; the last precommit run for this changeset was executed on 11.27, 
which was a few days ago. Let's wait for the 3 runs I've scheduled and merge it 
right after those.





Issue Time Tracking
---

Worklog Id: (was: 519718)
Time Spent: 7.5h  (was: 7h 20m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519717
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 15:18
Start Date: 03/Dec/20 15:18
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1714:
URL: https://github.com/apache/hive/pull/1714#discussion_r535322152



##
File path: 
standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql
##
@@ -0,0 +1,77 @@
+-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql
+SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0';

Review comment:
   I created HIVE-24480 for that purpose.







Issue Time Tracking
---

Worklog Id: (was: 519717)
Time Spent: 7h 20m  (was: 7h 10m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-12-03 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243250#comment-17243250
 ] 

David Mollitor commented on HIVE-21737:
---

Also, some of the work I've done:

 
 # AVRO-2335: Drop dependency on JODA Time
 # AVRO-2333: Drop commons-codec dependency
 # AVRO-2333: Drop commons-logging dependency
 # AVRO-2061: Better error messages
 # AVRO-2056: Better performance with Double types
 # AVRO-2696: Better performance for Doubles and Floats
 # AVRO-2801: Better performance when using Strings in Maps
 # Lots of other small improvements

 

In particular, AVRO-2335, AVRO-2333, and AVRO-2061 were based on my experience 
with Hive and Avro integration.

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x bring a lot of fixes including a leaner version of Avro without 
> Jackson in the public API and Guava as a dependency. Worth the update.





[jira] [Work logged] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24474?focusedWorklogId=519710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519710
 ]

ASF GitHub Bot logged work on HIVE-24474:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 15:06
Start Date: 03/Dec/20 15:06
Worklog Time Spent: 10m 
  Work Description: klcopp opened a new pull request #1735:
URL: https://github.com/apache/hive/pull/1735


   ### What changes were proposed in this pull request?
   ### Why are the changes needed?
   See HIVE-24474
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Manually, since the TxnAbortedException only appears in the logs.





Issue Time Tracking
---

Worklog Id: (was: 519710)
Remaining Estimate: 0h
Time Spent: 10m

> Failed compaction always logs TxnAbortedException (again)
> -
>
> Key: HIVE-24474
> URL: https://issues.apache.org/jira/browse/HIVE-24474
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Re-introduced with HIVE-24096.
> If there is an error during compaction, the compaction's txn is aborted, but 
> in the finally clause we try to commit it (commitTxnIfSet), so the Worker 
> throws a TxnAbortedException.
> We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is 
> aborted.
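The finally-clause interaction described above can be condensed into a toy model. This is a hedged sketch, not the actual Worker code: `runCompaction`, the txn id 42, and the returned op log are illustrative assumptions.

```java
import java.util.*;

public class CompactorTxnSketch {
    static final long TXN_ID_NOT_SET = -1;

    // Runs one compaction attempt and returns the txn operations issued.
    // resetOnAbort models the proposed fix: forget the txn id once it aborts.
    static List<String> runCompaction(boolean fail, boolean resetOnAbort) {
        List<String> ops = new ArrayList<>();
        long compactorTxnId = TXN_ID_NOT_SET;
        try {
            compactorTxnId = 42;                   // openTxn()
            if (fail) throw new RuntimeException("compaction error");
        } catch (RuntimeException e) {
            ops.add("abortTxn 42");
            if (resetOnAbort) compactorTxnId = TXN_ID_NOT_SET;
        } finally {
            // commitTxnIfSet: commits whenever the id still looks valid
            if (compactorTxnId != TXN_ID_NOT_SET) ops.add("commitTxn " + compactorTxnId);
        }
        return ops;
    }

    public static void main(String[] args) {
        // Without the reset, the finally clause commits an aborted txn,
        // which in the real Worker surfaces as a TxnAbortedException.
        System.out.println(runCompaction(true, false)); // [abortTxn 42, commitTxn 42]
        System.out.println(runCompaction(true, true));  // [abortTxn 42]
        System.out.println(runCompaction(false, true)); // [commitTxn 42]
    }
}
```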





[jira] [Updated] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)

2020-12-03 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-24474:
-
Summary: Failed compaction always logs TxnAbortedException (again)  (was: 
Failed compaction always throws TxnAbortedException (again))

> Failed compaction always logs TxnAbortedException (again)
> -
>
> Key: HIVE-24474
> URL: https://issues.apache.org/jira/browse/HIVE-24474
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
> Fix For: 4.0.0
>
>
> Re-introduced with HIVE-24096.
> If there is an error during compaction, the compaction's txn is aborted, but 
> in the finally clause we try to commit it (commitTxnIfSet), so the Worker 
> throws a TxnAbortedException.
> We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is 
> aborted.





[jira] [Updated] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24474:
--
Labels: pull-request-available  (was: )

> Failed compaction always logs TxnAbortedException (again)
> -
>
> Key: HIVE-24474
> URL: https://issues.apache.org/jira/browse/HIVE-24474
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Re-introduced with HIVE-24096.
> If there is an error during compaction, the compaction's txn is aborted, but 
> in the finally clause we try to commit it (commitTxnIfSet), so the Worker 
> throws a TxnAbortedException.
> We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is 
> aborted.





[jira] [Work logged] (HIVE-24479) Introduce setting to set lower bound of hash aggregation reduction.

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24479?focusedWorklogId=519709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519709
 ]

ASF GitHub Bot logged work on HIVE-24479:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 15:05
Start Date: 03/Dec/20 15:05
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1734:
URL: https://github.com/apache/hive/pull/1734


   ### What changes were proposed in this pull request?
   Introduce a new HiveConf setting that sets the lower bound of hash aggregation 
reduction. The default value is 0.5.
   During query compilation, hash aggregation reduction is adjusted by 
calculating its effectiveness.
   With this patch, if the adjusted reduction value is less than the configured 
lower bound, the lower bound value is used.
   
   ### Why are the changes needed?
   In some cases we end up with 0 values, forcing the GroupBy operator to skip 
hash aggregation and choose streaming mode.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519709)
Remaining Estimate: 0h
Time Spent: 10m

> Introduce setting to set lower bound of hash aggregation reduction.
> ---
>
> Key: HIVE-24479
> URL: https://issues.apache.org/jira/browse/HIVE-24479
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Default setting of hash group by min reduction % is 0.99. 
> * During compilation, we check its effectiveness and adjust it accordingly in 
> {{SetHashGroupByMinReduction}}:
> {code}
> float defaultMinReductionHashAggrFactor = desc.getMinReductionHashAggr();
> float minReductionHashAggrFactor = 1f - ((float) ndvProduct / numRows);
> if (minReductionHashAggrFactor < defaultMinReductionHashAggrFactor) {
>   desc.setMinReductionHashAggr(minReductionHashAggrFactor);
> }
> {code}
> For certain queries, this computation turns out to be "0".
> This forces the operator to skip hash aggregation completely, so it always 
> ends up choosing streaming mode.
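The proposed lower bound amounts to clamping the adjusted factor. A sketch under stated assumptions: the `1f - ndvProduct / numRows` formula is quoted from the snippet above, the 0.5 default comes from the linked PR description, and the class and method names are purely illustrative:

```java
public class MinReductionClamp {
    // Default for the proposed lower-bound setting (from the PR description).
    static final float LOWER_BOUND_DEFAULT = 0.5f;

    // Mirrors the adjustment quoted above from SetHashGroupByMinReduction,
    // but never lets the factor drop below the configured lower bound.
    static float adjustedMinReduction(long ndvProduct, long numRows,
                                      float configuredDefault, float lowerBound) {
        float computed = 1f - ((float) ndvProduct / numRows);
        float adjusted = Math.min(configuredDefault, computed);
        return Math.max(adjusted, lowerBound);
    }

    public static void main(String[] args) {
        // ndvProduct == numRows used to yield 0, which disabled hash aggregation;
        // with the clamp, the factor bottoms out at the lower bound instead.
        System.out.println(adjustedMinReduction(1000, 1000, 0.99f, LOWER_BOUND_DEFAULT)); // prints 0.5
    }
}
```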





[jira] [Updated] (HIVE-24479) Introduce setting to set lower bound of hash aggregation reduction.

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24479:
--
Labels: pull-request-available  (was: )

> Introduce setting to set lower bound of hash aggregation reduction.
> ---
>
> Key: HIVE-24479
> URL: https://issues.apache.org/jira/browse/HIVE-24479
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Default setting of hash group by min reduction % is 0.99. 
> * During compilation, we check its effectiveness and adjust it accordingly in 
> {{SetHashGroupByMinReduction}}:
> {code}
> float defaultMinReductionHashAggrFactor = desc.getMinReductionHashAggr();
> float minReductionHashAggrFactor = 1f - ((float) ndvProduct / numRows);
> if (minReductionHashAggrFactor < defaultMinReductionHashAggrFactor) {
>   desc.setMinReductionHashAggr(minReductionHashAggrFactor);
> }
> {code}
> For certain queries, this computation turns out to be "0".
> This forces the operator to skip hash aggregation completely, so it always 
> ends up choosing streaming mode.





[jira] [Work logged] (HIVE-24346) Store HPL/SQL packages into HMS

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24346?focusedWorklogId=519706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519706
 ]

ASF GitHub Bot logged work on HIVE-24346:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:56
Start Date: 03/Dec/20 14:56
Worklog Time Spent: 10m 
  Work Description: zeroflag opened a new pull request #1733:
URL: https://github.com/apache/hive/pull/1733


   HPLSQL procedures are already stored in HMS but packages weren't. This patch 
addresses that and makes HMS the storage backend for HPLSQL packages as well.
   
   The whole package code is stored in the RDBMS as text. When the client 
references a package, hplsql will look it up in HMS via the thrift API.
   
   PL/SQL allows us to define the package header and the implementation 
separately, or to change the body later; therefore there are 2 columns in the 
table, one for the header and one for the body.
   
   cc: @kgyrtkirk 
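A toy, in-memory stand-in for the two-column storage idea described above (all names here are hypothetical; the real schema lives in HMS and is accessed over the Thrift API, not shown here):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the HMS-backed package store: the header (spec) and body
// (implementation) are kept as separate text columns so each can be
// created or replaced independently.
public class PackageStoreSketch {
    static final class StoredPackage {
        String headerSource; // e.g. "CREATE PACKAGE p AS ..."
        String bodySource;   // e.g. "CREATE PACKAGE BODY p AS ..."
    }

    private final Map<String, StoredPackage> table = new HashMap<>();

    public void saveHeader(String name, String source) {
        table.computeIfAbsent(name, k -> new StoredPackage()).headerSource = source;
    }

    public void saveBody(String name, String source) {
        table.computeIfAbsent(name, k -> new StoredPackage()).bodySource = source;
    }

    // What the interpreter would fetch when a package is referenced;
    // returns null when the package is unknown.
    public StoredPackage lookup(String name) {
        return table.get(name);
    }
}
```

Keeping header and body in separate columns is what lets the package spec and its implementation be issued or replaced independently, as the PL/SQL semantics above require.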





Issue Time Tracking
---

Worklog Id: (was: 519706)
Remaining Estimate: 0h
Time Spent: 10m

> Store HPL/SQL packages into HMS
> ---
>
> Key: HIVE-24346
> URL: https://issues.apache.org/jira/browse/HIVE-24346
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-24346) Store HPL/SQL packages into HMS

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24346:
--
Labels: pull-request-available  (was: )

> Store HPL/SQL packages into HMS
> ---
>
> Key: HIVE-24346
> URL: https://issues.apache.org/jira/browse/HIVE-24346
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519705
 ]

ASF GitHub Bot logged work on HIVE-21737:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:54
Start Date: 03/Dec/20 14:54
Worklog Time Spent: 10m 
  Work Description: iemejia edited a comment on pull request #1635:
URL: https://github.com/apache/hive/pull/1635#issuecomment-738045721


   Do you have an estimate of when a possible vote to get the previous changes 
(and hopefully this one) released as 2.3.8 might happen?
   Thinking about whether Spark could take it in.





Issue Time Tracking
---

Worklog Id: (was: 519705)
Time Spent: 2h 50m  (was: 2h 40m)

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API or Guava as a dependency. Worth the update.





[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519704
 ]

ASF GitHub Bot logged work on HIVE-21737:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:54
Start Date: 03/Dec/20 14:54
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #1635:
URL: https://github.com/apache/hive/pull/1635#issuecomment-738045721


   Do you have an estimate of when a possible vote to get the previous changes 
(and hopefully this one) released as 2.3.8 might happen?





Issue Time Tracking
---

Worklog Id: (was: 519704)
Time Spent: 2h 40m  (was: 2.5h)

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API or Guava as a dependency. Worth the update.





[jira] [Assigned] (HIVE-24479) Introduce setting to set lower bound of hash aggregation reduction.

2020-12-03 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-24479:
-


> Introduce setting to set lower bound of hash aggregation reduction.
> ---
>
> Key: HIVE-24479
> URL: https://issues.apache.org/jira/browse/HIVE-24479
> Project: Hive
>  Issue Type: Improvement
>  Components: Physical Optimizer
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
>
> * Default setting of hash group by min reduction % is 0.99. 
> * During compilation, we check its effectiveness and adjust it accordingly in 
> {{SetHashGroupByMinReduction}}:
> {code}
> float defaultMinReductionHashAggrFactor = desc.getMinReductionHashAggr();
> float minReductionHashAggrFactor = 1f - ((float) ndvProduct / numRows);
> if (minReductionHashAggrFactor < defaultMinReductionHashAggrFactor) {
>   desc.setMinReductionHashAggr(minReductionHashAggrFactor);
> }
> {code}
> For certain queries, this computation turns out to be "0".
> This forces the operator to skip hash aggregation completely, so it always 
> ends up choosing streaming mode.





[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519702
 ]

ASF GitHub Bot logged work on HIVE-21737:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:52
Start Date: 03/Dec/20 14:52
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #1635:
URL: https://github.com/apache/hive/pull/1635#issuecomment-738044346


   PR updated to the latest version of Avro; let's see if this gets us more tests 
passing now. :crossed_fingers: @sunchao @wangyum





Issue Time Tracking
---

Worklog Id: (was: 519702)
Time Spent: 2.5h  (was: 2h 20m)

> Upgrade Avro to version 1.10.0
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without 
> Jackson in the public API or Guava as a dependency. Worth the update.





[jira] [Work logged] (HIVE-24478) Inner GroupBy with Distinct SemanticException: Invalid column reference

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24478?focusedWorklogId=519699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519699
 ]

ASF GitHub Bot logged work on HIVE-24478:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:48
Start Date: 03/Dec/20 14:48
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1732:
URL: https://github.com/apache/hive/pull/1732


   Change-Id: I29000afd1c47e59d07db74a212a7629e2b5afe73
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 519699)
Remaining Estimate: 0h
Time Spent: 10m

> Inner GroupBy with Distinct SemanticException: Invalid column reference
> ---
>
> Key: HIVE-24478
> URL: https://issues.apache.org/jira/browse/HIVE-24478
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> CREATE TABLE tmp_src1(
>   `npp` string,
>   `nsoc` string) stored as orc;
> INSERT INTO tmp_src1 (npp,nsoc) VALUES ('1-1000CG61', '7273111');
> SELECT `min_nsoc`
> FROM
>  (SELECT `npp`,
>  MIN(`nsoc`) AS `min_nsoc`,
>  COUNT(DISTINCT `nsoc`) AS `nb_nsoc`
>   FROM tmp_src1
>   GROUP BY `npp`) `a`
> WHERE `nb_nsoc` > 0;
> {code}
> Issue:
> {code:java}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column 
> reference 'nsoc' at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:5405)
> {code}
> The query runs fine when we include `nb_nsoc` in the SELECT expression





[jira] [Updated] (HIVE-24478) Inner GroupBy with Distinct SemanticException: Invalid column reference

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24478:
--
Labels: pull-request-available  (was: )

> Inner GroupBy with Distinct SemanticException: Invalid column reference
> ---
>
> Key: HIVE-24478
> URL: https://issues.apache.org/jira/browse/HIVE-24478
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> CREATE TABLE tmp_src1(
>   `npp` string,
>   `nsoc` string) stored as orc;
> INSERT INTO tmp_src1 (npp,nsoc) VALUES ('1-1000CG61', '7273111');
> SELECT `min_nsoc`
> FROM
>  (SELECT `npp`,
>  MIN(`nsoc`) AS `min_nsoc`,
>  COUNT(DISTINCT `nsoc`) AS `nb_nsoc`
>   FROM tmp_src1
>   GROUP BY `npp`) `a`
> WHERE `nb_nsoc` > 0;
> {code}
> Issue:
> {code:java}
> org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column 
> reference 'nsoc' at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:5405)
> {code}
> The query runs fine when we include `nb_nsoc` in the SELECT expression





[jira] [Work logged] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24230?focusedWorklogId=519698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519698
 ]

ASF GitHub Bot logged work on HIVE-24230:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:44
Start Date: 03/Dec/20 14:44
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1633:
URL: https://github.com/apache/hive/pull/1633


   





Issue Time Tracking
---

Worklog Id: (was: 519698)
Time Spent: 3h  (was: 2h 50m)

> Integrate HPL/SQL into HiveServer2
> --
>
> Key: HIVE-24230
> URL: https://issues.apache.org/jira/browse/HIVE-24230
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> HPL/SQL is a standalone command line program that can store and load scripts 
> from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
> depends on Hive and not the other way around.
> Changing the dependency order between HPL/SQL and HiveServer would open up 
> some possibilities which are currently not feasible to implement. For example, 
> one might want to use a third-party SQL tool to run selects on stored 
> procedure (or rather function, in this case) outputs.
> {code:java}
> SELECT * from myStoredProcedure(1, 2); {code}
> HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
> work with the current architecture.
> Another important factor is performance. Declarative SQL commands are sent to 
> Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
> and use HiveServer’s internal API for compilation and execution.
> The third factor is that existing tools like Beeline or Hue cannot be used 
> with HPL/SQL since it has its own, separate CLI.
>  
> To make it easier to implement, we keep things separated internally at 
> first, by introducing a Hive session-level JDBC parameter.
> {code:java}
> jdbc:hive2://localhost:1/default;hplsqlMode=true {code}
>  
> The hplsqlMode indicates that we are in procedural SQL mode where the user 
> can create and call stored procedures. HPLSQL allows you to write any kind of 
> procedural statement at the top level. This patch doesn't limit this but it 
> might be better to eventually restrict what statements are allowed outside of 
> stored procedures.
>  
> Since HPLSQL and Hive are running in the same process there is no need to use 
> the JDBC driver between them. The patch adds an abstraction with 2 different 
> implementations, one for executing queries on JDBC (for keeping the existing 
> behaviour) and another one for directly calling Hive's compiler. In HPLSQL 
> mode the latter is used.
> Internally, a new operation (HplSqlOperation) and operation type 
> (PROCEDURAL_SQL) were added, which work similarly to SQLOperation but 
> use the hplsql interpreter to execute arbitrary scripts. This operation 
> may spawn new SQLOperations.
> For example consider the following statement:
> {code:java}
> FOR i in 1..10 LOOP   
>   SELECT * FROM table 
> END LOOP;{code}
> We send this to Beeline while we're in hplsql mode. Hive will create a hplsql 
> interpreter and store it in the session state. A new HplSqlOperation is 
> created to run the script on the interpreter.
> HPLSQL knows how to execute the for loop, but it calls Hive to run the 
> select expression. The HplSqlOperation is notified when the select reads a 
> row and accumulates the rows into a RowSet (memory consumption needs to be 
> considered here), which can be retrieved via thrift from the client side.
>  
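The row hand-off described above (HPL/SQL runs the loop, Hive runs each SELECT, and the operation buffers the fetched rows) can be sketched roughly like this; aside from the HplSqlOperation/RowSet terminology, every name here is invented for illustration and is not Hive's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of HplSqlOperation's row accumulation.
public class HplSqlOperationSketch {
    private final List<Object[]> rowSet = new ArrayList<>();

    // Called back for each row a nested SELECT produces.
    public void onRow(Object[] row) {
        // Memory caveat from the description: rows are buffered in memory,
        // so very large results would need spilling or pagination.
        rowSet.add(row);
    }

    // What the thrift client would fetch.
    public List<Object[]> fetch() {
        return rowSet;
    }

    public static void main(String[] args) {
        HplSqlOperationSketch op = new HplSqlOperationSketch();
        // Stand-in for: FOR i IN 1..10 LOOP SELECT ... END LOOP;
        for (int i = 1; i <= 10; i++) {
            op.onRow(new Object[] { i });
        }
        System.out.println(op.fetch().size()); // prints 10
    }
}
```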





[jira] [Resolved] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-12-03 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24230.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

merged into master. Thank you [~amagyar]!

> Integrate HPL/SQL into HiveServer2
> --
>
> Key: HIVE-24230
> URL: https://issues.apache.org/jira/browse/HIVE-24230
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> HPL/SQL is a standalone command line program that can store and load scripts 
> from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
> depends on Hive and not the other way around.
> Changing the dependency order between HPL/SQL and HiveServer would open up 
> some possibilities which are currently not feasible to implement. For example, 
> one might want to use a third-party SQL tool to run selects on stored 
> procedure (or rather function, in this case) outputs.
> {code:java}
> SELECT * from myStoredProcedure(1, 2); {code}
> HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
> work with the current architecture.
> Another important factor is performance. Declarative SQL commands are sent to 
> Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
> and use HiveServer’s internal API for compilation and execution.
> The third factor is that existing tools like Beeline or Hue cannot be used 
> with HPL/SQL since it has its own, separate CLI.
>  
> To make it easier to implement, we keep things separated internally at 
> first, by introducing a Hive session-level JDBC parameter.
> {code:java}
> jdbc:hive2://localhost:1/default;hplsqlMode=true {code}
>  
> The hplsqlMode indicates that we are in procedural SQL mode where the user 
> can create and call stored procedures. HPLSQL allows you to write any kind of 
> procedural statement at the top level. This patch doesn't limit this but it 
> might be better to eventually restrict what statements are allowed outside of 
> stored procedures.
>  
> Since HPLSQL and Hive are running in the same process there is no need to use 
> the JDBC driver between them. The patch adds an abstraction with 2 different 
> implementations, one for executing queries on JDBC (for keeping the existing 
> behaviour) and another one for directly calling Hive's compiler. In HPLSQL 
> mode the latter is used.
> Internally, a new operation (HplSqlOperation) and operation type 
> (PROCEDURAL_SQL) were added, which work similarly to SQLOperation but 
> use the hplsql interpreter to execute arbitrary scripts. This operation 
> may spawn new SQLOperations.
> For example consider the following statement:
> {code:java}
> FOR i in 1..10 LOOP   
>   SELECT * FROM table 
> END LOOP;{code}
> We send this to Beeline while we're in hplsql mode. Hive will create a hplsql 
> interpreter and store it in the session state. A new HplSqlOperation is 
> created to run the script on the interpreter.
> HPLSQL knows how to execute the for loop, but it calls Hive to run the 
> select expression. The HplSqlOperation is notified when the select reads a 
> row and accumulates the rows into a RowSet (memory consumption needs to be 
> considered here), which can be retrieved via thrift from the client side.
>  





[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519687
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:37
Start Date: 03/Dec/20 14:37
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1714:
URL: https://github.com/apache/hive/pull/1714#discussion_r535281104



##
File path: 
standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql
##
@@ -0,0 +1,77 @@
+-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql
+SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0';

Review comment:
   You can go with this, but I think we will need the follow-up soon; these two 
upgrade paths will cause some confusion :)







Issue Time Tracking
---

Worklog Id: (was: 519687)
Time Spent: 7h 10m  (was: 7h)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519682
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:33
Start Date: 03/Dec/20 14:33
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1714:
URL: https://github.com/apache/hive/pull/1714#discussion_r535276481



##
File path: 
standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql
##
@@ -0,0 +1,77 @@
+-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql
+SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0';

Review comment:
   Sure, we can try to do this, but given that it might take some time I would 
rather leave it as a follow-up. WDYT?







Issue Time Tracking
---

Worklog Id: (was: 519682)
Time Spent: 7h  (was: 6h 50m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 





[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519678
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:25
Start Date: 03/Dec/20 14:25
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1714:
URL: https://github.com/apache/hive/pull/1714#discussion_r535268741



##
File path: 
standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql
##
@@ -0,0 +1,77 @@
+-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql
+SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0';

Review comment:
   How big are the differences between 3.2.0 and 3.1.3000? Is it not 
possible to manually apply changes to bring the schema to 3.2.0 in the image and 
then use the 3.2.0 -> 4.0.0 upgrade path?







Issue Time Tracking
---

Worklog Id: (was: 519678)
Time Spent: 6h 50m  (was: 6h 40m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24460) Refactor Get Next Event ID for DbNotificationListener

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24460?focusedWorklogId=519672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519672
 ]

ASF GitHub Bot logged work on HIVE-24460:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:22
Start Date: 03/Dec/20 14:22
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1725:
URL: https://github.com/apache/hive/pull/1725#issuecomment-738024630


   @pvary @nrg4878 Can you please take a look at this one too?  I am doing 
quite a bit of work within this class.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519672)
Time Spent: 40m  (was: 0.5h)

> Refactor Get Next Event ID for DbNotificationListener
> -
>
> Key: HIVE-24460
> URL: https://issues.apache.org/jira/browse/HIVE-24460
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Refactor event ID generation to match notification log ID generation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24460) Refactor Get Next Event ID for DbNotificationListener

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24460?focusedWorklogId=519674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519674
 ]

ASF GitHub Bot logged work on HIVE-24460:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:22
Start Date: 03/Dec/20 14:22
Worklog Time Spent: 10m 
  Work Description: belugabehr edited a comment on pull request #1725:
URL: https://github.com/apache/hive/pull/1725#issuecomment-738024630


   @pvary @nrg4878 Can you please take a look at this one too?  I am doing 
quite a bit of work within this class in this, and other, PRs.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519674)
Time Spent: 50m  (was: 40m)

> Refactor Get Next Event ID for DbNotificationListener
> -
>
> Key: HIVE-24460
> URL: https://issues.apache.org/jira/browse/HIVE-24460
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Refactor event ID generation to match notification log ID generation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=519673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519673
 ]

ASF GitHub Bot logged work on HIVE-24432:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:22
Start Date: 03/Dec/20 14:22
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1710:
URL: https://github.com/apache/hive/pull/1710


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519673)
Time Spent: 1h 10m  (was: 1h)

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Notification events are loaded in batches (reducing memory pressure on the 
> HMS), but all of the deletes happen in a single transaction, which, when 
> deleting many records, can put a lot of pressure on the backend database.
> Instead, delete events in batches (in different transactions) as well.
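
The batching idea quoted above can be sketched roughly as follows (a toy model, not Hive's actual DbNotificationListener/ObjectStore code; `toBatches` and the batch size are illustrative names): the expired event ids are split into fixed-size chunks, and each chunk would map to one DELETE statement committed in its own transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batched-delete idea (not Hive's actual cleanup code):
// instead of removing every expired event in one transaction, the ids are
// split into fixed-size batches and each batch is deleted (and committed)
// separately.
public class BatchedCleanup {
    static final int BATCH_SIZE = 1000;

    /** Split expired ids into batches; each sublist maps to one DELETE + commit. */
    static List<List<Long>> toBatches(List<Long> expiredIds, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int from = 0; from < expiredIds.size(); from += batchSize) {
            int to = Math.min(from + batchSize, expiredIds.size());
            batches.add(new ArrayList<>(expiredIds.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 2500; i++) ids.add(i);
        // 2500 expired events with a batch size of 1000 -> 3 separate transactions
        System.out.println(toBatches(ids, BATCH_SIZE).size());
    }
}
```

Each short transaction holds locks on far fewer rows than one bulk DELETE, which is the pressure-relief the issue description is after.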



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519669
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:20
Start Date: 03/Dec/20 14:20
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1714:
URL: https://github.com/apache/hive/pull/1714#discussion_r535264287



##
File path: 
standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql
##
@@ -0,0 +1,77 @@
+-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql
+SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0';

Review comment:
   Yes, that's the idea for now; I couldn't think of a better alternative 
at the moment.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519669)
Time Spent: 6h 40m  (was: 6.5h)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=519668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519668
 ]

ASF GitHub Bot logged work on HIVE-24432:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:19
Start Date: 03/Dec/20 14:19
Worklog Time Spent: 10m 
  Work Description: belugabehr closed pull request #1710:
URL: https://github.com/apache/hive/pull/1710


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519668)
Time Spent: 1h  (was: 50m)

> Delete Notification Events in Batches
> -
>
> Key: HIVE-24432
> URL: https://issues.apache.org/jira/browse/HIVE-24432
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Notification events are loaded in batches (reducing memory pressure on the 
> HMS), but all of the deletes happen in a single transaction, which, when 
> deleting many records, can put a lot of pressure on the backend database.
> Instead, delete events in batches (in different transactions) as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519667
 ]

ASF GitHub Bot logged work on HIVE-24468:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:18
Start Date: 03/Dec/20 14:18
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #1728:
URL: https://github.com/apache/hive/pull/1728#issuecomment-738021939


   @pvary Thanks for the review.
   
   So, the answer is yes.  The timestamps could be out of order.
   
   Suppose two instances of HMS are running at the same time, and they create 
events at times T and T+1.
   
   The HMS that generates the event at time T could experience a long GC pause 
before submitting it to the DB.  At that point, the event at T+1 would be 
submitted to the table first and receive a lower ID.
   
   However, there does not seem to be any documentation around this constraint.
   
   1. Are there docs somewhere that state that the event times will always be 
increasing from one record to the next?
   2. Isn't it a bit confusing that events are assigned an arbitrary time that 
masks the true event time (debugging, audit issues)?
   3. The timestamps are generated using each HMS's "now" time, which may not 
be adequately synced across HMS instances, putting in-order timestamps in 
jeopardy.  If in-order timestamps are a requirement, they should be generated 
using the `now()` of the SQL server itself as a single source of "now" truth.
   
   Thanks!
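
The GC scenario described in this comment can be reduced to a tiny illustration (hypothetical values, not Hive code): the event ID follows insertion order while the event time follows creation order, so a stalled writer makes the two orders disagree.

```java
// Toy illustration of the ordering hazard (hypothetical values, not Hive
// code): the event id follows insertion order, the event time follows
// creation order, and a stalled writer makes the two orders disagree.
public class EventOrdering {

    /** True when the id order and the event-time order of two rows agree. */
    static boolean sameOrder(long[] ids, long[] times) {
        return (ids[0] < ids[1]) == (times[0] < times[1]);
    }

    public static void main(String[] args) {
        // HMS A creates an event at t=100 but stalls (long GC) before inserting;
        // HMS B creates its event at t=101 and inserts first, taking the lower id.
        long[] idsInInsertionOrder   = {1, 2};     // B, then A
        long[] timesInInsertionOrder = {101, 100}; // B's later time has the lower id
        System.out.println(sameOrder(idsInInsertionOrder, timesInInsertionOrder)); // false
    }
}
```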
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519667)
Time Spent: 50m  (was: 40m)

> Use Event Time instead of Current Time in Notification Log DB Entry
> ---
>
> Key: HIVE-24468
> URL: https://issues.apache.org/jira/browse/HIVE-24468
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519665
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:12
Start Date: 03/Dec/20 14:12
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1714:
URL: https://github.com/apache/hive/pull/1714#discussion_r535258754



##
File path: 
standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql
##
@@ -0,0 +1,77 @@
+-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql
+SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0';

Review comment:
   @zabetak qq: will this file be maintained from now on in parallel with 
upgrade-3.2.0-to-4.0.0.postgres.sql? Is this needed for upstream?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519665)
Time Spent: 6.5h  (was: 6h 20m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24475) Generalize fixacidkeyindex utility

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24475?focusedWorklogId=519658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519658
 ]

ASF GitHub Bot logged work on HIVE-24475:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:03
Start Date: 03/Dec/20 14:03
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on a change in pull request #1730:
URL: https://github.com/apache/hive/pull/1730#discussion_r535251574



##
File path: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFixAcidKeyIndex.java
##
@@ -243,6 +233,12 @@ public void testInvalidKeyIndex() throws Exception {
 checkInvalidKeyIndex(testFilePath);
 // Try fixing, this should result in new fixed file.
 fixInvalidIndex(testFilePath);
+
+// Multiple stripes
+createTestAcidFile(testFilePath, 12000, new FaultyKeyIndexBuilder());
+checkInvalidKeyIndex(testFilePath);
+// Try fixing, this should result in new fixed file.
+fixInvalidIndex(testFilePath);

Review comment:
   Ah ok, missed that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519658)
Time Spent: 40m  (was: 0.5h)

> Generalize fixacidkeyindex utility
> --
>
> Key: HIVE-24475
> URL: https://issues.apache.org/jira/browse/HIVE-24475
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Transactions
>Affects Versions: 3.0.0
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There is a utility in hive which can validate/fix corrupted 
> hive.acid.key.index.
> hive --service fixacidkeyindex
> Unfortunately it is only tailored for a specific problem 
> (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally 
> validating and recovering the hive.acid.key.index from the stripe data itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519657
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 14:03
Start Date: 03/Dec/20 14:03
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #1714:
URL: https://github.com/apache/hive/pull/1714#issuecomment-738013284


   > How do we know that the previous issue doesnt happen again?
   > I'll run the check on the PR a few more times...just in case
   
   @kgyrtkirk The previous issue was not due to flakiness. The schema of the 
metastore changed between the time that the pre-commit tests were run and the 
time that this PR was merged to master. To avoid a similar situation, the PR 
should be merged ASAP after running the pre-commit against the tip of master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519657)
Time Spent: 6h 20m  (was: 6h 10m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519649
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 13:42
Start Date: 03/Dec/20 13:42
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1714:
URL: https://github.com/apache/hive/pull/1714#issuecomment-738001343


   How do we know that the previous issue doesnt happen again?
   I'll run the check on the PR a few more times...just in case



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519649)
Time Spent: 6h 10m  (was: 6h)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely on more or less on the default 
> configuration (hive-site.xml). In real-life scenarios though some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519644
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 13:39
Start Date: 03/Dec/20 13:39
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1714:
URL: https://github.com/apache/hive/pull/1714#issuecomment-737999631


   cool



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519644)
Time Spent: 6h  (was: 5h 50m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: master355.tgz
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS

2020-12-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=519643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519643
 ]

ASF GitHub Bot logged work on HIVE-24444:
-

Author: ASF GitHub Bot
Created on: 03/Dec/20 13:37
Start Date: 03/Dec/20 13:37
Worklog Time Spent: 10m 
  Work Description: klcopp commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r535232912



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
##
@@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
   }
   fs.delete(dead, true);
 }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean – eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots)
+      throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
+    //save it so that getAcidState() sees it
+    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+    ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+    Path locPath = new Path(location);
+    AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,
+        Ref.from(false), false, dirSnapshots);
+    return dir.getObsolete().size();

Review comment:
   No, this isn't necessary upstream; this change is for versions without 
HIVE-23107 etc. But I don't want to hurt upstream functionality with it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 519643)
Time Spent: 7h  (was: 6h 50m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete 
> files in the FS
> ---
>
> Key: HIVE-24444
> URL: https://issues.apache.org/jira/browse/HIVE-24444
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only 
> if +any+ files are deleted by the cleaner. This could cause a problem in the 
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and 
> compaction is run again on the same table (compaction2). Both compaction1 and 
> compaction2 could be in "ready for cleaning" at the same time. By this time 
> the blocking open txn could be committed. When the cleaner runs, one of 
> compaction1 and compaction2 will remain in the "ready for cleaning" state:
> Say compaction2 is picked up by the cleaner first. The Cleaner deletes all 
> obsolete files.  Then compaction1 is picked up by the cleaner; the cleaner 
> doesn't remove any files and compaction1 will stay in the queue in a "ready 
> for cleaning" state.
> HIVE-24291 already solves this issue, but if it isn't usable (for example, if 
> HMS schema changes are out of the question) then HIVE-24314 + this change will 
> fix the issue of the Cleaner not removing all obsolete files.
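
A rough sketch of the difference between the two cleaner rules described above (a toy model with invented names, not Hive's actual Cleaner class): the HIVE-24314 rule marks a compaction cleaned only when its pass deleted files, while this change marks it cleaned once no eventually-obsolete directories remain.

```java
// Toy model of the two cleaner rules (invented names, not Hive's Cleaner):
// compaction2 is cleaned first and removes every obsolete file, then
// compaction1 is processed against the same table/partition directory.
public class CleanerModel {

    /** HIVE-24314 rule: mark cleaned only if this pass deleted something. */
    static boolean oldRule(int deletedThisPass) {
        return deletedThisPass > 0;
    }

    /** This change: mark cleaned once no eventually-obsolete dirs remain. */
    static boolean fixedRule(int remainingObsoleteDirs) {
        return remainingObsoleteDirs == 0;
    }

    public static void main(String[] args) {
        int obsolete = 5;              // obsolete files in the shared directory
        int deletedByCompaction2 = obsolete;
        obsolete = 0;                  // compaction2's pass removed everything
        int deletedByCompaction1 = 0;  // nothing left for compaction1's pass

        System.out.println(oldRule(deletedByCompaction1));  // false: stays queued
        System.out.println(fixedRule(obsolete));            // true: marked cleaned
    }
}
```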



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

