[jira] [Updated] (HIVE-24473) Update HBase version to 2.1.10
[ https://issues.apache.org/jira/browse/HIVE-24473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Istvan Toth updated HIVE-24473:
-------------------------------
Attachment: (was: HIVE-24473.patch)

> Update HBase version to 2.1.10
> ------------------------------
>
>                 Key: HIVE-24473
>                 URL: https://issues.apache.org/jira/browse/HIVE-24473
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 4.0.0
>            Reporter: Istvan Toth
>            Assignee: Istvan Toth
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive currently builds against a 2.0.0 pre-release, so HBase should be updated to a more recent version.
> We cannot use anything later than 2.2.4 because of HBASE-22394, which leaves 2.1.10 and 2.2.4 as the options.
> I suggest 2.1.10 because it is the chronologically later release, and it maximises compatibility with HBase server deployments.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
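In a Maven build, a bump like this usually comes down to a one-line change to a version property. A minimal sketch of the change, assuming the build exposes an `hbase.version` property (the property name is an assumption for illustration, not verified against Hive's actual pom.xml):

```xml
<!-- Sketch only: the hbase.version property name is assumed. -->
<properties>
  <!-- Pinned to 2.1.10; anything newer than 2.2.4 is ruled out by HBASE-22394. -->
  <hbase.version>2.1.10</hbase.version>
</properties>
```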
[jira] [Updated] (HIVE-24473) Update HBase version to 2.1.10
[ https://issues.apache.org/jira/browse/HIVE-24473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Toth updated HIVE-24473: --- Attachment: (was: HIVE-24473.02.patch) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24433) AutoCompaction is not getting triggered for CamelCase Partition Values
[ https://issues.apache.org/jira/browse/HIVE-24433?focusedWorklogId=520054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-520054 ]
ASF GitHub Bot logged work on HIVE-24433:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Dec/20 07:49
Start Date: 04/Dec/20 07:49
Worklog Time Spent: 10m
Work Description: nareshpr commented on a change in pull request #1712:
URL: https://github.com/apache/hive/pull/1712#discussion_r535899436

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
@@ -2725,7 +2725,7 @@ private void insertTxnComponents(long txnid, LockRequest rqst, Connection dbConn
     }
     String dbName = normalizeCase(lc.getDbname());
     String tblName = normalizeCase(lc.getTablename());
-    String partName = normalizeCase(lc.getPartitionname());
+    String partName = lc.getPartitionname();

Review comment: I changed it to split the partition name and convert only the partition key to lower case.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 520054)
Time Spent: 2h (was: 1h 50m)

> AutoCompaction is not getting triggered for CamelCase Partition Values
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24433
>                 URL: https://issues.apache.org/jira/browse/HIVE-24433
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Assignee: Naresh P R
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The partition key value is converted to lower case in the two places below:
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2728]
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L2851]
> Because of this, the TXN_COMPONENTS and HIVE_LOCKS tables do not hold entries with the proper partition values.
> When the query completes, the entry moves from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. Hive AutoCompaction then does not recognize the partition and considers it an invalid partition:
> {code:java}
> create table abc(name string) partitioned by(city string) stored as orc tblproperties('transactional'='true');
> insert into abc partition(city='Bangalore') values('aaa');
> {code}
> Example entry in COMPLETED_TXN_COMPONENTS:
> {noformat}
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | CTC_TXNID | CTC_DATABASE | CTC_TABLE | CTC_PARTITION  | CTC_TIMESTAMP       | CTC_WRITEID | CTC_UPDATE_DELETE |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> | 2         | default      | abc       | city=bangalore | 2020-11-25 09:26:59 | 1           | N                 |
> +-----------+--------------+-----------+----------------+---------------------+-------------+-------------------+
> {noformat}
> AutoCompaction fails to get triggered, with the error below:
> {code:java}
> 2020-11-25T09:35:10,364 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(98)) - Checking to see if we should compact default.abc.city=bangalore
> 2020-11-25T09:35:10,380 INFO [Thread-9]: compactor.Initiator (Initiator.java:run(155)) - Can't find partition default.compaction_test.city=bangalore, assuming it has been dropped and moving on
> {code}
> I verified the four SQL statements below with my PR; all of them produced the correct partition key value, i.e. COMPLETED_TXN_COMPONENTS.CTC_PARTITION="city=Bangalore":
> {code:java}
> insert into table abc PARTITION(CitY='Bangalore') values('Dan');
> insert overwrite table abc partition(CiTy='Bangalore') select Name from abc;
> update table abc set Name='xy' where CiTy='Bangalore';
> delete from abc where CiTy='Bangalore';{code}
--
This message was sent by Atlassian Jira (v8.3.4#803005)
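The approach described in the review comment above (lower-casing only the partition *key*, never the value) can be sketched as follows. This is a minimal illustration, not Hive's actual TxnHandler code; the class and method names here are made up for the example:

```java
// Sketch of key-only normalization for a Hive-style partition name such as
// "CiTy=Bangalore/Zone=EaSt". Only the keys are lower-cased; the values keep
// their original case, so AutoCompaction can find the real partition.
public class PartitionNames {

    public static String normalizeKeysOnly(String partName) {
        if (partName == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        // Hive encodes multi-key partitions as key1=val1/key2=val2.
        for (String kv : partName.split("/")) {
            if (out.length() > 0) {
                out.append('/');
            }
            int eq = kv.indexOf('=');
            if (eq < 0) {
                out.append(kv); // no key=value structure; leave untouched
            } else {
                out.append(kv.substring(0, eq).toLowerCase())  // key: lower-cased
                   .append(kv.substring(eq));                  // "=value": case preserved
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizeKeysOnly("CitY=Bangalore"));           // city=Bangalore
        System.out.println(normalizeKeysOnly("CiTy=Bangalore/Zone=EaSt")); // city=Bangalore/zone=EaSt
    }
}
```

With this, the statements in the description (e.g. `insert into table abc PARTITION(CitY='Bangalore') ...`) would record `city=Bangalore` rather than `city=bangalore`.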
[jira] [Assigned] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem
[ https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
guojh reassigned HIVE-24467:
----------------------------
Assignee: guojh

> ConditionalTask remove tasks that not selected exists thread safety problem
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24467
>                 URL: https://issues.apache.org/jira/browse/HIVE-24467
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.3.4
>            Reporter: guojh
>            Assignee: guojh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When Hive executes jobs in parallel (controlled by the "hive.exec.parallel" parameter), ConditionalTasks also remove the tasks that were not selected in parallel. Because of thread-safety issues, some tasks may not be removed from the dependent task tree. This is a very serious bug: it causes some stage tasks to never be triggered for execution.
> In our production cluster, a query ran three conditional tasks in parallel. After applying the patch of HIVE-21638, we found that Stage-3 was missing and was never submitted to the runnable list because its parent Stage-31 was not done. But Stage-31 should have been removed, since it was not selected.
> The stage dependencies are below:
> {code:java}
> STAGE DEPENDENCIES:
>   Stage-41 is a root stage
>   Stage-26 depends on stages: Stage-41
>   Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, Stage-2
>   Stage-39 has a backup stage: Stage-2
>   Stage-23 depends on stages: Stage-39
>   Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
>   Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
>   Stage-5
>   Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
>   Stage-51 depends on stages: Stage-0
>   Stage-4
>   Stage-6
>   Stage-7 depends on stages: Stage-6
>   Stage-40 has a backup stage: Stage-2
>   Stage-24 depends on stages: Stage-40
>   Stage-2
>   Stage-44 is a root stage
>   Stage-30 depends on stages: Stage-44
>   Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, Stage-12
>   Stage-42 has a backup stage: Stage-12
>   Stage-27 depends on stages: Stage-42
>   Stage-43 has a backup stage: Stage-12
>   Stage-28 depends on stages: Stage-43
>   Stage-12
>   Stage-47 is a root stage
>   Stage-34 depends on stages: Stage-47
>   Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, Stage-16
>   Stage-45 has a backup stage: Stage-16
>   Stage-31 depends on stages: Stage-45
>   Stage-46 has a backup stage: Stage-16
>   Stage-32 depends on stages: Stage-46
>   Stage-16
>   Stage-50 is a root stage
>   Stage-38 depends on stages: Stage-50
>   Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, Stage-20
>   Stage-48 has a backup stage: Stage-20
>   Stage-35 depends on stages: Stage-48
>   Stage-49 has a backup stage: Stage-20
>   Stage-36 depends on stages: Stage-49
>   Stage-20
> {code}
> The stage task execution log is below. We can see that Stage-33 is a conditional task consisting of Stage-45, Stage-46 and Stage-16. Stage-16 is launched, so Stage-45 and Stage-46 should be removed from the dependency tree; Stage-31 is a child of Stage-45 and a parent of Stage-3, so Stage-31 should be removed too. But as seen in the log below, Stage-31 is still in the parent list of Stage-3, which should not happen.
> {code:java}
> 2020-12-03T01:09:50,939 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 1 out of 17
> 2020-12-03T01:09:50,940 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-26:MAPRED] in parallel
> 2020-12-03T01:09:50,941 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 2 out of 17
> 2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-30:MAPRED] in parallel
> 2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 3 out of 17
> 2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-34:MAPRED] in parallel
> 2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 4 out of 17
> 2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-38:MAPRED] in parallel
> 2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
> 2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting
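The race described above can be made concrete with a small sketch: several threads pruning a shared parent list at once. This is not Hive's actual Task API; it only illustrates why the shared dependency lists need a thread-safe collection (or explicit locking) when conditional tasks resolve in parallel:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch: two "conditional tasks" concurrently remove their non-selected
// branches from a downstream stage's parent list. With a thread-safe list
// every removal takes effect; with a plain ArrayList, concurrent remove()
// calls can interleave and leave stale parents behind (the Stage-31 symptom).
public class TaskPruneSketch {

    // Returns the number of stale parents left after both prunes finish.
    public static int pruneInParallel() {
        List<Integer> parents = new CopyOnWriteArrayList<>();
        for (int i = 0; i < 1000; i++) {
            parents.add(i);
        }
        // Each thread prunes a disjoint set of "non-selected" parents.
        Thread a = new Thread(() -> {
            for (int i = 0; i < 500; i++) {
                parents.remove(Integer.valueOf(i));
            }
        });
        Thread b = new Thread(() -> {
            for (int i = 500; i < 1000; i++) {
                parents.remove(Integer.valueOf(i));
            }
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return parents.size();
    }

    public static void main(String[] args) {
        // No stale parent survives, so the downstream stage can be scheduled.
        System.out.println(pruneInParallel()); // prints 0
    }
}
```

CopyOnWriteArrayList copies the backing array on every mutation, which is acceptable here because task dependency lists are short and pruning happens once per query; for hot paths, a lock around the mutation would be the alternative.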
[jira] [Work logged] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem
[ https://issues.apache.org/jira/browse/HIVE-24467?focusedWorklogId=519995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519995 ] ASF GitHub Bot logged work on HIVE-24467: - Author: ASF GitHub Bot Created on: 04/Dec/20 04:29 Start Date: 04/Dec/20 04:29 Worklog Time Spent: 10m Work Description: anishek commented on pull request #1743: URL: https://github.com/apache/hive/pull/1743#issuecomment-738557480 may be someone with more experience on the execution side should look at this, @maheshk114 can you help here ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519995) Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem
[ https://issues.apache.org/jira/browse/HIVE-24467?focusedWorklogId=519985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519985 ] ASF GitHub Bot logged work on HIVE-24467: - Author: ASF GitHub Bot Created on: 04/Dec/20 03:38 Start Date: 04/Dec/20 03:38 Worklog Time Spent: 10m Work Description: gjhkael commented on pull request #1743: URL: https://github.com/apache/hive/pull/1743#issuecomment-738545155 @anishek Please review this pr. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519985) Time Spent: 20m (was: 10m)
[jira] [Updated] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem
[ https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24467: -- Labels: pull-request-available (was: )
[jira] [Work logged] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem
[ https://issues.apache.org/jira/browse/HIVE-24467?focusedWorklogId=519984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519984 ] ASF GitHub Bot logged work on HIVE-24467: - Author: ASF GitHub Bot Created on: 04/Dec/20 03:37 Start Date: 04/Dec/20 03:37 Worklog Time Spent: 10m Work Description: gjhkael opened a new pull request #1743: URL: https://github.com/apache/hive/pull/1743 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519984) Remaining Estimate: 0h Time Spent: 10m
[jira] [Updated] (HIVE-24467) ConditionalTask remove tasks that not selected exists thread safety problem
[ https://issues.apache.org/jira/browse/HIVE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guojh updated HIVE-24467: - Description: When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” parameter), ConditionalTasks remove their non-selected tasks concurrently; because of thread-safety issues, some tasks may not be removed from the dependency task tree. This is a very serious bug, because it causes some stage tasks to never trigger execution. In our production cluster, a query ran three conditional tasks in parallel. After applying the patch for HIVE-21638, we found that Stage-3 was missed and never submitted to the runnable list because its parent Stage-31 was not done; but Stage-31 should have been removed, since it was not selected. The stage dependencies are below:
{code:java}
STAGE DEPENDENCIES:
  Stage-41 is a root stage
  Stage-26 depends on stages: Stage-41
  Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, Stage-2
  Stage-39 has a backup stage: Stage-2
  Stage-23 depends on stages: Stage-39
  Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
  Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
  Stage-5
  Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
  Stage-51 depends on stages: Stage-0
  Stage-4
  Stage-6
  Stage-7 depends on stages: Stage-6
  Stage-40 has a backup stage: Stage-2
  Stage-24 depends on stages: Stage-40
  Stage-2
  Stage-44 is a root stage
  Stage-30 depends on stages: Stage-44
  Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, Stage-12
  Stage-42 has a backup stage: Stage-12
  Stage-27 depends on stages: Stage-42
  Stage-43 has a backup stage: Stage-12
  Stage-28 depends on stages: Stage-43
  Stage-12
  Stage-47 is a root stage
  Stage-34 depends on stages: Stage-47
  Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, Stage-16
  Stage-45 has a backup stage: Stage-16
  Stage-31 depends on stages: Stage-45
  Stage-46 has a backup stage: Stage-16
  Stage-32 depends on stages: Stage-46
  Stage-16
  Stage-50 is a root stage
  Stage-38 depends on stages: Stage-50
  Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, Stage-20
  Stage-48 has a backup stage: Stage-20
  Stage-35 depends on stages: Stage-48
  Stage-49 has a backup stage: Stage-20
  Stage-36 depends on stages: Stage-49
  Stage-20
{code}
The stage task execution log is below. Stage-33 is a conditional task consisting of Stage-45, Stage-46, and Stage-16; Stage-16 was launched, so Stage-45 and Stage-46 should have been removed from the dependency tree. Stage-31 is a child of Stage-45 and a parent of Stage-3, so Stage-31 should have been removed too. As seen in the log below, Stage-31 is still in the parent list of Stage-3, which should not happen.
{code:java}
2020-12-03T01:09:50,939 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 1 out of 17
2020-12-03T01:09:50,940 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-26:MAPRED] in parallel
2020-12-03T01:09:50,941 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 2 out of 17
2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-30:MAPRED] in parallel
2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 3 out of 17
2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-34:MAPRED] in parallel
2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 4 out of 17
2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-38:MAPRED] in parallel
2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
2020-12-03T01:10:34,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 5 out of 17
2020-12-03T01:10:34,947 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-16:MAPRED] in parallel
2020-12-03T01:10:34,948 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 6 out of 17
2020-12-03T01:10:34,948 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-12:MAPRED] in parallel
2020-12-03T01:10:34,949 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 7 out of 17
2020-12-03T01:10:34,950 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task
{code}
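The race described above can be made concrete with a small sketch. This is a hypothetical model, not Hive's actual Task/ConditionalTask API: `Stage`, `removeNotSelected`, and the single tree-wide lock are all illustrative. The point is that detaching a non-selected branch from the dependency tree must be atomic with respect to the other driver threads walking parent lists, so a stage like Stage-3 never observes a half-removed parent like Stage-31.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal model of a stage dependency tree; names do not match Hive's classes.
class Stage {
    final String name;
    final List<Stage> parents = Collections.synchronizedList(new ArrayList<>());
    final List<Stage> children = Collections.synchronizedList(new ArrayList<>());

    Stage(String name) { this.name = name; }

    void addChild(Stage c) { children.add(c); c.parents.add(this); }
}

public class ConditionalRemovalSketch {
    // One lock guards every mutation of the tree, so concurrent
    // ConditionalTasks cannot interleave their removals.
    private static final Object TREE_LOCK = new Object();

    // Detach a stage that a ConditionalTask did not select: its children lose
    // it as a parent and become runnable once their remaining parents finish.
    static void removeNotSelected(Stage notSelected) {
        synchronized (TREE_LOCK) {
            for (Stage parent : notSelected.parents) {
                parent.children.remove(notSelected);
            }
            for (Stage child : notSelected.children) {
                child.parents.remove(notSelected);
            }
            notSelected.parents.clear();
            notSelected.children.clear();
        }
    }

    public static void main(String[] args) {
        // Mirror the shape from the report: Stage-45 -> Stage-31 -> Stage-3.
        Stage s45 = new Stage("Stage-45");
        Stage s31 = new Stage("Stage-31");
        Stage s3  = new Stage("Stage-3");
        s45.addChild(s31);
        s31.addChild(s3);
        removeNotSelected(s45);
        removeNotSelected(s31);
        // Stage-3 must no longer wait on the removed Stage-31.
        if (!s3.parents.isEmpty()) throw new AssertionError("Stage-3 still blocked");
        System.out.println("Stage-3 is runnable");
    }
}
```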
[jira] [Updated] (HIVE-24484) Upgrade Hadoop to 3.2.1
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24484: -- Labels: pull-request-available (was: ) > Upgrade Hadoop to 3.2.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.2.1
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=519956=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519956 ] ASF GitHub Bot logged work on HIVE-24484: - Author: ASF GitHub Bot Created on: 04/Dec/20 00:48 Start Date: 04/Dec/20 00:48 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1742: URL: https://github.com/apache/hive/pull/1742 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519956) Remaining Estimate: 0h Time Spent: 10m > Upgrade Hadoop to 3.2.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23891) Using UNION sql clause and speculative execution can cause file duplication in Tez
[ https://issues.apache.org/jira/browse/HIVE-23891?focusedWorklogId=519955=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519955 ] ASF GitHub Bot logged work on HIVE-23891: - Author: ASF GitHub Bot Created on: 04/Dec/20 00:47 Start Date: 04/Dec/20 00:47 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #1294: URL: https://github.com/apache/hive/pull/1294 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519955) Time Spent: 2h 20m (was: 2h 10m) > Using UNION sql clause and speculative execution can cause file duplication > in Tez > -- > > Key: HIVE-23891 > URL: https://issues.apache.org/jira/browse/HIVE-23891 > Project: Hive > Issue Type: Bug >Reporter: George Pachitariu >Assignee: George Pachitariu >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23891.1.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Hello, > the specific scenario when this can happen: > - the execution engine is Tez; > - speculative execution is on; > - the query inserts into a table and the last step is a UNION sql clause; > The problem is that Tez creates an extra layer of subdirectories when there > is a UNION. Later, when deduplicating, Hive doesn't take that into account > and only deduplicates folders but not the files inside. 
> So for a query like this: > {code:sql} > insert overwrite table union_all > select * from union_first_part > union all > select * from union_second_part; > {code} > The folder structure afterwards will be like this (a possible example): > {code:java} > .../union_all/HIVE_UNION_SUBDIR_1/00_0 > .../union_all/HIVE_UNION_SUBDIR_1/00_1 > .../union_all/HIVE_UNION_SUBDIR_2/00_1 > {code} > The attached patch increases the number of folder levels that Hive will check > recursively for duplicates when we have a UNION in Tez. > Feel free to reach out if you have any questions :). -- This message was sent by Atlassian Jira (v8.3.4#803005)
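The deduplication idea behind the patch can be sketched as follows. This is an illustrative model, not Hive's actual Utilities code: paths are plain strings, attempt files follow the usual `<taskId>_<attemptId>` naming, and deduplication keeps one attempt per (directory, taskId) pair so that duplicates inside each `HIVE_UNION_SUBDIR_N` are caught one level deeper than before.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch: keep exactly one attempt file per (directory, taskId) pair, where
// the directory already includes the HIVE_UNION_SUBDIR_N level introduced by
// UNION in Tez. Hypothetical helper, not Hive's real implementation.
public class UnionDedupSketch {
    static List<String> dedup(List<String> paths) {
        // key = directory + "/" + taskId  ->  full path of the attempt we keep
        TreeMap<String, String> keep = new TreeMap<>();
        for (String p : paths) {
            int slash = p.lastIndexOf('/');
            String dir = p.substring(0, slash);        // e.g. .../HIVE_UNION_SUBDIR_1
            String file = p.substring(slash + 1);      // e.g. 000000_1
            String taskId = file.substring(0, file.lastIndexOf('_'));
            String key = dir + "/" + taskId;
            String prev = keep.get(key);
            // Prefer the lexicographically smallest path, i.e. attempt 0 over 1.
            if (prev == null || p.compareTo(prev) < 0) {
                keep.put(key, p);
            }
        }
        return new ArrayList<>(keep.values());
    }
}
```

With the three example paths above, the duplicate speculative attempt in `HIVE_UNION_SUBDIR_1` is dropped while the single file in `HIVE_UNION_SUBDIR_2` survives.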
[jira] [Assigned] (HIVE-24484) Upgrade Hadoop to 3.2.1
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-24484: - > Upgrade Hadoop to 3.2.1 > --- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21588) Remove HBase dependency from hive-metastore
[ https://issues.apache.org/jira/browse/HIVE-21588?focusedWorklogId=519923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519923 ] ASF GitHub Bot logged work on HIVE-21588: - Author: ASF GitHub Bot Created on: 04/Dec/20 00:02 Start Date: 04/Dec/20 00:02 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1723: URL: https://github.com/apache/hive/pull/1723#issuecomment-738461352 Merged. Thanks @wangyum ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519923) Time Spent: 1h 40m (was: 1.5h) > Remove HBase dependency from hive-metastore > --- > > Key: HIVE-21588 > URL: https://issues.apache.org/jira/browse/HIVE-21588 > Project: Hive > Issue Type: Task > Components: HBase Metastore >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21588.01.patch, HIVE-21588.02.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > HIVE-17234 has removed HBase metastore from master. But maven dependency have > not been removed. We should remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21588) Remove HBase dependency from hive-metastore
[ https://issues.apache.org/jira/browse/HIVE-21588?focusedWorklogId=519922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519922 ] ASF GitHub Bot logged work on HIVE-21588: - Author: ASF GitHub Bot Created on: 04/Dec/20 00:01 Start Date: 04/Dec/20 00:01 Worklog Time Spent: 10m Work Description: sunchao merged pull request #1723: URL: https://github.com/apache/hive/pull/1723 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519922) Time Spent: 1.5h (was: 1h 20m) > Remove HBase dependency from hive-metastore > --- > > Key: HIVE-21588 > URL: https://issues.apache.org/jira/browse/HIVE-21588 > Project: Hive > Issue Type: Task > Components: HBase Metastore >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21588.01.patch, HIVE-21588.02.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > HIVE-17234 has removed HBase metastore from master. But maven dependency have > not been removed. We should remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21588) Remove HBase dependency from hive-metastore
[ https://issues.apache.org/jira/browse/HIVE-21588?focusedWorklogId=519921=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519921 ] ASF GitHub Bot logged work on HIVE-21588: - Author: ASF GitHub Bot Created on: 04/Dec/20 00:01 Start Date: 04/Dec/20 00:01 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #1723: URL: https://github.com/apache/hive/pull/1723#discussion_r535735693 ## File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java ## @@ -17,7 +17,6 @@ */ package org.apache.hadoop.hive.ql.txn.compactor; -import it.unimi.dsi.fastutil.booleans.AbstractBooleanBidirectionalIterator; Review comment: Gotcha, cool. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519921) Time Spent: 1h 20m (was: 1h 10m) > Remove HBase dependency from hive-metastore > --- > > Key: HIVE-21588 > URL: https://issues.apache.org/jira/browse/HIVE-21588 > Project: Hive > Issue Type: Task > Components: HBase Metastore >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21588.01.patch, HIVE-21588.02.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > HIVE-17234 has removed HBase metastore from master. But maven dependency have > not been removed. We should remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21588) Remove HBase dependency from hive-metastore
[ https://issues.apache.org/jira/browse/HIVE-21588?focusedWorklogId=519918=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519918 ] ASF GitHub Bot logged work on HIVE-21588: - Author: ASF GitHub Bot Created on: 03/Dec/20 23:36 Start Date: 03/Dec/20 23:36 Worklog Time Spent: 10m Work Description: wangyum commented on a change in pull request #1723: URL: https://github.com/apache/hive/pull/1723#discussion_r535725208 ## File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java ## @@ -17,7 +17,6 @@ */ package org.apache.hadoop.hive.ql.txn.compactor; -import it.unimi.dsi.fastutil.booleans.AbstractBooleanBidirectionalIterator; Review comment: Yes, this need `it.unimi.dsi:fastutil`, we have removed this dependency: ![image](https://user-images.githubusercontent.com/5399861/100756200-f6910480-3427-11eb-9919-af782b870b9e.png) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519918) Time Spent: 1h 10m (was: 1h) > Remove HBase dependency from hive-metastore > --- > > Key: HIVE-21588 > URL: https://issues.apache.org/jira/browse/HIVE-21588 > Project: Hive > Issue Type: Task > Components: HBase Metastore >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21588.01.patch, HIVE-21588.02.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > HIVE-17234 has removed HBase metastore from master. But maven dependency have > not been removed. We should remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24220) Unable to reopen a closed bug report
[ https://issues.apache.org/jira/browse/HIVE-24220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Tagra resolved HIVE-24220. Resolution: Won't Fix > Unable to reopen a closed bug report > > > Key: HIVE-24220 > URL: https://issues.apache.org/jira/browse/HIVE-24220 > Project: Hive > Issue Type: Bug >Reporter: Ankur Tagra >Assignee: Ankur Tagra >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HIVE-24220) Unable to reopen a closed bug report
[ https://issues.apache.org/jira/browse/HIVE-24220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Tagra reopened HIVE-24220: > Unable to reopen a closed bug report > > > Key: HIVE-24220 > URL: https://issues.apache.org/jira/browse/HIVE-24220 > Project: Hive > Issue Type: Bug >Reporter: Ankur Tagra >Assignee: Ankur Tagra >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24220) Unable to reopen a closed bug report
[ https://issues.apache.org/jira/browse/HIVE-24220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Tagra resolved HIVE-24220. Resolution: Fixed > Unable to reopen a closed bug report > > > Key: HIVE-24220 > URL: https://issues.apache.org/jira/browse/HIVE-24220 > Project: Hive > Issue Type: Bug >Reporter: Ankur Tagra >Assignee: Ankur Tagra >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HIVE-24220) Unable to reopen a closed bug report
[ https://issues.apache.org/jira/browse/HIVE-24220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Tagra reopened HIVE-24220: > Unable to reopen a closed bug report > > > Key: HIVE-24220 > URL: https://issues.apache.org/jira/browse/HIVE-24220 > Project: Hive > Issue Type: Bug >Reporter: Ankur Tagra >Assignee: Ankur Tagra >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=519874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519874 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 03/Dec/20 21:21 Start Date: 03/Dec/20 21:21 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1710: URL: https://github.com/apache/hive/pull/1710#issuecomment-738326173 @nrg4878 Review please for HMS work? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519874) Time Spent: 1h 20m (was: 1h 10m) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen under a single transactions and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
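The batching idea in this issue can be sketched with a toy loop. This is not the metastore's actual code: an in-memory list stands in for the backing RDBMS table, and the commit after each pass stands in for ending one small transaction before starting the next (in SQL terms, something like a `DELETE ... LIMIT n` per transaction, syntax depending on the backend database).

```java
import java.util.Iterator;
import java.util.List;

// Delete expired events in bounded batches, "committing" between batches so
// no single transaction holds locks on the whole backlog. Illustrative only.
public class BatchedEventCleanup {
    static int deleteExpired(List<Long> eventTimes, long olderThan, int batchSize) {
        int total = 0;
        while (true) {
            int deletedThisBatch = 0;
            Iterator<Long> it = eventTimes.iterator();
            while (it.hasNext() && deletedThisBatch < batchSize) {
                if (it.next() < olderThan) {
                    it.remove();
                    deletedThisBatch++;
                }
            }
            total += deletedThisBatch;      // commit point in the real system
            if (deletedThisBatch < batchSize) {
                return total;               // final (possibly partial) batch done
            }
        }
    }
}
```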
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.1
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519872=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519872 ] ASF GitHub Bot logged work on HIVE-21737: - Author: ASF GitHub Bot Created on: 03/Dec/20 21:12 Start Date: 03/Dec/20 21:12 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #1635: URL: https://github.com/apache/hive/pull/1635#issuecomment-738315389 Thanks @sunchao eager to see this finally happening ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519872) Time Spent: 3h 40m (was: 3.5h) > Upgrade Avro to version 1.10.1 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Avro >= 1.9.x bring a lot of fixes including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24220) Unable to reopen a closed bug report
[ https://issues.apache.org/jira/browse/HIVE-24220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Tagra resolved HIVE-24220. Resolution: Fixed > Unable to reopen a closed bug report > > > Key: HIVE-24220 > URL: https://issues.apache.org/jira/browse/HIVE-24220 > Project: Hive > Issue Type: Bug >Reporter: Ankur Tagra >Assignee: Ankur Tagra >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519861 ] ASF GitHub Bot logged work on HIVE-24468: - Author: ASF GitHub Bot Created on: 03/Dec/20 20:39 Start Date: 03/Dec/20 20:39 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1728: URL: https://github.com/apache/hive/pull/1728#issuecomment-738293741 @pvary Just so we are clear "now()" is not an SQL function. It's implemented on this DBListener class: ``` private int now() { long millis = System.currentTimeMillis(); millis /= 1000; if (millis > Integer.MAX_VALUE) { LOG.warn("We've passed max int value in seconds since the epoch, " + "all notification times will be the same!"); return Integer.MAX_VALUE; } return (int)millis; } ``` https://github.com/apache/hive/blob/master/hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/DbNotificationListener.java#L941-L950 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519861) Time Spent: 1.5h (was: 1h 20m) > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519849=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519849 ] ASF GitHub Bot logged work on HIVE-24468: - Author: ASF GitHub Bot Created on: 03/Dec/20 20:01 Start Date: 03/Dec/20 20:01 Worklog Time Spent: 10m Work Description: belugabehr edited a comment on pull request #1728: URL: https://github.com/apache/hive/pull/1728#issuecomment-738274162 @pvary I agree with your understanding that the SELECT FOR UPDATE is a lock, and therefore the timestamps should be always increasing, but imagine if the HMS clock on two instances were off by 5s (or more). The HMS with the slower clock would generate events that were earlier in time, but with a higher ID. So there is not strong enforcement of the time being sequential. It's all based on the HMS clocks being in sync and trusting those clocks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519849) Time Spent: 1h 20m (was: 1h 10m) > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519847 ] ASF GitHub Bot logged work on HIVE-24468: - Author: ASF GitHub Bot Created on: 03/Dec/20 20:00 Start Date: 03/Dec/20 20:00 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1728: URL: https://github.com/apache/hive/pull/1728#issuecomment-738274162 @pvary I agree with your understanding that the SELECT FOR UPDATE is a lock, and therefore the time is the same, but imagine if the HMS clock on two instances were off by 5s (or more). The HMS with the slower clock would generate events that were earlier in time, but with a higher ID. So there is not strong enforcement of the time being sequential. It's all based on the HMS clocks being in sync and trusting those clocks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519847) Time Spent: 1h 10m (was: 1h) > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
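The clock-skew argument in this thread can be made concrete with a toy model. Everything here is illustrative, not Hive's actual classes: a shared, lock-guarded counter stands in for the SELECT FOR UPDATE on the event-id sequence, while each metastore instance stamps events with its own wall clock. With a few seconds of skew, a higher event id can carry a lower timestamp.

```java
// Toy model: event ids are strictly increasing across metastore instances,
// but timestamps come from each instance's local clock. Hypothetical names.
public class ClockSkewSketch {
    private static long nextEventId = 0;

    // Returns {eventId, eventTimeSeconds}; the lock stands in for the
    // SELECT FOR UPDATE row lock that serializes id allocation.
    static long[] logEvent(long localClockSeconds) {
        synchronized (ClockSkewSketch.class) {
            return new long[] { ++nextEventId, localClockSeconds };
        }
    }

    public static void main(String[] args) {
        long[] fromHmsA = logEvent(1_000_000L);     // HMS A's clock
        long[] fromHmsB = logEvent(1_000_000L - 5); // HMS B, 5 seconds behind
        // Ids stay ordered while timestamps go backwards.
        if (fromHmsB[0] <= fromHmsA[0]) throw new AssertionError("ids not increasing");
        if (fromHmsB[1] >= fromHmsA[1]) throw new AssertionError("no skew observed");
        System.out.println("later id, earlier timestamp");
    }
}
```

This is why the ordering guarantee discussed above rests entirely on the HMS clocks being in sync, not on the id allocation lock.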
[jira] [Work logged] (HIVE-21588) Remove HBase dependency from hive-metastore
[ https://issues.apache.org/jira/browse/HIVE-21588?focusedWorklogId=519825=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519825 ] ASF GitHub Bot logged work on HIVE-21588: - Author: ASF GitHub Bot Created on: 03/Dec/20 19:03 Start Date: 03/Dec/20 19:03 Worklog Time Spent: 10m Work Description: sunchao commented on a change in pull request #1723: URL: https://github.com/apache/hive/pull/1723#discussion_r535502356 ## File path: ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java ## @@ -17,7 +17,6 @@ */ package org.apache.hadoop.hive.ql.txn.compactor; -import it.unimi.dsi.fastutil.booleans.AbstractBooleanBidirectionalIterator; Review comment: is this related? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519825) Time Spent: 1h (was: 50m) > Remove HBase dependency from hive-metastore > --- > > Key: HIVE-21588 > URL: https://issues.apache.org/jira/browse/HIVE-21588 > Project: Hive > Issue Type: Task > Components: HBase Metastore >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21588.01.patch, HIVE-21588.02.patch > > Time Spent: 1h > Remaining Estimate: 0h > > HIVE-17234 has removed HBase metastore from master. But maven dependency have > not been removed. We should remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=519813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519813 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 03/Dec/20 18:44 Start Date: 03/Dec/20 18:44 Worklog Time Spent: 10m Work Description: fenglu-g commented on pull request #1740: URL: https://github.com/apache/hive/pull/1740#issuecomment-738212050 @nrg4878 and others, PTAL, thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519813) Time Spent: 20m (was: 10m) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting hive metastore, this should be > moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519812=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519812 ] ASF GitHub Bot logged work on HIVE-24468: - Author: ASF GitHub Bot Created on: 03/Dec/20 18:42 Start Date: 03/Dec/20 18:42 Worklog Time Spent: 10m Work Description: pvary commented on pull request #1728: URL: https://github.com/apache/hive/pull/1728#issuecomment-738211024 > So, the answer is yes. The timestamps could be out of order. Before this patch the timestamps were in order as we locked the NEXT_EVENT_ID table with SELECT FOR UPDATE, so the timestamp was aligned with the EVENT_ID. (There might be some exceptions if some backend RDBMS reuses the value returned by the function now() in a single transaction, but I think we should overlook this for now ) After this PR the timestamps could become out of order. Which is IMHO an API change even if the order requirement is not documented. So the users should be aware of this change and we should seriously consider this before proceeding. Good to have you back and starting to cleaning up these stuff! Thanks, Peter This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519812) Time Spent: 1h (was: 50m) > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24470: -- Labels: pull-request-available (was: ) > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting hive metastore, this should be > moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic
[ https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=519811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519811 ] ASF GitHub Bot logged work on HIVE-24470: - Author: ASF GitHub Bot Created on: 03/Dec/20 18:41 Start Date: 03/Dec/20 18:41 Worklog Time Spent: 10m Work Description: Noremac201 opened a new pull request #1740: URL: https://github.com/apache/hive/pull/1740 ### What changes were proposed in this pull request? 1. Refactor HiveMetastore.HMSHandler into its own class ### Why are the changes needed? This will pave the way for cleaner changes, since the driver class is no longer nested with the 10,000-line HMSHandler file, so there is a clearer separation of duties. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing unit tests, building/running manually. No additional tests were added since this was a pure refactoring. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519811) Remaining Estimate: 0h Time Spent: 10m > Separate HiveMetastore Thrift and Driver logic > -- > > Key: HIVE-24470 > URL: https://issues.apache.org/jira/browse/HIVE-24470 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Cameron Moberg >Assignee: Cameron Moberg >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > In the file HiveMetastore.java the majority of the code is a thrift interface > rather than the actual logic behind starting hive metastore, this should be > moved out into a separate file to clean up the file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24281) Unable to reopen a closed bug report
[ https://issues.apache.org/jira/browse/HIVE-24281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Tagra resolved HIVE-24281. Resolution: Fixed > Unable to reopen a closed bug report > > > Key: HIVE-24281 > URL: https://issues.apache.org/jira/browse/HIVE-24281 > Project: Hive > Issue Type: Bug > Components: API >Affects Versions: 1.2.0 >Reporter: Ankur Tagra >Assignee: Ankur Tagra >Priority: Trivial > > Unable to reopen a closed bug report -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24394) Enable printing explain to console at query start
[ https://issues.apache.org/jira/browse/HIVE-24394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johan Gustavsson reassigned HIVE-24394: --- Assignee: Jesus Camacho Rodriguez > Enable printing explain to console at query start > - > > Key: HIVE-24394 > URL: https://issues.apache.org/jira/browse/HIVE-24394 > Project: Hive > Issue Type: Improvement > Components: Hive, Query Processor >Affects Versions: 2.3.7, 3.1.2 >Reporter: Johan Gustavsson >Assignee: Jesus Camacho Rodriguez >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently there is a hive.log.explain.output option that prints extended > explain to log. While this is helpful for internal investigations, it limits > the information that is available to users. So we should add options to make > this print non-extended explain to console, for general user consumption, to > make it easier for users to debug queries and workflows without having to > resubmit queries with explain. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HIVE-24281) Unable to reopen a closed bug report
[ https://issues.apache.org/jira/browse/HIVE-24281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Tagra reopened HIVE-24281: > Unable to reopen a closed bug report > > > Key: HIVE-24281 > URL: https://issues.apache.org/jira/browse/HIVE-24281 > Project: Hive > Issue Type: Bug > Components: API >Affects Versions: 1.2.0 >Reporter: Ankur Tagra >Assignee: Ankur Tagra >Priority: Trivial > Fix For: 0.11.1 > > > Unable to reopen a closed bug report -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?focusedWorklogId=519785=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519785 ] ASF GitHub Bot logged work on HIVE-17709: - Author: ASF GitHub Bot Created on: 03/Dec/20 17:29 Start Date: 03/Dec/20 17:29 Worklog Time Spent: 10m Work Description: abstractdog opened a new pull request #1739: URL: https://github.com/apache/hive/pull/1739 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519785) Remaining Estimate: 0h Time Spent: 10m > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-17709: -- Labels: pull-request-available (was: ) > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24481) Skipped compaction can cause data corruption with streaming
[ https://issues.apache.org/jira/browse/HIVE-24481?focusedWorklogId=519781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519781 ] ASF GitHub Bot logged work on HIVE-24481: - Author: ASF GitHub Bot Created on: 03/Dec/20 17:13 Start Date: 03/Dec/20 17:13 Worklog Time Spent: 10m Work Description: pvargacl opened a new pull request #1738: URL: https://github.com/apache/hive/pull/1738 ### What changes were proposed in this pull request? See the details in HIVE-24481 ### Why are the changes needed? Fix the data corruption issue. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519781) Remaining Estimate: 0h Time Spent: 10m > Skipped compaction can cause data corruption with streaming > --- > > Key: HIVE-24481 > URL: https://issues.apache.org/jira/browse/HIVE-24481 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: Compaction > Time Spent: 10m > Remaining Estimate: 0h > > Timeline: > 1. create a partitioned table, add one static partition > 2. transaction 1 writes delta_1, and aborts > 3. create streaming connection, with batch 3, withStaticPartitionValues with > the existing partition > 4. beginTransaction, write, commitTransaction > 5. beginTransaction, write, abortTransaction > 6. beginTransaction, write, commitTransaction > 7. close connection, count of the table is 2 > 8. run manual minor compaction on the partition. it will skip compaction, > because deltacount = 1, but clean, because there is an aborted txn1 > 9. cleaner will remove both aborted records from txn_components > 10. wait for acidhousekeeper to remove empty aborted txns > 11. select * from table returns *3* records, reading the aborted record -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24481) Skipped compaction can cause data corruption with streaming
[ https://issues.apache.org/jira/browse/HIVE-24481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24481: -- Labels: Compaction pull-request-available (was: Compaction) > Skipped compaction can cause data corruption with streaming > --- > > Key: HIVE-24481 > URL: https://issues.apache.org/jira/browse/HIVE-24481 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: Compaction, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Timeline: > 1. create a partitioned table, add one static partition > 2. transaction 1 writes delta_1, and aborts > 3. create streaming connection, with batch 3, withStaticPartitionValues with > the existing partition > 4. beginTransaction, write, commitTransaction > 5. beginTransaction, write, abortTransaction > 6. beginTransaction, write, commitTransaction > 7. close connection, count of the table is 2 > 8. run manual minor compaction on the partition. it will skip compaction, > because deltacount = 1, but clean, because there is an aborted txn1 > 9. cleaner will remove both aborted records from txn_components > 10. wait for acidhousekeeper to remove empty aborted txns > 11. select * from table returns *3* records, reading the aborted record -- This message was sent by Atlassian Jira (v8.3.4#803005)
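The HIVE-24481 timeline above boils down to a metadata/data mismatch that a toy model can illustrate (this is not Hive's reader or metastore code): a reader hides rows whose write id is in the aborted set, so if the Cleaner skips removing the aborted delta file but the aborted-txn metadata is purged anyway, the aborted row becomes visible again.

```java
import java.util.List;
import java.util.Set;

// Toy visibility check: each row carries the write id that created it, and a
// reader filters out rows belonging to known-aborted write ids.
public class AbortedVisibilityDemo {
    public static int visibleRows(List<Long> rowWriteIds, Set<Long> abortedWriteIds) {
        int count = 0;
        for (long w : rowWriteIds) {
            if (!abortedWriteIds.contains(w)) {
                count++;  // row's transaction committed (as far as we know)
            }
        }
        return count;
    }
}
```

Mirroring steps 7 and 11 of the timeline: with rows from write ids {2, 3, 4} and write id 3 recorded as aborted, 2 rows are visible; after the housekeeper drops the aborted-txn record while the delta file survives, the same data shows 3 rows.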
[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243363#comment-17243363 ] László Bodor commented on HIVE-17709: - super-cool, let me pick it from there! that's urgent for java11 llap runtime, and you can take care of the rest in HIVE-22415, does it make sense to you? > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243358#comment-17243358 ] David Mollitor commented on HIVE-17709: --- https://github.com/apache/hive/pull/1624/commits/efbdf2ab17d6f1504cfadd2a02ac9b53673b83a6 > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.1
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519778 ] ASF GitHub Bot logged work on HIVE-21737: - Author: ASF GitHub Bot Created on: 03/Dec/20 17:02 Start Date: 03/Dec/20 17:02 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1635: URL: https://github.com/apache/hive/pull/1635#issuecomment-738140417 > Do you have an estimate of where a possible vote to get the previous changes (and hopefully this one) released as 2.3.8? Thinking about Spark could take it in. I think @wangyum is still testing the combination in https://github.com/apache/spark/pull/30517 and we are also doing testing internally. Once that is done, I'll prepare for the release and start a vote. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519778) Time Spent: 3.5h (was: 3h 20m) > Upgrade Avro to version 1.10.1 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > Avro >= 1.9.x bring a lot of fixes including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243356#comment-17243356 ] László Bodor commented on HIVE-17709: - thanks, let me check! there are a bunch of commits, which one contains Cleaner related stuff? > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243355#comment-17243355 ] David Mollitor commented on HIVE-17709: --- [~abstractdog] Take a look at my PR where I have already done this work: https://github.com/apache/hive/pull/1624/files > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243347#comment-17243347 ] László Bodor commented on HIVE-17709: - thanks [~belugabehr], glad to hear that! please let's do them independently, I'm actively working on JDK11 in-house (CLDR), I don't want to be blocked anymore, I'm open to dirty solutions at the moment and we'll clean later (copying a util class) :) > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243346#comment-17243346 ] David Mollitor commented on HIVE-17709: --- [~abstractdog] If I recall, I already dealt with this issue in my PR. The issue at hand is that Hadoop 3.x does not itself support JDK 11 until 3.2 > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243343#comment-17243343 ] Kishen Das edited comment on HIVE-24482 at 12/3/20, 4:51 PM: - [~dkuzmenko] That's the idea, once we implement all the subtasks. Rather than going directly to DB, CachedStore is supposed to refresh the latest data from DB, before serving. [~ashish-kumar-sharma] is driving the CachedStore changes. was (Author: kishendas): [~dkuzmenko] That's the idea, once we implement all the subtasks. [~ashish-kumar-sharma] is driving the CachedStore changes. > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243343#comment-17243343 ] Kishen Das commented on HIVE-24482: --- [~dkuzmenko] That's the idea, once we implement all the subtasks. [~ashish-kumar-sharma] is driving the CachedStore changes. > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243338#comment-17243338 ] László Bodor commented on HIVE-17709: - I think we should not block jdk11 effort on the hadoop 3.2 upgrade just because hadoop introduced a CleanerUtil class, let's create a copy of that and use it, and then we'll turn back to hadoop's implementation once we upgraded we have only 2 class references on Cleaner at the moment: {code} grep -iRH "import sun.misc.Cleaner" llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java:import sun.misc.Cleaner; ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java:import sun.misc.Cleaner; {code} > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
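The copy-the-util approach proposed above amounts to freeing direct ByteBuffers reflectively so that nothing compiles against sun.misc.Cleaner. A hedged sketch follows; the reflective entry points (sun.misc.Unsafe.invokeCleaner on JDK 9+, DirectByteBuffer.cleaner() on JDK 8) are real, but the error handling is simplified compared to Hadoop's CleanerUtil from HADOOP-12760.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Frees a direct ByteBuffer's native memory without importing sun.misc.Cleaner.
public class DirectBufferFreer {
    public static boolean tryFree(ByteBuffer buf) {
        if (!buf.isDirect()) {
            return false;  // nothing to free for heap buffers
        }
        try {
            // JDK 9+: sun.misc.Unsafe.invokeCleaner(ByteBuffer)
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field f = unsafeClass.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Object unsafe = f.get(null);
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, buf);
            return true;
        } catch (NoSuchMethodException e) {
            // JDK 8 fallback: DirectByteBuffer.cleaner().clean()
            try {
                Method cleanerMethod = buf.getClass().getMethod("cleaner");
                cleanerMethod.setAccessible(true);
                Object cleaner = cleanerMethod.invoke(buf);
                cleaner.getClass().getMethod("clean").invoke(cleaner);
                return true;
            } catch (Exception inner) {
                return false;
            }
        } catch (Exception e) {
            return false;
        }
    }
}
```

After tryFree returns, the buffer must not be touched again; the copied util class would be the temporary home for this until the Hadoop 3.2 upgrade makes CleanerUtil available.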
[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243335#comment-17243335 ] Denys Kuzmenko commented on HIVE-24482: --- Does it mean that AlterTableAddConstraint on HMS1 is going to invalidate ValidWriteIdList in the CachedStore on HMS 2 and following select statement from this table should go db directly? > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243335#comment-17243335 ] Denys Kuzmenko edited comment on HIVE-24482 at 12/3/20, 4:45 PM: - Does it mean that AlterTableAddConstraint on HMS1 is going to invalidate ValidWriteIdList in the CachedStore on HMS2 and following select statement from this table should go db directly? was (Author: dkuzmenko): Does it mean that AlterTableAddConstraint on HMS1 is going to invalidate ValidWriteIdList in the CachedStore on HMS 2 and following select statement from this table should go db directly? > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243330#comment-17243330 ] David Mollitor commented on HIVE-17709: --- I am looking at Hadoop 3.2 upgrade in Hive right now actually. Working on a PR. > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-17709) remove sun.misc.Cleaner references
[ https://issues.apache.org/jira/browse/HIVE-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Bodor reassigned HIVE-17709: --- Assignee: László Bodor > remove sun.misc.Cleaner references > -- > > Key: HIVE-17709 > URL: https://issues.apache.org/jira/browse/HIVE-17709 > Project: Hive > Issue Type: Sub-task > Components: Build Infrastructure >Reporter: Zoltan Haindrich >Assignee: László Bodor >Priority: Major > > according to: > https://github.com/apache/hive/blob/188f7fb47aec3f98ef53965ba6ae84e23bd26f59/llap-server/src/java/org/apache/hadoop/hive/llap/cache/SimpleAllocator.java#L36 > HADOOP-12760 will be the long term fix -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=519762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519762 ] ASF GitHub Bot logged work on HIVE-24444: - Author: ASF GitHub Bot Created on: 03/Dec/20 16:12 Start Date: 03/Dec/20 16:12 Worklog Time Spent: 10m Work Description: deniskuzZ commented on a change in pull request #1716: URL: https://github.com/apache/hive/pull/1716#discussion_r535372633 ## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java ## @@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa } fs.delete(dead, true); } -return true; +// Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this +// number reaches 0. +return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0; + } + + /** + * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning + * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to + * finish, e.g. because of an open transaction at the time of compaction. + * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are + * obsolete directories, then the Cleaner has more work to do. + * @param location location of table + * @return number of dirs left for the cleaner to clean – eventually + * @throws IOException + */ + private int getNumEventuallyObsoleteDirs(String location, Map dirSnapshots) + throws IOException { +ValidTxnList validTxnList = new ValidReadTxnList(); +//save it so that getAcidState() sees it +conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString()); +ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList(); +Path locPath = new Path(location); +AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList, +Ref.from(false), false, dirSnapshots); +return dir.getObsolete().size(); Review comment: if it's only for versions without HIVE-23107, should it be behind the feature flag/schema version check? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519762) Time Spent: 7h 10m (was: 7h) > compactor.Cleaner should not set state "mark cleaned" if there are obsolete > files in the FS > --- > > Key: HIVE-24444 > URL: https://issues.apache.org/jira/browse/HIVE-24444 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Time Spent: 7h 10m > Remaining Estimate: 0h > > This is an improvement on HIVE-24314, in which markCleaned() is called only > if +any+ files are deleted by the cleaner. This could cause a problem in the > following case: > Say for table_1 compaction1 cleaning was blocked by an open txn, and > compaction is run again on the same table (compaction2). Both compaction1 and > compaction2 could be in "ready for cleaning" at the same time. By this time > the blocking open txn could be committed. When the cleaner runs, one of > compaction1 and compaction2 will remain in the "ready for cleaning" state: > Say compaction2 is picked up by the cleaner first. The Cleaner deletes all > obsolete files. Then compaction1 is picked up by the cleaner; the cleaner > doesn't remove any files and compaction1 will stay in the queue in a "ready > for cleaning" state. > HIVE-24291 already solves this issue but if it isn't usable (for example if > HMS schema changes are out of the question) then HIVE-24314 + this change will > fix the issue of the Cleaner not removing all obsolete files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
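The getNumEventuallyObsoleteDirs idea in the diff above can be shown with a toy model. The real implementation delegates to AcidUtils.getAcidState over directory snapshots; here the directory-name parsing is simplified and, as in the patch, we assume no open transactions: any base below the newest base, and any delta fully covered by it, is eventually obsolete, so a nonzero count means the Cleaner still has work to do.

```java
import java.util.List;

// Counts base/delta directories a cleaner would eventually remove, given
// names like "base_5" or "delta_<minWriteId>_<maxWriteId>".
public class ObsoleteDirDemo {
    public static int countEventuallyObsolete(List<String> dirs) {
        long bestBase = -1;
        for (String d : dirs) {
            if (d.startsWith("base_")) {
                bestBase = Math.max(bestBase, Long.parseLong(d.substring("base_".length())));
            }
        }
        int obsolete = 0;
        for (String d : dirs) {
            if (d.startsWith("base_")) {
                // an older base is superseded by the newest base
                if (Long.parseLong(d.substring("base_".length())) < bestBase) obsolete++;
            } else if (d.startsWith("delta_")) {
                // a delta whose max write id is covered by the newest base is obsolete
                String[] parts = d.split("_");
                if (Long.parseLong(parts[2]) <= bestBase) obsolete++;
            }
        }
        return obsolete;
    }
}
```

For example, with {base_5, delta_1_3, delta_4_5, delta_6_6} the two deltas covered by base_5 are eventually obsolete, so the compaction entry should not yet be marked cleaned.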
[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243303#comment-17243303 ] Kishen Das commented on HIVE-24482: --- [~dkuzmenko] Please go through -> [https://cwiki.apache.org/confluence/display/Hive/Synchronized+Metastore+Cache] . > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17243302#comment-17243302 ] Denys Kuzmenko commented on HIVE-24482: --- [~kishendas], qq, why should we advance write ID in this case? We are not changing the data. > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24471) Add support for combiner in hash mode group aggregation
[ https://issues.apache.org/jira/browse/HIVE-24471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24471: -- Labels: pull-request-available (was: ) > Add support for combiner in hash mode group aggregation > > > Key: HIVE-24471 > URL: https://issues.apache.org/jira/browse/HIVE-24471 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In map side group aggregation, partial grouped aggregation is calculated to > reduce the data written to disk by map task. In case of hash aggregation, > where the input data is not sorted, hash table is used. If the hash table > size increases beyond configurable limit, data is flushed to disk and new > hash table is generated. If the reduction by hash table is less than min hash > aggregation reduction calculated during compile time, the map side > aggregation is converted to streaming mode. So if the first few batch of > records does not result into significant reduction, then the mode is switched > to streaming mode. This may have impact on performance, if the subsequent > batch of records have less number of distinct values. To mitigate this > situation, a combiner can be added to the map task after the keys are sorted. > This will make sure that the aggregation is done if possible and reduce the > data written to disk. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24471) Add support for combiner in hash mode group aggregation
[ https://issues.apache.org/jira/browse/HIVE-24471?focusedWorklogId=519750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519750 ] ASF GitHub Bot logged work on HIVE-24471: - Author: ASF GitHub Bot Created on: 03/Dec/20 15:57 Start Date: 03/Dec/20 15:57 Worklog Time Spent: 10m Work Description: maheshk114 opened a new pull request #1736: URL: https://github.com/apache/hive/pull/1736 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519750) Remaining Estimate: 0h Time Spent: 10m > Add support for combiner in hash mode group aggregation > > > Key: HIVE-24471 > URL: https://issues.apache.org/jira/browse/HIVE-24471 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: mahesh kumar behera >Assignee: mahesh kumar behera >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > In map-side group aggregation, a partial grouped aggregation is calculated to > reduce the data written to disk by the map task. In the case of hash aggregation, > where the input data is not sorted, a hash table is used. If the hash table > size grows beyond a configurable limit, the data is flushed to disk and a new > hash table is created. If the reduction achieved by the hash table is less than the min hash > aggregation reduction calculated at compile time, the map-side > aggregation is converted to streaming mode. So if the first few batches of > records do not yield a significant reduction, the mode is switched > to streaming mode. This may hurt performance if the subsequent > batches of records have fewer distinct values. To mitigate this > situation, a combiner can be added to the map task after the keys are sorted. > This makes sure that the aggregation is done where possible and reduces the > data written to disk. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24481) Skipped compaction can cause data corruption with streaming
[ https://issues.apache.org/jira/browse/HIVE-24481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-24481: -- Labels: Compaction (was: ) > Skipped compaction can cause data corruption with streaming > --- > > Key: HIVE-24481 > URL: https://issues.apache.org/jira/browse/HIVE-24481 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > Labels: Compaction > > Timeline: > 1. create a partitioned table, add one static partition > 2. transaction 1 writes delta_1, and aborts > 3. create a streaming connection, with batch size 3, withStaticPartitionValues with > the existing partition > 4. beginTransaction, write, commitTransaction > 5. beginTransaction, write, abortTransaction > 6. beginTransaction, write, commitTransaction > 7. close the connection; the count of the table is 2 > 8. run manual minor compaction on the partition. It will skip compaction, > because deltacount = 1, but clean, because there is the aborted txn1 > 9. the cleaner will remove both aborted records from txn_components > 10. wait for acidhousekeeper to remove the empty aborted txns > 11. select * from table returns *3* records, reading the aborted record -- This message was sent by Atlassian Jira (v8.3.4#803005)
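The timeline above boils down to a visibility rule: a reader filters delta rows against the set of known-aborted transactions, so forgetting an abort record while its delta file still exists makes the aborted row reappear. A minimal toy model of that rule (invented names, not Hive's actual ACID reader or cleaner code):

```java
import java.util.*;

// Toy model of the HIVE-24481 corruption: rows carry the txn id that
// wrote them, and a read only hides rows from known-aborted txns.
public class AbortedTxnVisibility {
    static class Row {
        final long txnId;
        final String value;
        Row(long txnId, String value) { this.txnId = txnId; this.value = value; }
    }

    // A reader returns every row whose writer txn is not known-aborted.
    static List<String> read(List<Row> deltaRows, Set<Long> abortedTxns) {
        List<String> visible = new ArrayList<>();
        for (Row r : deltaRows) {
            if (!abortedTxns.contains(r.txnId)) {
                visible.add(r.value);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Row> delta = Arrays.asList(
                new Row(2, "r1"),   // committed streaming write
                new Row(3, "r2"),   // aborted streaming write
                new Row(4, "r3"));  // committed streaming write
        // Correct read: abort metadata still present, 2 rows visible.
        System.out.println(read(delta, Collections.singleton(3L)).size());
        // Compaction was skipped, so the delta file is untouched, but
        // the cleaner has dropped the abort record: the row reappears.
        System.out.println(read(delta, Collections.<Long>emptySet()).size());
    }
}
```

This is why the bug needs both halves of step 8: the cleaner removing the abort metadata is only safe if compaction (or the cleaner itself) also removes the data the aborted transaction wrote.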
[jira] [Updated] (HIVE-21737) Upgrade Avro to version 1.10.1
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated HIVE-21737: Summary: Upgrade Avro to version 1.10.1 (was: Upgrade Avro to version 1.10.0) > Upgrade Avro to version 1.10.1 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Avro >= 1.9.x bring a lot of fixes including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519740=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519740 ] ASF GitHub Bot logged work on HIVE-21737: - Author: ASF GitHub Bot Created on: 03/Dec/20 15:52 Start Date: 03/Dec/20 15:52 Worklog Time Spent: 10m Work Description: iemejia commented on a change in pull request #1635: URL: https://github.com/apache/hive/pull/1635#discussion_r535356472 ## File path: llap-tez/pom.xml ## @@ -104,6 +104,11 @@ hadoop-yarn-registry true + + org.xerial.snappy + snappy-java Review comment: Snappy and zstd are now optional on Avro, you probably have these already defined somewhere else but the tests in this module were complaining. I don't know if this could imply some runtime issue in other parts of the codebase that tests may not match, so worth to think about this if it is eventually the case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519740) Time Spent: 3h 20m (was: 3h 10m) > Upgrade Avro to version 1.10.0 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Avro >= 1.9.x bring a lot of fixes including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519736=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519736 ] ASF GitHub Bot logged work on HIVE-21737: - Author: ASF GitHub Bot Created on: 03/Dec/20 15:50 Start Date: 03/Dec/20 15:50 Worklog Time Spent: 10m Work Description: iemejia commented on a change in pull request #1635: URL: https://github.com/apache/hive/pull/1635#discussion_r535354656 ## File path: ql/pom.xml ## @@ -220,7 +220,7 @@ org.apache.avro avro-mapred - hadoop2 + ${avro.version} Review comment: Updated thx This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519736) Time Spent: 3h 10m (was: 3h) > Upgrade Avro to version 1.10.0 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Avro >= 1.9.x bring a lot of fixes including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishen Das reassigned HIVE-24482: - Assignee: Kishen Das > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-24482) Advance write Id during AlterTableAddConstraint DDL
[ https://issues.apache.org/jira/browse/HIVE-24482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24482 started by Kishen Das. - > Advance write Id during AlterTableAddConstraint DDL > --- > > Key: HIVE-24482 > URL: https://issues.apache.org/jira/browse/HIVE-24482 > Project: Hive > Issue Type: Sub-task >Reporter: Kishen Das >Assignee: Kishen Das >Priority: Major > > For AlterTableAddConstraint related DDL tasks, although we might be advancing > the write ID, looks like it's not updated correctly during the Analyzer > phase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24481) Skipped compaction can cause data corruption with streaming
[ https://issues.apache.org/jira/browse/HIVE-24481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Varga reassigned HIVE-24481: -- > Skipped compaction can cause data corruption with streaming > --- > > Key: HIVE-24481 > URL: https://issues.apache.org/jira/browse/HIVE-24481 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Peter Varga >Priority: Major > > Timeline: > 1. create a partitioned table, add one static partition > 2. transaction 1 writes delta_1, and aborts > 3. create a streaming connection, with batch size 3, withStaticPartitionValues with > the existing partition > 4. beginTransaction, write, commitTransaction > 5. beginTransaction, write, abortTransaction > 6. beginTransaction, write, commitTransaction > 7. close the connection; the count of the table is 2 > 8. run manual minor compaction on the partition. It will skip compaction, > because deltacount = 1, but clean, because there is the aborted txn1 > 9. the cleaner will remove both aborted records from txn_components > 10. wait for acidhousekeeper to remove the empty aborted txns > 11. select * from table returns *3* records, reading the aborted record -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519730=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519730 ] ASF GitHub Bot logged work on HIVE-21737: - Author: ASF GitHub Bot Created on: 03/Dec/20 15:36 Start Date: 03/Dec/20 15:36 Worklog Time Spent: 10m Work Description: wangyum commented on a change in pull request #1635: URL: https://github.com/apache/hive/pull/1635#discussion_r535340015 ## File path: ql/pom.xml ## @@ -220,7 +220,7 @@ org.apache.avro avro-mapred - hadoop2 + ${avro.version} Review comment: Do not need version here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519730) Time Spent: 3h (was: 2h 50m) > Upgrade Avro to version 1.10.0 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 3h > Remaining Estimate: 0h > > Avro >= 1.9.x bring a lot of fixes including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519718 ] ASF GitHub Bot logged work on HIVE-23965: - Author: ASF GitHub Bot Created on: 03/Dec/20 15:19 Start Date: 03/Dec/20 15:19 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on pull request #1714: URL: https://github.com/apache/hive/pull/1714#issuecomment-738072615 > @kgyrtkirk The previous issue was not due to flakiness. The schema of the metastore changed between the time that pre-commit tests were run and the time that this PR was merged to master. To avoid a similar situation the PR should be merged ASAP after running the pre-commit with the tip of the master. okay; the last precommit run for this changeset was executed on 11.27 - which was a few days ago; let's wait for the 3 runs I've scheduled and merge it right after those This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519718) Time Spent: 7.5h (was: 7h 20m) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 7.5h > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519717 ] ASF GitHub Bot logged work on HIVE-23965: - Author: ASF GitHub Bot Created on: 03/Dec/20 15:18 Start Date: 03/Dec/20 15:18 Worklog Time Spent: 10m Work Description: zabetak commented on a change in pull request #1714: URL: https://github.com/apache/hive/pull/1714#discussion_r535322152 ## File path: standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql ## @@ -0,0 +1,77 @@ +-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql +SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0'; Review comment: I created HIVE-24480 for that purpose. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519717) Time Spent: 7h 20m (was: 7h 10m) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 7h 20m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243250#comment-17243250 ] David Mollitor commented on HIVE-21737: --- Also, some of the work I've done: # AVRO-2335: Drop dependency on JODA Time # AVRO-2333: Drop commons-codec dependency # AVRO-2333: Drop commons-logging dependency # AVRO-2061: Better error messages # AVRO-2056: Better performance with Double types # AVRO-2696: Better performance for Doubles and Floats # AVRO-2801: Better performance when using Strings in Maps # Lots of other small improvements In particular, AVRO-2335, AVRO-2333, and AVRO-2061 were based on my experience with Hive and Avro integration. > Upgrade Avro to version 1.10.0 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Avro >= 1.9.x bring a lot of fixes including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)
[ https://issues.apache.org/jira/browse/HIVE-24474?focusedWorklogId=519710=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519710 ] ASF GitHub Bot logged work on HIVE-24474: - Author: ASF GitHub Bot Created on: 03/Dec/20 15:06 Start Date: 03/Dec/20 15:06 Worklog Time Spent: 10m Work Description: klcopp opened a new pull request #1735: URL: https://github.com/apache/hive/pull/1735 ### What changes were proposed in this pull request? ### Why are the changes needed? See HIVE-24474 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manually, since the TxnAbortedException only appears in the logs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519710) Remaining Estimate: 0h Time Spent: 10m > Failed compaction always logs TxnAbortedException (again) > - > > Key: HIVE-24474 > URL: https://issues.apache.org/jira/browse/HIVE-24474 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Re-introduced with HIVE-24096. > If there is an error during compaction, the compaction's txn is aborted but > in the finally clause, we try to commit it (commitTxnIfSet), so Worker throws > a TxnAbortedException. > We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is > aborted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)
[ https://issues.apache.org/jira/browse/HIVE-24474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karen Coppage updated HIVE-24474: - Summary: Failed compaction always logs TxnAbortedException (again) (was: Failed compaction always throws TxnAbortedException (again)) > Failed compaction always logs TxnAbortedException (again) > - > > Key: HIVE-24474 > URL: https://issues.apache.org/jira/browse/HIVE-24474 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Fix For: 4.0.0 > > > Re-introduced with HIVE-24096. > If there is an error during compaction, the compaction's txn is aborted but > in the finally clause, we try to commit it (commitTxnIfSet), so Worker throws > a TxnAbortedException. > We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is > aborted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24474) Failed compaction always logs TxnAbortedException (again)
[ https://issues.apache.org/jira/browse/HIVE-24474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24474: -- Labels: pull-request-available (was: ) > Failed compaction always logs TxnAbortedException (again) > - > > Key: HIVE-24474 > URL: https://issues.apache.org/jira/browse/HIVE-24474 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Re-introduced with HIVE-24096. > If there is an error during compaction, the compaction's txn is aborted but > in the finally clause, we try to commit it (commitTxnIfSet), so Worker throws > a TxnAbortedException. > We should set compactorTxnId to TXN_ID_NOT_SET if the compaction's txn is > aborted. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24479) Introduce setting to set lower bound of hash aggregation reduction.
[ https://issues.apache.org/jira/browse/HIVE-24479?focusedWorklogId=519709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519709 ] ASF GitHub Bot logged work on HIVE-24479: - Author: ASF GitHub Bot Created on: 03/Dec/20 15:05 Start Date: 03/Dec/20 15:05 Worklog Time Spent: 10m Work Description: kasakrisz opened a new pull request #1734: URL: https://github.com/apache/hive/pull/1734 ### What changes were proposed in this pull request? Introduce a new HiveConf setting to set the lower bound of hash aggregation reduction. The default value is 0.5. During query compilation the hash aggregation reduction is adjusted by calculating its effectiveness. With this patch, if the adjusted reduction value is less than the configured lower bound, the lower bound value will be used. ### Why are the changes needed? In some cases we end up with 0 values, forcing the Group by operator to skip hash aggregates and choose streaming mode. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519709) Remaining Estimate: 0h Time Spent: 10m > Introduce setting to set lower bound of hash aggregation reduction. > --- > > Key: HIVE-24479 > URL: https://issues.apache.org/jira/browse/HIVE-24479 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 4.0.0 >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > * The default setting of the hash group by min reduction % is 0.99. > * During compilation, we check its effectiveness and adjust it accordingly in > {{SetHashGroupByMinReduction}}: > {code} > float defaultMinReductionHashAggrFactor = desc.getMinReductionHashAggr(); > float minReductionHashAggrFactor = 1f - ((float) ndvProduct / numRows); > if (minReductionHashAggrFactor < defaultMinReductionHashAggrFactor) { > desc.setMinReductionHashAggr(minReductionHashAggrFactor); > } > {code} > For certain queries, this computation turns out to be "0". > This forces the operator to skip HashAggregates completely, so it always ends up > choosing streaming mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
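The adjustment quoted in the issue, extended with the lower-bound clamp the PR proposes, can be sketched in standalone form. This is an illustrative sketch of the idea, not the actual `SetHashGroupByMinReduction` code; the method name and the 0.5 default are taken from the PR description, everything else is invented:

```java
// Sketch of the proposed lower bound on the min-reduction factor:
// a skewed ndvProduct/numRows estimate can drive the adjusted factor
// to 0, which disables hash aggregation entirely; clamping it keeps
// the operator willing to try hashing.
public class MinReductionClamp {
    static float adjustedMinReduction(float defaultFactor,
                                      long ndvProduct, long numRows,
                                      float lowerBound) {
        // Same computation as SetHashGroupByMinReduction quoted above.
        float adjusted = 1f - ((float) ndvProduct / numRows);
        if (adjusted < defaultFactor) {
            // New behavior: never drop below the configured lower bound.
            return Math.max(adjusted, lowerBound);
        }
        return defaultFactor;
    }

    public static void main(String[] args) {
        // ndvProduct == numRows would yield 0 without the clamp;
        // with the PR's default lower bound it becomes 0.5.
        System.out.println(adjustedMinReduction(0.99f, 1000, 1000, 0.5f));
    }
}
```

When the estimate is reasonable (say 100 distinct values over 1000 rows) the clamp is inert and the adjusted factor 0.9 is used unchanged; it only kicks in for the degenerate near-zero cases the issue describes.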
[jira] [Updated] (HIVE-24479) Introduce setting to set lower bound of hash aggregation reduction.
[ https://issues.apache.org/jira/browse/HIVE-24479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24479: -- Labels: pull-request-available (was: ) > Introduce setting to set lower bound of hash aggregation reduction. > --- > > Key: HIVE-24479 > URL: https://issues.apache.org/jira/browse/HIVE-24479 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 4.0.0 >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > * Default setting of hash group by min reduction % is 0.99. > * During compilation, we check its effectiveness and adjust it accordingly in > {{SetHashGroupByMinReduction}}: > {code} > float defaultMinReductionHashAggrFactor = desc.getMinReductionHashAggr(); > float minReductionHashAggrFactor = 1f - ((float) ndvProduct / numRows); > if (minReductionHashAggrFactor < defaultMinReductionHashAggrFactor) { > desc.setMinReductionHashAggr(minReductionHashAggrFactor); > } > {code} > For certain queries, this computation turns out to be "0". > This forces operator to skip HashAggregates completely and always ends up > choosing streaming mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24346) Store HPL/SQL packages into HMS
[ https://issues.apache.org/jira/browse/HIVE-24346?focusedWorklogId=519706&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519706 ] ASF GitHub Bot logged work on HIVE-24346: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:56 Start Date: 03/Dec/20 14:56 Worklog Time Spent: 10m Work Description: zeroflag opened a new pull request #1733: URL: https://github.com/apache/hive/pull/1733 HPLSQL procedures are already stored in HMS but packages weren't. This patch addresses this and makes use of HMS as a storage backend for HPLSQL packages. The whole package code is stored in the RDBMS as text. When the client references a package, hplsql will look it up in HMS via the thrift API. PL/SQL allows us to define the package header and the implementation separately, or change the body later, therefore there are 2 columns in the table, one for the header and one for the body. cc: @kgyrtkirk This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519706) Remaining Estimate: 0h Time Spent: 10m > Store HPL/SQL packages into HMS > --- > > Key: HIVE-24346 > URL: https://issues.apache.org/jira/browse/HIVE-24346 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24346) Store HPL/SQL packages into HMS
[ https://issues.apache.org/jira/browse/HIVE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24346: -- Labels: pull-request-available (was: ) > Store HPL/SQL packages into HMS > --- > > Key: HIVE-24346 > URL: https://issues.apache.org/jira/browse/HIVE-24346 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519705=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519705 ] ASF GitHub Bot logged work on HIVE-21737: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:54 Start Date: 03/Dec/20 14:54 Worklog Time Spent: 10m Work Description: iemejia edited a comment on pull request #1635: URL: https://github.com/apache/hive/pull/1635#issuecomment-738045721 Do you have an estimate of where a possible vote to get the previous changes (and hopefully this one) released as 2.3.8? Thinking about Spark could take it in. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519705) Time Spent: 2h 50m (was: 2h 40m) > Upgrade Avro to version 1.10.0 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Avro >= 1.9.x bring a lot of fixes including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519704=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519704 ] ASF GitHub Bot logged work on HIVE-21737: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:54 Start Date: 03/Dec/20 14:54 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #1635: URL: https://github.com/apache/hive/pull/1635#issuecomment-738045721 Do you have an estimate of where a possible vote to get the previous changes (and hopefully this one) released as 2.3.8? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519704) Time Spent: 2h 40m (was: 2.5h) > Upgrade Avro to version 1.10.0 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Avro >= 1.9.x bring a lot of fixes including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24479) Introduce setting to set lower bound of hash aggregation reduction.
[ https://issues.apache.org/jira/browse/HIVE-24479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Kasa reassigned HIVE-24479: - > Introduce setting to set lower bound of hash aggregation reduction. > --- > > Key: HIVE-24479 > URL: https://issues.apache.org/jira/browse/HIVE-24479 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 4.0.0 >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Fix For: 4.0.0 > > > * The default setting of the hash group-by min reduction % is 0.99. > * During compilation, we check its effectiveness and adjust it accordingly in > {{SetHashGroupByMinReduction}}: > {code} > float defaultMinReductionHashAggrFactor = desc.getMinReductionHashAggr(); > float minReductionHashAggrFactor = 1f - ((float) ndvProduct / numRows); > if (minReductionHashAggrFactor < defaultMinReductionHashAggrFactor) { > desc.setMinReductionHashAggr(minReductionHashAggrFactor); > } > {code} > For certain queries, this computation turns out to be "0". > This forces the operator to skip HashAggregates completely, so it always ends up > choosing streaming mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
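The adjustment quoted in the issue above can be sketched as a standalone model (an assumption-laden sketch, not Hive's actual SetHashGroupByMinReduction class; the method and class names here are illustrative): when the estimated number of distinct groups (ndvProduct) approaches the row count, the computed factor drops toward 0, which is what disables hash aggregation.

```java
// Hypothetical standalone model of the min-reduction adjustment quoted
// in the issue description; names are illustrative, not Hive's.
public class MinReductionSketch {

    // Mirrors the quoted logic: keep the computed factor only when it is
    // below the configured default.
    static float adjustedMinReduction(float defaultFactor, long ndvProduct, long numRows) {
        float computed = 1f - ((float) ndvProduct / numRows);
        return (computed < defaultFactor) ? computed : defaultFactor;
    }

    public static void main(String[] args) {
        // Few distinct groups relative to rows: the default 0.99 is kept.
        System.out.println(adjustedMinReduction(0.99f, 10, 1_000_000));
        // Every row is its own group: factor becomes 0, forcing streaming mode.
        System.out.println(adjustedMinReduction(0.99f, 1_000_000, 1_000_000));
    }
}
```

With ndvProduct equal to numRows the factor is exactly 0, matching the observation that such queries always end up in streaming mode.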
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.0
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=519702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519702 ] ASF GitHub Bot logged work on HIVE-21737: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:52 Start Date: 03/Dec/20 14:52 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #1635: URL: https://github.com/apache/hive/pull/1635#issuecomment-738044346 PR updated to the latest version of Avro; let's see if this gets us more tests passing now. :crossed_fingers: @sunchao @wangyum This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519702) Time Spent: 2.5h (was: 2h 20m) > Upgrade Avro to version 1.10.0 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 2.5h > Remaining Estimate: 0h > > Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24478) Inner GroupBy with Distinct SemanticException: Invalid column reference
[ https://issues.apache.org/jira/browse/HIVE-24478?focusedWorklogId=519699=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519699 ] ASF GitHub Bot logged work on HIVE-24478: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:48 Start Date: 03/Dec/20 14:48 Worklog Time Spent: 10m Work Description: pgaref opened a new pull request #1732: URL: https://github.com/apache/hive/pull/1732 Change-Id: I29000afd1c47e59d07db74a212a7629e2b5afe73 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519699) Remaining Estimate: 0h Time Spent: 10m > Inner GroupBy with Distinct SemanticException: Invalid column reference > --- > > Key: HIVE-24478 > URL: https://issues.apache.org/jira/browse/HIVE-24478 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > {code:java} > CREATE TABLE tmp_src1( > `npp` string, > `nsoc` string) stored as orc; > INSERT INTO tmp_src1 (npp,nsoc) VALUES ('1-1000CG61', '7273111'); > SELECT `min_nsoc` > FROM > (SELECT `npp`, > MIN(`nsoc`) AS `min_nsoc`, > COUNT(DISTINCT `nsoc`) AS `nb_nsoc` > FROM tmp_src1 > GROUP BY `npp`) `a` > WHERE `nb_nsoc` > 0; > {code} > Issue: > {code:java} > org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column > reference 'nsoc' at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:5405) > {code} > Query runs fine when we include `nb_nsoc` in the Select expression -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24478) Inner GroupBy with Distinct SemanticException: Invalid column reference
[ https://issues.apache.org/jira/browse/HIVE-24478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24478: -- Labels: pull-request-available (was: ) > Inner GroupBy with Distinct SemanticException: Invalid column reference > --- > > Key: HIVE-24478 > URL: https://issues.apache.org/jira/browse/HIVE-24478 > Project: Hive > Issue Type: Bug >Reporter: Panagiotis Garefalakis >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code:java} > CREATE TABLE tmp_src1( > `npp` string, > `nsoc` string) stored as orc; > INSERT INTO tmp_src1 (npp,nsoc) VALUES ('1-1000CG61', '7273111'); > SELECT `min_nsoc` > FROM > (SELECT `npp`, > MIN(`nsoc`) AS `min_nsoc`, > COUNT(DISTINCT `nsoc`) AS `nb_nsoc` > FROM tmp_src1 > GROUP BY `npp`) `a` > WHERE `nb_nsoc` > 0; > {code} > Issue: > {code:java} > org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Invalid column > reference 'nsoc' at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:5405) > {code} > Query runs fine when we include `nb_nsoc` in the Select expression -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24230) Integrate HPL/SQL into HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-24230?focusedWorklogId=519698=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519698 ] ASF GitHub Bot logged work on HIVE-24230: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:44 Start Date: 03/Dec/20 14:44 Worklog Time Spent: 10m Work Description: kgyrtkirk merged pull request #1633: URL: https://github.com/apache/hive/pull/1633 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519698) Time Spent: 3h (was: 2h 50m) > Integrate HPL/SQL into HiveServer2 > -- > > Key: HIVE-24230 > URL: https://issues.apache.org/jira/browse/HIVE-24230 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > HPL/SQL is a standalone command line program that can store and load scripts > from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL > depends on Hive and not the other way around. > Changing the dependency order between HPL/SQL and HiveServer would open up > some possibilities which are currently not feasible to implement. For example > one might want to use a third-party SQL tool to run selects on stored > procedure (or rather function in this case) outputs. > {code:java} > SELECT * from myStoredProcedure(1, 2); {code} > HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not > work with the current architecture. > Another important factor is performance. Declarative SQL commands are sent to > Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC > and use HiveServer’s internal API for compilation and execution.
> The third factor is that existing tools like Beeline or Hue cannot be used > with HPL/SQL since it has its own, separate CLI. > > To make it easier to implement, we keep things separated internally at > first, by introducing a Hive session-level JDBC parameter. > {code:java} > jdbc:hive2://localhost:1/default;hplsqlMode=true {code} > > The hplsqlMode indicates that we are in procedural SQL mode where the user > can create and call stored procedures. HPLSQL allows you to write any kind of > procedural statement at the top level. This patch doesn't limit this, but it > might be better to eventually restrict what statements are allowed outside of > stored procedures. > > Since HPLSQL and Hive are running in the same process, there is no need to use > the JDBC driver between them. The patch adds an abstraction with 2 different > implementations, one for executing queries on JDBC (for keeping the existing > behaviour) and another one for directly calling Hive's compiler. In HPLSQL > mode the latter is used. > Internally, a new operation (HplSqlOperation) and operation type > (PROCEDURAL_SQL) were added, which works similarly to the SQLOperation but > uses the hplsql interpreter to execute arbitrary scripts. This operation > might spawn new SQLOperations. > For example, consider the following statement: > {code:java} > FOR i in 1..10 LOOP > SELECT * FROM table > END LOOP;{code} > We send this to beeline while we're in hplsql mode. Hive will create an hplsql > interpreter and store it in the session state. A new HplSqlOperation is > created to run the script on the interpreter. > HPLSQL knows how to execute the for loop, but it'll call Hive to run the > select expression. The HplSqlOperation is notified when the select reads a > row and accumulates the rows into a RowSet (memory consumption needs to be > considered here), which can be retrieved via Thrift from the client side. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24230) Integrate HPL/SQL into HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich resolved HIVE-24230. - Fix Version/s: 4.0.0 Resolution: Fixed merged into master. Thank you [~amagyar]! > Integrate HPL/SQL into HiveServer2 > -- > > Key: HIVE-24230 > URL: https://issues.apache.org/jira/browse/HIVE-24230 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 3h > Remaining Estimate: 0h > > HPL/SQL is a standalone command line program that can store and load scripts > from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL > depends on Hive and not the other way around. > Changing the dependency order between HPL/SQL and HiveServer would open up > some possibilities which are currently not feasible to implement. For example > one might want to use a third-party SQL tool to run selects on stored > procedure (or rather function in this case) outputs. > {code:java} > SELECT * from myStoredProcedure(1, 2); {code} > HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not > work with the current architecture. > Another important factor is performance. Declarative SQL commands are sent to > Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC > and use HiveServer’s internal API for compilation and execution. > The third factor is that existing tools like Beeline or Hue cannot be used > with HPL/SQL since it has its own, separate CLI. > > To make it easier to implement, we keep things separated internally at > first, by introducing a Hive session-level JDBC parameter. > {code:java} > jdbc:hive2://localhost:1/default;hplsqlMode=true {code} > > The hplsqlMode indicates that we are in procedural SQL mode where the user > can create and call stored procedures.
HPLSQL allows you to write any kind of > procedural statement at the top level. This patch doesn't limit this, but it > might be better to eventually restrict what statements are allowed outside of > stored procedures. > > Since HPLSQL and Hive are running in the same process, there is no need to use > the JDBC driver between them. The patch adds an abstraction with 2 different > implementations, one for executing queries on JDBC (for keeping the existing > behaviour) and another one for directly calling Hive's compiler. In HPLSQL > mode the latter is used. > Internally, a new operation (HplSqlOperation) and operation type > (PROCEDURAL_SQL) were added, which works similarly to the SQLOperation but > uses the hplsql interpreter to execute arbitrary scripts. This operation > might spawn new SQLOperations. > For example, consider the following statement: > {code:java} > FOR i in 1..10 LOOP > SELECT * FROM table > END LOOP;{code} > We send this to beeline while we're in hplsql mode. Hive will create an hplsql > interpreter and store it in the session state. A new HplSqlOperation is > created to run the script on the interpreter. > HPLSQL knows how to execute the for loop, but it'll call Hive to run the > select expression. The HplSqlOperation is notified when the select reads a > row and accumulates the rows into a RowSet (memory consumption needs to be > considered here), which can be retrieved via Thrift from the client side. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519687 ] ASF GitHub Bot logged work on HIVE-23965: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:37 Start Date: 03/Dec/20 14:37 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1714: URL: https://github.com/apache/hive/pull/1714#discussion_r535281104 ## File path: standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql ## @@ -0,0 +1,77 @@ +-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql +SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0'; Review comment: You go with this, but I think we would need the follow up soon, this two upgrade path will cause some confusions :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519687) Time Spent: 7h 10m (was: 7h) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 7h 10m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519682=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519682 ] ASF GitHub Bot logged work on HIVE-23965: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:33 Start Date: 03/Dec/20 14:33 Worklog Time Spent: 10m Work Description: zabetak commented on a change in pull request #1714: URL: https://github.com/apache/hive/pull/1714#discussion_r535276481 ## File path: standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql ## @@ -0,0 +1,77 @@ +-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql +SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0'; Review comment: Sure we can try do this but given that it might take some time I would rather leave it as a follow up. WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519682) Time Spent: 7h (was: 6h 50m) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 7h > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519678=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519678 ] ASF GitHub Bot logged work on HIVE-23965: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:25 Start Date: 03/Dec/20 14:25 Worklog Time Spent: 10m Work Description: pvargacl commented on a change in pull request #1714: URL: https://github.com/apache/hive/pull/1714#discussion_r535268741 ## File path: standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql ## @@ -0,0 +1,77 @@ +-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql +SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0'; Review comment: How big is the differences between 3.2.0 and 3.1.3000? Is it not possible to manually apply changes to the schema to be 3.2.0 in the image and then use 3.2.0 -> 4.0.0 update path? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519678) Time Spent: 6h 50m (was: 6h 40m) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 6h 50m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from 200GB > dataset, and others from a 3GB dataset. 
This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24460) Refactor Get Next Event ID for DbNotificationListener
[ https://issues.apache.org/jira/browse/HIVE-24460?focusedWorklogId=519672=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519672 ] ASF GitHub Bot logged work on HIVE-24460: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:22 Start Date: 03/Dec/20 14:22 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1725: URL: https://github.com/apache/hive/pull/1725#issuecomment-738024630 @pvary @nrg4878 Can you please take a look at this one too? I am doing quite a bit of work within this class. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519672) Time Spent: 40m (was: 0.5h) > Refactor Get Next Event ID for DbNotificationListener > - > > Key: HIVE-24460 > URL: https://issues.apache.org/jira/browse/HIVE-24460 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Refactor event ID generation to match notification log ID generation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24460) Refactor Get Next Event ID for DbNotificationListener
[ https://issues.apache.org/jira/browse/HIVE-24460?focusedWorklogId=519674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519674 ] ASF GitHub Bot logged work on HIVE-24460: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:22 Start Date: 03/Dec/20 14:22 Worklog Time Spent: 10m Work Description: belugabehr edited a comment on pull request #1725: URL: https://github.com/apache/hive/pull/1725#issuecomment-738024630 @pvary @nrg4878 Can you please take a look at this one too? I am doing quite a bit of work within this class in this, and other, PRs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519674) Time Spent: 50m (was: 40m) > Refactor Get Next Event ID for DbNotificationListener > - > > Key: HIVE-24460 > URL: https://issues.apache.org/jira/browse/HIVE-24460 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Refactor event ID generation to match notification log ID generation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=519673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519673 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:22 Start Date: 03/Dec/20 14:22 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #1710: URL: https://github.com/apache/hive/pull/1710 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519673) Time Spent: 1h 10m (was: 1h) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen in a single transaction and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519669=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519669 ] ASF GitHub Bot logged work on HIVE-23965: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:20 Start Date: 03/Dec/20 14:20 Worklog Time Spent: 10m Work Description: zabetak commented on a change in pull request #1714: URL: https://github.com/apache/hive/pull/1714#discussion_r535264287 ## File path: standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql ## @@ -0,0 +1,77 @@ +-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql +SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0'; Review comment: Yes, that's the idea for now; I couldn't think of a better alternative at the moment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519669) Time Spent: 6h 40m (was: 6.5h) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: master355.tgz > > Time Spent: 6h 40m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios, though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24432) Delete Notification Events in Batches
[ https://issues.apache.org/jira/browse/HIVE-24432?focusedWorklogId=519668=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519668 ] ASF GitHub Bot logged work on HIVE-24432: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:19 Start Date: 03/Dec/20 14:19 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #1710: URL: https://github.com/apache/hive/pull/1710 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519668) Time Spent: 1h (was: 50m) > Delete Notification Events in Batches > - > > Key: HIVE-24432 > URL: https://issues.apache.org/jira/browse/HIVE-24432 > Project: Hive > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Notification events are loaded in batches (reduces memory pressure on the > HMS), but all of the deletes happen in a single transaction and, when > deleting many records, can put a lot of pressure on the backend database. > Instead, delete events in batches (in different transactions) as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
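The batching idea in the HIVE-24432 description can be sketched with an in-memory model (an assumption: a plain list of event IDs stands in for the notification log table, and each removeAll call stands in for one short delete transaction; this is not Hive's actual schema or API):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of batched deletion: expired event ids are removed in
// fixed-size chunks, each chunk playing the role of one transaction.
public class BatchDeleteSketch {

    static int deleteInBatches(List<Integer> events, int maxExpiredId, int batchSize) {
        int transactions = 0;
        while (true) {
            List<Integer> batch = new ArrayList<>();
            for (Integer id : events) {
                if (id <= maxExpiredId && batch.size() < batchSize) {
                    batch.add(id);
                }
            }
            if (batch.isEmpty()) {
                break; // nothing left to expire
            }
            events.removeAll(batch); // one short "transaction"
            transactions++;
        }
        return transactions;
    }

    public static void main(String[] args) {
        List<Integer> events = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            events.add(i);
        }
        // Expire ids 1..7 in batches of 3: three small transactions
        // instead of one large delete.
        int txns = deleteInBatches(events, 7, 3);
        System.out.println(txns + " transactions, " + events.size() + " events left");
    }
}
```

Each iteration holds the lock/transaction only for one small chunk, which is the pressure-relief effect the issue is after.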
[jira] [Work logged] (HIVE-24468) Use Event Time instead of Current Time in Notification Log DB Entry
[ https://issues.apache.org/jira/browse/HIVE-24468?focusedWorklogId=519667=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519667 ] ASF GitHub Bot logged work on HIVE-24468: - Author: ASF GitHub Bot Created on: 03/Dec/20 14:18 Start Date: 03/Dec/20 14:18 Worklog Time Spent: 10m Work Description: belugabehr commented on pull request #1728: URL: https://github.com/apache/hive/pull/1728#issuecomment-738021939 @pvary Thanks for the review. So, the answer is yes. The timestamps could be out of order. If two instances of HMS are running at the same time, let's say they both create events, at times T and T+1. The HMS which generates the event at time T could experience a long GC and then try to submit it to the DB. At that point, the event at T+1 is going to be submitted first to the table, and receive a lower ID. However, there does not seem to be any documentation around this constraint. 1. Are there docs somewhere that state that the event times will always be increasing from one record to the next? 2. Isn't it a bit confusing that they are assigned an arbitrary time that masks the true event time (debugging, audit issues)? 3. The timestamps are generated using each HMS's "now" time, which could possibly not be adequately synced across HMS instances, putting in-order timestamps in jeopardy. If in-order timestamps are a requirement, they should be generated using the `now()` of the SQL server itself as a single source of "now" truth. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 519667) Time Spent: 50m (was: 40m) > Use Event Time instead of Current Time in Notification Log DB Entry > --- > > Key: HIVE-24468 > URL: https://issues.apache.org/jira/browse/HIVE-24468 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
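The race described in the comment above can be made concrete with a toy model. All names here (`EventOrderDemo`, `commit`, `Event`) are hypothetical illustrations, not HMS code — the only real behavior modeled is that the database assigns ids in commit order, not in event-creation order:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of the ordering hazard: ids are handed out at commit time. */
public class EventOrderDemo {
    static final AtomicLong NEXT_ID = new AtomicLong(1);

    static final class Event {
        final long id;        // assigned by the "database" at commit time
        final long eventTime; // local clock of the HMS that created the event
        Event(long id, long eventTime) { this.id = id; this.eventTime = eventTime; }
    }

    /** The backing table assigns monotonically increasing ids in commit order. */
    static Event commit(long eventTime) {
        return new Event(NEXT_ID.getAndIncrement(), eventTime);
    }

    public static void main(String[] args) {
        long t = 100;
        // HMS-A creates an event at time T but stalls (e.g. a long GC) before committing.
        // HMS-B creates its event at T+1 and commits first, so it gets the lower id:
        Event later = commit(t + 1);
        Event earlier = commit(t); // HMS-A finally commits
        // Ordering by id no longer matches ordering by event time:
        System.out.println(later.id < earlier.id && later.eventTime > earlier.eventTime);
    }
}
```

Routing the timestamp through the database (e.g. letting the SQL server fill the column from its own clock at insert time, as suggested in point 3) removes the cross-host clock-skew variable, though it still does not make event time and id order agree when commits are delayed.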
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519665 ]

ASF GitHub Bot logged work on HIVE-23965:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/20 14:12
            Start Date: 03/Dec/20 14:12
    Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1714:
URL: https://github.com/apache/hive/pull/1714#discussion_r535258754

## File path: standalone-metastore/metastore-server/src/test/resources/sql/postgres/upgrade-3.1.3000-to-4.0.0.postgres.sql

@@ -0,0 +1,77 @@
+-- The file has some overlapping with upgrade-3.2.0-to-4.0.0.postgres.sql
+SELECT 'Upgrading MetaStore schema from 3.1.3000 to 4.0.0';

Review comment: @zabetak qq: will this file be maintained from now on in parallel with upgrade-3.2.0-to-4.0.0.postgres.sql? Is this needed for upstream?

Issue Time Tracking
-------------------

    Worklog Id: (was: 519665)
    Time Spent: 6.5h  (was: 6h 20m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom configs
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-23965
>                 URL: https://issues.apache.org/jira/browse/HIVE-23965
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: master355.tgz
>
>          Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB
> dataset, and others from a 3GB dataset. This mix leads to plans that may
> never appear when using an actual TPC-DS dataset.
> The existing statistics do not contain information about partitions, something
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default
> configuration (hive-site.xml). In real-life scenarios though some of the
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated
> TPCDS30TB metastore dump along with some custom hive configurations.
[jira] [Work logged] (HIVE-24475) Generalize fixacidkeyindex utility
[ https://issues.apache.org/jira/browse/HIVE-24475?focusedWorklogId=519658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519658 ]

ASF GitHub Bot logged work on HIVE-24475:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/20 14:03
            Start Date: 03/Dec/20 14:03
    Worklog Time Spent: 10m

Work Description: pvargacl commented on a change in pull request #1730:
URL: https://github.com/apache/hive/pull/1730#discussion_r535251574

## File path: ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFixAcidKeyIndex.java

@@ -243,6 +233,12 @@ public void testInvalidKeyIndex() throws Exception {
     checkInvalidKeyIndex(testFilePath);
     // Try fixing, this should result in new fixed file.
     fixInvalidIndex(testFilePath);
+
+    // Multiple stripes
+    createTestAcidFile(testFilePath, 12000, new FaultyKeyIndexBuilder());
+    checkInvalidKeyIndex(testFilePath);
+    // Try fixing, this should result in new fixed file.
+    fixInvalidIndex(testFilePath);

Review comment: Ah ok, missed that.

Issue Time Tracking
-------------------

    Worklog Id: (was: 519658)
    Time Spent: 40m  (was: 0.5h)

> Generalize fixacidkeyindex utility
> ----------------------------------
>
>                 Key: HIVE-24475
>                 URL: https://issues.apache.org/jira/browse/HIVE-24475
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC, Transactions
>    Affects Versions: 3.0.0
>            Reporter: Antal Sinkovits
>            Assignee: Antal Sinkovits
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> There is a utility in hive which can validate/fix corrupted
> hive.acid.key.index:
> hive --service fixacidkeyindex
> Unfortunately it is only tailored for a specific problem
> (https://issues.apache.org/jira/browse/HIVE-18907), instead of generally
> validating and recovering the hive.acid.key.index from the stripe data itself.
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519657&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519657 ]

ASF GitHub Bot logged work on HIVE-23965:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/20 14:03
            Start Date: 03/Dec/20 14:03
    Worklog Time Spent: 10m

Work Description: zabetak commented on pull request #1714:
URL: https://github.com/apache/hive/pull/1714#issuecomment-738013284

> How do we know that the previous issue doesn't happen again?
> I'll run the check on the PR a few more times... just in case

@kgyrtkirk The previous issue was not due to flakiness. The schema of the metastore changed between the time that the pre-commit tests were run and the time that this PR was merged to master. To avoid a similar situation the PR should be merged ASAP after running the pre-commit with the tip of master.

Issue Time Tracking
-------------------

    Worklog Id: (was: 519657)
    Time Spent: 6h 20m  (was: 6h 10m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom configs
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-23965
>                 URL: https://issues.apache.org/jira/browse/HIVE-23965
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: master355.tgz
>
>          Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB
> dataset, and others from a 3GB dataset. This mix leads to plans that may
> never appear when using an actual TPC-DS dataset.
> The existing statistics do not contain information about partitions, something
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default
> configuration (hive-site.xml). In real-life scenarios though some of the
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated
> TPCDS30TB metastore dump along with some custom hive configurations.
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519649 ]

ASF GitHub Bot logged work on HIVE-23965:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/20 13:42
            Start Date: 03/Dec/20 13:42
    Worklog Time Spent: 10m

Work Description: kgyrtkirk commented on pull request #1714:
URL: https://github.com/apache/hive/pull/1714#issuecomment-738001343

How do we know that the previous issue doesn't happen again?
I'll run the check on the PR a few more times... just in case

Issue Time Tracking
-------------------

    Worklog Id: (was: 519649)
    Time Spent: 6h 10m  (was: 6h)

> Improve plan regression tests using TPCDS30TB metastore dump and custom configs
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-23965
>                 URL: https://issues.apache.org/jira/browse/HIVE-23965
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: master355.tgz
>
>          Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB
> dataset, and others from a 3GB dataset. This mix leads to plans that may
> never appear when using an actual TPC-DS dataset.
> The existing statistics do not contain information about partitions, something
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default
> configuration (hive-site.xml). In real-life scenarios though some of the
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated
> TPCDS30TB metastore dump along with some custom hive configurations.
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=519644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519644 ]

ASF GitHub Bot logged work on HIVE-23965:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/20 13:39
            Start Date: 03/Dec/20 13:39
    Worklog Time Spent: 10m

Work Description: kgyrtkirk commented on pull request #1714:
URL: https://github.com/apache/hive/pull/1714#issuecomment-737999631

cool

Issue Time Tracking
-------------------

    Worklog Id: (was: 519644)
    Time Spent: 6h  (was: 5h 50m)

> Improve plan regression tests using TPCDS30TB metastore dump and custom configs
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-23965
>                 URL: https://issues.apache.org/jira/browse/HIVE-23965
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: master355.tgz
>
>          Time Spent: 6h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB
> dataset, and others from a 3GB dataset. This mix leads to plans that may
> never appear when using an actual TPC-DS dataset.
> The existing statistics do not contain information about partitions, something
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default
> configuration (hive-site.xml). In real-life scenarios though some of the
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated
> TPCDS30TB metastore dump along with some custom hive configurations.
[jira] [Work logged] (HIVE-24444) compactor.Cleaner should not set state "mark cleaned" if there are obsolete files in the FS
[ https://issues.apache.org/jira/browse/HIVE-24444?focusedWorklogId=519643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-519643 ]

ASF GitHub Bot logged work on HIVE-24444:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/20 13:37
            Start Date: 03/Dec/20 13:37
    Worklog Time Spent: 10m

Work Description: klcopp commented on a change in pull request #1716:
URL: https://github.com/apache/hive/pull/1716#discussion_r535232912

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java

@@ -316,6 +314,30 @@ private boolean removeFiles(String location, ValidWriteIdList writeIdList, Compa
       }
       fs.delete(dead, true);
     }
-    return true;
+    // Check if there will be more obsolete directories to clean when possible. We will only mark cleaned when this
+    // number reaches 0.
+    return getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0;
+  }
+
+  /**
+   * Get the number of base/delta directories the Cleaner should remove eventually. If we check this after cleaning
+   * we can see if the Cleaner has further work to do in this table/partition directory that it hasn't been able to
+   * finish, e.g. because of an open transaction at the time of compaction.
+   * We do this by assuming that there are no open transactions anywhere and then calling getAcidState. If there are
+   * obsolete directories, then the Cleaner has more work to do.
+   * @param location location of table
+   * @return number of dirs left for the cleaner to clean, eventually
+   * @throws IOException
+   */
+  private int getNumEventuallyObsoleteDirs(String location, Map dirSnapshots)
+      throws IOException {
+    ValidTxnList validTxnList = new ValidReadTxnList();
+    // save it so that getAcidState() sees it
+    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
+    ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
+    Path locPath = new Path(location);
+    AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,
+        Ref.from(false), false, dirSnapshots);
+    return dir.getObsolete().size();

Review comment: No, this isn't necessary upstream; this change is for versions without HIVE-23107 etc. But I don't want to hurt upstream functionality with it.

Issue Time Tracking
-------------------

    Worklog Id: (was: 519643)
    Time Spent: 7h  (was: 6h 50m)

> compactor.Cleaner should not set state "mark cleaned" if there are obsolete
> files in the FS
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-24444
>                 URL: https://issues.apache.org/jira/browse/HIVE-24444
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 7h
>  Remaining Estimate: 0h
>
> This is an improvement on HIVE-24314, in which markCleaned() is called only
> if +any+ files are deleted by the cleaner. This could cause a problem in the
> following case:
> Say for table_1 compaction1 cleaning was blocked by an open txn, and
> compaction is run again on the same table (compaction2). Both compaction1 and
> compaction2 could be in "ready for cleaning" at the same time. By this time
> the blocking open txn could be committed.
> When the cleaner runs, one of compaction1 and compaction2 will remain in the
> "ready for cleaning" state: say compaction2 is picked up by the cleaner
> first. The Cleaner deletes all obsolete files. Then compaction1 is picked up
> by the cleaner; the cleaner doesn't remove any files and compaction1 will
> stay in the queue in a "ready for cleaning" state.
> HIVE-24291 already solves this issue, but if it isn't usable (for example if
> HMS schema changes are out of the question) then HIVE-24314 plus this change
> will fix the issue of the Cleaner not removing all obsolete files.
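The "mark cleaned only when nothing is left behind" rule from the patch can be illustrated with a toy model. All names here (`CleanerRule`, `eventuallyObsoleteDeltas`, `markCleaned`) are hypothetical; the real logic lives in `Cleaner.removeFiles()` and `AcidUtils.getAcidState()`, which also handle bases, aborted directories, and snapshots that this sketch omits:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model: a delta is "eventually obsolete" if it is covered by a newer base,
 *  assuming every open transaction has committed (mirroring the ValidReadTxnList trick). */
public class CleanerRule {
    static List<Long> eventuallyObsoleteDeltas(List<Long> deltaWriteIds, long newestBaseWriteId) {
        return deltaWriteIds.stream()
                .filter(w -> w <= newestBaseWriteId)
                .collect(Collectors.toList());
    }

    /** Only report the compaction as cleaned when no eventually-obsolete dirs remain. */
    static boolean markCleaned(List<Long> remainingDeltaWriteIds, long newestBaseWriteId) {
        return eventuallyObsoleteDeltas(remainingDeltaWriteIds, newestBaseWriteId).isEmpty();
    }

    public static void main(String[] args) {
        // delta_5 could not be deleted (blocked by an open txn) but base_7 covers it:
        // the entry must stay in "ready for cleaning" so the Cleaner retries later.
        System.out.println(markCleaned(Arrays.asList(5L), 7)); // false
        // Only delta_8 remains and it is newer than base_7: nothing left to clean.
        System.out.println(markCleaned(Arrays.asList(8L), 7)); // true
    }
}
```

This captures why the scenario in the description no longer strands compaction1: after compaction2 deletes everything, compaction1's check finds zero eventually-obsolete directories and can safely mark itself cleaned; if anything obsolete survives, the entry stays queued.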