[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-24032: --- Attachment: HIVE-24032.03.patch Status: Patch Available (was: In Progress) > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24032.01.patch, HIVE-24032.02.patch, > HIVE-24032.03.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-24032: --- Status: In Progress (was: Patch Available) > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24032.01.patch, HIVE-24032.02.patch, > HIVE-24032.03.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location
[ https://issues.apache.org/jira/browse/HIVE-23539?focusedWorklogId=473732=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473732 ] ASF GitHub Bot logged work on HIVE-23539: - Author: ASF GitHub Bot Created on: 24/Aug/20 04:24 Start Date: 24/Aug/20 04:24 Worklog Time Spent: 10m Work Description: pkumarsinha closed pull request #1084: URL: https://github.com/apache/hive/pull/1084 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473732) Time Spent: 2h 50m (was: 2h 40m) > Optimize data copy during repl load operation for HDFS based staging location > - > > Key: HIVE-23539 > URL: https://issues.apache.org/jira/browse/HIVE-23539 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23539.01.patch > > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23539) Optimize data copy during repl load operation for HDFS based staging location
[ https://issues.apache.org/jira/browse/HIVE-23539?focusedWorklogId=473715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473715 ] ASF GitHub Bot logged work on HIVE-23539: - Author: ASF GitHub Bot Created on: 24/Aug/20 00:41 Start Date: 24/Aug/20 00:41 Worklog Time Spent: 10m Work Description: github-actions[bot] commented on pull request #1084: URL: https://github.com/apache/hive/pull/1084#issuecomment-678849350 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473715) Time Spent: 2h 40m (was: 2.5h) > Optimize data copy during repl load operation for HDFS based staging location > - > > Key: HIVE-23539 > URL: https://issues.apache.org/jira/browse/HIVE-23539 > Project: Hive > Issue Type: Improvement >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23539.01.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23926) Flaky test TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion
[ https://issues.apache.org/jira/browse/HIVE-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182872#comment-17182872 ] Arko Sharma commented on HIVE-23926: Running this test on the hive-flaky-check framework did not give any errors([link|http://ci.hive.apache.org/job/hive-flaky-check/88/]). Hence this test has been re-enabled. > Flaky test > TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion > > > Key: HIVE-23926 > URL: https://issues.apache.org/jira/browse/HIVE-23926 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23926.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-precommit/job/master/123/testReport/org.apache.hadoop.hive.ql.parse/TestTableLevelReplicationScenarios/Testing___split_18___Archive___testRenameTableScenariosWithReplacePolicyDMLOperattion/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23926) Flaky test TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion
[ https://issues.apache.org/jira/browse/HIVE-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arko Sharma updated HIVE-23926: --- Attachment: HIVE-23926.01.patch Status: Patch Available (was: Open) > Flaky test > TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion > > > Key: HIVE-23926 > URL: https://issues.apache.org/jira/browse/HIVE-23926 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23926.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-precommit/job/master/123/testReport/org.apache.hadoop.hive.ql.parse/TestTableLevelReplicationScenarios/Testing___split_18___Archive___testRenameTableScenariosWithReplacePolicyDMLOperattion/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23926) Flaky test TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion
[ https://issues.apache.org/jira/browse/HIVE-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23926: -- Labels: pull-request-available (was: ) > Flaky test > TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion > > > Key: HIVE-23926 > URL: https://issues.apache.org/jira/browse/HIVE-23926 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Arko Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-precommit/job/master/123/testReport/org.apache.hadoop.hive.ql.parse/TestTableLevelReplicationScenarios/Testing___split_18___Archive___testRenameTableScenariosWithReplacePolicyDMLOperattion/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23926) Flaky test TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion
[ https://issues.apache.org/jira/browse/HIVE-23926?focusedWorklogId=473709=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473709 ] ASF GitHub Bot logged work on HIVE-23926: - Author: ASF GitHub Bot Created on: 23/Aug/20 23:07 Start Date: 23/Aug/20 23:07 Worklog Time Spent: 10m Work Description: ArkoSharma opened a new pull request #1420: URL: https://github.com/apache/hive/pull/1420 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473709) Remaining Estimate: 0h Time Spent: 10m > Flaky test > TestTableLevelReplicationScenarios.testRenameTableScenariosWithReplacePolicyDMLOperattion > > > Key: HIVE-23926 > URL: https://issues.apache.org/jira/browse/HIVE-23926 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Arko Sharma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > http://ci.hive.apache.org/job/hive-precommit/job/master/123/testReport/org.apache.hadoop.hive.ql.parse/TestTableLevelReplicationScenarios/Testing___split_18___Archive___testRenameTableScenariosWithReplacePolicyDMLOperattion/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22782) Consolidate metastore call to fetch constraints
[ https://issues.apache.org/jira/browse/HIVE-22782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182783#comment-17182783 ] Ashish Sharma commented on HIVE-22782: -- [~vgarg] I have raised a PR with changes and added some test cases. Please have a look on it. Mean while i am thinking what all more test case I can add in it. > Consolidate metastore call to fetch constraints > --- > > Key: HIVE-22782 > URL: https://issues.apache.org/jira/browse/HIVE-22782 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently separate calls are made to metastore to fetch constraints like Pk, > fk, not null etc. Since planner always retrieve these constraints we should > retrieve all of them in one call. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182746#comment-17182746 ] Shubham Chaurasia commented on HIVE-24059: -- [~prasanth_j] [~jdere] Can you please review ? > Llap external client - Initial changes for running in cloud environment > --- > > Key: HIVE-24059 > URL: https://issues.apache.org/jira/browse/HIVE-24059 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Please see problem description in > https://issues.apache.org/jira/browse/HIVE-24058 > Initial changes include - > 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) > side. > 2. Opening additional RPC port in LLAP Daemon. > 3. JWT Based authentication on this port. > cc [~prasanth_j] [~jdere] [~anishek] [~thejas] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-24059: - Description: Please see problem description in https://issues.apache.org/jira/browse/HIVE-24058 Initial changes include - 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) side. 2. Opening additional RPC port in LLAP Daemon. 3. JWT Based authentication on this port. cc [~prasanth_j] [~jdere] [~anishek] [~thejas] was: Please see problem description in https://issues.apache.org/jira/browse/HIVE-24058 Initial changes include - 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) side. 2. Opening additional RPC port in LLAP Daemon. 3. JWT Based authentication on this port. > Llap external client - Initial changes for running in cloud environment > --- > > Key: HIVE-24059 > URL: https://issues.apache.org/jira/browse/HIVE-24059 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Please see problem description in > https://issues.apache.org/jira/browse/HIVE-24058 > Initial changes include - > 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) > side. > 2. Opening additional RPC port in LLAP Daemon. > 3. JWT Based authentication on this port. > cc [~prasanth_j] [~jdere] [~anishek] [~thejas] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182744#comment-17182744 ] Shubham Chaurasia commented on HIVE-24059: -- This patch uses two env variables - {{IS_CLOUD_DEPLOYMENT}} - if we HS2 and LLAP are running in cloud env. {{PUBLIC_HOSTNAME}} - public hostname which can be reached from outside cloud. Both these variables need to be set on HS2 and LLAP machines for this patch to work correctly. > Llap external client - Initial changes for running in cloud environment > --- > > Key: HIVE-24059 > URL: https://issues.apache.org/jira/browse/HIVE-24059 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Please see problem description in > https://issues.apache.org/jira/browse/HIVE-24058 > Initial changes include - > 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) > side. > 2. Opening additional RPC port in LLAP Daemon. > 3. JWT Based authentication on this port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition
[ https://issues.apache.org/jira/browse/HIVE-24020?focusedWorklogId=473644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473644 ] ASF GitHub Bot logged work on HIVE-24020: - Author: ASF GitHub Bot Created on: 23/Aug/20 13:22 Start Date: 23/Aug/20 13:22 Worklog Time Spent: 10m Work Description: vpnvishv commented on a change in pull request #1382: URL: https://github.com/apache/hive/pull/1382#discussion_r475219031 ## File path: streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java ## @@ -581,16 +582,9 @@ protected RecordUpdater getRecordUpdater(List partitionValues, int bucke destLocation = new Path(table.getSd().getLocation()); } else { PartitionInfo partitionInfo = conn.createPartitionIfNotExists(partitionValues); - // collect the newly added partitions. connection.commitTransaction() will report the dynamically added - // partitions to TxnHandler - if (!partitionInfo.isExists()) { -addedPartitions.add(partitionInfo.getName()); - } else { -if (LOG.isDebugEnabled()) { - LOG.debug("Partition {} already exists for table {}", - partitionInfo.getName(), fullyQualifiedTableName); -} - } + // collect the newly added/updated partitions. connection.commitTransaction() will report the dynamically + // added partitions to TxnHandler + addedPartitions.add(partitionInfo.getName()); Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473644) Time Spent: 40m (was: 0.5h) > Automatic Compaction not working in existing partitions for Streaming Ingest > with Dynamic Partition > --- > > Key: HIVE-24020 > URL: https://issues.apache.org/jira/browse/HIVE-24020 > Project: Hive > Issue Type: Bug > Components: Streaming, Transactions >Affects Versions: 4.0.0, 3.1.2 >Reporter: Vipin Vishvkarma >Assignee: Vipin Vishvkarma >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > This issue happens when we try to do streaming ingest with dynamic partition > on already existing partitions. I checked in the code, we have following > check in the AbstractRecordWriter. > > {code:java} > PartitionInfo partitionInfo = > conn.createPartitionIfNotExists(partitionValues); > // collect the newly added partitions. connection.commitTransaction() will > report the dynamically added > // partitions to TxnHandler > if (!partitionInfo.isExists()) { > addedPartitions.add(partitionInfo.getName()); > } else { > if (LOG.isDebugEnabled()) { > LOG.debug("Partition {} already exists for table {}", > partitionInfo.getName(), fullyQualifiedTableName); > } > } > {code} > Above *addedPartitions* is passed to *addDynamicPartitions* during > TransactionBatch commit. So in case of already existing partitions, > *addedPartitions* will be empty and *addDynamicPartitions* **will not move > entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in > Initiator not able to trigger auto compaction. > Another issue which has been observed is, we are not clearing > *addedPartitions* on writer close, which results in information flowing > across transactions. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22782) Consolidate metastore call to fetch constraints
[ https://issues.apache.org/jira/browse/HIVE-22782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-22782: -- Labels: pull-request-available (was: ) > Consolidate metastore call to fetch constraints > --- > > Key: HIVE-22782 > URL: https://issues.apache.org/jira/browse/HIVE-22782 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Ashish Sharma >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently separate calls are made to metastore to fetch constraints like Pk, > fk, not null etc. Since planner always retrieve these constraints we should > retrieve all of them in one call. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints
[ https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=473636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473636 ] ASF GitHub Bot logged work on HIVE-22782: - Author: ASF GitHub Bot Created on: 23/Aug/20 11:23 Start Date: 23/Aug/20 11:23 Worklog Time Spent: 10m Work Description: ashish-kumar-sharma opened a new pull request #1419: URL: https://github.com/apache/hive/pull/1419 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473636) Remaining Estimate: 0h Time Spent: 10m > Consolidate metastore call to fetch constraints > --- > > Key: HIVE-22782 > URL: https://issues.apache.org/jira/browse/HIVE-22782 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Ashish Sharma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently separate calls are made to metastore to fetch constraints like Pk, > fk, not null etc. Since planner always retrieve these constraints we should > retrieve all of them in one call. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?focusedWorklogId=473627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473627 ] ASF GitHub Bot logged work on HIVE-24059: - Author: ASF GitHub Bot Created on: 23/Aug/20 09:34 Start Date: 23/Aug/20 09:34 Worklog Time Spent: 10m Work Description: ShubhamChaurasia opened a new pull request #1418: URL: https://github.com/apache/hive/pull/1418 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473627) Remaining Estimate: 0h Time Spent: 10m > Llap external client - Initial changes for running in cloud environment > --- > > Key: HIVE-24059 > URL: https://issues.apache.org/jira/browse/HIVE-24059 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Please see problem description in > https://issues.apache.org/jira/browse/HIVE-24058 > Initial changes include - > 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) > side. > 2. Opening additional RPC port in LLAP Daemon. > 3. JWT Based authentication on this port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24059: -- Labels: pull-request-available (was: ) > Llap external client - Initial changes for running in cloud environment > --- > > Key: HIVE-24059 > URL: https://issues.apache.org/jira/browse/HIVE-24059 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Please see problem description in > https://issues.apache.org/jira/browse/HIVE-24058 > Initial changes include - > 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) > side. > 2. Opening additional RPC port in LLAP Daemon. > 3. JWT Based authentication on this port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-24059: > Llap external client - Initial changes for running in cloud environment > --- > > Key: HIVE-24059 > URL: https://issues.apache.org/jira/browse/HIVE-24059 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Please see problem description in > https://issues.apache.org/jira/browse/HIVE-24058 > Initial changes include - > 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) > side. > 2. Opening additional RPC port in LLAP Daemon. > 3. JWT Based authentication on this port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24058) Llap external client - Enhancements for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-24058: > Llap external client - Enhancements for running in cloud environment > > > Key: HIVE-24058 > URL: https://issues.apache.org/jira/browse/HIVE-24058 > Project: Hive > Issue Type: Task > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > When we query using llap external client library, following happens currently > - > 1. We first need to get splits using > [LlapBaseInputFormat#getSplits()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L226], > this just needs Hive server JDBC url. > 2. We then submit those splits to llap and obtain record reader to read data > using > [LlapBaseInputFormat#getRecordReader()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L140]. > In this step we need following at client side - > - {{hive.zookeeper.quorum}} > -{{hive.llap.daemon.service.hosts}} > We need to connect to zk to discover llap daemons. > 3. Record reader so obtained needs to [initiate a TCP connection from client > to LLAP Daemon to submit the > split|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L185]. > 4. It also needs to [initiate another TCP connection from client to output > format port in LLAP Daemon to read the > data|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L201]. > In cloud based deployments, we may not be able to make direct connections to > Zk registry and LLAP daemons from client as it might run outside vpc. > For 2, we can move daemon discovery logic to get_splits UDF itself which will > run in HS2. 
> For scenarios like 3 and 4, we can expose additional ports on LLAP with > additional auth mechanism. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24040) Slightly odd behaviour with CHAR comparisons and string literals
[ https://issues.apache.org/jira/browse/HIVE-24040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182633#comment-17182633 ] Peter Vary commented on HIVE-24040: --- [~kuczoram]: You might want to know about this, as I remember you were working to have a standard way to handle this through Parquet and ORC. > Slightly odd behaviour with CHAR comparisons and string literals > > > Key: HIVE-24040 > URL: https://issues.apache.org/jira/browse/HIVE-24040 > Project: Hive > Issue Type: Bug >Reporter: Tim Armstrong >Priority: Major > > If t is a char column, this statement behaves a bit strangely - since the RHS > is a STRING, I would have expected it to behave consistently with other > CHAR/STRING comparisons, where the CHAR column has its trailing spaces > removed and the STRING does not have its trailing spaces removed. > {noformat} > select count(*) from ax where t = cast('a ' as string); > {noformat} > Instead it seems to be treated the same as if it was a plain literal, > interpreted as CHAR, i.e. > {noformat} > select count(*) from ax where t = 'a '; > {noformat} > Here are some more experiments I did based on > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/in_typecheck_char.q > that seem to show some inconsistencies. 
> {noformat} > -- Hive version 3.1.3000.7.2.1.0-287 r4e72e59f1c2a51a64e0ff37b14bd396cd4e97b98 > create table ax(s char(1),t char(10)); > insert into ax values ('a','a'),('a','a '),('b','bb'); > -- varchar literal preserves trailing space > select count(*) from ax where t = cast('a ' as varchar(50)); > +--+ > | _c0 | > +--+ > | 0| > +--+ > -- explicit cast of literal to string removes trailing space > select count(*) from ax where t = cast('a ' as string); > +--+ > | _c0 | > +--+ > | 2| > +--+ > -- other string expressions preserve trailing space > select count(*) from ax where t = concat('a', ' '); > +--+ > | _c0 | > +--+ > | 0| > +--+ > -- varchar col preserves trailing space > create table stringv as select cast('a ' as varchar(50)); > select count(*) from ax, stringv where t = `_c0`; > +--+ > | _c0 | > +--+ > | 0| > +--+ > -- string col preserves trailing space > create table stringa as select 'a '; > select count(*) from ax, stringa where t = `_c0`; > +--+ > | _c0 | > +--+ > | 0| > +--+ > {noformat} > [~jcamachorodriguez] [~kgyrtkirk] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?focusedWorklogId=473626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473626 ] ASF GitHub Bot logged work on HIVE-24032: - Author: ASF GitHub Bot Created on: 23/Aug/20 08:34 Start Date: 23/Aug/20 08:34 Worklog Time Spent: 10m Work Description: pvary commented on pull request #1396: URL: https://github.com/apache/hive/pull/1396#issuecomment-678746263 Why is this a good thing? AFAIK we introduced the Shims to be able to abstract out Hadoop dependencies and being able to work with different versions of Hadoop altogether. Removing shims will again fix as to a specific Hadoop version. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473626) Time Spent: 1h 20m (was: 1h 10m) > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24032.01.patch, HIVE-24032.02.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24020) Automatic Compaction not working in existing partitions for Streaming Ingest with Dynamic Partition
[ https://issues.apache.org/jira/browse/HIVE-24020?focusedWorklogId=473617=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473617 ] ASF GitHub Bot logged work on HIVE-24020: - Author: ASF GitHub Bot Created on: 23/Aug/20 08:19 Start Date: 23/Aug/20 08:19 Worklog Time Spent: 10m Work Description: pvary commented on a change in pull request #1382: URL: https://github.com/apache/hive/pull/1382#discussion_r475189153 ## File path: streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java ## @@ -581,16 +582,9 @@ protected RecordUpdater getRecordUpdater(List partitionValues, int bucke destLocation = new Path(table.getSd().getLocation()); } else { PartitionInfo partitionInfo = conn.createPartitionIfNotExists(partitionValues); - // collect the newly added partitions. connection.commitTransaction() will report the dynamically added - // partitions to TxnHandler - if (!partitionInfo.isExists()) { -addedPartitions.add(partitionInfo.getName()); - } else { -if (LOG.isDebugEnabled()) { - LOG.debug("Partition {} already exists for table {}", - partitionInfo.getName(), fullyQualifiedTableName); -} - } + // collect the newly added/updated partitions. connection.commitTransaction() will report the dynamically + // added partitions to TxnHandler + addedPartitions.add(partitionInfo.getName()); Review comment: Can we please rename the 'addedPartitions' to something like more resembling the actual usage, like 'updatedPartitions', or 'changedPartitions', or whatever? Maybe a comment on the attribute/and getter should be good as well. Otherwise LGTM. Thanks, Peter This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 473617) Time Spent: 0.5h (was: 20m) > Automatic Compaction not working in existing partitions for Streaming Ingest > with Dynamic Partition > --- > > Key: HIVE-24020 > URL: https://issues.apache.org/jira/browse/HIVE-24020 > Project: Hive > Issue Type: Bug > Components: Streaming, Transactions >Affects Versions: 4.0.0, 3.1.2 >Reporter: Vipin Vishvkarma >Assignee: Vipin Vishvkarma >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > This issue happens when we try to do streaming ingest with dynamic partition > on already existing partitions. I checked in the code, we have following > check in the AbstractRecordWriter. > > {code:java} > PartitionInfo partitionInfo = > conn.createPartitionIfNotExists(partitionValues); > // collect the newly added partitions. connection.commitTransaction() will > report the dynamically added > // partitions to TxnHandler > if (!partitionInfo.isExists()) { > addedPartitions.add(partitionInfo.getName()); > } else { > if (LOG.isDebugEnabled()) { > LOG.debug("Partition {} already exists for table {}", > partitionInfo.getName(), fullyQualifiedTableName); > } > } > {code} > Above *addedPartitions* is passed to *addDynamicPartitions* during > TransactionBatch commit. So in case of already existing partitions, > *addedPartitions* will be empty and *addDynamicPartitions* **will not move > entries from TXN_COMPONENTS to COMPLETED_TXN_COMPONENTS. This results in > Initiator not able to trigger auto compaction. > Another issue which has been observed is, we are not clearing > *addedPartitions* on writer close, which results in information flowing > across transactions. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24000) Put exclusive MERGE INSERT under the feature flag
[ https://issues.apache.org/jira/browse/HIVE-24000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182622#comment-17182622 ]

Peter Vary commented on HIVE-24000:
---
[~dkuzmenko]: I am not against the change, but next time please wait for a +1 from a committer as per [https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-ReviewProcess]
{quote}
Committers: for non-trivial changes, it is best to get another committer to review your patches before commit. Use the Submit Patch link like other contributors, and then wait for a "+1" from another committer before committing. Please also try to frequently review things in the patch queue.
{quote}

> Put exclusive MERGE INSERT under the feature flag
> -
>
> Key: HIVE-24000
> URL: https://issues.apache.org/jira/browse/HIVE-24000
> Project: Hive
> Issue Type: Bug
> Reporter: Denys Kuzmenko
> Assignee: Denys Kuzmenko
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization
[ https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=473610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473610 ]

ASF GitHub Bot logged work on HIVE-23890:
-
Author: ASF GitHub Bot
Created on: 23/Aug/20 06:27
Start Date: 23/Aug/20 06:27
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r475177713

## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##
@@ -5685,6 +5706,67 @@ private void alter_table_core(String catName, String dbname, String name, Table
 }
 }

+@Override
+public GetFileListResponse get_file_list(GetFileListRequest req) throws MetaException {
+  String catName = req.isSetCatName() ? req.getCatName() : getDefaultCatalog(conf);
+  String dbName = req.getDbName();
+  String tblName = req.getTableName();
+  List<String> partitions = req.getPartVals();
+  // Will be used later, when cache is introduced
+  String validWriteIdList = req.getValidWriteIdList();
+
+  startFunction("get_file_list", ": " + TableName.getQualified(catName, dbName, tblName)
+      + ", partitions: " + partitions.toString());
+
+  GetFileListResponse response = new GetFileListResponse();
+
+  boolean success = false;
+  Exception ex = null;
+  try {
+    Partition p = getMS().getPartition(catName, dbName, tblName, partitions);
+    Path path = new Path(p.getSd().getLocation());
+
+    FileSystem fs = path.getFileSystem(conf);
+    RemoteIterator<LocatedFileStatus> itr = fs.listFiles(path, true);
+    while (itr.hasNext()) {
+      FileStatus fStatus = itr.next();
+      Reader reader = OrcFile.createReader(fStatus.getPath(), OrcFile.readerOptions(fs.getConf()));
+      boolean isRawFormat = !CollectionUtils.isEqualCollection(reader.getSchema().getFieldNames(), ALL_ACID_ROW_NAMES);
+      int fileFormat = isRawFormat ? 0 : 2;

Review comment: Please use an enum on the Java side, but on the thrift side stick to strings/ints.
Again, when upgrading the HMS API, an enum in the thrift definition can cause compatibility issues.

Issue Time Tracking
---
Worklog Id: (was: 473610)
Time Spent: 2h 40m (was: 2.5h)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
>   1: optional string catName,
>   2: required string dbName,
>   3: required string tableName,
>   4: required list<string> partVals,
>   6: optional string validWriteIdList
> }
> struct GetFileListResponse {
>   1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a FlatBuffer object
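The reviewer's advice (plain ints on the wire, an enum only on the Java side) can be sketched as follows. The enum name is an assumption; the codes mirror the `int fileFormat = isRawFormat ? 0 : 2` mapping from the patch under review:

```java
// Illustrative sketch, not actual Hive code: keep the thrift field a plain
// i32 for forward compatibility, and translate it to a Java enum at the API
// boundary, tolerating codes this build does not know about.
enum AcidFileFormat {
    RAW(0), ACID(2), UNKNOWN(-1);

    private final int wireCode;

    AcidFileFormat(int wireCode) { this.wireCode = wireCode; }

    int toWire() { return wireCode; }

    // An old client talking to a newer server may receive a code added
    // later; mapping it to UNKNOWN avoids the deserialization failure that
    // a thrift-level enum would cause in the same situation.
    static AcidFileFormat fromWire(int code) {
        for (AcidFileFormat f : values()) {
            if (f != UNKNOWN && f.wireCode == code) {
                return f;
            }
        }
        return UNKNOWN;
    }
}
```

The key design point is the `UNKNOWN` fallback: a thrift enum field fails hard on unrecognized values, whereas an int plus a lenient Java-side mapping degrades gracefully across version skew.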
[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization
[ https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=473609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-473609 ]

ASF GitHub Bot logged work on HIVE-23890:
-
Author: ASF GitHub Bot
Created on: 23/Aug/20 06:25
Start Date: 23/Aug/20 06:25
Worklog Time Spent: 10m

Work Description: pvary commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r475177465

## File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##
@@ -1861,6 +1861,19 @@ struct ScheduledQueryProgressInfo{
 4: optional string errorMessage,
 }

+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,
+  6: optional string validWriteIdList
+}
+
+struct GetFileListResponse {
+  1: optional list<binary> fileListData,
+  2: optional i32 fbVersionNumber

Review comment: AFAIK we try to keep every field optional in the HMS APIs so that we are able to change them later if needed. I specifically asked Barna to do this. Please correct me if I am mistaken, Vihang. Thanks, Peter
Issue Time Tracking
---
Worklog Id: (was: 473609)
Time Spent: 2.5h (was: 2h 20m)

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> --
>
> Key: HIVE-23890
> URL: https://issues.apache.org/jira/browse/HIVE-23890
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Barnabas Maidics
> Assignee: Barnabas Maidics
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
>   1: optional string catName,
>   2: required string dbName,
>   3: required string tableName,
>   4: required list<string> partVals,
>   6: optional string validWriteIdList
> }
> struct GetFileListResponse {
>   1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a FlatBuffer object
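The reason all-optional fields keep the HMS API evolvable shows up in how thrift-generated Java exposes them: callers probe an `isSet` flag before reading, so a peer that never sets a newly added field still interoperates. A hand-written sketch of that pattern (not actual generated code; the fallback mirrors the `req.isSetCatName() ? req.getCatName() : getDefaultCatalog(conf)` line in the server patch above):

```java
// Illustrative sketch of the thrift "optional field" pattern in Java.
// Real generated code uses bitfields and metadata maps; the observable
// behavior for callers is the same.
class GetFileListRequestSketch {
    private String catName;        // optional: may never be set by the peer
    private boolean catNameIsSet;

    void setCatName(String catName) {
        this.catName = catName;
        this.catNameIsSet = true;
    }

    boolean isSetCatName() { return catNameIsSet; }

    String getCatName() { return catName; }

    // The server-side idiom: fall back to a default when the field is unset,
    // instead of failing the request.
    String catNameOrDefault(String defaultCatalog) {
        return isSetCatName() ? getCatName() : defaultCatalog;
    }
}
```

A required field, by contrast, makes an unset value a protocol error, which is exactly what blocks adding or retiring fields across mixed-version clients and servers.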