[jira] [Updated] (HIVE-25178) Reduce number of getPartition calls during loadDynamicPartitions
[ https://issues.apache.org/jira/browse/HIVE-25178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-25178: Labels: performance (was: ) > Reduce number of getPartition calls during loadDynamicPartitions > > > Key: HIVE-25178 > URL: https://issues.apache.org/jira/browse/HIVE-25178 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Rajesh Balamohan >Priority: Major > Labels: performance > > When dynamic partitions are loaded, Hive::loadDynamicPartition loads all > partitions from HMS causing heavy load on it. This becomes worse when large > number of partitions are present in tables. > Only relevant partitions being loaded in dynamic partitions can be queried > from HMS for partition existence. > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2958] -- This message was sent by Atlassian Jira (v8.3.4#803005)
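The idea in the description above (ask HMS only about the partitions touched by the current load, rather than listing every partition of the table) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in Hive.java: `PartitionLookup`, `RelevantPartitionCheck`, and the batch size are hypothetical names introduced here, and the lookup interface merely stands in for whatever by-name metastore call is actually used.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a metastore client; not the real Hive API. */
interface PartitionLookup {
  /** Returns the subset of the requested partition names that already exist. */
  Set<String> existingPartitions(List<String> partitionNames);
}

class RelevantPartitionCheck {
  /**
   * Checks existence only for the partitions written by the current load,
   * sending the names to the lookup in fixed-size batches.
   */
  static Map<String, Boolean> checkExistence(
      PartitionLookup lookup, List<String> writtenPartitions, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    Map<String, Boolean> result = new LinkedHashMap<>();
    for (int start = 0; start < writtenPartitions.size(); start += batchSize) {
      int end = Math.min(start + batchSize, writtenPartitions.size());
      List<String> batch = writtenPartitions.subList(start, end);
      Set<String> found = lookup.existingPartitions(batch);
      for (String name : batch) {
        result.put(name, found.contains(name));
      }
    }
    return result;
  }
}
```

With this shape the number of metastore round trips grows with the number of partitions being loaded, not with the total number of partitions in the table.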
[jira] [Work logged] (HIVE-25176) Print DAG ID to Console
[ https://issues.apache.org/jira/browse/HIVE-25176?focusedWorklogId=603700=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603700 ] ASF GitHub Bot logged work on HIVE-25176: - Author: ASF GitHub Bot Created on: 28/May/21 20:44 Start Date: 28/May/21 20:44 Worklog Time Spent: 10m Work Description: abstractdog commented on pull request #2328: URL: https://github.com/apache/hive/pull/2328#issuecomment-850660916 I would love to see this happening. This clearly saves +1 step which is finding the dag id in hs2 logs for a given hive query id (what i literally do all the time while troubleshooting) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 603700) Time Spent: 40m (was: 0.5h) > Print DAG ID to Console > --- > > Key: HIVE-25176 > URL: https://issues.apache.org/jira/browse/HIVE-25176 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Would be helpful when troubleshooting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25176) Print DAG ID to Console
[ https://issues.apache.org/jira/browse/HIVE-25176?focusedWorklogId=603692=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603692 ] ASF GitHub Bot logged work on HIVE-25176: - Author: ASF GitHub Bot Created on: 28/May/21 20:17 Start Date: 28/May/21 20:17 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2328: URL: https://github.com/apache/hive/pull/2328 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 603692) Time Spent: 0.5h (was: 20m) > Print DAG ID to Console > --- > > Key: HIVE-25176 > URL: https://issues.apache.org/jira/browse/HIVE-25176 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Would be helpful when troubleshooting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25176) Print DAG ID to Console
[ https://issues.apache.org/jira/browse/HIVE-25176?focusedWorklogId=603688=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603688 ] ASF GitHub Bot logged work on HIVE-25176: - Author: ASF GitHub Bot Created on: 28/May/21 20:15 Start Date: 28/May/21 20:15 Worklog Time Spent: 10m Work Description: belugabehr closed pull request #2328: URL: https://github.com/apache/hive/pull/2328 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 603688) Time Spent: 20m (was: 10m) > Print DAG ID to Console > --- > > Key: HIVE-25176 > URL: https://issues.apache.org/jira/browse/HIVE-25176 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Would be helpful when troubleshooting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25177) Add Additional Debugging Help for HBase Reader
[ https://issues.apache.org/jira/browse/HIVE-25177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25177: -- Labels: pull-request-available (was: ) > Add Additional Debugging Help for HBase Reader > -- > > Key: HIVE-25177 > URL: https://issues.apache.org/jira/browse/HIVE-25177 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I recently was wishing I had this data available to me. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25177) Add Additional Debugging Help for HBase Reader
[ https://issues.apache.org/jira/browse/HIVE-25177?focusedWorklogId=603633=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603633 ] ASF GitHub Bot logged work on HIVE-25177: - Author: ASF GitHub Bot Created on: 28/May/21 17:28 Start Date: 28/May/21 17:28 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2329: URL: https://github.com/apache/hive/pull/2329 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 603633) Remaining Estimate: 0h Time Spent: 10m > Add Additional Debugging Help for HBase Reader > -- > > Key: HIVE-25177 > URL: https://issues.apache.org/jira/browse/HIVE-25177 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > I recently was wishing I had this data available to me. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25177) Add Additional Debugging Help for HBase Reader
[ https://issues.apache.org/jira/browse/HIVE-25177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-25177: -- Description: I recently was wishing I had this data available to me. > Add Additional Debugging Help for HBase Reader > -- > > Key: HIVE-25177 > URL: https://issues.apache.org/jira/browse/HIVE-25177 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > I recently was wishing I had this data available to me. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25177) Add Additional Debugging Help for HBase Reader
[ https://issues.apache.org/jira/browse/HIVE-25177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-25177: - > Add Additional Debugging Help for HBase Reader > -- > > Key: HIVE-25177 > URL: https://issues.apache.org/jira/browse/HIVE-25177 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=603611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603611 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 28/May/21 16:28 Start Date: 28/May/21 16:28 Worklog Time Spent: 10m Work Description: hmangla98 commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r641674237 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java ## @@ -162,6 +164,10 @@ public void run() { setupMsckPathInvalidation(); Configuration msckConf = Msck.getMsckConf(conf); for (Table table : candidateTables) { + if (MetaStoreUtils.isDbBeingFailedOver(msc.getDatabase(table.getCatName(), table.getDbName()))) { Review comment: Might not be doable. Say we somehow cached the db failover property, and afterwards the repl.failover.enabled property is set to true for that db. How would we then figure out that the cached data is outdated and needs to be fetched again? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 603611) Time Spent: 50m (was: 40m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
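One possible middle ground for the staleness concern in the review comment above is a cache that lives only for a single run of the task: each database is looked up at most once per run, and the cache is discarded afterwards, so a property change is picked up on the next run. The sketch below is only an illustration of that trade-off, not what the pull request does; `FailoverStatusCache` and its `loader` function are hypothetical, with the loader standing in for the metastore call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Per-run memo of "is this database being failed over?" answers. */
class FailoverStatusCache {
  private final Map<String, Boolean> byDatabase = new HashMap<>();
  // Hypothetical loader: given "catalog.database", performs one metastore lookup.
  private final Function<String, Boolean> loader;

  FailoverStatusCache(Function<String, Boolean> loader) {
    this.loader = loader;
  }

  /** Looks the database up at most once for the lifetime of this cache instance. */
  boolean isBeingFailedOver(String catalog, String database) {
    String key = catalog + "." + database;
    return byDatabase.computeIfAbsent(key, loader);
  }
}
```

Creating a new `FailoverStatusCache` at the start of every run bounds staleness to the duration of one run, at the cost of one lookup per database rather than one per table. A change made in the middle of a run would still be missed until the next run.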
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=603609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603609 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 28/May/21 16:25 Start Date: 28/May/21 16:25 Worklog Time Spent: 10m Work Description: hmangla98 commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r641672545 ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestPartitionManagement.java ## @@ -620,6 +620,59 @@ public void testNoPartitionDiscoveryForReplTable() throws Exception { assertEquals(3, partitions.size()); } + @Test + public void testNoPartitionDiscoveryForFailoverDb() throws Exception { +String dbName = "db_failover"; +String tableName = "tbl_failover"; +Map colMap = buildAllColumns(); +List partKeys = Lists.newArrayList("state", "dt"); +List partKeyTypes = Lists.newArrayList("string", "date"); +List> partVals = Lists.newArrayList( +Lists.newArrayList("__HIVE_DEFAULT_PARTITION__", "1990-01-01"), +Lists.newArrayList("CA", "1986-04-28"), +Lists.newArrayList("MN", "2018-11-31")); +createMetadata(DEFAULT_CATALOG_NAME, dbName, tableName, partKeys, partKeyTypes, partVals, colMap, false); +Table table = client.getTable(dbName, tableName); +List partitions = client.listPartitions(dbName, tableName, (short) -1); +assertEquals(3, partitions.size()); +String tableLocation = table.getSd().getLocation(); +URI location = URI.create(tableLocation); +Path tablePath = new Path(location); +FileSystem fs = FileSystem.get(location, conf); +Path newPart1 = new Path(tablePath, "state=WA/dt=2018-12-01"); +Path newPart2 = new Path(tablePath, "state=UT/dt=2018-12-02"); +fs.mkdirs(newPart1); +fs.mkdirs(newPart2); +assertEquals(5, fs.listStatus(tablePath).length); +partitions = client.listPartitions(dbName, tableName, (short) -1); +assertEquals(3, partitions.size()); + +// table property is set to true, 
but the table is marked as replication target. The new +// partitions should not be created + table.getParameters().put(PartitionManagementTask.DISCOVER_PARTITIONS_TBLPROPERTY, "true"); +Database db = client.getDatabase(table.getDbName()); +db.putToParameters(ReplConst.REPL_FAILOVER_ENABLED, "true"); +client.alterDatabase(table.getDbName(), db); +client.alter_table(dbName, tableName, table); Review comment: Alter table will enable discover partition property for the table. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 603609) Time Spent: 40m (was: 0.5h) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=603605=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603605 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 28/May/21 16:19 Start Date: 28/May/21 16:19 Worklog Time Spent: 10m Work Description: hmangla98 commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r641668822 ## File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java ## @@ -228,6 +230,15 @@ public static boolean isExternalTable(Table table) { return isExternal(params); } + public static boolean isDbBeingFailedOver(Database db) { +assert (db != null); +Map dbParameters = db.getParameters(); +if ((dbParameters != null) && (dbParameters.containsKey(ReplConst.REPL_FAILOVER_ENABLED))) { + return !StringUtils.isEmpty(dbParameters.get(ReplConst.REPL_FAILOVER_ENABLED)); Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 603605) Time Spent: 0.5h (was: 20m) > Disable StatsUpdaterThread and PartitionManagementTask for db that is being > failoved over. > -- > > Key: HIVE-25154 > URL: https://issues.apache.org/jira/browse/HIVE-25154 > Project: Hive > Issue Type: Improvement >Reporter: Haymant Mangla >Assignee: Haymant Mangla >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
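As quoted in the diff excerpt above, the check treats any non-empty parameter value as "failover enabled", which would include the string "false". A stricter variant that parses the value as a boolean might look like the sketch below. It is an assumption-laden illustration, not the code that was merged: the class name `FailoverFlag` is hypothetical, a plain `Map` stands in for the database parameters, and the key string is assumed rather than taken from `ReplConst`.

```java
import java.util.Map;

class FailoverFlag {
  // Assumed key; the real constant is ReplConst.REPL_FAILOVER_ENABLED.
  static final String FAILOVER_KEY = "repl.failover.enabled";

  /** Returns true only when the parameter is present and equals "true" (case-insensitive). */
  static boolean isEnabled(Map<String, String> dbParameters) {
    if (dbParameters == null) {
      return false;
    }
    // Boolean.parseBoolean returns false for null and for any value other than "true".
    return Boolean.parseBoolean(dbParameters.get(FAILOVER_KEY));
  }
}
```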
[jira] [Resolved] (HIVE-25102) Cache Iceberg table objects within same query
[ https://issues.apache.org/jira/browse/HIVE-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér resolved HIVE-25102. -- Resolution: Fixed > Cache Iceberg table objects within same query > - > > Key: HIVE-25102 > URL: https://issues.apache.org/jira/browse/HIVE-25102 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 9h > Remaining Estimate: 0h > > We run Catalogs.loadTable(configuration, props) plenty of times which is > costly. > We should: > - Cache it maybe even globally based on the queryId > - Make sure that the query uses one snapshot during the whole execution of a > single query -- This message was sent by Atlassian Jira (v8.3.4#803005)
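The caching idea in the description above can be sketched as a map keyed by query id and table name, so that the expensive load runs once per query and every later access within that query sees the same object (and therefore the same snapshot). This is a generic illustration, not the merged implementation: `QueryScopedTableCache`, the key format, and the `loader` supplier (standing in for a call such as `Catalogs.loadTable`) are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Caches loaded table objects per (queryId, tableName) pair. */
class QueryScopedTableCache<T> {
  private final Map<String, T> tables = new ConcurrentHashMap<>();

  /** Returns the cached table for this query, loading it on first access only. */
  T get(String queryId, String tableName, Supplier<T> loader) {
    return tables.computeIfAbsent(queryId + "|" + tableName, key -> loader.get());
  }

  /** Drops every entry belonging to a finished query. */
  void evictQuery(String queryId) {
    String prefix = queryId + "|";
    tables.keySet().removeIf(key -> key.startsWith(prefix));
  }
}
```

Evicting by query id when the query finishes keeps the cache from growing without bound; the sketch assumes query ids do not contain the `|` separator.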
[jira] [Commented] (HIVE-25102) Cache Iceberg table objects within same query
[ https://issues.apache.org/jira/browse/HIVE-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353437#comment-17353437 ] László Pintér commented on HIVE-25102: -- Merged into master. Thanks, [~Marton Bod] and [~pvary] for the review! > Cache Iceberg table objects within same query > - > > Key: HIVE-25102 > URL: https://issues.apache.org/jira/browse/HIVE-25102 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 9h > Remaining Estimate: 0h > > We run Catalogs.loadTable(configuration, props) plenty of times which is > costly. > We should: > - Cache it maybe even globally based on the queryId > - Make sure that the query uses one snapshot during the whole execution of a > single query -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25102) Cache Iceberg table objects within same query
[ https://issues.apache.org/jira/browse/HIVE-25102?focusedWorklogId=603596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603596 ] ASF GitHub Bot logged work on HIVE-25102: - Author: ASF GitHub Bot Created on: 28/May/21 16:07 Start Date: 28/May/21 16:07 Worklog Time Spent: 10m Work Description: lcspinter merged pull request #2261: URL: https://github.com/apache/hive/pull/2261 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 603596) Time Spent: 9h (was: 8h 50m) > Cache Iceberg table objects within same query > - > > Key: HIVE-25102 > URL: https://issues.apache.org/jira/browse/HIVE-25102 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 9h > Remaining Estimate: 0h > > We run Catalogs.loadTable(configuration, props) plenty of times which is > costly. > We should: > - Cache it maybe even globally based on the queryId > - Make sure that the query uses one snapshot during the whole execution of a > single query -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25154) Disable StatsUpdaterThread and PartitionManagementTask for db that is being failoved over.
[ https://issues.apache.org/jira/browse/HIVE-25154?focusedWorklogId=603584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603584 ] ASF GitHub Bot logged work on HIVE-25154: - Author: ASF GitHub Bot Created on: 28/May/21 15:51 Start Date: 28/May/21 15:51 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #2311: URL: https://github.com/apache/hive/pull/2311#discussion_r640356534 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java ## @@ -99,6 +99,15 @@ public boolean isTargetOfReplication(Database db) { return false; } + public static boolean isBeingFailovedOver(Database db) { Review comment: We can move this to some util class to avoid duplication. ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java ## @@ -162,6 +164,10 @@ public void run() { setupMsckPathInvalidation(); Configuration msckConf = Msck.getMsckConf(conf); for (Table table : candidateTables) { + if (MetaStoreUtils.isDbBeingFailedOver(msc.getDatabase(table.getCatName(), table.getDbName()))) { Review comment: This is going to be costly. One HMS call per table. Can we maintain a cache? 
## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java ## @@ -99,6 +99,15 @@ public boolean isTargetOfReplication(Database db) { return false; } + public static boolean isBeingFailovedOver(Database db) { Review comment: nit: Typo ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestPartitionManagement.java ## @@ -659,6 +712,45 @@ public void testNoPartitionRetentionForReplTarget() throws TException, Interrupt assertEquals(3, partitions.size()); } + @Test + public void testNoPartitionRetentionForFailoverDb() throws TException, InterruptedException { +String dbName = "db_failover"; +String tableName = "tbl_failover"; +Map colMap = buildAllColumns(); +List partKeys = Lists.newArrayList("state", "dt"); +List partKeyTypes = Lists.newArrayList("string", "date"); +List> partVals = Lists.newArrayList( +Lists.newArrayList("__HIVE_DEFAULT_PARTITION__", "1990-01-01"), +Lists.newArrayList("CA", "1986-04-28"), +Lists.newArrayList("MN", "2018-11-31")); +// Check for the existence of partitions 10 seconds after the partition retention period has +// elapsed. Gives enough time for the partition retention task to work. 
+long partitionRetentionPeriodMs = 2; +long waitingPeriodForTest = partitionRetentionPeriodMs + 10 * 1000; +createMetadata(DEFAULT_CATALOG_NAME, dbName, tableName, partKeys, partKeyTypes, partVals, colMap, false); +Table table = client.getTable(dbName, tableName); +List partitions = client.listPartitions(dbName, tableName, (short) -1); +assertEquals(3, partitions.size()); + + table.getParameters().put(PartitionManagementTask.DISCOVER_PARTITIONS_TBLPROPERTY, "true"); + table.getParameters().put(PartitionManagementTask.PARTITION_RETENTION_PERIOD_TBLPROPERTY, +partitionRetentionPeriodMs + "ms"); +client.alter_table(dbName, tableName, table); +Database db = client.getDatabase(table.getDbName()); +db.putToParameters(ReplConst.REPL_FAILOVER_ENABLED, "true"); Review comment: May be we. can have two both cases covered, with and without ReplConst.REPL_FAILOVER_ENABLED in the same test. ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestPartitionManagement.java ## @@ -620,6 +620,59 @@ public void testNoPartitionDiscoveryForReplTable() throws Exception { assertEquals(3, partitions.size()); } + @Test + public void testNoPartitionDiscoveryForFailoverDb() throws Exception { +String dbName = "db_failover"; +String tableName = "tbl_failover"; +Map colMap = buildAllColumns(); +List partKeys = Lists.newArrayList("state", "dt"); +List partKeyTypes = Lists.newArrayList("string", "date"); +List> partVals = Lists.newArrayList( +Lists.newArrayList("__HIVE_DEFAULT_PARTITION__", "1990-01-01"), +Lists.newArrayList("CA", "1986-04-28"), +Lists.newArrayList("MN", "2018-11-31")); +createMetadata(DEFAULT_CATALOG_NAME, dbName, tableName, partKeys, partKeyTypes, partVals, colMap, false); +Table table = client.getTable(dbName, tableName); +List partitions = client.listPartitions(dbName, tableName, (short) -1); +assertEquals(3, partitions.size()); +String tableLocation =
[jira] [Work logged] (HIVE-25176) Print DAG ID to Console
[ https://issues.apache.org/jira/browse/HIVE-25176?focusedWorklogId=603568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603568 ] ASF GitHub Bot logged work on HIVE-25176: - Author: ASF GitHub Bot Created on: 28/May/21 15:26 Start Date: 28/May/21 15:26 Worklog Time Spent: 10m Work Description: belugabehr opened a new pull request #2328: URL: https://github.com/apache/hive/pull/2328 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 603568) Remaining Estimate: 0h Time Spent: 10m > Print DAG ID to Console > --- > > Key: HIVE-25176 > URL: https://issues.apache.org/jira/browse/HIVE-25176 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Would be helpful when troubleshooting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25176) Print DAG ID to Console
[ https://issues.apache.org/jira/browse/HIVE-25176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-25176: -- Labels: pull-request-available (was: ) > Print DAG ID to Console > --- > > Key: HIVE-25176 > URL: https://issues.apache.org/jira/browse/HIVE-25176 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Would be helpful when troubleshooting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25176) Print DAG ID to Console
[ https://issues.apache.org/jira/browse/HIVE-25176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor updated HIVE-25176: -- Description: Would be helpful when troubleshooting. > Print DAG ID to Console > --- > > Key: HIVE-25176 > URL: https://issues.apache.org/jira/browse/HIVE-25176 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > > Would be helpful when troubleshooting. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25176) Print DAG ID to Console
[ https://issues.apache.org/jira/browse/HIVE-25176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mollitor reassigned HIVE-25176: - > Print DAG ID to Console > --- > > Key: HIVE-25176 > URL: https://issues.apache.org/jira/browse/HIVE-25176 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25169) using coalesce via vector,source column type is int and target column type is bigint,the result of target is zero
[ https://issues.apache.org/jira/browse/HIVE-25169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353306#comment-17353306 ] Panagiotis Garefalakis commented on HIVE-25169: --- Hey [~junnan.yang] thanks for reporting this! Would it make sense to backport the ticket that resolved this from master? On a general note it would be much easier to review this with a github PR and a test case. Cheers > using coalesce via vector,source column type is int and target column type is > bigint,the result of target is zero > - > > Key: HIVE-25169 > URL: https://issues.apache.org/jira/browse/HIVE-25169 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.1.2 >Reporter: junnan.yang >Priority: Major > Attachments: HIVE-25169.01.patch > > > sourceTable: > product_id int; > ### > targetTable: > product_id bigint; > ## > sql: > insert overwrite table targetTable: > select > .. > coalesce(product_id,-1), > .. > from sourceTable; > ## > explain sql : > UDFToLong(COALESCE(product_id,-1)) (type: bigint) > ## > result : > the column product_id in targetTable is zero, this is wrong result > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-25174) HiveMetastoreAuthorizer didn't check URI permission for AlterTableEvent
[ https://issues.apache.org/jira/browse/HIVE-25174?focusedWorklogId=603418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603418 ]

ASF GitHub Bot logged work on HIVE-25174:
-

Author: ASF GitHub Bot
Created on: 28/May/21 07:39
Start Date: 28/May/21 07:39
Worklog Time Spent: 10m
Work Description: symious opened a new pull request #2327:
URL: https://github.com/apache/hive/pull/2327

### What changes were proposed in this pull request?
When using Ranger on Hive MetaStore, we found that users without permission on a table's HDFS path could successfully run "msck repair table TABLENAME".

This command is not authorized when we use `StorageBasedAuthorizer`. After checking the code, we found that `StorageBasedAuthorizer` checks the permission on the table's HDFS path, while `HiveMetastoreAuthorizer`, which Ranger uses, does not when handling an `AlterTableEvent`.

This ticket adds the URI permission check on `AlterTableEvent` to `HiveMetastoreAuthorizer`.

### Why are the changes needed?
When using `StorageBasedAuthorizer`, the `msck repair table` command fails if the user doesn't have write permission on the table's path. But when using `HiveMetastoreAuthorizer` with Ranger, the command succeeds even if the user doesn't have write permission on the table's path.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
It can be tested manually with the `alter table` command. Ranger needs to be set as the authorizer for Hive MetaStore. Before the test, ensure that the test user doesn't have write permission on the table's path.
* before applying patch
```
spark-sql> > alter table yiyang_people add columns(id int);
Time taken: 2.379 seconds
21/05/28 15:33:17 INFO SparkSQLCLIDriver: Time taken: 2.379 seconds
spark-sql>
```
* after applying patch
```
spark-sql> > > alter table yiyang_people add columns(id int);
21/05/28 15:30:59 WARN HiveExternalCatalog: Could not alter schema of table `default`.`yiyang_people` in a Hive compatible way. Updating Hive metastore in Spark SQL specific format.
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.sql.hive.client.Shim_v0_12.alterTable(HiveShim.scala:400)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterTableDataSchema$1.apply$mcV$sp(HiveClientImpl.scala:536)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterTableDataSchema$1.apply(HiveClientImpl.scala:515)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterTableDataSchema$1.apply(HiveClientImpl.scala:515)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:277)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260)
	at org.apache.spark.sql.hive.client.HiveClientImpl.alterTableDataSchema(HiveClientImpl.scala:515)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableDataSchema$1.apply$mcV$sp(HiveExternalCatalog.scala:664)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableDataSchema$1.apply(HiveExternalCatalog.scala:650)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableDataSchema$1.apply(HiveExternalCatalog.scala:650)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
	at org.apache.spark.sql.hive.HiveExternalCatalog.alterTableDataSchema(HiveExternalCatalog.scala:650)
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.alterTableDataSchema(ExternalCatalogWithListener.scala:124)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableDataSchema(SessionCatalog.scala:391)
	at org.apache.spark.sql.execution.command.AlterTableAddColumnsCommand.run(tables.scala:203)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at
```
[jira] [Updated] (HIVE-25174) HiveMetastoreAuthorizer didn't check URI permission for AlterTableEvent
[ https://issues.apache.org/jira/browse/HIVE-25174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25174:
--
Labels: pull-request-available (was: )

> HiveMetastoreAuthorizer didn't check URI permission for AlterTableEvent
> ---
>
> Key: HIVE-25174
> URL: https://issues.apache.org/jira/browse/HIVE-25174
> Project: Hive
> Issue Type: Improvement
> Reporter: Janus Chow
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When Using Ranger on Hive MetaStore, we met an issue that users without
> permission to table's HDFS path succeeded in running "msck repair table
> TABLENAME".
> This command is not authorized when we use `StorageBasedAuthorizer`, after
> checking the code, we found `StorageBasedAuthorizer` would check the
> permission of table's HDFS path, while `HiveMetastoreAuthorizer` used by
> Ranger won't when dealing with the event of `AlterTableEvent`.
> This ticket is to add the URI permission check on AlterTableEvent for
> `HiveMetastoreAuthorizer`.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
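The gap described in the ticket can be pictured with a small model. This is only a sketch under stated assumptions: none of the types below are Hive or Ranger classes, the permission store is a plain in-memory map, and the idea that the unpatched path decides on a table-level privilege alone is an assumption made for illustration. The actual change lives in pull request #2327.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model; these are not Hive or Ranger classes.
class UriCheckSketch {

    // Users holding write permission, keyed by storage path.
    private final Map<String, Set<String>> writersByPath = new HashMap<>();

    void grantWrite(String user, String path) {
        writersByPath.computeIfAbsent(path, p -> new HashSet<>()).add(user);
    }

    boolean canWrite(String user, String path) {
        return writersByPath.getOrDefault(path, Collections.emptySet()).contains(user);
    }

    // Assumed model of the reported behaviour: the alter-table event is
    // decided without consulting the table location at all.
    boolean authorizeAlterTableWithoutUriCheck(String user, boolean hasTablePrivilege,
                                               String tableLocation) {
        return hasTablePrivilege;
    }

    // Model of the requested behaviour: the user must also hold write
    // permission on the table's storage path.
    boolean authorizeAlterTableWithUriCheck(String user, boolean hasTablePrivilege,
                                            String tableLocation) {
        return hasTablePrivilege && canWrite(user, tableLocation);
    }

    public static void main(String[] args) {
        UriCheckSketch authorizer = new UriCheckSketch();
        String location = "/warehouse/example_table"; // made-up path
        authorizer.grantWrite("alice", location);

        // "bob" has a table privilege but no write permission on the path.
        System.out.println("without URI check, bob: "
                + authorizer.authorizeAlterTableWithoutUriCheck("bob", true, location));
        System.out.println("with URI check, bob: "
                + authorizer.authorizeAlterTableWithUriCheck("bob", true, location));
        System.out.println("with URI check, alice: "
                + authorizer.authorizeAlterTableWithUriCheck("alice", true, location));
    }
}
```

In this model the only difference between the two methods is the extra path lookup, which mirrors how the ticket contrasts `HiveMetastoreAuthorizer` with `StorageBasedAuthorizer`.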
[jira] [Work logged] (HIVE-25173) Fix build failure of hive-pre-upgrade due to missing dependency on pentaho-aggdesigner-algorithm
[ https://issues.apache.org/jira/browse/HIVE-25173?focusedWorklogId=603404=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-603404 ]

ASF GitHub Bot logged work on HIVE-25173:
-

Author: ASF GitHub Bot
Created on: 28/May/21 06:01
Start Date: 28/May/21 06:01
Worklog Time Spent: 10m
Work Description: iwasakims commented on pull request #2326:
URL: https://github.com/apache/hive/pull/2326#issuecomment-850159676

There are 3 test failures.

* Testing / split-18 / PostProcess / testForcedLocalityMultiplePreemptionsSameHost1 – org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService
* Testing / split-10 / PostProcess / testExternalDefaultPaths – org.apache.hadoop.hive.ql.TestWarehouseExternalDir
* Testing / split-10 / PostProcess / – org.apache.hadoop.hive.ql.TestWarehouseExternalDir

I could not reproduce the failures locally. They look unrelated to the patch.

```
$ cd ~/srcs/hive/llap-tez
$ mvn test -Dtest=TestLlapTaskSchedulerService
...
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService
[INFO] Tests run: 34, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.162 s - in org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService
```

```
$ cd ~/srcs/hive/itests
$ mvn install -DskipTests
$ cd hive-unit
$ mvn test -Dtest=TestWarehouseExternalDir
...
[INFO] Running org.apache.hadoop.hive.ql.TestWarehouseExternalDir
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 44.955 s - in org.apache.hadoop.hive.ql.TestWarehouseExternalDir
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 603404)
Time Spent: 20m (was: 10m)

> Fix build failure of hive-pre-upgrade due to missing dependency on
> pentaho-aggdesigner-algorithm
>
>
> Key: HIVE-25173
> URL: https://issues.apache.org/jira/browse/HIVE-25173
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 3.1.2
> Reporter: Masatake Iwasaki
> Assignee: Masatake Iwasaki
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> {noformat}
> [ERROR] Failed to execute goal on project hive-pre-upgrade: Could not resolve
> dependencies for project org.apache.hive:hive-pre-upgrade:jar:4.0.0-SNAPSHOT:
> Failure to find org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in
> https://repo.maven.apache.org/maven2 was cached in the local repository,
> resolution will not be reattempted until the update interval of central has
> elapsed or updates are forced
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)