[jira] [Created] (HIVE-25076) Get number of write tasks from jobConf for Iceberg commits
Marton Bod created HIVE-25076:
---------------------------------

Summary: Get number of write tasks from jobConf for Iceberg commits
Key: HIVE-25076
URL: https://issues.apache.org/jira/browse/HIVE-25076
Project: Hive
Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod

When writing empty data into Iceberg tables, we can end up with a succeeded task count of 0. With the current logic, we might then erroneously fall back to the number of mapper tasks in the commit logic, which would result in failures. We should instead save the succeeded task count into the JobConf under a specified key and retrieve it from there.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
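The fix idea above can be sketched with plain JDK types. This is a minimal illustration, not the actual patch: `java.util.Properties` stands in for Hadoop's `JobConf`, and the key name is hypothetical.

```java
import java.util.Properties;

public class TaskCountConfSketch {
    // Hypothetical key name; the key used by the actual patch may differ.
    static final String SUCCEEDED_TASK_COUNT_KEY = "iceberg.write.succeeded.task.count";

    // Writer side: record the succeeded task count once the write finishes.
    static void saveTaskCount(Properties conf, int succeededTasks) {
        conf.setProperty(SUCCEEDED_TASK_COUNT_KEY, Integer.toString(succeededTasks));
    }

    // Commit side: prefer the recorded count; fall back to the mapper task
    // count only when nothing was recorded, so an empty write (0 succeeded
    // tasks) is honored instead of being mistaken for "use the mapper count".
    static int taskCountForCommit(Properties conf, int mapperTaskFallback) {
        String recorded = conf.getProperty(SUCCEEDED_TASK_COUNT_KEY);
        return recorded != null ? Integer.parseInt(recorded) : mapperTaskFallback;
    }
}
```

With this shape, an empty insert records 0 under the key, and the commit logic reads 0 back rather than substituting the mapper task count.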
[jira] [Created] (HIVE-25077) Direct SQL to fetch column privileges in refreshPrivileges may be broken in postgres
Rajesh Balamohan created HIVE-25077:
---------------------------------------

Summary: Direct SQL to fetch column privileges in refreshPrivileges may be broken in postgres
Key: HIVE-25077
URL: https://issues.apache.org/jira/browse/HIVE-25077
Project: Hive
Issue Type: Improvement
Reporter: Rajesh Balamohan

HIVE-22512 tried to fix direct-sql for col privileges. However, the "GRANT_OPTION" field in "TBL_COL_PRIVS" is marked as smallint in postgres, while the code retrieves it as a boolean.

Ref: [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1533]

{code:java}
boolean grantOption = MetastoreDirectSqlUtils.extractSqlBoolean(privLine[grantOptionIndex]);
{code}

[https://github.com/apache/hive/blob/048336bd0c21163920557a60c88135b1d5b42d3d/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java#L530]

MetastoreDirectSqlUtils::extractSqlBoolean should handle integers to support directSQL in postgres.
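The proposed handling could look roughly like the sketch below. The method name mirrors `MetastoreDirectSqlUtils.extractSqlBoolean`, but this is an illustration of the idea (accept numeric values such as Postgres smallint alongside Boolean and String), not the real implementation.

```java
public class SqlBooleanSketch {
    // Illustrative: coerce the various JDBC representations of a SQL boolean.
    static boolean extractSqlBoolean(Object value) {
        if (value == null) {
            return false;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof Number) {
            // Postgres returns smallint for GRANT_OPTION; treat nonzero as true.
            return ((Number) value).intValue() != 0;
        }
        if (value instanceof String) {
            // Some backends return "Y"/"N" flags.
            String s = (String) value;
            return s.equalsIgnoreCase("Y") || s.equalsIgnoreCase("true");
        }
        throw new IllegalArgumentException(
            "Unexpected boolean representation: " + value.getClass().getName());
    }
}
```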
[jira] [Created] (HIVE-25071) Number of reducers limited to fixed 1 when updating/deleting
Krisztian Kasa created HIVE-25071:
-------------------------------------

Summary: Number of reducers limited to fixed 1 when updating/deleting
Key: HIVE-25071
URL: https://issues.apache.org/jira/browse/HIVE-25071
Project: Hive
Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa

When updating/deleting bucketed tables, an extra ReduceSink operator is created to enforce bucketing. After HIVE-22538, the number of reducers is limited to a fixed 1 in these RS operators. This can lead to performance degradation. Prior to HIVE-22538, multiple reducers were available in such cases.

The reason for limiting the number of reducers is to ensure RowId ascending order in the delete delta files produced by update/delete statements. This is the plan of a delete statement like:

{code}
DELETE FROM t1 WHERE a = 1;
{code}

{code}
TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7]
{code}

RowId order is ensured by RS[3] and bucketing is enforced by RS[5], where the number of reducers was limited to the number of buckets in the table or hive.exec.reducers.max. However, RS[5] does not provide any ordering, so the above plan may generate unsorted delete deltas, which leads to corrupted data reads.

Prior to HIVE-22538, these RS operators were merged by ReduceSinkDeduplication, and the resulting RS kept the ordering and enabled multiple reducers. It could do so because ReduceSinkDeduplication was prepared for ACID writes. This was removed by HIVE-22538 to get a more generic ReduceSinkDeduplication.
[jira] [Created] (HIVE-25073) Optimise HiveAlterHandler::alterPartitions
Rajesh Balamohan created HIVE-25073:
---------------------------------------

Summary: Optimise HiveAlterHandler::alterPartitions
Key: HIVE-25073
URL: https://issues.apache.org/jira/browse/HIVE-25073
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Rajesh Balamohan

Table details are populated again and again for each partition, which can be avoided.

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L5892
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L808
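The optimisation amounts to hoisting the table lookup out of the per-partition loop. The sketch below is purely illustrative (the method and the counter are hypothetical stand-ins, not Hive code), showing that one fetch suffices regardless of the partition count.

```java
import java.util.List;

public class AlterPartitionsSketch {
    // Counts how many times table metadata is fetched (stand-in for a
    // metastore round trip or datastore query).
    static int tableFetches = 0;

    static String fetchTable(String tableName) {
        tableFetches++;
        return tableName; // a real implementation would return table metadata
    }

    // Illustrative fix: fetch the table once, then reuse the cached metadata
    // for every partition instead of re-fetching inside the loop.
    static void alterPartitions(String tableName, List<String> partNames) {
        String table = fetchTable(tableName); // hoisted out of the loop
        for (String part : partNames) {
            // ... apply the alteration to 'part' using the cached 'table' ...
        }
    }
}
```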
[jira] [Created] (HIVE-25074) Remove Metastore flushCache usage
Miklos Szurap created HIVE-25074:
------------------------------------

Summary: Remove Metastore flushCache usage
Key: HIVE-25074
URL: https://issues.apache.org/jira/browse/HIVE-25074
Project: Hive
Issue Type: Improvement
Components: Metastore, Standalone Metastore
Affects Versions: 4.0.0
Reporter: Miklos Szurap

The "flushCache" in HiveMetaStore with the ObjectStore implementation is currently a NOOP:

{code:java}
public void flushCache() {
  // NOP as there's no caching
}
{code}

The HBaseStore (HBaseReadWrite) had some logic in it, however it has been removed in HIVE-17234.

As I see, the calls go like this:
HiveMetaStoreClient.flushCache() -> CachedStore.flushCache() -> ObjectStore.flushCache()

A significant number of calls (about 10% of all calls) is made from the client to the server just to do nothing. We could spare the call to the server completely, including getting a DB connection, which can take 1+ seconds under high-load scenarios, slowing down Hive queries unnecessarily.

Can we:
# Deprecate RawStore.flushCache (if there are other implementations)
# Deprecate HiveMetaStoreClient.flushCache()
# Do the NOOP on the client side in HiveMetaStoreClient.flushCache() (while it is not removed in a next version)
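The third suggestion above can be sketched as follows. The class is a hypothetical stand-in for the real client, not `HiveMetaStoreClient` itself; the counter simulates RPCs reaching the server, to show that none are made once the method is a client-side no-op.

```java
public class MetaStoreClientSketch {
    // Stand-in for the number of RPCs that actually reach the metastore
    // server (each of which would also consume a server-side DB connection).
    int serverCalls = 0;

    // Sketch: deprecated and a no-op on the client side, so the round trip
    // to the server is spared entirely until the method is removed.
    @Deprecated
    public void flushCache() {
        // Before: client -> server -> ObjectStore.flushCache() (itself a NOOP).
        // Now: do nothing locally; serverCalls is never incremented.
    }
}
```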
[jira] [Created] (HIVE-25072) Optimise ObjectStore::alterPartitions
Rajesh Balamohan created HIVE-25072:
---------------------------------------

Summary: Optimise ObjectStore::alterPartitions
Key: HIVE-25072
URL: https://issues.apache.org/jira/browse/HIVE-25072
Project: Hive
Issue Type: Improvement
Components: Metastore
Reporter: Rajesh Balamohan

Avoid fetching table details for every partition in the table.

Ref:
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L5104
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4986

Following stacktrace may be relevant for apache master as well.

{noformat}
at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:255)
at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:2113)
at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:2152)
at org.apache.hadoop.hive.metastore.ObjectStore.alterPartitionNoTxn(ObjectStore.java:4951)
at org.apache.hadoop.hive.metastore.ObjectStore.alterPartitions(ObjectStore.java:5057)
at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
at com.sun.proxy.$Proxy27.alterPartitions(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:798)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:5695)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_req(HiveMetaStore.java:5647)
at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy28.alter_partitions_req(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_req.getResult(ThriftHiveMetastore.java:18557)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_req.getResult(ThriftHiveMetastore.java:18541)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643)
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
{noformat}
[jira] [Created] (HIVE-25075) Hive::loadPartitionInternal establishes HMS connection for every partition for external tables
Rajesh Balamohan created HIVE-25075:
---------------------------------------

Summary: Hive::loadPartitionInternal establishes HMS connection for every partition for external tables
Key: HIVE-25075
URL: https://issues.apache.org/jira/browse/HIVE-25075
Project: Hive
Issue Type: Improvement
Reporter: Rajesh Balamohan

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2522

{code}
boolean needRecycle = !tbl.isTemporary()
        && ReplChangeManager.shouldEnableCm(Hive.get().getDatabase(tbl.getDbName()), tbl.getTTable());
{code}

Hive.get() breaks the current connection with HMS. Due to this, for external table partition loads, an HMS connection is established for every partition.
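The shape of a fix would be to resolve the metastore handle once and reuse it across the per-partition loads, rather than re-establishing the connection inside the per-partition path. The sketch below is hypothetical (the names and the counter are stand-ins, not Hive code), demonstrating a single connection being opened regardless of partition count.

```java
public class LoadPartitionSketch {
    // Stand-in for the number of HMS connections established.
    static int connectionsOpened = 0;

    // Stand-in for Hive.get(), which (per the report) re-establishes the
    // HMS connection each time it is called on this path.
    static Object openConnection() {
        connectionsOpened++;
        return new Object();
    }

    // Illustrative fix: obtain the handle once, outside the loop, and reuse
    // it for every partition load.
    static void loadPartitions(int partitionCount) {
        Object hms = openConnection(); // once, not once per partition
        for (int i = 0; i < partitionCount; i++) {
            // ... load partition i using the shared 'hms' handle ...
        }
    }
}
```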