[jira] [Created] (HIVE-25076) Get number of write tasks from jobConf for Iceberg commits

2021-04-29 Thread Marton Bod (Jira)
Marton Bod created HIVE-25076:
-

 Summary: Get number of write tasks from jobConf for Iceberg commits
 Key: HIVE-25076
 URL: https://issues.apache.org/jira/browse/HIVE-25076
 Project: Hive
  Issue Type: Improvement
Reporter: Marton Bod
Assignee: Marton Bod


When writing empty data into Iceberg tables, the succeeded task count can end 
up as 0. With the current logic, the commit might then erroneously use the 
number of mapper tasks instead, which would result in failures. We should 
instead save the succeeded task count into the JobConf under a specified key 
and retrieve it from there.
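
A minimal sketch of the proposed save/retrieve step, not the actual patch. A plain Map stands in for the Hadoop JobConf so that the example is self-contained, and the key name and method names are hypothetical (the issue only says "a specified key"). The point it illustrates is that a stored 0 stays distinguishable from "nothing stored".

```java
import java.util.Map;
import java.util.OptionalInt;

final class SucceededTaskCountSketch {
    // Hypothetical key name, chosen only for this illustration.
    static final String SUCCEEDED_TASK_COUNT_KEY = "example.iceberg.succeeded.task.count";

    // Write side: always store the count explicitly, even when it is 0.
    static void saveSucceededTaskCount(Map<String, String> conf, int count) {
        conf.put(SUCCEEDED_TASK_COUNT_KEY, Integer.toString(count));
    }

    // Commit side: an empty result means the key was never set,
    // which is different from a stored value of 0.
    static OptionalInt readSucceededTaskCount(Map<String, String> conf) {
        String raw = conf.get(SUCCEEDED_TASK_COUNT_KEY);
        return raw == null ? OptionalInt.empty() : OptionalInt.of(Integer.parseInt(raw));
    }
}
```

With this shape the commit logic can treat a present 0 as "no tasks wrote data" rather than guessing from another counter.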



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25077) Direct SQL to fetch column privileges in refreshPrivileges may be broken in postgres

2021-04-29 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-25077:
---

 Summary: Direct SQL to fetch column privileges in 
refreshPrivileges may be broken in postgres
 Key: HIVE-25077
 URL: https://issues.apache.org/jira/browse/HIVE-25077
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


HIVE-22512 tried to fix direct SQL for column privileges.

However, the "GRANT_OPTION" field in "TBL_COL_PRIVS" is defined as smallint in 
postgres, while the code retrieves it as a boolean.

Ref: 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1533]

 
{code:java}
boolean grantOption =
    MetastoreDirectSqlUtils.extractSqlBoolean(privLine[grantOptionIndex]);
{code}
 

[https://github.com/apache/hive/blob/048336bd0c21163920557a60c88135b1d5b42d3d/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java#L530]

 

MetastoreDirectSqlUtils::extractSqlBoolean should handle integers to support 
directSQL in postgres.
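
A rough sketch of what integer handling could look like, not the actual Hive method. The class and method names are hypothetical, and the Boolean and "Y"/"N" branches are assumptions about what such a helper might already accept. The new part is the Number branch, which covers a smallint arriving as Short or Integer.

```java
final class SqlBooleanSketch {
    // Hypothetical stand-in for a lenient boolean extractor.
    static Boolean extractSqlBooleanLenient(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        // smallint columns typically come back as Short or Integer:
        // treat 0 as false and any other value as true.
        if (value instanceof Number) {
            return ((Number) value).intValue() != 0;
        }
        if (value instanceof String) {
            String s = ((String) value).trim();
            if (s.equalsIgnoreCase("Y")) {
                return Boolean.TRUE;
            }
            if (s.equalsIgnoreCase("N")) {
                return Boolean.FALSE;
            }
        }
        throw new IllegalArgumentException("Cannot extract boolean from column value " + value);
    }
}
```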





[jira] [Created] (HIVE-25071) Number of reducers limited to fixed 1 when updating/deleting

2021-04-29 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-25071:
-

 Summary: Number of reducers limited to fixed 1 when 
updating/deleting
 Key: HIVE-25071
 URL: https://issues.apache.org/jira/browse/HIVE-25071
 Project: Hive
  Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


When updating/deleting bucketed tables, an extra ReduceSink operator is created 
to enforce bucketing. After HIVE-22538, the number of reducers is limited to a 
fixed 1 in these RS operators.

This can lead to performance degradation.

Prior to HIVE-22538, multiple reducers were available in such cases. The reason 
for limiting the number of reducers is to ensure ascending RowId order in the 
delete delta files produced by the update/delete statements.

This is the plan of a delete statement like:

{code}
DELETE FROM t1 WHERE a = 1;
{code}
{code}
TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7]
{code}

RowId order is ensured by RS[3] and bucketing is enforced by RS[5]: the number 
of reducers was limited to the number of buckets in the table or 
hive.exec.reducers.max. However, RS[5] does not provide any ordering, so the 
above plan may generate unsorted delete deltas, which leads to corrupted data 
reads.

Prior to HIVE-22538, these RS operators were merged by ReduceSinkDeduplication, 
and the resulting RS kept the ordering and enabled multiple reducers. It could 
do so because ReduceSinkDeduplication was prepared for ACID writes. This was 
removed by HIVE-22538 to get a more generic ReduceSinkDeduplication.






[jira] [Created] (HIVE-25073) Optimise HiveAlterHandler::alterPartitions

2021-04-29 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-25073:
---

 Summary: Optimise HiveAlterHandler::alterPartitions
 Key: HIVE-25073
 URL: https://issues.apache.org/jira/browse/HIVE-25073
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Rajesh Balamohan


Table details are populated again and again for each partition, which can be 
avoided.

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L5892

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L808





[jira] [Created] (HIVE-25074) Remove Metastore flushCache usage

2021-04-29 Thread Miklos Szurap (Jira)
Miklos Szurap created HIVE-25074:


 Summary: Remove Metastore flushCache usage
 Key: HIVE-25074
 URL: https://issues.apache.org/jira/browse/HIVE-25074
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Standalone Metastore
Affects Versions: 4.0.0
Reporter: Miklos Szurap


The "flushCache" in HiveMetaStore with the ObjectStore implementation is 
currently a NOOP:
{code:java}
  public void flushCache() {
    // NOP as there's no caching
  }
{code}
The HBaseStore (HBaseReadWrite) had some logic in it, but it was removed in 
HIVE-17234.

As far as I can see, the calls go like this:

HiveMetaStoreClient.flushCache() -> CachedStore.flushCache() -> 
ObjectStore.flushCache()

A significant number of calls (about 10% of all calls) are made from the 
client to the server to do nothing. We could spare the call to the server 
completely, including getting a DB connection, which can take 1+ seconds under 
high load and slows down Hive queries unnecessarily.

Can we:
 # Deprecate the RawStore.flushCache (if there are other implementations)
 # Deprecate the HiveMetaStoreClient.flushCache()
 # Do the NOOP on the client side in HiveMetaStoreClient.flushCache() (until it 
is removed in a later version)
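
A small sketch of the third option, assuming the method stays on the client for compatibility. All types here are hypothetical stand-ins, not the real HiveMetaStoreClient or Thrift classes. The counting stub only exists to show that no server call is made.

```java
final class FlushCacheSketch {
    // Hypothetical stand-in for the server-side RPC surface.
    interface ServerStub {
        void flushCache();
    }

    // Counts how many flushCache RPCs reach the "server".
    static final class CountingServer implements ServerStub {
        int rpcCount = 0;

        @Override
        public void flushCache() {
            rpcCount++;
        }
    }

    // Hypothetical client: the method is kept for compatibility but
    // returns without contacting the server.
    static final class Client {
        private final ServerStub server;

        Client(ServerStub server) {
            this.server = server;
        }

        /** @deprecated the server side does nothing, so no call is made. */
        @Deprecated
        void flushCache() {
            // Intentionally empty: no RPC and no DB connection on the server.
        }
    }
}
```

Existing callers keep compiling, and the deprecation warning signals that the call can be dropped.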





[jira] [Created] (HIVE-25072) Optimise ObjectStore::alterPartitions

2021-04-29 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-25072:
---

 Summary: Optimise ObjectStore::alterPartitions
 Key: HIVE-25072
 URL: https://issues.apache.org/jira/browse/HIVE-25072
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Rajesh Balamohan


Avoid fetching table details for every partition in the table.

Ref:

 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L5104

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4986



The following stack trace may be relevant for Apache master as well.
{noformat}

at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:255)
at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:2113)
at org.apache.hadoop.hive.metastore.ObjectStore.getMTable(ObjectStore.java:2152)
at org.apache.hadoop.hive.metastore.ObjectStore.alterPartitionNoTxn(ObjectStore.java:4951)
at org.apache.hadoop.hive.metastore.ObjectStore.alterPartitions(ObjectStore.java:5057)
at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
at com.sun.proxy.$Proxy27.alterPartitions(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:798)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:5695)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_req(HiveMetaStore.java:5647)
at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
at com.sun.proxy.$Proxy28.alter_partitions_req(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_req.getResult(ThriftHiveMetastore.java:18557)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_req.getResult(ThriftHiveMetastore.java:18541)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:643)
at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:638)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)

{noformat}
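
A simplified sketch of the idea, under the assumption that the table lookup does not depend on the individual partition. The Store class and both methods are hypothetical stand-ins, not ObjectStore code. The lookup counter mimics the cost of a getMTable call per partition, as seen in the trace.

```java
import java.util.ArrayList;
import java.util.List;

final class AlterPartitionsSketch {
    // Hypothetical store that counts how often table details are fetched.
    static final class Store {
        int tableLookups = 0;

        String getTable(String db, String tbl) {
            tableLookups++;
            return db + "." + tbl;
        }
    }

    // Per-partition lookup: one table fetch for every partition.
    static List<String> alterEach(Store store, String db, String tbl, List<String> parts) {
        List<String> altered = new ArrayList<>();
        for (String part : parts) {
            String table = store.getTable(db, tbl);
            altered.add(table + "/" + part);
        }
        return altered;
    }

    // Hoisted lookup: fetch the table once and reuse it for all partitions.
    static List<String> alterHoisted(Store store, String db, String tbl, List<String> parts) {
        String table = store.getTable(db, tbl);
        List<String> altered = new ArrayList<>();
        for (String part : parts) {
            altered.add(table + "/" + part);
        }
        return altered;
    }
}
```

Both variants produce the same result. Only the number of table fetches differs, and it grows with the partition count in the first variant.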





[jira] [Created] (HIVE-25075) Hive::loadPartitionInternal establishes HMS connection for every partition for external tables

2021-04-29 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-25075:
---

 Summary: Hive::loadPartitionInternal establishes HMS connection 
for every partition for external tables
 Key: HIVE-25075
 URL: https://issues.apache.org/jira/browse/HIVE-25075
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2522

{code}
boolean needRecycle = !tbl.isTemporary()
    && ReplChangeManager.shouldEnableCm(
        Hive.get().getDatabase(tbl.getDbName()), tbl.getTTable());
{code}

Hive.get() breaks the current connection with HMS. Because of this, external 
table partition loads establish a new HMS connection for every partition.


