[jira] [Created] (HIVE-24075) Optimise KeyValuesInputMerger
Rajesh Balamohan created HIVE-24075:
---------------------------------------

             Summary: Optimise KeyValuesInputMerger
                 Key: HIVE-24075
                 URL: https://issues.apache.org/jira/browse/HIVE-24075
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan

Comparisons in KeyValuesInputMerger can be reduced.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L165
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L150

If the readers in the queue compare as equal, we could reuse "{{nextKVReaders}}" in the next iteration instead of doing the comparison all over again.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L178

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
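A minimal sketch of the suggested optimisation, not the actual Hive code (Reader stands in for Hive's KeyValueReader, and the class and method names here are hypothetical): in a k-way merge, the batch of readers whose heads compared equal can be carried over to the next round when their new heads are still equal to each other and strictly smaller than the best head left in the priority queue, skipping the re-insert/re-compare cycle.

```java
import java.util.*;

public class MergerSketch {
  // Stand-in for a sorted key/value reader.
  static class Reader {
    final int[] keys; int pos;
    Reader(int... keys) { this.keys = keys; }
    boolean hasNext() { return pos < keys.length; }
    int head() { return keys[pos]; }
    void advance() { pos++; }
  }

  public static List<Integer> merge(List<Reader> inputs) {
    PriorityQueue<Reader> queue =
        new PriorityQueue<>(Comparator.comparingInt(Reader::head));
    for (Reader r : inputs) if (r.hasNext()) queue.add(r);

    List<Integer> out = new ArrayList<>();
    List<Reader> nextReaders = new ArrayList<>(); // readers with equal heads
    while (!queue.isEmpty() || !nextReaders.isEmpty()) {
      if (nextReaders.isEmpty()) {
        // Pull every reader whose head equals the minimum.
        nextReaders.add(queue.poll());
        while (!queue.isEmpty()
            && queue.peek().head() == nextReaders.get(0).head()) {
          nextReaders.add(queue.poll());
        }
      }
      // Emit the common key once per reader holding it, then advance.
      for (Reader r : nextReaders) { out.add(r.head()); r.advance(); }

      // Reuse check: every reader still has data, all new heads are equal,
      // and the common head beats everything left in the queue.
      boolean reusable = true;
      for (Reader r : nextReaders) {
        if (!r.hasNext() || r.head() != nextReaders.get(0).head()) {
          reusable = false; break;
        }
      }
      if (reusable && !queue.isEmpty()
          && nextReaders.get(0).head() >= queue.peek().head()) {
        reusable = false;
      }
      if (!reusable) {
        // Fall back: push survivors back into the queue and re-compare.
        for (Reader r : nextReaders) if (r.hasNext()) queue.add(r);
        nextReaders.clear();
      }
    }
    return out;
  }
}
```

With inputs {1,2}, {1,2}, {5}, the first round groups the two readers on key 1; after advancing, both heads are 2, which is below the queued 5, so the group is reused without any heap operations.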
[jira] [Created] (HIVE-24074) Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x
Jesus Camacho Rodriguez created HIVE-24074:
-------------------------------------------

             Summary: Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x
                 Key: HIVE-24074
                 URL: https://issues.apache.org/jira/browse/HIVE-24074
             Project: Hive
          Issue Type: Bug
          Components: Avro, Parquet
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez

The timezone conversion for Parquet and Avro uses the new {{java.time.*}} classes, which can lead to incorrect values being returned for certain dates in certain timezones if the timestamp was computed and converted based on the {{java.sql.*}} classes. For instance, the offset used for the Singapore timezone at 1900-01-01T00:00:00.000 is UTC+8, while the correct offset for that date should be UTC+6:55:25. Some additional information can be found here: https://stackoverflow.com/a/52152315
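The offset disagreement can be seen directly from the JDK (a minimal illustration, not Hive code): {{java.time}} consults the full tz database, including the pre-1901 local-mean-time period of Asia/Singapore, whereas per the issue description the legacy {{java.util}}/{{java.sql}} path effectively applies the modern +08:00 offset to such early instants.

```java
import java.time.*;
import java.util.TimeZone;

public class OffsetDrift {
  // Offset java.time applies to 1900-01-01T00:00 local time in Singapore:
  // the historical local-mean-time offset, +06:55:25.
  public static ZoneOffset modernOffset() {
    return ZoneId.of("Asia/Singapore").getRules()
        .getOffset(LocalDateTime.of(1900, 1, 1, 0, 0));
  }

  // The raw (current standard) offset the legacy TimeZone class reports
  // for the same zone: +08:00, matching the value cited in the issue.
  public static int legacyRawOffsetMillis() {
    return TimeZone.getTimeZone("Asia/Singapore").getRawOffset();
  }
}
```

Data written under the +08:00 interpretation and read back through the +06:55:25 interpretation (or vice versa) shifts by 1 hour, 4 minutes, and 35 seconds.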
[jira] [Created] (HIVE-24073) Execution exception in sort-merge semijoin
Jesus Camacho Rodriguez created HIVE-24073:
-------------------------------------------

             Summary: Execution exception in sort-merge semijoin
                 Key: HIVE-24073
                 URL: https://issues.apache.org/jira/browse/HIVE-24073
             Project: Hive
          Issue Type: Bug
          Components: Operators
            Reporter: Jesus Camacho Rodriguez
            Assignee: mahesh kumar behera

Working on HIVE-24001, we trigger an additional SJ conversion that leads to this exception at execution time:
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
	... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
	... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
	at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
	at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
	at org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
	... 23 more
{code}
To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in the last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been merged.
[jira] [Created] (HIVE-24072) HiveAggregateJoinTransposeRule may try to create an invalid transformation
Zoltan Haindrich created HIVE-24072:
---------------------------------------

             Summary: HiveAggregateJoinTransposeRule may try to create an invalid transformation
                 Key: HIVE-24072
                 URL: https://issues.apache.org/jira/browse/HIVE-24072
             Project: Hive
          Issue Type: Bug
            Reporter: Zoltan Haindrich
            Assignee: Zoltan Haindrich

{code}
java.lang.AssertionError: Cannot add expression of different type to set:
set type is RecordType(INTEGER NOT NULL o_orderkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, VARCHAR(25) CHARACTER SET "UTF-16LE" c_name, DOUBLE $f5) NOT NULL
expression type is RecordType(INTEGER NOT NULL o_orderkey, INTEGER NOT NULL o_custkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, DOUBLE $f1) NOT NULL
set is rel#567:HiveAggregate.HIVE.[].any(input=HepRelVertex#490,group={2, 4, 5, 6, 7},agg#0=sum($1))
expression is HiveProject(o_orderkey=[$2], o_custkey=[$3], o_totalprice=[$4], o_orderdate=[$5], c_custkey=[$6], $f1=[$1])
  HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], cost=[{2284.5 rows, 0.0 cpu, 0.0 io}])
    HiveAggregate(group=[{0}], agg#0=[sum($1)])
      HiveProject(l_orderkey=[$0], l_quantity=[$4])
        HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[l])
    HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], cost=[{1.9115E15 rows, 0.0 cpu, 0.0 io}])
      HiveJoin(condition=[=($4, $1)], joinType=[inner], algorithm=[none], cost=[{1650.0 rows, 0.0 cpu, 0.0 io}])
        HiveProject(o_orderkey=[$0], o_custkey=[$1], o_totalprice=[$3], o_orderdate=[$4])
          HiveTableScan(table=[[tpch_0_001, orders]], table:alias=[orders])
        HiveProject(c_custkey=[$0], c_name=[$1])
          HiveTableScan(table=[[tpch_0_001, customer]], table:alias=[customer])
      HiveProject($f0=[$0])
        HiveFilter(condition=[>($1, 3E2)])
          HiveAggregate(group=[{0}], agg#0=[sum($4)])
            HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[lineitem])

	at org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:383)
	at org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
	at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
	at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveAggregateJoinTransposeRule.onMatch(HiveAggregateJoinTransposeRule.java:300)
{code}
[jira] [Created] (HIVE-24071) Continue cleaning the NotificationEvents till we have data greater than TTL
Ramesh Kumar Thangarajan created HIVE-24071:
--------------------------------------------

             Summary: Continue cleaning the NotificationEvents till we have data greater than TTL
                 Key: HIVE-24071
                 URL: https://issues.apache.org/jira/browse/HIVE-24071
             Project: Hive
          Issue Type: Bug
          Components: repl
    Affects Versions: 4.0.0
            Reporter: Ramesh Kumar Thangarajan
            Assignee: Ramesh Kumar Thangarajan
             Fix For: 4.0.0

Currently we clean the notification events only once every 2 hours, and each run deletes only a strictly limited batch. We should continue deleting until we have cleared all the notification events older than the TTL.
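The proposed behavior can be sketched as a loop over bounded delete batches (hypothetical names and an in-memory store, not the actual ObjectStore API): a single cleaner run keeps issuing small deletes until no event older than the TTL remains.

```java
import java.util.*;

public class EventCleaner {
  // Event creation times in millis; stands in for the NotificationEvent table.
  private final List<Long> eventTimes = new ArrayList<>();

  public void add(long creationTime) { eventTimes.add(creationTime); }
  public int size() { return eventTimes.size(); }

  // Deletes up to batchSize events older than (now - ttlMillis);
  // returns how many were deleted.
  private int deleteBatch(long now, long ttlMillis, int batchSize) {
    int deleted = 0;
    Iterator<Long> it = eventTimes.iterator();
    while (it.hasNext() && deleted < batchSize) {
      if (now - it.next() > ttlMillis) { it.remove(); deleted++; }
    }
    return deleted;
  }

  // Loops over deleteBatch until a pass deletes nothing, so one cleaner run
  // clears every expired event while each bounded batch keeps individual
  // delete transactions small.
  public int cleanExpired(long now, long ttlMillis, int batchSize) {
    int total = 0, n;
    while ((n = deleteBatch(now, ttlMillis, batchSize)) > 0) {
      total += n;
    }
    return total;
  }
}
```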
[jira] [Created] (HIVE-24070) ObjectStore.cleanWriteNotificationEvents OutOfMemory on large number of pending events
Ramesh Kumar Thangarajan created HIVE-24070:
--------------------------------------------

             Summary: ObjectStore.cleanWriteNotificationEvents OutOfMemory on large number of pending events
                 Key: HIVE-24070
                 URL: https://issues.apache.org/jira/browse/HIVE-24070
             Project: Hive
          Issue Type: Bug
          Components: repl
    Affects Versions: 4.0.0
            Reporter: Ramesh Kumar Thangarajan
            Assignee: Ramesh Kumar Thangarajan
             Fix For: 4.0.0

If a large number of events haven't been cleaned up for some reason, then ObjectStore.cleanWriteNotificationEvents() can run out of memory while it loads all the events to be deleted. It should fetch events in batches.
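The batched-fetch idea can be sketched as follows (hypothetical helper names, not the real ObjectStore API): rather than materializing every expired write-notification event up front, fetch at most batchSize event IDs per query and delete them, so memory use stays bounded regardless of backlog size.

```java
import java.util.*;
import java.util.function.*;

public class BatchedCleanup {
  // fetchBatch(maxEventTime, batchSize) stands in for a store query that
  // returns up to batchSize IDs of events older than maxEventTime;
  // deleteByIds stands in for the corresponding bulk delete.
  public static int cleanInBatches(
      BiFunction<Long, Integer, List<Long>> fetchBatch,
      Consumer<List<Long>> deleteByIds,
      long maxEventTime, int batchSize) {
    int total = 0;
    List<Long> ids;
    while (!(ids = fetchBatch.apply(maxEventTime, batchSize)).isEmpty()) {
      deleteByIds.accept(ids); // only batchSize IDs are resident at once
      total += ids.size();
    }
    return total;
  }
}
```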