[jira] [Created] (HIVE-24075) Optimise KeyValuesInputMerger

2020-08-25 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-24075:
---

 Summary: Optimise KeyValuesInputMerger
 Key: HIVE-24075
 URL: https://issues.apache.org/jira/browse/HIVE-24075
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


Comparisons in KeyValuesInputMerger can be reduced.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L165

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L150

If the readers at the head of the queue compare as equal, we could reuse 
"{{nextKVReaders}}" in the subsequent iteration instead of doing the 
comparisons all over again (a rough sketch of the idea follows the link below).

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/tools/KeyValuesInputMerger.java#L178]
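
A hedged sketch of that reuse check, outside the real class: the reader type, field names, and method below are illustrative stand-ins, not the existing KeyValuesInputMerger API. The idea is that if every reader previously collected into {{nextKVReaders}} still compares equal to the others after advancing, and is still strictly smaller than the head of the priority queue, the list can be kept as-is and the offer/poll/compare round trip against the queue skipped.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Illustrative sketch only: R stands in for the KeyValueReader wrappers that
 * KeyValuesInputMerger keeps in its priority queue. How nextKVReaders gets
 * filled and advanced is elided here.
 */
class NextReadersReuseSketch<R> {

  private final Comparator<R> comparator;
  private final PriorityQueue<R> pQueue;
  private final List<R> nextKVReaders = new ArrayList<>();

  NextReadersReuseSketch(Comparator<R> comparator) {
    this.comparator = comparator;
    this.pQueue = new PriorityQueue<>(comparator);
  }

  /**
   * After the readers in nextKVReaders have advanced to their next key, check
   * whether the previous grouping is still valid: all of them compare equal to
   * each other and are strictly smaller than the current head of the queue.
   * If so, the list can be reused as-is for the next iteration and no
   * re-comparison against the whole queue is needed.
   */
  boolean canReuseNextKVReaders() {
    if (nextKVReaders.isEmpty()) {
      return false;
    }
    R first = nextKVReaders.get(0);
    for (int i = 1; i < nextKVReaders.size(); i++) {
      if (comparator.compare(first, nextKVReaders.get(i)) != 0) {
        return false;
      }
    }
    return pQueue.isEmpty() || comparator.compare(first, pQueue.peek()) < 0;
  }
}
{code}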

 





[jira] [Created] (HIVE-24074) Incorrect handling of timestamp in Parquet/Avro when written in certain time zones in versions before Hive 3.x

2020-08-25 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-24074:
--

 Summary: Incorrect handling of timestamp in Parquet/Avro when 
written in certain time zones in versions before Hive 3.x
 Key: HIVE-24074
 URL: https://issues.apache.org/jira/browse/HIVE-24074
 Project: Hive
  Issue Type: Bug
  Components: Avro, Parquet
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


The timezone conversion for Parquet and Avro uses the new {{java.time.*}} classes, 
which can lead to incorrect values being returned for certain dates in certain 
timezones if the timestamp was computed and converted based on the {{java.sql.*}} 
classes. For instance, the offset used for the Singapore timezone at 
1900-01-01T00:00:00.000 is UTC+8, while the correct offset for that date should 
be UTC+6:55:25. Some additional information can be found here: 
https://stackoverflow.com/a/52152315
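
For reference, a small standalone check (not part of the Hive conversion code; the class name is made up and the legacy result depends on the JDK and its timezone data) that contrasts the two conversion paths for the Singapore example:

{code}
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.TimeZone;

/**
 * Standalone sketch showing the two conversion paths the description
 * contrasts. The exact legacy result depends on the JDK's timezone data.
 */
public class SingaporeOffsetCheck {
  public static void main(String[] args) {
    // Pin the JVM default zone so the java.sql parsing below is deterministic.
    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Singapore"));
    ZoneId zone = ZoneId.of("Asia/Singapore");
    LocalDateTime wallClock = LocalDateTime.of(1900, 1, 1, 0, 0, 0);

    // java.time consults the full TZDB history: +06:55:25 for this date.
    System.out.println("java.time offset: " + zone.getRules().getOffset(wallClock));

    // Legacy java.sql path: the same wall-clock value resolved through the old
    // calendar machinery, which (per this ticket) applies the modern +08:00.
    Timestamp legacy = Timestamp.valueOf("1900-01-01 00:00:00");
    long modern = wallClock.atZone(zone).toInstant().toEpochMilli();

    // On affected JDKs the two epoch values differ by the 01:04:35 gap
    // between +08:00 and +06:55:25.
    System.out.println("difference (ms): " + (legacy.getTime() - modern));
  }
}
{code}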





[jira] [Created] (HIVE-24073) Execution exception in sort-merge semijoin

2020-08-25 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-24073:
--

 Summary: Execution exception in sort-merge semijoin
 Key: HIVE-24073
 URL: https://issues.apache.org/jira/browse/HIVE-24073
 Project: Hive
  Issue Type: Bug
  Components: Operators
Reporter: Jesus Camacho Rodriguez
Assignee: mahesh kumar behera


Working on HIVE-24001, we trigger an additional SJ conversion that leads to 
this exception at execution time:

{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1063)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:685)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:707)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:462)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1037)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:1060)
... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.processKey(CommonMergeJoinOperator.java:564)
at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:243)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
at org.apache.hadoop.hive.ql.exec.TezDummyStoreOperator.process(TezDummyStoreOperator.java:49)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:887)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1003)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.flush(GroupByOperator.java:1020)
... 23 more
{code}

To reproduce, just set {{hive.auto.convert.sortmerge.join}} to {{true}} in the 
last query in {{auto_sortmerge_join_10.q}} after HIVE-24041 has been merged.





[jira] [Created] (HIVE-24072) HiveAggregateJoinTransposeRule may try to create an invalid transformation

2020-08-25 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24072:
---

 Summary: HiveAggregateJoinTransposeRule may try to create an 
invalid transformation
 Key: HIVE-24072
 URL: https://issues.apache.org/jira/browse/HIVE-24072
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


{code}
java.lang.AssertionError: 
Cannot add expression of different type to set:
set type is RecordType(INTEGER NOT NULL o_orderkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, VARCHAR(25) CHARACTER SET "UTF-16LE" c_name, DOUBLE $f5) NOT NULL
expression type is RecordType(INTEGER NOT NULL o_orderkey, INTEGER NOT NULL o_custkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, DOUBLE $f1) NOT NULL
set is rel#567:HiveAggregate.HIVE.[].any(input=HepRelVertex#490,group={2, 4, 5, 6, 7},agg#0=sum($1))
expression is HiveProject(o_orderkey=[$2], o_custkey=[$3], o_totalprice=[$4], o_orderdate=[$5], c_custkey=[$6], $f1=[$1])
  HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], cost=[{2284.5 rows, 0.0 cpu, 0.0 io}])
    HiveAggregate(group=[{0}], agg#0=[sum($1)])
      HiveProject(l_orderkey=[$0], l_quantity=[$4])
        HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[l])
    HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], cost=[{1.9115E15 rows, 0.0 cpu, 0.0 io}])
      HiveJoin(condition=[=($4, $1)], joinType=[inner], algorithm=[none], cost=[{1650.0 rows, 0.0 cpu, 0.0 io}])
        HiveProject(o_orderkey=[$0], o_custkey=[$1], o_totalprice=[$3], o_orderdate=[$4])
          HiveTableScan(table=[[tpch_0_001, orders]], table:alias=[orders])
        HiveProject(c_custkey=[$0], c_name=[$1])
          HiveTableScan(table=[[tpch_0_001, customer]], table:alias=[customer])
      HiveProject($f0=[$0])
        HiveFilter(condition=[>($1, 3E2)])
          HiveAggregate(group=[{0}], agg#0=[sum($4)])
            HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[lineitem])

at org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:383)
at org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveAggregateJoinTransposeRule.onMatch(HiveAggregateJoinTransposeRule.java:300)
{code}





[jira] [Created] (HIVE-24071) Continue cleaning the NotificationEvents till we have data greater than TTL

2020-08-25 Thread Ramesh Kumar Thangarajan (Jira)
Ramesh Kumar Thangarajan created HIVE-24071:
---

 Summary: Continue cleaning the NotificationEvents till we have 
data greater than TTL
 Key: HIVE-24071
 URL: https://issues.apache.org/jira/browse/HIVE-24071
 Project: Hive
  Issue Type: Bug
  Components: repl
Affects Versions: 4.0.0
Reporter: Ramesh Kumar Thangarajan
Assignee: Ramesh Kumar Thangarajan
 Fix For: 4.0.0


Continue cleaning the NotificationEvents till we have data greater than TTL.

Currently the notification events are cleaned only once every 2 hours, and each 
run is strictly limited in how much it deletes. We should continue deleting 
until all notification events older than the TTL have been cleared.
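
A minimal sketch of the suggested loop, with placeholder helpers (the class and the fetchEventIdsOlderThan/deleteEvents methods are illustrative, not the actual ObjectStore API):

{code}
import java.util.List;

/**
 * Sketch only: keep deleting batches until no event older than the TTL
 * remains, instead of deleting a single batch per cleaner run. The abstract
 * helpers are placeholders, not the real ObjectStore API.
 */
public abstract class NotificationCleanerSketch {

  public void cleanExpiredEvents(long ttlSeconds, int batchSize) {
    long cutoffSeconds = System.currentTimeMillis() / 1000L - ttlSeconds;
    List<Long> eventIds;
    // Loop until a fetch returns nothing older than the cutoff.
    while (!(eventIds = fetchEventIdsOlderThan(cutoffSeconds, batchSize)).isEmpty()) {
      deleteEvents(eventIds);
    }
  }

  /** Placeholder: fetch at most {@code limit} event ids older than the cutoff. */
  protected abstract List<Long> fetchEventIdsOlderThan(long cutoffSeconds, int limit);

  /** Placeholder: delete the given events, e.g. in one transaction. */
  protected abstract void deleteEvents(List<Long> eventIds);
}
{code}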





[jira] [Created] (HIVE-24070) ObjectStore.cleanWriteNotificationEvents OutOfMemory on large number of pending events

2020-08-25 Thread Ramesh Kumar Thangarajan (Jira)
Ramesh Kumar Thangarajan created HIVE-24070:
---

 Summary: ObjectStore.cleanWriteNotificationEvents OutOfMemory on 
large number of pending events
 Key: HIVE-24070
 URL: https://issues.apache.org/jira/browse/HIVE-24070
 Project: Hive
  Issue Type: Bug
  Components: repl
Affects Versions: 4.0.0
Reporter: Ramesh Kumar Thangarajan
Assignee: Ramesh Kumar Thangarajan
 Fix For: 4.0.0


If a large number of events have not been cleaned up for some reason, then 
ObjectStore.cleanWriteNotificationEvents() can run out of memory while it loads 
all the events to be deleted.
It should fetch the events in batches.
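
A hedged sketch of batched fetching with JDO's setRange, in the style of the metastore's JDO queries; WriteEventRecord and its eventTime field are placeholders rather than the real model class, and the surrounding class is made up for illustration:

{code}
import java.util.List;
import javax.jdo.PersistenceManager;
import javax.jdo.Query;

/**
 * Sketch of deleting expired write-notification events in fixed-size batches
 * instead of materialising the whole result set. WriteEventRecord is a
 * placeholder for the real metastore model class.
 */
public class BatchedWriteEventCleanerSketch {

  /** Placeholder persistence-capable class standing in for the real model object. */
  public static class WriteEventRecord {
    long eventTime;
  }

  public void cleanInBatches(PersistenceManager pm, long cutoffSeconds, int batchSize) {
    while (true) {
      Query query = pm.newQuery(WriteEventRecord.class);
      query.setFilter("eventTime <= cutoff");
      query.declareParameters("long cutoff");
      query.setRange(0, batchSize); // load at most one batch into memory
      List<?> batch = (List<?>) query.execute(cutoffSeconds);
      boolean done = batch.isEmpty();
      if (!done) {
        pm.deletePersistentAll(batch); // drop this batch, then fetch the next one
      }
      query.closeAll();
      if (done) {
        break; // nothing older than the cutoff remains
      }
    }
  }
}
{code}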


