[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call
[ https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-24036:
----------------------------------
    Labels: pull-request-available  (was: )

> Kryo Exception while serializing plan for getSplits UDF call
> ------------------------------------------------------------
>
>                 Key: HIVE-24036
>                 URL: https://issues.apache.org/jira/browse/HIVE-24036
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Assignee: Naresh P R
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for class: org.apache.hadoop.hive.llap.LlapOutputFormat
> Serialization trace:
> outputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.PTFOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> 	at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
> 	at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
> 	at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
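The serialization trace above shows Kryo's FieldSerializer failing on the `outputFileFormatClass` reference reached through `TableDesc`. As a rough illustration only — hypothetical stand-in classes, plain `java.io` serialization instead of Kryo, and not necessarily the fix adopted in HIVE-24036 — keeping a non-serializable member out of a serialized plan object can look like this:

```java
import java.io.*;

// Hypothetical stand-in for a format handle the serializer cannot handle.
class OutputFormatLike { }

// Hypothetical stand-in for a plan descriptor like TableDesc.
class TableDescLike implements Serializable {
    // Marking the problematic reference transient is one common way to keep
    // a non-serializable member out of the serialized plan; the field is
    // simply skipped during serialization and comes back as null.
    transient OutputFormatLike outputFileFormat = new OutputFormatLike();
    String tableName = "t1";
}

public class PlanSerDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new TableDescLike()); // succeeds: transient field skipped
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            TableDescLike back = (TableDescLike) ois.readObject();
            System.out.println(back.tableName);        // the serializable state survives
            System.out.println(back.outputFileFormat); // transient field is null here
        }
    }
}
```

Without the `transient` modifier, `writeObject` would throw `NotSerializableException` for the `OutputFormatLike` member — the `java.io` analogue of the Kryo failure in the trace.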
[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call
[ https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-24036:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Work logged] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call
[ https://issues.apache.org/jira/browse/HIVE-24036?focusedWorklogId=470074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-470074 ]

ASF GitHub Bot logged work on HIVE-24036:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 13/Aug/20 04:41
            Start Date: 13/Aug/20 04:41
    Worklog Time Spent: 10m

Work Description: nareshpr opened a new pull request #1399:
URL: https://github.com/apache/hive/pull/1399

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
            Worklog Id: (was: 470074)
    Remaining Estimate: 0h
            Time Spent: 10m
[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call
[ https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-24036:
------------------------------
    Description:
{code:java}
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for class: org.apache.hadoop.hive.llap.LlapOutputFormat
Serialization trace:
outputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
childOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.PTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
	at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
	at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
	at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
{code}
[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call
[ https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-24036:
------------------------------
    Description: (edited the stack trace formatting)
[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call
[ https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R updated HIVE-24036:
------------------------------
    Description: (edited the stack trace formatting)
[jira] [Assigned] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call
[ https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R reassigned HIVE-24036:
---------------------------------
    Assignee: Naresh P R
[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176752#comment-17176752 ]

zhengchenyu commented on HIVE-22126:
------------------------------------
[~euigeun_chung] I found another problem: in the deriveRowType function, the change in this patch leads to an infinite loop and eventually an OOM. The variable 'name' is never updated inside the loop, so the loop never exits.

> hive-exec packaging should shade guava
> --------------------------------------
>
>                 Key: HIVE-22126
>                 URL: https://issues.apache.org/jira/browse/HIVE-22126
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Assignee: Eugene Chung
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch
>
> The ql/pom.xml includes the complete Guava library in hive-exec.jar: https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes problems for downstream clients of Hive which have hive-exec.jar on their classpath, since they are pinned to the same Guava version as Hive. We should shade the Guava classes so that other components which depend on hive-exec can independently use a different version of Guava as needed.
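The dead-loop shape described in the comment — a loop whose exit condition tests a variable that is never reassigned inside the loop body — can be sketched as follows. This is a hypothetical simplification of a unique-name search, not the actual deriveRowType code:

```java
import java.util.HashSet;
import java.util.Set;

public class UniqueNameDemo {
    // Buggy shape (as described): the condition tests 'name', but the body
    // only builds candidates into a different variable, so 'name' never
    // changes; one collision means the loop spins forever while allocations
    // pile up until OOM.
    //
    //   while (usedNames.contains(name)) {
    //       candidate = name + "_" + i++;   // 'name' itself never changes
    //   }

    // Fixed shape: mutate the variable the condition actually tests.
    static String uniqueName(String base, Set<String> usedNames) {
        String name = base;
        int i = 0;
        while (usedNames.contains(name)) {
            name = base + "_" + i++;  // progress: 'name' changes each pass
        }
        usedNames.add(name);
        return name;
    }

    public static void main(String[] args) {
        Set<String> used = new HashSet<>();
        System.out.println(uniqueName("col", used)); // col
        System.out.println(uniqueName("col", used)); // col_0
    }
}
```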
[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176751#comment-17176751 ]

zhengchenyu commented on HIVE-22126:
------------------------------------
[~kgyrtkirk] Yeah, I decompiled the jar and found the duplicated Calcite classes; that solved the problem for me.
[jira] [Work logged] (HIVE-21488) Update Apache Parquet 1.10.1
[ https://issues.apache.org/jira/browse/HIVE-21488?focusedWorklogId=470020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-470020 ]

ASF GitHub Bot logged work on HIVE-21488:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 13/Aug/20 00:38
            Start Date: 13/Aug/20 00:38
    Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #576:
URL: https://github.com/apache/hive/pull/576#issuecomment-673180505

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------
    Worklog Id: (was: 470020)
    Time Spent: 1h  (was: 50m)

> Update Apache Parquet 1.10.1
> ----------------------------
>
>                 Key: HIVE-21488
>                 URL: https://issues.apache.org/jira/browse/HIVE-21488
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 2.3.4
>            Reporter: Fokko Driesprong
>            Assignee: Fokko Driesprong
>            Priority: Major
>              Labels: parquet, pull-request-available
>         Attachments: HIVE-21488.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
[jira] [Work logged] (HIVE-21011) Upgrade MurmurHash 2.0 to 3.0 in vectorized map and reduce operators
[ https://issues.apache.org/jira/browse/HIVE-21011?focusedWorklogId=470019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-470019 ]

ASF GitHub Bot logged work on HIVE-21011:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 13/Aug/20 00:38
            Start Date: 13/Aug/20 00:38
    Worklog Time Spent: 10m

Work Description: github-actions[bot] commented on pull request #503:
URL: https://github.com/apache/hive/pull/503#issuecomment-673180511

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the d...@hive.apache.org list if the patch is in need of reviews.

Issue Time Tracking
-------------------
    Worklog Id: (was: 470019)
    Time Spent: 0.5h  (was: 20m)

> Upgrade MurmurHash 2.0 to 3.0 in vectorized map and reduce operators
> --------------------------------------------------------------------
>
>                 Key: HIVE-21011
>                 URL: https://issues.apache.org/jira/browse/HIVE-21011
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21011.1.patch, HIVE-21011.2.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-20873 improved map join performance by using MurmurHash 3.0. However, there are more operators that can use it. VectorMapJoinCommonOperator and VectorReduceSinkUniformHashOperator use MurmurHash 2.0, so they can be upgraded to MurmurHash 3.0.
[jira] [Work logged] (HIVE-23958) HiveServer2 should support additional keystore/truststores types besides JKS
[ https://issues.apache.org/jira/browse/HIVE-23958?focusedWorklogId=470006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-470006 ]

ASF GitHub Bot logged work on HIVE-23958:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 13/Aug/20 00:08
            Start Date: 13/Aug/20 00:08
    Worklog Time Spent: 10m

Work Description: risdenk commented on pull request #1342:
URL: https://github.com/apache/hive/pull/1342#issuecomment-673169549

closed via https://github.com/apache/hive/commit/2b3c689baff857c18164a9610f2854583105734a

Issue Time Tracking
-------------------
    Worklog Id: (was: 470006)
    Time Spent: 1h  (was: 50m)

> HiveServer2 should support additional keystore/truststores types besides JKS
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-23958
>                 URL: https://issues.apache.org/jira/browse/HIVE-23958
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Kevin Risden
>            Assignee: Kevin Risden
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently HiveServer2 (through Jetty and Thrift) only supports JKS (and PKCS12, via JDK fallback) keystore/truststore types. There are additional keystore/truststore types used for different applications, e.g. for FIPS crypto algorithms. HS2 should support the default keystore type specified for the JDK and not always use JKS.
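The direction the issue describes — honoring the JDK's configured default keystore type instead of hard-coding "JKS" — can be sketched with the standard `java.security.KeyStore` API. The helper name and fallback logic here are illustrative, not the actual HiveServer2 code:

```java
import java.security.KeyStore;

public class KeystoreTypeDemo {
    // Hypothetical helper: use an explicitly configured type when present,
    // otherwise fall back to the JDK's default (KeyStore.getDefaultType(),
    // e.g. "pkcs12" on recent JDKs) rather than a hard-coded "JKS".
    static String resolveKeystoreType(String configured) {
        return (configured == null || configured.isEmpty())
                ? KeyStore.getDefaultType()
                : configured;
    }

    public static void main(String[] args) throws Exception {
        String type = resolveKeystoreType(null);
        KeyStore ks = KeyStore.getInstance(type); // works for any supported type
        ks.load(null, null);                      // initialize an empty keystore
        System.out.println(type + " entries: " + ks.size());
    }
}
```

Because `getDefaultType()` tracks the `keystore.type` security property, this pattern also picks up alternative store types (such as FIPS providers register) without code changes.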
[jira] [Work logged] (HIVE-23958) HiveServer2 should support additional keystore/truststores types besides JKS
[ https://issues.apache.org/jira/browse/HIVE-23958?focusedWorklogId=470007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-470007 ]

ASF GitHub Bot logged work on HIVE-23958:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 13/Aug/20 00:08
            Start Date: 13/Aug/20 00:08
    Worklog Time Spent: 10m

Work Description: risdenk closed pull request #1342:
URL: https://github.com/apache/hive/pull/1342

Issue Time Tracking
-------------------
    Worklog Id: (was: 470007)
    Time Spent: 1h 10m  (was: 1h)
[jira] [Assigned] (HIVE-24035) Add Jenkinsfile for branch-2.3
[ https://issues.apache.org/jira/browse/HIVE-24035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun reassigned HIVE-24035:
-------------------------------

> Add Jenkinsfile for branch-2.3
> ------------------------------
>
>                 Key: HIVE-24035
>                 URL: https://issues.apache.org/jira/browse/HIVE-24035
>             Project: Hive
>          Issue Type: Test
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Major
>
> To enable precommit tests for GitHub PRs, we need to have a Jenkinsfile in the repo. This is already done for master and branch-2. This adds the same for branch-2.3.
[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions
[ https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=469991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469991 ] ASF GitHub Bot logged work on HIVE-23980: - Author: ASF GitHub Bot Created on: 12/Aug/20 23:30 Start Date: 12/Aug/20 23:30 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1397: URL: https://github.com/apache/hive/pull/1397#issuecomment-673159411 @viirya it probably means those tests were already failing previously. However I was not able to find a history for this. According to this [comment](https://issues.apache.org/jira/browse/HIVE-21790?focusedCommentId=17134282=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17134282), it seems many tests were failing after we enabled CI for branch-2. cc @belugabehr - wonder if you have any info on this. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469991) Time Spent: 1.5h (was: 1h 20m) > Shade guava from existing Hive versions > --- > > Key: HIVE-23980 > URL: https://issues.apache.org/jira/browse/HIVE-23980 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.7 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23980.01.branch-2.3.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502. 
> Running test hits an error: > {code} > sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: > tried to access method > com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; > from class org.apache.hadoop.hive.ql.exec.FetchOperator > at > org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) > {code} > I know that hive-exec doesn't shade Guava until HIVE-22126 but that work > targets 4.0.0. I'm wondering if there is a solution for current Hive > versions, e.g. Hive 2.3.7? Any ideas? > Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
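For context on the `IllegalAccessError` above: `Iterators.emptyIterator()` was made package-private in Guava 20, so bytecode compiled against Guava 19 or earlier can no longer link against it at runtime. A minimal sketch of a binary-compatible JDK alternative a caller could use instead (this is an illustrative workaround, not the fix this ticket takes; the ticket's fix is shading):

```java
import java.util.Collections;
import java.util.Iterator;

public class EmptyIteratorDemo {
    public static void main(String[] args) {
        // Guava's Iterators.emptyIterator() became package-private in Guava 20,
        // so code compiled against an older Guava fails at link time on newer
        // versions. java.util.Collections.emptyIterator() is a stable,
        // binary-compatible alternative in the JDK itself.
        Iterator<String> it = Collections.emptyIterator();
        System.out.println(it.hasNext()); // prints "false"
    }
}
```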
[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions
[ https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=469976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469976 ] ASF GitHub Bot logged work on HIVE-23980: - Author: ASF GitHub Bot Created on: 12/Aug/20 22:41 Start Date: 12/Aug/20 22:41 Worklog Time Spent: 10m Work Description: viirya commented on pull request #1397: URL: https://github.com/apache/hive/pull/1397#issuecomment-673145461 @sunchao But I saw some test failures? `There are 0 new tests failing, 636 existing failing and 231 skipped.` What does "existing failing" mean? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469976) Time Spent: 1h 20m (was: 1h 10m) > Shade guava from existing Hive versions > --- > > Key: HIVE-23980 > URL: https://issues.apache.org/jira/browse/HIVE-23980 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.7 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23980.01.branch-2.3.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502. 
> Running test hits an error: > {code} > sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: > tried to access method > com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; > from class org.apache.hadoop.hive.ql.exec.FetchOperator > at > org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) > {code} > I know that hive-exec doesn't shade Guava until HIVE-22126 but that work > targets 4.0.0. I'm wondering if there is a solution for current Hive > versions, e.g. Hive 2.3.7? Any ideas? > Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions
[ https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=469959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469959 ] ASF GitHub Bot logged work on HIVE-23980: - Author: ASF GitHub Bot Created on: 12/Aug/20 21:55 Start Date: 12/Aug/20 21:55 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1397: URL: https://github.com/apache/hive/pull/1397#issuecomment-673129939 @viirya seems there are no new tests failing, which is good. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469959) Time Spent: 1h 10m (was: 1h) > Shade guava from existing Hive versions > --- > > Key: HIVE-23980 > URL: https://issues.apache.org/jira/browse/HIVE-23980 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.7 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23980.01.branch-2.3.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502. 
> Running test hits an error: > {code} > sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: > tried to access method > com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; > from class org.apache.hadoop.hive.ql.exec.FetchOperator > at > org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) > {code} > I know that hive-exec doesn't shade Guava until HIVE-22126 but that work > targets 4.0.0. I'm wondering if there is a solution for current Hive > versions, e.g. Hive 2.3.7? Any ideas? > Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions
[ https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=469941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469941 ] ASF GitHub Bot logged work on HIVE-23980: - Author: ASF GitHub Bot Created on: 12/Aug/20 21:09 Start Date: 12/Aug/20 21:09 Worklog Time Spent: 10m Work Description: viirya commented on pull request #1397: URL: https://github.com/apache/hive/pull/1397#issuecomment-673112654 cc @sunchao This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469941) Time Spent: 1h (was: 50m) > Shade guava from existing Hive versions > --- > > Key: HIVE-23980 > URL: https://issues.apache.org/jira/browse/HIVE-23980 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.7 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23980.01.branch-2.3.patch > > Time Spent: 1h > Remaining Estimate: 0h > > I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502. 
> Running test hits an error: > {code} > sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: > tried to access method > com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; > from class org.apache.hadoop.hive.ql.exec.FetchOperator > at > org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) > {code} > I know that hive-exec doesn't shade Guava until HIVE-22126 but that work > targets 4.0.0. I'm wondering if there is a solution for current Hive > versions, e.g. Hive 2.3.7? Any ideas? > Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions
[ https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=469940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469940 ] ASF GitHub Bot logged work on HIVE-23980: - Author: ASF GitHub Bot Created on: 12/Aug/20 21:09 Start Date: 12/Aug/20 21:09 Worklog Time Spent: 10m Work Description: viirya opened a new pull request #1397: URL: https://github.com/apache/hive/pull/1397 ### What changes were proposed in this pull request? This PR proposes to shade Guava from hive-exec in Hive branch-2. This is basically for triggering test for #1356. ### Why are the changes needed? When trying to upgrade Guava in Spark, found the following error. A Guava method became package-private since Guava version 20. So there is incompatibility with Guava versions > 19.0. ``` sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; from class org.apache.hadoop.hive.ql.exec.FetchOperator at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108) at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) ``` This is a problem for downstream clients. Hive project noticed that problem too in [HIVE-22126](https://issues.apache.org/jira/browse/HIVE-22126), however that only targets 4.0.0. It'd be nicer if we can also shade Guava from current Hive versions, e.g. Hive 2.3 line. ### Does this PR introduce _any_ user-facing change? Yes. Guava will be shaded from hive-exec. ### How was this patch tested? Built Hive locally and checked jar content. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469940) Time Spent: 50m (was: 40m) > Shade guava from existing Hive versions > --- > > Key: HIVE-23980 > URL: https://issues.apache.org/jira/browse/HIVE-23980 > Project: Hive > Issue Type: Bug >Affects Versions: 2.3.7 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23980.01.branch-2.3.patch > > Time Spent: 50m > Remaining Estimate: 0h > > I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502. > Running test hits an error: > {code} > sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: > tried to access method > com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; > from class org.apache.hadoop.hive.ql.exec.FetchOperator > at > org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) > {code} > I know that hive-exec doesn't shade Guava until HIVE-22126 but that work > targets 4.0.0. I'm wondering if there is a solution for current Hive > versions, e.g. Hive 2.3.7? Any ideas? > Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
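For readers unfamiliar with the approach under test in this PR: shading boils down to a maven-shade-plugin relocation on the hive-exec module, so that references to `com.google.common.*` inside the jar are rewritten and cannot clash with whatever Guava version sits on the downstream (e.g. Spark) classpath. The fragment below is an illustrative sketch only; the exact patterns, includes/excludes, and plugin version used by the branch-2.3 patch and by HIVE-22126 may differ.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Rewrite com.google.common.* references bundled inside hive-exec
           so the shaded Guava is invisible to downstream classpaths. -->
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```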
[jira] [Resolved] (HIVE-24025) Add getAggrStatsFor to HS2 cache
[ https://issues.apache.org/jira/browse/HIVE-24025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez resolved HIVE-24025. Fix Version/s: 4.0.0 Resolution: Fixed Pushed to master, thanks for your contribution [~soumyakanti.das]! > Add getAggrStatsFor to HS2 cache > > > Key: HIVE-24025 > URL: https://issues.apache.org/jira/browse/HIVE-24025 > Project: Hive > Issue Type: New Feature > Components: HiveServer2 >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > getAggrColStats API takes a long time to run in HMS. Adding it to the HS2 > local cache can reduce the query compilation time significantly. > Local cache was introduced in https://issues.apache.org/jira/browse/HIVE-23949 -- This message was sent by Atlassian Jira (v8.3.4#803005)
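As a rough illustration of why caching getAggrStatsFor in HS2 helps: repeated compilations asking for the same (database, table, columns) aggregate statistics can be memoized so only the first request pays the HMS round trip. Everything in this sketch is hypothetical; the type parameter stands in for the real aggregate-stats result, and the actual HIVE-23949 cache adds invalidation and scoping that this omits.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative memo keyed by (db, table, columns); not the real HS2 cache.
public class AggrStatsMemo<S> {
    private final Map<String, S> cache = new ConcurrentHashMap<>();

    public S get(String db, String table, List<String> cols, Supplier<S> loader) {
        String key = db + "." + table + "#" + String.join(",", cols);
        // Only the first caller per key invokes the (expensive) loader.
        return cache.computeIfAbsent(key, k -> loader.get());
    }

    public static void main(String[] args) {
        AggrStatsMemo<String> memo = new AggrStatsMemo<>();
        String s1 = memo.get("default", "t", List.of("c1"), () -> "stats-from-hms");
        String s2 = memo.get("default", "t", List.of("c1"), () -> "never-called");
        System.out.println(s1.equals(s2)); // true: second call served from cache
    }
}
```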
[jira] [Work logged] (HIVE-24025) Add getAggrStatsFor to HS2 cache
[ https://issues.apache.org/jira/browse/HIVE-24025?focusedWorklogId=469937=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469937 ] ASF GitHub Bot logged work on HIVE-24025: - Author: ASF GitHub Bot Created on: 12/Aug/20 21:04 Start Date: 12/Aug/20 21:04 Worklog Time Spent: 10m Work Description: jcamachor merged pull request #1390: URL: https://github.com/apache/hive/pull/1390 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469937) Remaining Estimate: 0h Time Spent: 10m > Add getAggrStatsFor to HS2 cache > > > Key: HIVE-24025 > URL: https://issues.apache.org/jira/browse/HIVE-24025 > Project: Hive > Issue Type: New Feature > Components: HiveServer2 >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > getAggrColStats API takes a long time to run in HMS. Adding it to the HS2 > local cache can reduce the query compilation time significantly. > Local cache was introduced in https://issues.apache.org/jira/browse/HIVE-23949 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24025) Add getAggrStatsFor to HS2 cache
[ https://issues.apache.org/jira/browse/HIVE-24025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24025: -- Labels: pull-request-available (was: ) > Add getAggrStatsFor to HS2 cache > > > Key: HIVE-24025 > URL: https://issues.apache.org/jira/browse/HIVE-24025 > Project: Hive > Issue Type: New Feature > Components: HiveServer2 >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > getAggrColStats API takes a long time to run in HMS. Adding it to the HS2 > local cache can reduce the query compilation time significantly. > Local cache was introduced in https://issues.apache.org/jira/browse/HIVE-23949 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23927) Cast to Timestamp generates different output for Integer & Float values
[ https://issues.apache.org/jira/browse/HIVE-23927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176543#comment-17176543 ] Jesus Camacho Rodriguez commented on HIVE-23927: [~abstractdog], maybe not in the context of ORC-554 then. My point was that the same issue was faced in ORC since the logic was coming from Hive, and some sensible defaults to make the conversion uniform were chosen... We could use the same defaults. > Cast to Timestamp generates different output for Integer & Float values > > > Key: HIVE-23927 > URL: https://issues.apache.org/jira/browse/HIVE-23927 > Project: Hive > Issue Type: Bug >Reporter: Renukaprasad C >Priority: Major > > Double considers the input value as SECONDS and converts it into millis internally. > Whereas, an Integer value will be considered as millis and produce different > output. > org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(Object, > PrimitiveObjectInspector, boolean) - Handles Integral & Decimal values > differently. This causes the issue. 
> 0: jdbc:hive2://localhost:1> select cast(1.204135216E9 as timestamp) > Double2TimeStamp, cast(1204135216 as timestamp) Int2TimeStamp from abc > tablesample(1 rows); > OK > INFO : Compiling > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): > select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as > timestamp) Int2TimeStamp from abc tablesample(1 rows) > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: > Schema(fieldSchemas:[FieldSchema(name:double2timestamp, type:timestamp, > comment:null), FieldSchema(name:int2timestamp, type:timestamp, > comment:null)], properties:null) > INFO : Completed compiling > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); > Time taken: 0.175 seconds > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Executing > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): > select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as > timestamp) Int2TimeStamp from abc tablesample(1 rows) > INFO : Completed executing > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); > Time taken: 0.001 seconds > INFO : OK > INFO : Concurrency mode is disabled, not creating a lock manager > ++--+ > |double2timestamp| int2timestamp | > ++--+ > | 2008-02-27 18:00:16.0 | 1970-01-14 22:28:55.216 | > ++--+ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24022) Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer
[ https://issues.apache.org/jira/browse/HIVE-24022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176513#comment-17176513 ] Sam An commented on HIVE-24022: --- I am trying to use ThreadLocal to skip creating HiveConf each and every time createHiveMetastoreAuthorizer gets called. Currently there are 2 unit test failures. The root cause is in the TestHiveMetaStoreAuthorizer unit test setup: the config overlay was not visible inside the ThreadLocal HiveConf, so it is not using DummyHiveAuthorizerFactory as the authorization manager but instead looks for SqlStdAuthorizationFactoryForTest, and it fails to find that class, causing the failure. [https://github.com/apache/hive/blob/2b3c689baff857c18164a9610f2854583105734a/ql/src/test/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/TestHiveMetaStoreAuthorizer.java#L83] So I see 2 problems: 1. DummyHiveAuthorizerFactory is not passed into the ThreadLocal<> for the unit test. 2. The classloader is not able to find the SQLStdHiveAuthorizerFactoryForTest class on the classpath. Without my change, it was using DummyHiveAuthorizerFactory for the unit test. So I should make the overlay happen first, then see if it can find the class DummyHiveAuthorizerFactory. > Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer > -- > > Key: HIVE-24022 > URL: https://issues.apache.org/jira/browse/HIVE-24022 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Minor > Labels: performance, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > For a table with 3000+ partitions, analyze table takes a lot longer time as > HiveMetaStoreAuthorizer tries to create HiveConf for every partition request. 
> > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L319] > > [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L447] -- This message was sent by Atlassian Jira (v8.3.4#803005)
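The ThreadLocal pattern Sam An describes can be sketched as follows. `ExpensiveConf` is a hypothetical stand-in for `HiveConf`; the point is one construction per thread instead of one per authorizer call. Note that the unit-test failure described above is the inherent hazard of this pattern: a copy cached before a config overlay is applied will never see that overlay.

```java
// Hypothetical stand-in for HiveConf; imagine heavy XML parsing in the ctor.
class ExpensiveConf {
    static int constructions = 0;        // counter for illustration only
    ExpensiveConf() { constructions++; }
}

public class ConfHolder {
    // Each thread lazily builds its own copy on first access and reuses it.
    private static final ThreadLocal<ExpensiveConf> CONF =
            ThreadLocal.withInitial(ExpensiveConf::new);

    public static ExpensiveConf get() {
        return CONF.get();
    }

    public static void main(String[] args) {
        ExpensiveConf a = ConfHolder.get();
        ExpensiveConf b = ConfHolder.get();
        System.out.println(a == b); // true: same instance, built once per thread
    }
}
```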
[jira] [Resolved] (HIVE-24030) Upgrade ORC to 1.5.10
[ https://issues.apache.org/jira/browse/HIVE-24030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved HIVE-24030. -- Fix Version/s: 4.0.0 Assignee: Dongjoon Hyun Resolution: Fixed > Upgrade ORC to 1.5.10 > - > > Key: HIVE-24030 > URL: https://issues.apache.org/jira/browse/HIVE-24030 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176473#comment-17176473 ] Panagiotis Garefalakis edited comment on HIVE-23922 at 8/12/20, 4:46 PM: - Thanks for contributing to the project [~hao.duan]! was (Author: pgaref): Thanks for your contribution [~hao.duan]! > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Assignee: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176473#comment-17176473 ] Panagiotis Garefalakis commented on HIVE-23922: --- Thanks for your contribution [~hao.duan]! > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Assignee: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis resolved HIVE-23922. --- Resolution: Fixed > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Assignee: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469799=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469799 ] ASF GitHub Bot logged work on HIVE-23922: - Author: ASF GitHub Bot Created on: 12/Aug/20 16:44 Start Date: 12/Aug/20 16:44 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1307: URL: https://github.com/apache/hive/pull/1307#discussion_r469396911 ## File path: udf/src/java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java ## @@ -65,16 +65,13 @@ public UDFArgumentException(String message, Class funcClass, List argTypeInfos, List methods) { -super(getMessage(message, funcClass, argTypeInfos, methods)); +super(getMessage(message, methods)); this.funcClass = funcClass; this.argTypeInfos = argTypeInfos; this.methods = methods; } - - private static String getMessage(String message, - Class funcClass, - List argTypeInfos, - List methods) { + //HIVE-23896 remove unnecessary parameter Review comment: Could you please remove the comment and replace with a new line? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469799) Time Spent: 1h 20m (was: 1h 10m) > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Assignee: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Panagiotis Garefalakis reassigned HIVE-23922: - Assignee: hao > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Assignee: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23958) HiveServer2 should support additional keystore/truststores types besides JKS
[ https://issues.apache.org/jira/browse/HIVE-23958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-23958: - Fix Version/s: 4.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Fix has been pushed to master. Thank you for the patch [~krisden] > HiveServer2 should support additional keystore/truststores types besides JKS > > > Key: HIVE-23958 > URL: https://issues.apache.org/jira/browse/HIVE-23958 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Currently HiveServer2 (through Jetty and Thrift) only supports JKS (and > PCKS12 based on JDK fallback) keystore/truststore types. There are additional > keystore/truststore types used for different applications like for FIPS > crypto algorithms. HS2 should support the default keystore type specified for > the JDK and not always use JKS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
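The gist of the change above can be sketched in plain JSE: ask the JDK for its default store type rather than hard-coding "JKS". This is illustrative, not the actual HS2 patch; on a stock JDK 9+ the default is pkcs12, while FIPS-configured JDKs report their own type.

```java
import java.security.KeyStore;

public class KeystoreTypeDemo {
    public static void main(String[] args) throws Exception {
        // Hard-coding KeyStore.getInstance("JKS") breaks on JDKs configured
        // for other store types (e.g. FIPS crypto providers). The portable
        // form defers to the JDK's configured default:
        String type = KeyStore.getDefaultType();
        KeyStore ks = KeyStore.getInstance(type);
        ks.load(null, null); // initialize an empty in-memory store
        System.out.println(type + " store initialized with " + ks.size() + " entries");
    }
}
```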
[jira] [Commented] (HIVE-23927) Cast to Timestamp generates different output for Integer & Float values
[ https://issues.apache.org/jira/browse/HIVE-23927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176438#comment-17176438 ] Panagiotis Garefalakis commented on HIVE-23927: --- I guess the main issue here is *PrimitiveObjectInspectorUtils.getTimestamp(Object, PrimitiveObjectInspector, boolean)*. For int: https://github.com/apache/hive/blob/6ceeea87a34f53add62fa6d0a332b06b8863c440/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritableV2.java#L531 *intToTimestampInSeconds = false* https://github.com/apache/hive/blob/1758c8c857f8a6dc4c9dc9c522de449f53e5e5cc/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java#L1181 While for double: https://github.com/apache/hive/blob/e6900fea9108b2dd00f0e4bf2a598f6fc9ba01cf/common/src/java/org/apache/hadoop/hive/common/type/TimestampUtils.java#L43 Not sure where the assumption that Double is in seconds comes from? Maybe we should make this configurable as well, as we do in the *longToTimestamp* method. > Cast to Timestamp generates different output for Integer & Float values > > > Key: HIVE-23927 > URL: https://issues.apache.org/jira/browse/HIVE-23927 > Project: Hive > Issue Type: Bug >Reporter: Renukaprasad C >Priority: Major > > Double considers the input value as SECONDS and converts it into millis internally. > Whereas, an Integer value will be considered as millis and produce different > output. > org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(Object, > PrimitiveObjectInspector, boolean) - Handles Integral & Decimal values > differently. This causes the issue. 
> 0: jdbc:hive2://localhost:1> select cast(1.204135216E9 as timestamp) > Double2TimeStamp, cast(1204135216 as timestamp) Int2TimeStamp from abc > tablesample(1 rows); > OK > INFO : Compiling > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): > select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as > timestamp) Int2TimeStamp from abc tablesample(1 rows) > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: > Schema(fieldSchemas:[FieldSchema(name:double2timestamp, type:timestamp, > comment:null), FieldSchema(name:int2timestamp, type:timestamp, > comment:null)], properties:null) > INFO : Completed compiling > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); > Time taken: 0.175 seconds > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Executing > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): > select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as > timestamp) Int2TimeStamp from abc tablesample(1 rows) > INFO : Completed executing > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); > Time taken: 0.001 seconds > INFO : OK > INFO : Concurrency mode is disabled, not creating a lock manager > ++--+ > |double2timestamp| int2timestamp | > ++--+ > | 2008-02-27 18:00:16.0 | 1970-01-14 22:28:55.216 | > ++--+ -- This message was sent by Atlassian Jira (v8.3.4#803005)
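The divergence shown in the query output above is easy to reproduce outside Hive with plain `java.time`: the integral path treats the value as epoch milliseconds while the floating-point path treats the same magnitude as epoch seconds. This sketch only mirrors the reported behavior; Hive's actual conversion lives in the `PrimitiveObjectInspectorUtils.getTimestamp` logic linked in the comment.

```java
import java.time.Instant;

public class TimestampCastDemo {
    public static void main(String[] args) {
        long v = 1204135216L;
        // Integral cast path: value interpreted as milliseconds since epoch.
        System.out.println(Instant.ofEpochMilli(v));  // 1970-01-14T22:28:55.216Z
        // Floating-point cast path: same magnitude interpreted as seconds.
        System.out.println(Instant.ofEpochSecond(v)); // 2008-02-27T18:00:16Z
    }
}
```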
[jira] [Updated] (HIVE-24025) Add getAggrStatsFor to HS2 cache
[ https://issues.apache.org/jira/browse/HIVE-24025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Soumyakanti Das updated HIVE-24025: --- Summary: Add getAggrStatsFor to HS2 cache (was: Add getTable and getAggrStatsFor to HS2 cache) > Add getAggrStatsFor to HS2 cache > > > Key: HIVE-24025 > URL: https://issues.apache.org/jira/browse/HIVE-24025 > Project: Hive > Issue Type: New Feature > Components: HiveServer2 >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Minor > > getAggrColStats API takes a long time to run in HMS. Adding it to the HS2 > local cache can reduce the query compilation time significantly. > Local cache was introduced in https://issues.apache.org/jira/browse/HIVE-23949 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176406#comment-17176406 ] Jesus Camacho Rodriguez commented on HIVE-23965: +1 on removing the old driver, since it fixes issues with the existing one. I do not think having the old one around adds much value and updating all those q files will be a pain. [~zabetak], [~kgyrtkirk], if this PR is ready to be merged, I think the removal can be done in a follow-up. > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469749 ] ASF GitHub Bot logged work on HIVE-23922: - Author: ASF GitHub Bot Created on: 12/Aug/20 14:51 Start Date: 12/Aug/20 14:51 Worklog Time Spent: 10m Work Description: sunchao merged pull request #1307: URL: https://github.com/apache/hive/pull/1307 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469749) Time Spent: 1h 10m (was: 1h) > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176378#comment-17176378 ] Pravin Sinha commented on HIVE-23993: - +1 > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, > HIVE-23993.06.patch, HIVE-23993.07.patch, HIVE-23993.08.patch, Retry Logic > for Replication.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23993: --- Attachment: HIVE-23993.08.patch Status: Patch Available (was: In Progress) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, > HIVE-23993.06.patch, HIVE-23993.07.patch, HIVE-23993.08.patch, Retry Logic > for Replication.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23993: --- Status: In Progress (was: Patch Available) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, > HIVE-23993.06.patch, HIVE-23993.07.patch, HIVE-23993.08.patch, Retry Logic > for Replication.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469734=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469734 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 14:26 Start Date: 12/Aug/20 14:26 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469300187 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java ## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + +public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn, +int batchSize, boolean selectedInUse, int[] selected, Configuration conf) { + // already set in previous iterations, no need to call initExecutor again + if (numThreads == 0) { +return false; + } + if (executor == null) { +initExecutor(conf, batchSize); +if (!isParallel) { + return false; +} + } + + // split every bloom filter (represented by a part of a byte[]) across workers + for (int j = 0; j < batchSize; j++) { +if (!selectedInUse && inputColumn.noNulls) { + splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} else if (!selectedInUse) { + if (!inputColumn.isNull[j]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } +} else if (inputColumn.noNulls) { + int i = selected[j]; + splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} else { + int i = selected[j]; + if (!inputColumn.isNull[i]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], +inputColumn.length[i]); + } +} + } + + return true; +} + +private void 
initExecutor(Configuration conf, int batchSize) { + numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname, + HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal); + LOG.info("Number of threads used for bloom filter merge: {}", numThreads); + + if (numThreads < 0) { +throw new RuntimeException( +"invalid number of threads for bloom filter merge: " + numThreads); + } + if (numThreads == 0) { // disable parallel feature +return; // this will leave isParallel=false + } + isParallel = true; + executor = Executors.newFixedThreadPool(numThreads); + + workers = new BloomFilterMergeWorker[numThreads]; + for (int f = 0; f < numThreads; f++) { +workers[f] = new BloomFilterMergeWorker(bfBytes, 0, bfBytes.length); + } + + for (int f = 0; f < numThreads; f++) { +executor.submit(workers[f]); + } +} + +public int getNumberOfWaitingMergeTasks(){ + int size = 0; + for (BloomFilterMergeWorker w : workers){ +size += w.queue.size(); + } + return size; +} + +public int getNumberOfMergingWorkers() { + int working = 0; + for (BloomFilterMergeWorker w : workers) { +if (w.isMerging.get()) { + working += 1; +} + } + return working; +} + +private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] workers, byte[] bytes, +int start, int length) { + if (bytes == null || length == 0) { +return; + } + /* + * This will split a byte[] across workers as below: + * let's say there are 10 workers for 7813 bytes, in this case + * length: 7813, elementPerBatch: 781 + * bytes assigned to workers: inclusive lower bound, exclusive upper bound + * 1. worker: 5 -> 786 + * 2. worker: 786 -> 1567 + * 3. worker: 1567 -> 2348 + * 4. worker: 2348 -> 3129 + * 5. worker: 3129 -> 3910 + * 6. worker: 3910 -> 4691 + * 7. worker: 4691 -> 5472 + * 8. worker: 5472 -> 6253 + * 9. worker: 6253 -> 7034 + * 10. 
worker: 7034 -> 7813 (last element per batch is: 779) + * + * This way, a particular worker will be given with the same part + * of all bloom filters along with the shared base bloom filter, + * so the bitwise OR function will not be a subject of threading/sync issues. + */ + int elementPerBatch = + (int) Math.ceil((double) (length -
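The even-split scheme walked through in the review comment above can be sketched as a standalone helper. This is a hypothetical simplification, not Hive's actual code: it assumes the first 5 bytes of the serialized filter hold the NumHashFunctions/NumBits header (hence `start = 5`), and reproduces the 10-worker table from the comment:

```java
import java.util.ArrayList;
import java.util.List;

public class ByteRangeSplitter {
    /**
     * Split the byte range [start, start + length) into contiguous chunks of
     * ceil(length / numWorkers) bytes each (the last chunk may be shorter).
     * Each returned pair is {inclusive lower bound, exclusive upper bound}.
     */
    static List<int[]> split(int start, int length, int numWorkers) {
        int elementPerBatch = (int) Math.ceil((double) length / numWorkers);
        List<int[]> ranges = new ArrayList<>();
        for (int lower = start; lower < start + length; lower += elementPerBatch) {
            int upper = Math.min(lower + elementPerBatch, start + length);
            ranges.add(new int[] {lower, upper});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 7813-byte filter, 5-byte header skipped, 10 workers:
        // prints 5 -> 786, 786 -> 1567, ..., 7034 -> 7813 as in the comment
        for (int[] r : split(5, 7813 - 5, 10)) {
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

Because every worker always gets the same slice of every incoming filter, the bitwise OR on that slice is single-writer and needs no synchronization, which is the point the comment makes.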
[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-24032: --- Attachment: HIVE-24032.01.patch Status: Patch Available (was: In Progress) > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24032.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24032 started by Aasha Medhi. -- > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24032.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?focusedWorklogId=469713=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469713 ] ASF GitHub Bot logged work on HIVE-24032: - Author: ASF GitHub Bot Created on: 12/Aug/20 13:52 Start Date: 12/Aug/20 13:52 Worklog Time Spent: 10m Work Description: aasha opened a new pull request #1396: URL: https://github.com/apache/hive/pull/1396 …rectly from standalone metastore ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469713) Remaining Estimate: 0h Time Spent: 10m > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-24032: -- Labels: pull-request-available (was: ) > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24033) full outer join returns wrong number of results if hive.optimize.joinreducededuplication is enabled
[ https://issues.apache.org/jira/browse/HIVE-24033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176324#comment-17176324 ] Sebastian Klemke commented on HIVE-24033: - Execution plan of the failing query is here: [^failing_query_plan.txt] joinreducededuplication optimizer logs for this query: {code:java} 2020-07-31T14:42:41,542 DEBUG [89354899-5041-441a-ab6f-41e4eb1d3625 main] correlation.ReduceSinkJoinDeDuplication: Set RS[21] to forward data 2020-07-31T14:42:41,542 DEBUG [89354899-5041-441a-ab6f-41e4eb1d3625 main] correlation.ReduceSinkJoinDeDuplication: Set RS[20] to FIXED parallelism: 120 2020-07-31T14:42:41,542 DEBUG [89354899-5041-441a-ab6f-41e4eb1d3625 main] correlation.ReduceSinkJoinDeDuplication: Set RS[21] to FIXED parallelism: 120 2020-07-31T14:42:41,542 DEBUG [89354899-5041-441a-ab6f-41e4eb1d3625 main] correlation.ReduceSinkJoinDeDuplication: Set RS[17] to FIXED parallelism: 120 {code} > full outer join returns wrong number of results if > hive.optimize.joinreducededuplication is enabled > --- > > Key: HIVE-24033 > URL: https://issues.apache.org/jira/browse/HIVE-24033 > Project: Hive > Issue Type: Bug >Reporter: Sebastian Klemke >Priority: Major > Attachments: failing_query_plan.txt > > > We encountered a hive query that returns incorrect results, when joining two > CTEs on a group by value. The input tables `id_table` and > `reference_table` are unfortunately too large to share and on smaller tables > we have not been able to reproduce. 
> {code} > WITH ids AS ( > SELECT > record.id AS id > FROM > `id_table` > LATERAL VIEW explode(records) r AS record > WHERE > record.id = '5ef0bad74d325f72f0360c19' > LIMIT 1 > ), > refs AS ( > SELECT > reference['id'] AS referenceId > FROM > `reference_table` > WHERE > partition_date = '2020-06-24' > AND type = '1b0e9eb5c492d1859815410253dd79b5' > AND reference['id'] = '5ef0bad74d325f72f0360c19' > GROUP BY > reference['id'] > ) > SELECT > l.id AS id > , r.referenceId AS referenceId > FROM > ids l > FULL OUTER JOIN > refs r > ON > l.id = r.referenceId > {code} > This returns 2 rows, because the join clause misses: > {code} > OK > 5ef0bad74d325f72f0360c19NULL > NULL5ef0bad74d325f72f0360c19 > {code} > Instead, a single row should be returned. The correct behavior can be > achieved by either > * calling lower() on the refs group by statement (doesn't change the string > contents) > * setting hive.optimize.joinreducededuplication=false -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24033) full outer join returns wrong number of results if hive.optimize.joinreducededuplication is enabled
[ https://issues.apache.org/jira/browse/HIVE-24033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Klemke updated HIVE-24033: Attachment: failing_query_plan.txt > full outer join returns wrong number of results if > hive.optimize.joinreducededuplication is enabled > --- > > Key: HIVE-24033 > URL: https://issues.apache.org/jira/browse/HIVE-24033 > Project: Hive > Issue Type: Bug >Reporter: Sebastian Klemke >Priority: Major > Attachments: failing_query_plan.txt > > > We encountered a hive query that returns incorrect results, when joining two > CTEs on a group by value. The input tables `id_table` and > `reference_table` are unfortunately too large to share and on smaller tables > we have not been able to reproduce. > {code} > WITH ids AS ( > SELECT > record.id AS id > FROM > `id_table` > LATERAL VIEW explode(records) r AS record > WHERE > record.id = '5ef0bad74d325f72f0360c19' > LIMIT 1 > ), > refs AS ( > SELECT > reference['id'] AS referenceId > FROM > `reference_table` > WHERE > partition_date = '2020-06-24' > AND type = '1b0e9eb5c492d1859815410253dd79b5' > AND reference['id'] = '5ef0bad74d325f72f0360c19' > GROUP BY > reference['id'] > ) > SELECT > l.id AS id > , r.referenceId AS referenceId > FROM > ids l > FULL OUTER JOIN > refs r > ON > l.id = r.referenceId > {code} > This returns 2 rows, because the join clause misses: > {code} > OK > 5ef0bad74d325f72f0360c19NULL > NULL5ef0bad74d325f72f0360c19 > {code} > Instead, a single row should be returned. The correct behavior can be > achieved by either > * calling lower() on the refs group by statement (doesn't change the string > contents) > * setting hive.optimize.joinreducededuplication=false -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24031) Infinite planning time on syntactically big queries
[ https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176322#comment-17176322 ] Stamatis Zampetakis commented on HIVE-24031: I ran the query from above with {{TestMiniLlapLocalCliDriver}} and the profiling ([^query_big_array_constructor.nps]) shows that the vast majority of time is spent on creating defensive copies of the node expression list inside ASTNode#getChildren. !ASTNode_getChildren_cost.png! The method is called extensively from various places in the code, especially those walking over the expression tree, so it needs to be efficient. I propose to drop the defensive copy (possibly protecting the list from modifications via an unmodifiable collection) and let clients make copies of the list if they deem it necessary. In most of the cases, if not all, making copies of the list seems useless. > Infinite planning time on syntactically big queries > --- > > Key: HIVE-24031 > URL: https://issues.apache.org/jira/browse/HIVE-24031 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > Attachments: ASTNode_getChildren_cost.png, > query_big_array_constructor.nps > > > Syntactically big queries (~1 million tokens), such as the query shown below, > lead to very big (seemingly infinite) planning times. > {code:sql} > select posexplode(array('item1', 'item2', ..., 'item1M')); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
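The proposal can be sketched on a hypothetical, simplified node class (not Hive's actual ASTNode): return an O(1) unmodifiable view instead of copying the children list on every call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simplified tree node, for illustration only.
public class Node {
    private final List<Node> children = new ArrayList<>();

    void addChild(Node child) {
        children.add(child);
    }

    // Before: return new ArrayList<>(children);  // O(n) copy on every call
    // After: O(1) read-only view; writes fail fast
    List<Node> getChildren() {
        return Collections.unmodifiableList(children);
    }
}
```

Callers that genuinely need a private mutable copy can take one explicitly with `new ArrayList<>(node.getChildren())`; accidental modifications through the view now throw `UnsupportedOperationException` instead of silently mutating a throwaway copy.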
[jira] [Updated] (HIVE-24031) Infinite planning time on syntactically big queries
[ https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-24031: --- Attachment: ASTNode_getChildren_cost.png > Infinite planning time on syntactically big queries > --- > > Key: HIVE-24031 > URL: https://issues.apache.org/jira/browse/HIVE-24031 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > Attachments: ASTNode_getChildren_cost.png, > query_big_array_constructor.nps > > > Syntactically big queries (~1 million tokens), such as the query shown below, > lead to very big (seemingly infinite) planning times. > {code:sql} > select posexplode(array('item1', 'item2', ..., 'item1M')); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24031) Infinite planning time on syntactically big queries
[ https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-24031: --- Attachment: query_big_array_constructor.nps > Infinite planning time on syntactically big queries > --- > > Key: HIVE-24031 > URL: https://issues.apache.org/jira/browse/HIVE-24031 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > Attachments: query_big_array_constructor.nps > > > Syntactically big queries (~1 million tokens), such as the query shown below, > lead to very big (seemingly infinite) planning times. > {code:sql} > select posexplode(array('item1', 'item2', ..., 'item1M')); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469682=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469682 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 12:41 Start Date: 12/Aug/20 12:41 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469099535 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java ## @@ -1126,6 +1137,7 @@ protected void initializeOp(Configuration hconf) throws HiveException { VectorAggregateExpression vecAggrExpr = null; try { vecAggrExpr = ctor.newInstance(vecAggrDesc); + vecAggrExpr.withConf(hconf); Review comment: 1. constructor: first I tried to pass it to the constructor, but that breaks compatibility with every other subclass of VectorAggregateExpression; if I use ctor.newInstance(vecAggrDesc, hconf), I need to add that constructor to all subclasses, because they don't inherit this ctor from VectorAggregateExpression... withConf can solve this, let me know about better ways 2. single int: this config is specific to VectorUDAFBloomFilterMerge, so I believe I should not pass it through a constructor to every VectorAggregateExpression, and I didn't want to go for an instanceof hack for a cast + specific call This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469682) Time Spent: 6.5h (was: 6h 20m) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 6.5h > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. > For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. 
llap SUCCEEDED 1263 126300 > 0 0 > Reducer 2 llap RUNNING 1 010 > 0 0 > Map 4 llap RUNNING 6154 0 207 5947 > 0 0 > Reducer 5 llapINITED 43 00 43 > 0 0 > Reducer 6 llapINITED 1 001 > 0 0 > -- > VERTICES: 02/06 [>>--] 16% ELAPSED TIME: 149.98 s >
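Since bitwise OR over disjoint byte ranges is single-writer, the merge described above can be parallelized by giving each worker a fixed slice of every incoming serialized filter. A hedged sketch of that idea (illustrative names, not the actual Hive implementation):

```java
import java.util.stream.IntStream;

public class ParallelBloomOr {
    /**
     * OR-merge serialized bloom filters into base, splitting the byte range
     * across `parallelism` workers. Each index is touched by exactly one
     * worker, so no locking is required. Sketch only: assumes all arrays
     * have the same length and ignores the serialized header bytes.
     */
    static void orInto(byte[] base, byte[][] filters, int parallelism) {
        int chunk = (int) Math.ceil((double) base.length / parallelism);
        IntStream.range(0, parallelism).parallel().forEach(w -> {
            int lo = w * chunk;
            int hi = Math.min(lo + chunk, base.length);
            for (byte[] f : filters) {
                for (int i = lo; i < hi; i++) {
                    base[i] |= f[i];   // disjoint ranges: no contention
                }
            }
        });
    }
}
```

At the scale quoted in the issue (~1263 filters of ~436M bits each), splitting this inner OR loop across threads is what turns the Reducer 2 merge from a serial hot loop into parallel work.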
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469680 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 12:40 Start Date: 12/Aug/20 12:40 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469229142 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java ## @@ -517,6 +532,10 @@ public void close(boolean aborted) throws HiveException { } +//TODO: implement finishAggregators +protected void finishAggregators(boolean aborted) { Review comment: valid point, I need to recheck and learn how aggregators and aggregation buffers are paired together for a specific mode, it seems complicated for the first time This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469680) Time Spent: 6h 10m (was: 6h) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 6h 10m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. 
> For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. llap SUCCEEDED 1263 126300 > 0 0 > Reducer 2 llap RUNNING 1 010 > 0 0 > Map 4 llap RUNNING 6154 0 207 5947 > 0 0 > Reducer 5 llapINITED 43 00 43 > 0 0 > Reducer 6 llapINITED 1 001 > 0 0 > -- > VERTICES: 02/06 [>>--] 16% ELAPSED TIME: 149.98 s > -- > {code} > For example, 70M entries in bloom filter leads to a 436 465 696 bits, so > merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR > operation, which is very hot codepath, but can be parallelized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469681=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469681 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 12:40 Start Date: 12/Aug/20 12:40 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469229142 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java ## @@ -517,6 +532,10 @@ public void close(boolean aborted) throws HiveException { } +//TODO: implement finishAggregators +protected void finishAggregators(boolean aborted) { Review comment: valid point, I need to recheck and learn how aggregators and aggregation buffers are paired together for a specific mode, it seems complicated for the first time This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469681) Time Spent: 6h 20m (was: 6h 10m) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 6h 20m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. 
> For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. llap SUCCEEDED 1263 126300 > 0 0 > Reducer 2 llap RUNNING 1 010 > 0 0 > Map 4 llap RUNNING 6154 0 207 5947 > 0 0 > Reducer 5 llapINITED 43 00 43 > 0 0 > Reducer 6 llapINITED 1 001 > 0 0 > -- > VERTICES: 02/06 [>>--] 16% ELAPSED TIME: 149.98 s > -- > {code} > For example, 70M entries in bloom filter leads to a 436 465 696 bits, so > merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR > operation, which is very hot codepath, but can be parallelized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469678=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469678 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 12:37 Start Date: 12/Aug/20 12:37 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469227267 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java ## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + +public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn, +int batchSize, boolean selectedInUse, int[] selected, Configuration conf) { + // already set in previous iterations, no need to call initExecutor again + if (numThreads == 0) { +return false; + } + if (executor == null) { +initExecutor(conf, batchSize); +if (!isParallel) { + return false; +} + } + + // split every bloom filter (represented by a part of a byte[]) across workers + for (int j = 0; j < batchSize; j++) { +if (!selectedInUse && inputColumn.noNulls) { + splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} else if (!selectedInUse) { + if (!inputColumn.isNull[j]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } +} else if (inputColumn.noNulls) { + int i = selected[j]; + splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} else { + int i = selected[j]; + if (!inputColumn.isNull[i]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], +inputColumn.length[i]); + } +} + } + + return true; +} + +private void 
initExecutor(Configuration conf, int batchSize) { + numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname, + HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal); + LOG.info("Number of threads used for bloom filter merge: {}", numThreads); + + if (numThreads < 0) { +throw new RuntimeException( +"invalid number of threads for bloom filter merge: " + numThreads); + } + if (numThreads == 0) { // disable parallel feature +return; // this will leave isParallel=false + } + isParallel = true; + executor = Executors.newFixedThreadPool(numThreads); + + workers = new BloomFilterMergeWorker[numThreads]; + for (int f = 0; f < numThreads; f++) { +workers[f] = new BloomFilterMergeWorker(bfBytes, 0, bfBytes.length); + } + + for (int f = 0; f < numThreads; f++) { +executor.submit(workers[f]); + } +} + +public int getNumberOfWaitingMergeTasks(){ + int size = 0; + for (BloomFilterMergeWorker w : workers){ +size += w.queue.size(); + } + return size; +} + +public int getNumberOfMergingWorkers() { + int working = 0; + for (BloomFilterMergeWorker w : workers) { +if (w.isMerging.get()) { + working += 1; +} + } + return working; +} + +private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] workers, byte[] bytes, +int start, int length) { + if (bytes == null || length == 0) { +return; + } + /* + * This will split a byte[] across workers as below: + * let's say there are 10 workers for 7813 bytes, in this case + * length: 7813, elementPerBatch: 781 + * bytes assigned to workers: inclusive lower bound, exclusive upper bound + * 1. worker: 5 -> 786 + * 2. worker: 786 -> 1567 + * 3. worker: 1567 -> 2348 + * 4. worker: 2348 -> 3129 + * 5. worker: 3129 -> 3910 + * 6. worker: 3910 -> 4691 + * 7. worker: 4691 -> 5472 + * 8. worker: 5472 -> 6253 + * 9. worker: 6253 -> 7034 + * 10. 
worker: 7034 -> 7813 (last element per batch is: 779) + * + * This way, a particular worker will be given with the same part + * of all bloom filters along with the shared base bloom filter, + * so the bitwise OR function will not be a subject of threading/sync issues. + */ + int elementPerBatch = + (int) Math.ceil((double) (length -
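The quoted diff above breaks off mid-expression, but the worker-split arithmetic its comment walks through (10 workers, 7813 bytes, batches of 781, last batch 779) can be sketched standalone. This is an illustrative reconstruction, not the patch's code; the start offset 5 mirrors the first bound in the comment's example.

```java
public class BloomSplitRanges {
  // Compute per-worker [lo, hi) byte ranges: the payload bytes after a
  // fixed header are divided evenly, with the last worker taking the
  // (possibly shorter) remainder.
  static int[][] split(int start, int length, int workers) {
    int perWorker = (int) Math.ceil((double) (length - start) / workers);
    int[][] ranges = new int[workers][2]; // inclusive lower, exclusive upper
    for (int w = 0; w < workers; w++) {
      ranges[w][0] = start + w * perWorker;
      ranges[w][1] = Math.min(length, ranges[w][0] + perWorker);
    }
    return ranges;
  }
}
```

With `split(5, 7813, 10)` this reproduces the bounds listed in the comment: worker 1 gets 5 -> 786, worker 2 gets 786 -> 1567, and worker 10 gets 7034 -> 7813 (779 bytes).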
[jira] [Commented] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176300#comment-17176300 ] Zoltan Haindrich commented on HIVE-23965: - * the description clearly describes that the metastore data is a composition of questionable-quality stuff...so we are running our planning tests against some weird metastore content * I don't think adding more tests will increase test coverage - in this case we are talking about queries which are already run 2 times - I've seen people updating q.out's like crazy, so adding an extra 100 q.out-s will not necessarily increase coverage... * the independence from having a docker setup is great - the new approach uses docker - but if that's a problem we could try to come up with some other approach - I'm wondering about using an archived derby database with metastore data * the metastore content loader approach is quite unfortunate - IIRC I had to fix up something in the loader once... because I made some changes to the column statistics I think we should remove the old approach...and run tests against the new, more-realistic schema. > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions something > that can have a big impact on the resulting plans. > The existing regression tests rely on more or less on the default > configuration (hive-site.xml). In real-life scenarios though some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=469658=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469658 ] ASF GitHub Bot logged work on HIVE-23965: - Author: ASF GitHub Bot Created on: 12/Aug/20 12:03 Start Date: 12/Aug/20 12:03 Worklog Time Spent: 10m Work Description: kgyrtkirk commented on a change in pull request #1347: URL: https://github.com/apache/hive/pull/1347#discussion_r469206883 ## File path: itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezPerfCliDriver.java ## @@ -34,7 +34,7 @@ @RunWith(Parameterized.class) public class TestTezPerfCliDriver { - static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter(); + static CliAdapter adapter = new CliConfigs.TezCustomTPCDSCliConfig(false).getCliAdapter(); Review comment: you can do that - but I still don't see why do we need to add the keyword `custom` to the name of the test...it was around for a few years ...now we rename it and add "custom" to its name while there won't be a non-custom version anymore... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469658) Time Spent: 1h 10m (was: 1h) > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). 
Some tables are from a 30TB dataset, others from 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions something > that can have a big impact on the resulting plans. > The existing regression tests rely on more or less on the default > configuration (hive-site.xml). In real-life scenarios though some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469647 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 11:21 Start Date: 12/Aug/20 11:21 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469186450 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java ## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + +public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn, +int batchSize, boolean selectedInUse, int[] selected, Configuration conf) { + // already set in previous iterations, no need to call initExecutor again + if (numThreads == 0) { +return false; + } + if (executor == null) { +initExecutor(conf, batchSize); +if (!isParallel) { + return false; +} + } + + // split every bloom filter (represented by a part of a byte[]) across workers + for (int j = 0; j < batchSize; j++) { +if (!selectedInUse && inputColumn.noNulls) { + splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} else if (!selectedInUse) { + if (!inputColumn.isNull[j]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } +} else if (inputColumn.noNulls) { + int i = selected[j]; + splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} else { + int i = selected[j]; + if (!inputColumn.isNull[i]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], +inputColumn.length[i]); + } +} + } + + return true; +} + +private void 
initExecutor(Configuration conf, int batchSize) { Review comment: right, I'll remove This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469647) Time Spent: 5h 50m (was: 5h 40m) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. > For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} >
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469646 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 11:20 Start Date: 12/Aug/20 11:20 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469186380 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java ## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + +public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn, +int batchSize, boolean selectedInUse, int[] selected, Configuration conf) { + // already set in previous iterations, no need to call initExecutor again + if (numThreads == 0) { +return false; + } + if (executor == null) { +initExecutor(conf, batchSize); +if (!isParallel) { + return false; +} + } + + // split every bloom filter (represented by a part of a byte[]) across workers + for (int j = 0; j < batchSize; j++) { +if (!selectedInUse && inputColumn.noNulls) { + splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} else if (!selectedInUse) { + if (!inputColumn.isNull[j]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } +} else if (inputColumn.noNulls) { + int i = selected[j]; + splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} else { + int i = selected[j]; + if (!inputColumn.isNull[i]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], +inputColumn.length[i]); + } +} + } + + return true; +} + +private void 
initExecutor(Configuration conf, int batchSize) { + numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname, + HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal); + LOG.info("Number of threads used for bloom filter merge: {}", numThreads); + + if (numThreads < 0) { +throw new RuntimeException( +"invalid number of threads for bloom filter merge: " + numThreads); + } + if (numThreads == 0) { // disable parallel feature +return; // this will leave isParallel=false + } + isParallel = true; + executor = Executors.newFixedThreadPool(numThreads); + + workers = new BloomFilterMergeWorker[numThreads]; + for (int f = 0; f < numThreads; f++) { +workers[f] = new BloomFilterMergeWorker(bfBytes, 0, bfBytes.length); + } + + for (int f = 0; f < numThreads; f++) { +executor.submit(workers[f]); + } +} + +public int getNumberOfWaitingMergeTasks(){ + int size = 0; + for (BloomFilterMergeWorker w : workers){ +size += w.queue.size(); + } + return size; +} + +public int getNumberOfMergingWorkers() { + int working = 0; + for (BloomFilterMergeWorker w : workers) { +if (w.isMerging.get()) { + working += 1; +} + } + return working; +} + +private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] workers, byte[] bytes, +int start, int length) { + if (bytes == null || length == 0) { +return; + } + /* + * This will split a byte[] across workers as below: + * let's say there are 10 workers for 7813 bytes, in this case + * length: 7813, elementPerBatch: 781 + * bytes assigned to workers: inclusive lower bound, exclusive upper bound + * 1. worker: 5 -> 786 + * 2. worker: 786 -> 1567 + * 3. worker: 1567 -> 2348 + * 4. worker: 2348 -> 3129 + * 5. worker: 3129 -> 3910 + * 6. worker: 3910 -> 4691 + * 7. worker: 4691 -> 5472 + * 8. worker: 5472 -> 6253 + * 9. worker: 6253 -> 7034 + * 10. 
worker: 7034 -> 7813 (last element per batch is: 779) + * + * This way, a particular worker will be given with the same part + * of all bloom filters along with the shared base bloom filter, + * so the bitwise OR function will not be a subject of threading/sync issues. + */ + int elementPerBatch = + (int) Math.ceil((double) (length -
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469645=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469645 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 11:18 Start Date: 12/Aug/20 11:18 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469185446 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java ## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + +public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn, +int batchSize, boolean selectedInUse, int[] selected, Configuration conf) { + // already set in previous iterations, no need to call initExecutor again + if (numThreads == 0) { +return false; + } + if (executor == null) { +initExecutor(conf, batchSize); +if (!isParallel) { + return false; +} + } + + // split every bloom filter (represented by a part of a byte[]) across workers + for (int j = 0; j < batchSize; j++) { +if (!selectedInUse && inputColumn.noNulls) { + splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} else if (!selectedInUse) { + if (!inputColumn.isNull[j]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } +} else if (inputColumn.noNulls) { + int i = selected[j]; + splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} else { + int i = selected[j]; + if (!inputColumn.isNull[i]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], +inputColumn.length[i]); + } +} + } + + return true; +} + +private void 
initExecutor(Configuration conf, int batchSize) { + numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname, + HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal); + LOG.info("Number of threads used for bloom filter merge: {}", numThreads); + + if (numThreads < 0) { +throw new RuntimeException( +"invalid number of threads for bloom filter merge: " + numThreads); + } + if (numThreads == 0) { // disable parallel feature Review comment: good catch, I'm eliminating this check by initializing parallel behavior while initializing the aggregation buffer This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469645) Time Spent: 5.5h (was: 5h 20m) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 5.5h > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. > For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469644 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 11:17 Start Date: 12/Aug/20 11:17 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469184919 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java ## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + Review comment: fortunately we won't need this, I've eliminated with boolean return hack according to another PR comment This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469644) Time Spent: 5h 20m (was: 5h 10m) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. 
> For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. llap SUCCEEDED 1263 126300 > 0 0 > Reducer 2 llap RUNNING 1 010 > 0 0 > Map 4 llap RUNNING 6154 0 207 5947 > 0 0 > Reducer 5 llapINITED 43 00 43 > 0 0 > Reducer 6 llapINITED 1 001 > 0 0 > -- > VERTICES: 02/06 [>>--] 16% ELAPSED TIME: 149.98 s > -- > {code} > For example, 70M entries in bloom filter leads to a 436 465 696 bits, so > merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR > operation, which is very hot codepath, but can be parallelized. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore
[ https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi reassigned HIVE-24032: -- > Remove hadoop shims dependency and use FileSystem Api directly from > standalone metastore > > > Key: HIVE-24032 > URL: https://issues.apache.org/jira/browse/HIVE-24032 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24031) Infinite planning time on syntactically big queries
[ https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176238#comment-17176238 ] Stamatis Zampetakis commented on HIVE-24031: Obviously such queries should be rather rare and usually can be avoided (using temporary tables and other tricks) but the fact that the planner is stuck processing the query indefinitely is a problem that should be addressed. > Infinite planning time on syntactically big queries > --- > > Key: HIVE-24031 > URL: https://issues.apache.org/jira/browse/HIVE-24031 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > > Syntactically big queries (~1 million tokens), such as the query shown below, > lead to very big (seemingly infinite) planning times. > {code:sql} > select posexplode(array('item1', 'item2', ..., 'item1M')); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
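A repro for the ~1M-token query from the description can be generated programmatically rather than typed out. A minimal sketch (the `posexplodeQuery` helper and the item count are illustrative, not part of the reported test case):

```java
public class BigQueryGen {
  // Build the kind of syntactically huge query from the report:
  // select posexplode(array('item1', ..., 'itemN')) for a large N.
  static String posexplodeQuery(int n) {
    StringBuilder sb = new StringBuilder("select posexplode(array(");
    for (int i = 1; i <= n; i++) {
      if (i > 1) {
        sb.append(", "); // separator between array elements
      }
      sb.append("'item").append(i).append('\'');
    }
    return sb.append("))").toString();
  }
}
```

Feeding the resulting string to the parser with a sufficiently large `n` exercises the same planner path the issue describes.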
[jira] [Assigned] (HIVE-24031) Infinite planning time on syntactically big queries
[ https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-24031: -- > Infinite planning time on syntactically big queries > --- > > Key: HIVE-24031 > URL: https://issues.apache.org/jira/browse/HIVE-24031 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > > Syntactically big queries (~1 million tokens), such as the query shown below, > lead to very big (seemingly infinite) planning times. > {code:sql} > select posexplode(array('item1', 'item2', ..., 'item1M')); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23993: --- Attachment: HIVE-23993.07.patch Status: Patch Available (was: In Progress) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, > HIVE-23993.06.patch, HIVE-23993.07.patch, Retry Logic for Replication.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23993: --- Status: In Progress (was: Patch Available) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, > HIVE-23993.06.patch, HIVE-23993.07.patch, Retry Logic for Replication.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469620 ] ASF GitHub Bot logged work on HIVE-23922: - Author: ASF GitHub Bot Created on: 12/Aug/20 09:15 Start Date: 12/Aug/20 09:15 Worklog Time Spent: 10m Work Description: dh20 commented on pull request #1307: URL: https://github.com/apache/hive/pull/1307#issuecomment-672757291 @sunchao The test has passed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469620) Time Spent: 1h (was: 50m) > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469619=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469619 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 09:13 Start Date: 12/Aug/20 09:13 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469120003 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java ## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + +public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn, +int batchSize, boolean selectedInUse, int[] selected, Configuration conf) { + // already set in previous iterations, no need to call initExecutor again + if (numThreads == 0) { +return false; + } + if (executor == null) { +initExecutor(conf, batchSize); +if (!isParallel) { + return false; +} + } + + // split every bloom filter (represented by a part of a byte[]) across workers + for (int j = 0; j < batchSize; j++) { +if (!selectedInUse && inputColumn.noNulls) { + splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} else if (!selectedInUse) { + if (!inputColumn.isNull[j]) { Review comment: yeah, I missed this cleanup while mirroring the [original logic](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java#L132-L191) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469619) Time Spent: 5h 10m (was: 5h) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 5h 10m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. > For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. llap SUCCEEDED 1263 126300 > 0 0 > Reducer 2 llap RUNNING 1 010 > 0 0 > Map 4
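The four-branch `(selectedInUse, noNulls)` dispatch mirrored in the diff above is the standard vectorized-batch iteration pattern: walk the logical rows, map through `selected` when it is in use, and skip rows flagged in `isNull`. A minimal standalone sketch of that pattern (hypothetical class and method names, not Hive code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the vectorized-batch row dispatch discussed in the review:
// visit each logical row once, resolving the physical index via `selected`
// when selectedInUse is true, and skipping null rows unless noNulls holds.
public class BatchDispatchSketch {
    static List<Integer> rowsToVisit(boolean selectedInUse, boolean noNulls,
            boolean[] isNull, int[] selected, int batchSize) {
        List<Integer> rows = new ArrayList<>();
        for (int j = 0; j < batchSize; j++) {
            int i = selectedInUse ? selected[j] : j; // physical row index
            if (noNulls || !isNull[i]) {
                rows.add(i);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        boolean[] isNull = {false, true, false, false};
        int[] selected = {0, 2};
        // batchSize is the number of selected entries when selectedInUse
        System.out.println(rowsToVisit(true, false, isNull, selected, 2)); // prints [0, 2]
    }
}
```

In the actual operator each visited row's bytes are handed to `splitVectorAcrossWorkers` instead of being collected into a list.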
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469610=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469610 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 09:00 Start Date: 12/Aug/20 09:00 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469112026 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java ## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + +public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn, +int batchSize, boolean selectedInUse, int[] selected, Configuration conf) { + // already set in previous iterations, no need to call initExecutor again + if (numThreads == 0) { +return false; + } + if (executor == null) { +initExecutor(conf, batchSize); +if (!isParallel) { + return false; +} + } + + // split every bloom filter (represented by a part of a byte[]) across workers + for (int j = 0; j < batchSize; j++) { +if (!selectedInUse && inputColumn.noNulls) { + splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} else if (!selectedInUse) { + if (!inputColumn.isNull[j]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } +} else if (inputColumn.noNulls) { + int i = selected[j]; + splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} else { + int i = selected[j]; + if (!inputColumn.isNull[i]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], +inputColumn.length[i]); + } +} + } + + return true; +} + +private void 
initExecutor(Configuration conf, int batchSize) { + numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname, + HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal); + LOG.info("Number of threads used for bloom filter merge: {}", numThreads); + + if (numThreads < 0) { +throw new RuntimeException( +"invalid number of threads for bloom filter merge: " + numThreads); + } + if (numThreads == 0) { // disable parallel feature +return; // this will leave isParallel=false + } + isParallel = true; + executor = Executors.newFixedThreadPool(numThreads); + + workers = new BloomFilterMergeWorker[numThreads]; + for (int f = 0; f < numThreads; f++) {

Review comment: good catch, moving them to a single loop

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java

## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + +public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn, +int batchSize, boolean selectedInUse, int[] selected, Configuration conf) { + // already set in previous iterations, no need to call initExecutor again + if (numThreads == 0) { +return false; + } + if (executor == null) { +initExecutor(conf, batchSize); +if (!isParallel) { + return false; +} + } + + // split every bloom filter (represented by a part of a byte[]) across workers + for (int j = 0; j < batchSize; j++) { +if (!selectedInUse && inputColumn.noNulls) { + splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} else if (!selectedInUse) { + if (!inputColumn.isNull[j]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } +} else if (inputColumn.noNulls) { + int i = selected[j]; + splitVectorAcrossWorkers(workers,
inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} else { + int i = selected[j]; + if (!inputColumn.isNull[i]) { +splitVectorAcrossWorkers(workers,
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469608 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 08:59 Start Date: 12/Aug/20 08:59 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469111595 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -4330,6 +4330,12 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal "Bloom filter should be of at max certain size to be effective"), TEZ_BLOOM_FILTER_FACTOR("hive.tez.bloom.filter.factor", (float) 1.0, "Bloom filter should be a multiple of this factor with nDV"), +TEZ_BLOOM_FILTER_MERGE_THREADS("hive.tez.bloom.filter.merge.threads", 1, +"How many threads are used for merging bloom filters?\n" ++ "-1: sanity check, it will fail if execution hits bloom filter merge codepath\n" ++ " 0: feature is disabled\n" Review comment: agree, added This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469608) Time Spent: 4h 50m (was: 4h 40m) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 4h 50m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. > For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. 
llap  SUCCEEDED   1263       1263        0        0       0       0
> Reducer 2      llap    RUNNING      1          0        1        0       0       0
> Map 4          llap    RUNNING   6154          0      207     5947       0       0
> Reducer 5      llap     INITED     43          0        0       43       0       0
> Reducer 6      llap     INITED      1          0        0        1       0       0
> ----------------------------------------------------------------------------
> VERTICES: 02/06  [>>----------] 16%  ELAPSED TIME: 149.98 s
> ----------------------------------------------------------------------------
> {code}
> For example, 70M entries in a bloom filter lead to 436,465,696 bits, so
> merging 1263 bloom filters means running ~1263 * 436,465,696 bitwise OR
> operations, which is a very hot codepath, but can be parallelized.
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469605=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469605 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 08:58 Start Date: 12/Aug/20 08:58 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469110972 ## File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ## @@ -4330,6 +4330,12 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal "Bloom filter should be of at max certain size to be effective"), TEZ_BLOOM_FILTER_FACTOR("hive.tez.bloom.filter.factor", (float) 1.0, "Bloom filter should be a multiple of this factor with nDV"), +TEZ_BLOOM_FILTER_MERGE_THREADS("hive.tez.bloom.filter.merge.threads", 1, +"How many threads are used for merging bloom filters?\n" Review comment: agree, adding that This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469605) Time Spent: 4h 40m (was: 4.5h) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. 
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ ss_customer_sk
>       ,sum(act_sales) sumsales
> from (select ss_item_sk
>             ,ss_ticket_number
>             ,ss_customer_sk
>             ,case when sr_return_quantity is not null then
>                     (ss_quantity-sr_return_quantity)*ss_sales_price
>                   else
>                     (ss_quantity*ss_sales_price) end act_sales
>       from store_sales left outer join store_returns on (sr_item_sk = ss_item_sk
>                                                          and sr_ticket_number = ss_ticket_number)
>           ,reason
>       where sr_reason_sk = r_reason_sk
>         and r_reason_desc = 'reason 66') t
> group by ss_customer_sk
> order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2
> mins are spent with merging bloom filters (Reducer 2), as in:
> [^lipwig-output3605036885489193068.svg]
> {code}
> ----------------------------------------------------------------------------
> VERTICES        MODE      STATUS   TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------
> Map 3 ......    llap   SUCCEEDED       1          1        0        0       0       0
> Map 1 ......    llap   SUCCEEDED    1263       1263        0        0       0       0
> Reducer 2       llap     RUNNING       1          0        1        0       0       0
> Map 4           llap     RUNNING    6154          0      207     5947       0       0
> Reducer 5       llap      INITED      43          0        0       43       0       0
> Reducer 6       llap      INITED       1          0        0        1       0       0
> ----------------------------------------------------------------------------
> VERTICES: 02/06  [>>----------] 16%  ELAPSED TIME: 149.98 s
> ----------------------------------------------------------------------------
> {code}
> For example, 70M entries in a bloom filter lead to 436,465,696 bits, so
> merging 1263 bloom filters means running ~1263 * 436,465,696 bitwise OR
> operations, which is a very hot codepath, but can be parallelized.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
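The "bitwise OR" work quantified above reduces, per worker, to OR-ing an assigned slice of each incoming serialized bloom filter into the shared base byte[]. A hedged sketch of that inner loop (toy data and names, not the actual Hive implementation, which operates on serialized BloomKFilter payloads past the header bytes):

```java
// Sketch of one merge worker's job: bitwise-OR its assigned slice
// [lower, upper) of an incoming serialized bloom filter into the shared
// base filter. Disjoint slices let workers run without synchronization.
public class MergeSliceSketch {
    static void orSlice(byte[] base, byte[] incoming, int lower, int upper) {
        for (int i = lower; i < upper; i++) {
            base[i] |= incoming[i];
        }
    }

    public static void main(String[] args) {
        byte[] base     = {0x05, 0x01, 0b0101, 0b0011};
        byte[] incoming = {0x05, 0x01, 0b0011, 0b1100};
        orSlice(base, incoming, 2, 4); // skip the 2-byte "header" of this toy layout
        System.out.println(base[2] + " " + base[3]); // prints 7 15
    }
}
```

Because OR is commutative and associative, the 1263 incoming filters can be applied to a slice in any order, which is what makes the queue-per-worker design safe.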
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469604 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 08:56 Start Date: 12/Aug/20 08:56 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469109686 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorAggregateExpression.java ## @@ -20,24 +20,25 @@ import java.io.Serializable; +import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.hive.common.type.DataTypePhysicalVariation; import org.apache.hadoop.hive.ql.exec.vector.ColumnVector; import org.apache.hadoop.hive.ql.exec.vector.VectorAggregationBufferRow; import org.apache.hadoop.hive.ql.exec.vector.VectorAggregationDesc; import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch; import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression; import org.apache.hadoop.hive.ql.metadata.HiveException; -import org.apache.hadoop.hive.ql.plan.AggregationDesc; import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator; -import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.Mode; /** * Base class for aggregation expressions. 
*/ public abstract class VectorAggregateExpression implements Serializable { - + protected final Logger LOG = LoggerFactory.getLogger(getClass().getName()); Review comment: personally, I don't really like protected static Logger, because subclasses won't show the actual class name (only the parent) in this case, you're right, this LOG is not used in VectorUDAFBloomFilterMerge at all, it's useless leftover, I'm going to remove it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469604) Time Spent: 4.5h (was: 4h 20m) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 4.5h > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. 
> For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk >and > sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code} > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 3 .. llap SUCCEEDED 1 100 > 0 0 > Map 1 .. llap SUCCEEDED 1263 126300 >
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469600=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469600 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 08:52 Start Date: 12/Aug/20 08:52 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469107566 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java ## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + +public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn, +int batchSize, boolean selectedInUse, int[] selected, Configuration conf) { + // already set in previous iterations, no need to call initExecutor again + if (numThreads == 0) { +return false; + } + if (executor == null) { +initExecutor(conf, batchSize); +if (!isParallel) { + return false; +} + } + + // split every bloom filter (represented by a part of a byte[]) across workers + for (int j = 0; j < batchSize; j++) { +if (!selectedInUse && inputColumn.noNulls) { + splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} else if (!selectedInUse) { + if (!inputColumn.isNull[j]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } +} else if (inputColumn.noNulls) { + int i = selected[j]; + splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} else { + int i = selected[j]; + if (!inputColumn.isNull[i]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], +inputColumn.length[i]); + } +} + } + + return true; +} + +private void 
initExecutor(Configuration conf, int batchSize) { + numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname, + HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal); + LOG.info("Number of threads used for bloom filter merge: {}", numThreads); + + if (numThreads < 0) { +throw new RuntimeException( +"invalid number of threads for bloom filter merge: " + numThreads); + } + if (numThreads == 0) { // disable parallel feature +return; // this will leave isParallel=false + } + isParallel = true; + executor = Executors.newFixedThreadPool(numThreads); + + workers = new BloomFilterMergeWorker[numThreads]; + for (int f = 0; f < numThreads; f++) { +workers[f] = new BloomFilterMergeWorker(bfBytes, 0, bfBytes.length); + } + + for (int f = 0; f < numThreads; f++) { +executor.submit(workers[f]); + } +} + +public int getNumberOfWaitingMergeTasks(){ + int size = 0; + for (BloomFilterMergeWorker w : workers){ +size += w.queue.size(); + } + return size; +} + +public int getNumberOfMergingWorkers() { + int working = 0; + for (BloomFilterMergeWorker w : workers) { +if (w.isMerging.get()) { + working += 1; +} + } + return working; +} + +private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] workers, byte[] bytes, +int start, int length) { + if (bytes == null || length == 0) { +return; + } + /* + * This will split a byte[] across workers as below: + * let's say there are 10 workers for 7813 bytes, in this case + * length: 7813, elementPerBatch: 781 + * bytes assigned to workers: inclusive lower bound, exclusive upper bound + * 1. worker: 5 -> 786 + * 2. worker: 786 -> 1567 + * 3. worker: 1567 -> 2348 + * 4. worker: 2348 -> 3129 + * 5. worker: 3129 -> 3910 + * 6. worker: 3910 -> 4691 + * 7. worker: 4691 -> 5472 + * 8. worker: 5472 -> 6253 + * 9. worker: 6253 -> 7034 + * 10. 
worker: 7034 -> 7813 (last element per batch is: 779) + * + * This way, a particular worker will be given with the same part + * of all bloom filters along with the shared base bloom filter, + * so the bitwise OR function will not be a subject of threading/sync issues. + */ + int elementPerBatch = + (int) Math.ceil((double) (length -
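The slicing arithmetic worked through in the comment above (10 workers, 7813 bytes, payload starting at byte 5, so `elementPerBatch = ceil(7808 / 10) = 781`) can be sketched as follows; `splitRanges` is a hypothetical helper, with 5 standing in for `BloomKFilter.START_OF_SERIALIZED_LONGS`:

```java
// Sketch of the range-splitting described in the review comment: divide the
// payload [start, length) of a serialized bloom filter into near-equal
// contiguous slices, one per worker (inclusive lower bound, exclusive upper
// bound). The last slice may be slightly shorter than elementPerBatch.
public class SplitRangesSketch {
    static int[][] splitRanges(int start, int length, int numWorkers) {
        int elementPerBatch = (int) Math.ceil((double) (length - start) / numWorkers);
        int[][] ranges = new int[numWorkers][2];
        int lower = start;
        for (int w = 0; w < numWorkers; w++) {
            ranges[w][0] = lower;
            ranges[w][1] = Math.min(lower + elementPerBatch, length);
            lower = ranges[w][1];
        }
        return ranges;
    }

    public static void main(String[] args) {
        int[][] r = splitRanges(5, 7813, 10);
        System.out.println(r[0][0] + "->" + r[0][1]); // prints 5->786
        System.out.println(r[9][0] + "->" + r[9][1]); // prints 7034->7813
    }
}
```

Worker w always receives the same slice of every incoming filter, so its OR writes never overlap another worker's region of the shared base array.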
[jira] [Updated] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23993: --- Status: In Progress (was: Patch Available) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, > HIVE-23993.06.patch, Retry Logic for Replication.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23993: --- Attachment: HIVE-23993.06.patch Status: Patch Available (was: In Progress) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, > HIVE-23993.06.patch, Retry Logic for Replication.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469596 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 08:44 Start Date: 12/Aug/20 08:44 Worklog Time Spent: 10m Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469102366 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java ## @@ -77,6 +75,211 @@ public void reset() { // Do not change the initial bytes which contain NumHashFunctions/NumBits! Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0); } + +public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn, +int batchSize, boolean selectedInUse, int[] selected, Configuration conf) { + // already set in previous iterations, no need to call initExecutor again + if (numThreads == 0) { +return false; + } + if (executor == null) { +initExecutor(conf, batchSize); +if (!isParallel) { + return false; +} + } + + // split every bloom filter (represented by a part of a byte[]) across workers + for (int j = 0; j < batchSize; j++) { +if (!selectedInUse && inputColumn.noNulls) { + splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], + inputColumn.length[j]); +} else if (!selectedInUse) { + if (!inputColumn.isNull[j]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j], +inputColumn.length[j]); + } +} else if (inputColumn.noNulls) { + int i = selected[j]; + splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], + inputColumn.length[i]); +} else { + int i = selected[j]; + if (!inputColumn.isNull[i]) { +splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i], +inputColumn.length[i]); + } +} + } + + return true; +} + +private void 
initExecutor(Configuration conf, int batchSize) { + numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname, + HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal); + LOG.info("Number of threads used for bloom filter merge: {}", numThreads); + + if (numThreads < 0) { +throw new RuntimeException( +"invalid number of threads for bloom filter merge: " + numThreads); + } + if (numThreads == 0) { // disable parallel feature +return; // this will leave isParallel=false + } + isParallel = true; + executor = Executors.newFixedThreadPool(numThreads); + + workers = new BloomFilterMergeWorker[numThreads]; + for (int f = 0; f < numThreads; f++) { +workers[f] = new BloomFilterMergeWorker(bfBytes, 0, bfBytes.length); + } + + for (int f = 0; f < numThreads; f++) { +executor.submit(workers[f]); + } +} + +public int getNumberOfWaitingMergeTasks(){ + int size = 0; + for (BloomFilterMergeWorker w : workers){ +size += w.queue.size(); + } + return size; +} + +public int getNumberOfMergingWorkers() { Review comment: yeah, only for logging, it was for validating my executor shutdown correctness...that can be misleading, I'm removing it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469596) Time Spent: 4h (was: 3h 50m) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 4h > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected
[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
[ https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469594 ] ASF GitHub Bot logged work on HIVE-23880: - Author: ASF GitHub Bot Created on: 12/Aug/20 08:39 Start Date: 12/Aug/20 08:39 Worklog Time Spent: 10m

Work Description: abstractdog commented on a change in pull request #1280: URL: https://github.com/apache/hive/pull/1280#discussion_r469099535

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java

## @@ -1126,6 +1137,7 @@ protected void initializeOp(Configuration hconf) throws HiveException { VectorAggregateExpression vecAggrExpr = null; try { vecAggrExpr = ctor.newInstance(vecAggrDesc); + vecAggrExpr.withConf(hconf);

Review comment: 1. constructor: first I tried to pass it to the constructor, but that breaks compatibility with every other subclass of VectorAggregateExpression; if I use ctor.newInstance(vecAggrDesc, hconf), I need to add that constructor to all subclasses, because they don't inherit this ctor from VectorAggregateExpression...withConf can solve this, let me know about better ways 2. single int: this config is specific to VectorUDAFBloomFilterMerge, so I don't think I should pass it through a constructor to every VectorAggregateExpression, and I didn't want to go for an instanceof hack for a cast + specific call

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469594) Time Spent: 3h 50m (was: 3h 40m) > Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge > --- > > Key: HIVE-23880 > URL: https://issues.apache.org/jira/browse/HIVE-23880 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: lipwig-output3605036885489193068.svg > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Merging bloom filters in semijoin reduction can become the main bottleneck in > case of large number of source mapper tasks (~1000, Map 1 in below example) > and a large amount of expected entries (50M) in bloom filters. > For example in TPCDS Q93: > {code} > select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ > ss_customer_sk > ,sum(act_sales) sumsales > from (select ss_item_sk > ,ss_ticket_number > ,ss_customer_sk > ,case when sr_return_quantity is not null then > (ss_quantity-sr_return_quantity)*ss_sales_price > else > (ss_quantity*ss_sales_price) end act_sales > from store_sales left outer join store_returns on (sr_item_sk = > ss_item_sk > and sr_ticket_number = ss_ticket_number) > ,reason > where sr_reason_sk = r_reason_sk > and r_reason_desc = 'reason 66') t > group by ss_customer_sk > order by sumsales, ss_customer_sk > limit 100; > {code} > On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 > mins are spent with merging bloom filters (Reducer 2), as in: > [^lipwig-output3605036885489193068.svg] > {code}
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 3 ..........      llap     SUCCEEDED      1          1        0        0       0       0
> Map 1 ..........      llap     SUCCEEDED   1263       1263        0        0       0       0
> Reducer 2             llap       RUNNING      1          0        1        0       0       0
> Map 4                 llap       RUNNING   6154          0      207     5947       0       0
> Reducer 5             llap        INITED     43          0        0       43       0       0
> Reducer 6             llap        INITED      1          0        0        1       0       0
> ----------------------------------------------------------------------------------------------
> VERTICES: 02/06  [>>--------------------------] 16%  ELAPSED TIME: 149.98 s
>
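The design choice debated in the review above (a `withConf` hook on the base class instead of widening every subclass constructor) can be sketched as follows. This is an illustrative sketch only, not Hive's actual classes: the `Conf` stand-in and the config key name are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for Hadoop's Configuration (illustration only, not Hive code).
class Conf {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key, String dflt) { return props.getOrDefault(key, dflt); }
}

// Base class keeps a no-op withConf hook, so existing subclasses compile unchanged
// and no new constructor has to be added to each of them.
abstract class VectorAggregateExpr {
    VectorAggregateExpr withConf(Conf conf) { return this; }
    abstract int mergeThreads();
}

// Only the one subclass that needs the setting overrides the hook.
class BloomFilterMergeExpr extends VectorAggregateExpr {
    private int threads = 1;
    @Override
    VectorAggregateExpr withConf(Conf conf) {
        // "hive.vectorized.bloom.merge.threads" is a made-up key for this sketch
        threads = Integer.parseInt(conf.get("hive.vectorized.bloom.merge.threads", "1"));
        return this;
    }
    @Override
    int mergeThreads() { return threads; }
}

public class WithConfDemo {
    public static void main(String[] args) {
        Conf conf = new Conf();
        conf.set("hive.vectorized.bloom.merge.threads", "4");
        VectorAggregateExpr expr = new BloomFilterMergeExpr().withConf(conf);
        System.out.println(expr.mergeThreads()); // prints 4
    }
}
```

The trade-off is the one named in the comment: `ctor.newInstance(vecAggrDesc, hconf)` would force a new constructor onto all subclasses, while the hook touches only the subclass that cares.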
[jira] [Updated] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23993: --- Status: In Progress (was: Patch Available) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, Retry Logic > for Replication.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23993: --- Attachment: HIVE-23993.05.patch Status: Patch Available (was: In Progress) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, Retry Logic > for Replication.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-23994) TestRetryable is unstable
[ https://issues.apache.org/jira/browse/HIVE-23994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi resolved HIVE-23994. Resolution: Fixed > TestRetryable is unstable > - > > Key: HIVE-23994 > URL: https://issues.apache.org/jira/browse/HIVE-23994 > Project: Hive > Issue Type: Bug >Reporter: Peter Varga >Assignee: Aasha Medhi >Priority: Major > > The flaky test check run: > [http://ci.hive.apache.org/job/hive-flaky-check/83/console] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?focusedWorklogId=469590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469590 ] ASF GitHub Bot logged work on HIVE-23993: - Author: ASF GitHub Bot Created on: 12/Aug/20 08:33 Start Date: 12/Aug/20 08:33 Worklog Time Spent: 10m Work Description: aasha commented on a change in pull request #1367: URL: https://github.com/apache/hive/pull/1367#discussion_r469095976 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java ## @@ -366,4 +371,26 @@ public static boolean includeAcidTableInDump(HiveConf conf) { public static boolean tableIncludedInReplScope(ReplScope replScope, String tableName) { return ((replScope == null) || replScope.tableIncludedInReplScope(tableName)); } + + public static boolean failedWithNonRecoverableError(Path dumpRoot, HiveConf conf) throws SemanticException { +if (dumpRoot == null) { Review comment: Yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469590) Time Spent: 2h 10m (was: 2h) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, Retry Logic for Replication.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
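The `failedWithNonRecoverableError(dumpRoot, conf)` check discussed in the review (including the `dumpRoot == null` guard) could look roughly like this. This is a hypothetical sketch: the marker-file name `_non_recoverable` is an assumption for illustration, not necessarily Hive's actual constant.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: replication marks an irrecoverable failure with a marker file under
// the dump root; later dump/load cycles check for it and bail out early.
public class NonRecoverableCheck {
    static final String MARKER = "_non_recoverable"; // assumed marker name

    static boolean failedWithNonRecoverableError(Path dumpRoot) {
        if (dumpRoot == null) {
            return false; // nothing dumped yet, so nothing to check
        }
        return Files.exists(dumpRoot.resolve(MARKER));
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("repl");
        System.out.println(failedWithNonRecoverableError(root)); // prints false
        Files.createFile(root.resolve(MARKER));
        System.out.println(failedWithNonRecoverableError(root)); // prints true
    }
}
```

The null guard matters because, as the reviewers discuss, the check can run before any dump exists (and potentially on the load side as well).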
[jira] [Commented] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs
[ https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176161#comment-17176161 ] Stamatis Zampetakis commented on HIVE-23965: Before finalizing this case it may be good to decide if we want to keep or delete the old TPC-DS drivers (or better say tests), namely TestTezPerfCliDriver and TestTezPerfConstraintsCliDriver, as part of this issue. A few advantages/disadvantages of keeping the old tests are outlined below. +Advantages:+ * Better code coverage. Due to differences in stats and configurations we end up with different plans, so potentially we are covering different codepaths. * The old drivers can be run in a local dev environment without needing to install Docker. +Disadvantages:+ * Maintenance cost. Now for every change in the planner we may need to update the results from three drivers (~300 queries). * Unrealistic plans. As mentioned previously, the table stats are not obtained from a single TPC-DS scale factor, so in some cases we may never see these plans in practice; this can also be seen as a bug that we could possibly fix. * Test execution time. Obviously running three test suites instead of one is going to take more time and there is nothing that we can do about it. In case there is disagreement we can keep them for now and postpone the decision to another JIRA. Let me know what you think. > Improve plan regression tests using TPCDS30TB metastore dump and custom > configs > --- > > Key: HIVE-23965 > URL: https://issues.apache.org/jira/browse/HIVE-23965 > Project: Hive > Issue Type: Improvement >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > The existing regression tests (HIVE-12586) based on TPC-DS have certain > shortcomings: > The table statistics do not reflect cardinalities from a specific TPC-DS > scale factor (SF). 
Some tables are from a 30TB dataset, others from a 200GB > dataset, and others from a 3GB dataset. This mix leads to plans that may > never appear when using an actual TPC-DS dataset. > The existing statistics do not contain information about partitions, something > that can have a big impact on the resulting plans. > The existing regression tests rely more or less on the default > configuration (hive-site.xml). In real-life scenarios though, some of the > configurations differ and may impact the choices of the optimizer. > This issue aims to address the above shortcomings by using a curated > TPCDS30TB metastore dump along with some custom hive configurations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?focusedWorklogId=469569=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469569 ] ASF GitHub Bot logged work on HIVE-23993: - Author: ASF GitHub Bot Created on: 12/Aug/20 08:08 Start Date: 12/Aug/20 08:08 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1367: URL: https://github.com/apache/hive/pull/1367#discussion_r469081208 ## File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java ## @@ -35,20 +37,17 @@ import org.apache.hadoop.hive.ql.ErrorMsg; import org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder; import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils; -import org.apache.hadoop.hive.ql.metadata.Hive; import org.apache.hadoop.hive.ql.parse.repl.PathBuilder; import org.apache.hadoop.hive.ql.processors.CommandProcessorException; import org.apache.hadoop.hive.ql.util.DependencyResolver; +import org.apache.hadoop.io.IOUtils; import org.apache.hadoop.security.UserGroupInformation; import org.junit.Assert; import org.junit.BeforeClass; import org.junit.Test; import javax.annotation.Nullable; -import java.io.BufferedReader; -import java.io.File; -import java.io.IOException; -import java.io.InputStreamReader; +import java.io.*; Review comment: Revert this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469569) Time Spent: 2h (was: 1h 50m) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, Retry Logic for Replication.pdf > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23927) Cast to Timestamp generates different output for Integer & Float values
[ https://issues.apache.org/jira/browse/HIVE-23927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176135#comment-17176135 ] László Bodor commented on HIVE-23927: - Unfortunately, I cannot recall anything from ORC-554 which is related to this. In ORC-554 we handled an overflow case, where a float is not precise enough to represent a timestamp and messes up the values in TimestampColumnVector ([fix is here|https://github.com/apache/orc/commit/7de945b080c5ca83b84397db105f70082a2107f4#diff-9090b54d59f8163ec2be71169d4813c8R1412-R1426]). This one is indeed not related to ORC/schema evolution, but the reported problem is present on master, [as my repro shows|https://github.com/abstractdog/hive/commit/54ec318203#diff-219ede90fa98943fb8e1518350ff074dR36] > Cast to Timestamp generates different output for Integer & Float values > > > Key: HIVE-23927 > URL: https://issues.apache.org/jira/browse/HIVE-23927 > Project: Hive > Issue Type: Bug >Reporter: Renukaprasad C >Priority: Major > > A Double value is treated as SECONDS and converted into millis internally, > whereas an Integer value is treated as MILLIS, which produces different > output. > org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(Object, > PrimitiveObjectInspector, boolean) handles Integral & Decimal values > differently. This causes the issue. 
> 0: jdbc:hive2://localhost:1> select cast(1.204135216E9 as timestamp) > Double2TimeStamp, cast(1204135216 as timestamp) Int2TimeStamp from abc > tablesample(1 rows); > OK > INFO : Compiling > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): > select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as > timestamp) Int2TimeStamp from abc tablesample(1 rows) > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: > Schema(fieldSchemas:[FieldSchema(name:double2timestamp, type:timestamp, > comment:null), FieldSchema(name:int2timestamp, type:timestamp, > comment:null)], properties:null) > INFO : Completed compiling > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); > Time taken: 0.175 seconds > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Executing > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): > select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as > timestamp) Int2TimeStamp from abc tablesample(1 rows) > INFO : Completed executing > command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); > Time taken: 0.001 seconds > INFO : OK > INFO : Concurrency mode is disabled, not creating a lock manager
> +------------------------+--------------------------+
> |    double2timestamp    |      int2timestamp       |
> +------------------------+--------------------------+
> | 2008-02-27 18:00:16.0  | 1970-01-14 22:28:55.216  |
> +------------------------+--------------------------+
-- This message was sent by Atlassian Jira (v8.3.4#803005)
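The divergence reported above (an integral value interpreted as milliseconds, a floating-point value as seconds) can be reproduced with plain `java.time`, independently of Hive. The instants below match the query output, rendered in UTC rather than the session time zone:

```java
import java.time.Instant;

public class TimestampCastDemo {
    public static void main(String[] args) {
        long v = 1204135216L;
        // Floating-point path: value interpreted as SECONDS since the epoch
        System.out.println(Instant.ofEpochSecond(v)); // 2008-02-27T18:00:16Z
        // Integral path: the same value interpreted as MILLISECONDS since the epoch
        System.out.println(Instant.ofEpochMilli(v));  // 1970-01-14T22:28:55.216Z
    }
}
```

The same numeric input thus lands 38 years apart depending solely on which unit the conversion path assumes.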
[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava
[ https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176126#comment-17176126 ] Zoltan Haindrich commented on HIVE-22126: - [~zhengchenyu]: yes, that's an option - however I've not seen this in hive-3 before - but for hive-4 the same was done (HIVE-23593) you may already know it - the issue behind this is that a shaded calcite may pull in thru reflection classes from the "non-shaded" version and that will wreak some havoc. There was an attempt to make a better fix than that, but it's not yet ready (HIVE-23772) > hive-exec packaging should shade guava > -- > > Key: HIVE-22126 > URL: https://issues.apache.org/jira/browse/HIVE-22126 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Eugene Chung >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, > HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, > HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, > HIVE-22126.09.patch, HIVE-22126.09.patch > > > The ql/pom.xml includes the complete guava library into hive-exec.jar > https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes > problems for downstream clients of hive which have hive-exec.jar in their > classpath since they are pinned to the same guava version as that of hive. > We should shade guava classes so that other components which depend on > hive-exec can independently use a different version of guava as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
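Shading Guava as described above typically means relocating its packages inside the hive-exec jar via the maven-shade-plugin, roughly like this. This is an illustrative fragment; the relocated package prefix is an assumption and Hive's actual pom may differ.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrites com.google.common.* bytecode references so downstream
               users can bring their own Guava version without conflicts -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

As the comment notes, relocation alone is not airtight: code that loads classes reflectively (e.g. a shaded Calcite) can still reach for the non-relocated class names.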
[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469561 ] ASF GitHub Bot logged work on HIVE-23922: - Author: ASF GitHub Bot Created on: 12/Aug/20 07:39 Start Date: 12/Aug/20 07:39 Worklog Time Spent: 10m Work Description: dh20 commented on pull request #1307: URL: https://github.com/apache/hive/pull/1307#issuecomment-672693891 @sunchao thanks very much! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469561) Time Spent: 50m (was: 40m) > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
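The simplification this issue describes, a message builder that only needs the base message and the candidate method signatures, can be sketched as below. This mirrors the idea only; it is not Hive's actual `UDFArgumentException.getMessage` implementation, and the output format is invented for the example.

```java
import java.util.List;

// Sketch: build an error message from just the two parameters that are
// actually used (message and candidate method signatures).
public class ArgMessageDemo {
    static String getMessage(String message, List<String> methods) {
        StringBuilder sb = new StringBuilder(message);
        if (!methods.isEmpty()) {
            sb.append(". Possible choices: ");
            for (String m : methods) {
                sb.append("_FUNC_(").append(m).append(")  ");
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(getMessage("No matching method for int", List.of("int", "string")));
    }
}
```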
[jira] [Updated] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23993: --- Status: In Progress (was: Patch Available) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, Retry Logic for Replication.pdf > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aasha Medhi updated HIVE-23993: --- Attachment: HIVE-23993.04.patch Status: Patch Available (was: In Progress) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, HIVE-23993.04.patch, Retry Logic for Replication.pdf > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469546=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469546 ] ASF GitHub Bot logged work on HIVE-23922: - Author: ASF GitHub Bot Created on: 12/Aug/20 07:13 Start Date: 12/Aug/20 07:13 Worklog Time Spent: 10m Work Description: dh20 commented on pull request #1307: URL: https://github.com/apache/hive/pull/1307#issuecomment-672672560 > can you remove `typeNames` as well in the method body? it is also not used. > @dh20 looks good - can you remove `typeNames` as well in the method body? it is also not used. Yes, my colleagues have submitted this question to pr [HIVE-23996] This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469546) Time Spent: 0.5h (was: 20m) > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469548=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469548 ] ASF GitHub Bot logged work on HIVE-23922: - Author: ASF GitHub Bot Created on: 12/Aug/20 07:13 Start Date: 12/Aug/20 07:13 Worklog Time Spent: 10m Work Description: dh20 edited a comment on pull request #1307: URL: https://github.com/apache/hive/pull/1307#issuecomment-672672560 > can you remove `typeNames` as well in the method body? it is also not used. > @dh20 looks good - can you remove `typeNames` as well in the method body? it is also not used. @sunchao Yes, my colleagues have submitted this question to pr [HIVE-23996] This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469548) Time Spent: 40m (was: 0.5h) > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469538 ] ASF GitHub Bot logged work on HIVE-23922: - Author: ASF GitHub Bot Created on: 12/Aug/20 06:41 Start Date: 12/Aug/20 06:41 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1307: URL: https://github.com/apache/hive/pull/1307#issuecomment-672641648 @dh20 looks good - can you remove `typeNames` as well in the method body? it is also not used. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469538) Time Spent: 20m (was: 10m) > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?focusedWorklogId=469537=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469537 ] ASF GitHub Bot logged work on HIVE-23993: - Author: ASF GitHub Bot Created on: 12/Aug/20 06:39 Start Date: 12/Aug/20 06:39 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1367: URL: https://github.com/apache/hive/pull/1367#discussion_r469036313 ## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java ## @@ -418,6 +420,27 @@ private void analyzeReplLoad(ASTNode ast) throws SemanticException { } } + private Path getLatestDumpPath() throws IOException { Review comment: We can reuse the same code in ReplDumpTask This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469537) Time Spent: 1h 50m (was: 1h 40m) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, Retry Logic for Replication.pdf > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23922: -- Labels: pull-request-available (was: ) > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters
[ https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469534 ] ASF GitHub Bot logged work on HIVE-23922: - Author: ASF GitHub Bot Created on: 12/Aug/20 06:34 Start Date: 12/Aug/20 06:34 Worklog Time Spent: 10m Work Description: dh20 commented on pull request #1307: URL: https://github.com/apache/hive/pull/1307#issuecomment-672638988 > @dh20 can you fix the title and the JIRA number? [HIVE-23896](https://issues.apache.org/jira/browse/HIVE-23896) seems unrelated to this PR. @sunchao Sorry, I made a mistake in the Jira number. Now I have corrected it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469534) Remaining Estimate: 0h Time Spent: 10m > Improve code quality, UDFArgumentException.getMessage Method requires only > two parameters > - > > Key: HIVE-23922 > URL: https://issues.apache.org/jira/browse/HIVE-23922 > Project: Hive > Issue Type: Improvement >Reporter: hao >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > [UDFArgumentException.getMessage] This method only needs two parameters, > message and methods. The rest parameters are not used -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23993) Handle irrecoverable errors
[ https://issues.apache.org/jira/browse/HIVE-23993?focusedWorklogId=469532=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469532 ] ASF GitHub Bot logged work on HIVE-23993: - Author: ASF GitHub Bot Created on: 12/Aug/20 06:29 Start Date: 12/Aug/20 06:29 Worklog Time Spent: 10m Work Description: pkumarsinha commented on a change in pull request #1367: URL: https://github.com/apache/hive/pull/1367#discussion_r469032996 ## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java ## @@ -366,4 +371,26 @@ public static boolean includeAcidTableInDump(HiveConf conf) { public static boolean tableIncludedInReplScope(ReplScope replScope, String tableName) { return ((replScope == null) || replScope.tableIncludedInReplScope(tableName)); } + + public static boolean failedWithNonRecoverableError(Path dumpRoot, HiveConf conf) throws SemanticException { +if (dumpRoot == null) { Review comment: Is this also applicable during load? I mean, can the dumpRoot here be null in load case as well? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469532) Time Spent: 1h 40m (was: 1.5h) > Handle irrecoverable errors > --- > > Key: HIVE-23993 > URL: https://issues.apache.org/jira/browse/HIVE-23993 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, > HIVE-23993.03.patch, Retry Logic for Replication.pdf > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23998) Upgrave Guava to 27 for Hive 2.3
[ https://issues.apache.org/jira/browse/HIVE-23998?focusedWorklogId=469530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469530 ] ASF GitHub Bot logged work on HIVE-23998: - Author: ASF GitHub Bot Created on: 12/Aug/20 06:23 Start Date: 12/Aug/20 06:23 Worklog Time Spent: 10m Work Description: viirya commented on pull request #1395: URL: https://github.com/apache/hive/pull/1395#issuecomment-672634649 @sunchao Yeah, seems the tests was triggered. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469530) Time Spent: 4h 50m (was: 4h 40m) > Upgrave Guava to 27 for Hive 2.3 > > > Key: HIVE-23998 > URL: https://issues.apache.org/jira/browse/HIVE-23998 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.3.7 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23998.01.branch-2.3.patch > > Time Spent: 4h 50m > Remaining Estimate: 0h > > Try to upgrade Guava to 27.0-jre for Hive 2.3 branch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23998) Upgrave Guava to 27 for Hive 2.3
[ https://issues.apache.org/jira/browse/HIVE-23998?focusedWorklogId=469529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469529 ] ASF GitHub Bot logged work on HIVE-23998: - Author: ASF GitHub Bot Created on: 12/Aug/20 06:22 Start Date: 12/Aug/20 06:22 Worklog Time Spent: 10m Work Description: viirya commented on pull request #1395: URL: https://github.com/apache/hive/pull/1395#issuecomment-672634424 cc @sunchao This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 469529) Time Spent: 4h 40m (was: 4.5h) > Upgrave Guava to 27 for Hive 2.3 > > > Key: HIVE-23998 > URL: https://issues.apache.org/jira/browse/HIVE-23998 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.3.7 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23998.01.branch-2.3.patch > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Try to upgrade Guava to 27.0-jre for Hive 2.3 branch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-23998) Upgrade Guava to 27 for Hive 2.3
[ https://issues.apache.org/jira/browse/HIVE-23998?focusedWorklogId=469527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469527 ] ASF GitHub Bot logged work on HIVE-23998: - Author: ASF GitHub Bot Created on: 12/Aug/20 06:22 Start Date: 12/Aug/20 06:22 Worklog Time Spent: 10m Work Description: viirya opened a new pull request #1395: URL: https://github.com/apache/hive/pull/1395 ### What changes were proposed in this pull request? This PR proposes to upgrade Guava to 27 in Hive branch-2. This is basically used to trigger tests for #1394. ### Why are the changes needed? While trying to upgrade Guava in Spark, we found the following error. A Guava method became package-private as of Guava 20, so there is an incompatibility with Guava versions newer than 19.0. ``` sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; from class org.apache.hadoop.hive.ql.exec.FetchOperator at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108) at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) ``` ### Does this PR introduce _any_ user-facing change? Yes. This upgrades Guava to 27. ### How was this patch tested? Built Hive locally. 
Issue Time Tracking --- Worklog Id: (was: 469527) Time Spent: 4.5h (was: 4h 20m) > Upgrade Guava to 27 for Hive 2.3 > > > Key: HIVE-23998 > URL: https://issues.apache.org/jira/browse/HIVE-23998 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.3.7 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23998.01.branch-2.3.patch > > Time Spent: 4.5h > Remaining Estimate: 0h > > Try to upgrade Guava to 27.0-jre for Hive 2.3 branch.
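The `IllegalAccessError` quoted in the pull request above comes from `com.google.common.collect.Iterators.emptyIterator()`, which Guava made package-private as of version 20. A minimal sketch of the caller-side remedy (the class and method names here are hypothetical, and this is not the exact patch applied to Hive) is to switch to the JDK's `java.util.Collections.emptyIterator()`, which has been public since Java 7:

```java
import java.util.Collections;
import java.util.Iterator;

public class EmptyIteratorFix {
    // Before (fails at runtime against Guava >= 20 with IllegalAccessError):
    //   Iterator<Object> it = com.google.common.collect.Iterators.emptyIterator();
    // After: the JDK ships an equivalent public factory method, so the call
    // no longer depends on any particular Guava version.
    static Iterator<Object> emptyRowIterator() {
        return Collections.emptyIterator();
    }

    public static void main(String[] args) {
        Iterator<Object> it = emptyRowIterator();
        System.out.println(it.hasNext()); // prints false: the iterator is empty
    }
}
```

Replacing the Guava call with the JDK equivalent removes the version coupling entirely, which is why it is attractive alongside (or instead of) shading Guava.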
[jira] [Work logged] (HIVE-23896) hiveserver2 not listening on any port, am I missing some configurations?
[ https://issues.apache.org/jira/browse/HIVE-23896?focusedWorklogId=469524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469524 ] ASF GitHub Bot logged work on HIVE-23896: - Author: ASF GitHub Bot Created on: 12/Aug/20 06:19 Start Date: 12/Aug/20 06:19 Worklog Time Spent: 10m Work Description: sunchao commented on pull request #1307: URL: https://github.com/apache/hive/pull/1307#issuecomment-672633192 @dh20 can you fix the title and the JIRA number? HIVE-23896 seems unrelated to this PR. Issue Time Tracking --- Worklog Id: (was: 469524) Time Spent: 1h 40m (was: 1.5h) > hiveserver2 not listening on any port, am I missing some configurations? > - > > Key: HIVE-23896 > URL: https://issues.apache.org/jira/browse/HIVE-23896 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.1.2 > Environment: hive: 3.1.2 > hadoop: 3.2.1, standalone, url: hdfs://namenode.hadoop.svc.cluster.local:9000 > {quote}$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp > $ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse > {quote} > Hadoop commands work in the hiveserver node (pod). > >Reporter: alanwake >Priority: Blocker > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > > > I tried deploying Hive 3.1.2 on k8s; it worked on version 2.3.2. > The metastore node and postgres node are OK, but it looks like hiveserver is missing some > important configuration properties? 
> {code:java} > {code} > > > > {code:java} > [root@master hive]# ./get.sh > NAME READY STATUS RESTARTS AGE IP > NODE NOMINATED NODE READINESS GATES > hive-7bd48747d4-5zjmh 1/1 Running 0 56s 10.244.3.110 > node03.51.local > metastore-66b58f9f76-6wsxj 1/1 Running 0 56s 10.244.3.109 > node03.51.local > postgres-57794b99b7-pqxwm 1/1 Running 0 56s 10.244.2.241 > node02.51.local NAME TYPE CLUSTER-IP > EXTERNAL-IP PORT(S) AGE SELECTOR > hive NodePort 10.108.40.17 > 10002:30626/TCP,1:31845/TCP 56s app=hive > metastore ClusterIP 10.106.159.220 9083/TCP >56s app=metastore > postgres ClusterIP 10.108.85.47 5432/TCP >56s app=postgres > {code} > > > {code:java} > [root@master hive]# kubectl logs hive-7bd48747d4-5zjmh -n=hive > Configuring core > - Setting hadoop.proxyuser.hue.hosts=* > - Setting fs.defaultFS=hdfs://namenode.hadoop.svc.cluster.local:9000 > - Setting hadoop.http.staticuser.user=root > - Setting hadoop.proxyuser.hue.groups=* > Configuring hdfs > - Setting dfs.namenode.datanode.registration.ip-hostname-check=false > - Setting dfs.webhdfs.enabled=true > - Setting dfs.permissions.enabled=false > Configuring yarn > - Setting yarn.timeline-service.enabled=true > - Setting yarn.resourcemanager.system-metrics-publisher.enabled=true > - Setting > yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore > - Setting > yarn.log.server.url=http://historyserver.hadoop.svc.cluster.local:8188/applicationhistory/logs/ > - Setting yarn.resourcemanager.fs.state-store.uri=/rmstate > - Setting yarn.timeline-service.generic-application-history.enabled=true > - Setting yarn.log-aggregation-enable=true > - Setting > yarn.resourcemanager.hostname=resourcemanager.hadoop.svc.cluster.local > - Setting > yarn.resourcemanager.resource.tracker.address=resourcemanager.hadoop.svc.cluster.local:8031 > - Setting > yarn.timeline-service.hostname=historyserver.hadoop.svc.cluster.local > - Setting > 
yarn.resourcemanager.scheduler.address=resourcemanager.hadoop.svc.cluster.local:8030 > - Setting > yarn.resourcemanager.address=resourcemanager.hadoop.svc.cluster.local:8032 > - Setting yarn.nodemanager.remote-app-log-dir=/app-logs > - Setting yarn.resourcemanager.recovery.enabled=true > Configuring httpfs > Configuring kms > Configuring mapred > Configuring hive > - Setting datanucleus.autoCreateSchema=false > - Setting javax.jdo.option.ConnectionPassword=hive > - Setting hive.metastore.uris=thrift://metastore:9083 > - Setting >
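For the "not listening on any port" symptom reported above, a first debugging step is to probe whether HiveServer2 has bound its default ports: 10000 for the Thrift service and 10002 for the Web UI (assuming `hive.server2.thrift.port` and `hive.server2.webui.port` were not overridden). This is a generic connectivity sketch with hypothetical class and method names, not part of the original report:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Attempt a TCP connection to host:port with a short timeout;
    // returns true only if something is accepting connections there.
    static boolean isListening(String host, int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 500);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // HiveServer2 defaults; run this inside the hive pod.
        System.out.println("thrift (10000): " + isListening("localhost", 10000));
        System.out.println("webui  (10002): " + isListening("localhost", 10002));
    }
}
```

If both probes fail while the process is up, the HiveServer2 logs (rather than the entrypoint's "Configuring ..." output) are the place to look for a startup failure.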
[jira] [Updated] (HIVE-23995) Don't set location for managed tables in case of replication
[ https://issues.apache.org/jira/browse/HIVE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anishek Agarwal updated HIVE-23995: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to master. Thanks for the patch [~aasha] and the review [~pkumarsinha]. > Don't set location for managed tables in case of replication > > > Key: HIVE-23995 > URL: https://issues.apache.org/jira/browse/HIVE-23995 > Project: Hive > Issue Type: Task >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23995.01.patch, HIVE-23995.02.patch, > HIVE-23995.03.patch, HIVE-23995.04.patch, HIVE-23995.05.patch, > HIVE-23995.06.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Managed table location should not be set > Migration code of replication should be removed > Add logging to all ack files > Set hive.repl.data.copy.lazy to true
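The last item in the issue's task list, enabling `hive.repl.data.copy.lazy`, can be sketched as a session-level setting (assuming the property is settable per session; it can equally be placed in hive-site.xml):

```sql
-- Hypothetical usage sketch: enable lazy data copy for replication,
-- as listed in the HIVE-23995 task summary above.
SET hive.repl.data.copy.lazy=true;
```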