[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24036:
--
Labels: pull-request-available  (was: )

> Kryo Exception while serializing plan for getSplits UDF call
> 
>
> Key: HIVE-24036
> URL: https://issues.apache.org/jira/browse/HIVE-24036
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.llap.LlapOutputFormat
> Serialization trace:
> outputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.PTFOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
> at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
> at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
> {code}
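One common trigger for this kind of "Unable to create serializer ... FieldSerializer" failure is a class in the serialized object graph that cannot be analyzed or instantiated reflectively, for example because it lacks an accessible no-arg constructor. A minimal sketch of that constraint in plain Java; the `NeedsArgs` class is hypothetical and Kryo itself is not used:

```java
import java.lang.reflect.Constructor;

public class NoArgCtorCheck {
    // Hypothetical stand-in for a class that a reflective field serializer
    // cannot instantiate (its only constructor takes arguments).
    static class NeedsArgs {
        final String name;
        NeedsArgs(String name) { this.name = name; }
    }

    static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            // Throws NoSuchMethodException when no zero-arg constructor exists.
            Constructor<?> c = cls.getDeclaredConstructor();
            return c != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(StringBuilder.class)); // has one
        System.out.println(hasNoArgConstructor(NeedsArgs.class));     // does not
    }
}
```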



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call

2020-08-12 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24036:
--
Status: Patch Available  (was: Open)






[jira] [Work logged] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24036?focusedWorklogId=470074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-470074
 ]

ASF GitHub Bot logged work on HIVE-24036:
-

Author: ASF GitHub Bot
Created on: 13/Aug/20 04:41
Start Date: 13/Aug/20 04:41
Worklog Time Spent: 10m 
  Work Description: nareshpr opened a new pull request #1399:
URL: https://github.com/apache/hive/pull/1399


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 470074)
Remaining Estimate: 0h
Time Spent: 10m






[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call

2020-08-12 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24036:
--
Description: 
{code:java}
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: org.apache.hadoop.hive.llap.LlapOutputFormat
Serialization trace:
outputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
childOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.PTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
at org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
{code}

  was:
{code:java}
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: org.apache.hadoop.hive.llap.LlapOutputFormat
Serialization trace:outputFileFormatClass 
(org.apache.hadoop.hive.ql.plan.TableDesc)tableInfo 
(org.apache.hadoop.hive.ql.plan.FileSinkDesc)conf 
(org.apache.hadoop.hive.ql.exec.FileSinkOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.UnionOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.MapJoinOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.PTFOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)    at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
  at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
  at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
 
{code}







[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call

2020-08-12 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24036:
--
Description: 
{code:java}
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: org.apache.hadoop.hive.llap.LlapOutputFormat
Serialization trace:outputFileFormatClass 
(org.apache.hadoop.hive.ql.plan.TableDesc)tableInfo 
(org.apache.hadoop.hive.ql.plan.FileSinkDesc)conf 
(org.apache.hadoop.hive.ql.exec.FileSinkOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.UnionOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.MapJoinOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.PTFOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)    at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
  at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
  at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
 
{code}

  was:
{code:java}
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: org.apache.hadoop.hive.llap.LlapOutputFormatCaused by: 
org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: org.apache.hadoop.hive.llap.LlapOutputFormatSerialization 
trace:outputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)tableInfo 
(org.apache.hadoop.hive.ql.plan.FileSinkDesc)conf 
(org.apache.hadoop.hive.ql.exec.FileSinkOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.UnionOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.MapJoinOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.PTFOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)    at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
  at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
  at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
 
{code}







[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call

2020-08-12 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-24036:
--
Description: 
{code:java}
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: org.apache.hadoop.hive.llap.LlapOutputFormatCaused by: 
org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: org.apache.hadoop.hive.llap.LlapOutputFormatSerialization 
trace:outputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)tableInfo 
(org.apache.hadoop.hive.ql.plan.FileSinkDesc)conf 
(org.apache.hadoop.hive.ql.exec.FileSinkOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.UnionOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.MapJoinOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.PTFOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)    at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
  at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
  at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
 
{code}

  was:
{code:java}
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: org.apache.hadoop.hive.llap.LlapOutputFormatCaused by: 
org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: org.apache.hadoop.hive.llap.LlapOutputFormatSerialization 
trace:outputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)tableInfo 
(org.apache.hadoop.hive.ql.plan.FileSinkDesc)conf 
(org.apache.hadoop.hive.ql.exec.FileSinkOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.UnionOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.MapJoinOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.PTFOperator)childOperators 
(org.apache.hadoop.hive.ql.exec.SelectOperator)    at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
  at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
  at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
 {code}



[jira] [Assigned] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call

2020-08-12 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R reassigned HIVE-24036:
-

Assignee: Naresh P R






[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava

2020-08-12 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176752#comment-17176752
 ] 

zhengchenyu commented on HIVE-22126:


[~euigeun_chung] I found another problem. In the deriveRowType function, the 
change in this patch results in an infinite loop and eventually an OOM: the 
variable 'name' is never changed, so the loop never exits.

> hive-exec packaging should shade guava
> --
>
> Key: HIVE-22126
> URL: https://issues.apache.org/jira/browse/HIVE-22126
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, 
> HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, 
> HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch
>
>
> The ql/pom.xml includes the complete guava library in hive-exec.jar 
> (https://github.com/apache/hive/blob/master/ql/pom.xml#L990). This causes 
> problems for downstream clients of Hive which have hive-exec.jar on their 
> classpath, since they are pinned to the same guava version as Hive. 
> We should shade the guava classes so that other components which depend on 
> hive-exec can independently use a different version of guava as needed.
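The shading described here is typically done with the maven-shade-plugin's relocation feature, the same pattern Hive already applies to Kryo (visible in stack traces as org.apache.hive.com.esotericsoftware...). A sketch of the kind of pom.xml fragment involved; the shaded package name is illustrative, not copied from Hive's actual ql/pom.xml:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Rewrite guava classes (and references to them) into a
               Hive-private package so downstream users can bring their own
               guava version without conflicts. -->
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```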





[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava

2020-08-12 Thread zhengchenyu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176751#comment-17176751
 ] 

zhengchenyu commented on HIVE-22126:


[~kgyrtkirk] Yeah, I decompiled the jar and found duplicated Calcite classes; 
that is how I solved this problem.






[jira] [Work logged] (HIVE-21488) Update Apache Parquet 1.10.1

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21488?focusedWorklogId=470020&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-470020
 ]

ASF GitHub Bot logged work on HIVE-21488:
-

Author: ASF GitHub Bot
Created on: 13/Aug/20 00:38
Start Date: 13/Aug/20 00:38
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #576:
URL: https://github.com/apache/hive/pull/576#issuecomment-673180505


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 470020)
Time Spent: 1h  (was: 50m)

> Update Apache Parquet 1.10.1
> 
>
> Key: HIVE-21488
> URL: https://issues.apache.org/jira/browse/HIVE-21488
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.4
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: parquet, pull-request-available
> Attachments: HIVE-21488.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-21011) Upgrade MurmurHash 2.0 to 3.0 in vectorized map and reduce operators

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21011?focusedWorklogId=470019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-470019
 ]

ASF GitHub Bot logged work on HIVE-21011:
-

Author: ASF GitHub Bot
Created on: 13/Aug/20 00:38
Start Date: 13/Aug/20 00:38
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #503:
URL: https://github.com/apache/hive/pull/503#issuecomment-673180511


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 470019)
Time Spent: 0.5h  (was: 20m)

> Upgrade MurmurHash 2.0 to 3.0 in vectorized map and reduce operators
> 
>
> Key: HIVE-21011
> URL: https://issues.apache.org/jira/browse/HIVE-21011
> Project: Hive
>  Issue Type: Improvement
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21011.1.patch, HIVE-21011.2.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVE-20873 improved map join performance by using MurmurHash 3.0. However, 
> more operators can use it: VectorMapJoinCommonOperator and 
> VectorReduceSinkUniformHashOperator still use MurmurHash 2.0, so they can 
> be upgraded to MurmurHash 3.0.
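For reference, a self-contained sketch of the MurmurHash3 32-bit (x86) algorithm in plain Java, following the public-domain reference implementation; this is not Hive's internal Murmur3 class:

```java
public class Murmur3 {
    // MurmurHash3 x86 32-bit; constants from the reference implementation.
    public static int hash32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h = seed;
        int len = data.length;
        int i = 0;
        // Body: process 4-byte little-endian blocks.
        for (; i + 4 <= len; i += 4) {
            int k = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16) | ((data[i + 3] & 0xff) << 24);
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
        }
        // Tail: remaining 1-3 bytes (intentional switch fall-through).
        int k = 0;
        switch (len & 3) {
            case 3: k ^= (data[i + 2] & 0xff) << 16;
            case 2: k ^= (data[i + 1] & 0xff) << 8;
            case 1: k ^= (data[i] & 0xff);
                    k *= c1; k = Integer.rotateLeft(k, 15); k *= c2; h ^= k;
        }
        // Finalization mix: force avalanching of the last bits.
        h ^= len;
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        System.out.println(Murmur3.hash32("key".getBytes(), 0));
    }
}
```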





[jira] [Work logged] (HIVE-23958) HiveServer2 should support additional keystore/truststores types besides JKS

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23958?focusedWorklogId=470006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-470006
 ]

ASF GitHub Bot logged work on HIVE-23958:
-

Author: ASF GitHub Bot
Created on: 13/Aug/20 00:08
Start Date: 13/Aug/20 00:08
Worklog Time Spent: 10m 
  Work Description: risdenk commented on pull request #1342:
URL: https://github.com/apache/hive/pull/1342#issuecomment-673169549


   closed via 
https://github.com/apache/hive/commit/2b3c689baff857c18164a9610f2854583105734a





Issue Time Tracking
---

Worklog Id: (was: 470006)
Time Spent: 1h  (was: 50m)

> HiveServer2 should support additional keystore/truststores types besides JKS
> 
>
> Key: HIVE-23958
> URL: https://issues.apache.org/jira/browse/HIVE-23958
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently HiveServer2 (through Jetty and Thrift) only supports JKS (and 
> PKCS12, via JDK fallback) keystore/truststore types. There are additional 
> keystore/truststore types used by different applications, for example for 
> FIPS crypto algorithms. HS2 should support the default keystore type 
> specified for the JDK rather than always using JKS.
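The JDK already exposes the configured default (the `keystore.type` property in the `java.security` file), so honoring it instead of hard-coding "JKS" is straightforward. A minimal sketch, independent of HiveServer2's actual configuration code:

```java
import java.security.KeyStore;

public class DefaultKeystoreType {
    // Returns the JVM-wide default keystore type rather than a hard-coded
    // "JKS"; on modern JDKs this is typically "pkcs12", and a FIPS-configured
    // JDK can point it at a FIPS provider's type.
    static String defaultType() {
        return KeyStore.getDefaultType();
    }

    public static void main(String[] args) throws Exception {
        String type = defaultType();
        KeyStore ks = KeyStore.getInstance(type); // always resolvable
        System.out.println(type + " / " + ks.getType());
    }
}
```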





[jira] [Work logged] (HIVE-23958) HiveServer2 should support additional keystore/truststores types besides JKS

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23958?focusedWorklogId=470007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-470007
 ]

ASF GitHub Bot logged work on HIVE-23958:
-

Author: ASF GitHub Bot
Created on: 13/Aug/20 00:08
Start Date: 13/Aug/20 00:08
Worklog Time Spent: 10m 
  Work Description: risdenk closed pull request #1342:
URL: https://github.com/apache/hive/pull/1342


   





Issue Time Tracking
---

Worklog Id: (was: 470007)
Time Spent: 1h 10m  (was: 1h)

> HiveServer2 should support additional keystore/truststores types besides JKS
> 
>
> Key: HIVE-23958
> URL: https://issues.apache.org/jira/browse/HIVE-23958
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently HiveServer2 (through Jetty and Thrift) only supports JKS (and 
> PKCS12, via JDK fallback) keystore/truststore types. There are additional 
> keystore/truststore types used by different applications, e.g. for FIPS 
> crypto algorithms. HS2 should support the default keystore type specified for 
> the JDK and not always use JKS.
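The difference described above can be sketched with plain JDK APIs. The snippet below is a minimal illustration, not the actual HS2 patch: it contrasts hardcoding `JKS` with deferring to `KeyStore.getDefaultType()`, which returns the platform default (`pkcs12` on JDK 9+).

```java
import java.security.KeyStore;

public class KeystoreTypeDemo {
    public static void main(String[] args) throws Exception {
        // Hardcoded: only reads JKS files (plus PKCS12 via the JDK fallback).
        KeyStore jksOnly = KeyStore.getInstance("JKS");

        // Deferring to the JDK default picks up whatever the platform is
        // configured for, which is what the issue asks HS2 to do.
        KeyStore platformDefault = KeyStore.getInstance(KeyStore.getDefaultType());

        System.out.println(jksOnly.getType());
        System.out.println(platformDefault.getType());
    }
}
```

On JDK 9 and later the second call yields a PKCS12 keystore without any code change, which is why deferring to the default also covers FIPS-configured JDKs.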





[jira] [Assigned] (HIVE-24035) Add Jenkinsfile for branch-2.3

2020-08-12 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HIVE-24035:
---


> Add Jenkinsfile for branch-2.3
> --
>
> Key: HIVE-24035
> URL: https://issues.apache.org/jira/browse/HIVE-24035
> Project: Hive
>  Issue Type: Test
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> To enable precommit tests for github PR, we need to have a Jenkinsfile in the 
> repo. This is already done for master and branch-2. This adds the same for 
> branch-2.3





[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=469991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469991
 ]

ASF GitHub Bot logged work on HIVE-23980:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 23:30
Start Date: 12/Aug/20 23:30
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1397:
URL: https://github.com/apache/hive/pull/1397#issuecomment-673159411


   @viirya it probably means those tests were already failing previously. 
However I was not able to find a history for this. According to this 
[comment](https://issues.apache.org/jira/browse/HIVE-21790?focusedCommentId=17134282&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17134282),
 it seems many tests were failing after we enabled CI for branch-2.
   
   cc @belugabehr - wonder if you have any info on this. Thanks.





Issue Time Tracking
---

Worklog Id: (was: 469991)
Time Spent: 1.5h  (was: 1h 20m)

> Shade guava from existing Hive versions
> ---
>
> Key: HIVE-23980
> URL: https://issues.apache.org/jira/browse/HIVE-23980
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23980.01.branch-2.3.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502.
> Running test hits an error:
> {code}
> sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: 
> tried to access method 
> com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
>  from class org.apache.hadoop.hive.ql.exec.FetchOperator
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> {code}
> I know that hive-exec doesn't shade Guava until HIVE-22126 but that work 
> targets 4.0.0. I'm wondering if there is a solution for current Hive 
> versions, e.g. Hive 2.3.7? Any ideas?
> Thanks.
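For context on the `IllegalAccessError` above: bytecode compiled against Guava 19 calls `com.google.common.collect.Iterators.emptyIterator()`, which became package-private in Guava 20, so the JVM rejects the call at link time. The sketch below shows the stable JDK-native equivalent; it is an illustration of the incompatibility, not the Hive fix itself (the patch shades Guava instead of changing call sites).

```java
import java.util.Collections;
import java.util.Iterator;

public class EmptyIteratorDemo {
    public static void main(String[] args) {
        // Guava 19: Iterators.emptyIterator() was public.
        // Guava 20+: the method is package-private, so old bytecode calling it
        // fails with IllegalAccessError at runtime. The JDK equivalent below
        // has been public since Java 7 and never breaks this way.
        Iterator<String> it = Collections.emptyIterator();
        System.out.println(it.hasNext()); // prints "false"
    }
}
```

Shading (relocating `com.google.common` to a private package inside hive-exec) solves the same problem without touching call sites, because the relocated classes travel with the jar and cannot clash with the application's Guava version.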





[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=469976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469976
 ]

ASF GitHub Bot logged work on HIVE-23980:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 22:41
Start Date: 12/Aug/20 22:41
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #1397:
URL: https://github.com/apache/hive/pull/1397#issuecomment-673145461


   @sunchao But I saw some test failures?
   
   `There are 0 new tests failing, 636 existing failing and 231 skipped.`
   
   What does "existing failing" mean?





Issue Time Tracking
---

Worklog Id: (was: 469976)
Time Spent: 1h 20m  (was: 1h 10m)

> Shade guava from existing Hive versions
> ---
>
> Key: HIVE-23980
> URL: https://issues.apache.org/jira/browse/HIVE-23980
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23980.01.branch-2.3.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502.
> Running test hits an error:
> {code}
> sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: 
> tried to access method 
> com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
>  from class org.apache.hadoop.hive.ql.exec.FetchOperator
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> {code}
> I know that hive-exec doesn't shade Guava until HIVE-22126 but that work 
> targets 4.0.0. I'm wondering if there is a solution for current Hive 
> versions, e.g. Hive 2.3.7? Any ideas?
> Thanks.





[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=469959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469959
 ]

ASF GitHub Bot logged work on HIVE-23980:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 21:55
Start Date: 12/Aug/20 21:55
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1397:
URL: https://github.com/apache/hive/pull/1397#issuecomment-673129939


   @viirya it seems there are no new tests failing, which is good.





Issue Time Tracking
---

Worklog Id: (was: 469959)
Time Spent: 1h 10m  (was: 1h)

> Shade guava from existing Hive versions
> ---
>
> Key: HIVE-23980
> URL: https://issues.apache.org/jira/browse/HIVE-23980
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23980.01.branch-2.3.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502.
> Running test hits an error:
> {code}
> sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: 
> tried to access method 
> com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
>  from class org.apache.hadoop.hive.ql.exec.FetchOperator
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> {code}
> I know that hive-exec doesn't shade Guava until HIVE-22126 but that work 
> targets 4.0.0. I'm wondering if there is a solution for current Hive 
> versions, e.g. Hive 2.3.7? Any ideas?
> Thanks.





[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=469941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469941
 ]

ASF GitHub Bot logged work on HIVE-23980:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 12/Aug/20 21:09
Start Date: 12/Aug/20 21:09
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #1397:
URL: https://github.com/apache/hive/pull/1397#issuecomment-673112654


   cc @sunchao





Issue Time Tracking
---

Worklog Id: (was: 469941)
Time Spent: 1h  (was: 50m)

> Shade guava from existing Hive versions
> ---
>
> Key: HIVE-23980
> URL: https://issues.apache.org/jira/browse/HIVE-23980
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23980.01.branch-2.3.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502.
> Running test hits an error:
> {code}
> sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: 
> tried to access method 
> com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
>  from class org.apache.hadoop.hive.ql.exec.FetchOperator
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> {code}
> I know that hive-exec doesn't shade Guava until HIVE-22126 but that work 
> targets 4.0.0. I'm wondering if there is a solution for current Hive 
> versions, e.g. Hive 2.3.7? Any ideas?
> Thanks.





[jira] [Work logged] (HIVE-23980) Shade guava from existing Hive versions

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23980?focusedWorklogId=469940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469940
 ]

ASF GitHub Bot logged work on HIVE-23980:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 12/Aug/20 21:09
Start Date: 12/Aug/20 21:09
Worklog Time Spent: 10m 
  Work Description: viirya opened a new pull request #1397:
URL: https://github.com/apache/hive/pull/1397


   
   
   ### What changes were proposed in this pull request?
   
   
   This PR proposes to shade Guava from hive-exec in Hive branch-2. This is 
basically for triggering tests for #1356.
   
   ### Why are the changes needed?
   
   
   While trying to upgrade Guava in Spark, I found the following error. A Guava 
method became package-private in Guava version 20, so there is an 
incompatibility with Guava versions > 19.0.
   
   ```
   sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: 
java.lang.IllegalAccessError: tried to access method 
com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
 from class org.apache.hadoop.hive.ql.exec.FetchOperator
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
at 
org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
   ```
   
   This is a problem for downstream clients. The Hive project noticed that 
problem too in [HIVE-22126](https://issues.apache.org/jira/browse/HIVE-22126), 
but that work only targets 4.0.0. It would be nice if we could also shade Guava 
in current Hive versions, e.g. the Hive 2.3 line.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes. Guava will be shaded from hive-exec.
   
   ### How was this patch tested?
   
   
   Built Hive locally and checked jar content.





Issue Time Tracking
---

Worklog Id: (was: 469940)
Time Spent: 50m  (was: 40m)

> Shade guava from existing Hive versions
> ---
>
> Key: HIVE-23980
> URL: https://issues.apache.org/jira/browse/HIVE-23980
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23980.01.branch-2.3.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I'm trying to upgrade Guava version in Spark. The JIRA ticket is SPARK-32502.
> Running test hits an error:
> {code}
> sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: 
> tried to access method 
> com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
>  from class org.apache.hadoop.hive.ql.exec.FetchOperator
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> {code}
> I know that hive-exec doesn't shade Guava until HIVE-22126 but that work 
> targets 4.0.0. I'm wondering if there is a solution for current Hive 
> versions, e.g. Hive 2.3.7? Any ideas?
> Thanks.





[jira] [Resolved] (HIVE-24025) Add getAggrStatsFor to HS2 cache

2020-08-12 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-24025.

Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master, thanks for your contribution [~soumyakanti.das]!

> Add getAggrStatsFor to HS2 cache
> 
>
> Key: HIVE-24025
> URL: https://issues.apache.org/jira/browse/HIVE-24025
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> getAggrColStats API takes a long time to run in HMS. Adding it to the HS2 
> local cache can reduce the query compilation time significantly.
> Local cache was introduced in https://issues.apache.org/jira/browse/HIVE-23949
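The caching idea behind the issue can be sketched with a plain `ConcurrentHashMap`. All names below (`AggrStatsCacheSketch`, the string key format, the fetch method) are hypothetical stand-ins for illustration, not the HIVE-23949 cache implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AggrStatsCacheSketch {
    // Hypothetical key: "db.table:col1,col2". The real HS2 cache keys differ.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Stand-in for the slow getAggrColStats round trip to the metastore.
    private String fetchFromMetastore(String key) {
        return "stats-for-" + key;
    }

    public String getAggrStatsFor(String key) {
        // computeIfAbsent performs the expensive fetch at most once per key;
        // later lookups during compilation are served from memory.
        return cache.computeIfAbsent(key, this::fetchFromMetastore);
    }

    public static void main(String[] args) {
        AggrStatsCacheSketch c = new AggrStatsCacheSketch();
        System.out.println(c.getAggrStatsFor("db.tbl:c1")); // fetches
        System.out.println(c.getAggrStatsFor("db.tbl:c1")); // cached
    }
}
```

A real per-query cache also needs invalidation when table stats change; the point here is only why memoizing the call shortens compilation time.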





[jira] [Work logged] (HIVE-24025) Add getAggrStatsFor to HS2 cache

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24025?focusedWorklogId=469937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469937
 ]

ASF GitHub Bot logged work on HIVE-24025:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 12/Aug/20 21:04
Start Date: 12/Aug/20 21:04
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1390:
URL: https://github.com/apache/hive/pull/1390


   





Issue Time Tracking
---

Worklog Id: (was: 469937)
Remaining Estimate: 0h
Time Spent: 10m

> Add getAggrStatsFor to HS2 cache
> 
>
> Key: HIVE-24025
> URL: https://issues.apache.org/jira/browse/HIVE-24025
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> getAggrColStats API takes a long time to run in HMS. Adding it to the HS2 
> local cache can reduce the query compilation time significantly.
> Local cache was introduced in https://issues.apache.org/jira/browse/HIVE-23949





[jira] [Updated] (HIVE-24025) Add getAggrStatsFor to HS2 cache

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24025:
--
Labels: pull-request-available  (was: )

> Add getAggrStatsFor to HS2 cache
> 
>
> Key: HIVE-24025
> URL: https://issues.apache.org/jira/browse/HIVE-24025
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> getAggrColStats API takes a long time to run in HMS. Adding it to the HS2 
> local cache can reduce the query compilation time significantly.
> Local cache was introduced in https://issues.apache.org/jira/browse/HIVE-23949





[jira] [Commented] (HIVE-23927) Cast to Timestamp generates different output for Integer & Float values

2020-08-12 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176543#comment-17176543
 ] 

Jesus Camacho Rodriguez commented on HIVE-23927:


[~abstractdog], maybe not in the context of ORC-554 then. My point was that the 
same issue was faced in ORC, since the logic came from Hive, and some sensible 
defaults were chosen there to make the conversion uniform... We could use the 
same defaults.

> Cast to Timestamp generates different output for Integer & Float values 
> 
>
> Key: HIVE-23927
> URL: https://issues.apache.org/jira/browse/HIVE-23927
> Project: Hive
>  Issue Type: Bug
>Reporter: Renukaprasad C
>Priority: Major
>
> A Double input value is treated as SECONDS and converted into millis 
> internally, whereas an Integer value is treated as MILLIS, producing 
> different output.
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(Object,
>  PrimitiveObjectInspector, boolean) handles integral and decimal values 
> differently. This causes the issue.
> 0: jdbc:hive2://localhost:1> select cast(1.204135216E9 as timestamp) 
> Double2TimeStamp, cast(1204135216 as timestamp) Int2TimeStamp from abc 
> tablesample(1 rows);
> OK
> INFO  : Compiling 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): 
> select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as 
> timestamp) Int2TimeStamp from abc tablesample(1 rows)
> INFO  : Concurrency mode is disabled, not creating a lock manager
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:double2timestamp, type:timestamp, 
> comment:null), FieldSchema(name:int2timestamp, type:timestamp, 
> comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); 
> Time taken: 0.175 seconds
> INFO  : Concurrency mode is disabled, not creating a lock manager
> INFO  : Executing 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): 
> select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as 
> timestamp) Int2TimeStamp from abc tablesample(1 rows)
> INFO  : Completed executing 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); 
> Time taken: 0.001 seconds
> INFO  : OK
> INFO  : Concurrency mode is disabled, not creating a lock manager
> +------------------------+--------------------------+
> |    double2timestamp    |      int2timestamp       |
> +------------------------+--------------------------+
> | 2008-02-27 18:00:16.0  | 1970-01-14 22:28:55.216  |
> +------------------------+--------------------------+
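The divergence quoted above comes down to the epoch unit: the double path multiplies by 1000 (seconds to millis) before building the timestamp, while the integral path uses the value as millis directly. A standalone illustration with `java.time` (not Hive code) reproduces both readings of the same number:

```java
import java.time.Instant;

public class CastUnitsDemo {
    public static void main(String[] args) {
        long n = 1204135216L;

        // Integral path: value interpreted as milliseconds since the epoch.
        Instant asMillis = Instant.ofEpochMilli(n);

        // Double path: value interpreted as seconds since the epoch.
        Instant asSeconds = Instant.ofEpochSecond(n);

        System.out.println(asMillis);  // 1970-01-14T22:28:55.216Z
        System.out.println(asSeconds); // 2008-02-27T18:00:16Z
    }
}
```

The two instants match the beeline output above (modulo the session time zone), confirming that the cast results differ by exactly a factor of 1000 in the epoch interpretation.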





[jira] [Commented] (HIVE-24022) Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer

2020-08-12 Thread Sam An (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176513#comment-17176513
 ] 

Sam An commented on HIVE-24022:
---

I am trying to use a ThreadLocal to skip creating a HiveConf each and 
every time createHiveMetastoreAuthorizer gets called. Currently there are 2 
unit test failures. The root cause is that in the TestHiveMetastoreAuthorizer 
unit test setup, the config overlay was not visible inside the ThreadLocal 
HiveConf, so it is not using DummyHiveAuthorizerFactory as the authorization 
manager but instead looks for SqlStdAuthorizationFactoryForTest, and it fails 
to find that class, causing the failure.


[https://github.com/apache/hive/blob/2b3c689baff857c18164a9610f2854583105734a/ql/src/test/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/TestHiveMetaStoreAuthorizer.java#L83]

 

So I see two problems: 
1. DummyHiveAuthorizerFactory is not passed into the ThreadLocal<> for the unit 
test. 
2. The classloader is not able to find the SQLStdHiveAuthorizerFactoryForTest 
class on the classpath.

Without my change, it was using DummyHiveAuthorizerFactory for the unit test. 
So I should make the overlay happen first, then see if it can find the 
DummyHiveAuthorizerFactory class.
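The ThreadLocal approach described above can be sketched as follows. `Properties` stands in for HiveConf (a hypothetical simplification), and the comment at the end marks the exact pitfall behind the unit-test failures: the supplier runs once per thread, so overlays applied after initialization are not visible.

```java
import java.util.Properties;

public class ThreadLocalConfSketch {
    // Stand-in for HiveConf; the real object is expensive to construct,
    // which is why HIVE-24022 wants to avoid building it per call.
    static Properties buildConf() {
        Properties p = new Properties();
        p.setProperty("authorization.manager", "DefaultFactory");
        return p;
    }

    // Each thread builds its conf lazily on first get() and reuses it after.
    static final ThreadLocal<Properties> CONF =
        ThreadLocal.withInitial(ThreadLocalConfSketch::buildConf);

    public static void main(String[] args) {
        Properties first = CONF.get();
        Properties second = CONF.get();
        // Same instance on the same thread: construction cost is paid once.
        System.out.println(first == second); // prints "true"

        // Pitfall: a config overlay applied to some freshly built conf
        // elsewhere (as the test setup does) is invisible here, because this
        // thread's copy was already initialized without it.
    }
}
```

This is why the overlay has to happen before the ThreadLocal is first read on the test thread, or the cached copy must be reset/rebuilt when the overlay changes.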

> Optimise HiveMetaStoreAuthorizer.createHiveMetaStoreAuthorizer
> --
>
> Key: HIVE-24022
> URL: https://issues.apache.org/jira/browse/HIVE-24022
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Minor
>  Labels: performance, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For a table with 3000+ partitions, analyze table takes a lot longer time as 
> HiveMetaStoreAuthorizer tries to create HiveConf for every partition request.
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L319]
>  
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/metastore/HiveMetaStoreAuthorizer.java#L447]





[jira] [Resolved] (HIVE-24030) Upgrade ORC to 1.5.10

2020-08-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved HIVE-24030.
--
Fix Version/s: 4.0.0
 Assignee: Dongjoon Hyun
   Resolution: Fixed

> Upgrade ORC to 1.5.10
> -
>
> Key: HIVE-24030
> URL: https://issues.apache.org/jira/browse/HIVE-24030
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176473#comment-17176473
 ] 

Panagiotis Garefalakis edited comment on HIVE-23922 at 8/12/20, 4:46 PM:
-----------------------------------------

Thanks for contributing to the project [~hao.duan]!


was (Author: pgaref):
Thanks for your contribution [~hao.duan]!

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Assignee: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods. The rest parameters are not used





[jira] [Commented] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176473#comment-17176473
 ] 

Panagiotis Garefalakis commented on HIVE-23922:
---

Thanks for your contribution [~hao.duan]!

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Assignee: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods. The rest parameters are not used





[jira] [Resolved] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis resolved HIVE-23922.
---
Resolution: Fixed

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Assignee: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods. The rest parameters are not used





[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469799
 ]

ASF GitHub Bot logged work on HIVE-23922:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 12/Aug/20 16:44
Start Date: 12/Aug/20 16:44
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1307:
URL: https://github.com/apache/hive/pull/1307#discussion_r469396911



##
File path: udf/src/java/org/apache/hadoop/hive/ql/exec/UDFArgumentException.java
##
@@ -65,16 +65,13 @@ public UDFArgumentException(String message,
   Class<?> funcClass,
   List<TypeInfo> argTypeInfos,
   List<Method> methods) {
-super(getMessage(message, funcClass, argTypeInfos, methods));
+super(getMessage(message, methods));
 this.funcClass = funcClass;
 this.argTypeInfos = argTypeInfos;
 this.methods = methods;
   }
-  
-  private static String getMessage(String message, 
-  Class<?> funcClass,
-  List<TypeInfo> argTypeInfos,
-  List<Method> methods) {
+  //HIVE-23896 remove unnecessary parameter

Review comment:
   Could you please remove the comment and replace with a new line? 







Issue Time Tracking
---

Worklog Id: (was: 469799)
Time Spent: 1h 20m  (was: 1h 10m)

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Assignee: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods. The rest parameters are not used





[jira] [Assigned] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-23922:
-----------------------------------------

Assignee: hao

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Assignee: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods. The rest parameters are not used





[jira] [Updated] (HIVE-23958) HiveServer2 should support additional keystore/truststores types besides JKS

2020-08-12 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-23958:
-
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Fix has been pushed to master. Thank you for the patch [~krisden]

> HiveServer2 should support additional keystore/truststores types besides JKS
> 
>
> Key: HIVE-23958
> URL: https://issues.apache.org/jira/browse/HIVE-23958
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently HiveServer2 (through Jetty and Thrift) only supports JKS (and 
> PKCS12, via JDK fallback) keystore/truststore types. There are additional 
> keystore/truststore types used by different applications, e.g. for FIPS 
> crypto algorithms. HS2 should support the default keystore type specified for 
> the JDK and not always use JKS.





[jira] [Commented] (HIVE-23927) Cast to Timestamp generates different output for Integer & Float values

2020-08-12 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176438#comment-17176438
 ] 

Panagiotis Garefalakis commented on HIVE-23927:
---

I guess the main issue here is 
*PrimitiveObjectInspectorUtils.getTimestamp(Object, PrimitiveObjectInspector, 
boolean)*.

For int: 
https://github.com/apache/hive/blob/6ceeea87a34f53add62fa6d0a332b06b8863c440/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritableV2.java#L531
with *intToTimestampInSeconds = false*: 
https://github.com/apache/hive/blob/1758c8c857f8a6dc4c9dc9c522de449f53e5e5cc/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils.java#L1181

While for double: 
https://github.com/apache/hive/blob/e6900fea9108b2dd00f0e4bf2a598f6fc9ba01cf/common/src/java/org/apache/hadoop/hive/common/type/TimestampUtils.java#L43

Not sure where the assumption that a double is in seconds comes from.
Maybe we should make this configurable as well, as we do in the *longToTimestamp* 
method.
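
The divergence can be reproduced outside Hive with a small sketch; `fromLongMillis` and `fromDoubleSeconds` are illustrative stand-ins for the two code paths, not Hive methods:

```java
import java.sql.Timestamp;

public class CastDivergenceDemo {
    // Stand-in for the integral path: with intToTimestampInSeconds = false,
    // the value is interpreted as epoch MILLISECONDS.
    static Timestamp fromLongMillis(long v) {
        return new Timestamp(v);
    }

    // Stand-in for the floating point path: the value is interpreted as
    // epoch SECONDS and scaled to millis internally.
    static Timestamp fromDoubleSeconds(double v) {
        return new Timestamp((long) (v * 1000d));
    }

    public static void main(String[] args) {
        // the same numeric value lands ~38 years apart once converted
        System.out.println(fromLongMillis(1204135216L).getTime());      // 1204135216
        System.out.println(fromDoubleSeconds(1.204135216E9).getTime()); // 1204135216000
    }
}
```

The rendered dates (2008-02-27 vs 1970-01-14 in the report below) follow directly from this factor-of-1000 difference.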

> Cast to Timestamp generates different output for Integer & Float values 
> 
>
> Key: HIVE-23927
> URL: https://issues.apache.org/jira/browse/HIVE-23927
> Project: Hive
>  Issue Type: Bug
>Reporter: Renukaprasad C
>Priority: Major
>
> A double is treated as SECONDS and converted into millis internally,
> whereas an integer value is treated as MILLIS, which produces different 
> output.
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(Object,
>  PrimitiveObjectInspector, boolean) handles integral and decimal values 
> differently, which causes the issue.
> 0: jdbc:hive2://localhost:1> select cast(1.204135216E9 as timestamp) 
> Double2TimeStamp, cast(1204135216 as timestamp) Int2TimeStamp from abc 
> tablesample(1 rows);
> OK
> INFO  : Compiling 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): 
> select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as 
> timestamp) Int2TimeStamp from abc tablesample(1 rows)
> INFO  : Concurrency mode is disabled, not creating a lock manager
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:double2timestamp, type:timestamp, 
> comment:null), FieldSchema(name:int2timestamp, type:timestamp, 
> comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); 
> Time taken: 0.175 seconds
> INFO  : Concurrency mode is disabled, not creating a lock manager
> INFO  : Executing 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): 
> select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as 
> timestamp) Int2TimeStamp from abc tablesample(1 rows)
> INFO  : Completed executing 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); 
> Time taken: 0.001 seconds
> INFO  : OK
> INFO  : Concurrency mode is disabled, not creating a lock manager
> +------------------------+--------------------------+
> |    double2timestamp    |      int2timestamp       |
> +------------------------+--------------------------+
> | 2008-02-27 18:00:16.0  | 1970-01-14 22:28:55.216  |
> +------------------------+--------------------------+



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24025) Add getAggrStatsFor to HS2 cache

2020-08-12 Thread Soumyakanti Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soumyakanti Das updated HIVE-24025:
---
Summary: Add getAggrStatsFor to HS2 cache  (was: Add getTable and 
getAggrStatsFor to HS2 cache)

> Add getAggrStatsFor to HS2 cache
> 
>
> Key: HIVE-24025
> URL: https://issues.apache.org/jira/browse/HIVE-24025
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Minor
>
> getAggrColStats API takes a long time to run in HMS. Adding it to the HS2 
> local cache can reduce the query compilation time significantly.
> Local cache was introduced in https://issues.apache.org/jira/browse/HIVE-23949
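
A minimal sketch of the idea, assuming a simple memoizing map (the actual HIVE-23949 cache also handles eviction and invalidation, which are omitted here):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class AggrStatsCacheDemo {
    // Memoize an expensive metastore call per key so repeated compilations
    // of the same query skip the HMS round trip.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    Object getAggrStatsFor(String key, Function<String, Object> hmsCall) {
        // compute once per key; later lookups are served from the local cache
        return cache.computeIfAbsent(key, hmsCall);
    }

    public static void main(String[] args) {
        AggrStatsCacheDemo c = new AggrStatsCacheDemo();
        int[] hmsCalls = {0};
        Function<String, Object> hms = k -> { hmsCalls[0]++; return "stats:" + k; };
        c.getAggrStatsFor("db.tbl", hms);
        c.getAggrStatsFor("db.tbl", hms); // cache hit, no second HMS call
        System.out.println(hmsCalls[0]);  // 1
    }
}
```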



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-08-12 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176406#comment-17176406
 ] 

Jesus Camacho Rodriguez commented on HIVE-23965:


+1 on removing the old driver, since it fixes issues with the existing one. I do 
not think having the old one around adds much value, and updating all those q 
files will be a pain.

[~zabetak], [~kgyrtkirk], if this PR is ready to be merged, I think the removal 
can be done in a follow-up.

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom Hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469749=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469749
 ]

ASF GitHub Bot logged work on HIVE-23922:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 14:51
Start Date: 12/Aug/20 14:51
Worklog Time Spent: 10m 
  Work Description: sunchao merged pull request #1307:
URL: https://github.com/apache/hive/pull/1307



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469749)
Time Spent: 1h 10m  (was: 1h)

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and method; the remaining parameters are unused.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176378#comment-17176378
 ] 

Pravin Sinha commented on HIVE-23993:
-

+1

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, 
> HIVE-23993.06.patch, HIVE-23993.07.patch, HIVE-23993.08.patch, Retry Logic 
> for Replication.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23993:
---
Attachment: HIVE-23993.08.patch
Status: Patch Available  (was: In Progress)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, 
> HIVE-23993.06.patch, HIVE-23993.07.patch, HIVE-23993.08.patch, Retry Logic 
> for Replication.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23993:
---
Status: In Progress  (was: Patch Available)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, 
> HIVE-23993.06.patch, HIVE-23993.07.patch, HIVE-23993.08.patch, Retry Logic 
> for Replication.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469734=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469734
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 14:26
Start Date: 12/Aug/20 14:26
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469300187



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
   // Do not change the initial bytes which contain NumHashFunctions/NumBits!
   Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0);
 }
+
+public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn,
+    int batchSize, boolean selectedInUse, int[] selected, Configuration conf) {
+  // already set in previous iterations, no need to call initExecutor again
+  if (numThreads == 0) {
+    return false;
+  }
+  if (executor == null) {
+    initExecutor(conf, batchSize);
+    if (!isParallel) {
+      return false;
+    }
+  }
+
+  // split every bloom filter (represented by a part of a byte[]) across workers
+  for (int j = 0; j < batchSize; j++) {
+    if (!selectedInUse && inputColumn.noNulls) {
+      splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+          inputColumn.length[j]);
+    } else if (!selectedInUse) {
+      if (!inputColumn.isNull[j]) {
+        splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+            inputColumn.length[j]);
+      }
+    } else if (inputColumn.noNulls) {
+      int i = selected[j];
+      splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+          inputColumn.length[i]);
+    } else {
+      int i = selected[j];
+      if (!inputColumn.isNull[i]) {
+        splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+            inputColumn.length[i]);
+      }
+    }
+  }
+
+  return true;
+}
+
+private void initExecutor(Configuration conf, int batchSize) {
+  numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname,
+      HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal);
+  LOG.info("Number of threads used for bloom filter merge: {}", numThreads);
+
+  if (numThreads < 0) {
+    throw new RuntimeException(
+        "invalid number of threads for bloom filter merge: " + numThreads);
+  }
+  if (numThreads == 0) { // disable parallel feature
+    return; // this will leave isParallel=false
+  }
+  isParallel = true;
+  executor = Executors.newFixedThreadPool(numThreads);
+
+  workers = new BloomFilterMergeWorker[numThreads];
+  for (int f = 0; f < numThreads; f++) {
+    workers[f] = new BloomFilterMergeWorker(bfBytes, 0, bfBytes.length);
+  }
+
+  for (int f = 0; f < numThreads; f++) {
+    executor.submit(workers[f]);
+  }
+}
+
+public int getNumberOfWaitingMergeTasks() {
+  int size = 0;
+  for (BloomFilterMergeWorker w : workers) {
+    size += w.queue.size();
+  }
+  return size;
+}
+
+public int getNumberOfMergingWorkers() {
+  int working = 0;
+  for (BloomFilterMergeWorker w : workers) {
+    if (w.isMerging.get()) {
+      working += 1;
+    }
+  }
+  return working;
+}
+
+private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] workers, byte[] bytes,
+    int start, int length) {
+  if (bytes == null || length == 0) {
+    return;
+  }
+  /*
+   * This will split a byte[] across workers as below:
+   * let's say there are 10 workers for 7813 bytes, in this case
+   * length: 7813, elementPerBatch: 781
+   * bytes assigned to workers: inclusive lower bound, exclusive upper bound
+   * 1. worker: 5 -> 786
+   * 2. worker: 786 -> 1567
+   * 3. worker: 1567 -> 2348
+   * 4. worker: 2348 -> 3129
+   * 5. worker: 3129 -> 3910
+   * 6. worker: 3910 -> 4691
+   * 7. worker: 4691 -> 5472
+   * 8. worker: 5472 -> 6253
+   * 9. worker: 6253 -> 7034
+   * 10. worker: 7034 -> 7813 (last element per batch is: 779)
+   *
+   * This way, a particular worker will be given with the same part
+   * of all bloom filters along with the shared base bloom filter,
+   * so the bitwise OR function will not be a subject of threading/sync issues.
+   */
+  int elementPerBatch =
+      (int) Math.ceil((double) (length - 
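
The split arithmetic described in the worker comment above can be checked with a standalone sketch (assuming the 5-byte serialized header that the example's lower bound of 5 implies):

```java
public class SplitDemo {
    static final int START_OF_SERIALIZED_LONGS = 5; // assumed header size, skipped by workers

    // Compute the [from, to) byte range each worker gets, mirroring the
    // split described in the review comment (a standalone sketch, not the
    // Hive method itself).
    static int[][] split(int length, int workers) {
        int elementPerBatch =
            (int) Math.ceil((double) (length - START_OF_SERIALIZED_LONGS) / workers);
        int[][] ranges = new int[workers][2];
        int from = START_OF_SERIALIZED_LONGS;
        for (int w = 0; w < workers; w++) {
            ranges[w][0] = from;
            ranges[w][1] = Math.min(from + elementPerBatch, length);
            from = ranges[w][1];
        }
        return ranges;
    }

    public static void main(String[] args) {
        int[][] r = split(7813, 10);
        System.out.println(r[0][0] + " -> " + r[0][1]); // 5 -> 786
        System.out.println(r[9][0] + " -> " + r[9][1]); // 7034 -> 7813
    }
}
```

This reproduces the worked example: ceil((7813 - 5) / 10) = 781 bytes per worker, with the last worker taking the shorter 779-byte tail.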
[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24032:
---
Attachment: HIVE-24032.01.patch
Status: Patch Available  (was: In Progress)

> Remove hadoop shims dependency and use FileSystem Api directly from 
> standalone metastore
> 
>
> Key: HIVE-24032
> URL: https://issues.apache.org/jira/browse/HIVE-24032
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24032.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24032 started by Aasha Medhi.
--
> Remove hadoop shims dependency and use FileSystem Api directly from 
> standalone metastore
> 
>
> Key: HIVE-24032
> URL: https://issues.apache.org/jira/browse/HIVE-24032
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24032.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24032?focusedWorklogId=469713=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469713
 ]

ASF GitHub Bot logged work on HIVE-24032:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 13:52
Start Date: 12/Aug/20 13:52
Worklog Time Spent: 10m 
  Work Description: aasha opened a new pull request #1396:
URL: https://github.com/apache/hive/pull/1396


…rectly from standalone metastore

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469713)
Remaining Estimate: 0h
Time Spent: 10m

> Remove hadoop shims dependency and use FileSystem Api directly from 
> standalone metastore
> 
>
> Key: HIVE-24032
> URL: https://issues.apache.org/jira/browse/HIVE-24032
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24032:
--
Labels: pull-request-available  (was: )

> Remove hadoop shims dependency and use FileSystem Api directly from 
> standalone metastore
> 
>
> Key: HIVE-24032
> URL: https://issues.apache.org/jira/browse/HIVE-24032
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24033) full outer join returns wrong number of results if hive.optimize.joinreducededuplication is enabled

2020-08-12 Thread Sebastian Klemke (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176324#comment-17176324
 ] 

Sebastian Klemke commented on HIVE-24033:
-

Execution plan of the failing query is here: [^failing_query_plan.txt]

joinreducededuplication optimizer logs for this query:
{code:java}
2020-07-31T14:42:41,542 DEBUG [89354899-5041-441a-ab6f-41e4eb1d3625 main] 
correlation.ReduceSinkJoinDeDuplication: Set RS[21] to forward data
2020-07-31T14:42:41,542 DEBUG [89354899-5041-441a-ab6f-41e4eb1d3625 main] 
correlation.ReduceSinkJoinDeDuplication: Set RS[20] to FIXED parallelism: 120
2020-07-31T14:42:41,542 DEBUG [89354899-5041-441a-ab6f-41e4eb1d3625 main] 
correlation.ReduceSinkJoinDeDuplication: Set RS[21] to FIXED parallelism: 120
2020-07-31T14:42:41,542 DEBUG [89354899-5041-441a-ab6f-41e4eb1d3625 main] 
correlation.ReduceSinkJoinDeDuplication: Set RS[17] to FIXED parallelism: 120 
{code}

> full outer join returns wrong number of results if 
> hive.optimize.joinreducededuplication is enabled
> ---
>
> Key: HIVE-24033
> URL: https://issues.apache.org/jira/browse/HIVE-24033
> Project: Hive
>  Issue Type: Bug
>Reporter: Sebastian Klemke
>Priority: Major
> Attachments: failing_query_plan.txt
>
>
> We encountered a Hive query that returns incorrect results when joining two 
> CTEs on a GROUP BY value. The input tables `id_table` and
> `reference_table` are unfortunately too large to share, and we have not been 
> able to reproduce the issue on smaller tables.
> {code}
> WITH ids AS (
> SELECT
> record.id AS id
> FROM
> `id_table`
> LATERAL VIEW explode(records) r AS record
> WHERE
> record.id = '5ef0bad74d325f72f0360c19'
> LIMIT 1
> ),
> refs AS (
> SELECT
> reference['id'] AS referenceId
> FROM
> `reference_table`
> WHERE
>   partition_date = '2020-06-24'
> AND type = '1b0e9eb5c492d1859815410253dd79b5'
> AND reference['id'] = '5ef0bad74d325f72f0360c19'
> GROUP BY
> reference['id']
> )
> SELECT
> l.id AS id
> , r.referenceId AS referenceId
> FROM 
> ids l
> FULL OUTER JOIN
> refs r
> ON
> l.id = r.referenceId
> {code}
> This returns 2 rows, because the join fails to match the two sides: 
> {code}
> OK
> 5ef0bad74d325f72f0360c19    NULL
> NULL    5ef0bad74d325f72f0360c19
> {code}
> Instead, a single row should be returned. The correct behavior can be 
> achieved by either 
>  * calling lower() on the refs group by statement (doesn't change the string 
> contents)
>  * setting hive.optimize.joinreducededuplication=false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24033) full outer join returns wrong number of results if hive.optimize.joinreducededuplication is enabled

2020-08-12 Thread Sebastian Klemke (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Klemke updated HIVE-24033:

Attachment: failing_query_plan.txt

> full outer join returns wrong number of results if 
> hive.optimize.joinreducededuplication is enabled
> ---
>
> Key: HIVE-24033
> URL: https://issues.apache.org/jira/browse/HIVE-24033
> Project: Hive
>  Issue Type: Bug
>Reporter: Sebastian Klemke
>Priority: Major
> Attachments: failing_query_plan.txt
>
>
> We encountered a Hive query that returns incorrect results when joining two 
> CTEs on a GROUP BY value. The input tables `id_table` and
> `reference_table` are unfortunately too large to share, and we have not been 
> able to reproduce the issue on smaller tables.
> {code}
> WITH ids AS (
> SELECT
> record.id AS id
> FROM
> `id_table`
> LATERAL VIEW explode(records) r AS record
> WHERE
> record.id = '5ef0bad74d325f72f0360c19'
> LIMIT 1
> ),
> refs AS (
> SELECT
> reference['id'] AS referenceId
> FROM
> `reference_table`
> WHERE
>   partition_date = '2020-06-24'
> AND type = '1b0e9eb5c492d1859815410253dd79b5'
> AND reference['id'] = '5ef0bad74d325f72f0360c19'
> GROUP BY
> reference['id']
> )
> SELECT
> l.id AS id
> , r.referenceId AS referenceId
> FROM 
> ids l
> FULL OUTER JOIN
> refs r
> ON
> l.id = r.referenceId
> {code}
> This returns 2 rows, because the join fails to match the two sides: 
> {code}
> OK
> 5ef0bad74d325f72f0360c19    NULL
> NULL    5ef0bad74d325f72f0360c19
> {code}
> Instead, a single row should be returned. The correct behavior can be 
> achieved by either 
>  * calling lower() on the refs group by statement (doesn't change the string 
> contents)
>  * setting hive.optimize.joinreducededuplication=false



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24031) Infinite planning time on syntactically big queries

2020-08-12 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176322#comment-17176322
 ] 

Stamatis Zampetakis commented on HIVE-24031:


I ran the query from above with {{TestMiniLlapLocalCliDriver}} and the 
profiling ([^query_big_array_constructor.nps]) shows that the vast majority of 
time is spent on creating defensive copies of the node expression list inside 
ASTNode#getChildren. 

 !ASTNode_getChildren_cost.png! 

The method is called extensively from various places in the code, especially 
those walking over the expression tree, so it needs to be efficient. I propose 
to drop the defensive copy (possibly protecting the list from modification via 
an unmodifiable collection) and let clients make copies of the list if they deem 
it necessary. In most cases, if not all, making copies of the list seems 
useless.
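
A minimal sketch of the proposed change, using a hypothetical node class rather than ASTNode itself:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NodeDemo {
    private final List<NodeDemo> children = new ArrayList<>();

    // Current behavior: a defensive copy allocates O(n) on every call,
    // which is the hotspot when tree walkers call it millions of times.
    List<NodeDemo> getChildrenCopy() {
        return new ArrayList<>(children);
    }

    // Proposed alternative: an O(1) unmodifiable view; callers that really
    // need a mutable list copy it themselves.
    List<NodeDemo> getChildrenView() {
        return Collections.unmodifiableList(children);
    }

    public static void main(String[] args) {
        NodeDemo n = new NodeDemo();
        n.children.add(new NodeDemo());
        List<NodeDemo> view = n.getChildrenView();
        try {
            view.add(new NodeDemo()); // rejected: the view is read-only
        } catch (UnsupportedOperationException expected) {
            System.out.println("view is unmodifiable");
        }
    }
}
```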

> Infinite planning time on syntactically big queries
> ---
>
> Key: HIVE-24031
> URL: https://issues.apache.org/jira/browse/HIVE-24031
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: ASTNode_getChildren_cost.png, 
> query_big_array_constructor.nps
>
>
> Syntactically big queries (~1 million tokens), such as the query shown below, 
> lead to very big (seemingly infinite) planning times.
> {code:sql}
> select posexplode(array('item1', 'item2', ..., 'item1M'));
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24031) Infinite planning time on syntactically big queries

2020-08-12 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-24031:
---
Attachment: ASTNode_getChildren_cost.png

> Infinite planning time on syntactically big queries
> ---
>
> Key: HIVE-24031
> URL: https://issues.apache.org/jira/browse/HIVE-24031
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: ASTNode_getChildren_cost.png, 
> query_big_array_constructor.nps
>
>
> Syntactically big queries (~1 million tokens), such as the query shown below, 
> lead to very big (seemingly infinite) planning times.
> {code:sql}
> select posexplode(array('item1', 'item2', ..., 'item1M'));
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24031) Infinite planning time on syntactically big queries

2020-08-12 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-24031:
---
Attachment: query_big_array_constructor.nps

> Infinite planning time on syntactically big queries
> ---
>
> Key: HIVE-24031
> URL: https://issues.apache.org/jira/browse/HIVE-24031
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: query_big_array_constructor.nps
>
>
> Syntactically big queries (~1 million tokens), such as the query shown below, 
> lead to very big (seemingly infinite) planning times.
> {code:sql}
> select posexplode(array('item1', 'item2', ..., 'item1M'));
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469682=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469682
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 12:41
Start Date: 12/Aug/20 12:41
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469099535



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -1126,6 +1137,7 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 VectorAggregateExpression vecAggrExpr = null;
 try {
   vecAggrExpr = ctor.newInstance(vecAggrDesc);
+  vecAggrExpr.withConf(hconf);

Review comment:
   1. constructor: first I tried to pass it to the constructor, but that breaks 
compatibility with every other subclass of VectorAggregateExpression; if I 
use ctor.newInstance(vecAggrDesc, hconf), I need to add that constructor to all 
subclasses, because they don't inherit this ctor from 
VectorAggregateExpression. withConf can solve this; let me know about better 
ways.
   
   2. single int: this config is specific to VectorUDAFBloomFilterMerge, so I 
believe I should not pass it through a constructor to every 
VectorAggregateExpression, and I didn't want to go for an instanceof hack for a 
cast + specific call.
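
The trade-off in point 1 can be illustrated with a stripped-down sketch; `java.util.Properties` stands in for Hadoop's `Configuration`, and the property key is made up:

```java
abstract class VectorAggregateExpr {
    protected java.util.Properties conf; // stand-in for Hadoop Configuration

    // Fluent setter: every subclass inherits it unchanged, so no subclass
    // constructor has to grow an extra Configuration parameter.
    VectorAggregateExpr withConf(java.util.Properties conf) {
        this.conf = conf;
        return this;
    }
}

class BloomFilterMergeExpr extends VectorAggregateExpr {
    // only this subclass actually consumes the config value
    int mergeThreads() {
        return Integer.parseInt(conf.getProperty("bloom.filter.merge.threads", "4"));
    }
}

public class WithConfDemo {
    public static void main(String[] args) {
        java.util.Properties p = new java.util.Properties();
        p.setProperty("bloom.filter.merge.threads", "2");
        BloomFilterMergeExpr e =
            (BloomFilterMergeExpr) new BloomFilterMergeExpr().withConf(p);
        System.out.println(e.mergeThreads()); // 2
    }
}
```

The alternative, `ctor.newInstance(vecAggrDesc, hconf)`, would force a matching two-argument constructor onto every subclass, which is the compatibility break described above.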





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469682)
Time Spent: 6.5h  (was: 6h 20m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that, out of 3-4 mins of query runtime, 
> 1-2 mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED   1263   126300  
>  0   0
> Reducer 2 llap   RUNNING  1  010  
>  0   0
> Map 4 llap   RUNNING   6154  0  207 5947  
>  0   0
> Reducer 5 llapINITED 43  00   43  
>  0   0
> Reducer 6 llapINITED  1  001  
>  0   0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> 
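
The merge itself is a bitwise OR over the serialized bit sets; a minimal sketch, assuming a 5-byte header holding NumHashFunctions/NumBits (mirroring what `START_OF_SERIALIZED_LONGS` skips):

```java
public class BloomMergeDemo {
    static final int HEADER = 5; // assumed header bytes, identical across filters

    // Merge one serialized bloom filter into the base by OR-ing the bit
    // sets; the header is left untouched. This is the associative,
    // per-byte-independent operation that makes splitting the byte[]
    // across worker threads safe.
    static void merge(byte[] base, byte[] other) {
        for (int i = HEADER; i < base.length; i++) {
            base[i] |= other[i];
        }
    }

    public static void main(String[] args) {
        byte[] base  = {1, 0, 0, 0, 8, 5, 0};
        byte[] other = {1, 0, 0, 0, 8, 2, 1};
        merge(base, other);
        System.out.println(base[5]); // 7
        System.out.println(base[6]); // 1
    }
}
```

With ~1263 mappers each shipping a 50M-entry filter, Reducer 2 repeats this OR over many megabytes per input, which is why parallelizing it pays off.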

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469680
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 12:40
Start Date: 12/Aug/20 12:40
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469229142



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -517,6 +532,10 @@ public void close(boolean aborted) throws HiveException {
 
 }
 
+//TODO: implement finishAggregators
+protected void finishAggregators(boolean aborted) {

Review comment:
   valid point, I need to recheck and learn how aggregators and aggregation 
buffers are paired together for a specific mode; it seems complicated at 
first glance





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469680)
Time Spent: 6h 10m  (was: 6h)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 
> mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES      MODE     STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 3 ..      llap     SUCCEEDED      1          1        0        0       0       0
> Map 1 ..      llap     SUCCEEDED   1263       1263        0        0       0       0
> Reducer 2     llap     RUNNING        1          0        1        0       0       0
> Map 4         llap     RUNNING     6154          0      207     5947       0       0
> Reducer 5     llap     INITED        43          0        0       43       0       0
> Reducer 6     llap     INITED         1          0        0        1       0       0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter lead to 436 465 696 bits, so 
> merging 1263 bloom filters means running ~1263 * 436 465 696 bitwise OR 
> operations, which is a very hot codepath, but can be parallelized.
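The bitwise-OR merge described above is easy to sketch in isolation. Below is a minimal, hypothetical example; the 5-byte header offset and array layout only mimic the idea of BloomKFilter's START_OF_SERIALIZED_LONGS and are not Hive's actual serialization format:

```java
import java.util.Arrays;

public class BloomOrMergeSketch {
    // Hypothetical header size, standing in for BloomKFilter.START_OF_SERIALIZED_LONGS:
    // the first bytes hold numHashFunctions/numBits and must not be OR-ed.
    static final int HEADER = 5;

    // OR the payload of `incoming` into `base`, leaving the shared header untouched.
    static void merge(byte[] base, byte[] incoming) {
        for (int i = HEADER; i < base.length; i++) {
            base[i] |= incoming[i];
        }
    }

    public static void main(String[] args) {
        byte[] base     = {1, 0, 0, 0, 8, 0b0100, 0b0001};
        byte[] incoming = {1, 0, 0, 0, 8, 0b0010, 0b0001};
        merge(base, incoming);
        // payload bytes are OR-ed: 0b0100 | 0b0010 = 6
        System.out.println(Arrays.toString(base)); // [1, 0, 0, 0, 8, 6, 1]
    }
}
```

Merging 1263 filters is then 1263 such passes over roughly 55 MB of payload each, which is why chunking the work across threads pays off.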



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469681
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 12:40
Start Date: 12/Aug/20 12:40
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469229142



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -517,6 +532,10 @@ public void close(boolean aborted) throws HiveException {
 
 }
 
+//TODO: implement finishAggregators
+protected void finishAggregators(boolean aborted) {

Review comment:
   valid point, I need to recheck and learn how aggregators and aggregation 
buffers are paired together for a specific mode; it seems complicated at 
first





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469681)
Time Spent: 6h 20m  (was: 6h 10m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 
> mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES      MODE     STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 3 ..      llap     SUCCEEDED      1          1        0        0       0       0
> Map 1 ..      llap     SUCCEEDED   1263       1263        0        0       0       0
> Reducer 2     llap     RUNNING        1          0        1        0       0       0
> Map 4         llap     RUNNING     6154          0      207     5947       0       0
> Reducer 5     llap     INITED        43          0        0       43       0       0
> Reducer 6     llap     INITED         1          0        0        1       0       0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter lead to 436 465 696 bits, so 
> merging 1263 bloom filters means running ~1263 * 436 465 696 bitwise OR 
> operations, which is a very hot codepath, but can be parallelized.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469678
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 12:37
Start Date: 12/Aug/20 12:37
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469227267



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
      // Do not change the initial bytes which contain NumHashFunctions/NumBits!
      Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0);
    }
+
+    public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn,
+        int batchSize, boolean selectedInUse, int[] selected, Configuration conf) {
+      // already set in previous iterations, no need to call initExecutor again
+      if (numThreads == 0) {
+        return false;
+      }
+      if (executor == null) {
+        initExecutor(conf, batchSize);
+        if (!isParallel) {
+          return false;
+        }
+      }
+
+      // split every bloom filter (represented by a part of a byte[]) across workers
+      for (int j = 0; j < batchSize; j++) {
+        if (!selectedInUse && inputColumn.noNulls) {
+          splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+              inputColumn.length[j]);
+        } else if (!selectedInUse) {
+          if (!inputColumn.isNull[j]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+                inputColumn.length[j]);
+          }
+        } else if (inputColumn.noNulls) {
+          int i = selected[j];
+          splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+              inputColumn.length[i]);
+        } else {
+          int i = selected[j];
+          if (!inputColumn.isNull[i]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+                inputColumn.length[i]);
+          }
+        }
+      }
+
+      return true;
+    }
+
+    private void initExecutor(Configuration conf, int batchSize) {
+      numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname,
+          HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal);
+      LOG.info("Number of threads used for bloom filter merge: {}", numThreads);
+
+      if (numThreads < 0) {
+        throw new RuntimeException(
+            "invalid number of threads for bloom filter merge: " + numThreads);
+      }
+      if (numThreads == 0) { // disable parallel feature
+        return; // this will leave isParallel=false
+      }
+      isParallel = true;
+      executor = Executors.newFixedThreadPool(numThreads);
+
+      workers = new BloomFilterMergeWorker[numThreads];
+      for (int f = 0; f < numThreads; f++) {
+        workers[f] = new BloomFilterMergeWorker(bfBytes, 0, bfBytes.length);
+      }
+
+      for (int f = 0; f < numThreads; f++) {
+        executor.submit(workers[f]);
+      }
+    }
+
+    public int getNumberOfWaitingMergeTasks() {
+      int size = 0;
+      for (BloomFilterMergeWorker w : workers) {
+        size += w.queue.size();
+      }
+      return size;
+    }
+
+    public int getNumberOfMergingWorkers() {
+      int working = 0;
+      for (BloomFilterMergeWorker w : workers) {
+        if (w.isMerging.get()) {
+          working += 1;
+        }
+      }
+      return working;
+    }
+
+    private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] workers, byte[] bytes,
+        int start, int length) {
+      if (bytes == null || length == 0) {
+        return;
+      }
+      /*
+       * This will split a byte[] across workers as below:
+       * let's say there are 10 workers for 7813 bytes, in this case
+       * length: 7813, elementPerBatch: 781
+       * bytes assigned to workers: inclusive lower bound, exclusive upper bound
+       * 1. worker: 5 -> 786
+       * 2. worker: 786 -> 1567
+       * 3. worker: 1567 -> 2348
+       * 4. worker: 2348 -> 3129
+       * 5. worker: 3129 -> 3910
+       * 6. worker: 3910 -> 4691
+       * 7. worker: 4691 -> 5472
+       * 8. worker: 5472 -> 6253
+       * 9. worker: 6253 -> 7034
+       * 10. worker: 7034 -> 7813 (last element per batch is: 779)
+       *
+       * This way, a particular worker will be given the same part
+       * of all bloom filters along with the shared base bloom filter,
+       * so the bitwise OR function will not be subject to threading/sync issues.
+       */
+      int elementPerBatch =
+          (int) Math.ceil((double) (length - 
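The range arithmetic from the comment above can be reproduced standalone. A sketch under the assumption of a fixed 5-byte header (mirroring the `5 ->` lower bound in the worked example; this is not Hive's actual code):

```java
public class SplitRangeSketch {
    static final int HEADER = 5; // assumed header size; first OR-able byte

    // Compute [lower, upper) byte ranges, one per worker, for a serialized
    // bloom filter of `length` bytes whose header is skipped.
    static int[][] ranges(int numWorkers, int length) {
        int elementPerBatch = (int) Math.ceil((double) (length - HEADER) / numWorkers);
        int[][] r = new int[numWorkers][2];
        for (int w = 0; w < numWorkers; w++) {
            int lo = HEADER + w * elementPerBatch;
            int hi = Math.min(length, lo + elementPerBatch); // last range may be shorter
            r[w][0] = lo;
            r[w][1] = hi;
        }
        return r;
    }

    public static void main(String[] args) {
        // reproduces the worked example: 10 workers, 7813 bytes
        for (int[] range : ranges(10, 7813)) {
            System.out.println(range[0] + " -> " + range[1]);
        }
        // prints 5 -> 786, 786 -> 1567, ..., 7034 -> 7813
    }
}
```

Since every worker always gets the same fixed slice, the per-worker ranges can be computed once and reused for each incoming filter.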

[jira] [Commented] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-08-12 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176300#comment-17176300
 ] 

Zoltan Haindrich commented on HIVE-23965:
-

* the description clearly describes that the metastore data is a composition of 
questionable-quality stuff... so we are running our planning tests against some 
weird metastore content
* I don't think adding more tests will increase test coverage - in this case we 
are talking about queries which are already run 2 times - I've seen 
people updating q.out's like crazy... so adding an extra 100 q.out-s will not 
necessarily increase coverage...
* independence from having a docker setup is great - the new approach uses 
docker - but if that's a problem we could try to come up with some other 
approach - I'm wondering about using an archived derby database with metastore 
data
* the metastore content loader approach is quite unfortunate - IIRC I once had 
to fix up something in the loader... because I made some changes to the 
column statistics

I think we should remove the old approach... and run tests against the new, 
more realistic schema.




> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23965?focusedWorklogId=469658&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469658
 ]

ASF GitHub Bot logged work on HIVE-23965:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 12:03
Start Date: 12/Aug/20 12:03
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1347:
URL: https://github.com/apache/hive/pull/1347#discussion_r469206883



##
File path: 
itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezPerfCliDriver.java
##
@@ -34,7 +34,7 @@
 @RunWith(Parameterized.class)
 public class TestTezPerfCliDriver {
 
-  static CliAdapter adapter = new CliConfigs.TezPerfCliConfig(false).getCliAdapter();
+  static CliAdapter adapter = new CliConfigs.TezCustomTPCDSCliConfig(false).getCliAdapter();

Review comment:
   you can do that - but I still don't see why we need to add the keyword 
`custom` to the name of the test... it was around for a few years... now 
we rename it and add "custom" to its name while there won't be a non-custom 
version anymore...





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469658)
Time Spent: 1h 10m  (was: 1h)

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, 
> something that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469647
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 11:21
Start Date: 12/Aug/20 11:21
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469186450



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
      // Do not change the initial bytes which contain NumHashFunctions/NumBits!
      Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0);
    }
+
+    public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn,
+        int batchSize, boolean selectedInUse, int[] selected, Configuration conf) {
+      // already set in previous iterations, no need to call initExecutor again
+      if (numThreads == 0) {
+        return false;
+      }
+      if (executor == null) {
+        initExecutor(conf, batchSize);
+        if (!isParallel) {
+          return false;
+        }
+      }
+
+      // split every bloom filter (represented by a part of a byte[]) across workers
+      for (int j = 0; j < batchSize; j++) {
+        if (!selectedInUse && inputColumn.noNulls) {
+          splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+              inputColumn.length[j]);
+        } else if (!selectedInUse) {
+          if (!inputColumn.isNull[j]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+                inputColumn.length[j]);
+          }
+        } else if (inputColumn.noNulls) {
+          int i = selected[j];
+          splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+              inputColumn.length[i]);
+        } else {
+          int i = selected[j];
+          if (!inputColumn.isNull[i]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+                inputColumn.length[i]);
+          }
+        }
+      }
+
+      return true;
+    }
+
+    private void initExecutor(Configuration conf, int batchSize) {

Review comment:
   right, I'll remove
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469647)
Time Spent: 5h 50m  (was: 5h 40m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 
> mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> 

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469646
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 11:20
Start Date: 12/Aug/20 11:20
Worklog Time Spent: 10m 
  Work Description: pgaref commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469186380



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
      // Do not change the initial bytes which contain NumHashFunctions/NumBits!
      Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0);
    }
+
+    public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn,
+        int batchSize, boolean selectedInUse, int[] selected, Configuration conf) {
+      // already set in previous iterations, no need to call initExecutor again
+      if (numThreads == 0) {
+        return false;
+      }
+      if (executor == null) {
+        initExecutor(conf, batchSize);
+        if (!isParallel) {
+          return false;
+        }
+      }
+
+      // split every bloom filter (represented by a part of a byte[]) across workers
+      for (int j = 0; j < batchSize; j++) {
+        if (!selectedInUse && inputColumn.noNulls) {
+          splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+              inputColumn.length[j]);
+        } else if (!selectedInUse) {
+          if (!inputColumn.isNull[j]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+                inputColumn.length[j]);
+          }
+        } else if (inputColumn.noNulls) {
+          int i = selected[j];
+          splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+              inputColumn.length[i]);
+        } else {
+          int i = selected[j];
+          if (!inputColumn.isNull[i]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+                inputColumn.length[i]);
+          }
+        }
+      }
+
+      return true;
+    }
+
+    private void initExecutor(Configuration conf, int batchSize) {
+      numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname,
+          HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal);
+      LOG.info("Number of threads used for bloom filter merge: {}", numThreads);
+
+      if (numThreads < 0) {
+        throw new RuntimeException(
+            "invalid number of threads for bloom filter merge: " + numThreads);
+      }
+      if (numThreads == 0) { // disable parallel feature
+        return; // this will leave isParallel=false
+      }
+      isParallel = true;
+      executor = Executors.newFixedThreadPool(numThreads);
+
+      workers = new BloomFilterMergeWorker[numThreads];
+      for (int f = 0; f < numThreads; f++) {
+        workers[f] = new BloomFilterMergeWorker(bfBytes, 0, bfBytes.length);
+      }
+
+      for (int f = 0; f < numThreads; f++) {
+        executor.submit(workers[f]);
+      }
+    }
+
+    public int getNumberOfWaitingMergeTasks() {
+      int size = 0;
+      for (BloomFilterMergeWorker w : workers) {
+        size += w.queue.size();
+      }
+      return size;
+    }
+
+    public int getNumberOfMergingWorkers() {
+      int working = 0;
+      for (BloomFilterMergeWorker w : workers) {
+        if (w.isMerging.get()) {
+          working += 1;
+        }
+      }
+      return working;
+    }
+
+    private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] workers, byte[] bytes,
+        int start, int length) {
+      if (bytes == null || length == 0) {
+        return;
+      }
+      /*
+       * This will split a byte[] across workers as below:
+       * let's say there are 10 workers for 7813 bytes, in this case
+       * length: 7813, elementPerBatch: 781
+       * bytes assigned to workers: inclusive lower bound, exclusive upper bound
+       * 1. worker: 5 -> 786
+       * 2. worker: 786 -> 1567
+       * 3. worker: 1567 -> 2348
+       * 4. worker: 2348 -> 3129
+       * 5. worker: 3129 -> 3910
+       * 6. worker: 3910 -> 4691
+       * 7. worker: 4691 -> 5472
+       * 8. worker: 5472 -> 6253
+       * 9. worker: 6253 -> 7034
+       * 10. worker: 7034 -> 7813 (last element per batch is: 779)
+       *
+       * This way, a particular worker will be given the same part
+       * of all bloom filters along with the shared base bloom filter,
+       * so the bitwise OR function will not be subject to threading/sync issues.
+       */
+      int elementPerBatch =
+          (int) Math.ceil((double) (length - 
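The `queue` and `isMerging` fields referenced by the accounting methods above suggest a queue-draining worker. The following is an illustrative sketch; class layout, field names, and the poison-pill shutdown are assumptions, not Hive's implementation:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of a bloom filter merge worker.
public class MergeWorkerSketch implements Runnable {
    static final byte[] POISON = new byte[0]; // sentinel telling the worker to stop

    final byte[] base;       // shared base bloom filter bytes
    final int lo, hi;        // this worker's exclusive slice [lo, hi)
    final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(64);
    final AtomicBoolean isMerging = new AtomicBoolean(false);

    MergeWorkerSketch(byte[] base, int lo, int hi) {
        this.base = base;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    public void run() {
        try {
            byte[] incoming;
            while ((incoming = queue.take()) != POISON) {
                isMerging.set(true);
                for (int i = lo; i < hi; i++) {
                    base[i] |= incoming[i]; // slices don't overlap, so no locking needed
                }
                isMerging.set(false);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        byte[] base = new byte[4];
        MergeWorkerSketch worker = new MergeWorkerSketch(base, 0, 4);
        Thread t = new Thread(worker);
        t.start();
        worker.queue.put(new byte[]{1, 2, 4, 8}); // one "bloom filter" to merge in
        worker.queue.put(POISON);                 // shut the worker down
        t.join();
        System.out.println(base[0] + "," + base[1] + "," + base[2] + "," + base[3]); // 1,2,4,8
    }
}
```

Because each worker owns a disjoint [lo, hi) slice of the shared base array, the OR writes need no locking; only the queue hand-off is synchronized.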

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469645
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 11:18
Start Date: 12/Aug/20 11:18
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469185446



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
      // Do not change the initial bytes which contain NumHashFunctions/NumBits!
      Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0);
    }
+
+    public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn,
+        int batchSize, boolean selectedInUse, int[] selected, Configuration conf) {
+      // already set in previous iterations, no need to call initExecutor again
+      if (numThreads == 0) {
+        return false;
+      }
+      if (executor == null) {
+        initExecutor(conf, batchSize);
+        if (!isParallel) {
+          return false;
+        }
+      }
+
+      // split every bloom filter (represented by a part of a byte[]) across workers
+      for (int j = 0; j < batchSize; j++) {
+        if (!selectedInUse && inputColumn.noNulls) {
+          splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+              inputColumn.length[j]);
+        } else if (!selectedInUse) {
+          if (!inputColumn.isNull[j]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+                inputColumn.length[j]);
+          }
+        } else if (inputColumn.noNulls) {
+          int i = selected[j];
+          splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+              inputColumn.length[i]);
+        } else {
+          int i = selected[j];
+          if (!inputColumn.isNull[i]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+                inputColumn.length[i]);
+          }
+        }
+      }
+
+      return true;
+    }
+
+    private void initExecutor(Configuration conf, int batchSize) {
+      numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname,
+          HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal);
+      LOG.info("Number of threads used for bloom filter merge: {}", numThreads);
+
+      if (numThreads < 0) {
+        throw new RuntimeException(
+            "invalid number of threads for bloom filter merge: " + numThreads);
+      }
+      if (numThreads == 0) { // disable parallel feature

Review comment:
   good catch, I'm eliminating this check by initializing parallel behavior 
while initializing the aggregation buffer





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469645)
Time Spent: 5.5h  (was: 5h 20m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>  

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469644
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 11:17
Start Date: 12/Aug/20 11:17
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469184919



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
   // Do not change the initial bytes which contain 
NumHashFunctions/NumBits!
   Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, 
bfBytes.length, (byte) 0);
 }
+

Review comment:
   fortunately we won't need this; I've eliminated it with the boolean-return 
hack, per another PR comment





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469644)
Time Spent: 5h 20m  (was: 5h 10m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example 
> below) and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 
> mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES      MODE     STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 3 ..      llap     SUCCEEDED      1          1        0        0       0       0
> Map 1 ..      llap     SUCCEEDED   1263       1263        0        0       0       0
> Reducer 2     llap     RUNNING        1          0        1        0       0       0
> Map 4         llap     RUNNING     6154          0      207     5947       0       0
> Reducer 5     llap     INITED        43          0        0       43       0       0
> Reducer 6     llap     INITED         1          0        0        1       0       0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter lead to 436,465,696 bits, so 
> merging 1263 bloom filters means running ~1263 * 436,465,696 bitwise OR 
> operations, which is a very hot codepath, but one that can be parallelized.
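The bitwise-OR merge described in the issue is embarrassingly parallel once each worker owns a disjoint byte range of the target filter. Below is a minimal JDK-only sketch of that idea; it is not the actual Hive implementation, and the class and method names (`ParallelBloomMerge`, `orMergeParallel`) are hypothetical:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelBloomMerge {

    // OR-merge many serialized bloom filters into `target` by splitting the
    // byte range across a fixed pool. Each worker writes only its own disjoint
    // slice of `target`, so no locking is required.
    public static void orMergeParallel(byte[] target, byte[][] sources, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        CountDownLatch done = new CountDownLatch(numThreads);
        int chunk = (target.length + numThreads - 1) / numThreads;
        for (int t = 0; t < numThreads; t++) {
            final int start = t * chunk;
            final int end = Math.min(start + chunk, target.length);
            pool.submit(() -> {
                for (byte[] src : sources) {
                    for (int i = start; i < end; i++) {
                        target[i] |= src[i];
                    }
                }
                done.countDown();
            });
        }
        try {
            done.await(); // latch establishes happens-before for the workers' writes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
    }
}
```

Splitting by byte range (rather than by source filter) is what removes contention: no two workers ever touch the same byte of the target.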



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24032) Remove hadoop shims dependency and use FileSystem Api directly from standalone metastore

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi reassigned HIVE-24032:
--


> Remove hadoop shims dependency and use FileSystem Api directly from 
> standalone metastore
> 
>
> Key: HIVE-24032
> URL: https://issues.apache.org/jira/browse/HIVE-24032
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>






[jira] [Commented] (HIVE-24031) Infinite planning time on syntactically big queries

2020-08-12 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176238#comment-17176238
 ] 

Stamatis Zampetakis commented on HIVE-24031:


Obviously, such queries should be rather rare and can usually be avoided (using 
temporary tables and other tricks), but the fact that the planner is stuck 
processing the query indefinitely is a problem that should be addressed.

> Infinite planning time on syntactically big queries
> ---
>
> Key: HIVE-24031
> URL: https://issues.apache.org/jira/browse/HIVE-24031
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
>
> Syntactically big queries (~1 million tokens), such as the query shown below, 
> lead to very big (seemingly infinite) planning times.
> {code:sql}
> select posexplode(array('item1', 'item2', ..., 'item1M'));
> {code}





[jira] [Assigned] (HIVE-24031) Infinite planning time on syntactically big queries

2020-08-12 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-24031:
--


> Infinite planning time on syntactically big queries
> ---
>
> Key: HIVE-24031
> URL: https://issues.apache.org/jira/browse/HIVE-24031
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
> Fix For: 4.0.0
>
>
> Syntactically big queries (~1 million tokens), such as the query shown below, 
> lead to very big (seemingly infinite) planning times.
> {code:sql}
> select posexplode(array('item1', 'item2', ..., 'item1M'));
> {code}





[jira] [Updated] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23993:
---
Attachment: HIVE-23993.07.patch
Status: Patch Available  (was: In Progress)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, 
> HIVE-23993.06.patch, HIVE-23993.07.patch, Retry Logic for Replication.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23993:
---
Status: In Progress  (was: Patch Available)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, 
> HIVE-23993.06.patch, HIVE-23993.07.patch, Retry Logic for Replication.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469620
 ]

ASF GitHub Bot logged work on HIVE-23922:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 09:15
Start Date: 12/Aug/20 09:15
Worklog Time Spent: 10m 
  Work Description: dh20 commented on pull request #1307:
URL: https://github.com/apache/hive/pull/1307#issuecomment-672757291


   @sunchao The test has passed





Issue Time Tracking
---

Worklog Id: (was: 469620)
Time Spent: 1h  (was: 50m)

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods. The remaining parameters are not used.





[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469619=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469619
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 09:13
Start Date: 12/Aug/20 09:13
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469120003



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
   // Do not change the initial bytes which contain 
NumHashFunctions/NumBits!
   Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, 
bfBytes.length, (byte) 0);
 }
+
+public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector 
inputColumn,
+int batchSize, boolean selectedInUse, int[] selected, Configuration 
conf) {
+  // already set in previous iterations, no need to call initExecutor again
+  if (numThreads == 0) {
+return false;
+  }
+  if (executor == null) {
+initExecutor(conf, batchSize);
+if (!isParallel) {
+  return false;
+}
+  }
+
+  // split every bloom filter (represented by a part of a byte[]) across 
workers
+  for (int j = 0; j < batchSize; j++) {
+if (!selectedInUse && inputColumn.noNulls) {
+  splitVectorAcrossWorkers(workers, inputColumn.vector[j], 
inputColumn.start[j],
+  inputColumn.length[j]);
+} else if (!selectedInUse) {
+  if (!inputColumn.isNull[j]) {

Review comment:
   yeah, I missed this cleanup while mirroring the [original 
logic](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java#L132-L191)







Issue Time Tracking
---

Worklog Id: (was: 469619)
Time Spent: 5h 10m  (was: 5h)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of large number of source mapper tasks (~1000, Map 1 in below example) 
> and a large amount of expected entries (50M) in bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 
> mins are spent with merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED   1263   126300  
>  0   0
> Reducer 2 llap   RUNNING  1  010  
>  0   0
> Map 4 

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469610=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469610
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 09:00
Start Date: 12/Aug/20 09:00
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469112026



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
   // Do not change the initial bytes which contain 
NumHashFunctions/NumBits!
   Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, 
bfBytes.length, (byte) 0);
 }
+
+public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector 
inputColumn,
+int batchSize, boolean selectedInUse, int[] selected, Configuration 
conf) {
+  // already set in previous iterations, no need to call initExecutor again
+  if (numThreads == 0) {
+return false;
+  }
+  if (executor == null) {
+initExecutor(conf, batchSize);
+if (!isParallel) {
+  return false;
+}
+  }
+
+  // split every bloom filter (represented by a part of a byte[]) across 
workers
+  for (int j = 0; j < batchSize; j++) {
+if (!selectedInUse && inputColumn.noNulls) {
+  splitVectorAcrossWorkers(workers, inputColumn.vector[j], 
inputColumn.start[j],
+  inputColumn.length[j]);
+} else if (!selectedInUse) {
+  if (!inputColumn.isNull[j]) {
+splitVectorAcrossWorkers(workers, inputColumn.vector[j], 
inputColumn.start[j],
+inputColumn.length[j]);
+  }
+} else if (inputColumn.noNulls) {
+  int i = selected[j];
+  splitVectorAcrossWorkers(workers, inputColumn.vector[i], 
inputColumn.start[i],
+  inputColumn.length[i]);
+} else {
+  int i = selected[j];
+  if (!inputColumn.isNull[i]) {
+splitVectorAcrossWorkers(workers, inputColumn.vector[i], 
inputColumn.start[i],
+inputColumn.length[i]);
+  }
+}
+  }
+
+  return true;
+}
+
+private void initExecutor(Configuration conf, int batchSize) {
+  numThreads = 
conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname,
+  HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal);
+  LOG.info("Number of threads used for bloom filter merge: {}", 
numThreads);
+
+  if (numThreads < 0) {
+throw new RuntimeException(
+"invalid number of threads for bloom filter merge: " + numThreads);
+  }
+  if (numThreads == 0) { // disable parallel feature
+return; // this will leave isParallel=false
+  }
+  isParallel = true;
+  executor = Executors.newFixedThreadPool(numThreads);
+
+  workers = new BloomFilterMergeWorker[numThreads];
+  for (int f = 0; f < numThreads; f++) {

Review comment:
good catch, moving them to a single loop

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
   // Do not change the initial bytes which contain 
NumHashFunctions/NumBits!
   Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, 
bfBytes.length, (byte) 0);
 }
+
+public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector 
inputColumn,
+int batchSize, boolean selectedInUse, int[] selected, Configuration 
conf) {
+  // already set in previous iterations, no need to call initExecutor again
+  if (numThreads == 0) {
+return false;
+  }
+  if (executor == null) {
+initExecutor(conf, batchSize);
+if (!isParallel) {
+  return false;
+}
+  }
+
+  // split every bloom filter (represented by a part of a byte[]) across 
workers
+  for (int j = 0; j < batchSize; j++) {
+if (!selectedInUse && inputColumn.noNulls) {
+  splitVectorAcrossWorkers(workers, inputColumn.vector[j], 
inputColumn.start[j],
+  inputColumn.length[j]);
+} else if (!selectedInUse) {
+  if (!inputColumn.isNull[j]) {
+splitVectorAcrossWorkers(workers, inputColumn.vector[j], 
inputColumn.start[j],
+inputColumn.length[j]);
+  }
+} else if (inputColumn.noNulls) {
+  int i = selected[j];
+  splitVectorAcrossWorkers(workers, inputColumn.vector[i], 
inputColumn.start[i],
+  inputColumn.length[i]);
+} else {
+  int i = selected[j];
+  if (!inputColumn.isNull[i]) {
+splitVectorAcrossWorkers(workers, 

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469608
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 08:59
Start Date: 12/Aug/20 08:59
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469111595



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4330,6 +4330,12 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 "Bloom filter should be of at max certain size to be effective"),
 TEZ_BLOOM_FILTER_FACTOR("hive.tez.bloom.filter.factor", (float) 1.0,
 "Bloom filter should be a multiple of this factor with nDV"),
+TEZ_BLOOM_FILTER_MERGE_THREADS("hive.tez.bloom.filter.merge.threads", 1,
+"How many threads are used for merging bloom filters?\n"
++ "-1: sanity check, it will fail if execution hits bloom filter 
merge codepath\n"
++ " 0: feature is disabled\n"

Review comment:
   agree, added
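The thread-count semantics spelled out in the help string above (-1 fails fast if the merge codepath is reached, 0 disables the feature, N sizes the pool) can be sketched as a small validator. This is illustrative only; `MergeThreadsConfig` and `resolvePoolSize` are hypothetical names, not Hive code:

```java
public class MergeThreadsConfig {

    // Hypothetical validator mirroring the documented semantics of
    // hive.tez.bloom.filter.merge.threads:
    //   -1 -> sanity check: fail if the merge codepath is ever reached
    //    0 -> parallel merge disabled (caller falls back to serial merge)
    //    N -> worker pool of N threads
    public static int resolvePoolSize(int configured) {
        if (configured < 0) {
            throw new IllegalStateException(
                "bloom filter merge reached with threads=" + configured);
        }
        return configured;
    }
}
```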







Issue Time Tracking
---

Worklog Id: (was: 469608)
Time Spent: 4h 50m  (was: 4h 40m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of large number of source mapper tasks (~1000, Map 1 in below example) 
> and a large amount of expected entries (50M) in bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 
> mins are spent with merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED   1263   126300  
>  0   0
> Reducer 2 llap   RUNNING  1  010  
>  0   0
> Map 4 llap   RUNNING   6154  0  207 5947  
>  0   0
> Reducer 5 llapINITED 43  00   43  
>  0   0
> Reducer 6 llapINITED  1  001  
>  0   0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in bloom filter leads to a 436 465 696 bits, so 
> merging 1263 bloom filters means running ~ 1263 * 436 465 696 bitwise OR 
> operation, which 

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469605=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469605
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 08:58
Start Date: 12/Aug/20 08:58
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469110972



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4330,6 +4330,12 @@ private static void 
populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
 "Bloom filter should be of at max certain size to be effective"),
 TEZ_BLOOM_FILTER_FACTOR("hive.tez.bloom.filter.factor", (float) 1.0,
 "Bloom filter should be a multiple of this factor with nDV"),
+TEZ_BLOOM_FILTER_MERGE_THREADS("hive.tez.bloom.filter.merge.threads", 1,
+"How many threads are used for merging bloom filters?\n"

Review comment:
   agree, adding that







Issue Time Tracking
---

Worklog Id: (was: 469605)
Time Spent: 4h 40m  (was: 4.5h)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of large number of source mapper tasks (~1000, Map 1 in below example) 
> and a large amount of expected entries (50M) in bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 
> mins are spent with merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED   1263   126300  
>  0   0
> Reducer 2 llap   RUNNING  1  010  
>  0   0
> Map 4 llap   RUNNING   6154  0  207 5947  
>  0   0
> Reducer 5 llapINITED 43  00   43  
>  0   0
> Reducer 6 llapINITED  1  001  
>  0   0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> --
> {code}
> For example, 70M entries in a bloom filter lead to 436,465,696 bits, so 
> merging 1263 bloom filters means running ~1263 * 436,465,696 bitwise OR 
> operations, which is a very hot codepath, but one that can be parallelized.





[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469604
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 08:56
Start Date: 12/Aug/20 08:56
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469109686



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorAggregateExpression.java
##
@@ -20,24 +20,25 @@
 
 import java.io.Serializable;
 
+import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hive.common.type.DataTypePhysicalVariation;
 import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
 import org.apache.hadoop.hive.ql.exec.vector.VectorAggregationBufferRow;
 import org.apache.hadoop.hive.ql.exec.vector.VectorAggregationDesc;
 import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
 import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
 import org.apache.hadoop.hive.ql.metadata.HiveException;
-import org.apache.hadoop.hive.ql.plan.AggregationDesc;
 import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.Mode;
 
 /**
  * Base class for aggregation expressions.
  */
 public abstract class VectorAggregateExpression  implements Serializable {
-
+  protected final Logger LOG = LoggerFactory.getLogger(getClass().getName());

Review comment:
   personally, I don't really like protected static Logger, because 
subclasses won't show the actual class name (only the parent)
   in this case, you're right, this LOG is not used in 
VectorUDAFBloomFilterMerge at all, it's useless leftover, I'm going to remove it
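The subclass-name point made in this comment can be demonstrated with plain `getClass()` (JDK only, no slf4j dependency; the class names here are illustrative, not from Hive):

```java
public class LoggerNames {

    static class Base {
        // per-instance name: reflects the runtime (sub)class of `this`
        final String instanceLoggerName = getClass().getName();
        // static name: fixed at the declaring class, regardless of subclass
        static final String STATIC_LOGGER_NAME = Base.class.getName();
    }

    static class Child extends Base { }
}
```

A `protected final Logger LOG = LoggerFactory.getLogger(getClass().getName())` in a base class follows the first pattern, so log lines from a subclass carry the subclass name; a `static` logger would always report the base class.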







Issue Time Tracking
---

Worklog Id: (was: 469604)
Time Spent: 4.5h  (was: 4h 20m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of large number of source mapper tasks (~1000, Map 1 in below example) 
> and a large amount of expected entries (50M) in bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 
> mins are spent with merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED   1263   126300  
>  

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469603
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 08:56
Start Date: 12/Aug/20 08:56
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469109686



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorAggregateExpression.java
##
@@ -20,24 +20,25 @@
 
 import java.io.Serializable;
 
+import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hive.common.type.DataTypePhysicalVariation;
 import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
 import org.apache.hadoop.hive.ql.exec.vector.VectorAggregationBufferRow;
 import org.apache.hadoop.hive.ql.exec.vector.VectorAggregationDesc;
 import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
 import org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression;
 import org.apache.hadoop.hive.ql.metadata.HiveException;
-import org.apache.hadoop.hive.ql.plan.AggregationDesc;
 import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.Mode;
 
 /**
  * Base class for aggregation expressions.
  */
 public abstract class VectorAggregateExpression  implements Serializable {
-
+  protected final Logger LOG = LoggerFactory.getLogger(getClass().getName());

Review comment:
   personally, I don't really like protected static Logger, because 
subclasses won't show the actual class name (only the parent), but in this 
case, this LOG is not used in VectorUDAFBloomFilterMerge at all, it's useless 
leftover, I'm going to remove it







Issue Time Tracking
---

Worklog Id: (was: 469603)
Time Spent: 4h 20m  (was: 4h 10m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of large number of source mapper tasks (~1000, Map 1 in below example) 
> and a large amount of expected entries (50M) in bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that from 3-4 mins of query runtime 1-2 
> mins are spent with merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED   1263   126300  
>  0  

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469600=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469600
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 08:52
Start Date: 12/Aug/20 08:52
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469107566



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
       // Do not change the initial bytes which contain NumHashFunctions/NumBits!
       Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0);
     }
+
+    public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn,
+        int batchSize, boolean selectedInUse, int[] selected, Configuration conf) {
+      // already set in previous iterations, no need to call initExecutor again
+      if (numThreads == 0) {
+        return false;
+      }
+      if (executor == null) {
+        initExecutor(conf, batchSize);
+        if (!isParallel) {
+          return false;
+        }
+      }
+
+      // split every bloom filter (represented by a part of a byte[]) across workers
+      for (int j = 0; j < batchSize; j++) {
+        if (!selectedInUse && inputColumn.noNulls) {
+          splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+              inputColumn.length[j]);
+        } else if (!selectedInUse) {
+          if (!inputColumn.isNull[j]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+                inputColumn.length[j]);
+          }
+        } else if (inputColumn.noNulls) {
+          int i = selected[j];
+          splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+              inputColumn.length[i]);
+        } else {
+          int i = selected[j];
+          if (!inputColumn.isNull[i]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+                inputColumn.length[i]);
+          }
+        }
+      }
+
+      return true;
+    }
+
+    private void initExecutor(Configuration conf, int batchSize) {
+      numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname,
+          HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal);
+      LOG.info("Number of threads used for bloom filter merge: {}", numThreads);
+
+      if (numThreads < 0) {
+        throw new RuntimeException(
+            "invalid number of threads for bloom filter merge: " + numThreads);
+      }
+      if (numThreads == 0) { // disable parallel feature
+        return; // this will leave isParallel=false
+      }
+      isParallel = true;
+      executor = Executors.newFixedThreadPool(numThreads);
+
+      workers = new BloomFilterMergeWorker[numThreads];
+      for (int f = 0; f < numThreads; f++) {
+        workers[f] = new BloomFilterMergeWorker(bfBytes, 0, bfBytes.length);
+      }
+
+      for (int f = 0; f < numThreads; f++) {
+        executor.submit(workers[f]);
+      }
+    }
+
+    public int getNumberOfWaitingMergeTasks(){
+      int size = 0;
+      for (BloomFilterMergeWorker w : workers){
+        size += w.queue.size();
+      }
+      return size;
+    }
+
+    public int getNumberOfMergingWorkers() {
+      int working = 0;
+      for (BloomFilterMergeWorker w : workers) {
+        if (w.isMerging.get()) {
+          working += 1;
+        }
+      }
+      return working;
+    }
+
+    private static void splitVectorAcrossWorkers(BloomFilterMergeWorker[] workers, byte[] bytes,
+        int start, int length) {
+      if (bytes == null || length == 0) {
+        return;
+      }
+      /*
+       * This will split a byte[] across workers as below:
+       * let's say there are 10 workers for 7813 bytes, in this case
+       * length: 7813, elementPerBatch: 781
+       * bytes assigned to workers: inclusive lower bound, exclusive upper bound
+       * 1. worker: 5 -> 786
+       * 2. worker: 786 -> 1567
+       * 3. worker: 1567 -> 2348
+       * 4. worker: 2348 -> 3129
+       * 5. worker: 3129 -> 3910
+       * 6. worker: 3910 -> 4691
+       * 7. worker: 4691 -> 5472
+       * 8. worker: 5472 -> 6253
+       * 9. worker: 6253 -> 7034
+       * 10. worker: 7034 -> 7813 (last element per batch is: 779)
+       *
+       * This way, a particular worker will be given with the same part
+       * of all bloom filters along with the shared base bloom filter,
+       * so the bitwise OR function will not be a subject of threading/sync issues.
+       */
+      int elementPerBatch =
+          (int) Math.ceil((double) (length - 

[jira] [Updated] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23993:
---
Status: In Progress  (was: Patch Available)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, 
> HIVE-23993.06.patch, Retry Logic for Replication.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23993:
---
Attachment: HIVE-23993.06.patch
Status: Patch Available  (was: In Progress)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, 
> HIVE-23993.06.patch, Retry Logic for Replication.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469596
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 08:44
Start Date: 12/Aug/20 08:44
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469102366



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFBloomFilterMerge.java
##
@@ -77,6 +75,211 @@ public void reset() {
       // Do not change the initial bytes which contain NumHashFunctions/NumBits!
       Arrays.fill(bfBytes, BloomKFilter.START_OF_SERIALIZED_LONGS, bfBytes.length, (byte) 0);
     }
+
+    public boolean mergeBloomFilterBytesFromInputColumn(BytesColumnVector inputColumn,
+        int batchSize, boolean selectedInUse, int[] selected, Configuration conf) {
+      // already set in previous iterations, no need to call initExecutor again
+      if (numThreads == 0) {
+        return false;
+      }
+      if (executor == null) {
+        initExecutor(conf, batchSize);
+        if (!isParallel) {
+          return false;
+        }
+      }
+
+      // split every bloom filter (represented by a part of a byte[]) across workers
+      for (int j = 0; j < batchSize; j++) {
+        if (!selectedInUse && inputColumn.noNulls) {
+          splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+              inputColumn.length[j]);
+        } else if (!selectedInUse) {
+          if (!inputColumn.isNull[j]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[j], inputColumn.start[j],
+                inputColumn.length[j]);
+          }
+        } else if (inputColumn.noNulls) {
+          int i = selected[j];
+          splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+              inputColumn.length[i]);
+        } else {
+          int i = selected[j];
+          if (!inputColumn.isNull[i]) {
+            splitVectorAcrossWorkers(workers, inputColumn.vector[i], inputColumn.start[i],
+                inputColumn.length[i]);
+          }
+        }
+      }
+
+      return true;
+    }
+
+    private void initExecutor(Configuration conf, int batchSize) {
+      numThreads = conf.getInt(HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.varname,
+          HiveConf.ConfVars.TEZ_BLOOM_FILTER_MERGE_THREADS.defaultIntVal);
+      LOG.info("Number of threads used for bloom filter merge: {}", numThreads);
+
+      if (numThreads < 0) {
+        throw new RuntimeException(
+            "invalid number of threads for bloom filter merge: " + numThreads);
+      }
+      if (numThreads == 0) { // disable parallel feature
+        return; // this will leave isParallel=false
+      }
+      isParallel = true;
+      executor = Executors.newFixedThreadPool(numThreads);
+
+      workers = new BloomFilterMergeWorker[numThreads];
+      for (int f = 0; f < numThreads; f++) {
+        workers[f] = new BloomFilterMergeWorker(bfBytes, 0, bfBytes.length);
+      }
+
+      for (int f = 0; f < numThreads; f++) {
+        executor.submit(workers[f]);
+      }
+    }
+
+    public int getNumberOfWaitingMergeTasks(){
+      int size = 0;
+      for (BloomFilterMergeWorker w : workers){
+        size += w.queue.size();
+      }
+      return size;
+    }
+
+    public int getNumberOfMergingWorkers() {

Review comment:
   yeah, only for logging, it was for validating my executor shutdown 
correctness...that can be misleading, I'm removing it 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469596)
Time Spent: 4h  (was: 3h 50m)

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example below) 
> and a large amount of expected 

[jira] [Work logged] (HIVE-23880) Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23880?focusedWorklogId=469594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469594
 ]

ASF GitHub Bot logged work on HIVE-23880:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 08:39
Start Date: 12/Aug/20 08:39
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on a change in pull request #1280:
URL: https://github.com/apache/hive/pull/1280#discussion_r469099535



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorGroupByOperator.java
##
@@ -1126,6 +1137,7 @@ protected void initializeOp(Configuration hconf) throws HiveException {
     VectorAggregateExpression vecAggrExpr = null;
     try {
       vecAggrExpr = ctor.newInstance(vecAggrDesc);
+      vecAggrExpr.withConf(hconf);

Review comment:
   1. constructor: first I tried to pass it to the constructor, but that breaks 
compatibility with every other subclass of VectorAggregateExpression; if I use 
ctor.newInstance(vecAggrDesc, hconf), I need to add that constructor to all 
subclasses, because they don't inherit this ctor from 
VectorAggregateExpression... withConf can solve this; let me know about better 
ways
   
   2. single int: this config is specific to VectorUDAFBloomFilterMerge, I 
don't think I should pass it through a constructor to every 
VectorAggregateExpression, and I didn't want to go for an instanceof hack with a cast + specific call





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469594)
Time Spent: 3h 50m  (was: 3h 40m)
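The construct-then-configure pattern discussed in the comment above (reflective instantiation through the shared one-argument constructor, followed by a separate `withConf` call) can be sketched as below. Only `withConf` mirrors a name from the diff; the other class and field names are illustrative placeholders, not Hive's actual API:

```java
import java.lang.reflect.Constructor;

public class WithConfDemo {
    // Base class sketch: the configuration is handed over after construction,
    // so reflective instantiation via the existing one-argument constructor
    // keeps working for every subclass without adding a (desc, conf) ctor.
    public abstract static class VectorAggregateExpressionSketch {
        protected String conf;
        public VectorAggregateExpressionSketch withConf(String conf) {
            this.conf = conf;
            return this;
        }
    }

    public static class DemoExpression extends VectorAggregateExpressionSketch {
        public final String desc;
        public DemoExpression(String desc) { this.desc = desc; }
    }

    public static String demo() throws Exception {
        // 1) construct reflectively through the shared single-argument ctor
        Constructor<DemoExpression> ctor = DemoExpression.class.getConstructor(String.class);
        DemoExpression expr = ctor.newInstance("bloom_filter_merge");
        // 2) pass the configuration afterwards, as initializeOp does with hconf
        expr.withConf("hconf");
        return expr.desc + ":" + expr.conf;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // bloom_filter_merge:hconf
    }
}
```

This keeps every existing subclass source-compatible; the trade-off is that `withConf` must be called before the expression is first used.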

> Bloom filters can be merged in a parallel way in VectorUDAFBloomFilterMerge
> ---
>
> Key: HIVE-23880
> URL: https://issues.apache.org/jira/browse/HIVE-23880
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Attachments: lipwig-output3605036885489193068.svg
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Merging bloom filters in semijoin reduction can become the main bottleneck in 
> case of a large number of source mapper tasks (~1000, Map 1 in the example below) 
> and a large number of expected entries (50M) in the bloom filters.
> For example in TPCDS Q93:
> {code}
> select /*+ semi(store_returns, sr_item_sk, store_sales, 7000)*/ 
> ss_customer_sk
> ,sum(act_sales) sumsales
>   from (select ss_item_sk
>   ,ss_ticket_number
>   ,ss_customer_sk
>   ,case when sr_return_quantity is not null then 
> (ss_quantity-sr_return_quantity)*ss_sales_price
> else 
> (ss_quantity*ss_sales_price) end act_sales
> from store_sales left outer join store_returns on (sr_item_sk = 
> ss_item_sk
>and 
> sr_ticket_number = ss_ticket_number)
> ,reason
> where sr_reason_sk = r_reason_sk
>   and r_reason_desc = 'reason 66') t
>   group by ss_customer_sk
>   order by sumsales, ss_customer_sk
> limit 100;
> {code}
> On 10TB-30TB scale there is a chance that out of 3-4 mins of query runtime, 1-2 
> mins are spent merging bloom filters (Reducer 2), as in:  
> [^lipwig-output3605036885489193068.svg] 
> {code}
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED   1263   126300  
>  0   0
> Reducer 2 llap   RUNNING  1  010  
>  0   0
> Map 4 llap   RUNNING   6154  0  207 5947  
>  0   0
> Reducer 5 llapINITED 43  00   43  
>  0   0
> Reducer 6 llapINITED  1  001  
>  0   0
> --
> VERTICES: 02/06  [>>--] 16%   ELAPSED TIME: 149.98 s
> 
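The byte-range split described in the `splitVectorAcrossWorkers` comment of the review diff above (each worker gets the same `[lower, upper)` slice of every serialized bloom filter, skipping the 5-byte NumHashFunctions/NumBits header, so the bitwise ORs never overlap) boils down to a few lines of arithmetic. A sketch, assuming `START_OF_SERIALIZED_LONGS = 5` as implied by the worked example in the diff:

```java
public class BloomSplitSketch {
    static final int START_OF_SERIALIZED_LONGS = 5; // serialized header size, per the diff

    /** Returns {lower, upper} (inclusive lower, exclusive upper) for one worker. */
    static int[] range(int length, int numWorkers, int worker) {
        // same ceil-division as the truncated elementPerBatch expression in the diff
        int elementPerBatch =
            (int) Math.ceil((double) (length - START_OF_SERIALIZED_LONGS) / numWorkers);
        int lower = START_OF_SERIALIZED_LONGS + worker * elementPerBatch;
        int upper = Math.min(length, lower + elementPerBatch);
        return new int[] { lower, upper };
    }

    public static void main(String[] args) {
        // Reproduces the worked example: 7813 bytes, 10 workers -> batch of 781
        for (int w = 0; w < 10; w++) {
            int[] r = range(7813, 10, w);
            System.out.println((w + 1) + ". worker: " + r[0] + " -> " + r[1]);
        }
    }
}
```

Running this prints the same ranges as the comment (5 -> 786 for the first worker, 7034 -> 7813 for the last, which gets the shorter 779-byte tail).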

[jira] [Updated] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23993:
---
Status: In Progress  (was: Patch Available)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, Retry Logic 
> for Replication.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23993:
---
Attachment: HIVE-23993.05.patch
Status: Patch Available  (was: In Progress)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, HIVE-23993.05.patch, Retry Logic 
> for Replication.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23994) TestRetryable is unstable

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi resolved HIVE-23994.

Resolution: Fixed

> TestRetryable is unstable
> -
>
> Key: HIVE-23994
> URL: https://issues.apache.org/jira/browse/HIVE-23994
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Aasha Medhi
>Priority: Major
>
> The flaky test check run:
> [http://ci.hive.apache.org/job/hive-flaky-check/83/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?focusedWorklogId=469590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469590
 ]

ASF GitHub Bot logged work on HIVE-23993:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 08:33
Start Date: 12/Aug/20 08:33
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1367:
URL: https://github.com/apache/hive/pull/1367#discussion_r469095976



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -366,4 +371,26 @@ public static boolean includeAcidTableInDump(HiveConf conf) {
   public static boolean tableIncludedInReplScope(ReplScope replScope, String tableName) {
     return ((replScope == null) || replScope.tableIncludedInReplScope(tableName));
   }
+
+  public static boolean failedWithNonRecoverableError(Path dumpRoot, HiveConf conf) throws SemanticException {
+    if (dumpRoot == null) {

Review comment:
   Yes





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469590)
Time Spent: 2h 10m  (was: 2h)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, Retry Logic for Replication.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23965) Improve plan regression tests using TPCDS30TB metastore dump and custom configs

2020-08-12 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176161#comment-17176161
 ] 

Stamatis Zampetakis commented on HIVE-23965:


Before finalizing this, it may be good to decide whether we want to keep or 
delete the old TPC-DS drivers (or rather tests), namely 
TestTezPerfCliDriver and TestTezPerfConstraintsCliDriver, as part of this issue.

A few advantages/disadvantages of keeping the old tests are outlined below.

+Advantages:+
* Better code coverage. Due to differences in stats and configurations we end 
up with different plans, so potentially we are covering different codepaths.
* The old drivers can be run on local dev environment without the need of 
installing Docker.

+Disadvantages+
* Maintenance cost. Now for every change in the planner we may need to update 
the results from three drivers (~300 queries).
* Unrealistic plans. As mentioned previously, the table stats are not obtained 
from a single TPC-DS scale factor, so in some cases we may never see these 
plans in practice; this can also be seen as a bug that we could possibly fix.
* Test execution time. Obviously running three test suites instead of one is 
going to take more time and there is nothing that we can do about it.

In case there is disagreement we can keep them for now and postpone the 
decision to another JIRA. Let me know what you think.

> Improve plan regression tests using TPCDS30TB metastore dump and custom 
> configs
> ---
>
> Key: HIVE-23965
> URL: https://issues.apache.org/jira/browse/HIVE-23965
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The existing regression tests (HIVE-12586) based on TPC-DS have certain 
> shortcomings:
> The table statistics do not reflect cardinalities from a specific TPC-DS 
> scale factor (SF). Some tables are from a 30TB dataset, others from a 200GB 
> dataset, and others from a 3GB dataset. This mix leads to plans that may 
> never appear when using an actual TPC-DS dataset. 
> The existing statistics do not contain information about partitions, something 
> that can have a big impact on the resulting plans.
> The existing regression tests rely more or less on the default 
> configuration (hive-site.xml). In real-life scenarios, though, some of the 
> configurations differ and may impact the choices of the optimizer.
> This issue aims to address the above shortcomings by using a curated 
> TPCDS30TB metastore dump along with some custom hive configurations. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?focusedWorklogId=469569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469569
 ]

ASF GitHub Bot logged work on HIVE-23993:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 08:08
Start Date: 12/Aug/20 08:08
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1367:
URL: https://github.com/apache/hive/pull/1367#discussion_r469081208



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -35,20 +37,17 @@
 import org.apache.hadoop.hive.ql.ErrorMsg;
 import 
org.apache.hadoop.hive.ql.exec.repl.incremental.IncrementalLoadTasksBuilder;
 import org.apache.hadoop.hive.ql.exec.repl.util.ReplUtils;
-import org.apache.hadoop.hive.ql.metadata.Hive;
 import org.apache.hadoop.hive.ql.parse.repl.PathBuilder;
 import org.apache.hadoop.hive.ql.processors.CommandProcessorException;
 import org.apache.hadoop.hive.ql.util.DependencyResolver;
+import org.apache.hadoop.io.IOUtils;
 import org.apache.hadoop.security.UserGroupInformation;
 import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Test;
 
 import javax.annotation.Nullable;
-import java.io.BufferedReader;
-import java.io.File;
-import java.io.IOException;
-import java.io.InputStreamReader;
+import java.io.*;

Review comment:
   Revert this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469569)
Time Spent: 2h  (was: 1h 50m)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, Retry Logic for Replication.pdf
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23927) Cast to Timestamp generates different output for Integer & Float values

2020-08-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176135#comment-17176135
 ] 

László Bodor commented on HIVE-23927:
-

Unfortunately, I cannot recall anything from ORC-554 that is related to this. 
In ORC-554 we handled an overflow case where a float is not precise enough to 
represent a timestamp and messes up the values in TimestampColumnVector ([fix 
is 
here|https://github.com/apache/orc/commit/7de945b080c5ca83b84397db105f70082a2107f4#diff-9090b54d59f8163ec2be71169d4813c8R1412-R1426]).

This one is indeed not related to ORC/schema evolution, but the reported problem 
is present on master, [as my repro 
shows|https://github.com/abstractdog/hive/commit/54ec318203#diff-219ede90fa98943fb8e1518350ff074dR36]



> Cast to Timestamp generates different output for Integer & Float values 
> 
>
> Key: HIVE-23927
> URL: https://issues.apache.org/jira/browse/HIVE-23927
> Project: Hive
>  Issue Type: Bug
>Reporter: Renukaprasad C
>Priority: Major
>
> Double input is treated as SECONDS and converted into millis internally,
> whereas an Integer value is treated as millis, producing different 
> output.
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(Object,
>  PrimitiveObjectInspector, boolean) handles Integral & Decimal values 
> differently, which causes the issue.
> 0: jdbc:hive2://localhost:1> select cast(1.204135216E9 as timestamp) 
> Double2TimeStamp, cast(1204135216 as timestamp) Int2TimeStamp from abc 
> tablesample(1 rows);
> OK
> INFO  : Compiling 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): 
> select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as 
> timestamp) Int2TimeStamp from abc tablesample(1 rows)
> INFO  : Concurrency mode is disabled, not creating a lock manager
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:double2timestamp, type:timestamp, 
> comment:null), FieldSchema(name:int2timestamp, type:timestamp, 
> comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); 
> Time taken: 0.175 seconds
> INFO  : Concurrency mode is disabled, not creating a lock manager
> INFO  : Executing 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14): 
> select cast(1.204135216E9 as timestamp) Double2TimeStamp, cast(1204135216 as 
> timestamp) Int2TimeStamp from abc tablesample(1 rows)
> INFO  : Completed executing 
> command(queryId=renu_20200724140642_70132390-ee12-4214-a2ca-a7e10556fc14); 
> Time taken: 0.001 seconds
> INFO  : OK
> INFO  : Concurrency mode is disabled, not creating a lock manager
> ++--+
> |double2timestamp|  int2timestamp   |
> ++--+
> | 2008-02-27 18:00:16.0  | 1970-01-14 22:28:55.216  |
> ++--+
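The divergence shown in the table above can be reproduced with plain `java.time` arithmetic: a floating-point value is interpreted as seconds since the epoch, an integral value as milliseconds. This is a sketch of the observed behavior only, not Hive's actual `getTimestamp` code path (which also applies session time zone handling):

```java
import java.time.Instant;

public class CastSketch {
    // floating-point input: interpreted as SECONDS since epoch
    static Instant fromDouble(double seconds) {
        return Instant.ofEpochMilli((long) (seconds * 1000d));
    }

    // integral input: interpreted as MILLISECONDS since epoch
    static Instant fromLong(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        System.out.println(fromDouble(1.204135216E9)); // 2008-02-27T18:00:16Z
        System.out.println(fromLong(1204135216L));     // 1970-01-14T22:28:55.216Z
    }
}
```

The same numeric literal lands 38 years apart depending only on whether it is typed as a double or an int, which matches the two columns in the result above.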



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22126) hive-exec packaging should shade guava

2020-08-12 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176126#comment-17176126
 ] 

Zoltan Haindrich commented on HIVE-22126:
-

[~zhengchenyu]: yes, that's an option - however I've not seen this in hive-3 
before - but for hive-4 the same was done (HIVE-23593)

you may already know it - the issue behind this is that a shaded calcite may 
pull in, through reflection, classes from the "non-shaded" version, and that will 
wreak some havoc

there was an attempt to make a better fix than that, but it's not yet ready 
(HIVE-23772)

> hive-exec packaging should shade guava
> --
>
> Key: HIVE-22126
> URL: https://issues.apache.org/jira/browse/HIVE-22126
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Eugene Chung
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22126.01.patch, HIVE-22126.02.patch, 
> HIVE-22126.03.patch, HIVE-22126.04.patch, HIVE-22126.05.patch, 
> HIVE-22126.06.patch, HIVE-22126.07.patch, HIVE-22126.08.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch, HIVE-22126.09.patch, 
> HIVE-22126.09.patch, HIVE-22126.09.patch
>
>
> The ql/pom.xml includes the complete guava library in hive-exec.jar 
> https://github.com/apache/hive/blob/master/ql/pom.xml#L990 This causes 
> problems for downstream clients of Hive which have hive-exec.jar in their 
> classpath, since they are pinned to the same guava version as that of Hive. 
> We should shade the guava classes so that other components which depend on 
> hive-exec can independently use a different version of guava as needed.
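Shading here means relocating the bundled guava packages to a Hive-private namespace so downstream classpaths can carry their own guava. A minimal maven-shade-plugin relocation sketch; the coordinates and patterns are illustrative, not the actual ql/pom.xml contents:

```xml
<!-- Illustrative sketch only: relocate guava inside hive-exec so that
     downstream users can bring their own guava version. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.google.common</pattern>
        <shadedPattern>org.apache.hive.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

The relocation rewrites both the bundled class files and all bytecode references to them, which is why reflection against the non-relocated names (the calcite issue mentioned below) can still break.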



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469561
 ]

ASF GitHub Bot logged work on HIVE-23922:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 07:39
Start Date: 12/Aug/20 07:39
Worklog Time Spent: 10m 
  Work Description: dh20 commented on pull request #1307:
URL: https://github.com/apache/hive/pull/1307#issuecomment-672693891


   @sunchao thanks very much!
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469561)
Time Spent: 50m  (was: 40m)

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods. The remaining parameters are not used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23993:
---
Status: In Progress  (was: Patch Available)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, Retry Logic for Replication.pdf
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23993:
---
Attachment: HIVE-23993.04.patch
Status: Patch Available  (was: In Progress)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, HIVE-23993.04.patch, Retry Logic for Replication.pdf
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469546
 ]

ASF GitHub Bot logged work on HIVE-23922:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 07:13
Start Date: 12/Aug/20 07:13
Worklog Time Spent: 10m 
  Work Description: dh20 commented on pull request #1307:
URL: https://github.com/apache/hive/pull/1307#issuecomment-672672560


   > can you remove `typeNames` as well in the method body? it is also not used.
   
   
   > @dh20 looks good - can you remove `typeNames` as well in the method body? 
it is also not used.
   
   Yes, my colleagues have submitted this in PR [HIVE-23996]



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 469546)
Time Spent: 0.5h  (was: 20m)

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods; the rest of the parameters are not used.
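The refactor under discussion can be sketched as a before/after pair; the signatures below are simplified, hypothetical stand-ins, not the actual Hive source:

```java
import java.util.List;

// Hypothetical simplification of the cleanup discussed in this issue:
// the "before" version accepts parameters its body never reads.
public class UdfMessageExample {

    // Before: argumentId and typeNames are accepted but unused (assumed names).
    static String getMessageOld(String message, List<String> methods,
                                int argumentId, String typeNames) {
        return message + ": " + String.join(", ", methods);
    }

    // After: only the two parameters the body actually uses.
    static String getMessage(String message, List<String> methods) {
        return message + ": " + String.join(", ", methods);
    }

    public static void main(String[] args) {
        // prints "No matching method: f(int), f(double)"
        System.out.println(getMessage("No matching method", List.of("f(int)", "f(double)")));
    }
}
```

Dropping the unused parameters also removes dead arguments at every call site, which is the code-quality gain the issue title refers to.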





[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469548&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469548
 ]

ASF GitHub Bot logged work on HIVE-23922:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 07:13
Start Date: 12/Aug/20 07:13
Worklog Time Spent: 10m 
  Work Description: dh20 edited a comment on pull request #1307:
URL: https://github.com/apache/hive/pull/1307#issuecomment-672672560


   > can you remove `typeNames` as well in the method body? it is also not used.
   
   
   > @dh20 looks good - can you remove `typeNames` as well in the method body? 
it is also not used.
   
   @sunchao Yes, my colleagues have already submitted this change in PR [HIVE-23996].





Issue Time Tracking
---

Worklog Id: (was: 469548)
Time Spent: 40m  (was: 0.5h)

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods; the rest of the parameters are not used.





[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469538
 ]

ASF GitHub Bot logged work on HIVE-23922:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 06:41
Start Date: 12/Aug/20 06:41
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1307:
URL: https://github.com/apache/hive/pull/1307#issuecomment-672641648


   @dh20 looks good - can you remove `typeNames` as well in the method body? it 
is also not used.





Issue Time Tracking
---

Worklog Id: (was: 469538)
Time Spent: 20m  (was: 10m)

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods; the rest of the parameters are not used.





[jira] [Work logged] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?focusedWorklogId=469537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469537
 ]

ASF GitHub Bot logged work on HIVE-23993:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 06:39
Start Date: 12/Aug/20 06:39
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1367:
URL: https://github.com/apache/hive/pull/1367#discussion_r469036313



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java
##
@@ -418,6 +420,27 @@ private void analyzeReplLoad(ASTNode ast) throws SemanticException {
 }
   }
 
+  private Path getLatestDumpPath() throws IOException {

Review comment:
   We can reuse the same code in ReplDumpTask
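For context, a `getLatestDumpPath()` helper of this kind typically lists the dump directories under a base path and returns the most recently modified one. The sketch below uses `java.nio` in place of Hadoop's `FileSystem` API, and every name in it is an assumption for illustration, not the actual Hive code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Illustrative sketch only: pick the most recently modified dump directory
// under a base path. Hive's real implementation works against HDFS paths.
public class LatestDumpPath {

    static Optional<Path> getLatestDumpPath(Path dumpBase) throws IOException {
        if (dumpBase == null || !Files.isDirectory(dumpBase)) {
            return Optional.empty();
        }
        try (Stream<Path> entries = Files.list(dumpBase)) {
            // Keep only directories and take the one with the newest mtime.
            return entries.filter(Files::isDirectory)
                          .max(Comparator.comparingLong(p -> p.toFile().lastModified()));
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("dumps");
        Files.createDirectory(base.resolve("dump-1"));
        // Optional containing the dump-1 directory
        System.out.println(getLatestDumpPath(base));
    }
}
```

Factoring this into a shared utility is what the review comment suggests, so dump and load paths resolve the latest dump the same way.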







Issue Time Tracking
---

Worklog Id: (was: 469537)
Time Spent: 1h 50m  (was: 1h 40m)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, Retry Logic for Replication.pdf
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23922:
--
Labels: pull-request-available  (was: )

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods; the rest of the parameters are not used.





[jira] [Work logged] (HIVE-23922) Improve code quality, UDFArgumentException.getMessage Method requires only two parameters

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23922?focusedWorklogId=469534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469534
 ]

ASF GitHub Bot logged work on HIVE-23922:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 06:34
Start Date: 12/Aug/20 06:34
Worklog Time Spent: 10m 
  Work Description: dh20 commented on pull request #1307:
URL: https://github.com/apache/hive/pull/1307#issuecomment-672638988


   > @dh20 can you fix the title and the JIRA number? 
[HIVE-23896](https://issues.apache.org/jira/browse/HIVE-23896) seems unrelated 
to this PR.
   
   @sunchao Sorry, I made a mistake in the Jira number. I have corrected it now.





Issue Time Tracking
---

Worklog Id: (was: 469534)
Remaining Estimate: 0h
Time Spent: 10m

> Improve code quality, UDFArgumentException.getMessage Method requires only 
> two parameters
> -
>
> Key: HIVE-23922
> URL: https://issues.apache.org/jira/browse/HIVE-23922
> Project: Hive
>  Issue Type: Improvement
>Reporter: hao
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [UDFArgumentException.getMessage] This method only needs two parameters, 
> message and methods; the rest of the parameters are not used.





[jira] [Work logged] (HIVE-23993) Handle irrecoverable errors

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23993?focusedWorklogId=469532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469532
 ]

ASF GitHub Bot logged work on HIVE-23993:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 06:29
Start Date: 12/Aug/20 06:29
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1367:
URL: https://github.com/apache/hive/pull/1367#discussion_r469032996



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java
##
@@ -366,4 +371,26 @@ public static boolean includeAcidTableInDump(HiveConf conf) {
   public static boolean tableIncludedInReplScope(ReplScope replScope, String tableName) {
     return ((replScope == null) || replScope.tableIncludedInReplScope(tableName));
   }
+
+  public static boolean failedWithNonRecoverableError(Path dumpRoot, HiveConf conf) throws SemanticException {
+    if (dumpRoot == null) {

Review comment:
   Is this also applicable during load? I mean, can the dumpRoot here be 
null in load case as well?
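The null guard being questioned here can be illustrated with a self-contained sketch. `java.nio` stands in for Hadoop's `FileSystem` API, and the marker file name below is an assumption for illustration only, not the actual Hive constant:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of a "did replication fail with a non-recoverable error?" check:
// look for a failure-marker file under the dump root, guarding against a
// null dump root (e.g. when no dump exists yet, as in a fresh load).
public class NonRecoverableCheck {

    // Hypothetical marker name; the real one lives in Hive's repl code.
    static final String NON_RECOVERABLE_MARKER = "_non_recoverable";

    static boolean failedWithNonRecoverableError(Path dumpRoot) throws IOException {
        if (dumpRoot == null) {
            // Nothing to inspect, so no failure marker can exist.
            return false;
        }
        return Files.exists(dumpRoot.resolve(NON_RECOVERABLE_MARKER));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("dump");
        System.out.println(failedWithNonRecoverableError(null));  // prints "false"
        Files.createFile(root.resolve(NON_RECOVERABLE_MARKER));
        System.out.println(failedWithNonRecoverableError(root));  // prints "true"
    }
}
```

Returning `false` for a null root is one reasonable answer to the reviewer's load-case question: with no dump root there is nothing to mark as failed.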







Issue Time Tracking
---

Worklog Id: (was: 469532)
Time Spent: 1h 40m  (was: 1.5h)

> Handle irrecoverable errors
> ---
>
> Key: HIVE-23993
> URL: https://issues.apache.org/jira/browse/HIVE-23993
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23993.01.patch, HIVE-23993.02.patch, 
> HIVE-23993.03.patch, Retry Logic for Replication.pdf
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (HIVE-23998) Upgrade Guava to 27 for Hive 2.3

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23998?focusedWorklogId=469530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469530
 ]

ASF GitHub Bot logged work on HIVE-23998:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 06:23
Start Date: 12/Aug/20 06:23
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #1395:
URL: https://github.com/apache/hive/pull/1395#issuecomment-672634649


   @sunchao Yeah, it seems the tests were triggered.





Issue Time Tracking
---

Worklog Id: (was: 469530)
Time Spent: 4h 50m  (was: 4h 40m)

> Upgrade Guava to 27 for Hive 2.3
> 
>
> Key: HIVE-23998
> URL: https://issues.apache.org/jira/browse/HIVE-23998
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23998.01.branch-2.3.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Try to upgrade Guava to 27.0-jre for Hive 2.3 branch.





[jira] [Work logged] (HIVE-23998) Upgrade Guava to 27 for Hive 2.3

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23998?focusedWorklogId=469529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469529
 ]

ASF GitHub Bot logged work on HIVE-23998:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 06:22
Start Date: 12/Aug/20 06:22
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #1395:
URL: https://github.com/apache/hive/pull/1395#issuecomment-672634424


   cc @sunchao 





Issue Time Tracking
---

Worklog Id: (was: 469529)
Time Spent: 4h 40m  (was: 4.5h)

> Upgrade Guava to 27 for Hive 2.3
> 
>
> Key: HIVE-23998
> URL: https://issues.apache.org/jira/browse/HIVE-23998
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23998.01.branch-2.3.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Try to upgrade Guava to 27.0-jre for Hive 2.3 branch.





[jira] [Work logged] (HIVE-23998) Upgrade Guava to 27 for Hive 2.3

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23998?focusedWorklogId=469527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469527
 ]

ASF GitHub Bot logged work on HIVE-23998:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 06:22
Start Date: 12/Aug/20 06:22
Worklog Time Spent: 10m 
  Work Description: viirya opened a new pull request #1395:
URL: https://github.com/apache/hive/pull/1395


   
   
   ### What changes were proposed in this pull request?
   
   
   This PR proposes to upgrade Guava to 27 in Hive branch-2. It is primarily 
used to trigger tests for #1394.
   
   ### Why are the changes needed?
   
   
   While trying to upgrade Guava in Spark, the following error was found: a 
Guava method became package-private as of Guava version 20, so there is an 
incompatibility with Guava versions newer than 19.0.
   
   ```
   sbt.ForkMain$ForkError: sbt.ForkMain$ForkError: java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; from class org.apache.hadoop.hive.ql.exec.FetchOperator
       at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
       at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
       at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
       at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
   ```
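A minimal sketch of the workaround usually applied for this error, with a hypothetical caller: since `Iterators.emptyIterator()` is no longer publicly accessible from Guava 20 onward, the JDK's `Collections.emptyIterator()` serves as a version-independent replacement.

```java
import java.util.Collections;
import java.util.Iterator;

// Hypothetical caller class illustrating the fix: replace the Guava call
// (package-private since Guava 20, hence the IllegalAccessError at runtime
// when a newer Guava is on the classpath) with the JDK equivalent.
public class EmptyIteratorFix {

    static Iterator<Object> currentRowsIterator() {
        // Before (breaks at runtime on Guava >= 20):
        //   return com.google.common.collect.Iterators.emptyIterator();
        // After: works with any Guava version on the classpath.
        return Collections.emptyIterator();
    }

    public static void main(String[] args) {
        // prints "false" - the empty iterator has no elements
        System.out.println(currentRowsIterator().hasNext());
    }
}
```

The behavior is identical; only the provider of the empty iterator changes, which removes the binary incompatibility across Guava versions.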
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes. This upgrades Guava to 27.
   
   ### How was this patch tested?
   
   
   Built Hive locally.





Issue Time Tracking
---

Worklog Id: (was: 469527)
Time Spent: 4.5h  (was: 4h 20m)

> Upgrade Guava to 27 for Hive 2.3
> 
>
> Key: HIVE-23998
> URL: https://issues.apache.org/jira/browse/HIVE-23998
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.7
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23998.01.branch-2.3.patch
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Try to upgrade Guava to 27.0-jre for Hive 2.3 branch.





[jira] [Work logged] (HIVE-23896) hiveserver2 not listening on any port, am I missing some configurations?

2020-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23896?focusedWorklogId=469524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-469524
 ]

ASF GitHub Bot logged work on HIVE-23896:
-

Author: ASF GitHub Bot
Created on: 12/Aug/20 06:19
Start Date: 12/Aug/20 06:19
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1307:
URL: https://github.com/apache/hive/pull/1307#issuecomment-672633192


   @dh20 can you fix the title and the JIRA number? HIVE-23896 seems unrelated 
to this PR.





Issue Time Tracking
---

Worklog Id: (was: 469524)
Time Spent: 1h 40m  (was: 1.5h)

> hiveserver2 not listening on any port, am I missing some configurations?
> -
>
> Key: HIVE-23896
> URL: https://issues.apache.org/jira/browse/HIVE-23896
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.2
> Environment: hive: 3.1.2
> hadoop: 3.2.1, standalone, url: hdfs://namenode.hadoop.svc.cluster.local:9000
> {quote}$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
>  $ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
> {quote}
> hadoop commands  are workable in the hiveserver node(POD).
>  
>Reporter: alanwake
>Priority: Blocker
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
>  
>  
> I tried to deploy Hive 3.1.2 on k8s; it worked on version 2.3.2.
> The metastore and postgres nodes are OK, but for the hiveserver it looks like I 
> am missing some important configuration properties.
>  
>  
>  
> {code:java}
> [root@master hive]# ./get.sh 
> NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE              NOMINATED NODE   READINESS GATES
> hive-7bd48747d4-5zjmh        1/1     Running   0          56s   10.244.3.110   node03.51.local   <none>           <none>
> metastore-66b58f9f76-6wsxj   1/1     Running   0          56s   10.244.3.109   node03.51.local   <none>           <none>
> postgres-57794b99b7-pqxwm    1/1     Running   0          56s   10.244.2.241   node02.51.local   <none>           <none>
>
> NAME        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                       AGE   SELECTOR
> hive        NodePort    10.108.40.17     <none>        10002:30626/TCP,1:31845/TCP   56s   app=hive
> metastore   ClusterIP   10.106.159.220   <none>        9083/TCP                      56s   app=metastore
> postgres    ClusterIP   10.108.85.47     <none>        5432/TCP                      56s   app=postgres
> {code}
>  
>  
> {code:java}
> [root@master hive]# kubectl logs hive-7bd48747d4-5zjmh -n=hive
> Configuring core
>  - Setting hadoop.proxyuser.hue.hosts=*
>  - Setting fs.defaultFS=hdfs://namenode.hadoop.svc.cluster.local:9000
>  - Setting hadoop.http.staticuser.user=root
>  - Setting hadoop.proxyuser.hue.groups=*
> Configuring hdfs
>  - Setting dfs.namenode.datanode.registration.ip-hostname-check=false
>  - Setting dfs.webhdfs.enabled=true
>  - Setting dfs.permissions.enabled=false
> Configuring yarn
>  - Setting yarn.timeline-service.enabled=true
>  - Setting yarn.resourcemanager.system-metrics-publisher.enabled=true
>  - Setting 
> yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
>  - Setting 
> yarn.log.server.url=http://historyserver.hadoop.svc.cluster.local:8188/applicationhistory/logs/
>  - Setting yarn.resourcemanager.fs.state-store.uri=/rmstate
>  - Setting yarn.timeline-service.generic-application-history.enabled=true
>  - Setting yarn.log-aggregation-enable=true
>  - Setting 
> yarn.resourcemanager.hostname=resourcemanager.hadoop.svc.cluster.local
>  - Setting 
> yarn.resourcemanager.resource.tracker.address=resourcemanager.hadoop.svc.cluster.local:8031
>  - Setting 
> yarn.timeline-service.hostname=historyserver.hadoop.svc.cluster.local
>  - Setting 
> yarn.resourcemanager.scheduler.address=resourcemanager.hadoop.svc.cluster.local:8030
>  - Setting 
> yarn.resourcemanager.address=resourcemanager.hadoop.svc.cluster.local:8032
>  - Setting yarn.nodemanager.remote-app-log-dir=/app-logs
>  - Setting yarn.resourcemanager.recovery.enabled=true
> Configuring httpfs
> Configuring kms
> Configuring mapred
> Configuring hive
>  - Setting datanucleus.autoCreateSchema=false
>  - Setting javax.jdo.option.ConnectionPassword=hive
>  - Setting hive.metastore.uris=thrift://metastore:9083
>  - Setting 
> 

[jira] [Updated] (HIVE-23995) Don't set location for managed tables in case of replication

2020-08-12 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-23995:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the patch [~aasha] and the review [~pkumarsinha].

> Don't set location for managed tables in case of replication
> 
>
> Key: HIVE-23995
> URL: https://issues.apache.org/jira/browse/HIVE-23995
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23995.01.patch, HIVE-23995.02.patch, 
> HIVE-23995.03.patch, HIVE-23995.04.patch, HIVE-23995.05.patch, 
> HIVE-23995.06.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Managed table location should not be set.
> Migration code of replication should be removed.
> Add logging to all ack files.
> Set hive.repl.data.copy.lazy to true.


