[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=476335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476335
 ]

ASF GitHub Bot logged work on HIVE-18284:
-

Author: ASF GitHub Bot
Created on: 31/Aug/20 05:25
Start Date: 31/Aug/20 05:25
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #1400:
URL: https://github.com/apache/hive/pull/1400#discussion_r479894128



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplicationUtils.java
##
@@ -181,6 +183,23 @@ public static boolean merge(HiveConf hiveConf, 
ReduceSinkOperator cRS, ReduceSin
 TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(new ArrayList<FieldSchema>(), pRS
 .getConf().getOrder(), pRS.getConf().getNullOrder());
 pRS.getConf().setKeySerializeInfo(keyTable);
+  } else if (cRS.getConf().getKeyCols() != null && cRS.getConf().getKeyCols().size() > 0) {
+ArrayList<String> keyColNames = Lists.newArrayList();
+for (ExprNodeDesc keyCol : pRS.getConf().getKeyCols()) {
+  String keyColName = keyCol.getExprString();
+  keyColNames.add(keyColName);
+}
+List<FieldSchema> fields = PlanUtils.getFieldSchemasFromColumnList(pRS.getConf().getKeyCols(),
+keyColNames, 0, "");
+TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(fields, pRS.getConf().getOrder(),
+pRS.getConf().getNullOrder());
+ArrayList<String> outputKeyCols = Lists.newArrayList();
+for (int i = 0; i < fields.size(); i++) {
+  outputKeyCols.add(fields.get(i).getName());
+}
+pRS.getConf().setOutputKeyColumnNames(outputKeyCols);
+pRS.getConf().setKeySerializeInfo(keyTable);
+pRS.getConf().setNumDistributionKeys(cRS.getConf().getNumDistributionKeys());
   }

Review comment:
   Such a case would arise only when the pRS keyCols are non-empty and the cRS 
keyCols are empty. In such cases, wouldn't it be correct to return true and go 
with the pRS values? By the time control reaches this point, some merging of 
cRS into pRS would already have happened upstream.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 476335)
Time Spent: 1h 10m  (was: 1h)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A NullPointerException occurs when inserting data with a 'distribute by' 
> clause. The following query snippet reproduces the issue:
> *(non-vectorized , non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I removed the Distribute 
> By clause or used a Cluster By clause instead.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still be re-used when 
> we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 

[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=476334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476334
 ]

ASF GitHub Bot logged work on HIVE-18284:
-

Author: ASF GitHub Bot
Created on: 31/Aug/20 05:24
Start Date: 31/Aug/20 05:24
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #1400:
URL: https://github.com/apache/hive/pull/1400#discussion_r479894128



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplicationUtils.java
##
@@ -181,6 +183,23 @@ public static boolean merge(HiveConf hiveConf, 
ReduceSinkOperator cRS, ReduceSin
 TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(new ArrayList<FieldSchema>(), pRS
 .getConf().getOrder(), pRS.getConf().getNullOrder());
 pRS.getConf().setKeySerializeInfo(keyTable);
+  } else if (cRS.getConf().getKeyCols() != null && cRS.getConf().getKeyCols().size() > 0) {
+ArrayList<String> keyColNames = Lists.newArrayList();
+for (ExprNodeDesc keyCol : pRS.getConf().getKeyCols()) {
+  String keyColName = keyCol.getExprString();
+  keyColNames.add(keyColName);
+}
+List<FieldSchema> fields = PlanUtils.getFieldSchemasFromColumnList(pRS.getConf().getKeyCols(),
+keyColNames, 0, "");
+TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(fields, pRS.getConf().getOrder(),
+pRS.getConf().getNullOrder());
+ArrayList<String> outputKeyCols = Lists.newArrayList();
+for (int i = 0; i < fields.size(); i++) {
+  outputKeyCols.add(fields.get(i).getName());
+}
+pRS.getConf().setOutputKeyColumnNames(outputKeyCols);
+pRS.getConf().setKeySerializeInfo(keyTable);
+pRS.getConf().setNumDistributionKeys(cRS.getConf().getNumDistributionKeys());
   }

Review comment:
   Such a case would arise only when the pRS keyCols are non-empty and the cRS 
keyCols are empty. In such cases, wouldn't it be better to return true and go 
with the pRS values? By the time control reaches this point, some merging of 
cRS into pRS would already have happened upstream.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 476334)
Time Spent: 1h  (was: 50m)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> A NullPointerException occurs when inserting data with a 'distribute by' 
> clause. The following query snippet reproduces the issue:
> *(non-vectorized , non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I removed the Distribute 
> By clause or used a Cluster By clause instead.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still be re-used when 
> we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) 

[jira] [Commented] (HIVE-24090) NPE while SJ reduction due to missing null check for col stats

2020-08-30 Thread Vipin Vishvkarma (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187424#comment-17187424
 ] 

Vipin Vishvkarma commented on HIVE-24090:
-

[~zabetak] [~jcamachorodriguez] Can you please review the PR?

> NPE while SJ reduction due to missing null check for col stats
> --
>
> Key: HIVE-24090
> URL: https://issues.apache.org/jira/browse/HIVE-24090
> Project: Hive
>  Issue Type: Bug
>Reporter: Vipin Vishvkarma
>Assignee: Vipin Vishvkarma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hitting an NPE during SJ reduction due to missing column stats:
> {code:java}
> Error(1647)) - FAILED: NullPointerException null 
> java.lang.NullPointerException at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.updateStats(StatsUtils.java:2111) 
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1629)
>  at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:498)
>  at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:209)
>  at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:144) 
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12642)
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11960)
> {code}
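The patch itself is not reproduced in this thread; as a rough sketch under assumed names (`ColStatistics` below is a bare stand-in for Hive's class, and `canComputeBenefit` is hypothetical), the missing null check amounts to testing the stats before the benefit computation dereferences them:

```java
import java.util.List;

public class SemijoinBenefitCheck {
    // Bare placeholder for Hive's org.apache.hadoop.hive.ql.plan.ColStatistics.
    static class ColStatistics { }

    // Sketch of the guard: skip the semijoin-benefit computation when column
    // statistics are absent, instead of hitting an NPE in updateStats().
    static boolean canComputeBenefit(List<ColStatistics> colStats) {
        return colStats != null && !colStats.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(canComputeBenefit(null));  // false: stats missing, skip
        System.out.println(canComputeBenefit(List.of(new ColStatistics()))); // true
    }
}
```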



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-18284) NPE when inserting data with 'distribute by' clause with dynpart sort optimization

2020-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=476329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476329
 ]

ASF GitHub Bot logged work on HIVE-18284:
-

Author: ASF GitHub Bot
Created on: 31/Aug/20 04:39
Start Date: 31/Aug/20 04:39
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #1400:
URL: https://github.com/apache/hive/pull/1400#discussion_r479883866



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplicationUtils.java
##
@@ -181,6 +183,23 @@ public static boolean merge(HiveConf hiveConf, 
ReduceSinkOperator cRS, ReduceSin
 TableDesc keyTable = PlanUtils.getReduceKeyTableDesc(new ArrayList<FieldSchema>(), pRS
 .getConf().getOrder(), pRS.getConf().getNullOrder());
 pRS.getConf().setKeySerializeInfo(keyTable);
+  } else if (cRS.getConf().getKeyCols() != null && cRS.getConf().getKeyCols().size() > 0) {

Review comment:
   numDistributionKeys is a subset of keyCols. We enter this condition only 
when numDistributionKeys of pRS is null or <= 0, so checking pRS doesn't make 
sense here since we want to go with cRS anyway.
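The condition described above can be restated as a toy predicate. The method name and unboxed parameters below are illustrative only; the real check operates on `ReduceSinkDesc` objects inside `ReduceSinkDeDuplicationUtils.merge`:

```java
public class MergeGuardSketch {
    // Enter the new branch only when the parent ReduceSink's distribution-key
    // count is unusable (null or <= 0) and the child actually has key columns,
    // in which case we rebuild the key serialization info from the child (cRS).
    static boolean takeChildKeys(Integer pRSNumDistributionKeys, int cRSKeyColCount) {
        boolean pRSUnusable = pRSNumDistributionKeys == null
                || pRSNumDistributionKeys <= 0;
        return pRSUnusable && cRSKeyColCount > 0;
    }

    public static void main(String[] args) {
        System.out.println(takeChildKeys(null, 2)); // true: fall back to cRS keys
        System.out.println(takeChildKeys(3, 2));    // false: pRS keys already usable
    }
}
```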





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 476329)
Time Spent: 50m  (was: 40m)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> --
>
> Key: HIVE-18284
> URL: https://issues.apache.org/jira/browse/HIVE-18284
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Aki Tanaka
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A NullPointerException occurs when inserting data with a 'distribute by' 
> clause. The following query snippet reproduces the issue:
> *(non-vectorized , non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I removed the Distribute 
> By clause or used a Cluster By clause instead.
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still be re-used when 
> we use Distribute By.
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_00_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at 

[jira] [Work started] (HIVE-22622) Hive allows to create a struct with duplicate attribute names

2020-08-30 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-22622 started by Krisztian Kasa.
-
> Hive allows to create a struct with duplicate attribute names
> -
>
> Key: HIVE-22622
> URL: https://issues.apache.org/jira/browse/HIVE-22622
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Krisztian Kasa
>Priority: Major
>
> When you create a table with a struct that has the same attribute name 
> twice, Hive allows you to create it:
> create table test_struct( duplicateColumn struct<id:int,id:int>);
> You can insert data into it:
> insert into test_struct select named_struct("id",1,"id",1);
> But you cannot read it:
> select * from test_struct;
> This returns: java.io.IOException: java.io.IOException: Error reading file: 
> hdfs://.../test_struct/delta_001_001_/bucket_0
> We can create and insert, but reads fail on the struct part of the table. We 
> can still read all other columns (if we have more than one), but not the 
> struct anymore.
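A sketch of the kind of validation this ticket calls for, rejecting struct definitions whose attribute names repeat so that unreadable rows can never be written. The class and method names are hypothetical, not Hive's actual implementation:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StructFieldCheck {
    // Returns true if any field name repeats. Hive identifiers are
    // case-insensitive, so compare lowercased names.
    static boolean hasDuplicateFields(List<String> fieldNames) {
        Set<String> seen = new HashSet<>();
        for (String name : fieldNames) {
            if (!seen.add(name.toLowerCase())) {
                return true; // second occurrence of the same attribute name
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicateFields(Arrays.asList("id", "id")));   // true
        System.out.println(hasDuplicateFields(Arrays.asList("id", "name"))); // false
    }
}
```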



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24095) Load partitions in parallel in the bootstrap phase

2020-08-30 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi reassigned HIVE-24095:
--


> Load partitions in parallel in the bootstrap phase
> --
>
> Key: HIVE-24095
> URL: https://issues.apache.org/jira/browse/HIVE-24095
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24094) cast is not null, the results are different in cbo is true and false

2020-08-30 Thread zhaolong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaolong updated HIVE-24094:

Attachment: image-2020-08-31-10-02-39-154.png

> cast is not null, the results are different in cbo is true and false 
> -
>
> Key: HIVE-24094
> URL: https://issues.apache.org/jira/browse/HIVE-24094
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: zhaolong
>Priority: Major
> Attachments: image-2020-08-31-10-01-26-250.png, 
> image-2020-08-31-10-02-39-154.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24094) cast is not null, the results are different in cbo is true and false

2020-08-30 Thread zhaolong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaolong updated HIVE-24094:

Description: 
1.CREATE TABLE IF NOT EXISTS testa
( 
 SEARCHWORD STRING, 
 COUNT_NUM BIGINT, 
 WORDS STRING 
) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\27' 
STORED AS TEXTFILE; 

2.insert into testa values('searchword', 1, 'a');

3.set hive.cbo.enable=false;

4.SELECT 
CASE 
 WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
bigint) as String) 
 ELSE searchword 
END AS WORDS, 
searchword FROM testa;

!image-2020-08-31-10-01-26-250.png!

5.set hive.cbo.enable=true;

6.SELECT 
CASE 
 WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
bigint) as String) 
 ELSE searchword 
END AS WORDS, 
searchword FROM testa;

!image-2020-08-31-10-02-39-154.png!

> cast is not null, the results are different in cbo is true and false 
> -
>
> Key: HIVE-24094
> URL: https://issues.apache.org/jira/browse/HIVE-24094
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: zhaolong
>Priority: Major
> Attachments: image-2020-08-31-10-01-26-250.png, 
> image-2020-08-31-10-02-39-154.png
>
>
> 1.CREATE TABLE IF NOT EXISTS testa
> ( 
>  SEARCHWORD STRING, 
>  COUNT_NUM BIGINT, 
>  WORDS STRING 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\27' 
> STORED AS TEXTFILE; 
> 2.insert into testa values('searchword', 1, 'a');
> 3.set hive.cbo.enable=false;
> 4.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-01-26-250.png!
> 5.set hive.cbo.enable=true;
> 6.SELECT 
> CASE 
>  WHEN CAST(searchword as bigint) IS NOT NULL THEN CAST(CAST(searchword as 
> bigint) as String) 
>  ELSE searchword 
> END AS WORDS, 
> searchword FROM testa;
> !image-2020-08-31-10-02-39-154.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24094) cast is not null, the results are different in cbo is true and false

2020-08-30 Thread zhaolong (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaolong updated HIVE-24094:

Attachment: image-2020-08-31-10-01-26-250.png

> cast is not null, the results are different in cbo is true and false 
> -
>
> Key: HIVE-24094
> URL: https://issues.apache.org/jira/browse/HIVE-24094
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: zhaolong
>Priority: Major
> Attachments: image-2020-08-31-10-01-26-250.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23747) Increase the number of parallel tasks sent to daemons from am

2020-08-30 Thread Mustafa Iman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187357#comment-17187357
 ] 

Mustafa Iman commented on HIVE-23747:
-

This was resolved as part of https://issues.apache.org/jira/browse/HIVE-23746

> Increase the number of parallel tasks sent to daemons from am
> -
>
> Key: HIVE-23747
> URL: https://issues.apache.org/jira/browse/HIVE-23747
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> The number of in-flight tasks from the AM to a single executor is currently 
> hardcoded to 1 
> ([https://github.com/apache/hive/blob/master/llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java#L57]).
> It does not make sense to increase this right now, as communication between 
> the AM and the daemons happens synchronously anyway. After resolving 
> https://issues.apache.org/jira/browse/HIVE-23746 this must be increased to at 
> least the number of execution slots per daemon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23747) Increase the number of parallel tasks sent to daemons from am

2020-08-30 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman resolved HIVE-23747.
-
Resolution: Fixed

> Increase the number of parallel tasks sent to daemons from am
> -
>
> Key: HIVE-23747
> URL: https://issues.apache.org/jira/browse/HIVE-23747
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> The number of in-flight tasks from the AM to a single executor is currently 
> hardcoded to 1 
> ([https://github.com/apache/hive/blob/master/llap-client/src/java/org/apache/hadoop/hive/llap/tez/LlapProtocolClientProxy.java#L57]).
> It does not make sense to increase this right now, as communication between 
> the AM and the daemons happens synchronously anyway. After resolving 
> https://issues.apache.org/jira/browse/HIVE-23746 this must be increased to at 
> least the number of execution slots per daemon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24093) Remove unused hive.debug.localtask

2020-08-30 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman updated HIVE-24093:

Status: Patch Available  (was: Open)

> Remove unused hive.debug.localtask
> --
>
> Key: HIVE-24093
> URL: https://issues.apache.org/jira/browse/HIVE-24093
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> hive.debug.local.task was added in HIVE-1642. Even then, it was never used. 
> It was possibly a leftover from development/debugging. There are no 
> references to either HIVEDEBUGLOCALTASK or hive.debug.localtask in the 
> codebase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24093) Remove unused hive.debug.localtask

2020-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24093:
--
Labels: pull-request-available  (was: )

> Remove unused hive.debug.localtask
> --
>
> Key: HIVE-24093
> URL: https://issues.apache.org/jira/browse/HIVE-24093
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> hive.debug.local.task was added in HIVE-1642. Even then, it was never used. 
> It was possibly a leftover from development/debugging. There are no 
> references to either HIVEDEBUGLOCALTASK or hive.debug.localtask in the 
> codebase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24093) Remove unused hive.debug.localtask

2020-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24093?focusedWorklogId=476309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476309
 ]

ASF GitHub Bot logged work on HIVE-24093:
-

Author: ASF GitHub Bot
Created on: 31/Aug/20 00:20
Start Date: 31/Aug/20 00:20
Worklog Time Spent: 10m 
  Work Description: mustafaiman opened a new pull request #1445:
URL: https://github.com/apache/hive/pull/1445


   hive.debug.local.task was added in HIVE-1642. Even then, it was never used. 
It was possibly a leftover from development/debugging. There are no references 
to either HIVEDEBUGLOCALTASK or hive.debug.localtask in the codebase.
   
   Change-Id: I27a385c264c362f6507eee4e29caf52f46e7dcba
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 476309)
Remaining Estimate: 0h
Time Spent: 10m

> Remove unused hive.debug.localtask
> --
>
> Key: HIVE-24093
> URL: https://issues.apache.org/jira/browse/HIVE-24093
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> hive.debug.local.task was added in HIVE-1642. Even then, it was never used. 
> It was possibly a leftover from development/debugging. There are no 
> references to either HIVEDEBUGLOCALTASK or hive.debug.localtask in the 
> codebase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24093) Remove unused hive.debug.localtask

2020-08-30 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman reassigned HIVE-24093:
---


> Remove unused hive.debug.localtask
> --
>
> Key: HIVE-24093
> URL: https://issues.apache.org/jira/browse/HIVE-24093
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Minor
>
> hive.debug.local.task was added in HIVE-1642. Even then, it was never used. 
> It was possibly a leftover from development/debugging. There are no 
> references to either HIVEDEBUGLOCALTASK or hive.debug.localtask in the 
> codebase.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24091) Replace multiple constraints call with getAllTableConstraints api call in query planner

2020-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24091:
--
Labels: pull-request-available  (was: )

> Replace multiple constraints call with getAllTableConstraints api call in 
> query planner
> ---
>
> Key: HIVE-24091
> URL: https://issues.apache.org/jira/browse/HIVE-24091
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In order to get all the constraints of a table (PrimaryKey, ForeignKey, 
> UniqueConstraint, NotNullConstraint, DefaultConstraint, CheckConstraint), we 
> have to make 6 different metastore calls. Replace these calls with a single 
> getAllTableConstraints api call that provides all the constraints at once.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24091) Replace multiple constraints call with getAllTableConstraints api call in query planner

2020-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24091?focusedWorklogId=476272&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476272
 ]

ASF GitHub Bot logged work on HIVE-24091:
-

Author: ASF GitHub Bot
Created on: 30/Aug/20 18:03
Start Date: 30/Aug/20 18:03
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma opened a new pull request #1444:
URL: https://github.com/apache/hive/pull/1444


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 476272)
Remaining Estimate: 0h
Time Spent: 10m

> Replace multiple constraints call with getAllTableConstraints api call in 
> query planner
> ---
>
> Key: HIVE-24091
> URL: https://issues.apache.org/jira/browse/HIVE-24091
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In order to get all the constraints of a table (PrimaryKey, ForeignKey, 
> UniqueConstraint, NotNullConstraint, DefaultConstraint, CheckConstraint), we 
> have to make 6 different metastore calls. Replace these calls with a single 
> getAllTableConstraints api call that provides all the constraints at once.
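A toy illustration of the change's effect on round trips. The counter and method names below are invented for the example; the real api is the metastore's `getAllTableConstraints`:

```java
public class ConstraintFetchDemo {
    static int rpcCount = 0;

    // Stand-in for one metastore round trip.
    static void rpc(String call) { rpcCount++; }

    // Before: six separate metastore calls, one per constraint type.
    static void fetchSeparately() {
        for (String c : new String[]{"PrimaryKey", "ForeignKey", "UniqueConstraint",
                "NotNullConstraint", "DefaultConstraint", "CheckConstraint"}) {
            rpc(c);
        }
    }

    // After: a single getAllTableConstraints call returns everything at once.
    static void fetchAllAtOnce() { rpc("AllTableConstraints"); }

    public static void main(String[] args) {
        fetchSeparately();
        int before = rpcCount;
        rpcCount = 0;
        fetchAllAtOnce();
        System.out.println(before + " calls vs " + rpcCount); // 6 calls vs 1
    }
}
```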



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-22782) Consolidate metastore call to fetch constraints

2020-08-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22782?focusedWorklogId=476227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476227
 ]

ASF GitHub Bot logged work on HIVE-22782:
-

Author: ASF GitHub Bot
Created on: 30/Aug/20 11:38
Start Date: 30/Aug/20 11:38
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #1419:
URL: https://github.com/apache/hive/pull/1419#discussion_r479632452



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
##
@@ -2811,6 +2811,26 @@ public GetFieldsResponse 
getFieldsRequest(GetFieldsRequest req)
 return client.get_check_constraints(req).getCheckConstraints();
   }
 
+  @Override
+  public SQLAllTableConstraints 
getAllTableConstraints(AllTableConstraintsRequest req)

Review comment:
   https://issues.apache.org/jira/browse/HIVE-24091





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 476227)
Time Spent: 1h 40m  (was: 1.5h)

> Consolidate metastore call to fetch constraints
> ---
>
> Key: HIVE-22782
> URL: https://issues.apache.org/jira/browse/HIVE-22782
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, separate calls are made to the metastore to fetch constraints 
> such as PK, FK, and not null. Since the planner always retrieves these 
> constraints, we should retrieve all of them in one call.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)