[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=489736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-489736
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 23/Sep/20 18:16
Start Date: 23/Sep/20 18:16
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 merged pull request #1472:
URL: https://github.com/apache/hive/pull/1472


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 489736)
Time Spent: 2h  (was: 1h 50m)

> Support partition pruning and other physical transformations for EXECUTE 
> statement 
> ---
>
> Key: HIVE-24009
> URL: https://issues.apache.org/jira/browse/HIVE-24009
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, compile-time partition pruning does not kick in for EXECUTE 
> statements.
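
To illustrate the issue description above: partition pruning can only discard partitions once the value of a filter on the partition column is known. In a prepared statement that value is a `?` until EXECUTE binds it. The following is a minimal sketch of the idea only. The `Partition` class, the `prune` helper, and the `year` key are hypothetical and are not Hive's `PartitionPruner` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

public class PartitionPruningSketch {

    /** Hypothetical partition descriptor: a directory name plus its partition-key value. */
    static final class Partition {
        final String path;
        final int year;

        Partition(String path, int year) {
            this.path = path;
            this.year = year;
        }
    }

    /** Returns the paths of the partitions whose key satisfies the predicate. */
    static List<String> prune(List<Partition> all, IntPredicate keep) {
        List<String> kept = new ArrayList<>();
        for (Partition p : all) {
            if (keep.test(p.year)) {
                kept.add(p.path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Partition> parts = Arrays.asList(
            new Partition("year=2018", 2018),
            new Partition("year=2019", 2019),
            new Partition("year=2020", 2020));

        // At PREPARE time the filter is "year > ?", so no partition can be ruled out.
        // At EXECUTE time the parameter is bound and the filter can be evaluated
        // against each partition's key before any data is read.
        int bound = 2019;
        System.out.println(prune(parts, y -> y > bound)); // [year=2020]
    }
}
```

With the parameter bound to 2019, only one of the three partitions needs to be scanned.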



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=489735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-489735
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 23/Sep/20 18:14
Start Date: 23/Sep/20 18:14
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r493793444



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
   I have updated HIVE-24005 to investigate this.







Issue Time Tracking
---

Worklog Id: (was: 489735)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=488953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-488953
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 23/Sep/20 04:22
Start Date: 23/Sep/20 04:22
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492539012



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
   Let's create a follow-up to explore whether some of them may be made 
transient again and discuss over there.







Issue Time Tracking
---

Worklog Id: (was: 488953)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=488035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-488035
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 22/Sep/20 07:54
Start Date: 22/Sep/20 07:54
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492539012



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
   Let's create a follow-up to explore whether some of them may be made 
transient again and discuss over there.







Issue Time Tracking
---

Worklog Id: (was: 488035)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=487618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487618
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 22/Sep/20 03:21
Start Date: 22/Sep/20 03:21
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492224561



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
   I tried not keeping these fields, but I ran into all sorts of issues, such 
as being unable to serialize/deserialize, or the plan being generated without 
metadata. 
   I am not sure whether we need to keep all of these fields or can choose 
selectively; I kept almost all of them in the interest of time. If Gopal or 
Rajesh thinks that this may cause a performance issue, I can open a follow-up 
to investigate and choose fields selectively.
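
As a rough illustration of why the `transient` modifier matters for a plan that is copied by serializing it: a `transient` field is skipped when the object is written, so it comes back as `null` in the copy. This is a minimal sketch that assumes plain `java.io` serialization and a hypothetical `ScanOp` class. It does not use Hive's own serialization code, whose behavior may differ in detail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldSketch {

    /** Hypothetical stand-in for an operator; this is not Hive's TableScanOperator. */
    static final class ScanOp implements Serializable {
        private static final long serialVersionUID = 1L;
        String alias;                 // regular field: written and restored
        transient String cachedConf;  // transient field: skipped, restored as null

        ScanOp(String alias, String cachedConf) {
            this.alias = alias;
            this.cachedConf = cachedConf;
        }
    }

    /** Serializes the operator to a byte array and reads it back as a new object. */
    static ScanOp roundTrip(ScanOp op) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
            out.writeObject(op);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()))) {
            return (ScanOp) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ScanOp copy = roundTrip(new ScanOp("tb2", "conf-for-tb2"));
        System.out.println(copy.alias);       // tb2
        System.out.println(copy.cachedConf);  // null
    }
}
```

The trade-off raised in the review follows directly: every field that stops being `transient` survives the copy, but also adds to what is written each time the plan is serialized.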

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -387,6 +387,12 @@
   protected volatile boolean disableJoinMerge = false;
   protected final boolean defaultJoinMerge;
 
+  /*
+   * This is used by prepare/execute statement
+   * Prepare/Execute requires operators to be copied and cached
+   */
+  protected Map topOpsCopy = null;

Review comment:
   The shape of the original operator tree changes when it goes through 
physical transformations and task generation (I don't know why, though). As a 
result, this operator tree cannot be used later to regenerate tasks or to 
re-run physical transformations. Therefore we make a copy and cache it after 
the operator tree is generated.
   I will leave a comment.
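
The copy-then-cache approach described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the code in the pull request: `Op`, `deepCopy`, `transform`, and `size` are hypothetical names, and plain `java.io` serialization stands in for whatever mechanism Hive uses to copy operators.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class CopyBeforeTransformSketch {

    /** Hypothetical operator node with child operators. */
    static final class Op implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) {
            this.name = name;
        }
    }

    /** Deep-copies an object graph by serializing it and reading it back. */
    static <T extends Serializable> T deepCopy(T obj, Class<T> type)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()))) {
            return type.cast(in.readObject());
        }
    }

    /** Stand-in for a physical transformation that rewrites the tree in place. */
    static void transform(Op root) {
        root.children.clear();
    }

    /** Counts the nodes in the tree rooted at the given operator. */
    static int size(Op root) {
        int n = 1;
        for (Op child : root.children) {
            n += size(child);
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        Op root = new Op("TS");
        Op filter = new Op("FIL");
        filter.children.add(new Op("SEL"));
        root.children.add(filter);

        Op cached = deepCopy(root, Op.class); // taken right after the tree is built
        transform(root);                      // later passes mutate the original

        System.out.println(size(root) + " " + size(cached)); // 1 3
    }
}
```

Because the cached copy shares no nodes with the original, each later EXECUTE can start from an untouched tree.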

##
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##
@@ -84,12 +84,13 @@ Stage-0
 Select Operator [SEL_40] (rows=1 width=4)
   Output:["_col0"]
   TableScan [TS_24] (rows=1 width=4)
-Output:["id"]
+default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]

Review comment:
   Yeah, I think this is likely a side effect of some of the changes to 
serialization/deserialization. It is a positive side effect, though, since we 
now have more information in the explain plan.

##
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##
@@ -84,12 +84,13 @@ Stage-0
 Select Operator [SEL_40] (rows=1 width=4)
   Output:["_col0"]
   TableScan [TS_24] (rows=1 width=4)
-Output:["id"]
+default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]
   <-Map 6 [CONTAINS] vectorized, llap
 Reduce Output Operator [RS_45]
   Limit [LIM_44] (rows=1 width=2)
 Number of rows:1
 Select Operator [SEL_43] (rows=1 width=0)
   Output:["_col0"]
   TableScan [TS_29] (rows=1 width=0)
+default@tb2,tb2,Tbl:PARTIAL,Col:COMPLETE

Review comment:
   I confirmed that this is expected. I compared this plan against master 
(with explain.user set to false) and there is no difference in the plan.







Issue Time Tracking
---

Worklog Id: (was: 487618)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=487584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487584
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 22/Sep/20 03:18
Start Date: 22/Sep/20 03:18
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r491751273



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##
@@ -253,14 +191,17 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
 String queryName = getQueryName(root);
 if (ss.getPreparePlans().containsKey(queryName)) {
   // retrieve cached plan from session state
-  BaseSemanticAnalyzer cachedPlan = ss.getPreparePlans().get(queryName);
+  SemanticAnalyzer cachedPlan = ss.getPreparePlans().get(queryName);
 
   // make copy of the plan
-  createTaskCopy(cachedPlan);
+  //createTaskCopy(cachedPlan);

Review comment:
   The commented-out line can be removed.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/PrepareStatementAnalyzer.java
##
@@ -54,6 +58,21 @@ private void savePlan(String queryName) throws 
SemanticException{
 ss.getPreparePlans().put(queryName, this);
   }
 
+  private <T> T makeCopy(final Object task, Class<T> objClass) {
+ByteArrayOutputStream baos = new ByteArrayOutputStream();

Review comment:
   Can we leave a comment on this method explaining what it is trying to do?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
   Do we need to keep all these fields for the plan cache in the operator, 
table, etc.? I am wondering about the implications of keeping them when the 
operator plan is serialized (i.e., whether that could have a performance 
impact). @t3rmin4t0r, @rbalamohan, could you comment on this?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##
@@ -286,6 +227,24 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
   this.acidFileSinks.addAll(cachedPlan.getAcidFileSinks());
   this.initCtx(cachedPlan.getCtx());
   this.ctx.setCboInfo(cachedPlan.getCboInfo());
+  this.setLoadFileWork(cachedPlan.getLoadFileWork());
+  this.setLoadTableWork(cachedPlan.getLoadTableWork());
+
+  this.setQB(cachedPlan.getQB());
+
+  ParseContext pctxt = this.getParseContext();
+  // partition pruner
+  Transform ppr = new PartitionPruner();
+  ppr.transform(pctxt);
+
+  //pctxt.setQueryProperties(this.queryProperties);
+  if (!ctx.getExplainLogical()) {
+TaskCompiler compiler = TaskCompilerFactory.getCompiler(conf, pctxt);
+compiler.init(queryState, console, db);
+compiler.compile(pctxt, rootTasks, inputs, outputs);
+fetchTask = pctxt.getFetchTask();
+//fetchTask = makeCopy(cachedPlan.getFetchTask(), 
cachedPlan.getFetchTask().getClass());

Review comment:
   This comment too.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##
@@ -286,6 +227,24 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
   this.acidFileSinks.addAll(cachedPlan.getAcidFileSinks());
   this.initCtx(cachedPlan.getCtx());
   this.ctx.setCboInfo(cachedPlan.getCboInfo());
+  this.setLoadFileWork(cachedPlan.getLoadFileWork());
+  this.setLoadTableWork(cachedPlan.getLoadTableWork());
+
+  this.setQB(cachedPlan.getQB());
+
+  ParseContext pctxt = this.getParseContext();
+  // partition pruner
+  Transform ppr = new PartitionPruner();
+  ppr.transform(pctxt);
+
+  //pctxt.setQueryProperties(this.queryProperties);

Review comment:
   Same, can be removed?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -387,6 +387,12 @@
   protected volatile boolean disableJoinMerge = false;
   protected final boolean defaultJoinMerge;
 
+  /*
+   * This is used by prepare/execute statement
+   * Prepare/Execute requires operators to be copied and cached
+   */
+  protected Map topOpsCopy = null;

Review comment:
   Why do you need to keep a copy instead of using the original operators? 
Could you leave a comment on that?

##
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##
@@ -84,12 +84,13 @@ Stage-0
 Select Operator [SEL_40] (rows=1 width=4)
   Output:["_col0"]
   

[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=487125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487125
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 21/Sep/20 17:48
Start Date: 21/Sep/20 17:48
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492240115



##
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##
@@ -84,12 +84,13 @@ Stage-0
 Select Operator [SEL_40] (rows=1 width=4)
   Output:["_col0"]
   TableScan [TS_24] (rows=1 width=4)
-Output:["id"]
+default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]
   <-Map 6 [CONTAINS] vectorized, llap
 Reduce Output Operator [RS_45]
   Limit [LIM_44] (rows=1 width=2)
 Number of rows:1
 Select Operator [SEL_43] (rows=1 width=0)
   Output:["_col0"]
   TableScan [TS_29] (rows=1 width=0)
+default@tb2,tb2,Tbl:PARTIAL,Col:COMPLETE

Review comment:
   I confirmed that this is expected. I compared this plan against master 
(with explain.user set to false) and there is no difference in the plan.







Issue Time Tracking
---

Worklog Id: (was: 487125)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=487109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487109
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 21/Sep/20 17:30
Start Date: 21/Sep/20 17:30
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492229758



##
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##
@@ -84,12 +84,13 @@ Stage-0
 Select Operator [SEL_40] (rows=1 width=4)
   Output:["_col0"]
   TableScan [TS_24] (rows=1 width=4)
-Output:["id"]
+default@tb2,tb2,Tbl:COMPLETE,Col:NONE,Output:["id"]

Review comment:
   Yeah, I think this is likely a side effect of some of the changes to 
serialization/deserialization. It is a positive side effect, though, since we 
now have more information in the explain plan.







Issue Time Tracking
---

Worklog Id: (was: 487109)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=487107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487107
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 21/Sep/20 17:29
Start Date: 21/Sep/20 17:29
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492229067



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -387,6 +387,12 @@
   protected volatile boolean disableJoinMerge = false;
   protected final boolean defaultJoinMerge;
 
+  /*
+   * This is used by prepare/execute statement
+   * Prepare/Execute requires operators to be copied and cached
+   */
+  protected Map topOpsCopy = null;

Review comment:
   The shape of the original operator tree changes when it goes through 
physical transformations and task generation (I don't know why, though). As a 
result, this operator tree cannot be used later to regenerate tasks or to 
re-run physical transformations. Therefore we make a copy and cache it after 
the operator tree is generated.
   I will leave a comment.







Issue Time Tracking
---

Worklog Id: (was: 487107)
Time Spent: 40m  (was: 0.5h)



[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=487101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487101
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 21/Sep/20 17:22
Start Date: 21/Sep/20 17:22
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r492224561



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
   I tried not keeping these fields, but I ran into all sorts of issues, such 
as being unable to serialize/deserialize, or the plan being generated without 
metadata. 
   I am not sure whether we need to keep all of these fields or can choose 
selectively; I kept almost all of them in the interest of time. If Gopal or 
Rajesh thinks that this may cause a performance issue, I can open a follow-up 
to investigate and choose fields selectively.







Issue Time Tracking
---

Worklog Id: (was: 487101)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=486726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-486726
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 21/Sep/20 00:30
Start Date: 21/Sep/20 00:30
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r491751273



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##
@@ -253,14 +191,17 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
 String queryName = getQueryName(root);
 if (ss.getPreparePlans().containsKey(queryName)) {
   // retrieve cached plan from session state
-  BaseSemanticAnalyzer cachedPlan = ss.getPreparePlans().get(queryName);
+  SemanticAnalyzer cachedPlan = ss.getPreparePlans().get(queryName);
 
   // make copy of the plan
-  createTaskCopy(cachedPlan);
+  //createTaskCopy(cachedPlan);

Review comment:
   The commented-out line can be removed.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/PrepareStatementAnalyzer.java
##
@@ -54,6 +58,21 @@ private void savePlan(String queryName) throws 
SemanticException{
 ss.getPreparePlans().put(queryName, this);
   }
 
+  private <T> T makeCopy(final Object task, Class<T> objClass) {
+ByteArrayOutputStream baos = new ByteArrayOutputStream();

Review comment:
   Can we leave a comment on this method explaining what it is trying to do?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java
##
@@ -63,19 +63,19 @@
 
   private VectorizationContext taskVectorizationContext;
 
-  protected transient JobConf jc;
-  private transient boolean inputFileChanged = false;
+  protected JobConf jc;

Review comment:
   Do we need to keep all these fields for the plan cache in the operator, 
table, etc.? I am wondering about the implications of keeping them when the 
operator plan is serialized (i.e., whether that could have a performance 
impact). @t3rmin4t0r, @rbalamohan, could you comment on this?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##
@@ -286,6 +227,24 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
   this.acidFileSinks.addAll(cachedPlan.getAcidFileSinks());
   this.initCtx(cachedPlan.getCtx());
   this.ctx.setCboInfo(cachedPlan.getCboInfo());
+  this.setLoadFileWork(cachedPlan.getLoadFileWork());
+  this.setLoadTableWork(cachedPlan.getLoadTableWork());
+
+  this.setQB(cachedPlan.getQB());
+
+  ParseContext pctxt = this.getParseContext();
+  // partition pruner
+  Transform ppr = new PartitionPruner();
+  ppr.transform(pctxt);
+
+  //pctxt.setQueryProperties(this.queryProperties);
+  if (!ctx.getExplainLogical()) {
+TaskCompiler compiler = TaskCompilerFactory.getCompiler(conf, pctxt);
+compiler.init(queryState, console, db);
+compiler.compile(pctxt, rootTasks, inputs, outputs);
+fetchTask = pctxt.getFetchTask();
+//fetchTask = makeCopy(cachedPlan.getFetchTask(), 
cachedPlan.getFetchTask().getClass());

Review comment:
   This comment too.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/ExecuteStatementAnalyzer.java
##
@@ -286,6 +227,24 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
   this.acidFileSinks.addAll(cachedPlan.getAcidFileSinks());
   this.initCtx(cachedPlan.getCtx());
   this.ctx.setCboInfo(cachedPlan.getCboInfo());
+  this.setLoadFileWork(cachedPlan.getLoadFileWork());
+  this.setLoadTableWork(cachedPlan.getLoadTableWork());
+
+  this.setQB(cachedPlan.getQB());
+
+  ParseContext pctxt = this.getParseContext();
+  // partition pruner
+  Transform ppr = new PartitionPruner();
+  ppr.transform(pctxt);
+
+  //pctxt.setQueryProperties(this.queryProperties);

Review comment:
   Same, can be removed?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##
@@ -387,6 +387,12 @@
   protected volatile boolean disableJoinMerge = false;
   protected final boolean defaultJoinMerge;
 
+  /*
+   * This is used by prepare/execute statement
+   * Prepare/Execute requires operators to be copied and cached
+   */
+  protected Map topOpsCopy = null;

Review comment:
   Why do you need to keep a copy instead of using the original operators? 
Could you leave a comment on that?

##
File path: ql/src/test/results/clientpositive/llap/constprog_dpp.q.out
##
@@ -84,12 +84,13 @@ Stage-0
 Select Operator [SEL_40] (rows=1 width=4)
   Output:["_col0"]
   

[jira] [Work logged] (HIVE-24009) Support partition pruning and other physical transformations for EXECUTE statement

2020-09-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24009?focusedWorklogId=484202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-484202
 ]

ASF GitHub Bot logged work on HIVE-24009:
-

Author: ASF GitHub Bot
Created on: 14/Sep/20 20:41
Start Date: 14/Sep/20 20:41
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1472:
URL: https://github.com/apache/hive/pull/1472#discussion_r488207028



##
File path: ql/src/test/queries/clientnegative/prepare_execute_1.q
##
@@ -1,3 +0,0 @@
---! qt:dataset:src
-prepare query1 from select count(*) from src where key > ? and value < ? group 
by ?;
-execute query1 using 1,100,1;

Review comment:
   This query no longer fails. I have opened a follow-up to fix this: 
https://issues.apache.org/jira/browse/HIVE-24164







Issue Time Tracking
---

Worklog Id: (was: 484202)
Remaining Estimate: 0h
Time Spent: 10m
