[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=269511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269511
 ]

ASF GitHub Bot logged work on HIVE-21867:
-

Author: ASF GitHub Bot
Created on: 28/Jun/19 20:27
Start Date: 28/Jun/19 20:27
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269511)
Time Spent: 1h 40m  (was: 1.5h)

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
>  Issue Type: New Feature
>  Components: Physical Optimizer
>Affects Versions: 4.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21867.02.patch, HIVE-21867.03.patch, 
> HIVE-21867.04.patch, HIVE-21867.05.patch, HIVE-21867.05.patch, 
> HIVE-21867.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are 
> introduced later in the planning phase. Follow a similar approach to sort 
> them, in order to accelerate filter evaluation.
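
The effect described above can be illustrated with a minimal, self-contained sketch. It is not Hive's implementation: the `Conjunct` class, its `keptFraction` estimate and the `sortBySelectivity` helper are hypothetical names chosen for illustration. The sketch orders the conjuncts of an AND so that the most selective one runs first, and counts how many predicate calls short-circuit evaluation then needs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

public class ConjunctOrderingSketch {

  /** Hypothetical conjunct: a predicate plus the estimated fraction of rows it keeps. */
  static final class Conjunct {
    final String name;
    final double keptFraction; // assumed estimate; lower means more selective
    final IntPredicate test;

    Conjunct(String name, double keptFraction, IntPredicate test) {
      this.name = name;
      this.keptFraction = keptFraction;
      this.test = test;
    }
  }

  /** Returns a new list with the most selective conjuncts first. */
  static List<Conjunct> sortBySelectivity(List<Conjunct> conjuncts) {
    List<Conjunct> sorted = new ArrayList<>(conjuncts);
    sorted.sort(Comparator.comparingDouble((Conjunct c) -> c.keptFraction));
    return sorted;
  }

  /** Evaluates the AND over rows 0..rowCount-1 and counts predicate invocations. */
  static long countEvaluations(List<Conjunct> conjuncts, int rowCount) {
    long evaluations = 0;
    for (int row = 0; row < rowCount; row++) {
      for (Conjunct c : conjuncts) {
        evaluations++;
        if (!c.test.test(row)) {
          break; // short-circuit: remaining conjuncts are skipped for this row
        }
      }
    }
    return evaluations;
  }

  public static void main(String[] args) {
    List<Conjunct> original = List.of(
        new Conjunct("isNotNull", 1.0, row -> true),
        new Conjunct("between", 0.5, row -> row < 50),
        new Conjunct("inBloomFilter", 0.1, row -> row % 10 == 0));
    List<Conjunct> sorted = sortBySelectivity(original);
    System.out.println("original order: " + countEvaluations(original, 100));
    System.out.println("sorted order:   " + countEvaluations(sorted, 100));
  }
}
```

The sketch ranks by selectivity only. A real planner would presumably also weigh the cost of each predicate, which this example does not model.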



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=268842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-268842
 ]

ASF GitHub Bot logged work on HIVE-21867:
-

Author: ASF GitHub Bot
Created on: 27/Jun/19 21:19
Start Date: 27/Jun/19 21:19
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on issue #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#issuecomment-506514522
 
 
   @vineetgarg02, I updated the PR. Note that the ```hybridgrace_hashjoin_2.q``` 
issue will be tackled in https://issues.apache.org/jira/browse/HIVE-21928 so I 
think we can proceed with this issue. In the latest patch I also added some 
simple logic to ```SharedWorkOptimizer``` to remove duplicate filter 
expressions that remained in the plan because of a change in the shape of the 
expression (the problem already existed; the patch just exposed it).
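
To make the duplicate-filter point concrete, here is a minimal sketch, assuming a toy expression tree. It is not the logic added to ```SharedWorkOptimizer```: the `Expr` class and the `uniqueConjuncts` helper are hypothetical. It shows how the same conjunct can hide at different nesting levels of an AND, and how flattening the tree first lets the repeats be dropped.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ConjunctDedupSketch {

  /** Hypothetical expression node: either a leaf predicate or an AND of children. */
  static final class Expr {
    final String text;         // leaf text; null for AND nodes
    final List<Expr> children; // empty for leaves

    private Expr(String text, List<Expr> children) {
      this.text = text;
      this.children = children;
    }

    static Expr leaf(String text) {
      return new Expr(text, List.of());
    }

    static Expr and(Expr... children) {
      return new Expr(null, List.of(children));
    }
  }

  /** Collects leaves in order, descending through nested ANDs. */
  static void flatten(Expr e, List<String> out) {
    if (e.text != null) {
      out.add(e.text);
      return;
    }
    for (Expr child : e.children) {
      flatten(child, out);
    }
  }

  /** Flattens the expression and drops repeated conjuncts, keeping first-seen order. */
  static List<String> uniqueConjuncts(Expr e) {
    List<String> flat = new ArrayList<>();
    flatten(e, flat);
    Set<String> unique = new LinkedHashSet<>(flat);
    return new ArrayList<>(unique);
  }

  public static void main(String[] args) {
    Expr nested = Expr.and(
        Expr.leaf("key is not null"),
        Expr.and(Expr.leaf("key BETWEEN min AND max"), Expr.leaf("in_bloom_filter(key)")),
        Expr.leaf("key is not null"));
    System.out.println(uniqueConjuncts(nested));
  }
}
```

Comparing leaves by their text is a simplification made for the sketch; an optimizer would compare expression nodes structurally.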
 



Issue Time Tracking
---

Worklog Id: (was: 268842)
Time Spent: 1.5h  (was: 1h 20m)






[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=268841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-268841
 ]

ASF GitHub Bot logged work on HIVE-21867:
-

Author: ASF GitHub Bot
Created on: 27/Jun/19 21:19
Start Date: 27/Jun/19 21:19
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on issue #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#issuecomment-506514522
 
 
   @vineetgarg02, I updated the PR. Note that the ```hybridgrace_hashjoin_2.q``` 
issue will be tackled in HIVE-21928, so I think we can proceed with this issue. 
In the latest patch I also added some simple logic to ```SharedWorkOptimizer``` 
to remove duplicate filter expressions that remained in the plan because of a 
change in the shape of the expression (the problem already existed; the patch 
just exposed it).
 



Issue Time Tracking
---

Worklog Id: (was: 268841)
Time Spent: 1h 20m  (was: 1h 10m)






[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=268837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-268837
 ]

ASF GitHub Bot logged work on HIVE-21867:
-

Author: ASF GitHub Bot
Created on: 27/Jun/19 21:15
Start Date: 27/Jun/19 21:15
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on issue #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#issuecomment-506514522
 
 
   @vineetgarg02, I updated the PR. Note that the ```hybridgrace_hashjoin_2.q``` 
issue will be tackled in HIVE-20260, so I think we can proceed with this issue. 
In the latest patch I also added some simple logic to ```SharedWorkOptimizer``` 
to remove duplicate filter expressions that remained in the plan because of a 
change in the shape of the expression (the problem already existed; the patch 
just exposed it).
 



Issue Time Tracking
---

Worklog Id: (was: 268837)
Time Spent: 1h 10m  (was: 1h)






[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=267802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267802
 ]

ASF GitHub Bot logged work on HIVE-21867:
-

Author: ASF GitHub Bot
Created on: 26/Jun/19 17:29
Start Date: 26/Jun/19 17:29
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#discussion_r297783212
 
 

 ##
 File path: ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out
 ##
 @@ -1421,7 +1421,7 @@ STAGE PLANS:
   outputColumnNames: _col1
   input vertices:
 1 Map 5
-  Statistics: Num rows: 25 Data size: 2225 Basic stats: COMPLETE Column stats: COMPLETE
+  Statistics: Num rows: 4 Data size: 356 Basic stats: COMPLETE Column stats: COMPLETE
 
 Review comment:
   Good catch, thanks! This is unexpected indeed... Taking a look now.
 



Issue Time Tracking
---

Worklog Id: (was: 267802)
Time Spent: 50m  (was: 40m)






[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=267803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267803
 ]

ASF GitHub Bot logged work on HIVE-21867:
-

Author: ASF GitHub Bot
Created on: 26/Jun/19 17:30
Start Date: 26/Jun/19 17:30
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#discussion_r297783357
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
 ##
 @@ -1766,6 +1774,59 @@ private void removeSemijoinOptimizationByBenefit(OptimizeTezProcContext procCtx)
   GenTezUtils.removeBranch(rs);
   GenTezUtils.removeSemiJoinOperator(procCtx.parseContext, rs, ts);
 }
+
+for (Entry> e : globalReductionFactorMap.asMap().entrySet()) {
 
 Review comment:
   Good idea, will do.
 



Issue Time Tracking
---

Worklog Id: (was: 267803)
Time Spent: 1h  (was: 50m)






[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=267051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267051
 ]

ASF GitHub Bot logged work on HIVE-21867:
-

Author: ASF GitHub Bot
Created on: 25/Jun/19 21:28
Start Date: 25/Jun/19 21:28
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#discussion_r297401637
 
 

 ##
 File path: ql/src/test/results/clientpositive/llap/mergejoin.q.out
 ##
 @@ -41,8 +41,8 @@ STAGE PLANS:
 Filter Vectorization:
 className: VectorFilterOperator
 native: true
-predicateExpression: FilterExprAndExpr(children: SelectColumnIsNotNull(col 0:string), FilterExprAndExpr(children: FilterStringColumnBetweenDynamicValue(col 0:string, left NULL, right NULL), VectorInBloomFilterColDynamicValue))
-predicate: (key is not null and (key BETWEEN DynamicValue(RS_7_b_key_min) AND DynamicValue(RS_7_b_key_max) and in_bloom_filter(key, DynamicValue(RS_7_b_key_bloom_filter (type: boolean)
 
 Review comment:
   Positive change :thumbsup:
 



Issue Time Tracking
---

Worklog Id: (was: 267051)
Time Spent: 0.5h  (was: 20m)






[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=267050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267050
 ]

ASF GitHub Bot logged work on HIVE-21867:
-

Author: ASF GitHub Bot
Created on: 25/Jun/19 21:28
Start Date: 25/Jun/19 21:28
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#discussion_r297397916
 
 

 ##
 File path: ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out
 ##
 @@ -1421,7 +1421,7 @@ STAGE PLANS:
   outputColumnNames: _col1
   input vertices:
 1 Map 5
-  Statistics: Num rows: 25 Data size: 2225 Basic stats: COMPLETE Column stats: COMPLETE
+  Statistics: Num rows: 4 Data size: 356 Basic stats: COMPLETE Column stats: COMPLETE
 
 Review comment:
   I wonder why the stats estimation changed.
 



Issue Time Tracking
---

Worklog Id: (was: 267050)
Time Spent: 20m  (was: 10m)






[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=267052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267052
 ]

ASF GitHub Bot logged work on HIVE-21867:
-

Author: ASF GitHub Bot
Created on: 25/Jun/19 21:28
Start Date: 25/Jun/19 21:28
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#discussion_r297401189
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
 ##
 @@ -1766,6 +1774,59 @@ private void removeSemijoinOptimizationByBenefit(OptimizeTezProcContext procCtx)
   GenTezUtils.removeBranch(rs);
   GenTezUtils.removeSemiJoinOperator(procCtx.parseContext, rs, ts);
 }
+
+for (Entry> e : globalReductionFactorMap.asMap().entrySet()) {
 
 Review comment:
   Creating a separate method for this and adding comments to explain why we 
are doing it would be nice.
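
As a rough illustration of this suggestion, the loop could be moved into a documented helper along the following lines. This is only a sketch under stated assumptions: the `FilterInfo` class, the `sortFiltersByReduction` method and the plain `java.util.Map` input are hypothetical stand-ins, and the body does not reproduce what the patch does with `globalReductionFactorMap`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReductionFactorSortSketch {

  /** Hypothetical pairing of a semijoin filter with its estimated reduction factor. */
  static final class FilterInfo {
    final String filterName;
    final double reductionFactor; // assumed: fraction of rows the filter removes

    FilterInfo(String filterName, double reductionFactor) {
      this.filterName = filterName;
      this.reductionFactor = reductionFactor;
    }
  }

  /**
   * For each table scan (map key), orders its filters so that the one expected
   * to remove the most rows comes first. Keeping this in its own method puts
   * the reason for the sort in one documented place instead of inline in a
   * larger loop.
   */
  static Map<String, List<String>> sortFiltersByReduction(
      Map<String, List<FilterInfo>> filtersByScan) {
    Map<String, List<String>> ordered = new LinkedHashMap<>();
    for (Map.Entry<String, List<FilterInfo>> e : filtersByScan.entrySet()) {
      List<FilterInfo> filters = new ArrayList<>(e.getValue());
      filters.sort(
          Comparator.comparingDouble((FilterInfo f) -> f.reductionFactor).reversed());
      List<String> names = new ArrayList<>();
      for (FilterInfo f : filters) {
        names.add(f.filterName);
      }
      ordered.put(e.getKey(), names);
    }
    return ordered;
  }

  public static void main(String[] args) {
    Map<String, List<FilterInfo>> input = new LinkedHashMap<>();
    input.put("TS_0", List.of(
        new FilterInfo("filter_a", 0.2),
        new FilterInfo("filter_b", 0.9)));
    System.out.println(sortFiltersByReduction(input));
  }
}
```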
 



Issue Time Tracking
---

Worklog Id: (was: 267052)
Time Spent: 40m  (was: 0.5h)






[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing

2019-06-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=266784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266784
 ]

ASF GitHub Bot logged work on HIVE-21867:
-

Author: ASF GitHub Bot
Created on: 25/Jun/19 15:22
Start Date: 25/Jun/19 15:22
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 266784)
Time Spent: 10m
Remaining Estimate: 0h



