[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing
[ https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=269511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269511 ]

ASF GitHub Bot logged work on HIVE-21867:
---
Author: ASF GitHub Bot
Created on: 28/Jun/19 20:27
Start Date: 28/Jun/19 20:27
Worklog Time Spent: 10m
Work Description: asfgit commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 269511)
Time Spent: 1h 40m (was: 1.5h)

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Affects Versions: 4.0.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21867.02.patch, HIVE-21867.03.patch,
> HIVE-21867.04.patch, HIVE-21867.05.patch, HIVE-21867.05.patch,
> HIVE-21867.patch
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are
> introduced later in the planning phase. Follow similar approach to sort them,
> trying to accelerate filter evaluation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
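The issue description quoted above proposes sorting semijoin filter conditions to speed up filter evaluation. As a rough illustration of why the order of AND-ed conditions matters, here is a minimal Java sketch. It is not the HIVE-21867 patch: the `Conjunct` and `ConjunctSorter` classes, the cost and selectivity numbers, and the ranking rule `cost / (1 - selectivity)` are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of one condition in an AND-ed filter predicate.
// The fields are illustrative and do not mirror any Hive class.
final class Conjunct {
    final String expr;
    final double cost;        // assumed relative cost of evaluating it on one row
    final double selectivity; // assumed fraction of rows that pass, in [0, 1]

    Conjunct(String expr, double cost, double selectivity) {
        this.expr = expr;
        this.cost = cost;
        this.selectivity = selectivity;
    }

    // Lower rank is evaluated first. A condition that filters nothing
    // (selectivity 1.0) gets an infinite rank and therefore sorts last.
    double rank() {
        return selectivity >= 1.0
            ? Double.POSITIVE_INFINITY
            : cost / (1.0 - selectivity);
    }
}

final class ConjunctSorter {
    // Returns a new list ordered by ascending rank; the input is not modified.
    static List<Conjunct> sortByRank(List<Conjunct> conjuncts) {
        List<Conjunct> sorted = new ArrayList<>(conjuncts);
        sorted.sort(Comparator.comparingDouble(Conjunct::rank));
        return sorted;
    }

    // Expected per-row cost with short-circuit AND evaluation, assuming
    // the conditions are statistically independent.
    static double expectedCost(List<Conjunct> order) {
        double reach = 1.0; // probability that a row reaches this condition
        double total = 0.0;
        for (Conjunct c : order) {
            total += reach * c.cost;
            reach *= c.selectivity;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up numbers for the three kinds of condition seen in the plans.
        List<Conjunct> original = Arrays.asList(
            new Conjunct("in_bloom_filter(key, ...)", 10.0, 0.2),
            new Conjunct("key BETWEEN min AND max", 2.0, 0.5),
            new Conjunct("key is not null", 0.2, 0.9));
        List<Conjunct> sorted = sortByRank(original);
        for (Conjunct c : sorted) {
            System.out.println(c.expr);
        }
        System.out.println("cost before: " + expectedCost(original));
        System.out.println("cost after:  " + expectedCost(sorted));
    }
}
```

With these assumed numbers the cheap null check moves to the front and the bloom filter probe to the end, so most rows are rejected before the expensive condition is evaluated.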
[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing
[ https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=268842&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-268842 ]

ASF GitHub Bot logged work on HIVE-21867:
---
Author: ASF GitHub Bot
Created on: 27/Jun/19 21:19
Start Date: 27/Jun/19 21:19
Worklog Time Spent: 10m
Work Description: jcamachor commented on issue #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#issuecomment-506514522

@vineetgarg02 , I updated the PR. Note that ```hybridgrace_hashjoin_2.q``` issue will be tackled in https://issues.apache.org/jira/browse/HIVE-21928 so I think we can proceed with this issue.

In the latest patch I also added some simple logic to ```SharedWorkOptimizer``` to remove some duplicate filter expressions remaining in the plan because of change in the shape of the expression (problem was existing, patch just exposed it).

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 268842)
Time Spent: 1.5h (was: 1h 20m)

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21867.02.patch, HIVE-21867.03.patch,
> HIVE-21867.04.patch, HIVE-21867.patch
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are
> introduced later in the planning phase. Follow similar approach to sort them,
> trying to accelerate filter evaluation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
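The comment above mentions duplicate filter expressions that stay in the plan because the shape of the expression changed. One way such duplicates can be recognised is to compare conjunctions by their set of conditions rather than by their tree structure. The Java sketch below shows that idea only; it is not the logic added to `SharedWorkOptimizer`, and the `Expr` and `FilterDedup` classes are hypothetical stand-ins, not Hive's expression classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, minimal expression tree (not Hive's ExprNodeDesc).
final class Expr {
    final String op;           // "AND" for a conjunction, otherwise the leaf text
    final List<Expr> children; // empty for leaves

    private Expr(String op, List<Expr> children) {
        this.op = op;
        this.children = children;
    }

    static Expr leaf(String text) {
        return new Expr(text, new ArrayList<>());
    }

    static Expr and(Expr... children) {
        return new Expr("AND", Arrays.asList(children));
    }

    boolean isAnd() {
        return !children.isEmpty() && op.equals("AND");
    }
}

final class FilterDedup {
    // Flattens nested ANDs into the set of leaf conditions, so that
    // neither the order nor the nesting of the conditions matters.
    static Set<String> conjuncts(Expr e) {
        Set<String> out = new LinkedHashSet<>();
        collect(e, out);
        return out;
    }

    private static void collect(Expr e, Set<String> out) {
        if (e.isAnd()) {
            for (Expr child : e.children) {
                collect(child, out);
            }
        } else {
            out.add(e.op);
        }
    }

    // Keeps the first filter of every group that has the same condition set.
    static List<Expr> dedup(List<Expr> filters) {
        Set<Set<String>> seen = new HashSet<>();
        List<Expr> kept = new ArrayList<>();
        for (Expr f : filters) {
            if (seen.add(conjuncts(f))) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Expr nested = Expr.and(
            Expr.leaf("key is not null"),
            Expr.and(Expr.leaf("key BETWEEN min AND max"),
                     Expr.leaf("in_bloom_filter(key, ...)")));
        Expr flat = Expr.and(
            Expr.leaf("in_bloom_filter(key, ...)"),
            Expr.leaf("key is not null"),
            Expr.leaf("key BETWEEN min AND max"));
        List<Expr> kept = dedup(Arrays.asList(nested, flat));
        System.out.println("filters kept: " + kept.size());
    }
}
```

Comparing leaf text is a simplification; a real optimizer would compare expression nodes structurally and would have to be careful with non-deterministic functions.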
[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing
[ https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=268841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-268841 ]

ASF GitHub Bot logged work on HIVE-21867:
---
Author: ASF GitHub Bot
Created on: 27/Jun/19 21:19
Start Date: 27/Jun/19 21:19
Worklog Time Spent: 10m
Work Description: jcamachor commented on issue #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#issuecomment-506514522

@vineetgarg02 , I updated the PR. Note that ```hybridgrace_hashjoin_2.q``` issue will be tackled in HIVE-21928 so I think we can proceed with this issue.

In the latest patch I also added some simple logic to ```SharedWorkOptimizer``` to remove some duplicate filter expressions remaining in the plan because of change in the shape of the expression (problem was existing, patch just exposed it).

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 268841)
Time Spent: 1h 20m (was: 1h 10m)

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21867.02.patch, HIVE-21867.03.patch,
> HIVE-21867.04.patch, HIVE-21867.patch
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are
> introduced later in the planning phase. Follow similar approach to sort them,
> trying to accelerate filter evaluation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing
[ https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=268837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-268837 ]

ASF GitHub Bot logged work on HIVE-21867:
---
Author: ASF GitHub Bot
Created on: 27/Jun/19 21:15
Start Date: 27/Jun/19 21:15
Worklog Time Spent: 10m
Work Description: jcamachor commented on issue #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#issuecomment-506514522

@vineetgarg02 , I updated the PR. Note that ```hybridgrace_hashjoin_2.q``` issue will be tackled in HIVE-20260 so I think we can proceed with this issue.

In the latest patch I also added some simple logic to ```SharedWorkOptimizer``` to remove some duplicate filter expressions remaining in the plan because of change in the shape of the expression (problem was existing, patch just exposed it).

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 268837)
Time Spent: 1h 10m (was: 1h)

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21867.02.patch, HIVE-21867.03.patch,
> HIVE-21867.patch
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are
> introduced later in the planning phase. Follow similar approach to sort them,
> trying to accelerate filter evaluation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing
[ https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=267802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267802 ]

ASF GitHub Bot logged work on HIVE-21867:
---
Author: ASF GitHub Bot
Created on: 26/Jun/19 17:29
Start Date: 26/Jun/19 17:29
Worklog Time Spent: 10m
Work Description: jcamachor commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#discussion_r297783212

## File path: ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out ##
@@ -1421,7 +1421,7 @@ STAGE PLANS:
 outputColumnNames: _col1
 input vertices:
 1 Map 5
- Statistics: Num rows: 25 Data size: 2225 Basic stats: COMPLETE Column stats: COMPLETE
+ Statistics: Num rows: 4 Data size: 356 Basic stats: COMPLETE Column stats: COMPLETE

Review comment:
Good catch, thanks! This is unexpected indeed... Taking a look now.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 267802)
Time Spent: 50m (was: 40m)

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21867.02.patch, HIVE-21867.03.patch,
> HIVE-21867.patch
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are
> introduced later in the planning phase. Follow similar approach to sort them,
> trying to accelerate filter evaluation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing
[ https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=267803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267803 ]

ASF GitHub Bot logged work on HIVE-21867:
---
Author: ASF GitHub Bot
Created on: 26/Jun/19 17:30
Start Date: 26/Jun/19 17:30
Worklog Time Spent: 10m
Work Description: jcamachor commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#discussion_r297783357

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ##
@@ -1766,6 +1774,59 @@ private void removeSemijoinOptimizationByBenefit(OptimizeTezProcContext procCtx)
 GenTezUtils.removeBranch(rs);
 GenTezUtils.removeSemiJoinOperator(procCtx.parseContext, rs, ts);
 }
+
+for (Entry> e : globalReductionFactorMap.asMap().entrySet()) {

Review comment:
Good idea, will do.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 267803)
Time Spent: 1h (was: 50m)

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21867.02.patch, HIVE-21867.03.patch,
> HIVE-21867.patch
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are
> introduced later in the planning phase. Follow similar approach to sort them,
> trying to accelerate filter evaluation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing
[ https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=267051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267051 ]

ASF GitHub Bot logged work on HIVE-21867:
---
Author: ASF GitHub Bot
Created on: 25/Jun/19 21:28
Start Date: 25/Jun/19 21:28
Worklog Time Spent: 10m
Work Description: vineetgarg02 commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#discussion_r297401637

## File path: ql/src/test/results/clientpositive/llap/mergejoin.q.out ##
@@ -41,8 +41,8 @@ STAGE PLANS:
 Filter Vectorization:
 className: VectorFilterOperator
 native: true
-predicateExpression: FilterExprAndExpr(children: SelectColumnIsNotNull(col 0:string), FilterExprAndExpr(children: FilterStringColumnBetweenDynamicValue(col 0:string, left NULL, right NULL), VectorInBloomFilterColDynamicValue))
-predicate: (key is not null and (key BETWEEN DynamicValue(RS_7_b_key_min) AND DynamicValue(RS_7_b_key_max) and in_bloom_filter(key, DynamicValue(RS_7_b_key_bloom_filter (type: boolean)

Review comment:
Positive change :thumbsup:

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 267051)
Time Spent: 0.5h (was: 20m)

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21867.02.patch, HIVE-21867.03.patch,
> HIVE-21867.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are
> introduced later in the planning phase. Follow similar approach to sort them,
> trying to accelerate filter evaluation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing
[ https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=267050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267050 ]

ASF GitHub Bot logged work on HIVE-21867:
---
Author: ASF GitHub Bot
Created on: 25/Jun/19 21:28
Start Date: 25/Jun/19 21:28
Worklog Time Spent: 10m
Work Description: vineetgarg02 commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#discussion_r297397916

## File path: ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out ##
@@ -1421,7 +1421,7 @@ STAGE PLANS:
 outputColumnNames: _col1
 input vertices:
 1 Map 5
- Statistics: Num rows: 25 Data size: 2225 Basic stats: COMPLETE Column stats: COMPLETE
+ Statistics: Num rows: 4 Data size: 356 Basic stats: COMPLETE Column stats: COMPLETE

Review comment:
I wonder why stats estimation changed

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 267050)
Time Spent: 20m (was: 10m)

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21867.02.patch, HIVE-21867.03.patch,
> HIVE-21867.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are
> introduced later in the planning phase. Follow similar approach to sort them,
> trying to accelerate filter evaluation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing
[ https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=267052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-267052 ]

ASF GitHub Bot logged work on HIVE-21867:
---
Author: ASF GitHub Bot
Created on: 25/Jun/19 21:28
Start Date: 25/Jun/19 21:28
Worklog Time Spent: 10m
Work Description: vineetgarg02 commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687#discussion_r297401189

## File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java ##
@@ -1766,6 +1774,59 @@ private void removeSemijoinOptimizationByBenefit(OptimizeTezProcContext procCtx)
 GenTezUtils.removeBranch(rs);
 GenTezUtils.removeSemiJoinOperator(procCtx.parseContext, rs, ts);
 }
+
+for (Entry> e : globalReductionFactorMap.asMap().entrySet()) {

Review comment:
Creating separate method for this and adding comments to explain why we are doing it will be nice.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 267052)
Time Spent: 40m (was: 0.5h)

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21867.02.patch, HIVE-21867.03.patch,
> HIVE-21867.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are
> introduced later in the planning phase. Follow similar approach to sort them,
> trying to accelerate filter evaluation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (HIVE-21867) Sort semijoin conditions to accelerate query processing
[ https://issues.apache.org/jira/browse/HIVE-21867?focusedWorklogId=266784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-266784 ]

ASF GitHub Bot logged work on HIVE-21867:
---
Author: ASF GitHub Bot
Created on: 25/Jun/19 15:22
Start Date: 25/Jun/19 15:22
Worklog Time Spent: 10m
Work Description: jcamachor commented on pull request #687: HIVE-21867
URL: https://github.com/apache/hive/pull/687

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 266784)
Time Spent: 10m
Remaining Estimate: 0h

> Sort semijoin conditions to accelerate query processing
> ---
>
> Key: HIVE-21867
> URL: https://issues.apache.org/jira/browse/HIVE-21867
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Labels: pull-request-available
> Attachments: HIVE-21867.02.patch, HIVE-21867.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The problem was tackled for CBO in HIVE-21857. Semijoin filters are
> introduced later in the planning phase. Follow similar approach to sort them,
> trying to accelerate filter evaluation.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)