[Impala-ASF-CR] IMPALA-6689: Speed up point lookup for Kudu primary key

Tim Armstrong (Code Review) Mon, 24 Feb 2020 17:34:00 -0800

Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15250 )


Change subject: IMPALA-6689: Speed up point lookup for Kudu primary key
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15250/6/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15250/6/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@397
PS6, Line 397:         if ((predicate instanceof BinaryPredicate) &&
> This new code is called only when function  tryConvertBinaryKuduPredicate()
I should have seen that - I rushed the code review on Friday afternoon. Thanks 
for explaining and your patience.


http://gerrit.cloudera.org:8080/#/c/15250/7/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/15250/7/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@492
PS7, Line 492:       default: break;
I think the approach is somewhat brittle and could break if a couple of 
assumptions changed. I think we need to guard against that. I have one example 
that would break if another (reasonable) change was made to this code.

The below query has duplicate predicates assigned to the Kudu scan. It exploits 
that the planner only deduplicates = and "is not distinct from" predicates if 
expression rewrites are enabled. I think this would cause a problem if 
getKuduOperator() added support for "is not distinct from".

  [localhost:21000] default> set enable_expr_rewrites=0; set explain_level=2; 
explain select count(*) from tpch_kudu.lineitem where l_orderkey = 1 and 
l_orderkey is not distinct from 1 and l_partkey = 1 and l_suppkey = 1;
  ENABLE_EXPR_REWRITES set to 0
  EXPLAIN_LEVEL set to 2
  Query: explain select count(*) from tpch_kudu.lineitem where l_orderkey = 1 
and l_orderkey is not distinct from 1 and l_partkey = 1 and l_suppkey = 1
  
+------------------------------------------------------------------------------------------------------------------+
  | Explain String                                                              
                                     |
  
+------------------------------------------------------------------------------------------------------------------+
  | Max Per-Host Resource Reservation: Memory=0B Threads=3                      
                                     |
  | Per-Host Resource Estimates: Memory=21MB                                    
                                     |
  | Analyzed query: SELECT count(*) FROM tpch_kudu.lineitem WHERE l_orderkey =  
                                     |
  | CAST(1 AS BIGINT) AND l_orderkey IS NOT DISTINCT FROM CAST(1 AS BIGINT) AND 
                                     |
  | l_partkey = CAST(1 AS BIGINT) AND l_suppkey = CAST(1 AS BIGINT)             
                                     |
  |                                                                             
                                     |
  | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                       
                                     |
  | |  Per-Host Resources: mem-estimate=10.02MB mem-reservation=0B 
thread-reservation=1                              |
  | PLAN-ROOT SINK                                                              
                                     |
  | |  output exprs: count(*)                                                   
                                     |
  | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                  
                                     |
  | |                                                                           
                                     |
  | 03:AGGREGATE [FINALIZE]                                                     
                                     |
  | |  output: count:merge(*)                                                   
                                     |
  | |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
thread-reservation=0                              |
  | |  tuple-ids=1 row-size=8B cardinality=1                                    
                                     |
  | |  in pipelines: 03(GETNEXT), 01(OPEN)                                      
                                     |
  | |                                                                           
                                     |
  | 02:EXCHANGE [UNPARTITIONED]                                                 
                                     |
  | |  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0             
                                     |
  | |  tuple-ids=1 row-size=8B cardinality=1                                    
                                     |
  | |  in pipelines: 01(GETNEXT)                                                
                                     |
  | |                                                                           
                                     |
  | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                              
                                     |
  | Per-Host Resources: mem-estimate=11.12MB mem-reservation=0B 
thread-reservation=2                                 |
  | 01:AGGREGATE                                                                
                                     |
  | |  output: count(*)                                                         
                                     |
  | |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
thread-reservation=0                              |
  | |  tuple-ids=1 row-size=8B cardinality=1                                    
                                     |
  | |  in pipelines: 01(GETNEXT), 00(OPEN)                                      
                                     |
  | |                                                                           
                                     |
  | 00:SCAN KUDU [tpch_kudu.lineitem]                                           
                                     |
  |    predicates: l_orderkey IS NOT DISTINCT FROM CAST(1 AS BIGINT)            
                                     |
  |    kudu predicates: l_orderkey = CAST(1 AS BIGINT), l_partkey = CAST(1 AS 
BIGINT), l_suppkey = CAST(1 AS BIGINT) |
  |    mem-estimate=1.12MB mem-reservation=0B thread-reservation=1              
                                     |
  |    tuple-ids=0 row-size=8B cardinality=1                                    
                                     |
  |    in pipelines: 00(GETNEXT)                                                
                                     |
  
+------------------------------------------------------------------------------------------------------------------+
  Fetched 37 row(s) in 0.01s


I think I'd prefer if we just kept track of the set of primary keys that are in 
equality predicates instead of the count. It would be easier for me to convince 
myself that it's correct (and not going to be broken by future changes).



--
To view, visit http://gerrit.cloudera.org:8080/15250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4631cd4d1a528a1152b5cdcb268426f2ba1a0c08
Gerrit-Change-Number: 15250
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Anurag Mantripragada <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Wenzhe Zhou <[email protected]>
Gerrit-Comment-Date: Tue, 25 Feb 2020 01:33:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6689: Speed up point lookup for Kudu primary key

Reply via email to