Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15250 )
Change subject: IMPALA-6689: Speed up point lookup for Kudu primary key ...................................................................... Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/15250/6/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/15250/6/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@397 PS6, Line 397: if ((predicate instanceof BinaryPredicate) && > This new code is called only when function tryConvertBinaryKuduPredicate() I should have seen that - I rushed the code review on Friday afternoon. Thanks for explaining and your patience. http://gerrit.cloudera.org:8080/#/c/15250/7/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/15250/7/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@492 PS7, Line 492: default: break; I think the approach is somewhat brittle and could break if a couple of assumptions changed. I think we need to guard against that. I have one example that would break if another (reasonable) change was made to this code. The below query has duplicate predicates assigned to the Kudu scan. It exploits that the planner only deduplicates = and "is not distinct from" predicates if expression rewrites are enabled. I think this would cause a problem if getKuduOperator() added support for "is not distinct from". [localhost:21000] default> set enable_expr_rewrites=0; set explain_level=2; explain select count(*) from tpch_kudu.lineitem where l_orderkey = 1 and l_orderkey is not distinct from 1 and l_partkey = 1 and l_suppkey = 1; ENABLE_EXPR_REWRITES set to 0 EXPLAIN_LEVEL set to 2 Query: explain select count(*) from tpch_kudu.lineitem where l_orderkey = 1 and l_orderkey is not distinct from 1 and l_partkey = 1 and l_suppkey = 1 +------------------------------------------------------------------------------------------------------------------+ | Explain String | +------------------------------------------------------------------------------------------------------------------+ | Max Per-Host Resource Reservation: Memory=0B Threads=3 | | Per-Host Resource Estimates: Memory=21MB | | Analyzed query: SELECT count(*) FROM tpch_kudu.lineitem WHERE l_orderkey = | | CAST(1 AS BIGINT) AND l_orderkey IS NOT DISTINCT FROM CAST(1 AS BIGINT) AND | | l_partkey = CAST(1 AS BIGINT) AND l_suppkey = CAST(1 AS BIGINT) | | | | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | | | Per-Host Resources: mem-estimate=10.02MB mem-reservation=0B thread-reservation=1 | | PLAN-ROOT SINK | | | output exprs: count(*) | | | mem-estimate=0B mem-reservation=0B thread-reservation=0 | | | | | 03:AGGREGATE [FINALIZE] | | | output: count:merge(*) | | | mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0 | | | tuple-ids=1 row-size=8B cardinality=1 | | | in pipelines: 03(GETNEXT), 01(OPEN) | | | | | 02:EXCHANGE [UNPARTITIONED] | | | mem-estimate=16.00KB mem-reservation=0B thread-reservation=0 | | | tuple-ids=1 row-size=8B cardinality=1 | | | in pipelines: 01(GETNEXT) | | | | | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 | | Per-Host Resources: mem-estimate=11.12MB mem-reservation=0B thread-reservation=2 | | 01:AGGREGATE | | | output: count(*) | | | mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0 | | | tuple-ids=1 row-size=8B cardinality=1 | | | in pipelines: 01(GETNEXT), 00(OPEN) | | | | | 00:SCAN KUDU [tpch_kudu.lineitem] | | predicates: l_orderkey IS NOT DISTINCT FROM CAST(1 AS BIGINT) | | kudu predicates: l_orderkey = CAST(1 AS BIGINT), l_partkey = CAST(1 AS BIGINT), l_suppkey = CAST(1 AS BIGINT) | | mem-estimate=1.12MB mem-reservation=0B thread-reservation=1 | | tuple-ids=0 row-size=8B cardinality=1 | | in pipelines: 00(GETNEXT) | +------------------------------------------------------------------------------------------------------------------+ Fetched 37 row(s) in 0.01s I think I'd prefer if we just kept track of the set of primary keys that are in equality predicates instead of the count. It would be easier for me to convince myself that it's correct (and not going to be broken by future changes). -- To view, visit http://gerrit.cloudera.org:8080/15250 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4631cd4d1a528a1152b5cdcb268426f2ba1a0c08 Gerrit-Change-Number: 15250 Gerrit-PatchSet: 6 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Anurag Mantripragada <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Tue, 25 Feb 2020 01:33:31 +0000 Gerrit-HasComments: Yes
