[Impala-ASF-CR] IMPALA-6689: Speed up point lookup for Kudu primary key

Tim Armstrong (Code Review) Thu, 20 Feb 2020 07:52:11 -0800

Tim Armstrong has posted comments on this change. (http://gerrit.cloudera.org:8080/15250)


Change subject: IMPALA-6689: Speed up point lookup for Kudu primary key
......................................................................


Patch Set 2:

It'd be worth looking at the HDFS primary key (PK) work as well.

I think we might already get this approximately right for single-column PKs. For
HDFS, the input cardinality in the worst case is still every row in the table,
since the table is not clustered by primary key. That might be an overestimate
if you have min/max stats or column indexes, but it holds in the worst case.
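To make the worst-case input estimate concrete, here is a minimal Python sketch
of min/max pruning. The `RowGroup` class and `estimated_input_rows` helper are
hypothetical and are not Impala's scanner code. The point is that when a table
is not clustered by the key, almost every row group's [min, max] range contains
the looked-up value, so nothing is pruned and the input stays at all rows.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical min/max stats for the PK column of one row group."""
    num_rows: int
    min_key: int
    max_key: int


def estimated_input_rows(row_groups, key):
    """Rows still scanned for `pk = key` after skipping groups whose range excludes key."""
    return sum(rg.num_rows for rg in row_groups if rg.min_key <= key <= rg.max_key)


# Clustered by key: each group covers a narrow, disjoint key range.
clustered = [RowGroup(1000, i * 1000, i * 1000 + 999) for i in range(10)]
# Not clustered: keys are scattered, so every group's range spans nearly all keys.
unclustered = [RowGroup(1000, i, 9990 + i) for i in range(10)]

print(estimated_input_rows(clustered, 4242))    # 1000
print(estimated_input_rows(unclustered, 4242))  # 10000 (every row in the table)
```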

The output cardinality may also be about right for single-column PKs when stats
are available, since for equality predicates we should be estimating the
cardinality as #rows / NDV, and for a PK column NDV is ~#rows, so that should be ~1.
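As a quick worked example of the #rows / NDV estimate, here is a tiny Python
sketch. The function name is hypothetical and it assumes uniformly distributed
values. For a single-column PK every value is distinct, so NDV equals #rows and
the estimate is exactly 1; since NDV from stats is itself an estimate, in
practice the result is only close to 1.

```python
def equality_cardinality(num_rows: int, ndv: int) -> float:
    """Estimate rows matching `col = <constant>` as #rows / NDV (uniformity assumption)."""
    return num_rows / ndv


# Single-column PK: every value is distinct, so NDV == #rows.
print(equality_cardinality(1_000_000, 1_000_000))  # 1.0
# With a slightly-off NDV estimate the result is still ~1.
print(equality_cardinality(1_000_000, 987_000))
```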

For multi-column PKs the cardinality estimates will be a lot less accurate, since
each column's selectivity is estimated separately and then combined.
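Here is a toy calculation of why multi-column PKs are harder. This Python sketch
assumes a hypothetical two-column PK (a, b) and compares two common ways of
combining per-column selectivities: plain independence, and exponential backoff
(which damps every predicate after the most selective one). This thread doesn't
say which heuristic Impala's planner uses, so treat the numbers as illustrative.
Neither heuristic knows that (a, b) is unique. A PK-aware estimate could cap
the result at one row.

```python
import math


def independent_selectivity(sels):
    """Combine conjunct selectivities assuming the columns are independent."""
    return math.prod(sels)


def backoff_selectivity(sels):
    """Exponential backoff: most selective predicate counts fully, next at ^(1/2), then ^(1/4), ..."""
    result = 1.0
    for i, s in enumerate(sorted(sels)):
        result *= s ** (1.0 / 2 ** i)
    return result


num_rows = 1_000_000
# Hypothetical PK (a, b): NDV(a) = NDV(b) = 1000, each (a, b) pair appears once.
sels = [1 / 1000, 1 / 1000]  # per-column equality selectivity = 1 / NDV

print(num_rows * independent_selectivity(sels))  # ~1
print(num_rows * backoff_selectivity(sels))      # ~31.6
# Knowing (a, b) is the PK, an equality on both columns matches at most 1 row.
```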

Of course, my assumptions here might be wrong and this might not work for 
subtle reasons. I'll add Anurag so he has visibility into the kind of 
optimisations we might do with PK info.


--
To view, visit http://gerrit.cloudera.org:8080/15250

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4631cd4d1a528a1152b5cdcb268426f2ba1a0c08
Gerrit-Change-Number: 15250
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Thu, 20 Feb 2020 15:51:36 +0000
Gerrit-HasComments: No
