Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15250 )
Change subject: IMPALA-6689: Speed up point lookup for Kudu primary key
......................................................................

Patch Set 2:

It'd be worth looking at the HDFS PK stuff. I think we might already get it approximately right for single-column PKs.

The input cardinality for HDFS is still all the rows in the table in the worst case (since the table is not clustered by primary key). That might be an overestimate if you have min/max stats or column indices, but yeah. The output cardinality may also be about right with stats for single columns, since for equality predicates we should be estimating the cardinality as #rows / NDV, which should be ~1 because NDV is about #rows for a unique column. For multi-column PKs the cardinality estimates will be a lot less accurate...

Of course, my assumptions here might be wrong and this might not work for subtle reasons.

I'll add Anurag so he has visibility into the kind of optimisations we might do with PK info.

--
To view, visit http://gerrit.cloudera.org:8080/15250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4631cd4d1a528a1152b5cdcb268426f2ba1a0c08
Gerrit-Change-Number: 15250
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 20 Feb 2020 15:51:36 +0000
Gerrit-HasComments: No
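For readers following the cardinality reasoning in the comment above, here is a rough sketch of the "#rows / NDV" estimate for an equality predicate. It is not Impala planner code: the function names are hypothetical, and the multi-column variant assumes the columns are independent, which is an assumption of this sketch and only one possible reason multi-column estimates drift.

```python
from typing import Optional, Sequence


def equality_cardinality(num_rows: int, ndv: int) -> Optional[float]:
    """Estimate rows matching `col = <constant>` as num_rows / ndv.

    Returns None when the statistics are missing or unusable.
    (Hypothetical helper, not an Impala API.)
    """
    if num_rows < 0 or ndv <= 0:
        return None
    return num_rows / ndv


def naive_multi_column_cardinality(
    num_rows: int, ndvs: Sequence[int]
) -> Optional[float]:
    """Combine per-column equality estimates, one predicate per column.

    Assumes the columns are statistically independent, so each predicate
    divides the running estimate by that column's NDV. This naive model is
    an assumption of the sketch, chosen only to show how errors compound.
    """
    if num_rows < 0:
        return None
    estimate = float(num_rows)
    for ndv in ndvs:
        if ndv <= 0:
            return None
        estimate /= ndv
    return estimate


if __name__ == "__main__":
    rows = 1_000_000
    # Single-column PK: NDV is about the row count, so the estimate is about 1.
    print(equality_cardinality(rows, rows))
    # Two-column PK where each column alone has 10,000 distinct values.
    print(naive_multi_column_cardinality(rows, [10_000, 10_000]))
```

With a single-column key whose NDV matches the row count, the estimate comes out at about one row, matching the comment. With the two-column key in the example, the naive product of selectivities falls well below one row, even though a lookup on an existing full key returns exactly one row.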
