Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )
Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 19: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16674/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16674/18//COMMIT_MSG@16
PS18, Line 16: Before:
             : To each tablet, time complexity to complete hash-key based in-list query is:
             : LOG(V) * N
             :
             : After:
             : Complexity becomes:
             : LOG(V/P) * N
> https://docs.google.com/document/d/1WO4TT2ZqGsvlgogyKOsChpinEeupZCkxn9OI5xu

Thanks for the experiments! Interpreting the results a bit: for larger lists, the improvement seems less pronounced, since more of the time is spent copying the larger result set than evaluating the predicate. For lists on the same order of magnitude as the number of hash buckets (in this case, 5-10), I would expect the improvement to be even greater, as we would short-circuit some tablets completely.


--
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 19
Gerrit-Owner: wangning <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <[email protected]>
Gerrit-Comment-Date: Thu, 12 Nov 2020 06:06:03 +0000
Gerrit-HasComments: Yes
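
To make the short-circuit point in the comment above concrete, here is a minimal Python sketch of the idea, not the patch's actual C++ code. It assumes one tablet per hash bucket and reads V as the in-list size and P as the number of hash buckets (the commit message does not define its symbols, so this reading is an interpretation). The helper names `bucket_of`, `split_in_list`, and `tablets_to_scan` are hypothetical, and `zlib.crc32` is only a stand-in for Kudu's real hash function.

```python
import zlib


def bucket_of(value: int, num_buckets: int) -> int:
    """Map a value to a hash bucket.

    Assumption: crc32 is a deterministic stand-in, NOT Kudu's real hash.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def split_in_list(values, num_buckets):
    """Group in-list values by the only bucket that could hold them.

    Each tablet then checks rows against roughly V/P sorted values
    instead of all V values.
    """
    per_bucket = {b: [] for b in range(num_buckets)}
    for v in values:
        per_bucket[bucket_of(v, num_buckets)].append(v)
    # Sort each sub-list so a per-row membership check can binary-search it.
    return {b: sorted(vs) for b, vs in per_bucket.items()}


def tablets_to_scan(per_bucket):
    """Buckets with an empty sub-list can be skipped (short-circuited)."""
    return [b for b, vs in per_bucket.items() if vs]


if __name__ == "__main__":
    num_buckets = 8
    in_list = [3, 17, 42]
    per_bucket = split_in_list(in_list, num_buckets)
    scanned = tablets_to_scan(per_bucket)
    print(f"scanning {len(scanned)} of {num_buckets} buckets")
```

With an in-list no longer than the bucket count, at most as many buckets as there are values can end up non-empty, so the remaining tablets are never scanned at all. That is the case where the gain should exceed the LOG(V) to LOG(V/P) reduction alone.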
