Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9570 )
Change subject: IMPALA-6621: Improve set lookup performance for in-predicate evaluation
......................................................................

Patch Set 1:

(1 comment)

Out of curiosity I tried out this alternative hash map: https://github.com/greg7mdp/sparsepp

It was actually slower for decimal (700ms vs 500ms).

I also concluded that Google's dense_hash_set was somewhat tricky to make work, since it requires providing a sentinel value to represent empty entries.

http://gerrit.cloudera.org:8080/#/c/9570/1/be/src/exprs/in-predicate.h
File be/src/exprs/in-predicate.h:

http://gerrit.cloudera.org:8080/#/c/9570/1/be/src/exprs/in-predicate.h@359
PS1, Line 359: state->val_set.insert(GetVal<T, SetType>(state->type, *arg));
We should change this function to use the bulk insert API to avoid O(N^2) behaviour with flat_set: http://www.boost.org/doc/libs/1_56_0/doc/html/boost/container/flat_set.html#idp30015536-bb

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd1627d779d10a16468cc3c2d0bc26a497e048df
Gerrit-Change-Number: 9570
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 12 Mar 2018 20:40:02 +0000
Gerrit-HasComments: Yes
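Why single-element inserts into a flat_set can go quadratic (comment on in-predicate.h line 359): flat_set keeps its elements in one sorted contiguous array, so inserting a value in the middle may shift every element after it. The sketch below is a hypothetical illustration only. It mimics that sorted-vector layout with plain std::vector, and the names BuildIncrementally, BuildInBulk and Contains are made up; it is neither Impala's code nor Boost's flat_set API.

```cpp
// Hypothetical sketch: building a sorted-vector set (the layout
// boost::container::flat_set uses) one element at a time vs. in bulk.
// Standard library only; this is NOT Boost's API or the patch's code.
#include <algorithm>
#include <cstdint>
#include <vector>

// One-at-a-time: each insert keeps the vector sorted, so it may shift up to
// size() elements. Over N inserts that is O(N^2) element moves in the worst case.
std::vector<int64_t> BuildIncrementally(const std::vector<int64_t>& vals) {
  std::vector<int64_t> set;
  for (int64_t v : vals) {
    auto it = std::lower_bound(set.begin(), set.end(), v);
    if (it == set.end() || *it != v) set.insert(it, v);  // O(size()) shift
  }
  return set;
}

// Bulk: take all values at once, sort once, then drop duplicates.
// O(N log N) overall.
std::vector<int64_t> BuildInBulk(std::vector<int64_t> vals) {
  std::sort(vals.begin(), vals.end());
  vals.erase(std::unique(vals.begin(), vals.end()), vals.end());
  return vals;
}

// Membership test: binary search over the sorted, de-duplicated vector.
bool Contains(const std::vector<int64_t>& set, int64_t v) {
  return std::binary_search(set.begin(), set.end(), v);
}
```

Both builders yield the same sorted, duplicate-free contents; only the cost differs. In the real patch the analogous change would be to gather the IN-list values first and hand them to flat_set in a single range insert. The exact complexity guarantee of that call for Boost 1.56 should be checked against the linked flat_set documentation.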
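On the dense_hash_set remark: open-addressing tables of that kind store raw values directly in their slots, so one value of the key type has to be reserved to mean "this slot is empty". That value can then never be stored as a real element. The sketch below is a hypothetical, minimal linear-probing set written to show that constraint. SentinelSet and its methods are made-up names, not the sparsehash API. One plausible reading of "tricky" is that, for an IN predicate, any value of the column's type might legitimately appear in the list, so no value is obviously safe to reserve.

```cpp
// Hypothetical sketch of why an open-addressing set needs a reserved
// "empty" key. NOT google::dense_hash_set; names are illustrative only.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

class SentinelSet {
 public:
  // `empty_key` marks unused slots, so it can never be stored as an element.
  // Capacity is clamped to at least 1 to avoid modulo by zero.
  SentinelSet(size_t capacity, int64_t empty_key)
      : slots_(capacity > 0 ? capacity : 1, empty_key), empty_key_(empty_key) {}

  // Linear-probing insert. Returns true if v is in the set afterwards, false
  // if v equals the sentinel or the table is full.
  bool Insert(int64_t v) {
    if (v == empty_key_) return false;  // indistinguishable from an empty slot
    size_t i = Slot(v);
    for (size_t n = 0; n < slots_.size(); ++n, i = (i + 1) % slots_.size()) {
      if (slots_[i] == v) return true;  // already present
      if (slots_[i] == empty_key_) {
        slots_[i] = v;
        return true;
      }
    }
    return false;  // table full
  }

  // Probe until v or an empty slot is found; an empty slot ends the search.
  bool Contains(int64_t v) const {
    if (v == empty_key_) return false;
    size_t i = Slot(v);
    for (size_t n = 0; n < slots_.size(); ++n, i = (i + 1) % slots_.size()) {
      if (slots_[i] == empty_key_) return false;
      if (slots_[i] == v) return true;
    }
    return false;
  }

 private:
  size_t Slot(int64_t v) const { return std::hash<int64_t>{}(v) % slots_.size(); }

  std::vector<int64_t> slots_;
  int64_t empty_key_;
};
```

The search loop in Contains relies on the sentinel to stop early, which is exactly why the sentinel value itself is excluded from Insert.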
