Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18433 )
Change subject: IMPALA-11141: Use exact data types in IN-list filter ...................................................................... Patch Set 2: (17 comments) Looks very good! http://gerrit.cloudera.org:8080/#/c/18433/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18433/2//COMMIT_MSG@12 PS2, Line 12: class with different implementations for different data types. nit. native implementations http://gerrit.cloudera.org:8080/#/c/18433/2//COMMIT_MSG@26 PS2, Line 26: in nit at http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/exec/filter-context.cc File be/src/exec/filter-context.cc: http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/exec/filter-context.cc@451 PS2, Line 451: break; DCHECK(false) http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-ir.cc File be/src/util/in-list-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-ir.cc@32 PS2, Line 32: always_true_ UNLIKELY http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-ir.cc@41 PS2, Line 41: always_true_ = true; \ probably can be moved to a reset() method. http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-ir.cc@85 PS2, Line 85: always_true_ = true; \ : contains_null_ = false; \ : values_.clear(); \ : new_values_.clear(); \ : total_entries_ = 0; should be part of a reset() method. http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-test.cc File be/src/util/in-list-filter-test.cc: http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-test.cc@173 PS2, Line 173: char str_buffer[] = "aabbccddeeff"; : const char* ptr = str_buffer; : // Insert 3 values first : for (int i = 0; i < 3; ++i) { : filter->Insert(ptr); : ptr += 2; : } Some code duplication between CHAR test case and VARCHAR/STRING test case. Maybe we can remove it by passing ss1 and ss2 to TestStringInListFilter(). In this way, we only need the code to build ss1 and ss2 for CHAR and VARCHAR/STRING first. http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h File be/src/util/in-list-filter.h: http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@70 PS2, Line 70: not including nit: excluding http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@113 PS2, Line 113: return !always_true_ && !contains_null_ && values_.empty(); I wonder if this can be moved to the base class with the modification: bool AlwaysFalse() override { return !always_true_ && !contains_null_ && total_entries == 0; } http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@145 PS2, Line 145: bool AlwaysFalse() override { : return !always_true_ && !contains_null_ && total_entries_ == 0; : } See the above comment about moving the method to base class. http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@163 PS2, Line 163: inserted nit. transferred http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@165 PS2, Line 165: new_values_ nit. maybe newly_inserted_values_ to make it clear. http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@165 PS2, Line 165: boost::unordered_set<StringValue> new_values_; : size_t new_values_total_len_ = 0; These two data members can be combined into a new struct struct SetOfStringsWithTotalLenT { boost::unordered_set<StringValue> new_values_; size_t total_len_; } And then define two such objects SetOfStringsWithTotalLenT values_; SetOfStringsWithTotalLenT newly_inserted_values_; http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.cc File be/src/util/in-list-filter.cc: http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.cc@41 PS2, Line 41: InListFilterImpl cool! http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.cc@65 PS2, Line 65: default: nit. At some point of time in future, we may need to support float and double, like in min/max filter (see util/min-max-filter.h). http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.cc@143 PS2, Line 143: } nit. May add a comment here: total_entries_ is updated during insert(). http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.cc@144 PS2, Line 144: new_values_.clear(); : new_values_total_len_ = 0; this could be part of a method (say reset()) on the struct SetOfStringsWIthLengthT -- To view, visit http://gerrit.cloudera.org:8080/18433 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id434a542b2ced64efa3bfc974cb565b94a4193e9 Gerrit-Change-Number: 18433 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Fri, 22 Apr 2022 15:15:17 +0000 Gerrit-HasComments: Yes
