Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18433 )

Change subject: IMPALA-11141: Use exact data types in IN-list filter
......................................................................


Patch Set 2:

(17 comments)

Looks very good!

http://gerrit.cloudera.org:8080/#/c/18433/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18433/2//COMMIT_MSG@12
PS2, Line 12: class with different implementations for different data types.
nit. native implementations


http://gerrit.cloudera.org:8080/#/c/18433/2//COMMIT_MSG@26
PS2, Line 26: in
nit at


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/exec/filter-context.cc@451
PS2, Line 451: break;
DCHECK(false)


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-ir.cc
File be/src/util/in-list-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-ir.cc@32
PS2, Line 32: always_true_
UNLIKELY


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-ir.cc@41
PS2, Line 41:         always_true_ = true;                                      
                    \
probably can be moved to a reset() method.


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-ir.cc@85
PS2, Line 85:          always_true_ = true;                                     
          \
            :           contains_null_ = false;                                 
           \
            :           values_.clear();                                        
           \
            :           new_values_.clear();                                    
           \
            :           total_entries_ = 0;
should be part of a reset() method.


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-test.cc
File be/src/util/in-list-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter-test.cc@173
PS2, Line 173: char str_buffer[] = "aabbccddeeff";
             :   const char* ptr = str_buffer;
             :   // Insert 3 values first
             :   for (int i = 0; i < 3; ++i) {
             :     filter->Insert(ptr);
             :     ptr += 2;
             :   }
Some code duplication between CHAR test case and VARCHAR/STRING test case.

Maybe we can remove it by passing ss1 and ss2 to TestStringInListFilter(). In 
this way, we only need the code to build ss1 and ss2 for CHAR and 
VARCHAR/STRING first.


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h
File be/src/util/in-list-filter.h:

http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@70
PS2, Line 70: not including
nit: excluding


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@113
PS2, Line 113: return !always_true_ && !contains_null_ && values_.empty();
I wonder if this can be moved to the base class with the modification:

 bool AlwaysFalse() override {
    return !always_true_ && !contains_null_ && total_entries == 0;
}


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@145
PS2, Line 145:   bool AlwaysFalse() override {
             :     return !always_true_ && !contains_null_ && total_entries_ == 
0;
             :   }
See the above comment about moving the method to base class.


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@163
PS2, Line 163: inserted
nit. transferred


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@165
PS2, Line 165: new_values_
nit. maybe newly_inserted_values_ to make it clear.


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.h@165
PS2, Line 165:   boost::unordered_set<StringValue> new_values_;
             :   size_t new_values_total_len_ = 0;
These two data members can be combined into a new struct

struct SetOfStringsWithTotalLenT {
 boost::unordered_set<StringValue> new_values_;
 size_t total_len_;
}

And then define two such objects

SetOfStringsWithTotalLenT values_;
SetOfStringsWithTotalLenT newly_inserted_values_;


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.cc
File be/src/util/in-list-filter.cc:

http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.cc@41
PS2, Line 41: InListFilterImpl
cool!


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.cc@65
PS2, Line 65: default:
nit. At some point of time in future, we may need to support float and double, 
like in min/max filter (see util/min-max-filter.h).


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.cc@143
PS2, Line 143:   }
nit. May add a comment here: total_entries_ is updated during insert().


http://gerrit.cloudera.org:8080/#/c/18433/2/be/src/util/in-list-filter.cc@144
PS2, Line 144: new_values_.clear();
             :   new_values_total_len_ = 0;
this could be part of a method (say reset()) on the struct 
SetOfStringsWIthLengthT



--
To view, visit http://gerrit.cloudera.org:8080/18433
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id434a542b2ced64efa3bfc974cb565b94a4193e9
Gerrit-Change-Number: 18433
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Fri, 22 Apr 2022 15:15:17 +0000
Gerrit-HasComments: Yes

Reply via email to