Dan Burkert has posted comments on this change.

Change subject: KUDU-1363: Add in-list predicates for extracting a set of 

Patch Set 7:


File src/kudu/client/predicate-test.cc:

Line 124:     int count = 0;
I guess if you don't remove dupes below you would have to remove dupes 
internally here.  My main concern is that we are testing what happens when:

a) the user provides duplicates in a single in list
b) the user provides multiple in lists for a single column

Line 301:       vector<T> deduped_test_values(test_values);
Removing the duplicates shouldn't be necessary since the implementation does 
this internally anyway, right?

File src/kudu/client/scan_predicate-internal.h:

Line 104:     std::vector<KuduValue*> val_cloned(vals_.begin(), vals_.end());
is this explicit copy necessary?

Line 112:   std::vector<KuduValue*> vals_;
is it possible to make this a vector<unique_ptr<KuduValue>>?  That would make 
it a lot easier to track ownership; none of the stl_util stuff is necessary at 
that point.

File src/kudu/client/scan_predicate.cc:

Line 107:   if (vals_.size() > 1) {
Should we allow in predicates without at least 2 values?

Line 110:          [](KuduValue* a, KuduValue* b) { return *a < *b; });
I think both of these would be simpler with unqiue_ptr too, since you wouldn't 
have to pass in the closure to explicitly dereference.

Line 111:     vals_.erase(unique(vals_.begin(), vals_.end(),
Is this leaking the erased values?

Line 126:     RETURN_NOT_OK(value->data_->CheckTypeAndGetPointer(col_.name(),
I'm a little bit confused about the ownership of the value's data here.  It 
looks like somehow CheckTypeAndGetPointer is transferring ownership to the 
void* ?  It looks like the ComparisonPredicate is doing the same thing above, 
but I can't tell how it works.  Could definitely do with a comment about why 
this doesn't leak the data.

File src/kudu/common/column_predicate.cc:

Line 230:       // At least one value of the InList has to be in the range
I don't think this comment is true; Simplify will turn the predicate into a 
None if none of the values are in the range.

Line 232:       for (auto idx = values_.begin(); idx != values_.end();) {
I think this would be simpler with std::remove_if

Line 291:       for (auto idx = values_.begin(); idx != values_.end();) {
std::remove_if will simplify

Line 433:       stringstream ss;
this can be done with a normal string and string::append, I think we typically 
try to avoid the C++ stream library.

File src/kudu/common/column_predicate.h:

Line 167:   const std::vector<void*>* raw_values() const {
could this return a const reference instead of const *?

File src/kudu/common/key_util.cc:

Line 143:   // Tracks whether the last pushed predicate is an equality 
or InList.

Line 147:   // the first range predicate.
range or InList

Line 172:       if (predicate->raw_values() != nullptr) {
I think if you change this to return a reference this won't be necessary any 

Line 173:         memcpy(row->mutable_cell_ptr(*col_idx_it), 
*(predicate->raw_values()->rbegin()), size);
I think the second arg would be clearer as `predicate->raw_values()->back()`

Line 232:         memcpy(row->mutable_cell_ptr(*col_idx_it), 
*(predicate->raw_values()->begin()), size);

File src/kudu/common/scan_spec.cc:

Line 160:         // InList predicates should not be removed
Could you expand on this?  When I first saw this I wondered why before really 
digging into the key_util code.  Maybe something like:

InList predicates can not be removed, since the full constraints imposed by the 
list values can not be translated into only a single set of lower and upper 
bound primary keys.

File src/kudu/common/types.h:

Line 65:   bool CheckValueInList(const vector<void*>* values_lkp, const void* 
value) const;
Could these be defined in predicate.h?  We typically reserve this generic type 
class for operations which need to be implemented differently across the types, 
but I don't think that's the case here.

Line 120: static bool GenericCheckValueInList(const vector<void*>* values_lkp,
could this use std::binary_search?  I'm not 100% sure it would be faster for 
small lists, but we do know that the lists are sorted, right?

To view, visit http://gerrit.cloudera.org:8080/2986
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I986cb13097f1cf4af2b752f58c4c38f412a6a598
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sameer Abhyankar <abhyan...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <d...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sameer Abhyankar <abhyan...@gmail.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to