Dan Burkert has posted comments on this change. Change subject: KUDU-1363: Add in-list predicates for extracting a set of equalities. ......................................................................
Patch Set 7: (21 comments) http://gerrit.cloudera.org:8080/#/c/2986/7/src/kudu/client/predicate-test.cc File src/kudu/client/predicate-test.cc: Line 124: int count = 0; I guess if you don't remove dupes below you would have to remove dupes internally here. My main concern is that we are testing what happens when: a) the user provides duplicates in a single in list b) the user provides multiple in lists for a single column Line 301: vector<T> deduped_test_values(test_values); Removing the duplicates shouldn't be necessary since the implementation does this internally anyway, right? http://gerrit.cloudera.org:8080/#/c/2986/7/src/kudu/client/scan_predicate-internal.h File src/kudu/client/scan_predicate-internal.h: Line 104: std::vector<KuduValue*> val_cloned(vals_.begin(), vals_.end()); is this explicit copy necessary? Line 112: std::vector<KuduValue*> vals_; is it possible to make this a vector<unique_ptr<KuduValue>>? That would make it a lot easier to track ownership; none of the stl_util stuff is necessary at that point. http://gerrit.cloudera.org:8080/#/c/2986/7/src/kudu/client/scan_predicate.cc File src/kudu/client/scan_predicate.cc: Line 107: if (vals_.size() > 1) { Should we allow in predicates without at least 2 values? Line 110: [](KuduValue* a, KuduValue* b) { return *a < *b; }); I think both of these would be simpler with unqiue_ptr too, since you wouldn't have to pass in the closure to explicitly dereference. Line 111: vals_.erase(unique(vals_.begin(), vals_.end(), Is this leaking the erased values? Line 126: RETURN_NOT_OK(value->data_->CheckTypeAndGetPointer(col_.name(), I'm a little bit confused about the ownership of the value's data here. It looks like somehow CheckTypeAndGetPointer is transferring ownership to the void* ? It looks like the ComparisonPredicate is doing the same thing above, but I can't tell how it works. Could definitely do with a comment about why this doesn't leak the data. http://gerrit.cloudera.org:8080/#/c/2986/7/src/kudu/common/column_predicate.cc File src/kudu/common/column_predicate.cc: Line 230: // At least one value of the InList has to be in the range I don't think this comment is true; Simplify will turn the predicate into a None if none of the values are in the range. Line 232: for (auto idx = values_.begin(); idx != values_.end();) { I think this would be simpler with std::remove_if Line 291: for (auto idx = values_.begin(); idx != values_.end();) { std::remove_if will simplify Line 433: stringstream ss; this can be done with a normal string and string::append, I think we typically try to avoid the C++ stream library. http://gerrit.cloudera.org:8080/#/c/2986/7/src/kudu/common/column_predicate.h File src/kudu/common/column_predicate.h: Line 167: const std::vector<void*>* raw_values() const { could this return a const reference instead of const *? http://gerrit.cloudera.org:8080/#/c/2986/7/src/kudu/common/key_util.cc File src/kudu/common/key_util.cc: Line 143: // Tracks whether the last pushed predicate is an equality predicate. or InList. Line 147: // the first range predicate. range or InList Line 172: if (predicate->raw_values() != nullptr) { I think if you change this to return a reference this won't be necessary any more. Line 173: memcpy(row->mutable_cell_ptr(*col_idx_it), *(predicate->raw_values()->rbegin()), size); I think the second arg would be clearer as `predicate->raw_values()->back()` Line 232: memcpy(row->mutable_cell_ptr(*col_idx_it), *(predicate->raw_values()->begin()), size); predicate->raw_values()->front() http://gerrit.cloudera.org:8080/#/c/2986/7/src/kudu/common/scan_spec.cc File src/kudu/common/scan_spec.cc: Line 160: // InList predicates should not be removed Could you expand on this? When I first saw this I wondered why before really digging into the key_util code. Maybe something like: InList predicates can not be removed, since the full constraints imposed by the list values can not be translated into only a single set of lower and upper bound primary keys. http://gerrit.cloudera.org:8080/#/c/2986/7/src/kudu/common/types.h File src/kudu/common/types.h: Line 65: bool CheckValueInList(const vector<void*>* values_lkp, const void* value) const; Could these be defined in predicate.h? We typically reserve this generic type class for operations which need to be implemented differently across the types, but I don't think that's the case here. Line 120: static bool GenericCheckValueInList(const vector<void*>* values_lkp, could this use std::binary_search? I'm not 100% sure it would be faster for small lists, but we do know that the lists are sorted, right? -- To view, visit http://gerrit.cloudera.org:8080/2986 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I986cb13097f1cf4af2b752f58c4c38f412a6a598 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sameer Abhyankar <abhyan...@gmail.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Dan Burkert <d...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sameer Abhyankar <abhyan...@gmail.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes