Adar Dembo has posted comments on this change.

Change subject: KUDU-1363: Add in-list predicates for extracting a set of 

Patch Set 5:


OK, finished reviewing everything except some of the hairier predicate logic; 
Dan will take a look at that.

One high level comment: why are we using std::vector as the backing store for 
IN? Is the order of the values actually important? If not, could we use 
std::unordered_set instead? With a hash-based set we'll have O(1) lookups and 
O(n) intersections (instead of O(n) and O(n^2) which I believe we have with 
File src/kudu/client/client.h:

PS5, Line 468: KuduValue(s)
Nit: since this is a list, I think it's safe to assume plural here and just say 
"the KuduValues".

PS5, Line 468: with its
             :   // value
Nit: singular/plural confusion here? "its value" in a discussion about multiple 
File src/kudu/client/

Line 303:       vector<KuduValue*>* vals = new vector<KuduValue*>();
With the new swap-based approach to taking value ownership, there's no need to 
allocate 'vals' on the heap. You could stack allocate it, pass it by pointer 
into NewInListPredicate(), and avoid the delete call below.

Same goes for the other changes to
File src/kudu/client/scan_predicate-internal.h:

Line 95: class InListPredicateData: public KuduPredicate::Data {
Nit: InListPredicateData : public KuduPredicate::Data {

(separate InListPredicateData from the colon with a space)

PS5, Line 105: std::vector<KuduValue*>* val_cloned = new 
Why do you need to do this on the heap? Can't you just do:

  std::vector<KuduValue*> val_cloned(vals_.begin(), vals_.end();
  return new InListPredicateData(col_, val_cloned);
File src/kudu/client/

Line 127:   vector<void*>* vals_list = new vector<void*>();
Nit: we know what the size of the vector should be up-front and can pass that 
into the constructor.

Line 129:   for (auto idx : vals_) {
Nit: it's not really an index, it's an actual value, isn't it?
File src/kudu/common/

PS5, Line 70: ColumnPredicate ColumnPredicate::InList(ColumnSchema column,
            :                                               vector<void*>* 
values) {
            :   CHECK(values != nullptr);
            :   ColumnPredicate pred(PredicateType::InList, move(column), 
            :   pred.Simplify();
            :   return pred;
            : }
Got some more tabs here. Could you double check the whole patch?

Line 160: void ColumnPredicate::Merge(const ColumnPredicate& other) {
Nit: could you restore the empty line that separated these two methods?
File src/kudu/common/column_predicate.h:

Line 226:   std::vector<void*>* values_;
> Currently the vector is owned by PredicateData.. sort of similar to how the
I see what you're saying, but ownership of the vector isn't really what's 
interesting here; it's ownership of the values themselves. In that sense, you 
are already consistent: lower_ and upper_ are owned by PredicateData, just as 
the elements in values_ are. However, the ownership of values_ itself isn't 
interesting, and if you embedded it directly into ColumnPredicate rather than 
store a pointer, you could simplify PredicateData quite a bit as you wouldn't 
need to use an AutoReleasePool to own this vector, and you could revert all of 
the AutoReleasePool plumbing.
File src/kudu/common/column_predicate.h:

Line 33: 
Nit: could you revert this empty line?

Line 195:   // Merge another predicate into this Range predicate.
Nit: could you restore the empty line that separated these two methods?
File src/kudu/common/

Line 173:         memcpy(row->mutable_cell_ptr(*col_idx_it), 
Nit: , size

(separate comma from size with a space)
File src/kudu/common/types.h:

Line 33: #include "kudu/util/memory/arena.h"
Nit: should come before slice.h.

Line 123:   for (auto idx = values_lkp->begin(); idx != values_lkp->end(); 
++idx) {
Can this be rewritten as:

  for (auto val_lkp : values_lkp) {
    if (GetTypeInfo(Type)->Compare(val_lkp, value)) == 0)

Line 124:     if (GetTypeInfo(Type)->Compare((*idx), value) == 0)
Can we move GetTypeInfo(Type) out of the loop?

Line 136:     if (!GetTypeInfo(Type)->CheckValueInList(from, *idx)) {
Let's move GetTypeInfo() out of the loop.
File src/kudu/common/

Line 31: #include "kudu/common/column_predicate.h"
Nit: sort alphabetically.
File src/kudu/common/

Line 333: void CopyPredicateBoundToPB(const ColumnSchema& col,
As below, can you pick a different name to avoid overloads?

Line 388: static Status CopyPredicateBoundFromPB(const ColumnSchema& schema,
This is in an anonymous namespace; it doesn't need to be declared static.

Also, the google style guide discourages overloading, so can you pick a 
different name for this function?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I986cb13097f1cf4af2b752f58c4c38f412a6a598
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Sameer Abhyankar <>
Gerrit-Reviewer: Adar Dembo <>
Gerrit-Reviewer: Dan Burkert <>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sameer Abhyankar <>
Gerrit-Reviewer: Todd Lipcon <>
Gerrit-HasComments: Yes

Reply via email to