Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15122 )
Change subject: [client] KUDU-2483 Add IN Bloom filter predicate to C++ client ...................................................................... Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/15122/3/src/kudu/client/client.h File src/kudu/client/client.h: http://gerrit.cloudera.org:8080/#/c/15122/3/src/kudu/client/client.h@1114 PS3, Line 1114: KuduValue* lower_bound_inclusive = NULL, //NOLINT(modernize-use-nullptr) : KuduValue* upper_bound_exclusive = NULL); //NOLINT(modernize-use-nullptr) Nit: // NOLINT (space between slashes and NOLINT). http://gerrit.cloudera.org:8080/#/c/15122/1/src/kudu/client/client.h File src/kudu/client/client.h: http://gerrit.cloudera.org:8080/#/c/15122/1/src/kudu/client/client.h@1104 PS1, Line 1104: /// must correspond to the value of the column to which the predicate is : /// to be applied. : /// @return Raw pointer to an IN Bloom filter predicate. The caller owns the : /// predicate until it is passed into KuduScanner::AddConjunctPredicate(). : /// In the case of > This is from the original implementation of the wire protocol for bloom fil Could you investigate it further as a follow-on item of work? We have an opportunity to change the bloom filter predicate wire protocol in a destructive fashion now, before it ships in a release. I'm also asking because this isn't as expressive an API as the existing comparison predicate, which allows inclusive/exclusive on either value. http://gerrit.cloudera.org:8080/#/c/15122/3/src/kudu/client/client.cc File src/kudu/client/client.cc: http://gerrit.cloudera.org:8080/#/c/15122/3/src/kudu/client/client.cc@982 PS3, Line 982: // Empty vector of bloom filters will select all rows. Hence disallowed. But that's only if lower_bound_inclusive/upper_bound_exclusive are null, right? Don't we want to honor those, even if the set of bfs is empty? http://gerrit.cloudera.org:8080/#/c/15122/3/src/kudu/client/predicate-test.cc File src/kudu/client/predicate-test.cc: http://gerrit.cloudera.org:8080/#/c/15122/3/src/kudu/client/predicate-test.cc@1260 PS3, Line 1260: // Create random unique values. Some of these functions (or variations of them) seem like they could live in random_util.{h,cc}. What do you think? http://gerrit.cloudera.org:8080/#/c/15122/3/src/kudu/client/predicate-test.cc@1273 PS3, Line 1273: std::vector Using for this, and for unordered_set. http://gerrit.cloudera.org:8080/#/c/15122/3/src/kudu/client/scan_predicate.h File src/kudu/client/scan_predicate.h: http://gerrit.cloudera.org:8080/#/c/15122/3/src/kudu/client/scan_predicate.h@111 PS3, Line 111: static Status NewBloomFilter(int num_keys, This might be difficult to extend/change in the future. What do you think of adding a PIMPL'ed builder class? Then we could add new options as new methods in the builder, which is backwards compatible. http://gerrit.cloudera.org:8080/#/c/15122/3/src/kudu/client/scan_predicate.h@125 PS3, Line 125: /// Clone the Bloom filter. : /// : /// @return Raw pointer to the cloned Bloom filter. Caller owns the Bloom filter till : /// it's passed to @c KuduTable::NewInBloomFilterPredicate() : KuduBloomFilter* Clone() const; Why should this be part of the public API? When would a third party application use this? -- To view, visit http://gerrit.cloudera.org:8080/15122 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4aa235a4c933ebce0bf3ec7fcb135098eccc4ea4 Gerrit-Change-Number: 15122 Gerrit-PatchSet: 3 Gerrit-Owner: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 31 Jan 2020 23:44:33 +0000 Gerrit-HasComments: Yes
